6.824 Spring 2018 Paper Questions

For each paper, your assignment is two-fold. By midnight the night before lecture:

You can also upload your questions and answers using curl:

## Answer goes into lecN.txt
$ curl -F file=@lec2.txt \
       -F key=XXXXXXXX \
## Question goes into sqN.txt
$ curl -F file=@sq2.txt \
       -F key=XXXXXXXX \

The assigned reading for today is not a paper, but the Online Go tutorial. The assigned "question" is the Crawler exercise in the tutorial. Also, take a look at Go's RPC package, which you will use in lab 1.

Describe a sequence of events that result in a client reading stale data from the Google File System

How does VM FT handle network partitions? That is, is it possible that if the primary and the backup end up in different network partitions that the backup will become a primary too and the system will run with two primaries?

Suppose we have the scenario shown in the Raft paper's Figure 7: a cluster of seven servers, with the log contents shown. The first server crashes (the one at the top of the figure), and cannot be contacted. A leader election ensues. For each of the servers marked (a), (d), and (f), could that server be elected? If yes, which servers would vote for it? If no, what specific Raft mechanism(s) would prevent it from being elected?

Could a received InstallSnapshot RPC cause the state machine to go backwards in time? That is, could step 8 in Figure 13 cause the state machine to be reset so that it reflects fewer executed operations? If yes, explain how this could happen. If no, explain why it can't happen.

Please read the paper's Appendices. In Spinnaker a leader to responds to a client request after the leader and one follower have written a log record for the request on persistent storage. Why is this sufficient to guarantee strong consistency even after the leader or the one follower fail?

(This paper relies on Zookeeper, which we will read later.)

One use of Zookeeper is a fault-tolerant lock service (see the section "Simple locks" on page 6). Why isn't possible that two clients can acquire the same lock? In particular, how does Zookeeper decide if a client has failed and it can give the lock to some other client?

Russ Cox is one of the leads on the Go project. What do you like best about Go? Why? Would you want to change anything in the language? If so, what and why?

6.033 Book. Read just these parts of Chapter 9: 9.1.5, 9.1.6, 9.5.2, 9.5.3, 9.6.3. The last two sections (on two-phase locking and distributed two-phase commit) are the most important. The Question: describe a situation where Two-Phase Locking yields higher performance than Simple Locking.

No compromises: distributed transactions with consistency, availability, and performance: Suppose there are two FaRM transactions that both increment the same object. They start at the same time and see the same initial value for the object. One transaction completely finishes committing (see Section 4 and Figure 4). Then the second transaction starts to commit. There are no failures. What is the evidence that FaRM will use to realize that it must abort the second transaction? At what point in the Section 4 / Figure 4 protocol will FaRM realize that it must abort?

MDCC Come back later for this year's question.

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing What applications can Spark support well that MapReduce/Hadoop cannot support?

Naiad: A Timely Dataflow System: Consider the data-flow graph in Figure 3, and assume that the vertices are instantiated as follows:

Now assume that we introduce two records in epoch e = 1 at the In vertex: (a, 2) and (b, 6); and one record in e = 2: (a, 5).

Write down the timestamp changes that the records experience as they flow through the graph, e.g., (a, 2): at A, t = (1, []); at B, t = .... You may omit vertices that do not modify the timestamp.

Informally explain when E.OnNotify((1, [])) will be called. You do not need to step through the details of the pointstamp protocol.

Parameter Server: The parameter server model was developed for running machine-learning algorithms like sparse logistic regression (§ 5.1).

What do you expect the bottleneck – that is, the most loaded – resource (e.g., CPU, memory, network) to be at the workers and at the servers?

What resource utilization would you likely observe if using Spark's logistic regression (see §3.2.1 in the Spark paper) instead of the parameter server for the experiment in §5.1?

Frangipani: A Scalable Distributed File System: Suppose a server modifies an i-node, appends the modification to its log, then another server modifies the same i-node, and then the first server crashes. The recovery system will see the i-node modification in the crashed server's log, but should not apply that log entry to the i-node, because that would un-do the second server's change. How does Frangipani avoid or cope with this situation?

Memcache at Facebook. Section 3.3 implies that a client that writes data does not delete the corresponding key from the Gutter servers, even though the client does try to delete the key from the ordinary Memcached servers (Figure 1). Explain why it would be a bad idea for writing clients to delete keys from Gutter servers.

Managing Update Conflicts in Bayou Suppose we build a distributed filesystem using Bayou, and the system has a copy operation. Initially, file A contains "foo" and file B contains "bar". On one node, a user copies file A to file B, overwriting the old contents of B. On another node, a user copies file B to file A. After both operations are committed, we want both files to contain "foo" or for both files to contain "bar". Sketch a dependency check and merge procedure for the copy operation that makes this work. How does Bayou ensure that all the nodes agree about whether A and B contain "foo" or "bar"?

For the design described in Chord: a scalable peer-to-peer lookup service for Internet applications, if two Chord nodes x and y are located nearby each other in the Internet (e.g., they are both in the same datacenter), is the number of Chord hops small (say 1 or 2 instead of log N) when x is looking up a key that is stored on node y?

Dynamo Suppose Dynamo server S1 is perfectly healthy with a working network connection. By mistake, an administrator instructs server S2 to remove S1 using the mechanisms described in 4.8.1 and 4.9. It takes a while for the membership change to propagate from S2 to the rest of the system (including S1), so for a while some clients and servers will think that S1 is still part of the system. Will Dynamo operate correctly in this situation? Why, or why not?

Bitcoin Try to buy something with Bitcoin. It may help to cooperate with some 6.824 class-mates, and it may help to start a few days early. If you decide to give up, that's OK. Briefly describe your experience.

Questions or comments regarding 6.824? Send e-mail to 6824-staff@lists.csail.mit.edu.

Top // 6.824 home //