6.824 2020 Final Exam

*** Transactions and Serializability

This is a 120-minute exam. Please write down any assumptions you make. You are
allowed to consult 6.824 papers, notes, and lab code. If you have questions,
please ask them on Piazza in a private post to the staff. Please don't discuss
the exam with anyone.

Alyssa has built a storage system that provides its clients with
transactions. Alyssa won't tell us how her system works, but she does
guarantee that the transactions are serializable. Following Section
9.1.6 of the reading for Lecture 12, serializable means that, if you
run a bunch of concurrent transactions and they produce some results
(output and updated database records), there exists some serial order
of those transactions that would, if followed, yield the same results.

You start three concurrent transactions at about the same time:

T1:
  begin()
  put(y, 2)
  end()

T2:
  begin()
  put(x, 99)
  put(y, 99)
  put(z, 99)
  end()

T3:
  begin()
  tmpx = get(x)
  tmpy = get(y)
  tmpz = get(z)
  print tmpx, tmpy, tmpz
  end()

x, y, and z are identifiers of different database records. All three
start out with value 0. put() and get() write and read database
records, and begin() and end() indicate the start and finish of a
transaction. There is no activity in the system other than these three
transactions, and no failures occur.

If T3 prints 99, 2, 99, is that a serializable result? Explain why or
why not.

---

If T3 prints 0, 2, 99, is that a serializable result? Explain why or
why not.

*** Two-Phase Commit

Recall the Two-Phase Commit protocol for distributed transactions
from Lecture 12 and Section 9.6.3 of the reading for Lecture 12.

A common criticism of two-phase commit is that it causes workers to
hold locks for a long time; for example, in Figure 9.37 (page 9-89),
each worker must hold its locks while waiting to receive the COMMIT
message.

What would go wrong if each worker released its locks after replying
to the PREPARE message?

*** Spanner

Refer to the paper "Spanner: Google's Globally-Distributed Database,"
by Corbett et al.

The start of Section 4.1.2 says that all of a read/write
transactions's writes are assigned the same timestamp (the timestamp
assigned to the Paxos write that commits the transaction).

Suppose, instead, that each write is assigned a timestamp equal to
TT.now().latest at the time when the client transaction code
calls the Spanner library function to modify the data.

What aspect of Spanner would this break, and why?

*** FaRM

Ben is using FaRM (see the paper "No compromises: distributed
transactions with consistency, availability, and performance" by
Dragojevic et al). Here is Ben's transaction code:

  begin()
  if x > y:
    y = y + 1
  else:
    x = x + 1
  end()

x and y are objects stored in FaRM. begin() and end() mark the start
and end of the transaction.


Ben starts two instances of this transaction at exactly the same time
on two different machines. The two transactions read x and y at exactly
the same time, and send LOCK messages at exactly the same time. There
is no other activity in the system, and no failures occur. x and y
both start out with value zero.

Explain what the outcome will be, and how the FaRM commit protocol
(Figure 4) arrives at that outcome.

---

Ben thinks that some transactions might execute faster if he modified
FaRM to run Figure 4's VALIDATE phase before the LOCK phase. It turns
out that this change can cause incorrect results in some situations.
Outline such a situation. Your answer can involve transactions other
than the one shown above.

*** Spark

Ben is using Spark to compute PageRank on a huge database of web
links. He's using PageRank code much like that shown in Section 3.2.2
of "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
In-Memory Cluster Computing," by Zaharia et al.

Ben's PageRank job takes hours to run, and he wants to find out which
parts of the job take the most time. He inserts code that gets the
wall-clock time before and after each line of the PageRank shown in
Section 3.2.2, and prints the difference (i.e. prints how long that
line took to execute). He sees that each line in the "for" loop in
Section 3.2.2 takes only a fraction of a second to execute, and that
the whole loop itself takes less than a second, even though the entire
Spark job takes hours. Explain why the for loop takes so much less
time than the whole job.

---

Ben thinks that telling Spark to cache the "ranks" RDD may make his
PageRank job complete in less time, since ranks is used in each
iteration of the "for" loop. He adds a call to "ranks.persist()" after
each of the two assignments to ranks. However, he finds that his
PageRank job takes the same amount of time to complete despite this
change. Explain.

*** Memcache / Facebook

Ben runs a small web site. He has a bunch of web servers, a single
memcached server, and a single MySQL DB. The web servers store data
persistently in the MySQL DB, and cache database records in memcached.
Thus Ben's web servers act as clients to the memcached and MySQL
servers.

Though Ben has read "Scaling Memcache at Facebook," by Nishtala et al,
he ignores all the ideas in that paper. For example, Ben's setup has
nothing like Figure 6's invalidates or McSqueal, and no leases.

Ben has programmed his web servers to use memcached and MySQL as
follows. When a web server needs a data item, it first tries to read
it from memcached; if that read misses, the web server reads the data
item from MySQL. However, his web servers do not insert data read from
MySQL into memcached. Instead, when one of Ben's web servers writes
data, it always inserts (or updates) the data in both memcached and
MySQL.

Here's pseudo-code describing how Ben's web server's use memcached and
MySQL:

  read(k):
    v = get(k) // is the key/value in memcached?
    if v is nil {
      // miss -- not in memcached
      v = fetch from DB
    }
    return v

  write(k,v):
    send k,v to DB
    set(k,v) // insert the new key/value into memcached

Ben reasons that, since every write updates both MySQL and memcached,
the two will always agree except during the small window of time
between the two lines of code in write(). Explain why Ben is wrong.

*** COPS

You're using COPS (see "Don't Settle for Eventual: Scalable Causal
Consistency for Wide-Area Storage with COPS," by Lloyd et al.). You
have four data centers, and clients and a COPS Data Store Cluster at
each data center.

Four clients, each at a different data center, each start executing
code that uses COPS at about the same time. Here's the code each
client executes:

Client C1:
  put(x, 1)
  put(y, 2)

Client C2:
  tmp_y = get(y)
  put(z, 100 + tmp_y)

Client C3:
  tmp_x = get(x)
  tmp_y = get(y)
  tmp_z = get(z)
  print "x=", tmp_x
  print "y=", tmp_y
  print "z=", tmp_z

Client C4:
  tmp_z = get(z)
  tmp_y = get(y)
  tmp_x = get(x)
  print "x=", tmp_x
  print "y=", tmp_y
  print "z=", tmp_z

Note that C3 and C4 read x, y, and z in opposite orders.

All values start as zero. The clients start executing these code
sequences at about the same time. The system uses COPS, not COPS-GT
(that is, there are no transactions). There is no other activity in
the system, just these four clients. There are no failures.

Which of the following are possible outputs from C3? Mark all that apply.

[ ] x=1 y=2 z=102
[ ] x=1 y=0 z=102
[ ] x=0 y=2 z=102
[ ] x=1 y=2 z=0
[ ] x=1 y=0 z=0
[ ] x=0 y=2 z=0

---

Which are possible outputs from C4? Mark all that apply.

[ ] x=1 y=2 z=102
[ ] x=1 y=0 z=102
[ ] x=0 y=2 z=102
[ ] x=1 y=2 z=0
[ ] x=1 y=0 z=0
[ ] x=0 y=2 z=0

*** Lab 3

Explain what would go wrong if clients sent Lab 3 Get() RPCs to
followers, and followers responded with the values in their replica of
the data (without talking to the leader, and without adding anything
to the log).

----

Alyssa wonders whether one could use Spanner's TrueTime (the API in
Table 1 of the Spanner paper) to help design a correct version of Lab
3 in which followers can respond to client Get() RPCs. Outline a
design for this. You can assume that both clients and servers have
TrueTime, and that the key/value service is continuously busy with
both Put() and Get() operations.

*** 6.824

Which papers should we omit in future 6.824 years, because they are
not useful or are too hard to understand?

[ ] 6.033 chapter on transactions
[ ] Spanner
[ ] FaRM
[ ] Spark
[ ] Memcached at Facebook
[ ] COPS
[ ] Certificate Transparency
[ ] Bitcoin
[ ] BlockStack
[ ] AnalogicFS

---

Which papers did you find most useful?

[ ] 6.033 chapter on transactions
[ ] Spanner
[ ] FaRM
[ ] Spark
[ ] Memcached at Facebook
[ ] COPS
[ ] Certificate Transparency
[ ] Bitcoin
[ ] BlockStack
[ ] AnalogicFS