6.824 Scaling Memcached at Facebook FAQ

Q: The paper mentions that memcached serves stale data under a number
of circumstances. Why is that OK?

A: The cached data is typically displayed to users on web pages. If
the data is out of date by a fraction of a second, users will usually
not notice.

But note that updates are sent to MySQL database servers, which
provide strong consistency, transactions, and durable storage. So
while the reads that generate web pages are not very consistent,
updates are quite careful.

Q: The paper is all about ensuring that memcached doesn't cached stale
data. Why does it make sense that the system both tolerates stale
data, and goes to great lengths to avoid stale data?

A: The big danger they are avoiding is long-term caching of stale
data. It's OK to serve data that's out of date by a few seconds. It's
not OK to serve data that's out of date by hours. Without the paper's
machinery, unbounded memcached staleness could arise due to lost
deletes or out of order updates.

Q: Why do they use memcached at all? Why not just read directly from
the MySQL database servers in the "storage cluster"?

A: The MySQL servers are not nearly fast enough to serve the volume of
reads generated by Facebook's web servers. memcached is orders of
magnitude faster than MySQL.

Q: What's the difference between the paper's "memcached" and "memcache"?

A: "memcached" refers to the software, which you can find here:

  https://github.com/memcached/memcached

memcached is a very simple and very fast key/value server. It stores
data in RAM, with no fault tolerance, so people use it for caching (not
for persistent storage).

The paper uses "memcache" to refer to Facebook's set of servers
running memcached.

Q: What is the "stale set" problem in 3.2.1, and how do leases solve it?

A: Here's an example of the "stale set" problem:

1. Client C1 asks memcache for k; memcache says k doesn't exist.
2. C1 asks MySQL for k, MySQL replies with value 1.
   C1 is slow at this point for some reason...
3. Someone updates k's value in MySQL to 2.
4. MySQL/mcsqueal/mcrouter send an invalidate for k to memcache,
   though memcache is not caching k, so there's nothing to invalidate.
5. C2 asks memcache for k; memcache says k doesn't exist.
6. C2 asks MySQL for k, mySQL replies with value 2.
7. C2 installs k=2 in memcache.
8. C1 installs k=1 in memcache.

Now memcache has a stale version of k, and it may never be updated.

The paper's leases would fix the example in the following way:

1. Client C1 asks memcache for k; memcache says k doesn't exist,
   and returns lease L1 to C1.
2. C1 asks MySQL for k, MySQL replies with value 1.
   C1 is slow at this point for some reason...
3. Someone updates k's value in MySQL to 2.
4. MySQL/mcsqueal/mcrouter send an invalidate for k to memcache,
   though memcache is not caching k, so there's nothing to invalidate.
   But memcache does invalidate C1's lease L1 (deletes L1 from its set
   of valid leases).
5. C2 asks memcache for k; memcache says k doesn't exist,
   and returns lease L2 to C2 (since there was no current lease for k).
6. C2 asks MySQL for k, mySQL replies with value 2.
7. C2 installs k=2 in memcache, supplying valid lease L2.
8. C1 installs k=1 in memcache, supplying invalid lease L1,
   so memcache ignores C1.

Now memcache is left caching the correct k=2.

Q: What is the "thundering herd" problem in 3.2.1, and how do leases
solve it?

A: The thundering herd problem:

* key k is very popular -- lots of clients read it.
* ordinarily clients read k from memcache, which is fast.
* but suppose someone writes k, causing it to be invalidated in memcache.
* for a while, every client that tries to read k will miss in memcache.
* they will all ask MySQL for k.
* MySQL may be overloaded with too many simultaneous requests.

The paper's leases solve this problem by allowing only the first
client that misses to ask MySQL for the latest data.

Q: What is McRouter?

A: The point of mcrouter is to aggregate memcached RPCs from many
clients and send them in big batches to memcached servers. It's more
efficient to have a smallish number of mcrouter servers talk to
memcached than a large number of individual clients. One reason is
that there's overhead to each network (TCP) connection; better that
each memcached have a TCP connection per mcrouter than per client.
Another reason is that there's overhead (packet header space and
interrupt) for each packet, so it's helpful that a mcrouter can pack
many client requests into each TCP packet.

Q: Isn't it wasteful that the gutter servers are idle when they aren't
taking over for a failed server? Why not use the gutter servers for
ordinary memcached service as well as gutter?

A: I think non-gutter memcached servers are often close to fully
loaded, and have little spare capacity. If one fails, the replacement
server needs to have been more or less idle, in order to handle the
failed server's load.

Q: How do Section 4.2's regional pools reduce the number of replicas?

A: Each region has multiple clusters. Each cluster has a complete cache.
Thus a given data item may be cached in each of the clusters. If there
are N clusters in a region, there may be N distinct cached copies of a
data item, one per cluster.

Items that are cached in the regional pool are only cached once per
region, not N times.

The tradeoff is that the potential serving capacity is N times higher if
there are N copies.

Q: What storage system work has gone on at Facebook since this paper?

A: Here's a sample:

https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf
https://www.cs.princeton.edu/~wlloyd/papers/existential-sosp15.pdf

Q: Why not just put a cache into MySQL, where it can be better
integrated to provide good consistency?

A: It would be fantastic if someone could add a transparent cache to
MySQL that made it as fast as memcached. But no-one knows how to do
that. MySQL in fact does quite a bit of caching, and it's still much
slower than memcached. Presumably a lot of the reason is that MySQL
presents a dramatically more powerful and complex interface than
memcached (MySQL supports SQL queries, an interface which is about
1000x as complex as memcached's put()/get()/delete).

Q: Figure 11 shows that Memcache can serve data that is even a day
old. Although this happens with very low probability, couldn't it
still cause significant, perhaps catastrophic, problems in
applications using Memcache?

Yes, indeed.  It is something FB has struggled with because it makes
writing applications more challenging. It is the topic of two
follow-on papers (see the two references above).

The bottom-line of these papers is still roughly the same: the
probability of inconsistency is so low that they are willing to accept
it: even though a few users in principle might be able to notice the
inconstency, they probably won't realize it and don't care.  (Their
target applications are not banking applications.)

Q: How does the MySQL replication system work?

See https://dev.mysql.com/doc/refman/8.0/en/replication.html.  FB uses
the log-based replication scheme as a component for publish/subscribe
system, as described in "Wormhole: Reliable Pub-Sub to support
Geo-replicated Internet"y Services, Sharma et al, 2015.  The core of
the replication scheme is to read updates from MySQL's transaction log
and send those to the backup, which applies them to its data.

Q: What exactly does "look-aside" caching refer to?

The cache sits on the side as opposed in between the application and
the storage layer.  If the application misses in the cache, the
application retrieves the database records and updates the cache,
instead of the cache doing it.