Q: Why are we reading this paper? Do people use Munin today?

A: We're reading the paper for its ideas, not because of the Munin
software artifact, which had little impact, I think.

Few people design cluster-computing systems that provide the illusion
of shared memory. Instead, people design systems such as MapReduce and
Spark that don't allow you to run your existing shared-memory code but
are good for specific application domains.

The ideas from DSM systems, however, are influential. For example, if
you squint a bit, Go's memory model is not unlike release consistency.

Q: Are there situations where sequential consistency is preferable
to release consistency?

A: Yes. If you to want to write lock-free programs (e.g., sharing
variables without protecting with locks), you probably want a memory
model that allows you to reason about the effects of load/stores on
shared variables without locks (e.g., sequential consistency).

Q: Which is a better programming model for distributed systems, shared
memory or message passing?

A: This is a long-standing debate in the parallel-computing community
and there is no agreed answer. The general argument is that shared
memory applications are easier to program but message-passing programs
can be made more efficient because the programmer can orchestrate the
communication.

Q: What is the Dash system mentioned in the Munin paper?

A: DASH is a hardware multiprocessor; that is, it implemented release
consistency as part of the hardware memory-coherence system. Munin is
a software implementation of release consistency for distributed
shared memory on clusters of separate computers connected by a LAN.

Q: Does RMDA obsolete systems like Munin?

A: RDMA is a technique that one can use to speed up DSM systems, but
RDMA by itself isn't sufficient. You will still want to make
read/writes local to the processor (i.e., caching objects) because
those reads and writes are much faster than doing an RDMA operation.
If you do caching, you need a consistency protocol (e.g., release
consistency).

People built DSM systems that exploited fast remote read/writes in the
early days of DSM systems. For example, look at Shrimp from Princeton.

Q: What does the paper mean by a "write miss"?

A: "Write miss" refers to the scenario where a thread attempts to
write a shared-memory location, but the virtual-memory page holding
that location isn't mapped into the address space of the thread. This
results in a page fault, which the OS propagates to the Munin runtime.
In the conventional DSM implementation, the runtime then fetches the
page from another machine and maps it into the thread's address space.

Q: What's the difference between "release consistency" and "strong
consistency"?

A: In strong consistency, a load from an address won't return stale
data. With release consistency, it is possible that a load returns
stale data, namely when the program hasn't protected accesses to
shared object with locks.

Q: What kinds of access patterns don't fall into Munin's annotation
categories?

A: Munin works correctly as long your program has the appropriate
synchronization. The annotations just make the program run faster. I
imagine that if they needed additional annotations for patterns that the
authors hadn't seen yet in application, they could add them.

Q: When a read is performed on a variable in a node, does the node
always end up with a local copy of the object? Does this cause more
and more replicas to be created?

A: Yes. To be able to read an object, Munin will map a copy of the
page containing that object into the address space of the reader.

Replicas will be created when a thread reads the object. If an object is
shared only between 2 machines, then the 2 machines have a copy of
object. If an object is widely shared (i.e., each machine reads it), then
each machine will have a copy of the page.

Q: Would it be easy to add fault tolerance to Munin?

A: I don't think so. The typical way fault-tolerance is done in DSM
systems is to do periodic checkpointing. If a failure happens, then
the system restarts the application from the last checkpoint.

Spark (the reading for Thursday) has a more sophisticated plan, but it
doesn't have mutable shared data structures.

Q: What if a program reads or writes shared data without a lock?

A: It is the programmer's job to make sure that shared variables are
protected by locks, and then Munin will do the right thing. This is
not unlike Go: if you forget to put in synchronization (locks,
channels, etc.) around shared variables, then a read may see
stale values and the program may compute the wrong result.

Q: How does DUO mitigate false sharing?

A: False sharing refers to the case where unrelated variables x and
y end up on the same virtual-memory page. If one thread writes x and
another thread writes y, then the page holding x and y will be
ping-ponging between the two threads, even though the two threads
don't share x and y with each other.

The challenge here is that DSM systems shares pages but threads share
variables that may end up on the same page. Annotations, diff-ing, and
DUQ address this challenge.

Q: What does it mean to have synchronization visible to the runtime
system?

A: It means that the application must use the synchronization primitives
provided by Munin. That is, a program shouldn't implement its own locks
and use its own locks for synchronization. In the latter case, Munin won't
know about those locks, and cannot provide software release
consistency.