6.824 2018 Lecture 2: Infrastructure: RPC and threads

Most commonly-asked question: Why Go?
  6.824 used C++ for many years
    C++ worked out well
    but students spent time tracking down pointer and alloc/free bugs
    and there's no very satisfactory C++ RPC package
  Go is a bit better than C++ for us
    good support for concurrency (goroutines, channels, &c)
    good support for RPC
    garbage-collected (no use-after-free problems)
    type safe
    threads + GC is particularly attractive!
  We like programming in Go
    relatively simple and traditional
  After the tutorial, use https://golang.org/doc/effective_go.html
  Russ Cox will give a guest lecture March 8th

Threads
  threads are a useful structuring tool
  Go calls them goroutines; everyone else calls them threads
  they can be tricky

Why threads?
  They express concurrency, which shows up naturally in distributed systems
  I/O concurrency:
    While waiting for a response from another server, process the next request
  Multicore:
    Threads run in parallel on several cores

Thread = "thread of execution"
  threads allow one program to (logically) execute many things at once
  the threads share memory
  each thread includes some per-thread state: program counter, registers, stack

How many threads in a program?
  Sometimes driven by structure
    e.g. one thread per client, one for background tasks
  Sometimes driven by desire for multi-core parallelism
    so one active thread per core
    the Go runtime automatically schedules runnable goroutines on available cores
  Sometimes driven by desire for I/O concurrency
    the number is determined by latency and capacity
    keep increasing until throughput stops growing
  Go threads are pretty cheap
    100s or 1000s are fine, but maybe not millions
    creating a thread is more expensive than a method call

Threading challenges:
  sharing data
    what if one thread reads data that another thread is changing?
    e.g. two threads do count = count + 1
    this is a "race" -- and is usually a bug
    -> use Mutexes (or other synchronization)
    -> or avoid sharing
  coordination between threads
    how to wait for all Map threads to finish?
    -> use Go channels or WaitGroup
  granularity of concurrency
    coarse-grained -> simple, but little concurrency/parallelism
    fine-grained -> more concurrency, more races and deadlocks

What is a crawler?
  goal is to fetch all web pages, e.g. to feed to an indexer
  web pages form a graph
    multiple links to each page
    graph has cycles

Crawler challenges
  Arrange for I/O concurrency
    Fetch many URLs at the same time
    To increase URLs fetched per second
    Since network latency is much more of a limit than network capacity
  Fetch each URL only *once*
    avoid wasting network bandwidth
    be nice to remote servers
    => need to remember which URLs have been visited
  Know when finished

Crawler solutions [crawler.go link on schedule page]

Serial crawler:
  the "fetched" map avoids repeats and breaks cycles
    it's a single map, passed by reference to the recursive calls
  but: fetches only one page at a time

ConcurrentMutex crawler:
  Creates a thread for each page fetch
    Many concurrent fetches, higher fetch rate
  The threads share the fetched map
  Why the Mutex (== lock)?
    Without the lock:
      Two web pages contain links to the same URL
      Two threads simultaneously fetch those two pages
      T1 checks fetched[url], T2 checks fetched[url]
      Both see that the url hasn't been fetched
      Both fetch, which is wrong
    Simultaneous read and write (or write+write) is a "race"
      And often indicates a bug
      The bug may show up only for unlucky thread interleavings
  What will happen if I comment out the Lock()/Unlock() calls?
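The code in question looks roughly like this -- a minimal, self-contained sketch with assumed names, not the crawler.go from the schedule page:

  // racesketch.go -- the check-then-update pattern behind the fetched map.
  package main

  import (
      "fmt"
      "sync"
  )

  type fetchState struct {
      mu      sync.Mutex
      fetched map[string]bool
  }

  // testAndSet reports whether url was already fetched, and marks it fetched.
  // Lock()/Unlock() make the check and the update atomic; without them, two
  // goroutines can both see "not fetched" and both fetch the same url.
  func (fs *fetchState) testAndSet(url string) bool {
      fs.mu.Lock()
      defer fs.mu.Unlock()
      already := fs.fetched[url]
      fs.fetched[url] = true
      return already
  }

  func main() {
      fs := &fetchState{fetched: make(map[string]bool)}
      var wg sync.WaitGroup
      for i := 0; i < 2; i++ {
          wg.Add(1)
          go func() {
              defer wg.Done()
              if !fs.testAndSet("http://example.com/") {
                  fmt.Println("fetching http://example.com/")
              }
          }()
      }
      wg.Wait()
  }

Without the Lock()/Unlock(), both goroutines can observe fetched[url] == false before either sets it, so both fetch the same page; the race detector (next) reports the unsynchronized accesses.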
    go run crawler.go
    go run -race crawler.go
  The lock causes the check and update to be atomic
  How does it decide it is done?
    sync.WaitGroup implicitly waits for children to finish recursive fetches

ConcurrentChannel crawler
  a Go channel:
    a channel is an object; there can be many of them
      ch := make(chan int)
    a channel lets one thread send an object to another thread
      ch <- x
      the sender waits until some goroutine receives
      y := <- ch
      for y := range ch
      a receiver waits until some goroutine sends
    so you can use a channel to both communicate and synchronize
    several threads can send and receive on a channel
    remember: the sender blocks until the receiver receives!
      it may be dangerous to hold a lock while sending...
  ConcurrentChannel master()
    master() creates a worker goroutine to fetch each page
    worker() sends URLs on a channel
      multiple workers send on the single channel
    master() reads URLs from the channel
    [diagram: master, channel, workers]
  No need to lock the fetched map, because it isn't shared!
  Is there any shared data?
    The channel
    The slices and strings sent on the channel
    The arguments master() passes to worker()

When to use sharing and locks, versus channels?
  Most problems can be solved in either style
  What makes the most sense depends on how the programmer thinks
    state -- sharing and locks
    communication -- channels
    waiting for events -- channels
  Use Go's race detector:
    https://golang.org/doc/articles/race_detector.html
    go test -race

Remote Procedure Call (RPC)
  a key piece of distributed system machinery; all the labs use RPC
  goal: easy-to-program client/server communication

RPC message diagram:
  Client             Server
    request--->
       <---response

RPC tries to mimic local fn call:
  Client:
    z = fn(x, y)
  Server:
    fn(x, y) {
      compute
      return z
    }
  Rarely this simple in practice...

Software structure
  client app        handlers
    stubs           dispatcher
   RPC lib           RPC lib
     net  ------------ net

Go example: kv.go link on schedule page
  A toy key/value storage server -- Put(key,value), Get(key)->value
  Uses Go's RPC library
  (a minimal sketch in this style appears after the details below)

Common:
  You have to declare Args and Reply structs for each RPC type

Client:
  connect()'s Dial() creates a TCP connection to the server
  Call() asks the RPC library to perform the call
    you specify the server function name, the arguments, and a place to put the reply
    the library marshals args, sends the request, waits, unmarshals the reply
    the return value from Call() indicates whether it got a reply
    usually you'll also have a reply.Err indicating service-level failure

Server:
  Go requires you to declare an object with methods as RPC handlers
  You then register that object with the RPC library
  You accept TCP connections and give them to the RPC library
  The RPC library
    reads each request
    creates a new goroutine for this request
    unmarshals the request
    calls the named method (dispatch)
    marshals the reply
    writes the reply on the TCP connection
  The server's Get() and Put() handlers
    Must lock, since the RPC library creates per-request goroutines
    read args; modify reply

A few details:
  Binding: how does the client know who to talk to?
    For Go's RPC, the server name/port is an argument to Dial
    Big systems have some kind of name or configuration server
  Marshalling: format data into packets
    Go's RPC library can pass strings, arrays, objects, maps, &c
    Go passes pointers by copying (the server can't directly use a client pointer)
    Cannot pass channels or functions
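To make the shape of this concrete, here is a minimal sketch in the spirit of the kv.go example on the schedule page; the names, port number, and error handling are illustrative, not the course code:

  // kvsketch.go -- a toy key/value server and client using Go's net/rpc.
  package main

  import (
      "fmt"
      "log"
      "net"
      "net/rpc"
      "sync"
  )

  // Each RPC type gets its own Args and Reply structs.
  type PutArgs struct{ Key, Value string }
  type PutReply struct{}
  type GetArgs struct{ Key string }
  type GetReply struct{ Value string }

  // The server is an object whose exported methods are the RPC handlers.
  type KV struct {
      mu   sync.Mutex // handlers must lock: each request runs in its own goroutine
      data map[string]string
  }

  func (kv *KV) Put(args *PutArgs, reply *PutReply) error {
      kv.mu.Lock()
      defer kv.mu.Unlock()
      kv.data[args.Key] = args.Value
      return nil
  }

  func (kv *KV) Get(args *GetArgs, reply *GetReply) error {
      kv.mu.Lock()
      defer kv.mu.Unlock()
      reply.Value = kv.data[args.Key]
      return nil
  }

  // server registers the KV object with the RPC library and hands it
  // accepted TCP connections.
  func server() {
      kv := &KV{data: map[string]string{}}
      rpcs := rpc.NewServer()
      rpcs.Register(kv)
      l, err := net.Listen("tcp", ":1234")
      if err != nil {
          log.Fatal("listen error:", err)
      }
      go func() {
          for {
              conn, err := l.Accept()
              if err != nil {
                  break
              }
              go rpcs.ServeConn(conn)
          }
      }()
  }

  // The client dials the server; Call() names the handler, sends the
  // marshalled arguments, and waits for the reply.
  func put(key, value string) {
      client, err := rpc.Dial("tcp", "localhost:1234")
      if err != nil {
          log.Fatal("dialing:", err)
      }
      defer client.Close()
      if err := client.Call("KV.Put", &PutArgs{key, value}, &PutReply{}); err != nil {
          log.Fatal("call:", err)
      }
  }

  func get(key string) string {
      client, err := rpc.Dial("tcp", "localhost:1234")
      if err != nil {
          log.Fatal("dialing:", err)
      }
      defer client.Close()
      reply := GetReply{}
      if err := client.Call("KV.Get", &GetArgs{key}, &reply); err != nil {
          log.Fatal("call:", err)
      }
      return reply.Value
  }

  func main() {
      server()
      put("subject", "6.824")
      fmt.Println(get("subject")) // prints "6.824"
  }

Each handler takes a pointer to its Args struct and a pointer to its Reply struct and returns an error, which is the shape Go's RPC library expects; the library dispatches each incoming request in its own goroutine, which is why the handlers lock.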
RPC problem: what to do about failures?
  e.g. lost packet, broken network, slow server, crashed server

What does a failure look like to the client RPC library?
  Client never sees a response from the server
  Client does *not* know if the server saw the request!
    Maybe the server never saw the request
    Maybe the server executed, then crashed just before sending the reply
    Maybe the server executed, but the network died just before delivering the reply
  [diagram of lost reply]

Simplest failure-handling scheme: "best effort"
  Call() waits for a response for a while
  If none arrives, re-send the request
  Do this a few times
  Then give up and return an error

Q: is "best effort" easy for applications to cope with?

A particularly bad situation:
  client executes
    Put("k", 10);
    Put("k", 20);
  both succeed
  what will Get("k") yield?
  [diagram: timeout, re-send, original arrives late]

Q: is best effort ever OK?
  read-only operations
  operations that do nothing if repeated
    e.g. DB checks whether the record has already been inserted

Better RPC behavior: "at most once"
  idea: server RPC code detects duplicate requests
    returns the previous reply instead of re-running the handler

Q: how to detect a duplicate request?
  client includes a unique ID (XID) with each request
    uses the same XID for a re-send
  server:
    if seen[xid]:
      r = old[xid]
    else
      r = handler()
      old[xid] = r
      seen[xid] = true
  (a Go sketch of this idea appears at the end of these notes)

Some at-most-once complexities
  this will come up in lab 3
  how to ensure the XID is unique?
    big random number?
    combine a unique client ID (ip address?) with a sequence #?
  server must eventually discard info about old RPCs
    when is discard safe?
    idea:
      each client has a unique ID (perhaps a big random number)
      per-client RPC sequence numbers
      client includes "seen all replies <= X" with every RPC
      much like TCP sequence #s and acks
    or only allow a client one outstanding RPC at a time
      arrival of seq+1 allows the server to discard all <= seq
  how to handle a duplicate request while the original is still executing?
    server doesn't know the reply yet
    idea: a "pending" flag per executing RPC; wait or ignore

What if an at-most-once server crashes and re-starts?
  if the at-most-once duplicate info is in memory, the server will forget
    and accept duplicate requests after the re-start
  maybe it should write the duplicate info to disk
  maybe a replica server should also replicate the duplicate info

Go RPC is a simple form of "at-most-once"
  open a TCP connection
  write the request to the TCP connection
  Go RPC never re-sends a request
    so the server won't see duplicate requests
  Go RPC code returns an error if it doesn't get a reply
    perhaps after a timeout (from TCP)
    perhaps the server didn't see the request
    perhaps the server processed the request but the server/net failed before the reply came back

What about "exactly once"?
  unbounded retries plus duplicate detection plus a fault-tolerant service
  Lab 3
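As promised above, here is a minimal sketch of server-side at-most-once duplicate detection. This is not Lab 3 code: the names are assumptions, the notes' seen[] and old[] maps are collapsed into a single map, and for simplicity the lock is held while the handler runs:

  // dedupsketch.go -- at-most-once duplicate detection keyed by a client-chosen XID.
  package main

  import (
      "fmt"
      "sync"
  )

  type Server struct {
      mu   sync.Mutex
      old  map[int64]string // XID -> previously computed reply
      data map[string]string
  }

  // Handle runs handler at most once per XID: a re-sent request with the
  // same XID gets the saved reply instead of re-executing the handler.
  func (s *Server) Handle(xid int64, handler func() string) string {
      s.mu.Lock()
      defer s.mu.Unlock()
      if r, seen := s.old[xid]; seen {
          return r // duplicate: return the previous reply
      }
      r := handler()
      s.old[xid] = r
      return r
  }

  func main() {
      s := &Server{old: map[int64]string{}, data: map[string]string{}}
      put := func() string { s.data["k"] = "10"; return "ok" }
      fmt.Println(s.Handle(42, put)) // executes the handler, prints "ok"
      fmt.Println(s.Handle(42, put)) // duplicate XID: prints the saved "ok" without re-running
  }

A real server must also decide when it is safe to discard old entries and how to handle a duplicate that arrives while the original request is still executing, as discussed above.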