6.5840 Lecture 8: lab 2 A+B Q&A and lab 3A

*** Lab 2A+B Q&A

Structure of Raft library I'll show
  many threads (Start() thread, thread reading from applyCh)
  many RPC threads to talk to peers in parallel
  many RPC handler threads (started by RPC package)
  raft state with lock
  one thread writing to applyCh
    use condvar to signal it
  other plans are possible
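
A minimal sketch of the "one applier thread plus condvar" plan above. This is one possible structure, not the required one; the field and method names (applyCond, appendEntry, setCommitIndex, Kill) are assumptions made for illustration, and ApplyMsg is a simplified stand-in for the lab's type.

```go
package main

import "sync"

// ApplyMsg is a simplified stand-in for the lab's message type.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

// Raft holds only the fields this sketch needs; all names are hypothetical.
type Raft struct {
	mu          sync.Mutex
	applyCond   *sync.Cond    // signaled when commitIndex advances
	log         []interface{} // log[0] is an unused placeholder
	commitIndex int
	lastApplied int
	applyCh     chan ApplyMsg
	dead        bool
}

func NewRaft(applyCh chan ApplyMsg) *Raft {
	rf := &Raft{log: []interface{}{nil}, applyCh: applyCh}
	rf.applyCond = sync.NewCond(&rf.mu)
	return rf
}

// appendEntry adds a command to the log and returns its index.
func (rf *Raft) appendEntry(cmd interface{}) int {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.log = append(rf.log, cmd)
	return len(rf.log) - 1
}

// setCommitIndex advances commitIndex and wakes the applier.
func (rf *Raft) setCommitIndex(i int) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if i > rf.commitIndex && i < len(rf.log) {
		rf.commitIndex = i
		rf.applyCond.Broadcast()
	}
}

// applier is the only goroutine that writes to applyCh.
func (rf *Raft) applier() {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	for !rf.dead {
		if rf.lastApplied >= rf.commitIndex {
			rf.applyCond.Wait() // releases mu while asleep
			continue
		}
		rf.lastApplied++
		msg := ApplyMsg{
			CommandValid: true,
			Command:      rf.log[rf.lastApplied],
			CommandIndex: rf.lastApplied,
		}
		rf.mu.Unlock() // never block on a channel send while holding the lock
		rf.applyCh <- msg
		rf.mu.Lock()
	}
}

// Kill stops the applier.
func (rf *Raft) Kill() {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.dead = true
	rf.applyCond.Broadcast()
}
```

Usage: start `go rf.applier()` once at creation; any code that advances the commit index calls setCommitIndex, and the service reads applyCh in index order.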

Locking
  Raft lock serializes operations
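  RPC handlers hold the Raft lock for their whole body, so each is atomic
    cost: little parallelism
  Threads must not hold the lock while waiting for an RPC reply
    holding it risks deadlock (the peer's handler may need our lock to reply)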
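
A sketch of the "no lock during RPC" pattern, assuming a simplified vote request. Everything here is hypothetical scaffolding: the peer interface stands in for the RPC package, fakePeer exists only so the sketch runs by itself, and the struct and field names are illustrative rather than the lab's.

```go
package main

import "sync"

type RequestVoteArgs struct{ Term, CandidateId int }

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

// peer is a hypothetical stand-in for an RPC endpoint.
type peer interface {
	Call(args *RequestVoteArgs, reply *RequestVoteReply) bool
}

// fakePeer returns a canned reply; ok=false simulates a lost RPC.
type fakePeer struct {
	reply RequestVoteReply
	ok    bool
}

func (f fakePeer) Call(args *RequestVoteArgs, reply *RequestVoteReply) bool {
	*reply = f.reply
	return f.ok
}

type Raft struct {
	mu          sync.Mutex
	me          int
	currentTerm int
	state       string // "follower", "candidate", or "leader"
	votes       int    // votes received this term, including our own
	total       int    // number of servers in the cluster
}

// requestVoteFrom sends one vote request without holding the lock.
func (rf *Raft) requestVoteFrom(p peer) {
	rf.mu.Lock()
	args := RequestVoteArgs{Term: rf.currentTerm, CandidateId: rf.me}
	rf.mu.Unlock() // release before the (slow, possibly hung) RPC

	var reply RequestVoteReply
	if !p.Call(&args, &reply) {
		return // no reply; caller may retry
	}

	rf.mu.Lock()
	defer rf.mu.Unlock()
	if reply.Term > rf.currentTerm {
		rf.currentTerm = reply.Term
		rf.state = "follower"
		return
	}
	// State may have changed while the lock was released: re-check.
	if rf.state != "candidate" || rf.currentTerm != args.Term {
		return
	}
	if reply.VoteGranted {
		rf.votes++
		if rf.votes > rf.total/2 {
			rf.state = "leader"
		}
	}
}
```

The price of dropping the lock is the re-check after the RPC returns: the term or role may have changed in the meantime, so a stale reply must be ignored.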
  
Getting started on a lab
  make first test case work
    fill out VoteRequest structs
    do an RPC
  then much easier to get the lay of the land
  other strategy: read all guides, read test code, etc.

Debugging
  Log all actions/messages in an easily searchable way
    standard format: src, dst, opcode, raft state, ...
  Run test case
    if ok:
      next test case
    if fail:
      repeat:
        study test case
        formulate hypothesis about what might be wrong
        study the log and Figure 2 of the Raft paper; run with the race detector (go test -race)
        modify code and try test again
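
One way to get a standard, searchable log line, as a sketch. The exact layout (S<src>->S<dst>, then opcode, term, state) and the names formatEvent and debugf are assumptions; any fixed format that greps well would do.

```go
package main

import (
	"fmt"
	"log"
)

// Debug turns logging on or off in one place.
const Debug = true

// formatEvent renders one event in a fixed field order so that every
// line can be filtered by source, destination, or opcode.
func formatEvent(src, dst int, opcode string, term int, state, detail string) string {
	return fmt.Sprintf("S%d->S%d %s term=%d state=%s %s",
		src, dst, opcode, term, state, detail)
}

// debugf logs the event if Debug is set; log.Print adds a timestamp.
func debugf(src, dst int, opcode string, term int, state, detail string) {
	if Debug {
		log.Print(formatEvent(src, dst, opcode, term, state, detail))
	}
}
```

With a fixed prefix, a search such as `grep 'S1->S2'` on the saved output pulls out one conversation, and searching for an opcode pulls out one kind of message across all servers.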
        
Code tour
  Raft struct
  Ticker
  Election timeout
  Start election
  VoteRequest handling
  becomeLeader
  send appends
  commit
  applier
  Start()
  
*** duplicate RPC detection (Lab 3)

Draw clients, k/v service, raft

What should a client do if a Put or Get RPC times out?
  i.e. Call() returns false
  if server is dead, or request was dropped: re-send
  if server executed, but reply was lost: re-send is dangerous

problem:
  these two cases look the same to the client (no reply)
  if already executed, client still needs the result

idea: duplicate RPC detection
  let's have the k/v service detect duplicate client requests
  client picks a unique ID for each request, sends in RPC
    same ID in re-sends of same RPC
  k/v service maintains a "duplicate table" indexed by ID
  makes a table entry for each RPC
    after executing, record reply content in duplicate table
  if 2nd RPC arrives with the same ID, it's a duplicate
    generate reply from the value in the table
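
A sketch of the idea above with a single server and no Raft. All names (Args, Server, Network, Clerk, do) are hypothetical; Network simulates the bad case where the server executes the request but the reply is lost, and rand.Int63 is just one way to pick an ID that is unique with high probability.

```go
package main

import (
	"math/rand"
	"sync"
)

// Args carries the request ID chosen by the client.
type Args struct {
	ID         int64
	Op         string // "Put" or "Get"
	Key, Value string
}

type Server struct {
	mu       sync.Mutex
	data     map[string]string
	dup      map[int64]string // duplicate table: request ID -> recorded reply
	executed int              // how many requests were actually executed
}

func NewServer() *Server {
	return &Server{data: map[string]string{}, dup: map[int64]string{}}
}

// Handle executes a request at most once per ID.
func (s *Server) Handle(a Args) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if reply, seen := s.dup[a.ID]; seen {
		return reply // duplicate: answer from the table, do not re-execute
	}
	var reply string
	switch a.Op {
	case "Put":
		s.data[a.Key] = a.Value
	case "Get":
		reply = s.data[a.Key]
	}
	s.executed++
	s.dup[a.ID] = reply // record reply content after executing
	return reply
}

// Network delivers every request but drops the next dropReplies replies.
type Network struct {
	srv         *Server
	dropReplies int
}

func (n *Network) Call(a Args) (string, bool) {
	reply := n.srv.Handle(a)
	if n.dropReplies > 0 {
		n.dropReplies--
		return "", false // looks like a timeout to the client
	}
	return reply, true
}

type Clerk struct{ nw *Network }

// do picks one ID per request and reuses it on every re-send.
func (ck *Clerk) do(op, key, value string) string {
	a := Args{ID: rand.Int63(), Op: op, Key: key, Value: value}
	for {
		if reply, ok := ck.nw.Call(a); ok {
			return reply
		}
	}
}
```

Because the client cannot tell "dropped request" from "dropped reply", it always re-sends; the table is what makes the re-send safe.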

how does a new leader get the duplicate table?
  put ID in logged operations handed to Raft
  all replicas should update their duplicate tables as they execute
  so the information is already there if they become leader

if server crashes how does it restore its table?
  if no snapshots, replay of log will populate the table
  if snapshots, snapshot must contain a copy of the table

what if a duplicate request arrives before the original executes?
  could just call Start() (again)
  it will probably appear twice in the log (same client ID, same seq #)
  when cmd appears on applyCh, don't execute if table says already seen

idea to keep the duplicate table small
  one table entry per client, rather than one per RPC
  each client has only one RPC outstanding at a time
  each client numbers RPCs sequentially
  when server receives client RPC #10,
    it can forget about client's lower entries
    since this means client won't ever re-send older RPCs

some details:
  each client needs a unique client ID -- perhaps a 64-bit random number
  client sends client ID and seq # in every RPC
    repeats seq # if it re-sends
  duplicate table in k/v service indexed by client ID
    contains just seq #, and value if already executed
  RPC handler first checks table, only Start()s if seq # > table entry
  each log entry must include client ID, seq #
  when operation appears on applyCh
    update the seq # and value in the client's table entry
    wake up the waiting RPC handler (if any)
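
A sketch of the per-client table and the apply path described above. Names (Op, dupEntry, alreadyExecuted, apply) are assumptions; locking, the call to Start(), and waking the waiting RPC handler are left out to keep the table logic visible. Sequence numbers are assumed to start at 1.

```go
package main

// Op is what gets handed to Raft and comes back on applyCh.
type Op struct {
	ClientID int64
	Seq      int
	Kind     string // "Put" or "Get"
	Key      string
	Value    string
}

// dupEntry is the single table entry kept per client.
type dupEntry struct {
	seq   int    // highest seq # executed for this client
	value string // reply for that seq #
}

type KVServer struct {
	data map[string]string
	dup  map[int64]dupEntry // indexed by client ID
}

func NewKVServer() *KVServer {
	return &KVServer{data: map[string]string{}, dup: map[int64]dupEntry{}}
}

// alreadyExecuted is the check an RPC handler would make before Start().
// The stored value is only meaningful when seq equals the entry's seq;
// a client never re-sends anything older than its latest request.
func (kv *KVServer) alreadyExecuted(clientID int64, seq int) (string, bool) {
	e, ok := kv.dup[clientID]
	if ok && seq <= e.seq {
		return e.value, true
	}
	return "", false
}

// apply handles one committed Op, in log order. The same Op may appear
// in the log twice; the second copy must not execute.
func (kv *KVServer) apply(op Op) string {
	if value, dup := kv.alreadyExecuted(op.ClientID, op.Seq); dup {
		return value
	}
	var value string
	switch op.Kind {
	case "Put":
		kv.data[op.Key] = op.Value
	case "Get":
		value = kv.data[op.Key]
	}
	// Overwriting the entry is what "forgets" the client's older RPCs.
	kv.dup[op.ClientID] = dupEntry{seq: op.Seq, value: value}
	return value
}
```

Since every replica runs the same apply function over the same log, every replica ends up with the same table, which is why a new leader already has it.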

but wait!
  the k/v server is now returning old values from the duplicate table
  what if the reply value in the table is no longer up to date?
  is that OK?

example:
  C1           C2
  --           --
  put(x,10)
               first send of get(x), reply(10) dropped
  put(x,20)
               re-sends get(x), server gets 10 from table, not 20

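get(x) and put(x,20) run concurrently (the get started before put(x,20) and finished after it),
so the get could have executed either before or after the put;
both orders are linearizable, so returning the remembered value 10 is correct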