Chord paper FAQ

Q: Is hashing across machines a good way to get load-balanced sharding? Why not explicitly divide up the key space so it's evenly split?

A: If you can have a centralized server that assigns keys to shards, then an exact division is a great plan; many systems do just that (e.g., GFS, or the shard master in lab 4). If you cannot have a central server, then you need another plan for load balancing, and consistent hashing is such a plan.

Q: Does BitTorrent use Chord?

A: The Bittorrent P2P tracker uses Kademlia. Kademlia and Chord are similar. Bittorrent itself doesn't use Chord or Kademlia.

Q: If you want to add fault-tolerance to a Chord-based system, should you replicate each Chord node using Raft?

A: Typical Chord-based applications don't need strong consistency and have weak durability requirements (e.g., often the client must refresh the data periodically to ensure it isn't lost), so Raft seems too heavy-weight. I know of only one design (Scatter) that combines Chord and Paxos, where segments of the ring form a Paxos group to get stronger guarantees. Google "Scatter SOSP" if you are curious.

Q: What if Chord DHT nodes are malicious?

A: Chord (and most peer-to-peer systems) cannot handle malicious participants. An open Chord system is vulnerable to a Sybil attack: an attacker can become a participant, create many Chord nodes under the attacker's control, and take over the whole system. There are DHTs that try to handle such attacks, but it is challenging in a peer-to-peer setting (e.g., see http://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf). Chord and the applications on top of it do provide some security measures, however. For example, node IDs are typically the SHA-1 of the node's IP address, so an attacker must control the IP address it claims to use. Applications typically advertise data in Chord under a key that is the SHA-1 of the data, so when an application retrieves the data it can check that it got the right data.

Q: Is Chord used anywhere in practice?

A: We don't know. Clearly Kademlia and Amazon's Dynamo are strongly influenced by Chord. Rumor has it that Cisco uses Chord in some of its products.

Q: Could the performance be improved if the nodes knew more about network locality?

A: Yes. The total latency for a lookup can be improved using proximity routing (e.g., see https://pdos.csail.mit.edu/papers/dhash:nsdi/paper.pdf).

Q: Is it possible to design a DHT in which lookups take less than log(N) hops?

A: There are a number of O(1)-hop DHTs, but they require more bandwidth. Accordion is one that dynamically adjusts between O(1) and O(log N): www.news.cs.nyu.edu/~jinyang/pub/nsdi05-accordion.pdf

Q: Does consistent hashing of keys still guarantee load-balanced nodes if keys are not evenly distributed?

A: Chord hashes the keys provided by the application with SHA-1, so the keys are well distributed in the key space even if the application's own key names are skewed (see the sketch below).

Q: In the case of concurrent joins and failures, Chord pauses when a get fails to find the key it was looking for. If there's constant activity, how can Chord distinguish between the system not being stable and the key not actually existing?

A: The idea is not to wait until the system is stable, because there might never be a stable system. Instead, the plan is to just retry after a certain period of time, because stabilization may have fixed the routing info needed for that lookup. With good probability, the lookup will then fall into case 1 or 2 mentioned in that paragraph of the paper.
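Here is a minimal sketch of the consistent-hashing idea from the questions above (the node addresses, key names, and in-memory ring are made-up assumptions for illustration, not Chord's actual code): node IDs and application keys are both hashed with SHA-1 onto the same circular identifier space, and each key is stored at its successor, the first node whose ID equals or follows the key's ID on the ring.

```go
// Sketch only: consistent hashing with SHA-1 identifiers, as Chord uses it.
package main

import (
	"crypto/sha1"
	"fmt"
	"math/big"
	"sort"
)

// id hashes an arbitrary string to a 160-bit identifier on the ring.
func id(s string) *big.Int {
	h := sha1.Sum([]byte(s))
	return new(big.Int).SetBytes(h[:])
}

// successor returns the node responsible for key: the first node ID that is
// >= the key's ID, wrapping around to the smallest node ID if necessary.
func successor(nodeIDs []*big.Int, key *big.Int) *big.Int {
	i := sort.Search(len(nodeIDs), func(j int) bool {
		return nodeIDs[j].Cmp(key) >= 0
	})
	if i == len(nodeIDs) { // wrapped past the largest ID
		i = 0
	}
	return nodeIDs[i]
}

func main() {
	// Hash each node's address to place it on the ring, then sort the IDs.
	nodes := []string{"10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"}
	var ids []*big.Int
	for _, n := range nodes {
		ids = append(ids, id(n))
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i].Cmp(ids[j]) < 0 })

	// Application keys are hashed the same way, so even skewed key names
	// end up spread roughly uniformly around the ring.
	for _, k := range []string{"song.mp3", "song.mp3.1", "song.mp3.2"} {
		fmt.Printf("%s -> node %x...\n", k, successor(ids, id(k)).Bytes()[:4])
	}
}
```

Because SHA-1 output is close to uniform, even highly skewed application key names end up roughly evenly spread across the nodes.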
Q: If I introduce a malicious peer in Chord that keeps returning wrong values or nonexistent addresses, how disruptive can it be to the whole DHT? How does Bittorrent deal with this particular issue?

A: A malicious node usually can't forge data, since the data is usually signed or protected with a cryptographic hash. Bittorrent files are protected in this way: the original torrent file contains the hash of the desired file, and the client hashes the file it eventually gets to check that the content is correct (see the verification sketch at the end of this section). Still, a malicious node can deny that content exists, or route lookups incorrectly. If it's just a few badly behaved nodes, then the system can cope by replicating data at a few different nodes near the key in the DHT's ID space, and lookups can try a few nearby nodes. An attacker could, however, pretend to be millions of separate nodes. If all of those fake nodes routed queries only to other fake nodes, and they all denied that the desired data existed, they could successfully prevent clients from downloading. Bittorrent has a weak defense against this attack; this is what the "token" mechanism in the reading refers to. Basically, a node is not allowed to join the Kademlia DHT unless it can prove that it really receives packets sent to its claimed IP address. The idea is that an attacker probably only controls a handful of IP addresses, so this token defense will hopefully limit the attacker to joining the DHT only a handful of times. Some real-life attacks have managed to get control of hundreds or millions of IP addresses (by manipulating the Internet routing system), so the token defense is not bullet-proof.

Q: Why isn't there a danger of improper load balancing if some keys are simply used more than others?

A: That danger does exist. I think if it were a problem in practice, system designers would replicate or cache data in the DHT nodes. Bittorrent effectively does this in the way it uses Kademlia: there are typically many peers serving a given torrent, and each inserts an entry into the Kademlia DHT declaring that it is willing to serve pieces of the torrent. The inserts go to Kademlia nodes with IDs near the key ID, not necessarily to the Kademlia node that's the real home of the key ID. When a new client wants to find out about a torrent, its DHT lookup stops as soon as it encounters any entries for the key, so it might not have to bother the real DHT home node for the key. Systems built on Chord (e.g., CFS) replicate a key/value pair at the r nodes after the key's successor on the ring; get()s choose a random one of those r nodes to avoid hot-spots (see the sketch below).
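To make that replication scheme concrete, here is a minimal sketch (the in-memory ring, the node type, and r = 3 are assumptions for illustration, not CFS's actual code): put() writes a value at the key's successor and the next r-1 nodes, and get() reads from a random one of those r nodes so that a popular key's reads are spread over several machines.

```go
// Sketch of CFS-style successor-list replication; assumed types and r = 3.
package main

import (
	"crypto/sha1"
	"fmt"
	"math/big"
	"math/rand"
	"sort"
)

const r = 3 // replication factor (assumed for illustration)

type node struct {
	id    *big.Int
	store map[string]string
}

type ring struct{ nodes []*node } // nodes kept sorted by id

func hashID(s string) *big.Int {
	h := sha1.Sum([]byte(s))
	return new(big.Int).SetBytes(h[:])
}

// replicas returns the key's successor and the r-1 nodes after it on the ring.
func (rg *ring) replicas(key string) []*node {
	kid := hashID(key)
	i := sort.Search(len(rg.nodes), func(j int) bool {
		return rg.nodes[j].id.Cmp(kid) >= 0
	})
	var out []*node
	for j := 0; j < r && j < len(rg.nodes); j++ {
		out = append(out, rg.nodes[(i+j)%len(rg.nodes)])
	}
	return out
}

// put stores the value at all r replicas.
func (rg *ring) put(key, val string) {
	for _, n := range rg.replicas(key) {
		n.store[key] = val
	}
}

// get reads from a random replica, spreading load for popular keys.
func (rg *ring) get(key string) string {
	reps := rg.replicas(key)
	return reps[rand.Intn(len(reps))].store[key]
}

func main() {
	rg := &ring{}
	for _, addr := range []string{"a", "b", "c", "d", "e"} {
		rg.nodes = append(rg.nodes, &node{id: hashID(addr), store: map[string]string{}})
	}
	sort.Slice(rg.nodes, func(i, j int) bool { return rg.nodes[i].id.Cmp(rg.nodes[j].id) < 0 })
	rg.put("popular-block", "data")
	fmt.Println(rg.get("popular-block"))
}
```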
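And here is the self-certifying naming idea mentioned in the malicious-peer answer (and in the Sybil answer earlier), again only a hedged sketch with an assumed put/get interface rather than Chord's real API: data is advertised under the SHA-1 of its contents, so the client can re-hash whatever it receives and reject forged or corrupted data no matter which node served it.

```go
// Sketch of self-certifying keys: key = SHA-1(data), verified on retrieval.
package main

import (
	"bytes"
	"crypto/sha1"
	"errors"
	"fmt"
)

// dht stands in for a real Chord/DHash put/get layer (assumption).
var dht = map[[20]byte][]byte{}

// put stores data under the hash of its own contents and returns that key.
func put(data []byte) [20]byte {
	key := sha1.Sum(data)
	dht[key] = data
	return key
}

// get re-hashes whatever comes back and rejects anything that doesn't match.
func get(key [20]byte) ([]byte, error) {
	data := dht[key]
	if h := sha1.Sum(data); !bytes.Equal(h[:], key[:]) {
		return nil, errors.New("content does not match key: forged or corrupted")
	}
	return data, nil
}

func main() {
	key := put([]byte("some block of a file"))
	if data, err := get(key); err == nil {
		fmt.Printf("verified %d bytes\n", len(data))
	}
}
```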