Wednesday, October 29, 2008

DHT Survey

This article is exactly as described: a survey of DHTs. The article first describes a few of the recent approaches to P2P systems, and why they are poor choices for scalability. Among current architectures are trees; trees have a problem with failures of high-level nodes, which can do a lot of damage to the system. Another approach is a simple broadcast-to-all-neighbors mechanism, which ends up sending too many messages in increasingly large systems. The superpeer approach used by KaZaA also has single point of failure problems.

Thus, a modern area of research is the distributed hash table (DHT), which implements one function (lookup), and in which no node is more important than any other node. Among the various implementations of the DHT are Chord, Pastry, Tapestry, Kademlia, and CAN. Chord uses a circular linked list, with "fingers", which point halfway around the circle, a quarter of the way around the circle, an eight of the way around the circle, and so forth.

Pastry, Tapestry, and Kademlia use a tree structure. In the tree structure, they store several leaves (nearby nodes), as well as a collection of nodes that cover the space of the first digit, a collection that cover the space of the first two digits, a collection for the first three digits, and so forth. Thus at any node, you can find a link to another node which shares the first n digits, where n is determined by the implementation.

CAN uses n-dimensional space to represent its nodes. Each node has a pointer to any neighbor that it shares an (n-1)-dimensional hyperplane with. New nodes choose a random location in hyperspace, and then force an existing node to split its dimensions in half. Lookup requests are routed along a straight-as-possible line in hyperspace.

The key point of all of these algorithms is that there is some sort of geometry, some sort of identifier, and some sort of distance function which can tell you the difference between two given identifiers. The key point of the DHT is to provide a reliable way to lookup keys independent of joins and disconnects of the machines that create the DHT.

No comments: