Thursday, October 30, 2008

Chord

Chord is a DHT (see previous post). The basic idea is that it provides a single function: lookup(key). Keys are generated with the SHA-1 hash function, which more or less guarantees their randomness. Each node is also given an identifier using SHA-1. The nodes are arranged in a circle by hash value, and each key is stored at the first node whose identifier is equal to or follows the key on the circle (with wrapping).

The way queries are distributed through the network is with pointers from one node to various neighbors. Each node has a successor, as well as a list of "fingers". The fingers point to nodes that are halfway around the DHT, a quarter of the way around, an eighth, and so forth. To process a query, a node chooses the finger with the greatest hash value that does not exceed the value of the requested key, and forwards the request to that node. The request then propagates through the chain until it reaches the correct node.
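
To make the forwarding rule concrete, here is a minimal sketch of it in Python. This is my own illustrative code, not the paper's; names like in_arc and closest_preceding_finger, and the bare-bones Node class, are assumptions.

```python
# Hypothetical sketch of Chord-style forwarding, not the authors' implementation.
M = 160                      # SHA-1 gives 160-bit identifiers
RING = 2 ** M

def in_arc(x, a, b):
    """True if x lies on the half-open arc (a, b] of the identifier circle."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # the arc wraps past zero

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self   # only the successor pointer is needed for correctness
        self.fingers = []       # sorted by clockwise offset: ~1/8, 1/4, 1/2 of the way around

    def closest_preceding_finger(self, key):
        # Greatest finger that does not pass the requested key.
        for f in reversed(self.fingers):
            if in_arc(f.id, self.id, key) and f.id != key:
                return f
        return self.successor

    def lookup(self, key):
        # The key is owned by the first node at or after it on the circle.
        if in_arc(key, self.id, self.successor.id):
            return self.successor
        return self.closest_preceding_finger(key).lookup(key)
```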

Note that in order to maintain correctness, the fingers need not exist, and only the successor pointers need to be accurate. The successors and fingers are updated by a periodic stabilization process.

They tested the network in simulation, with 1000 nodes, and then again in practice, with 10. In both cases, they found that it helped to have several virtual nodes at each real machine, as a way of spreading out the keys across more nodes. They found that the number of hops that a query required was indeed O(log(n)), and that the network was indeed resilient to massive numbers of node failures.

The main complaint I have about Chord is that they expect applications to build redundancy on top of this system, probably by creating n keys for each piece of data that they want to store (where n is the number of copies), and then either 1) looking them up one at a time until one lookup succeeds or 2) looking them all up at once, and taking the first result. The first one wastes a lot of the application's time in the case of a failure, and the second one wastes a lot of bandwidth.
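
For concreteness, here is roughly what option 2 looks like from the application's side. The lookup function, the replica-key derivation, and the thread pool are all assumptions for illustration, not anything Chord provides.

```python
# Illustrative sketch of application-level redundancy built on a bare lookup(key)
# primitive; the replica-key scheme and lookup() itself are hypothetical.
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed

def replica_keys(name, n):
    """Derive n independent keys for one data item, e.g. SHA-1 of "name#0", "name#1", ..."""
    return [hashlib.sha1(f"{name}#{i}".encode()).digest() for i in range(n)]

def fetch_first(lookup, name, n=3):
    # Option 2 from above: issue all n lookups at once and take the first success,
    # trading bandwidth for latency.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(lookup, key) for key in replica_keys(name, n)]
        for done in as_completed(futures):
            try:
                result = done.result()
            except Exception:
                continue
            if result is not None:
                return result
    return None
```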

I think it would be interesting to implement Chord with redundancy built into it. Not only would applications not have to worry about implementing this on top of Chord, but there might be good performance characteristics as well. Instead of searching for the one and only copy of a given key, you could search for the copy that can be reached through the "closest" finger. Alternatively, you could store link-quality data and transmit your query over the link with the lowest latency. Or, if data is being stored at the nodes, perhaps you want to find the copy with the highest bandwidth back to the origin of the request. But all of these would need some sort of built-in redundancy.

Wednesday, October 29, 2008

DHT Survey

This article is exactly as described: a survey of DHTs. The article first describes a few of the recent approaches to P2P systems, and why they are poor choices for scalability. Among current architectures are trees; trees have a problem with failures of high-level nodes, which can do a lot of damage to the system. Another approach is a simple broadcast-to-all-neighbors mechanism, which ends up sending too many messages in increasingly large systems. The superpeer approach used by KaZaA also has single point of failure problems.

Thus, a modern area of research is the distributed hash table (DHT), which implements one function (lookup), and in which no node is more important than any other node. Among the various implementations of the DHT are Chord, Pastry, Tapestry, Kademlia, and CAN. Chord uses a circular linked list with "fingers", which point halfway around the circle, a quarter of the way around the circle, an eighth of the way around the circle, and so forth.

Pastry, Tapestry, and Kademlia use a tree structure. In the tree structure, they store several leaves (nearby nodes), as well as a collection of nodes that cover the space of the first digit, a collection that cover the space of the first two digits, a collection for the first three digits, and so forth. Thus at any node, you can find a link to another node which shares the first n digits, where n is determined by the implementation.
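
A rough sketch of that prefix-matching step, assuming hex-digit identifiers and a routing_table[p][d] layout of my own invention (none of these systems uses exactly this interface):

```python
# Rough sketch of prefix routing in the Pastry/Tapestry/Kademlia family.
def shared_prefix_len(a, b):
    """Number of leading hex digits two ids share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id, key, routing_table, leaves):
    """routing_table[p][d] holds a node whose id shares p digits with my_id and
    whose next digit is d; leaves are numerically nearby nodes."""
    if my_id == key:
        return my_id
    p = shared_prefix_len(my_id, key)
    candidate = routing_table.get(p, {}).get(key[p])
    if candidate is not None:
        return candidate                       # one more matching digit toward the key
    # No suitable table entry: fall back to the leaf numerically closest to the key.
    return min(leaves, key=lambda leaf: abs(int(leaf, 16) - int(key, 16)))
```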

CAN uses n-dimensional space to represent its nodes. Each node has a pointer to any neighbor with which it shares an (n-1)-dimensional hyperplane. New nodes choose a random location in hyperspace, and then force the existing node that owns that location to split its zone in half. Lookup requests are routed along a straight-as-possible line in hyperspace.
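
A hedged sketch of that greedy forwarding step, leaving out all of the zone bookkeeping; the neighbor representation (a point per neighbor) is an assumption.

```python
# Illustrative sketch of CAN-style greedy forwarding through the coordinate space.
import math

def next_hop(my_point, neighbor_points, target):
    """Forward toward the neighbor whose coordinates are closest to the target
    point, approximating a straight line through the space."""
    best = min(neighbor_points, key=lambda p: math.dist(p, target))
    # Stop if no neighbor gets closer than we already are (we own the point).
    return best if math.dist(best, target) < math.dist(my_point, target) else None
```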

The key point of all of these algorithms is that there is some sort of geometry, some sort of identifier, and some sort of distance function which can tell you the difference between two given identifiers. The key point of the DHT is to provide a reliable way to lookup keys independent of joins and disconnects of the machines that create the DHT.

Active Networks

This paper basically proposes allowing user extensions to internet routing. It claims a variety of other applications, but this is the only one that I don't think is currently possible. It comes with two basic ideas: signed code, and capsules.

The signed code is signed by some authority that guarantees that it uses the API provided, as well as that it uses the internet in some reasonable manner. It is distributed to nodes throughout the internet using soft-state, and has a cryptographically secure hash associated with the code itself. Note that this makes it difficult to spoof code.

The signed code then operates on "capsules", which are essentially packets with the additional ANTS header information. The additional header also has the hash of the code that is supposed to be processing it. This feature acts as both naming and security. It allows a wide variety of different services to be deployed across the internet.
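
A hypothetical sketch of how a node might dispatch capsules by code hash; the cache structure, header field names, and hash choice are assumptions, not the ANTS API.

```python
# Hypothetical sketch of capsule dispatch keyed by a hash of the forwarding code.
import hashlib

CODE_CACHE = {}                          # code hash -> loaded handler (soft state)

def install(code_bytes, handler):
    digest = hashlib.sha1(code_bytes).hexdigest()   # any collision-resistant hash works
    CODE_CACHE[digest] = handler
    return digest

def process_capsule(capsule, node_state):
    """A capsule names its own handler by the code hash carried in its header."""
    handler = CODE_CACHE.get(capsule["code_hash"])
    if handler is None:
        return None       # a real node would first fetch the code from the previous hop
    return handler(capsule, node_state)
```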

There are two obvious benefits of Active Networks. The first is as a way to test alternate routing protocols. The second is as a way to test alternate queuing protocols. Unfortunately, the code must be implemented in Java, and therefore has poor performance characteristics (the Java requirement matters for safety and proof-of-correctness concerns, since the language is strongly typed). This means it's not spectacularly useful except for testing the correctness of new algorithms.

The possible pitfalls (as they have noted in the paper) include starvation of other internet services, not only by malicious users, but also by bugs in the code. These bugs need not be limited to a single node - if a routing algorithm routes in a loop, then there is no cure for the resources that will be consumed along the loop. This proposal basically suffers from the same problems as source routing, which has been turned off by most routers in the internet.

Tuesday, October 28, 2008

RON

The basic idea behind RON is that internet routing is not perfect, and that the BGP protocol is slow to recover from failures in the internet; therefore, sometimes we want to route packets through other hosts.

There are two basic kinds of failures that RON tries to work around: outages and performance failures. In the case of an outage, a path on the internet becomes unusable: few or none of the packets reach their destination, and connections across that path simply fail. In the case of a performance failure, the performance is so poor that the application is unable to function. For example, the link between two nodes might have great throughput but high latency, which makes it unusable for an application like SSH.

The basic idea is to have a collection of nodes that can route through each other in order to route around failures in the core of the internet. Each node periodically measures the quality of its links to each of the other N-1 nodes, and broadcasts this information to all nodes. Thus, each node ends up with a link-state table of size O(N^2). Nodes can then choose a path through the resulting graph for their packets; this path can be a single link or multiple links. A client on each machine receives packets and sends them on to their next hop. In this way, RON allows applications to route around failures in the internet.
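
A hedged sketch of the path-selection step, assuming a latency table and at most one intermediate overlay hop; latency is only one of the metrics RON supports, and the table format here is my own.

```python
# Sketch of RON-style path choice over the measured link table.
def best_path(src, dst, nodes, latency):
    """latency[(a, b)] is the probed latency from a to b; pick the direct path or
    the best single-intermediate path, whichever costs less."""
    best_cost, best_route = latency[(src, dst)], [src, dst]
    for hop in nodes:
        if hop in (src, dst):
            continue
        cost = latency[(src, hop)] + latency[(hop, dst)]
        if cost < best_cost:
            best_cost, best_route = cost, [src, hop, dst]
    return best_route, best_cost
```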

Results: RON takes an average of 18 seconds to route around failures in the internet. In roughly 5% of cases it can improve throughput by a factor of 2, and in roughly 5% of cases it can cut the loss rate by 0.05 or more (it's possible that these two sets of cases overlap).

One of the great things about RON is that it allows for application-specific path optimization.

Monday, October 20, 2008

Router Topology

This paper states that measurements like degree distributions are insufficient to properly construct a network topology that truly resembles the internet. It presents several different construction mechanisms for graphs with the same degree distribution, and then compares them in terms of both their likelihood and their achieved bandwidth. Likelihood is defined as the ratio (L_max - L(G)) / (L_max - L_min), where L(G) = \sum_{(i,j) \in E} d_i d_j, d_i is the degree of node i, and E is the set of edges in the graph G.
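
The L(G) part of that definition is easy to state as code; this is just a sketch, and the normalization against L_max and L_min (taken over graphs with the same degree sequence) is left out.

```python
# L(G) = sum over edges (i, j) of d_i * d_j, as defined above.
from collections import Counter

def graph_likelihood(edges):
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(degree[i] * degree[j] for i, j in edges)
```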

The paper first spends a lot of time stating that even though two different networks may have similar degree distributions, their topologies may be very different. Near the end, it describes several different methods for creating the graphs of which it speaks. Some of the graphs are as follows:
  • Preferential Attachment - Start with 3 connected nodes; each new node then attaches to an existing node with probability proportional to that node's current degree. Additional links are added until the degree distribution works out (see the sketch after this list).
  • GRG - uses the degree sequence from PA to generate a random (possibly unconnected) graph. Results in highly connected central nodes.
  • Heuristically Optimal Topology - starts with a few lightly connected core routers, and then high degree gateways attached to these routers, spanning out into trees of edge routers.
  • Abilene-inspired Topology - I wasn't exactly sure what this was.
  • Sub-optimal Topology - The core routers are a chain, instead of being lightly connected.
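
For concreteness, a minimal sketch of the preferential-attachment construction from the first bullet. The seed size and the one-link-per-new-node simplification are my assumptions; the actual construction adds more links afterwards to hit the target degree distribution.

```python
# Minimal preferential-attachment sketch (growth parameters are assumptions).
import random

def preferential_attachment(n, seed_size=3):
    edges = [(i, j) for i in range(seed_size) for j in range(i)]   # small connected seed
    degree = {i: seed_size - 1 for i in range(seed_size)}
    for new in range(seed_size, n):
        # Attach to an existing node with probability proportional to its degree.
        targets, weights = zip(*degree.items())
        chosen = random.choices(targets, weights=weights, k=1)[0]
        edges.append((new, chosen))
        degree[new] = 1
        degree[chosen] += 1
    return edges
```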
It then compares the bandwidths of these topologies, which all have the same degree distribution. It finds that the HOT has the highest bandwidth, followed by the Abilene-inspired Topology. It also finds that in general, the high bandwidth topologies have lower likelihoods.

Overall, this paper builds nicely on the previous power laws paper, but I would like to see a comparison of the other properties of these topologies that were explored in the power laws paper (such as the expected neighborhood size per number of hops, and the eigenvalues). Still, it actually provides a method for constructing an internet-resembling topology, which is good for simulations and theoretical calculations.

Power Laws

I am not sure how much can be said about this paper. It's not that the paper is short on material, it's just that I can't think of any way to summarize it other than stating the laws that they have discovered.

The basic idea of this paper is that various properties of the internet follow power laws. That is, variables are proportional to other variables raised to constant powers. These constants may vary a little over time, or, as in the case of the eigenvalues, may be true constants.

Here are the results:
  • The outdegree of a node is proportional to its rank (its rank when nodes are sorted by degree) raised to a power R, which ranged from -0.82 to -0.74.
  • The frequency of a given outdegree is proportional to the outdegree raised to a power O, which seems to be a constant between -2.2 and -2.15.
  • The number of neighbors of a node that are within h hops is proportional to h raised to a power H.
  • The eigenvalue lambda_i (with eigenvalues sorted in decreasing order) is proportional to i^(-0.5), regardless of era.
With the exception of the rank exponent, these powers seem to be constant regardless of the growth of the internet.

Sunday, October 19, 2008

Diffusion

I'm not certain when this paper was published, but it appears to be a preliminary guess at the operation of sensor networks. The key idea here seems to be that both requests and data will diffuse through the network.

Requests take the form of a type of data to be gathered, a region in which to gather it, and a rate at which to gather it. Each node that matches a request not only gathers the data, but also combines similar requests, and remains agnostic about the intended receivers.

Nodes then broadcast their information to all nearby nodes. Intended receivers reinforce good paths by re-sending the same requests, with higher data rates, to the nodes from which they first received data. In this way, a receiver essentially carves out the path its data will flow through.
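
A hypothetical sketch of what an interest and its gradients might look like; the field names, units, and rates are purely illustrative, not the paper's wire format.

```python
# Hypothetical sketch of diffusion-style interests, gradients, and reinforcement.
exploratory_interest = {
    "type": "four-legged animal",       # what kind of data to gather
    "region": (0, 0, 100, 100),         # where to gather it
    "rate": 0.1,                        # low exploratory rate, samples per second
}

gradients = {}                          # neighbor -> highest rate that neighbor asked for

def receive_interest(neighbor, interest):
    # Similar interests are combined: just remember the strongest gradient per neighbor.
    gradients[neighbor] = max(gradients.get(neighbor, 0.0), interest["rate"])

def reinforcement(interest, high_rate=5.0):
    # A sink reinforces the neighbor that delivered data first by re-sending the
    # same interest at a much higher rate, carving out the preferred path.
    return dict(interest, rate=high_rate)
```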

I don't know why they're trying to avoid using the typical routing protocols that have been developed. Maybe it's to save power. I didn't see a comparison between this protocol and traditional routing protocols such as DSDV and DSR (other than omniscient multicast). I am not certain what they mean by omniscient multicast. In theory, the omniscient multicast they describe would be a lower bound on power consumption, but they claim to do five times better. They claim that this has something to do with the result of in-network aggregation, but a factor of five seems unlikely.

Saturday, October 18, 2008

TAG

This paper presents a way to collect aggregate values (thus it is called Tiny AGgregation) across sensor networks. The key difference between a sensor network and a normal network is the need to conserve power, which is what this paper addresses by decreasing both the size of packets and the number of packets which must be sent by one of these sensors.

They first present an SQL-like query language, which basically allows the following keywords: SELECT, GROUP BY, HAVING, WHERE, and some aggregation functions (as well as EPOCH and DURATION, which aren't that important to the algorithm). Next, they build a tree of nodes (where each node has exactly one parent, and the root is the user), using an arbitrary routing algorithm. This tree is updated periodically.

Next, they classify the aggregates that they can use. There are several different properties that are important. First is monotonicity. Monotonic aggregates include things like MAX and MIN, and their monotonicity can be used to 1) evaluate some HAVING predicates before passing data to a parent, potentially saving one communication, and 2) snoop on other nodes' transmissions to decide whether or not to send your own data. The second property is the type of partial state that must be transmitted per node. The list of types contains Distributive, Algebraic, Holistic, Unique, and Content-Sensitive, which basically boils down to either constant per node or linear in the number of total children. This has a significant effect on the amount of communication and power consumption of nodes, especially parents close to the user. The other properties are Exemplary/Summary and Duplicate Sensitive, which affect the optimizations that can be applied.

Each aggregate function is then characterized by these properties, and is given two functions: a combining function and an evaluating function. The combining function allows each individual node to merge partial state from multiple children. The evaluating function allows the user to compute the actual desired value. Each node computes which group it belongs to, and separates data by group id. Each node then hears the data from all of its children, combines data for identical group ids, and passes the result on to its parent. This allows computation to be done at the nodes, rather than by the user, which saves transmissions and therefore power.
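
A hedged sketch of this combine-then-evaluate split for AVERAGE; the init/merge/evaluate names are mine, not necessarily the paper's exact interface.

```python
# Partial-state aggregation for AVERAGE: the partial state is (sum, count).
def init(reading):
    return (reading, 1)

def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])   # combining function, run at every parent

def evaluate(state):
    return state[0] / state[1]          # evaluating function, run at the root (the user)

def node_epoch(own_reading, children_states):
    # Combine the local reading with the children's partial states for one group,
    # and pass a single record up to the parent.
    state = init(own_reading)
    for s in children_states:
        state = merge(state, s)
    return state
```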

Wednesday, October 8, 2008

COPE

This paper describes COPE, which is a way of increasing throughput by XORing various packets together. The basic idea is that you can take advantage of the broadcast nature of wireless to transmit multiple packets at once to multiple destinations. The easiest way to do this is to XOR several packets together, which you can do if you know that the destinations have all but one of the packets you are about to send. This is made possible by a combination of eavesdropping on packets and storing them for a short duration.
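
The core trick is easy to show in a few lines; the packet contents below are placeholders, and this ignores headers, scheduling, and everything else COPE actually does.

```python
# A relay holding p1 (wanted by B) and p2 (wanted by A) broadcasts one coded
# packet; each receiver XORs out the packet it already has.
def xor_bytes(a, b):
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")    # pad the shorter packet
    return bytes(x ^ y for x, y in zip(a, b))

p1, p2 = b"packet from A to B", b"packet from B to A"
coded = xor_bytes(p1, p2)                          # one transmission instead of two

recovered_at_B = xor_bytes(coded, p2)              # B already knows p2, recovers p1
recovered_at_A = xor_bytes(coded, p1)              # A already knows p1, recovers p2
assert recovered_at_B == p1 and recovered_at_A == p2
```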

An important question is how the information about packet availability is transmitted between nodes. From what I understand, they piggyback reception reports on normal packets inside the COPE header. If a node has no packets to send, it sends out this information in periodic updates. Packets are cached for about half a second before they disappear. If no explicit information is available about whether or not a node has cached a particular packet, a sender may guess based on the link delivery probability. When transmitting, a sender requires that the overall probability that every intended receiver has the packets it needs be at least some threshold (80% in this case).

No sender will ever encode two packets headed for the same next hop, so each sender keeps a set of virtual queues, one for each neighbor. However, XORing large packets with small packets is somewhat wasteful, so there are actually two virtual queues for each neighbor. Then, to figure out which packets to XOR together, the algorithm simply considers the packets at the head of each queue. The algorithm never delays packets, so if no XORing is possible, it sends just one packet.

They get great results out of this, not just because they can reduce the number of transmissions that need to be made, but also because this algorithm can increase fairness at various relays. If a relay is receiving traffic from two sources, and must then retransmit both packets, it can reduce the number of packets it needs to send from 2 to 1. This helps with fairness, since each of the three nodes is now sending one packet, instead of the relay trying to send at a 2:1 ratio.

ExOR

If this paper tells us what ExOR stands for, I couldn't find it. In any case, ExOR is a routing algorithm that takes advantage of variations in signal quality and the broadcast nature of wireless networks. The main idea behind ExOR is that the source broadcasts every packet (where packets are grouped into batches). Then, every receiver re-broadcasts packets in order of priority, where the priority is determined by the sender and sent with the ExOR header. The retransmitters do this by determining their place on the list, and then waiting for everybody above them on the priority list to transmit. Retransmitters only retransmit packets that have not been transmitted by higher priority nodes.

How do sources determine priority? From what I could tell in the paper, periodic pairwise tests are conducted, and then the information is transmitted throughout the network. Each node therefore ends up with a full map of the network and therefore knows the preferred intermediate nodes for any destination.

The claim is that this algorithm is great for bulk data transfer, because it can take advantage of 1) packets that travel farther than expected and 2) packets that don't travel as far as expected. Packets that travel farther than expected are simply that much closer to the destination, which is good. Packets that don't travel as far as expected would normally be retransmitted by traditional routing algorithms, whereas they at least make some progress with ExOR.
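
A hedged sketch of the retransmission rule, assuming a simple batch-map representation of who is known to hold which packets; the real protocol's batch map and timing are more involved.

```python
# ExOR-style forwarding decision at one node.
def packets_to_forward(my_rank, batch_map, my_packets):
    """my_rank is this node's position in the sender's priority list (0 = highest,
    i.e. closest to the destination). batch_map[pkt] is the best (lowest) rank
    known to already hold pkt. Retransmit a packet only if no higher-priority
    node is known to have it; the node also waits until all higher-priority
    nodes have had their turn to transmit."""
    return [p for p in my_packets if batch_map.get(p, my_rank) >= my_rank]
```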

I have a few complaints about ExOR. First of all, it is useful only for bulk transfers. It is very good at improving throughput, but the latency cost is (I imagine) rather high (this is not mentioned in the paper). The paper gets around this by saying that it uses split TCP connections, and that ExOR should only be used for bulk file transfer. This naturally violates the E2E principle, not only because the TCP connection is being split, but also because there must be some decision maker at the wireless-wired interconnect that decides whether or not to use ExOR. This decision maker probably has to inspect packets (to see if they are file transfers of some sort - such as FTP or torrents), and guess on this issue. This is less than ideal.

ADDENDUM (8:56 PM): I forgot to mention that this is unlikely to work for incredibly dynamic networks, and is much better in static networks, where the broadcast-your-neighbors mechanism works much better (as we saw in the previous paper comparing DSDV and DSR).

Tuesday, October 7, 2008

Routing Protocols

This paper compares several protocols: DSDV, TORA, DSR, and AODV. Another pair of papers I should have read in the opposite order. Anyway, the metrics it uses are the percentage of packets that get through, overhead introduced by the routing protocol (in number of packets), and path optimality (difference between length of chosen path and minimum length of path). It uses a simulation in which nodes move around a rectangular area at random speeds, with a variety of predetermined rest-times. These rest times range from 0 to 900 seconds (continuous motion to no motion). Furthermore, this is the first paper to gather metrics while multiple transmitters are sending data (10, 20, or 30).

First, a bit about the algorithms. DSDV (Destination-Sequenced Distance Vector) is an algorithm in which each node maintains the next hop for every destination. Nodes regularly broadcast updates to each other, both periodically and whenever the network topology changes. These broadcasts include sequence numbers, allowing other nodes to choose the most recent updates.
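
A sketch of the table-update rule this implies: prefer fresher sequence numbers, break ties on hop count. The table layout is an assumption, not DSDV's actual data structures.

```python
# DSDV-style route table update.
routes = {}          # destination -> (sequence_number, hop_count, next_hop)

def handle_update(destination, seq, hops, heard_from):
    current = routes.get(destination)
    if (current is None
            or seq > current[0]
            or (seq == current[0] and hops < current[1])):
        routes[destination] = (seq, hops, heard_from)
```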

All other routing protocols are based on dynamic requests for routes. TORA (Temporally Ordered Routing Algorithm) responds to requests by attempting to compute the "height" of each node with respect to the destination, and then sending an update back to the requester. DSR (Dynamic Source Routing) works by simply asking for a route, and then allowing the entirety of the found route to be propagated backwards through updates. AODV (Ad hoc On demand Distance Vector) is essentially a combination of the DSR and DSDV algorithms. It keeps the same state as DSDV, but its updates are done by request-update, instead of periodic pushes.

In summary, all algorithms deliver approximately the same fraction of packets across different rest times, except for DSDV, which cannot make its paths converge for small rest times. However, the overhead (in number of packets) of DSR is the smallest. The overhead of AODV is smaller in bytes than the overhead of DSR, but those bytes are distributed across many small packets. Most overheads decrease as rest time increases, with the exception of DSDV, whose overhead remains constant. Despite this, DSDV does second-best in path optimality, next to DSR.

ETX

This paper puts forth a relatively simple metric, and then measures it against the current metric, which is number of hops. It then discusses open issues.

The problem with number of hops is twofold: 1) links can be asymmetric, so transmissions work just fine in one direction while ACKs in the other direction are never received, and 2) the metric prefers longer, less reliable links over shorter, more reliable ones. To solve this, each direction of a link is given a delivery ratio (directed, to avoid the asymmetry problem), based on the fraction of broadcast probes received from the neighbor within a certain window (ten seconds), with probes sent at one per second. The probability that a packet is delivered and acknowledged over a link is the product of the delivery ratios in the two directions, so the expected number of transmissions over that link is the inverse of that product.
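
As a sketch, with d_f and d_r standing for the measured forward and reverse delivery ratios (the function names are mine):

```python
# ETX of one link and of a whole route.
def link_etx(d_f, d_r):
    return float("inf") if d_f * d_r == 0 else 1.0 / (d_f * d_r)

def route_etx(links):
    """links is a list of (d_f, d_r) pairs along the path."""
    return sum(link_etx(d_f, d_r) for d_f, d_r in links)

# e.g. a perfect link costs 1 expected transmission; a 70%/90% link costs about 1.59.
```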

This metric is then used in both DSDV and DSR, and the results are compared. The main measure is goodput for point-to-point connections going at maximum speed. This is a somewhat troublesome metric, since TCP doesn't necessarily go at maximum speed (or UDP either for that matter), and there are usually multiple network users. Regardless, the result is that the ETX metric does not perform significantly worse than number of hops, and performs much better for some links for which number of hops got essentially 0 goodput.

There are several outstanding problems with ETX. First, the reliability of a given link for packets of the size sent to measure reliability is not necessarily the same for either larger or smaller packets. This causes ETX to overestimate the reliability for data paths, and underestimate reliability for ACK paths.

It might be possible to keep a histogram of delivery ratios rather than a single running average. Test packets could be of random length between the size of an ACK and the maximum packet size. The histogram could then be used to estimate transmission counts for varying packet lengths; it could be as simple as three buckets (small, medium, large).
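
A sketch of what that might look like; the bucket boundaries are arbitrary choices.

```python
# Per-size-bucket delivery ratios instead of one running average.
BUCKETS = ("small", "medium", "large")

def bucket(size):
    return "small" if size <= 128 else "medium" if size <= 768 else "large"

stats = {b: {"sent": 0, "received": 0} for b in BUCKETS}

def record_probe(size, delivered):
    s = stats[bucket(size)]
    s["sent"] += 1
    s["received"] += int(delivered)

def delivery_ratio(size):
    s = stats[bucket(size)]
    return s["received"] / s["sent"] if s["sent"] else None
```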

Thursday, October 2, 2008

Roofnet

Roofnet is essentially a way to get wireless internet working with a distribution of nodes that is neither random nor planned. The goal is to have relatively few internet connections servicing a community of users who will connect through this wireless mesh.

In order to accomplish this, a collection of Roofnet nodes was distributed, each installed by a local community member; some of these nodes also have wired connections to an ISP. If a given node finds itself to be connected, it broadcasts itself as a gateway. Otherwise, it acts as a DHCP server. Roofnet nodes use NAT to translate the IPs they assign to attached devices. Routing works as follows: each node keeps a list of nearby nodes, as well as the quality of the links between them. It also maintains this list for other nodes in the network, and is able to request such a list if its own is deficient. It can then run Dijkstra's algorithm to determine the best path between itself and an internet gateway.

This routing algorithm can receive updates in multiple ways. First, if the quality of a link changes, the nodes at either end of that link will notice and update themselves. Second, other nodes will periodically ask for updates. Third, nodes can overhear other nodes' updates and adjust their own lists. To estimate the throughput of a multi-hop connection, the inverse of the sum of the inverses of the individual hop throughputs is used. No reason is given for this (or at least I could not find one), other than that it seemed to match empirical data.
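
As a sketch of that estimate (the function name is mine):

```python
# Multi-hop throughput estimate: 1 / (sum of 1/t_i) over per-hop throughputs t_i.
def estimated_path_throughput(hop_throughputs):
    return 1.0 / sum(1.0 / t for t in hop_throughputs)

# e.g. two 4 Mbit/s hops -> an estimated 2 Mbit/s end to end.
```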

Suffice it to say that the paper finds that they get decent throughput, and that they do so without using RTS/CTS. My question is this: how many users can they have connected to such a network at once (no more than 256 due to internal IP representations? no more than 65536 due to NAT?). Is it possible to set up port forwarding for services that users may require? How many simultaneous connections can be made before congestion drives throughput to a standstill? In other words, would Roofnet be a suitable way to connect a community? Also, who would be willing to foot the bill for the internet that Roofnet connects to?

Some of those questions are technical, and some of those questions are more logistic. But after reading through the paper (perhaps I did not read carefully enough), I still don't know what causes reduced throughput across a multi-hop connection. They claim that it is due to interference between packets transmitted by different nodes, but that seems unlikely to account for all of the reduction (especially because RTS/CTS didn't help). Is it because the routers are slow? Is it because there are many people on each given connection, and packets going across multiple connections must be scheduled multiple times? Somebody tell me if I completely missed this section of the paper.

Wednesday, October 1, 2008

Modeling Wireless Links

This paper is essentially a survey of the properties of wireless links. The paper does a good job of splitting wireless links into three prevalent categories: WLAN, cellular links, and satellite. It then makes a long list of ways in which wireless links differ from regular links, and describes how this affects each of the three categories.

The first characteristic of wireless links is losses due to errors rather than congestion. The paper points out that this is a relatively rare occurrence, but that when errors do occur, they are somewhat clustered, and affect the link in both directions. Alongside error losses is packet reordering, which the paper claims is currently relatively rare, but has the potential to have an adverse effect on transport protocols if the reordering is severe.

In addition to the above issues, there is also the issue of variation - both in bandwidth and in delay. Variation in delay can be caused by a number of factors, including link-layer recovery and handovers. Variations in bandwidth also exist in wireless systems (due to scheduling); these kinds of variations are likely to confuse transport layer protocols, which assume that neither the bandwidth nor the RTT is fluctuating too wildly. A fluctuating RTT can be the cause of unnecessary retransmits, or unnecessarily long recovery times. Fluctuating bandwidths can cause underuse of the link or sudden congestion.

Other important aspects include bandwidth asymmetry (which is fairly obvious) and handovers, which can cause the loss of all in-flight packets and a temporary but large increase in delay, as well as permanent changes in both bandwidth and RTT.