Tuesday, October 28, 2008

RON

The basic idea behind RON (Resilient Overlay Networks) is that internet routing is not perfect, and that BGP is slow to recover from failures in the internet; therefore, we sometimes want to route packets through other hosts instead of along the default path.

There are two basic kinds of failures that RON tries to work around: outages and performance failures. In an outage, a path on the internet becomes unusable: few or no packets reach their destination, so connections over that path fail completely. In a performance failure, packets do get through, but performance is so poor that the application cannot function. For example, the link between two nodes may have great throughput but high latency, which would make it unusable for an interactive application like SSH.
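To make the distinction concrete, here is a minimal Python sketch that labels a measured path from the point of view of an interactive application. The `PathStats` type, the `classify` function, and both thresholds are hypothetical, chosen only for illustration; they are not RON's actual criteria.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only (not taken from RON).
OUTAGE_LOSS_RATE = 0.9             # "none to few" packets get through
MAX_INTERACTIVE_LATENCY_MS = 300.0  # what an SSH-like app might tolerate

@dataclass
class PathStats:
    loss_rate: float        # fraction of packets lost, 0.0 to 1.0
    latency_ms: float       # round-trip latency in milliseconds
    throughput_mbps: float  # measured throughput

def classify(stats: PathStats,
             max_latency_ms: float = MAX_INTERACTIVE_LATENCY_MS) -> str:
    """Label a path 'outage', 'performance failure', or 'ok' for an interactive app."""
    if stats.loss_rate >= OUTAGE_LOSS_RATE:
        return "outage"
    # Packets get through, but too slowly for this application.
    if stats.latency_ms > max_latency_ms:
        return "performance failure"
    return "ok"

# Great throughput, high latency: fine for bulk transfer, not for SSH.
print(classify(PathStats(loss_rate=0.01, latency_ms=900.0, throughput_mbps=100.0)))
# prints "performance failure"
```

Note that the throughput field is never consulted here: whether a path counts as a performance failure depends on what the application needs, which is why the latency limit is a parameter.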

The basic idea is to have a collection of nodes that can forward packets through each other in order to get around failures in the core of the internet. Each node periodically measures the quality of its links to each of the other N-1 nodes and broadcasts these measurements to all nodes. Thus, each node ends up with a link-state table of size O(N^2): N nodes, each reporting on N-1 links. Nodes can then choose a path through this graph along which to route their packets. The path can be a single direct link or multiple links through intermediate nodes. A client on each machine receives packets and forwards them to their next hop. In this way, RON allows applications to route around failures in the internet.
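A minimal Python sketch of path selection over such a link-state table, assuming a single metric (latency in ms, with `math.inf` marking a link measured as down) and plain Dijkstra's shortest-path algorithm. The table layout and the `best_path` function are hypothetical; the post does not say how RON actually computes its paths.

```python
import heapq
import math

INF = math.inf  # marks a link measured as down

def best_path(latency, src, dst):
    """Dijkstra over an N x N link-state table.

    latency[i][j] is the latest latency measurement (ms) for link i->j.
    Returns (total_latency, path), or (INF, []) if dst is unreachable.
    """
    n = len(latency)
    dist = [INF] * n
    prev = [-1] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v in range(n):
            nd = d + latency[u][v]
            if v != u and nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dist[dst] == INF:
        return INF, []
    # Walk the predecessor links back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

# Node 0's view: the direct link 0->2 is down, but 0->1 and 1->2 work.
table = [[0, 20, INF],
         [20, 0, 30],
         [INF, 30, 0]]
print(best_path(table, 0, 2))  # prints (50.0, [0, 1, 2])
```

Because the direct link from node 0 to node 2 is down, the sketch routes through node 1 instead, which is exactly the kind of detour an overlay can take while BGP is still recovering.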

Results: RON takes an average of 18 seconds to route around failures in the internet. In about 5% of cases it doubles throughput, and in about 5% of cases it cuts the loss rate by 0.05 or more (that is, by at least five percentage points). These two sets of cases may overlap.

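One of the great things about RON is that it allows application-specific path optimization: an interactive application like SSH can prefer low-latency paths, while a bulk transfer can prefer high-throughput ones.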
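A minimal Python sketch of what application-specific selection might look like. For simplicity it only considers the direct path and paths through one intermediate node; it scores latency as a sum over hops and throughput as the slowest hop (the bottleneck). All names, policies, and measurements here are hypothetical.

```python
# Per-link measurements from a hypothetical 3-node link-state table.
lat = [[0, 30, 100],   # latency in ms, lat[i][j] for link i->j
       [30, 0, 30],
       [100, 30, 0]]
tput = [[0, 10, 50],   # throughput in Mbps, tput[i][j] for link i->j
        [10, 0, 80],
        [50, 80, 0]]

def candidate_paths(n, src, dst):
    """Direct path plus every path through one intermediate node."""
    return [(src, dst)] + [(src, k, dst) for k in range(n) if k not in (src, dst)]

def hops(path):
    """Consecutive (from, to) pairs along a path."""
    return zip(path, path[1:])

def path_latency(path):
    # Latency adds up hop by hop.
    return sum(lat[a][b] for a, b in hops(path))

def path_throughput(path):
    # A multi-hop path is only as fast as its slowest link.
    return min(tput[a][b] for a, b in hops(path))

# Each application supplies its own scoring rule; lower is better.
POLICIES = {
    "ssh": path_latency,                    # interactive: minimize latency
    "bulk": lambda p: -path_throughput(p),  # bulk transfer: maximize throughput
}

def choose(app, src, dst):
    return min(candidate_paths(len(lat), src, dst), key=POLICIES[app])

print(choose("ssh", 0, 2))   # prints (0, 1, 2)
print(choose("bulk", 0, 2))  # prints (0, 2)
```

The same measurements produce different routes: the latency policy takes the detour through node 1 (60 ms instead of 100 ms), while the throughput policy keeps the direct path because the detour's first hop is a 10 Mbps bottleneck.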
