Monday, November 24, 2008

Policy Aware Switching

I would just like to start by saying that while this paper presents a nice system, it is relatively long by virtue (or vice) of verbosity.

This paper essentially presents a system similar to the DOA we studied a few weeks ago. The idea is that p-switches delegate the standard middlebox functions to boxes that are not on the path into the datacenter. This approach is somewhat easier to deploy, because it does not require a new naming scheme and can be rolled out one datacenter at a time.

Each p-switch has a configuration that uses the source MAC address and 5-tuple of each packet to decide where the packet should go next. The 5-tuple is the list of five fields: source IP, source port, destination IP, destination port, and transport protocol. The configuration associates a "next hop" with each (source, 5-tuple) pair, and wildcards are allowed. Thus, a packet that comes from (for example) a firewall can still be directed to the appropriate next hop (for example, a load balancer) even though none of its 5-tuple fields has changed, because the p-switch can tell from the source MAC address which box the packet just left.
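To make the lookup concrete, here is a minimal sketch in Python of how such a rule table might be consulted. The names (`Rule`, `next_hop`, the hop labels) and the first-match-wins ordering are my own assumptions for illustration, not the paper's implementation; a real p-switch keys on MAC addresses rather than friendly names.

```python
from typing import List, NamedTuple, Optional, Union

WILDCARD = "*"
Field = Union[str, int]


class FiveTuple(NamedTuple):
    src_ip: Field
    src_port: Field
    dst_ip: Field
    dst_port: Field
    proto: Field


class Rule(NamedTuple):
    prev_hop: str        # where the packet just came from (hypothetical label)
    selector: FiveTuple  # 5-tuple pattern; any field may be WILDCARD
    next_hop: str        # where to send the packet next


def _matches(pattern: Field, value: Field) -> bool:
    return pattern == WILDCARD or pattern == value


def next_hop(rules: List[Rule], prev_hop: str, pkt: FiveTuple) -> Optional[str]:
    """Return the next hop of the first matching rule (assumed ordering), or None."""
    for rule in rules:
        if _matches(rule.prev_hop, prev_hop) and all(
            _matches(p, v) for p, v in zip(rule.selector, pkt)
        ):
            return rule.next_hop
    return None


# Web traffic to 10.0.0.5 must traverse firewall, then load balancer, then server.
WEB = FiveTuple(WILDCARD, WILDCARD, "10.0.0.5", 80, "tcp")
RULES = [
    Rule("core_router", WEB, "firewall"),
    Rule("firewall", WEB, "load_balancer"),
    Rule("load_balancer", WEB, "web_server"),
]

pkt = FiveTuple("192.0.2.7", 43512, "10.0.0.5", 80, "tcp")
```

The same unchanged `pkt` gets a different answer depending on `prev_hop`, which is exactly why the previous hop has to be part of the key.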

The system also handles cases where packets are addressed to one of the middleboxes (say, a load balancer) that then rewrites their IP headers. A clever combination of wildcards and extra rules makes sure that each packet passes through the same middleboxes on the way in and on the way out.

Rules can have multiple values for the "next hop" field, allowing a p-switch to spread the work of a single middlebox across multiple middleboxes of the same type. To make sure that all packets from a single flow go through the same middlebox, the p-switches use consistent hashing.
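As a rough illustration of the consistent hashing idea, the sketch below maps each flow's 5-tuple onto a hash ring of middlebox instances. Everything here (the `HashRing` class, the SHA-1 choice, the number of virtual nodes, the instance names) is an assumption of mine and not taken from the paper.

```python
import bisect
import hashlib
from typing import Iterable, Tuple


def _hash(key: str) -> int:
    # First 8 bytes of SHA-1 as an integer position on the ring.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, instances: Iterable[str], vnodes: int = 64):
        # Each instance gets several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name) for name in instances for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, flow: Tuple) -> str:
        """Return the instance owning the first ring point after the flow's hash."""
        if not self._ring:
            raise ValueError("no middlebox instances configured")
        idx = bisect.bisect_right(self._points, _hash(repr(flow))) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["fw1", "fw2", "fw3"])
flow = ("192.0.2.7", 43512, "10.0.0.5", 80, "tcp")
chosen = ring.pick(flow)  # every packet of this flow maps to the same firewall
```

The nice property is that if one instance disappears, only the flows that were assigned to it get remapped; flows on the surviving instances stay where they were.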

I have nothing bad to say about this paper, except that the recovery time from failures appears to be as large as 3 seconds. That's a huge amount of downtime or lost packets for a datacenter. Of course, that may be a result of the implementation, but it would be nice to see that number reduced.
