Monday, September 29, 2008

Wireless TCP

This paper essentially addresses the problem of using TCP over lossy wireless networks by comparing the available proposed solutions and extensions to TCP. These are split into three categories: 1) link-layer solutions, 2) connection-splitting solutions, and 3) E2E solutions. The link-layer solutions attempt to provide reliability at the link layer. Connection-splitting solutions split the connection into two pieces, allowing the base station to send ACKs upon receiving packets, and then making sure that the base station reliably transmits the data to the wireless nodes. E2E solutions simply modify the TCP behavior at the end nodes.

There are a lot of different methods of improving TCP performance. Among them are SACKs (selective acknowledgements), which let the sender know exactly which packets were received, so as to avoid unnecessarily resending data. There is also explicit loss notification, which (as the name suggests) explicitly notifies the sender when a packet has been lost. This is advantageous because it helps the sender differentiate between losses that occur because of congestion and losses that occur because of noise in the link.

Other methods include retransmitting on the first duplicate ack, which is beneficial for lossy networks, where a loss is most likely to be due to noise rather than congestion. The final method is duplicate ack suppression at base stations, which gives the base station some time to retransmit the packet before it forwards the duplicate acks to the endpoint; this means that the endpoint will not decrease its congestion window when a non-congestion loss occurs, thereby keeping the throughput high.
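
To make the duplicate-ack suppression idea concrete, here is a little Python sketch of what a TCP-aware base station agent might look like. This is my own reconstruction in the spirit of the snoop approach; the class and field names are mine, and a real implementation does considerably more bookkeeping.

    class SnoopAgent:
        """Hypothetical sketch of dup-ack suppression at a base station."""

        def __init__(self, wireless_link):
            self.link = wireless_link  # assumed to expose send(packet)
            self.cache = {}            # seqno -> cached data packet
            self.last_ack = -1

        def on_data_from_sender(self, pkt):
            # Cache packets heading to the wireless side, then forward.
            self.cache[pkt.seqno] = pkt
            self.link.send(pkt)

        def on_ack_from_receiver(self, ack):
            if ack.ackno == self.last_ack and ack.ackno + 1 in self.cache:
                # Duplicate ack: retransmit locally and suppress the ack,
                # so the sender never shrinks its congestion window.
                self.link.send(self.cache[ack.ackno + 1])
                return None  # drop the duplicate ack
            # Fresh ack: clean out acked packets and forward it.
            self.last_ack = ack.ackno
            for seq in [s for s in self.cache if s <= ack.ackno]:
                del self.cache[seq]
            return ack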

As it turns out, the best method is a link-layer scheme that both suppresses duplicate acks at the base station and uses SMART-based selective acknowledgments. This is called a TCP-aware algorithm because it must understand the meaning of the acknowledgements in order to work.

There's not much more to say about this paper other than the fact that there may be some overhead in determining that the connections being created are, in fact, TCP connections. It would be interesting to run the same experiments with both TCP and UDP and to determine how much overhead is involved in examining the packets. It might also be interesting to run some similar experiments with protocols that use NACKs instead of ACKs.

MACAW

While there are a lot of different corner cases that this paper presents, and a lot of different reasons for the algorithms presented, the final protocol is quite simple to explain. The exchange is essentially RRTS-RTS-CTS-DS-DATA-ACK. This is an improvement over the MACA algorithm, which only had RTS-CTS-DATA. The paper motivates each piece one by one. Let's start with the DS (data-sending) message. When the medium is busy, a node must wait until the current data transmission has finished before sending an RTS. However, if the node cannot detect whether or not data is being sent, it has no way of knowing when to send the RTS. The DS message solves this problem by announcing how long the data will take to send, thereby letting all nodes know when they can send their next RTS.

We can continue with the RRTS. If a node cannot send a CTS because it is currently overhearing traffic from some other stream, then it must notify potential senders when that traffic is done. It does so by using an RRTS, which tells the potential senders that they may now send an RTS message. Without the RRTS, potential senders would randomly send RTS messages, with almost no chance of doing so at a time when there was no interference.

The last addition to the protocol is the ACK. This comes straight out of the E2E paper, which says that if something can be done more efficiently at the lower levels, it should be done there, rather than pushed to the ends. In this case, the use of the ACK allows for much higher throughput, because wireless links are considerably noisier and less reliable than wired links. Since the loss of a packet does not necessarily mean congestion of the link in question, TCP's congestion avoidance is not an appropriate reliability mechanism for wireless networks. Thus, the reliability is programmed in at the link layer.

There is one more significant issue that this paper raises, which is the question of the back-off counter. The paper suggests using a shared value for this back-off counter, since everyone should have a consistent view of the network. Any node that overhears a back-off value immediately adopts it. The paper also suggests multiple back-off values: one for the sender and one for the receiver. If an RTS is sent but never answered, this suggests congestion at the receiver's end. If a CTS is sent but the data never arrives, this suggests congestion at the sender's end. The appropriate back-off counter is adjusted accordingly. At the same time, to avoid massive fluctuations in back-off counters, the increase/decrease algorithm is changed to an increase by a factor of 1.5 and a decrease by a constant of 1. All of these, when put together, allow a consistent view of the network by all nodes, as well as a fairly stable back-off counter.
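
A quick sketch of the back-off arithmetic as I read it (the bounds are illustrative, not from the paper):

    BO_MIN, BO_MAX = 2, 64  # illustrative bounds, not from the paper

    def on_failure(backoff):
        # Multiplicative increase by 1.5 when a transmission fails.
        return min(int(backoff * 1.5), BO_MAX)

    def on_success(backoff):
        # Linear decrease by 1 when a transmission succeeds.
        return max(backoff - 1, BO_MIN)

    def on_overheard(own_backoff, heard_backoff):
        # Copying: adopt any overheard back-off value, so all nodes
        # converge on a consistent view of the contention level.
        return heard_backoff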

This paper mentions almost nothing about nodes entering and leaving the ranges of base stations. Suppose a node leaves the range of a base station while the base station is transmitting to it: does the lack of an ACK mean that the base station will keep retrying? Similarly, suppose a node enters the network. How does it go about making contact with a base station, while making sure not to cause interference with transmissions that are currently in progress? It could listen for DS messages, and then send an RTS at the next available time slot, but this would certainly not work if it were alone in the network. These questions seem to be left unanswered by the paper.

Tuesday, September 23, 2008

Gb Switched Router

This paper gives a general overview of modern techniques used to improve router performance. In retrospect, it would have been much better to read this paper before reading the other paper. Too late now. This paper covers the operation of, and need for, switched backplanes, as well as the iSLIP and ESLIP protocols.

The general idea behind routers is that there are various linecards which take incoming packets and forward them to a decision maker, and are also able to transmit outgoing packets. The key issue is the decision maker. In early routers, it was a single CPU. As this CPU became the bottleneck in routers, it was replaced with an individual decision maker for each linecard, with a bus connecting the various CPUs (so that packets can move from one linecard to another). Then, since the bus became the bottleneck, the need for switched backplanes arose.

A switched backplane is essentially a grid that allows any linecard to connect to any other linecard. The only caveat is that a linecard can only connect to a single other linecard at a time. Therefore, there needs to be a scheduler to open and close connections on the grid. The algorithm used to determine which connections to open/close is called iSLIP. It works by allowing the sources to request destinations, letting the destinations choose a source in round-robin fashion, and then letting the sources choose a destination in round-robin fashion. The sources have multiple requests because of virtual output queuing (VOQ). In VOQ, each source has multiple queues so that no packet is stuck behind other packets which cannot reach their destinations.
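
Here is a rough Python sketch of a single request-grant-accept iteration of iSLIP, assuming requests[i] is the set of outputs for which input i has queued cells; the data structures and names here are mine, not the paper's.

    def islip_iteration(requests, grant_ptr, accept_ptr, n):
        # requests: list of sets, e.g. requests[0] == {1, 3}
        # grant_ptr, accept_ptr: lists of ints, all initially 0
        grants = {}  # input -> list of outputs that granted it
        for out in range(n):
            # Grant phase: each output picks, round-robin from its
            # pointer, one input among those requesting it.
            asking = [i for i in range(n) if out in requests[i]]
            if not asking:
                continue
            granted = min(asking, key=lambda i: (i - grant_ptr[out]) % n)
            grants.setdefault(granted, []).append(out)
        matches = {}
        for inp, outs in grants.items():
            # Accept phase: each input picks, round-robin, one grant.
            out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
            matches[inp] = out
            # Pointers advance one past the match; this desynchronizes
            # the round-robin schedules and is what gives iSLIP its
            # fairness and high throughput.
            grant_ptr[out] = (inp + 1) % n
            accept_ptr[inp] = (out + 1) % n
        return matches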

A few other improvements are possible, such as the use of ESLIP to accommodate multicast messages, and the use of priority queues to more tightly control delay.

This paper introduces a classification system for blocking in the backplane that is quite useful. The first kind is called Head of Line (HOL) blocking. HOL blocking occurs when there is no VOQ, and a packet is unable to reach its destination even though that destination is idle, because that packet is second in the queue; the first packet is destined for a destination that is currently busy. While VOQ solves this issue, it does not solve the issue of input blocking or output blocking (where a packet at a given input/output cannot be transmitted because that input/output is currently transmitting a different packet). These issues cannot be solved except in the case of multicast, when they are solved by ESLIP.

Monday, September 22, 2008

Scaling Routers

I'm going to be honest here. I have no idea what this paper said. Not that it was a bad paper. I think this paper should stay in the syllabus. It's just that I would prefer to have read some more background material on router design so that I had the necessary experience to understand this paper.

That being said, this paper looks like it discusses some upcoming problems in scaling routers to deal with ever-increasing internet traffic, which seems like a good thing to do. Everybody wants their downloads to go faster, from movies and music to software updates. However, the paper seems to assume that in the future, we will have individual routers with lots of packets being pumped through them simultaneously. While I think that's possible, it seems like this router is being designed for high speed links across the continent. With large companies strategically placing caches close to clients, this kind of router may not be necessary. Then again, who knows?

The basic ideas behind this paper are a very simple yet elegant form of load balancing and a simplified mesh architecture. In order to get large amounts of throughput, they do a very simple round-robin type of load balancing. At the same time, by using optics to increase the speeds of the links connecting linecards, they are able to reduce the necessary number of links in the mesh.
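
The load-balancing step really is simple enough to sketch in a few lines; this toy class is my own illustration, not the paper's design.

    from collections import deque

    class LoadBalancedInput:
        """Toy sketch of round-robin packet spreading at an input linecard."""

        def __init__(self, num_linecards):
            self.n = num_linecards
            self.next_card = 0
            self.queues = [deque() for _ in range(num_linecards)]

        def enqueue(self, packet):
            # Spray arriving packets across the intermediate linecards in
            # strict round-robin; no central scheduler is needed.
            self.queues[self.next_card].append(packet)
            self.next_card = (self.next_card + 1) % self.n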

The paper also introduces an algorithm called Full Ordered Frames First (FOFF), which lets the router guarantee that packets from the same connection are not transmitted out of order. Frankly, this seems like a somewhat silly restriction to put on a router. All of the papers we have read so far in this class have told us that the internet does not guarantee in-order delivery. The author claims that out-of-order delivery can cause unnecessary retransmissions in TCP. While these unnecessary retransmissions seem unlikely given the current implementation of TCP, if it is indeed the case that such a router would misbehave, it seems like the best thing to do would be to modify the TCP protocol.

Thursday, September 18, 2008

CSZ

This paper provides a hypothetical solution to the real-time flow vs. data flow issue. It suggests that real-time flows that need certain guarantees from the network can explicitly request these guarantees, and that routers implementing WFQ will achieve them. Similarly, adaptable real-time applications (which only ask for best effort) are served by a FIFO+ queue within the WFQ to address their needs. However, these applications must provide guarantees to the network that they will operate within certain bounds (by providing a rate and a bucket size). If multiple levels of service are desired (such as an extra level of service for datagrams), then multiple FIFO+ queues can be used within the WFQ.

The idea behind a FIFO+ queue is that a packet should not be adversely affected by many different sources. Therefore, any router that causes a packet to be delayed (more than the average delay time for packets of its priority) will mark the packet with the extra delay time. A router receiving that packet can then "pretend" to have received it earlier. This reduces the overall jitter that will be experienced by packets on long trips. As with most algorithms, I have trouble imagining a world in which a Tier 2 ISP would not take advantage of this by marking all of the packets from its own customers.
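
In pseudocode-ish Python, the FIFO+ bookkeeping might look like this; I'm following the paper's description rather than any real implementation, and the field names are mine.

    def record_excess_delay(pkt, delay_here, avg_delay_for_class):
        # On departure, accumulate only the delay beyond this traffic
        # class's average at this router. 'pkt' is assumed to carry an
        # extra_delay field in its header.
        pkt.extra_delay += max(0.0, delay_here - avg_delay_for_class)

    def effective_arrival_time(pkt, arrival_time):
        # Downstream routers "pretend" the packet arrived earlier by its
        # accumulated excess delay, smoothing out jitter along the path.
        return arrival_time - pkt.extra_delay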

Once again, the most difficult part of this algorithm is the incentive. There must be some sort of incentive (other than being a good Samaritan) for a user to declare their packets to be of the "lowest" priority. Once again, economic incentives are an option, except that nobody wants to pay money by the byte (just ask a Verizon Wireless customer).

Overall, I think one of the most important things that this paper has done is to classify the different kinds of traffic that flow across the internet. There exists a distinction between a two-way video chat on Skype, a video being watched on YouTube, and a high resolution trailer for Starcraft 2 that's only available for download; this paper has done an excellent job of making these distinctions clear.

Future Internet

This paper was written in 1994, and is quite insightful in that it predicts the rise of real-time video and audio links across the internet. The paper attempts to provide design decisions for the future of the internet. In particular, it suggests that there be multiple tiers of service within the internet, and that higher tiers should be explicitly requested by applications. The final suggestion is that there be admission control (i.e. the internet should reject some flows) to increase the quality of service for currently existing flows.

There are so many problems here, I don't even know where to begin. The suggestion of multiple tiers of service within the internet is a fine suggestion (because, in fact, real-time applications have very different needs when compared to applications such as FTP). Furthermore, the suggestion that the distinction be done above the IP layer (perhaps at the protocol layer) is also a good suggestion, because it allows the IP layer to serve its current purpose without taking on an unrelated purpose.

The suggestion that applications should explicitly request better service, while seemingly logical at first, creates a lot of difficulties in practice. The largest problem to be faced here is the problem of incentives. If there is no incentive to request the lower quality of service, everybody will request the higher quality of service. If the incentive is, as the paper suggests, monetary, then this will create additional problems. Never mind the fact that people generally prefer to be billed a flat rate for unlimited internet access; if a remote host starts a high-QoS TCP connection to an unsuspecting victim, will the victim be charged for the ACKs that he sends back? As another example, it would seem impossible for a server to declare that it wanted to stream videos to users at a high QoS, while still allowing users to "opt out" of the high QoS based on financial considerations. In general, connections between different nodes that request different QoS levels will be tricky as a whole.

The admission control suggestion seems to be more applicable to traditional telephone conversations than it is to the internet. As far as the internet is concerned, everything is a packet, and there are no flows. Rejecting one flow so that another has a higher QoS would be very difficult to do on a router that is situated between two endpoints for several reasons: 1) the router may not see all of the traffic of that flow, making it difficult to reject the whole flow and 2) keeping track of which flows need high QoS and rejecting other flows because of it would entail large amounts of state within the router. The idea of rejecting flows so that others can have higher QoS is even more ridiculous when we consider where the responsibility should fall. Can it be done by any router? Can it be done by all ASes or just Tier 1 service providers? Who can decide which flows are more important than others? These issues will plague the idea of admission control if anybody ever tries to implement it.

I am even more surprised that this paper doesn't have a large focus on authentication. One of the largest issues of the modern internet is that in general, it is easy to spoof your own identity. If this issue could be solved, it would lead to great improvements in the internet from fairer routing to simpler authentication methods/interfaces. This issue is not considered very much in this paper. I am very disappointed with the paper's general lack of foresight.

Tuesday, September 16, 2008

XCP

This paper defines a new protocol in which routers and end nodes cooperate in order to achieve optimal flow and fairness. The router simply computes the difference between its current use and its optimal use, and then sends messages to nodes telling them how much to increase or decrease their window sizes by. Similarly, the end nodes must tell the routers their current congestion window size and estimated round trip time. By using information returned by the router (piggybacking on an ACK), the end nodes can reach optimal utilization in one round trip, as opposed to the many needed by TCP.
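
The efficiency controller's core computation is compact enough to write down; this is my paraphrase of it in Python. The alpha and beta values are the stability constants I recall from the paper; everything else here is my own scaffolding.

    ALPHA, BETA = 0.4, 0.226  # stability constants, per the paper

    def aggregate_feedback(capacity, input_rate, queue, avg_rtt):
        # Positive when the link is underused (tell senders to speed up),
        # negative when a persistent queue builds (tell them to slow down).
        return ALPHA * avg_rtt * (capacity - input_rate) - BETA * queue

The fairness controller then splits this aggregate among individual flows, which is exactly the separation of concerns I mention below.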

This system is especially useful in that it allows for efficient communications across mediums such as satellite links, which have high bandwidth, but also high round trip times. The high round trip times mean that algorithms such as TCP take an enormous amount of time to use all of the available bandwidth (since the congestion window is adjusted by 1 per round trip). In contrast, XCP allows it to be done in just one round trip, which should (hopefully) yield desirable results in such situations.

The only flaw is the obvious inability to control misbehaving users. Users who simply transmit too many packets are likely to get away with it, because the only thing the router does is ask them to slow down. It won't start dropping their packets until its queues are full, so a misbehaving client can really steal a lot of bandwidth from other users. Even when you try to push this responsibility to the edge routers, there is a problem: the Tier 2 ASs may try to "cheat" their Tier 1 providers to get more bandwidth for their own customers.

However, this proves the point that the internet could be much more efficient if only we were able to trust the end users. It also introduces a very important notion of the separation between fairness control and efficiency control. By using a single algorithm to determine by how much the total flow needs to change, and using a completely separate algorithm to determine how to distribute the adjustment between different nodes, they have essentially turned the router into two components which can be modified separately. This is important for future updates.

Speaking of updates, this paper glosses over the deployment issue. While it is true that XCP can probably coexist with TCP, separate queues for each are probably not ideal. Neither is sending a TCP packet to check for the existence of XCP, especially if that packet has to go through many routers to perform the check properly. And lastly, the installation of XCP one AS at a time is known as a "forklift upgrade", and is generally considered to be one of the worst ways to do an upgrade.

Monday, September 15, 2008

RED

This paper blathers on for pages and pages about nothing at all. I find it extremely boring to read. That being said, this paper describes a Random Early Detection algorithm for detecting congestion in gateways and signalling the nodes attached to the gateway by "marking" packets, which consists of either setting a bit in the header or dropping the packet. In order to accomplish this, it uses exponential averaging to keep track of the average queue length. This allows the algorithm to behave well in cases of "bursty" traffic, while still controlling the flow should it grow too large. It has a minimum threshold and a maximum threshold. When the calculated average is below the minimum threshold, no packets are marked. Above the maximum threshold, all packets are marked. Between the two thresholds, packets are marked with a probability to be calculated by the method below.

As the average queue length increases, the probability of marking a packet increases linearly between the minimum threshold and the maximum threshold, from 0 up to a configured maximum max_p. Thus, the base probability looks something like p_b = max_p * (avg - minthresh)/(maxthresh - minthresh). However, this formula generally leads to clumped markings. Since we would like the markings to be evenly spread out, we count the number of packets sent since the last marking, and then compute p_a = p_b/(1 - count * p_b). With this formula, we can achieve a much better spacing between marked packets, and still (through tweaking of parameters) have the same number of marked packets. I didn't see a great mathematical justification for this in the paper. You can now skip reading the paper. That's really all there is to it. The rest is fluff.
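
To underline how small this algorithm is, here it is in a few lines of Python. The variable names are mine; wq and max_p are tunable parameters, and the defaults below are just illustrative values.

    import random

    def update_avg(avg, qlen, wq=0.002):
        # Exponentially weighted average of the queue length.
        return (1 - wq) * avg + wq * qlen

    def should_mark(avg, count, minth, maxth, max_p=0.02):
        # 'count' is the number of packets sent since the last marking.
        if avg < minth:
            return False
        if avg >= maxth:
            return True
        pb = max_p * (avg - minth) / (maxth - minth)  # base probability
        if count * pb >= 1:
            return True  # force a mark rather than divide by <= 0
        pa = pb / (1 - count * pb)  # spreads markings evenly apart
        return random.random() < pa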

I'd really recommend removing this paper from the syllabus. The idea is simple enough that it can be explained in five minutes, and the results are not fascinating. Furthermore, I find the graphs that are drawn very hard to read/understand.

The only benefits of this algorithm are 1) the small amount of state needed, and 2) the ability to easily handle "bursty" traffic. An algorithm like fair queuing needs a lot of state, but this algorithm needs only a few variables per router. The good news is that this algorithm is quite easy to implement on routers that need to handle many, many connections, and cannot be bothered to keep state for all of them. The bad news is that I can't see any other purpose for it.

The paper compares the algorithm to random dropping and FCFS, which are not fantastic policies to begin with. I would be more interested to see a comparison of this algorithm to fair queuing or CSFQ. In particular, an analysis of the processing time per packet would be a very interesting paper to read.

Thursday, September 11, 2008

Core Stateless Fair Queuing

This paper provides a quick and dirty solution to the large amount of processing power needed for Fair Queuing. Instead of doing the processing at the routers, this system uses the routers along the border of an AS to estimate the total amount of traffic that a given flow is generating. It then labels the packets with this information, and allows the internal routers within an AS to drop packets at random, based on their current congestion. These packets are dropped randomly upon arrival at routers, which use exponential averaging to estimate what the "fair share" of traffic should be.
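
The core-router logic really is this simple; here is a sketch. The hard part, estimating the fair share itself, is the paper's contribution and is elided here.

    import random

    def forward(label, fair_share):
        # 'label' is the flow's arrival rate, stamped by the edge router.
        # Drop with probability max(0, 1 - fair_share/label), so flows at
        # or below their fair share pass untouched while faster flows are
        # throttled down toward the fair rate.
        p_drop = max(0.0, 1.0 - fair_share / label)
        return random.random() >= p_drop  # True = forward, False = drop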

This paper was not a fantastic read for a couple of reasons. First and foremost was the confusing notation used to describe its exponential averaging scheme. Second were the tiny graphs describing the results (I can barely make out which line is supposed to be which).

This paper very nicely addresses the issue of fair sharing of bandwidth for small regions, but I find it unlikely that such a system would extend to Tier 1 ISPs, which undoubtedly have enormous amounts of data flowing across their borders at any given time.

This paper also leaves out an important issue that was addressed by the Fair Queuing paper, which is latency for connections that are well under their fair share. I suspect that this kind of queuing does absolutely nothing in that regard, since it does not rearrange the packets that it receives. In this case, its best contribution is the ability to punish TCP users who try to exceed their fair share of bandwidth. However, they still can't do this to UDP users, which is an important issue.

Wednesday, September 10, 2008

Fair Queuing

This paper essentially answers the questions I raised last time about adversarial endpoints who abuse the congestion avoidance schemes to get more bandwidth for themselves. In essence, the paper describes a queuing algorithm that can be used by gateways to do two things: (1) ensure fair use of the bandwidth, and (2) punish misbehaving nodes.

The idea is relatively simple. The gateway considers packets on a source-destination pair basis. This allows it to distribute resources based on individual connections, while minimizing the amount of damage that a malicious user can do (the malicious user can get more than their fair share of bandwidth, but cannot do any useful work). Then, each incoming packet is assigned a number that is, roughly, the sum of its (virtual) start time and the length of time it will take to send the packet. The packet with the lowest value is transmitted first. There is an additional modification to the algorithm that gives an advantage to connections that have been idle for a while (by subtracting a small nonnegative value delta).
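
As a sketch (the paper actually uses a virtual "round number" in place of wall-clock time; the simplified notation here is mine):

    def finish_number(last_finish, round_now, pkt_len, delta=0.0):
        # A flow's packet can start no earlier than the flow's previous
        # finish; flows that have been idle get to bid slightly into the
        # past via delta, which is the "advantage" mentioned above.
        start = max(last_finish, round_now - delta)
        return start + pkt_len

    # The gateway always transmits the queued packet with the smallest
    # finish number.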

This essentially solves the problem of having adversarial TCP nodes which do not obey congestion avoidance rules and allow their sending windows to grow too large. And by punishing nodes that misbehave, it actually encourages the use of congestion avoidance algorithms, whereas a node attached to a round robin scheduler actually sacrifices some bandwidth by using the congestion avoidance algorithms.

The paper has lots of experimental data, but the data comes from relatively simple network graphs. It would be interesting to see the performance of this algorithm with a very large network with many gateways. One of the issues that is raised in the paper is that the algorithm must be both smart and fast, and this may not be possible for a gateway that has an enormous number of packets going through it.

Tuesday, September 9, 2008

Congestion Avoidance

This paper approaches the problem of congestion avoidance from a more practical and less theoretical point of view than the previous paper. It answers one very important question that I had after reading the last paper, which was the question of initial conditions. The "slow start" described in this paper is an exponential growth algorithm that will bring the TCP window to approximately the right size before the additive increase/multiplicative decrease algorithm is used.

This paper also addresses the issue of retransmission, which is specific to TCP (unlike the previous paper, which assumed a constant stream of data, this paper actually considers the ACKs). They provide an improved algorithm for calculating the amount of time to wait before retransmission that fares better under both light and heavy congestion. In particular, it does not unnecessarily retransmit when a packet has been delayed rather than dropped. This is due to an improved measure of the variance in round-trip time. Having put these two pieces in place, it becomes clear that a timeout is almost always due to packet loss (from congestion) rather than any other cause. This means that the timeout can be used as the signal that indicates that the network is congested.
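
The improved estimator is easy to write down. Here is my recollection of the usual smoothed-RTT/mean-deviation form, with gains of 1/8 and 1/4; treat the exact constants as assumptions rather than quotes from the paper.

    def update_rto(srtt, rttvar, sample):
        # 'sample' is a newly measured round-trip time.
        err = sample - srtt
        srtt += err / 8                    # smoothed round-trip time
        rttvar += (abs(err) - rttvar) / 4  # smoothed mean deviation
        rto = srtt + 4 * rttvar            # headroom for RTT variance
        return srtt, rttvar, rto

Using the mean deviation instead of a fixed multiple of the mean is what keeps the timer from firing on packets that are merely delayed.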

This paper is quite good at explaining TCP's current implementation. It does not delve into a lot of mathematical depth, but it backs up its claims with experimental results. This paper, in conjunction with the other one, is a very good introduction to TCP in its current form.

This paper touches on the question of an adversarial client by mentioning that the gateway can be configured to drop excess packets from a client that is using more than its share of the available bandwidth. The detection of such a scenario and the policy for the dropped packets are left vague.

The paper argues that the use of the timeout to detect congestion is superior to the use of an extra bit sent from the gateway to the client, since this method does not require the modification of any hardware in the network. However, it also heavily cites the previous paper on additive increase/multiplicative decrease, which clearly states that it is assuming distinct time steps and that the results of one time step are available before the next transmission. By using the timeout, this is clearly no longer the case. An important question is whether or not this inconsistency matters, and whether the algorithm would perform better if the gateways actually did return a bit that broadcasted whether or not it was experiencing congestion.

Monday, September 8, 2008

Increase/Decrease Algorithms

EDIT: Minor note - I hadn't read the other paper yet when I wrote this summary.

This paper starts with some simple assumptions: that we wish to build a distributed congestion avoidance system, which will take as input two values: the amount of data transmitted at the last time step, and whether the bottleneck was underutilized or congested last time step. Furthermore, it assumes that the new amount of data to be transmitted will be a linear function of the amount at the previous time step. It then tries to achieve the following goals: efficiency (hovering around the point of congestion), fairness, convergence, and distribution.

The last goal is easily achieved simply by the setup of the scenario. The other three goals can be described as follows:
  • efficiency - If the system is overutilized at one time step, the total utilization will go down for the next step. Similarly, if it is underutilized, the total utilization will go up for the next step.
  • fairness - Eventually, all clients will have approximately equal utilization of the bottleneck.
  • convergence - The system should eventually stabilize. Preferably, it should stabilize quickly, and the oscillations around the optimal point should be small.
By writing out a set of equations describing the utilization, and then manipulating them to guarantee the above criteria, the paper concludes that the decrease must be multiplicative, and that the increase may have both additive and multiplicative components. The paper also notes that a pure additive increase converges to fairness fastest, and therefore concludes that additive increase with multiplicative decrease is the best linear formula that satisfies the four conditions.
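
Written out as code, the resulting control law is just the following, where a and b are the additive and multiplicative constants (TCP effectively uses a = 1 segment and b = 1/2):

    def next_window(w, congested, a=1.0, b=0.5):
        if congested:
            return b * w  # multiplicative decrease
        return w + a      # additive increase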

This explains why the TCP window grows and shrinks the way it is currently implemented. Most implementations of TCP have a window size that grows additively and shrinks multiplicatively (it is halved). The indicator of whether or not there is congestion, in this case, is obviously the presence or absence of an ACK.

This paper makes the flawed assumption that all endpoints are willing to comply with the proposed algorithm. Since the algorithm is supposed to guarantee fairness, this is a terrible assumption. An adversarial client can ignore the multiplicative decrease and simply transmit at an unusually high rate. This will cause the adversary to have an unfairly large share of the bottleneck, while the well-behaved clients are told to continue reducing their transmission rates.

One important question that the paper doesn't address is the transience of the system. This is inherent in the way the internet works; clients will come and go frequently. There is a question of whether or not the system is still guaranteed to converge to the optimal efficiency if clients continue to arrive and leave.

This ties directly into the question of how quickly the system converges. If a single client is underutilizing its connection, does it take a long time or a short time for that client to approach a fair utilization? What about short-lived connections? What kind of initial conditions should a client set if it has just joined the network? And somewhat importantly, does this system inherently favor new clients, old clients, or neither? If it favors one over the other, is it possible for an adversary to get better performance than its peers by dropping and establishing a connection many times over? Or by keeping a connection alive long after it is no longer useful in the hopes that it will be useful again in the future?

Thursday, September 4, 2008

Inferring AS Relationships

This paper describes a technique that can be used to infer a set of relationships between different ASes. It first classifies these relationships into several different categories: provider-customer, sibling, and peer. The provider-customer relationship is obvious in that the provider shares its customers' routing information, but the customer does not share its provider's routing information. Peers will share routing information with each other, but do not forward routing information obtained from one peer to others. And finally, siblings share all routing information with each other.

The main way that they attempt to classify these relationships from data is to look at all the AS paths in the routing tables. Their key assumption is that no AS path goes from a provider down to a customer and then back up to another provider. This makes sense, given the definition of the provider-customer relationship, but is not always necessarily the case. The algorithm then attempts to find the topmost provider in the path, and partitions the path accordingly: edges before it go uphill (customer to provider) and edges after it go downhill (provider to customer). By repeating this process for a large collection of AS paths, the algorithm gains enough information to determine all of the relationships between ASes.
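
A toy version of that partitioning step, assuming (if I read the heuristic correctly) that the highest-degree AS on the path is taken to be the top provider:

    def infer_edges(as_path, degree):
        # 'degree' maps each AS to its observed degree in the AS graph.
        top = max(range(len(as_path)), key=lambda i: degree[as_path[i]])
        pairs = []
        for i in range(len(as_path) - 1):
            a, b = as_path[i], as_path[i + 1]
            if i < top:
                pairs.append((a, b, "customer->provider"))  # uphill
            else:
                pairs.append((a, b, "provider->customer"))  # downhill
        return pairs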

This algorithm has an interesting application: detecting incorrectly configured routers. By using an automated system to detect the current state of inter-AS relationships, the algorithm can determine when certain gateways have been incorrectly configured, and are forwarding reachability information that they are not supposed to be. By running the algorithm and then comparing the results with the intended use, it is easy to find misconfigured routers.

The most important improvements upon the algorithm seem to be in the area of adjusting for small anomalies in the routing tables. The algorithm was very successful at determining customer-provider relationships, but it was extremely unsuccessful at determining peer or sibling relationships. Part of the issue here may be that the reachability information being shared between different ASes may fall under different policies depending on the IP prefix. It would be interesting to try to separate these different policies for these different prefixes.

Interdomain Routing

This paper is an informal discussion of the BGP. It initially discusses the reasons for the existence of such a protocol; this includes a small lesson on the structure of the internet (with multiple tiers of ISPs, from universities and small businesses at tier 3 up to tier 1 ISPs). The goal of BGP4 is to allow different autonomous systems to exchange reachability information. A given tier 2 ISP wishes to inform peers about all of its customers (thereby providing its customers faster connections to customers of competing ISPs in the same region), and at the same time, it wishes to acquire reachability information from its provider (thereby giving its customers access to the parts of the internet that the ISP knows nothing about). There are a few additional constraints, such as the fact that no ISP wants to export information about its peers to other peers (since it is not benefiting directly from any traffic that its peers route through its gateways).

The protocol is fairly simple; it is simply a collection of data that is forwarded from one gateway to the next (and is sometimes changed when forwarded). The first attribute is LOCAL PREF, which encodes local policy, such as whether the route was learned from a local customer. This information is used (as stated above) to make decisions about whether or not to forward reachability information. The second is the ASPATH, the sequence of ASes that the route has traversed; its length is used as a factor to determine which routing information is optimal. MED is a number which can be used to express a preference among the multiple links connecting two ASes (e.g., asking a neighbor to carry traffic across its own backbone instead of your internal gateways). There is also eBGP vs. iBGP (prefer routes learned directly from external peers to routes that were propagated through internal gateways), as well as the IGP cost (the internal path to the exit point). If everything is a tie, then the AS employs some deterministic tiebreaker.
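
The decision process above can be summarized as a sort key. The attribute names on 'route' below are hypothetical, but the ordering follows the notes:

    def preference_key(route):
        return (
            -route.local_pref,                   # higher LOCAL PREF wins
            len(route.as_path),                  # then shorter AS path
            route.med,                           # then lower MED
            0 if route.learned_via_ebgp else 1,  # prefer eBGP over iBGP
            route.igp_cost,                      # then lower IGP cost
            route.router_id,                     # deterministic tiebreak
        )

    # best = min(candidate_routes, key=preference_key)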

This paper is fairly unclear at times (since it actually appears to be lecture notes, I hope nobody's offended). It does a poor job of distinguishing between the protocol itself (which is fairly simple) and the task of designing a system that will efficiently and correctly implement the protocol. Often, there are references to the topology of the gateway connections, which is a topic that (while very interconnected with the protocol) should probably be treated separately.

One of the reasons the topology is best treated separately is because (as the paper says), there are several open problems in determining the best topology. In particular, in the section on failover and scalability, it is noted that in order to have a backup path in place in case of a failure, it is necessary to pad the ASPATH that comes across a particular connection for particular IP prefixes. This is obviously not a scalable solution, and the question of how to do this is open-ended.

Tuesday, September 2, 2008

ARPA Protocol Design

This paper essentially describes the reasons for the split between the TCP and IP protocols, as well as for the adoption of the UDP protocol. On top of that, it explains why the internet as it is today uses packets and gateways rather than anything else. The first reason was for survivability. In case some part of the internet were to become non-functional, it was determined that the end users should not experience any issues as long as there remained a physical path between endpoints. Therefore, it was decided that the network should be stateless.

Second, it was important to link together many existing networks which did not share a lot of common features. The only features that were assumed were the ability to use some sort of addressing, and the ability to send packets across the network. The datagram based approach allowed the internet to work across all existing networks. Furthermore, one of the design goals was to allow a variety of different types of communication to occur over the internet. By providing the minimum possible building blocks, the datagram approach is essentially allowing many different protocols to be used, as long as those protocols are implemented by the endpoints of communication.

I think there are two priorities which could have been better addressed during the design of the internet. The first one is listed as the seventh and last priority in the paper: accountability. At the time, it was not considered very important, but it has actually become somewhat important right now to be able to trace cyber criminals back to their origins (consider hackers who write Trojans, or web hosts that contain illegal pornography). While this task has been taken over by ISPs, there is probably still room for improvement in this area.

The second priority is not mentioned at all in the paper: protection against malicious users. For protocols like TCP, it is assumed that during times of high congestion, each end user will use the same algorithm to reduce the amount of data they are sending through the network. Under normal circumstances, this ensures that the network will remain functional for all users. However, it is very possible (since these protocols are implemented on the host machines) for a malicious user to simply transmit as much data as he possibly can during a time of high congestion, thereby increasing his own throughput while punishing those who have implemented the protocol correctly (by decreasing everyone else's throughput). It would have been nice to see at least an in-depth discussion of this.

E2E Paper

This paper is essentially advocating a minimal network implementation, with error-checking, security, and all other issues (such as guarantees of FIFO ordering) handled by the programmers of the high-level application. The reasoning behind this is that correctness constraints within the network itself do not necessarily guarantee correctness at the application level. Furthermore, the correctness constraints that would be provided by the network level may be more than is necessary for the application level (a typical VOIP example is provided in this situation).

Thus, the applications must provide their own error-checking routines and checksums regardless of any that may exist at the network level. This means that such routines at the network level simply add additional overhead, with no practical benefit. This paper therefore advocates having no such routines at lower levels, unless they improve the efficiency of the higher levels (and of course, there is a tradeoff to be found here). Even when this is the case, lower level protocols do not need to guarantee correctness; they only need to make incorrectness much less likely.

This paper shows a great deal of foresight; I don't believe there were any VOIP services at the time that it was written. It also hints at the counter-argument of modularity, which would suggest that the low level network routines should include all sorts of guarantees so that high level application developers don't have to rewrite the same code (with slight modifications) over and over again.

This paper sheds light on why both kinds of transport protocols exist. UDP is a protocol that lacks any guarantees, as the paper advocates. However, in many applications (especially applications that involve data transfer, such as instant messaging clients and web servers), the checks and guarantees provided by TCP make perfect sense and are suitable for the problem at hand.

One modern school of thought on programming in general is that as computers continue to get faster, it's favorable to sacrifice some performance in favor of decreasing the amount of time spent programming. This argument can just as easily be applied to computer networks. With the introduction of broadband and cable, computer networks are incredibly fast (and they can be expected to get faster). This allows us to ask whether it is beneficial to use a small amount of network overhead to significantly boost programmer productivity. Of course, this question is only applicable now, and was not applicable when this paper was first published.