Monday, November 17, 2008

DOT

The data-oriented transfer (DOT) service is essentially an abstraction layer: it turns the transfer of large files into a service that applications can use through API calls.

The basic concept behind DOT is that file transfer is a function shared by a large number of protocols (HTTP, FTP, SMTP, etc.), each of which could benefit from separating the transfer of metadata from the transfer of the actual file. DOT suggests that each file transferred should get a unique ID: a sender uploads the file to DOT, then passes the ID (along with some hints) to the receiver, who retrieves the file through DOT.
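
To make the flow concrete, here is a minimal sketch of that put/ID/get cycle. The names (DotService, put, get) and the in-memory store are my own illustration, not the paper's actual API; the content-hash ID imitates how DOT-like systems name data.

    # Toy sketch of the DOT flow; all names here are hypothetical.
    import hashlib

    class DotService:
        """In-memory stand-in for the DOT transfer service."""
        def __init__(self):
            self.store = {}

        def put(self, data):
            # Derive the unique ID from the content itself; a hash also
            # lets the receiver verify the integrity of what it fetched.
            oid = hashlib.sha256(data).hexdigest()
            self.store[oid] = data
            return oid

        def get(self, oid, hints=None):
            # 'hints' (e.g., candidate sources) would steer the transfer
            # plugin; this toy version just reads local storage.
            return self.store[oid]

    dot = DotService()
    oid = dot.put(b"...large attachment bytes...")     # sender uploads
    # The sender passes 'oid' (plus hints) to the receiver out of band,
    # e.g., inside an SMTP message in place of the attachment itself.
    data = dot.get(oid, hints=["sender.example.com"])  # receiver fetches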

The suggested benefits of this scheme are:
1) abstraction - the underlying transfer method can be swapped out (say, from a single TCP connection to a torrent-style scheme) without changing the application.
2) caching - by caching files at local nodes, DOT can reduce the total bandwidth consumed by repeated transfers of the same file.
3) faster transfer - in theory, DOT can split a file into chunks and send them to the destination over multiple links in parallel, getting the file there faster (see the sketch after this list).
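
To illustrate benefit 3, here is a rough sketch of chunked, multi-link transfer. The chunk size, the thread pool, and the fetch stub are all my own invention; the point is only that once a file is named by an ID, its pieces can arrive over independent paths and be reassembled.

    # Hypothetical chunked transfer over two parallel "links".
    from concurrent.futures import ThreadPoolExecutor

    CHUNK = 64 * 1024  # 64 KiB; an arbitrary choice for this example

    def split(data):
        return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

    def fetch(chunk_id, source):
        # Placeholder: a real plugin would pull the chunk from 'source'
        # over TCP, a swarm, a nearby cache, etc.
        return chunks[chunk_id]

    payload = b"x" * (1 << 20)                # a 1 MiB file
    chunks = split(payload)
    sources = ["linkA", "linkB"]              # two hypothetical paths
    with ThreadPoolExecutor(max_workers=2) as pool:
        parts = list(pool.map(lambda i: fetch(i, sources[i % 2]),
                              range(len(chunks))))
    assert b"".join(parts) == payload         # reassembly preserves the file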

The authors claim that this file transfer scheme will be widely applicable. Actually, I think it is very narrowly applicable. Certainly, for large e-mail attachments, the scheme has SOME merit (see the discussion of expiration below). However, the authors provide almost no performance data, and they avoid discussing performance for small files. This leads me to believe that for small files (such as web pages, which constitute a good chunk of internet traffic), DOT carries too much overhead to be useful. In the world of the web, it is probably best suited to embedded videos, which are usually handled by content distribution networks anyway.

There is a serious flaw in DOT concerning the expiration and caching of data. Let's begin with caching. As we saw in the DNS caching paper, most requests that go out across the internet are to servers that are visited exactly once; the rest go to a small number of very popular servers. It is unreasonable to expect caching of large files to be effective in a setting where caching of DNS entries is not. Sure, DOT could cache files for very popular websites, but that would essentially duplicate work already done by CDNs.

The expiration of data in DOT is also a serious issue. According to the paper, data expires when the sender notifies DOT that the data need no longer be stored. This makes DOT vulnerable to denial-of-service attacks by clients that upload data and then never mark it as deletable. Furthermore, consider the e-mail sender we discussed above. Suppose he sends an e-mail with a large attachment to a large number of recipients. When is the correct time to mark the attachment as deletable? Presumably only AFTER all recipients have downloaded it. Of course, it is very difficult for the sender to know when all recipients have finished downloading the file; this would likely require some sort of acknowledgement of receipt. And if some of the e-mails bounce, or a single address forwards to multiple recipients, the counting becomes much harder still. In other words, the protocol must be modified before DOT can be deployed safely. Or the sender can simply never mark his attachment for deletion.
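
To see why this gets messy, here is a sketch of the receipt tracking a sender would need before safely marking an attachment deletable. Everything in it is hypothetical; the comments point at exactly the bounce and forwarding problems described above.

    # Hypothetical receipt tracking for sender-driven expiration.
    class ReceiptTracker:
        def __init__(self, oid, recipients):
            self.oid = oid
            self.pending = set(recipients)

        def ack(self, recipient):
            self.pending.discard(recipient)

        def safe_to_delete(self):
            # True only once every known recipient has acknowledged -- but
            # a bounced address never acks, and a forwarding address hides
            # extra downstream readers, so this test is unreliable.
            return not self.pending

    tracker = ReceiptTracker("oid123", ["alice@a.example", "bob@b.example"])
    tracker.ack("alice@a.example")
    print(tracker.safe_to_delete())  # False: bob may have bounced, pending forever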

I'm not actually sure what kind of client would regularly use DOT to transfer files. BitTorrent is fairly dominant for large file transfers, and most of the rest happen over HTTP, where files tend to be small and DOT's per-transfer overhead would be significant. I'm still not sure what problem the DOT system solves.

1 comment:

Randy H. Katz said...

A major motivation for new "file movers" is content distribution as an application. So there are at least some kinds of network data that are worthwhile to cache and/or whose residency is worth managing at points in the network close to users.