RSS Bandwidth Problem already solved?

RSS bandwidth is coming up again, as I said it would almost exactly a year ago. To review what I laid out in that post, there are (or were) two basic problems:

  1. The entire RSS file is transferred on every request.
  2. There is only one source for an RSS file. No matter how svelte you make each request for it, in a world where millions of people may ask for a file every hour you will eventually take down the server and eat through tons of bandwidth.

The first problem is as solved as it is going to get. An aggregator that does not honor ETags, 304s, and all other such things to limit requests is a menace and should not be publicly released. (Private use may be OK globally but still impolite locally.)
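For aggregator authors, here is a minimal sketch of what honoring those validators looks like, using HTTP's conditional GET (If-None-Match and If-Modified-Since, answered with 304 Not Modified when nothing changed). The function names are mine, purely for illustration, and the actual network call is left out so the logic stands alone:

```python
from typing import Dict, Optional, Tuple


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build the validators to send with a repeat request for a feed.

    Both values are whatever the server sent on the previous fetch
    (its ETag and Last-Modified headers); either may be missing.
    """
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def pick_feed_body(status: int, body: bytes,
                   cached_body: bytes) -> Tuple[bytes, bool]:
    """Return (feed bytes to use, whether the feed changed).

    A 304 carries no body, so the aggregator keeps its cached copy.
    """
    if status == 304:
        return cached_body, False
    return body, True
```

The aggregator stores the ETag and Last-Modified values from each successful fetch and passes them back in on the next poll; an unchanged feed then costs a few hundred bytes of headers instead of the whole file.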

The second problem, that there is only one source for an RSS file, has clearly called for a P2P solution. As of the last time I examined the issue, though, the closest thing available was Freenet, which had serious technical and moral issues associated with it.

In the meantime, a new semi-P2P system has sprung up that may be exactly what the doctor ordered: Coral. Coral, with the addition of a ".nyud.net:8090" on the end of a hostname, automatically mirrors the content and returns it from a local source. It appears to support ETags, and it claims to support the standard proxy headers that indicate time to live. According to the overview page, the content is updated at most every 5 minutes in the absence of caching instructions.
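The URL rewrite is mechanical enough to sketch. This assumes nothing beyond the ".nyud.net:8090" suffix described above; the function name and the example.com feed are placeholders, and since I don't know how Coral treats origin servers on non-default ports, the sketch simply refuses them:

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net:8090"  # suffix as given in the post


def coralize(url: str, suffix: str = CORAL_SUFFIX) -> str:
    """Rewrite a plain HTTP URL so it is fetched through Coral."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no hostname: %r" % url)
    if parts.port not in (None, 80):
        # Assumption: only default-port origins are handled here.
        raise ValueError("non-default port not handled: %r" % url)
    netloc = parts.hostname + suffix
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(coralize("http://example.com/rss.xml"))
```

Only the host part changes; the path and query string pass through untouched, which is why existing aggregators need no modification.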

While that last little bit isn't quite optimal, a true P2P system is always going to have a delay; it is one of the prices you pay for the way a P2P system works.

Try it out: Here is my RSS file via Coral. Looks fine. Here is a copy of the headers I got, for your convenience:

HTTP/1.1 200 OK
date: Sat, 11 Sep 2004 19:08:08 GMT
server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2634a mod_ssl/2.8.18 OpenSSL/0.9.7a PHP-CGI/0.1b
last-modified: Fri, 10 Sep 2004 04:55:08 GMT
etag: "2502a8-a4dc-414133ac"
accept-ranges: none
content-length: 42204
connection: close
content-type: application/xml
via: HTTP/1.1 130.192.201.30:8090 (CoralWebPrx/0.1 (See http://www.scs.cs.nyu.edu/coral/))
cache-control: max-age=3600
expires: Sat, 11 Sep 2004 20:07:59 GMT

I don't know if the system is smart enough to honor ETags and such when talking to the server containing the content, but at no more than one hit every five minutes, I'm not convinced it matters.
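For anyone who wants to check what time to live a response is advertising, here is a rough sketch that reads it out of headers like the ones above. It prefers cache-control's max-age and falls back to expires minus date; the function name is made up, and real HTTP caching has more cases (no-cache, s-maxage, age) than this handles:

```python
import re
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_seconds(headers: Dict[str, str]) -> Optional[int]:
    """Return the advertised lifetime in seconds, or None if unstated."""
    h = {name.lower(): value for name, value in headers.items()}
    match = re.search(r"max-age=(\d+)", h.get("cache-control", ""))
    if match:
        return int(match.group(1))
    if "expires" in h and "date" in h:
        delta = (parsedate_to_datetime(h["expires"])
                 - parsedate_to_datetime(h["date"]))
        return max(0, int(delta.total_seconds()))
    return None


# Values copied from the header dump above.
coral_headers = {
    "date": "Sat, 11 Sep 2004 19:08:08 GMT",
    "cache-control": "max-age=3600",
    "expires": "Sat, 11 Sep 2004 20:07:59 GMT",
}
print(freshness_seconds(coral_headers))
```

With the headers I got, max-age gives 3600 seconds; dropping cache-control and using the date and expires lines alone gives 3591.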

Advantages: Requires no changes in aggregators and no changes in weblog software. Weblog owners only have to advertise their RSS file with the Coral URL, and aggregators can choose to use the Coral URL as the primary URL unless it doesn't work. I don't think you can beat this convenience for both weblog owner and aggregator user; how many systems work today with existing aggregator software just by the user changing the URL?

Disadvantages:

  1. No general way to force current subscribers onto the system. (A bit of clever work and an RSS redirect might have helped, but I believe the RSS redirect never became a standard?)
  2. Coral is in beta and the URL may change, or possibly the RSS load may freak them out and shut the system down since there is AFAIK no commitment to the current public system. If I'm reading these stats right, even one large RSS file may increase the current load on the system by a large factor.
  3. There are some issues with non-compliant DNS servers. (One guess as to which company makes them...) The only way to find out the true impact of this is simply to try it and see how many people it blows up for. There is a workaround in that link: Append "http.l2.l1.l0.nyucd.net:8090" instead of just ".nyud.net:8090", but I don't know how well that works.

The first disadvantage is mostly solvable technically. (You serve everyone but the caching network a not-yet-standardized redirect to the caching network, and give only the cache network the real file; that moves everyone over in short order.) The second may mean we bloggers need to set up our own network, but the Coral software is nearly perfect. The third is something we'd just have to try and see what happens, I think.
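As a sketch of that first fix, here is the server-side decision in isolation. Two loud assumptions: I am guessing that Coral identifies itself to the origin server with the same "CoralWebPrx" token that shows up in the via header above (I have not verified what it actually sends), and a plain HTTP 301 stands in for the redirect, since the RSS-level redirect was never standardized. The function name and example.com URL are placeholders:

```python
from typing import Dict, Tuple

# Assumption: token copied from the via header in the dump above.
CORAL_MARKER = "coralwebprx"


def route_feed_request(request_headers: Dict[str, str],
                       coral_url: str) -> Tuple[int, Dict[str, str]]:
    """Decide how the origin server answers a request for the feed.

    Returns (status, extra response headers). A 200 means "serve the
    real file"; anything else sends the client to the Coral URL.
    """
    lowered = {k.lower(): v.lower() for k, v in request_headers.items()}
    from_cache = any(
        CORAL_MARKER in lowered.get(name, "")
        for name in ("via", "user-agent")
    )
    if from_cache:
        return 200, {}
    # Stand-in for the redirect mechanism the post leaves unspecified.
    return 301, {"Location": coral_url}


print(route_feed_request(
    {"User-Agent": "SomeAggregator/1.0"},
    "http://example.com.nyud.net:8090/rss.xml",
))
```

Whether existing aggregators would update their stored subscription URL on a 301 is exactly the open question; the sketch only shows where the decision would sit.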

This may not quite be The Solution To RSS Bandwidth, but it is the closest to it that we have today, and it would probably be better for interested parties to start here than start fresh. (It is much better than what I've believed will be the necessary solution based on the P2P work I've seen so far, which was a custom RSS P2P network. It is easy to write down the words "custom RSS P2P network", but actually creating it would be a major job.)

Just for fun, if you choose to link to this post, here are the URLs of interest: Direct link, via Coral with the .nyud.net:8090, and via Coral with http.l2.l1.l0.nyucd.net:8090 (if the DNS fails).