Lately there has been much discussion about the bandwidth troubles associated with serving RSS files to lots of readers, especially when the file doesn't change for long stretches. Several simple changes have been suggested that can alleviate the situation, but the field is still open for a final long-term solution to the problem. This post explores one radical long-term solution. As such it is a technical post, so you may want to skip it.
Freenet is a peer-to-peer filesharing system with some unusual goals. First and foremost is the protection of free speech, by making censorship impossible to carry out effectively without completely shutting off the Internet. Of the projects I've seen with this goal, Freenet is the best designed, most successful, and least vapor. It's a very smart design.
Technically speaking, it's a high-quality, well-thought-out file sharing system. Conceptually speaking, Freenet is one large write-once, read-many file system, with keys that map to distinct files, in contrast to a traditional request/receive system. You ask for a file by its key. Once a file is put on the system, the key ('file name') cannot be changed, since the (CHK) key itself is a hash of the file contents. Because it is not possible to directly "update" a file, various solutions have been worked out for periodic updates, based on systematic additions to the file system and various indirection techniques. Built on top of these conventions, a bulletin board system, an "almost instant messaging" system, and a systematic website system have been constructed.
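To make the write-once property concrete, here's a toy sketch of a content-hash key in Python. (Freenet's real CHKs have their own encryption and encoding layout; the SHA-1 hash and the "CHK@" prefix below are purely illustrative assumptions.)

```python
import hashlib

def content_hash_key(data: bytes) -> str:
    # The key is derived entirely from the file's bytes, so identical
    # content always yields the identical key, and any edit yields a
    # brand-new key. That's why a file can never be "updated" in place.
    return "CHK@" + hashlib.sha1(data).hexdigest()

v1 = content_hash_key(b"<rss>version 1</rss>")
v2 = content_hash_key(b"<rss>version 2</rss>")
print(v1 != v2)                                         # True: edits make a new key
print(v1 == content_hash_key(b"<rss>version 1</rss>"))  # True: keys are stable
```

This is exactly why updates require indirection or naming conventions layered on top: there's no key you can publish once and then point at new content.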
For distributing RSS files, the very definition of frequently updating content, this write-once property is Freenet's greatest weakness. But Freenet does have a great strength from the bandwidth point of view: as files are requested, some of the nodes between the source and the destination will also cache the file. As more people request a given file, more Freenet nodes carry it. More popular documents naturally propagate across the system, and more copies of a document naturally balance the load. This is the only in-use P2P file sharing system I am aware of that has successfully combined total decentralization with effective file replication, both of which are desirable qualities. (Even if you don't care for the total decentralization aspect from a control perspective, it still means good reliability.)
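Here's a toy model of that cache-on-the-path behavior, with plain dicts standing in for node caches. (Freenet's actual routing is far more sophisticated; this only illustrates the replication effect.)

```python
# Toy illustration: a request travels node by node toward the holder of
# the file, and every node on the path keeps a copy on the way back, so
# popular files end up on more nodes and the load spreads out.

def fetch(key, path):
    """path is the chain of node caches (dicts) from requester to source."""
    for i, node in enumerate(path):
        if key in node:
            data = node[key]
            for earlier in path[:i]:   # replicate along the return path
                earlier[key] = data
            return data
    return None                        # nobody on this path has it

a, b, c = {}, {}, {"doc": "rss bytes"}   # only node c holds the file at first
print(fetch("doc", [a, b, c]))           # "rss bytes"
print("doc" in a and "doc" in b)         # True: the path now caches it too
```

The next request for "doc" from near node a never has to travel to c at all, which is the whole bandwidth win.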
Nothing is ever free; this decentralization comes at the cost of speed. My experience browsing Freenet-based websites today was that they are slow, slow, slow: I'm on broadband and I felt like I was back in the 9600-baud days. In the RSS case, however, this is not a disadvantage, because the user isn't waiting on us to get the RSS. So this works out OK for us.
I think we could work out a mostly-satisfactory numbering scheme for RSS distribution on Freenet, probably including a hint on the main website (perhaps in conjunction with RSS auto-discovery?). It might be a little clumsy, but I think it could be made to work. (The biggest potential problem is that not-found results are cached, and we'd have to work out a scheme where we constantly ask for "the last update we received + 1". It may be a problem that those not-found results get cached, depending on how long they last.)
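Here's roughly what that "last update we received + 1" polling might look like, with a dict standing in for the network and a made-up edition-numbered key format. (Real Freenet keys would follow its own naming conventions; everything here is an illustrative assumption.)

```python
# Hypothetical edition-numbering scheme for a write-once store.

def edition_key(feed, n):
    return f"{feed}/rss-{n}.xml"   # assumed key format, one key per edition

def poll(store, feed, last_seen):
    """Ask for edition last_seen + 1, + 2, ... until one isn't found.
    A miss is not an error here: it just hasn't been published yet."""
    found = []
    n = last_seen + 1
    while True:
        data = store.get(edition_key(feed, n))
        if data is None:
            break
        found.append((n, data))
        n += 1
    return found

store = {"myfeed/rss-1.xml": "<rss>1</rss>", "myfeed/rss-2.xml": "<rss>2</rss>"}
print(poll(store, "myfeed", 0))   # both unseen editions, in order
```

The cached-not-found worry maps directly onto this loop: if the network remembers "rss-3.xml doesn't exist" for too long, we'd keep getting a stale miss even after the author publishes edition 3.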
The pros:

- Replication infrastructure already exists.
- The nice free speech stuff and censorship resistance, for free.
- Despite the apparent weakness that file contents can never be changed, so we need to publish a sequence of new files over time, this actually avoids the consistency problem that I think would dog us in other P2P systems. It would be too easy to get old versions in other systems, and some sort of file-name incrementing might end up being the only solution there anyhow.

The cons:

- Requires download of a client. (Any P2P system would probably need this anyhow, but it should be mentioned. At least there's no spyware!)
- Some people may have problems running Freenet on their computer; they might be able to share a node with someone else. (For what it's worth, this is inherent in any replicating P2P system.)
- Publishing is a little complicated, but we could automate a lot of that away if we cared.
Radio Userland Implementation
I looked into making this work with Radio Userland. The problem is that several assumptions baked into how RU functions (for example, that the addresses in aggregatorData.services are the URLs of the RSS files) need to be violated: the request URL changes on every request. The definition of "error" changes too (not finding the file simply means it hasn't been created yet, not that it's gone). Periodically, we may want to check in over more conventional channels to make sure we're in sync with the author, and on a new subscription we may want to check with the author to see how far along in the sequence they are.
I would need to write my own subscriber, which would drop a tag in the service table to remind myself that a given subscription is a Freenet subscription. I don't need anything from Userland to do that; I could do it in a Tool. I think I could get by with a callback in xml.rss.readService that allows me to effectively override the entire process of retrieving the RSS file, letting me take care of the error count and such as I see fit. (Basically, completely rewrite xml.rss.readService.) I'd need an upstream callback for when I update my site, which appears to exist already. I think that's all I'd need to make this work, even despite the previous paragraph. Since this applies equally to any other new form of RSS consumption and distribution, and it would be neat to let the RU developer community explore this problem rather than just talk about it (*grin*), I think it's worth adding the xml.rss.readService callback.
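To sketch what that override might look like: Radio Userland is scripted in UserTalk, so this Python only illustrates the control flow, and the service-table fields ("freenet", "lastEdition") and fetcher hooks are my own assumptions, not real RU APIs.

```python
# Sketch of a readService-style dispatch: Freenet-tagged subscriptions
# get the edition-polling fetcher, everything else keeps normal behavior.

def read_service(service, default_fetch, freenet_fetch):
    if service.get("freenet"):          # tag dropped in by my subscriber
        data = freenet_fetch(service)   # tries the next edition's key
        if data is None:
            # Not found just means "not published yet" on Freenet,
            # so we must NOT bump the service's error count here.
            return None
        service["lastEdition"] += 1     # remember how far we've read
        return data
    return default_fetch(service)       # ordinary HTTP subscription

demo = {"freenet": True, "lastEdition": 0}
print(read_service(demo, None, lambda s: "<rss/>"))  # "<rss/>"
print(demo["lastEdition"])                           # 1
```

The key design point is that the callback owns error handling entirely, which is exactly why a simple "here's the body or an error" hook wouldn't be enough.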