I stated earlier that prevention of surveillance is less fundamental than prevention of communication of information, and that preventing surveillance is more a practical concern than a theoretical one. By this I mean that a focus purely on surveillance, without addressing to whom the collected information is communicated, is doomed to failure, because of the number of fully legitimate sources of information.

I do not mean to imply that it is a waste of time to pursue limitations on surveillance. Obviously, information never collected can never be communicated in such a way as to violate privacy. In the long run, however, this will not be sufficient, because the amount of information that can be extracted from a communication always exceeds its literal content.

Suppose you are given all the receipts from my grocery shopping trips. Along with the literal information contained directly on each receipt, which is simply a list of items, you can derive much interesting information. With a good baseline understanding of the shopping patterns of people in my demographic group, you could probably derive the fact that I am trying to lose weight on a high-protein, low-carb diet, but that someone else in my household is not on that diet. You could derive that I like certain types of food, and perhaps that I dislike others.

Beyond that, if you had a large enough database, you could derive other things. Someone who buys a lot of Gatorade or other sports drinks is more likely to be male and between 14 and 30 years old. Buying mineral water would be associated with certain personality traits, and buying a lot of herbal remedies with still others. The amount of information obtainable just from a large collection of your grocery receipts would probably surprise you.
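To make the receipt example concrete, here is a minimal sketch of that kind of inference. Everything in it is an assumption made up for illustration: the item-to-category table, the 0.6 threshold, and the function names `category_shares` and `guess_traits` are hypothetical, and a real analysis would rest on a baseline of actual shopping patterns rather than a hand-written rule.

```python
from collections import Counter

# Hypothetical item-to-category table, invented for this illustration.
CATEGORY = {
    "chicken breast": "protein",
    "eggs": "protein",
    "tuna": "protein",
    "bread": "carb",
    "pasta": "carb",
    "cookies": "carb",
}


def category_shares(receipts):
    """Return each category's share of all items that appear in CATEGORY."""
    counts = Counter(
        CATEGORY[item]
        for receipt in receipts
        for item in receipt
        if item in CATEGORY
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def guess_traits(receipts, protein_threshold=0.6):
    """Apply a crude, assumed rule of thumb to a list of receipts."""
    shares = category_shares(receipts)
    traits = []
    if shares.get("protein", 0.0) >= protein_threshold:
        traits.append("possibly on a high-protein, low-carb diet")
        # A few carb-heavy items among mostly protein hints at a second shopper.
        if shares.get("carb", 0.0) > 0.0:
            traits.append("possibly shares a household with someone not on that diet")
    return traits


receipts = [
    ["chicken breast", "eggs", "tuna"],
    ["eggs", "chicken breast", "cookies"],
    ["tuna", "eggs", "milk"],
]
print(guess_traits(receipts))
```

None of the guesses appears on any receipt; they come entirely from comparing the list of items against an assumed baseline.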

Start combining sources of information and the possibilities increase even more. While it is not possible to build a 100% accurate model of a person, a lot of privacy-sensitive information exists only in patterns that can be discovered by combing through a large amount of data. This is exactly the sort of thing the government was proposing with its Total Information Awareness program.
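A small sketch may help show why combination matters. It assumes, purely for illustration, that two sources happen to share a person identifier; the field names, the sample records, and the single inference rule are all hypothetical.

```python
from collections import defaultdict


def combine_sources(*sources):
    """Merge per-person facts from several sources keyed by a shared ID."""
    profiles = defaultdict(dict)
    for source in sources:
        for person_id, facts in source.items():
            profiles[person_id].update(facts)
    return dict(profiles)


def inferences(profile):
    """Hypothetical rule that needs facts from more than one source to fire."""
    found = []
    if profile.get("low_carb_purchases") and profile.get("calls_to_weight_clinic"):
        found.append("likely under medical supervision for weight loss")
    return found


# Invented sample data: neither source alone supports the inference.
grocery = {"person-1": {"low_carb_purchases": True}}
phone = {"person-1": {"calls_to_weight_clinic": True}}

print(inferences(grocery["person-1"]))  # nothing from groceries alone
print(inferences(phone["person-1"]))    # nothing from phone records alone
print(inferences(combine_sources(grocery, phone)["person-1"]))
```

Each source on its own yields nothing under this rule; the conclusion exists only in the combination.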

This suggests another practical avenue for controlling privacy violations, which is enacting restrictions on who can combine what data. Exactly what restrictions would be in place is a matter for specific law, but I would suggest that licensing the people who can access this sort of information would be a fine idea. The privacy value of even such mundane data as how much Gatorade I buy increases as it is combined with other data, and that should affect how we perceive the ethics of such actions. This is another case where a sufficient change in quantity becomes a change in quality: adding two pieces of data together is probably harmless, adding millions is very intrusive, and while there is no obvious, firm line we can draw where the transition occurs, a transition occurs nonetheless.

Again, lest you think this is theoretical, watch a detective drama on television sometime, such as CSI. The television detectives of course live in an idealized world where every problem has a neat solution. Still, the general principle of convicting a criminal based on a scrap of thread, the precise impact angle of a bullet, a thirty-second cell phone call record (not even the contents of the call, just the fact that one was placed), and the microscopic striations on a bullet shows how much information can be extracted from even the simplest scraps of data when they are intelligently assembled. In fact, there is nothing particularly hard about this. We all do similar things all the time; the only challenge is automating such logic.


