The latest revelation that the National Security Agency has gathered the phone records of millions of ordinary Americans has generated outrage and controversy across the political spectrum. The NSA appears to have gathered these records without court orders, in violation of existing statutes. It also appears that the NSA is attempting to use this vast database to connect the dots between known terrorists, using software to look for links and patterns in the records. Unfortunately, because the records contain the phone numbers of millions of ordinary, innocent Americans, the database opens the door to abuse and to guilt by association.
The NSA is likely using link analysis techniques to connect known targets who are several degrees of separation apart. Link analysis is a simple yet powerful tool that can be used very effectively on structured relational data. It is essentially the high-tech equivalent of the "Kevin Bacon Game," in which players link an actor to Kevin Bacon through the shortest possible chain of actors who have appeared in films together.
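To make the "Kevin Bacon Game" analogy concrete, here is a minimal Python sketch of link analysis as a breadth-first search over a call graph. The graph is hypothetical, built to match the example in the image discussed next, and it assumes that two people are linked in both directions whenever one appears in the other's phone records. The function name degrees_of_separation is illustrative only, not a real NSA tool.

```python
from collections import deque

# Hypothetical call graph mirroring the example: two people are linked if
# either number appears in the other's phone records (treated as undirected).
GRAPH = {
    "Bad Guy #1": {"Person A", "Person B"},
    "Person A": {"Bad Guy #1", "Person D"},
    "Person B": {"Bad Guy #1"},
    "Person D": {"Person A", "Person G"},
    "Person G": {"Person D", "Bad Guy #2"},
    "Bad Guy #2": {"Person G"},
}


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for contact in graph.get(person, ()):
            if contact == goal:
                return hops + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, hops + 1))
    return None


print(degrees_of_separation(GRAPH, "Bad Guy #1", "Bad Guy #2"))  # 4
```

Because breadth-first search explores everyone one hop away before anyone two hops away, the first time it reaches the goal it has found the shortest chain, just like a minimal "Bacon number."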
The image above shows an example of how the NSA would connect Bad Guy #1 with Bad Guy #2. To do so, the NSA would need the phone records of Bad Guy #1, Person A, Person D, Person G and Bad Guy #2. By traversing the phone records from both directions, the NSA could connect Bad Guy #1 and Bad Guy #2 by finding that they are linked through the chain of intermediate Persons A, D and G.
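Traversing the records "from both directions" amounts to a bidirectional breadth-first search: expand one layer at a time from each end until the two searches meet, then stitch the two halves together. Below is a minimal sketch, assuming the same hypothetical undirected graph as the image's example; connect, _expand_layer and _join are illustrative names.

```python
# Hypothetical, undirected call graph matching the image's example.
GRAPH = {
    "Bad Guy #1": {"Person A", "Person B"},
    "Person A": {"Bad Guy #1", "Person D"},
    "Person B": {"Bad Guy #1"},
    "Person D": {"Person A", "Person G"},
    "Person G": {"Person D", "Bad Guy #2"},
    "Bad Guy #2": {"Person G"},
}


def _expand_layer(graph, frontier, parents, other_parents):
    """Advance one full BFS layer; stop early if the other search is reached."""
    next_frontier = []
    for person in frontier:
        for contact in graph.get(person, ()):
            if contact in parents:
                continue  # already visited from this side
            parents[contact] = person
            if contact in other_parents:
                return next_frontier, contact  # the two searches have met
            next_frontier.append(contact)
    return next_frontier, None


def _join(meet, parents_src, parents_tgt):
    """Stitch the source-to-meet and meet-to-target halves into one path."""
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = parents_src[node]
    path.reverse()
    node = parents_tgt[meet]
    while node is not None:
        path.append(node)
        node = parents_tgt[node]
    return path


def connect(graph, source, target):
    """Bidirectional BFS; returns a shortest connecting path or None."""
    if source == target:
        return [source]
    parents_src, parents_tgt = {source: None}, {target: None}
    frontier_src, frontier_tgt = [source], [target]
    while frontier_src and frontier_tgt:
        # Grow whichever side currently has the smaller frontier.
        if len(frontier_src) <= len(frontier_tgt):
            frontier_src, meet = _expand_layer(graph, frontier_src, parents_src, parents_tgt)
        else:
            frontier_tgt, meet = _expand_layer(graph, frontier_tgt, parents_tgt, parents_src)
        if meet is not None:
            return _join(meet, parents_src, parents_tgt)
    return None


path = connect(GRAPH, "Bad Guy #1", "Bad Guy #2")
print(path[1:-1])  # ['Person A', 'Person D', 'Person G']
```

Growing the smaller frontier first keeps the number of phone records touched down, which is the practical appeal of searching from both ends rather than from one.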
For the NSA to do link analysis with court orders, it would first have to get a warrant for the phone records of Bad Guy #1. It would then have to get a warrant for the phone records of each person on Bad Guy #1's phone record (i.e., Persons A and B), then get warrants for the people on the phone records of that next set of people, and so on. At some point, the NSA would have a difficult time arguing that one of these intervening people was legitimately connected to an ongoing investigation. Even if it succeeded in making the case for each warrant, the logistics of getting a warrant at every step of the process would make this kind of link analysis cumbersome and nearly impossible to perform in real time. I suspect that is why the NSA and the President decided to go around the law. Faced with a law that stood in its way, the Government chose to ignore the law instead of asking Congress to update it.
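A rough back-of-the-envelope sketch shows why warrant-by-warrant expansion gets out of hand. Suppose each person's records list about k distinct numbers and contacts never overlap (both assumptions, and k = 50 below is a made-up figure). Then a one-sided search that pulls records d levels deep needs up to 1 + k + k^2 + ... + k^(d-1) warrants. The hypothetical function warrants_needed only computes that upper bound.

```python
def warrants_needed(contacts_per_person: int, hops: int) -> int:
    """Upper bound on warrants for a one-sided search `hops` levels deep.

    Assumes every person's records list `contacts_per_person` new numbers
    (no overlap). Records are pulled for levels 0 .. hops-1; the people
    found at level `hops` are known only as numbers on someone else's record.
    """
    return sum(contacts_per_person ** level for level in range(hops))


# Hypothetical fan-out of 50 distinct numbers per person.
for hops in range(1, 5):
    print(hops, warrants_needed(50, hops))
# 1 1
# 2 51
# 3 2551
# 4 127551
```

The last row corresponds to the image's example, where Bad Guy #2 sits four hops from Bad Guy #1: under these assumptions, a search from one side alone could require well over a hundred thousand warrants.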
The problem with this approach, from the NSA's point of view, is that getting the phone records of the intervening persons between two known bad guys requires court orders. There may be a simple way to achieve the NSA's goals without the court orders, and without the violations of privacy that result when the court orders are not sought. I propose that instead of seeking the actual phone numbers from the phone companies, the NSA should seek secure hashed equivalents of the phone numbers. That is, all phone records handed over to the NSA should contain secure hashed IDs instead of the actual phone numbers of American citizens. The phone companies would keep the actual phone records and the mappings between the phone numbers and their hashed equivalents. This would ensure that the NSA does not hold a database of the phone numbers of ordinary Americans. I also believe that no law would be violated by the phone companies turning this data over to the NSA.
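Here is a minimal sketch of the phone company's side of the proposal. It assumes call records arrive as (caller, callee) pairs of numbers already normalized to one canonical format such as E.164, and it uses SHA-256 as the hash. The pseudonymize function and the fictional 555 numbers are hypothetical. The point is that the NSA receives only digests while the digest-to-number mapping stays with the carrier.

```python
import hashlib


def digest(number: str) -> str:
    """SHA-256 hex digest of an already-normalized phone number string."""
    return hashlib.sha256(number.encode("utf-8")).hexdigest()


def pseudonymize(call_records):
    """Split (caller, callee) records into NSA-bound digests and a private map."""
    hashed_records = []
    mapping = {}  # digest -> original number; stays with the phone company
    for caller, callee in call_records:
        hashed_caller, hashed_callee = digest(caller), digest(callee)
        mapping[hashed_caller] = caller
        mapping[hashed_callee] = callee
        hashed_records.append((hashed_caller, hashed_callee))
    return hashed_records, mapping


# Hypothetical records using fictional 555 numbers in E.164 format.
RAW_RECORDS = [
    ("+12025550101", "+12025550111"),
    ("+12025550101", "+12025550112"),
]
nsa_records, carrier_mapping = pseudonymize(RAW_RECORDS)
print(nsa_records[0][0] == nsa_records[1][0])  # True: same caller, same digest
```

Normalization matters here: "+1 202 555 0101" and "+12025550101" hash to different digests, so the phone companies and the NSA would have to agree on a single format.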
Briefly, secure hashing is a technique that is commonly used to store passwords and to digitally sign electronic messages. The power of secure hashing lies in the fact that when a number or string is hashed to produce a message digest, there is no practical way to compute the original number or string from the digest. At the same time, hashing the same number always produces the same message digest. This property allows one to store data, a password or a phone record for example, in a database without exposing the original password or phone record. Given the original phone number or password, one can hash it and compare the result with the data in the database to find the matching digest. SHA-1, the most commonly used secure hashing algorithm, was designed by none other than the National Security Agency.
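The two properties this paragraph relies on can be shown in a few lines of Python. Hashing is deterministic, and lookup works by hashing the known value and comparing digests. This sketch uses SHA-256 from the standard hashlib module instead of SHA-1, since practical SHA-1 collisions have since been demonstrated and SHA-1 is now deprecated for most security uses. The sample numbers are fictional.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 message digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A database that stores only digests, never the numbers themselves.
stored_digests = {sha256_hex("+12025550101"), sha256_hex("+12025550102")}

# Hashing is deterministic, so a known number can be checked by hashing it.
print(sha256_hex("+12025550101") in stored_digests)  # True
print(sha256_hex("+12025550199") in stored_digests)  # False
print(len(sha256_hex("+12025550101")))  # 64 hex characters (256 bits)
```

One caveat on the one-way property: a ten-digit phone number has only 10^10 possible values, so anyone holding plain, unkeyed digests could hash every possible number and rebuild the mapping. A real deployment would likely need a keyed hash such as HMAC, with the key held by the phone companies. That would in turn mean the carriers hashing target numbers on the NSA's behalf.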
This new database maintained at the NSA, using secure hashed IDs in lieu of phone numbers, would be just as effective for data mining and link analysis. If the NSA knows the phone numbers of one or more targets, it can simply convert each phone number to its secure hashed equivalent (or "message digest"). These message digests can then be used to perform link analysis on the database. Using the example in the image, the NSA would hash the phone number of Bad Guy #1 and look up the corresponding records in the database. It would find the message digests representing Persons A and B. When it looks up the records for the message digest of Person A, it would find the message digest of Person D. Coming from the other side, the NSA would hash the actual number of Bad Guy #2 and find the message digest of Person G. In the records of Person G, the NSA would again find the message digest of Person D. Voilà! The NSA will have connected Bad Guy #1 to Bad Guy #2 without knowing the phone numbers of Persons A, D and G. Armed with the message digests of Persons A, D and G, the NSA can now approach the court for a warrant based on probable cause. The phone companies can then provide the NSA with the actual numbers and identities of Persons A, D and G by mapping the message digests back to the original phone numbers kept in their own databases. The phone records of everyone not involved in the chain between Bad Guy #1 and Bad Guy #2 will remain unknown to the NSA.
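The sketch below walks through the image's example end to end. The carrier hashes the records, the NSA links the two targets using digests alone, and only the digests on the connecting path are resolved back to numbers. It is a minimal sketch that assumes SHA-256, fictional pre-normalized numbers, and undirected links. For brevity it uses a plain one-sided breadth-first search with parent pointers rather than the two-sided search the paragraph describes; on this graph both find the same shortest path. All names are hypothetical.

```python
import hashlib
from collections import deque


def h(number: str) -> str:
    """SHA-256 digest of a normalized phone number (SHA-256 is an assumption)."""
    return hashlib.sha256(number.encode("utf-8")).hexdigest()


# Hypothetical raw records held by the phone company (fictional 555 numbers).
BG1, A, B, D, G, BG2 = ("+12025550101", "+12025550111", "+12025550112",
                        "+12025550114", "+12025550117", "+12025550199")
RAW_RECORDS = [(BG1, A), (BG1, B), (A, D), (G, D), (BG2, G)]

# Phone company: hand over digests only, keep the digest -> number mapping.
carrier_mapping = {h(n): n for pair in RAW_RECORDS for n in pair}
nsa_records = [(h(x), h(y)) for x, y in RAW_RECORDS]

# NSA: build an undirected graph whose nodes are digests, not numbers.
graph = {}
for x, y in nsa_records:
    graph.setdefault(x, set()).add(y)
    graph.setdefault(y, set()).add(x)


def shortest_path(graph, start, goal):
    """Plain BFS with parent pointers; returns the list of nodes or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# The NSA knows only the two targets' numbers and hashes them itself.
path = shortest_path(graph, h(BG1), h(BG2))
intermediate_digests = path[1:-1]

# Only after a court order would the carrier resolve these three digests.
resolved = [carrier_mapping[d] for d in intermediate_digests]
print(resolved == [A, D, G])  # True
```

Person B's digest appears in the NSA's graph but is never resolved, which mirrors the paragraph's point that people outside the connecting chain stay anonymous.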
This simple use of existing cryptographic techniques may eliminate the need for the massive intrusion into the privacy of ordinary Americans that is currently occurring. It allows the NSA to trawl and mine the data to its heart's content in an effort to keep us safe, without violating our hard-earned civil liberties. Who knows, with any luck it will come to light that the NSA is already doing this, and all this fuss will have been about nothing. However, the fact that Qwest balked at handing over phone records to the NSA suggests to me that the NSA is not using this simple but effective technique.