The PII Cocktail – Combining User Databases
Posted: May 24, 2012
Personally Identifiable Information – otherwise known as PII. For online businesses, it is perhaps the “holy grail” of information. Some regulators and plaintiffs’ lawyers, however, think it should be treated as “forbidden fruit.” PII is variously defined, depending on the context. But it can be thought of generally as any information that is detailed enough to identify a specific person. Many online businesses depend on PII to operate. It’s especially valuable when it comes to advertising. By identifying specific people, advertisers can sometimes target ads to individuals based on characteristics unique to each person, such as spending habits. This gives the advertisers the ultimate bang for their buck.
But various privacy laws have been enacted to protect consumers from being identified without their knowledge and affirmative consent. In the healthcare arena, the Health Insurance Portability and Accountability Act precludes covered entities from disclosing, without consent, health-related information that a third party could use to identify specific individuals. Collection of information regarding children under 13 is regulated by the Children’s Online Privacy Protection Act. Other regulatory schemes are designed to protect consumers in specific sectors (e.g. credit reports, financial information, employment background checks) from the collection and dissemination of PII. In other parts of the world (like Europe), collection of PII is generally restricted to very specific purposes, and its transmission to certain other countries (like the United States) is basically prohibited, except under certain rules.
Some would argue that PII is THE asset of the modern information world. Facebook’s recently announced acquisitions of Instagram (pre-IPO) and Karma (post-IPO) signal the social media giant’s desire (need?) to increase the information contained in its user databases to meet advertisers’ expectations. And Facebook isn’t the only one. More and more online businesses are gathering, storing, and analyzing vast amounts of data that their users willingly submit. For many companies, the ability to use and manipulate this data for targeted ads is their means of attracting advertisers, who are often their primary source of income. While these companies require user-identifying information from anyone signing up for their services, they often employ practices to scrub or screen their data when sharing it with marketing firms or other third parties. Nonetheless, it is the acquisition of user data itself which may become the larger issue in the end.
That’s because sanitizing data is getting more complicated. Before the 1990s, the scrubbing of databases containing PII was “relatively” easy. Data was typically 1) limited because of size and the cost of memory, 2) collected by third parties, and 3) rarely shared across platforms. Back then, even if user databases were combined, the limited capabilities of computing resources made mining very large databases to retrieve PII difficult, if not impossible. Because of these limitations, the data were effectively “anonymized”; in other words, it was not feasible to mine the data to determine information about specific individuals, or to “reidentify” them. But the world of 1990 is significantly different from the world of today. Computing resources have increased exponentially. Memory is cheap and abundant. Consumers increasingly buy goods online. Friends and family are connecting socially online. All of this activity generates vast amounts of valuable data. In the past, removing PII and anonymizing data meant deleting a couple of data fields. Anonymizing data in today’s world means using complex algorithms with the aim of scrubbing the databases of PII, while leaving enough useful data to broker or otherwise use. In other words, cut the fat, not the meat. Within a single entity, these algorithms have worked fairly well to remove PII. Recent incidents involving the dissemination of PII have typically involved security breaches or inadvertent data dumps.
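To make the contrast between the two eras of scrubbing concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the record layout, the field names, and the choice of generalization (truncating a ZIP code and coarsening a birth year to a decade) are hypothetical, and real anonymization algorithms are considerably more involved than this.

```python
# Hypothetical user record; all field names and values are invented for illustration.
record = {
    "name": "Pat Example",
    "email": "pat@example.com",
    "zip": "10027",
    "birth_year": 1984,
    "purchases": ["books", "shoes"],
}

# Fields assumed to identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def drop_direct_identifiers(rec):
    """The older approach: delete a couple of data fields."""
    return {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}


def generalize(rec):
    """A toy version of the newer approach: also coarsen the fields that
    could identify someone indirectly, while keeping the useful data."""
    out = drop_direct_identifiers(rec)
    out["zip"] = out["zip"][:3] + "**"          # keep only the ZIP prefix
    decade = out["birth_year"] // 10 * 10       # 1984 becomes 1980
    out["birth_year"] = f"{decade}s"
    return out


print(drop_direct_identifiers(record))
print(generalize(record))
```

The purchase history (the “meat”) survives both functions; only the second one also blurs the ZIP code and birth year, which on their own can narrow a record down to very few people.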
But other issues can arise when companies merge or are acquired. The same algorithms used to protect against the inadvertent disclosure of PII may be less effective when the databases of two companies are combined. In fact, the combination of the scrubbed (i.e. clean) data from two different databases may create “new data” in and of itself. Increasingly, the FTC is being asked by privacy groups to review privacy issues relating to user database consolidations. We saw a little of this in 2007 when Google announced the acquisition of DoubleClick, an online advertising company. Numerous consumer privacy advocates petitioned the FTC to block the acquisition, alleging that the combination of the user databases would compromise consumer privacy. The FTC investigation into the Google/DoubleClick merger was closed the following December with the FTC doing relatively little, noting that it does not have the legal authority to block mergers based on privacy-related issues. The FTC instead issued proposed “Self-Regulatory” privacy principles that it suggested companies follow. These non-binding principles are used by some companies to establish their own privacy rules, and the FTC has been active in pressuring companies to comply with their own privacy rules. For example, the FTC and Myspace recently entered into a settlement in which Myspace will be required to change its operating procedures based on its website’s privacy rules, similar to the settlement reached with Facebook.
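How two clean databases can produce “new data” when combined can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of any company’s actual data: it assumes both companies keep a shared pseudonymous key (called `token` here), and every field name and value is invented. The measure used is the size of the smallest group of records sharing the same attribute values; a group of one means a record is unique and therefore easier to tie back to a person.

```python
from collections import Counter

# Company A's scrubbed data: coarse location and age only (hypothetical).
records_a = [
    {"token": "t1", "zip3": "100", "age_band": "30-39"},
    {"token": "t2", "zip3": "100", "age_band": "30-39"},
    {"token": "t3", "zip3": "941", "age_band": "20-29"},
    {"token": "t4", "zip3": "941", "age_band": "20-29"},
]

# Company B's scrubbed data: gender and occupation only (hypothetical).
records_b = [
    {"token": "t1", "gender": "F", "occupation": "nurse"},
    {"token": "t2", "gender": "M", "occupation": "teacher"},
    {"token": "t3", "gender": "F", "occupation": "nurse"},
    {"token": "t4", "gender": "M", "occupation": "teacher"},
]


def smallest_group(rows, fields):
    """Size of the smallest set of rows that share the same values for `fields`."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(counts.values())


def merge_on_token(a, b):
    """Combine the two databases on the assumed shared pseudonymous key."""
    b_by_token = {row["token"]: row for row in b}
    return [{**row, **b_by_token[row["token"]]} for row in a if row["token"] in b_by_token]


merged = merge_on_token(records_a, records_b)

print(smallest_group(records_a, ["zip3", "age_band"]))        # 2
print(smallest_group(records_b, ["gender", "occupation"]))    # 2
print(smallest_group(merged, ["zip3", "age_band", "gender", "occupation"]))  # 1
```

In each database alone, every record is indistinguishable from at least one other. After the merge, every record is unique, even though neither company added a single name or email address.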
Although the FTC closed the Google/DoubleClick investigation with a rare admission of the limits of its regulatory authority, this issue isn’t going away. With the increased number of users who willingly submit sensitive data online, and the computing resources now available to reidentify users, there will likely be future government investigations into user database mergers. Companies that broker or make other use of vast amounts of user data, especially in advertising, should be aware of possible database consolidation issues and take a proactive approach to determining whether the combination of data poses a problem. Otherwise, the combination of databases can be a toxic cocktail.