Searching encrypted data

Security Briefs

This morning I received a question from a reader of my column. He has a database with user identity details that must be encrypted (Social Insurance Numbers, other personal data), and he was interested in applying the techniques I presented in Encrypting Without Secrets.

But then he realized that he needed to also search that data.

Searching is a feature that presupposes the ability to read the data, and if that data is encrypted, searching becomes very difficult, if not impossible. There are techniques you can use to enable searching in some cases, but they all tend to weaken the protection on the encrypted data because they leak information about the cleartext. Here's a great discussion of the issues (the context is data encryption and searching in SQL Server 2005, but the ideas can be applied anywhere). Here's a snippet:

Let's first start by figuring out why we can't directly index the encrypted data and search on it. The encryption algorithms in SQL Server 2005 are salted. By salting, I mean that the encryption algorithms are always using a random initialization vector (IV), which leads to the following property: encrypting twice the same piece of data using the same key will produce two different ciphertexts. The benefit of this is that if in a table we have a column value appearing several times and we encrypt that column, then the fact that the value repeats in the column is no longer apparent by examining the encrypted data. This adds additional protection against the analysis of the ciphertext (encrypted data), but also prevents the ability to search it efficiently. Also note that while the absence of salting would have permitted equality searches, range searches would still not be possible as encryption algorithms are not preserving order.

So, how can we perform an equality search given that the encryption is salted and does not permit this? The solution is to add an additional column for holding hashes of the cleartext. The column containing hashes can then be indexed, and searching a piece of sensitive data can be done by first hashing it and then searching the hash value in the new column. This is all nice and easy, but it introduces the threat of a dictionary attack: an attacker could build hashes for as much secret data as he can generate or guess, and then he can verify whether this secret data exists in the table by comparing his hashes with the ones stored in the hash column. This is a significant threat.
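To make the hash-column idea and its weakness concrete, here is a minimal sketch in Python. It is not the SQL Server 2005 mechanism from the quoted discussion. The table layout, the column names (sin_ciphertext, sin_hash), the helper functions, the sample numbers, and the choice of SHA-256 with an in-memory SQLite database are all assumptions made for illustration, and the ciphertext is only a placeholder.

```python
import hashlib
import sqlite3


def sha256_hex(value: str) -> str:
    """Unkeyed hash of the cleartext; anyone can compute this."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical schema: the encrypted value plus an indexed hash of the cleartext.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, sin_ciphertext BLOB, sin_hash TEXT)"
)
conn.execute("CREATE INDEX idx_sin_hash ON people (sin_hash)")

# Made-up sample values; the ciphertext column holds a placeholder only.
for sin in ("046454286", "130692544"):
    conn.execute(
        "INSERT INTO people (sin_ciphertext, sin_hash) VALUES (?, ?)",
        (b"<ciphertext placeholder>", sha256_hex(sin)),
    )


def find_by_sin(sin: str) -> list:
    """Equality search: hash the search term, then look up the indexed hash."""
    rows = conn.execute(
        "SELECT id FROM people WHERE sin_hash = ?", (sha256_hex(sin),)
    ).fetchall()
    return [row[0] for row in rows]


def dictionary_attack(stolen_hashes: set, candidates: list) -> list:
    """What an attacker holding only the hash column can do: hash guesses
    and keep the ones that appear in the table."""
    return [c for c in candidates if sha256_hex(c) in stolen_hashes]


stolen = {row[0] for row in conn.execute("SELECT sin_hash FROM people")}
recovered = dictionary_attack(stolen, ["000000000", "046454286", "999999999"])
```

The search and the attack are the same operation: both hash a candidate and compare it with the stored column. A nine-digit identifier has at most a billion possible values, so an attacker can simply enumerate all of them.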

Laurentiu then goes on to say that you could use a MAC instead of a raw hash. A MAC incorporates a secret into the data before hashing, which prevents an attacker from computing the hash unless he knows the secret. Unfortunately, this solution also requires that the secret be available to the code doing the search. That directly contradicts the main point of my article, which was to separate the secret from the ciphertext.
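As a rough sketch of the MAC variant, the following Python uses HMAC-SHA256 from the standard library. The function name blind_index, the key handling, and the sample number are assumptions for illustration only; a real system would load the key from wherever it is protected rather than generating it inline.

```python
import hashlib
import hmac
import os


def blind_index(secret_key: bytes, value: str) -> str:
    """Keyed hash (HMAC-SHA256) of the cleartext, stored in the search column."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: a random 32-byte key that only the searching code can read.
search_key = os.urandom(32)

# Value written to the indexed column when the row is inserted.
stored = blind_index(search_key, "046454286")

# The code doing the search needs the same key to recompute the MAC.
match = hmac.compare_digest(stored, blind_index(search_key, "046454286"))

# An attacker who guesses the right cleartext but lacks the key
# computes a different value, so the guess cannot be confirmed.
attacker_guess = blind_index(os.urandom(32), "046454286")
```

Note that search_key must be present wherever the search runs. If that is the database server, the key again sits next to the data it protects.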

Laurentiu makes a point that is good to keep in mind whenever you're considering storing encrypted data:

Encryption and searching are conflicting objectives.


Posted May 11 2007, 02:00 PM by keith-brown

Comments

John St. Clair wrote re: Searching encrypted data
on 05-12-2007 1:16 AM
Couldn't you keep the secret in SQL Server, as an example, and expose the search functionality via a stored procedure?
Keith Brown wrote re: Searching encrypted data
on 05-12-2007 5:33 AM
John,

Yes, all degrees of protection are available. In this particular case, my reader wanted to separate the key from the ciphertext so that if the database were compromised, the attacker would be stuck with ciphertext. Having the decryption key available to the database breaks that countermeasure.
