How to Search on Encrypted Data (2013)

xyzzy4747 · on Aug 26, 2023

Sunscreen could be used: https://docs.sunscreen.tech/fhe/fhe_programs/pir_intro.html

>With private information retrieval (PIR), a user can retrieve an item from a database without revealing to the server which item she's interested in. PIR is useful for both web2 and web3 applications. In web2, for example, PIR can be used to help detect harmful images in end-to-end encrypted messaging. For private cryptocurrencies, PIR can help light clients retrieve relevant transactions.

bawolff · on Aug 26, 2023

PIR is usually about preventing the server from knowing what subset of data a specific client is retrieving, not preventing the server from knowing the contents of the database. So i don't really see the applicability.

munchler · on Aug 26, 2023

As a programmer with little crypto experience, the thing that jumped out at me about the first solution is that it only supports searching for a predetermined set of keywords. That makes the problem much easier and seems like a pretty major limitation.

jll29 · on Aug 26, 2023

Minor correction: "Practical Techniques for Searches on Encrypted Data" by Song et al. was published in 2000, not in 2001 (at the IEEE S&P conf.).

katella · on Aug 26, 2023

I wonder if zk can be used?

noam_k · on Aug 26, 2023

No, ZK solves a different problem. (Namely: providing a proof (usually of knowledge), without compromising some secret information.)

cantrevealname · on Aug 26, 2023

> The High-Level Idea ... where di,j=EncDK2(wi,j), ci=EncRK1(Di) and ptr(ci) is a pointer to ciphertext c

My eyes just glaze over when even the "high-level explanation" jumps into formal notation like this.

I wish the article would spend a few more sentences explaining each of the 6 ideas in plain English and with simple examples.

Here, I'll do a high-level explanation of idea #1, property-preserving encryption:

Imagine that in a database of encrypted emails you might need to search for keywords like payroll, revenue, and dividends. Before encrypting and uploading the emails to the server, you would pick out all the likely keywords and encrypt them individually. So that payroll becomes 6hD4jFFjk, revenue is GG5fDK00hFFC, and dividends turns into IbggfFJJ7h. You would them upload the encrypted emails and the list of encrypted keywords (6hD4jFFjk, GG5fDK00hFFC, IbggfFJJ7h).

If you later wanted to see all the emails that mentioned payroll, you would encrypt the word payroll on the client side (i.e., on your own computer) to get 6hD4jFFjk again, then send a request to the server to find all emails containing the string 6hD4jFFjk. The server would send back to you all the encrypted emails containing 6hD4jFFjk.

For this idea to work, you would have to use a method of encryption where the same plaintext word always turns into the same ciphertext string. This way of encrypting data is almost never used these days because it leaks a lot information. For example, the server knows that every occurrence of the string 6hD4jFFjk must refer to the same word. Therefore, although this method would work and be acceptably fast, it would be considered extremely poor security.

thrw37272217 · on Aug 26, 2023

> My eyes just glaze over when even the "high-level explanation" jumps into formal notation like this.

This is simply not factual. I read your comment and then found the offending passage in page 2. As a matter of fact, it starts with definition of terms, plainly stating we have two encryption schemes, one deterministic, the other randomized. We then take a set of documents, and for each document we take its set of keywords and then encrypt them using the two schemes and this what each 'record' ("tuple") in the db looks like. Now when you want to search, you take the deterministic scheme and encrypt using that and send it over to the EDB ...

It then goes on to discuss why this is not the most secure way ("it leaks too much information").

madars · on Aug 26, 2023

I like your exposition but also want to defend formal notation :) Formal notation is great because it lets you compare different constructions side-by-side, manipulate them (incl. building upon earlier constructions - "same as above except A_i = ..."), etc whereas prose is less suited for that. Once your eye gets used to it, it is much much more efficient at conveying various details. (Akin to (a + b)^2 = a^2 + b^2 + 2ab vs "If a straight line be cut at random, the square on the whole is equal to the squares on the segments and twice the rectangle contained by the segments." (Euclid, Elements, II.4, 300 B.C.))

username135 · on Aug 26, 2023

To absolutely reduce this idea you could probably say 'vlookup = bad'

_a_a_a_ · on Aug 26, 2023

Agree. Moar of this kind of thing plz.