So basically: you take a picture. Apple encrypts it and uploads it to their server. The server matches the (still encrypted) picture against a database and tells your device "this picture contains the Eiffel Tower". Later, when you search for "Eiffel Tower" on your device, the photo pops up.
Is the complexity and security risk really worth it for such a niche feature?
It's also funny that Apple is simultaneously saying "don't worry the photo is encrypted so we can't read it" and "we are extracting data from the encrypted photo to enhance your experience".
They don’t send the photo. They send some encrypted metadata to which some noise is added. The metadata can be loosely understood as “I have this photo that looks sort of like this”. Then the server takes that encrypted data from the anonymized device and responds to the device with something like “that looks like the Eiffel Tower”. The actual photo never goes to the server.
With the added caveat that homomorphic encryption (HE) is the magic sauce: the server cannot see the metadata (the cropped/normalized image data), and doesn't even know how much it does or does not look like the Eiffel Tower.
Because it turns out that mathematicians and computer scientists have devised schemes that allow certain computational operations to be performed on encrypted data without revealing the data itself. The intuition is that you can compute a + b = c on encrypted values without revealing anything about what a and b are. This was mostly confined to the realm of theory and mathematics until very recently, but Apple has now operationalized it at consumer scale.
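To make that intuition concrete, here's a minimal sketch using the Paillier cryptosystem, which is additively homomorphic. To be clear, this is not the scheme Apple uses in production (and the tiny hard-coded primes are hopelessly insecure); it only shows a server adding two numbers it cannot read:

```python
# Toy Paillier: additively homomorphic encryption with insecure demo primes.
# NOT Apple's scheme -- just an illustration of "a + b on encrypted data".
import math
import random

# Key generation with toy primes (real keys use primes of 1024+ bits).
p, q = 61, 53
n = p * q                      # public modulus
n2 = n * n
g = n + 1                      # standard simplified choice of generator
lam = math.lcm(p - 1, q - 1)   # private: Carmichael's lambda
mu = pow(lam, -1, n)           # private: modular inverse of lambda mod n

def encrypt(m: int) -> int:
    """Encrypt m (0 <= m < n) under the public key (n, g)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Decrypt with the private key (lam, mu)."""
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

# Client encrypts two secret numbers.
a, b = 12, 30
ca, cb = encrypt(a), encrypt(b)

# Server multiplies the ciphertexts -- which, under Paillier, corresponds
# to ADDING the plaintexts -- without ever seeing a or b.
c_sum = (ca * cb) % n2

# Only the client (holder of the private key) can read the result.
assert decrypt(c_sum) == (a + b) % n
print(decrypt(c_sum))          # 42
```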
The phone has on-device intelligence to detect things that look like landmarks; it crops and normalizes those regions and converts them into a mathematical form (an embedding).
Apple has a database trained on multiple photos of each landmark (or part of a landmark), to give a likelihood of a match.
Homomorphic encryption means that the encrypted mathematical form of a potential landmark from the phone can be applied to the server's set of landmark data to get an encrypted result set (a toy sketch of this matching step follows below).
The phone can then decrypt this and see the result of the query. To anyone else, including Apple's server, it just looks like noise being transformed into new noise.
The justification for this approach is storage: the landmark data set can only get larger as it becomes more comprehensive. Imagine trying to match photos of the insides of castles, cathedrals and museums, for example.
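Here's a toy sketch of that matching step: the phone sends an encrypted query vector, and the server computes a similarity score against each of its (plaintext) landmark rows without being able to read the query or the scores. It uses the same insecure toy Paillier as the sketch further up (repeated so this snippet runs on its own); the vectors and landmark names are invented, and the real system uses a different, more capable scheme and proper image embeddings:

```python
# Toy sketch: server scores an ENCRYPTED query against plaintext rows.
# Same insecure toy Paillier as above; data is made up for illustration.
import math
import random

p, q = 61, 53
n, n2, g = p * q, (p * q) ** 2, p * q + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# --- Client side: a tiny integer "embedding" of the photo (made up). ---
query = [3, 1, 4]
enc_query = [encrypt(x) for x in query]   # this is all the server sees

# --- Server side: plaintext landmark vectors (also made up). ---
landmarks = {
    "Eiffel Tower": [3, 1, 5],
    "Big Ben":      [0, 7, 1],
    "Golden Gate":  [5, 0, 2],
}

def encrypted_dot(enc_vec, plain_vec):
    # Paillier: Enc(x)^w = Enc(w*x), and multiplying ciphertexts adds
    # plaintexts, so this yields Enc(sum_i w_i * x_i) -- a dot product
    # computed without ever decrypting the query.
    acc = encrypt(0)
    for cx, w in zip(enc_vec, plain_vec):
        acc = (acc * pow(cx, w, n2)) % n2
    return acc

enc_scores = {name: encrypted_dot(enc_query, vec)
              for name, vec in landmarks.items()}

# --- Back on the client: decrypt the scores and pick the best match. ---
scores = {name: decrypt(c) for name, c in enc_scores.items()}
print(max(scores, key=scores.get))        # "Eiffel Tower"
```

Note that in this toy, the encrypted scores go back to the client, which decrypts them and picks the winner itself; the server never sees a score or the query.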
I don't completely understand the maths of how this works, but no, they don't.
Here's a theoretical way I wrote in another comment:
> I think they have more efficient ways, but theoretically what you could do is apply each row in your database to this encrypted value, in such a way that the encrypted value becomes the name of the POI of the best match, or otherwise junk is appended (completely changing the encrypted value). Again, the server has not read the encrypted value, so it does not know which row won out. Only the client will know, when it decrypts the new value.
They do something like this, using homomorphic encryption. Whatever they do, there is no doubt they incur serious performance hits.
> They do something like this, using homomorphic encryption. Whatever they do, there is no doubt they incur serious performance hits.
Right, I've seen similar engineering efforts to target this sort of functionality fail because of the computational cost and resulting latency. I'm curious to read the paper for the tradeoffs they made toward practicality at Apple's scale of users.
Not really. It's more like Apple runs a local algorithm that takes your picture of the Eiffel Tower and outputs some text like "Eiffel Tower, person smiling", then encrypts that text and sends it securely to Apple's servers to help you when you perform a search.
Locally, a small ML model identifies potential POIs (points of interest) in an image.
Another model turns these regions into a series of numbers (a vector) that represents the image. For instance, one number might correlate with how "skyscraper-like" the image is. (We don't actually know what each dimension of the vector means, but we can turn an image that we know is the Eiffel Tower into a vector and measure how close the reference vector and our sample vector are to each other.)
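For a feel of what that comparison looks like, here's a quick sketch with completely made-up numbers (real embeddings come out of a neural net and have hundreds of dimensions, but the comparison is the same idea):

```python
# Cosine similarity between tiny, hypothetical "embedding" vectors.
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

reference_eiffel = [0.9, 0.1, 0.7]   # vector for a known Eiffel Tower photo (hypothetical)
my_photo         = [0.8, 0.2, 0.6]   # vector for the photo on my phone (hypothetical)
random_beach     = [0.1, 0.9, 0.0]   # vector for an unrelated photo (hypothetical)

print(cosine_similarity(reference_eiffel, my_photo))      # ~0.99 -> likely a match
print(cosine_similarity(reference_eiffel, random_beach))  # ~0.17 -> not a match
```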
The thing is, we aren't storing this database with the vectors of all known locations on our phone. We could send the vector we made on-device off to Apple's servers; the vector is lossy, after all, so Apple wouldn't have the image. If we did this, however, Apple would know that we have an image of the Eiffel Tower.
So, this is the magic part. The device encrypts the vector using a private key known only to it, then sends this unreadable vector off to the server. Somehow, using homomorphic encryption and other processes I do not fully understand, mathematical operations like cosine similarity can be applied to this encrypted vector without reading its actual contents. Each of these operations changes the encrypted value, but the server cannot tell how the value changed.
I don't know if this is exactly what Apple does; I think they have more efficient ways, but theoretically what you could do is apply each row in your database to this encrypted value, in such a way that the encrypted value becomes the name of the POI of the best match, or otherwise junk is appended (completely changing the encrypted value). Again, the server has not read the encrypted value, so it does not know which row won out. Only the client will know, when it decrypts the new value.
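Here's a toy sketch in that spirit, with one big caveat: doing the "which row is the best match?" comparison under encryption needs a more capable scheme than the additively homomorphic toy below, so this version cheats by letting the client send an encrypted one-hot selector for the row it wants, the same shape as private information retrieval. What it does show is the other half of the trick: the server combines its rows with ciphertexts and returns something that decrypts to one row's label, without ever learning which row was involved. Names and numbers are invented, and the toy Paillier helpers are repeated so this runs standalone:

```python
# Toy "return the label of one row without the server knowing which row":
# PIR-style retrieval with the same insecure toy Paillier as above.
import math
import random

p, q = 61, 53
n, n2, g = p * q, (p * q) ** 2, p * q + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Server's database: numeric POI ids standing in for names (made up).
poi_ids = [101, 202, 303]
poi_names = {101: "Eiffel Tower", 202: "Big Ben", 303: "Golden Gate Bridge"}

# Client wants row 0 but must not reveal that: it sends Enc(1) for the row
# it wants and Enc(0) for every other row -- all look like noise to the server.
wanted = 0
enc_selector = [encrypt(1 if i == wanted else 0) for i in range(len(poi_ids))]

# Server: Enc(s_i)^id_i = Enc(s_i * id_i); multiplying these gives
# Enc(sum_i s_i * id_i) = Enc(id of the selected row). The server computes
# this blindly -- it never learns which row was selected.
acc = encrypt(0)
for c_s, poi_id in zip(enc_selector, poi_ids):
    acc = (acc * pow(c_s, poi_id, n2)) % n2

# Client decrypts and looks up the name locally.
print(poi_names[decrypt(acc)])   # "Eiffel Tower"
```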