I've always thought something could be done with faces when you blur them out in video. The fact that the grid moves and the colors change makes me intuitively feel as if I could guess their face. I bet the three-letter agencies already have something like that.
Here's a simplified example of your scenario. Imagine a video of a pattern of black dots on a white surface, moving around. The video is pixelated in an attempt to hide the precise layout of the pattern. But nearest-neighbour pixelation would make it easy to see when a black dot moves into a different pixel. Therefore you'd be able to reconstruct the pattern precisely.
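A toy sketch of that leak, under the assumption that "pixelation" means block averaging on a fixed grid: a single dot moving across the frame produces identical pixelated frames until it crosses a block boundary, and the exact frame where the output changes pins down the boundary crossing.

```python
import numpy as np

def pixelate(img, block=8):
    """Downsample by averaging each block, then upsample back (blocky output)."""
    h, w = img.shape
    small = img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.kron(small, np.ones((block, block)))

# A single black dot on a white 32x32 canvas, moving one pixel per frame.
frames = []
for x in range(6, 10):
    img = np.ones((32, 32))
    img[16, x] = 0.0  # the "dot"
    frames.append(pixelate(img, block=8))

# While the dot stays inside one 8x8 block, the pixelated frames are identical;
# the output changes exactly when the dot crosses the block boundary at x = 8.
changes = [not np.array_equal(frames[i], frames[i + 1]) for i in range(3)]
print(changes)  # [False, True, False]
```

Each boundary crossing like this narrows the dot's position to one pixel along that axis, which is why the motion gives the layout away.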
I agree. Each frame is new information causing new quantisation decisions. Especially if there is a limited pool of potential people and you can get good photographs of them from various angles (which these days is virtually everyone if you have an agency budget), and you can estimate the head position from torso movement, then you can try candidates and see which matches best. TV programs love blurred faces and voices; they should really use a stand-in for the actual person for interview shots.
I doubt the US TLAs have it, because the software they have is often rubbish generated by a cartel of low-skill providers, and frequently 10 years behind the times. But I bet the Chinese Ministry of State Security has it. So many dissidents to keep track of, to identify after they have moved overseas so pressure can be applied.
I don't have a link at hand, but yes, I believe I've seen this done fairly well somewhat recently. Modeling the actual faces might give false positives, as someone else said, but depending on how strong the blurring/pixelation is, how the camera moves, and how many frames you have access to, there's a lot of information in those blotches, and you can reconstruct arbitrary images from underneath.
There are AI algorithms that 'enhance' blurred faces; however, the output 'looks' like a real human face but is, in all likelihood, different from the ground truth (i.e. the real face that got blurred). Therefore caution must be exercised in advocating such algorithms (e.g. in a court of law).
Whereas this algorithm is 'just' enumerating the text, generating blur and finding the closest match to the original blur.
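The enumerate-and-match idea can be sketched in a few lines. This is a toy version with random patches standing in for candidate text or faces (hypothetical data, not a real pipeline): apply the same pixelation to every candidate and keep the one whose blur is closest to the observed blur.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixelate(img, block=4):
    """Block-average pixelation: replace each block with its mean value."""
    h, w = img.shape
    small = img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.kron(small, np.ones((block, block)))

# A pool of candidate images (here: random 16x16 grayscale patches).
candidates = [rng.random((16, 16)) for _ in range(50)]

# The attacker only sees the pixelated version of the true image.
true_idx = 17
observed = pixelate(candidates[true_idx])

# Enumerate candidates, run each through the same pixelation,
# and keep the closest match to the observed blur.
errors = [np.sum((pixelate(c) - observed) ** 2) for c in candidates]
best = int(np.argmin(errors))
print(best)  # 17: the true candidate matches its own blur exactly
```

This works because pixelation is deterministic: the true candidate reproduces the observed blur with zero error, while other candidates almost never share the same block means. Real attacks have to also search over alignment, scale, and compression noise, but the matching step is this simple.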
Sure, but you could create physical models, rotate those physical models on top of the video, and see whether certain faces are a better match. Obviously it helps if there are more pixels, and it doesn't need to be proof in a court of law to be useful in some way, or at the very least an interesting experiment. Maybe another algorithm could do the trick. It does seem like a lot of information about a face is leaked when you have multiple samples, a moving grid sampling different parts of the face, and so on. I mean, just by watching certain pixels get extra dark when they pass over parts of the face, you can figure out approximately where the eyes are. That's probably enough to narrow a large pool of suspects down to a small one.
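The "moving grid" point can be demonstrated directly. In this sketch (assuming block-average pixelation and a grid that shifts relative to the face between frames), a single dark "eye" pixel darkens a different block in each frame; simply averaging the frames makes those blocks overlap, and the overlap peaks at the eye's location.

```python
import numpy as np

def pixelate_offset(img, block=8, off=0):
    """Block-average with the pixelation grid shifted by `off` pixels."""
    shifted = np.roll(img, -off, axis=(0, 1))
    h, w = shifted.shape
    small = shifted.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    up = np.kron(small, np.ones((block, block)))
    return np.roll(up, off, axis=(0, 1))

# White "face" with one dark eye pixel; the attacker sees only pixelated
# frames, each with a different grid offset (as when the subject moves
# slightly under a fixed pixelation grid).
img = np.ones((32, 32))
img[10, 20] = 0.0
frames = [pixelate_offset(img, off=o) for o in range(8)]

# Naive back-projection: average the frames. Every block that contained the
# dark pixel pulls the average down, and only the eye pixel lies in all of
# those blocks, so the minimum of the average sits exactly on it.
acc = np.mean(frames, axis=0)
y, x = np.unravel_index(np.argmin(acc), acc.shape)
print(y, x)  # 10 20
```

With eight grid offsets, a feature that was smeared over an 8x8 block in any single frame is localised to one pixel, which is the same reason multiple video frames leak far more than one still.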