Yale researchers reconstruct facial images locked in a viewer’s mind (yale.edu)
159 points by turing on March 27, 2014 | 50 comments



Cool result, but where is a publication showing the faces that were actually tested and "reconstructed"? Many, many submissions to HN (like this one) are press releases, and press releases are well known for spinning preliminary research findings beyond all recognition. This has been commented on in the PhD comic "The Science News Cycle,"[1] which only exaggerates the process a very little. More serious commentary in the edited group blog post "Related by coincidence only? University and medical journal press releases versus journal articles"[2] points to the same danger of taking press releases (and news aggregator website articles based solely on press releases) too seriously. I look forward to seeing how this finding develops as it is commented on and reviewed by other researchers in peer-reviewed publications and attempts to replicate the finding.

The most sure and certain finding of any preliminary study will be that more research is needed. Disappointingly often, preliminary findings don't lead to further useful discoveries in science, because the preliminary findings are flawed. If the technique reported here can generalize at sufficiently low expense, it could lead to a lot of insight into the workings of the known-to-be complicated neural networks of the human brain used for recognizing faces.

A useful follow-up link for any discussion of a report on a research result like the one kindly submitted here is the article "Warning Signs in Experimental Design and Interpretation"[3] by Peter Norvig, director of research at Google, on how to interpret scientific research. Check each news story you read for how many of the important issues in interpreting research are NOT discussed in the story.

[1] http://www.phdcomics.com/comics.php?f=1174

[2] http://www.sciencebasedmedicine.org/index.php/related-by-coi...

[3] http://norvig.com/experiment-design.html


Paper accessible via http://www.sciencedirect.com/science/article/pii/S1053811914... if you have an academic login. Your points on experimental design are good, but I feel you're grandstanding a bit here; the press release is posted because it's so hard to get access to the text of scientific papers thanks to the publishing industry, so citing essays about experimental design ends up casting implicit aspersions on the quality of the authors' work, as if the paper would be more publicly available but for some reticence on their part.

I would like to see drastic changes to the journal publishing model, but I don't consider article inaccessibility to be correlated with flaws in the findings. To bring up the latter out of frustration over the former is poisoning the well of debate.

Incidentally, I found the paper in under 30 seconds by searching for the journal name, the senior author, and a few technical terms, all of which were in the press release and which gave me a correct first search result. It would be nice if the press release also contained a DOI link and other identifying information, but I can't really blame press departments for supplying journalists with the information they want and omitting that which most of them don't.


For those who don't have journal access, here's the part that you're probably most interested in: http://imgur.com/xzNwUTL

The left column is the original image. Next to it is a non-neural PCA/Eigenface reconstruction. Next to that are the reconstructions from a variety of brain regions.
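If anyone wants a feel for what that second column involves, here's a minimal Python sketch of eigenface reconstruction (my own illustration with random placeholder arrays and made-up sizes, not the authors' code): learn a PCA basis from training faces, then rebuild a test face from its component scores.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Placeholder data: in practice these would be flattened, aligned face photos.
    train_faces = rng.standard_normal((300, 64 * 64))
    test_face = rng.standard_normal(64 * 64)

    pca = PCA(n_components=50).fit(train_faces)         # learn 50 "eigenfaces"
    scores = pca.transform(test_face[None, :])          # face -> component scores
    baseline = pca.inverse_transform(scores)[0].reshape(64, 64)   # scores -> image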


Would someone please upload the PDF for those of us who don't have academic logins? The current state of affairs precludes people from thinking critically. Instead we're forced to take publications at face value.


Here's the PDF. Science publishing sucks.

http://cl.ly/3F140E1t0p0j


You're wonderful! Thank you so much.


I'm away from campus now. I'll post it if I remember. But for now, this is the website of one of the investigators: http://camplab.psych.yale.edu/papers.html

He seems to have a pretty good track record of posting his papers there. This one isn't up yet (for one thing it isn't really fully published yet). Maybe you can check back later...

I know, it's super shitty. I have no idea what I'm going to do when I leave school.


If I'm interpreting the visuals there correctly and some of my assumptions about them are correct (neither all that likely), it looks like it's fairly accurate at first, and the more images that are shown and reconstructed, the more bleed-through of prior images can be seen in the reconstructions.


The caption for this image seems to indicate that the reconstructed images shown here are averaged over all test subjects; i.e., no one test subject was able to reconstruct the faces as well. There were only six of them, I suppose, but still.


My god. Those faces are like something out of my worst nightmares.


What I would do to talk to a researcher on this topic. My visual system has been playing games with my mind for a few years. It's like the internal 'reconstruction' device is fubar and can't stand still.


Next challenge for the Yale team: a way to reconstruct the actual research from science news.


Although it's hard to tell from the images presented with the article, the face generation looks like it could be similar to the techniques used in Nishimoto et al., 2011, which used a similar library of learned brain responses, though for movie trailers:

http://www.youtube.com/watch?v=nsjDnYxJ0bo

Their particular process is described in the YouTube caption:

The left clip is a segment of a Hollywood movie trailer that the subject viewed while in the magnet. The right clip shows the reconstruction of this segment from brain activity measured using fMRI. The procedure is as follows:

[1] Record brain activity while the subject watches several hours of movie trailers.

[2] Build dictionaries (i.e., regression models) that translate between the shapes, edges and motion in the movies and measured brain activity. A separate dictionary is constructed for each of several thousand points at which brain activity was measured. (For experts: The real advance of this study was the construction of a movie-to-brain activity encoding model that accurately predicts brain activity evoked by arbitrary novel movies.)

[3] Record brain activity to a new set of movie trailers that will be used to test the quality of the dictionaries and reconstructions.

[4] Build a random library of ~18,000,000 seconds (5000 hours) of video downloaded at random from YouTube. (Note these videos have no overlap with the movies that subjects saw in the magnet). Put each of these clips through the dictionaries to generate predictions of brain activity. Select the 100 clips whose predicted activity is most similar to the observed brain activity. Average these clips together. This is the reconstruction.

With the actual paper here:

http://www.cell.com/current-biology/retrieve/pii/S0960982211...
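For a concrete feel for steps [2] and [4], here's a rough Python sketch, with ridge regression standing in for whatever regression they actually used, correlation as the similarity measure, and random arrays with toy sizes in place of the real stimulus features and fMRI data:

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n_seconds, n_features, n_voxels = 7200, 200, 500   # toy sizes, not the real ones

    # [2] Per-voxel "dictionary": regress measured activity on stimulus features
    #     (shapes, edges, motion) from the two hours of training trailers.
    train_features = rng.standard_normal((n_seconds, n_features))
    train_activity = rng.standard_normal((n_seconds, n_voxels))
    encoder = Ridge(alpha=1.0).fit(train_features, train_activity)

    # [4] Predict activity for every clip in a large unseen library...
    library_features = rng.standard_normal((20000, n_features))  # stand-in for 18M clips
    predicted = encoder.predict(library_features)                 # (n_clips, n_voxels)

    # ...then, for one second of observed test activity, rank the clips by how well
    # their predicted activity correlates with what was actually measured,
    # and average the top 100 clips to form the reconstructed frame.
    observed = rng.standard_normal(n_voxels)
    def zscore(a):
        return (a - a.mean(axis=-1, keepdims=True)) / a.std(axis=-1, keepdims=True)
    similarity = zscore(predicted) @ zscore(observed) / n_voxels
    top_100 = np.argsort(similarity)[-100:]
    # reconstruction = library_clips[top_100].mean(axis=0)   # pixel-space average

The point to notice is that the model only ever maps stimulus features to brain activity; the "decoding" comes from searching a huge library for the clips whose predicted responses best match what was actually measured.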


Select the 100 clips whose predicted activity is most similar to the observed brain activity. Average these clips together. This is the reconstruction.

Mind explaining to me why the right clip is inconsistent with the left clip?

https://www.youtube.com/watch?v=nsjDnYxJ0bo

Starting at the 20-second mark I feel like the right clip is out of touch with the left clip. Between the 20th and 22nd seconds I see at least three different individuals rendered in the reconstruction.

From the 26th second to the end of the clip I also see multiple individuals. The names also look different from one another... When you say it finds the closest match, is that an expected result?


I'll give the disclaimer that this paper isn't in my field, and I'm merely an observer. However, I'll do my best to explain, since it's a little unclear.

As I read it, there were three sets of videos:

1) The several hours of "training" video that they used to learn how the test subject's brain responded to different stimuli. (The paper, which I've only skimmed, says 7,200 seconds, which is two hours.)

2) 18,000,000 individual seconds of YouTube video that the test subject has never seen.

3) The test video, aka the video on the left.

So, the first step was to have the subject watch several hours of video (1), and watch how their brain responded.

Then, using this data, they built a model and predicted how they thought the brain would respond to eighteen million separate one-second clips sampled randomly from YouTube (2). The subject never watched these clips; the responses were only predictions.

As an interesting test of this model, they decided to show the test subject a new set of videos that was not contained in (1) or (2), the video you see in the link above, (3). They read the brain information from this viewing, then compared each one second clip of brain data to the predicted data in their database from (2).

So, they took the first one second of the brain data, derived from looking at Steve Martin from (3), then sorted the entire database from (2) by how similar the (predicted) brain patterns were to that generated by looking at Steve Martin.

They then took the top 100 of these 18M one second clips and mixed them together right on top of each other to make the general shape of what the person was seeing. Because this exact image of Steve Martin was nowhere in their database, this is their way to make an approximation of the image (as another example, maybe (2) didn't have any elephant footage, but mix 100 videos of vaguely elephant shaped things together and you can get close). They then did this for every second long clip. This is why the figure jumps around a bit and transforms into different people from seconds 20 to 22. For each of these individual seconds, it is exploring eighteen million second-long video clips, mixing together the top 100 most similar, then showing you that second long clip.

Since each of these seconds has its "predicted video" predicted independently just from the test subject's brain data, the video is not exact, and the figures created don't necessarily 100% resemble each other. However, the figures are in the correct area of the screen, and definitely seem to have a human quality to them, which means that their technique for classifying the videos in (2) is much better than random, since they are able to generate approximations of novel video by only analyzing brain signal.
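If it helps, here's a toy sketch of just that per-second matching step (random arrays in place of the real predicted and observed activity, and a plain dot product standing in for their actual similarity measure):

    import numpy as np

    rng = np.random.default_rng(2)
    n_clips, n_voxels = 20000, 500
    predicted = rng.standard_normal((n_clips, n_voxels))   # predicted activity per library clip
    test_activity = rng.standard_normal((60, n_voxels))    # 60 s of observed test-set activity

    # Each second is matched against the library independently - nothing ties
    # second t to second t+1, which is why the reconstructed figure can "become"
    # a different person from one second to the next.
    top_clips_per_second = [
        np.argsort(predicted @ observed)[-100:]            # 100 best-matching clips
        for observed in test_activity
    ]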

Sorry, that was longer than I expected. :)

Edit: Also, if you see the paper, Figure 4 has a picture of how they reconstructed some of the frames (including the one from 20-22 seconds), showing you the screenshots from which the composite was generated.


Alternatively, instead of reading all those words I just said, you can watch [1], which is a video explanation of Figure 4 from the paper. :)

https://www.youtube.com/watch?v=KMA23JJ1M1o


Forgot about this, thanks for re-posting.


Thanks for posting that. I count myself as enormously skeptical of TFA research, but that paper appears to be quite good. I may need to re-evaluate my biases.

On the other hand, this is presented in the press release as mind reading, but the reality is more like trying to design something similar to a cochlear implant.


Too bad the reconstructed faces don't look anything like the presented faces, and I'm sure these two examples are some of the best results.

I suspect the algorithm always outputs some face generated by a parameterized face model (neural net based?). Therefore even random output would generate a face. Then with some "tuning" and a little wishful thinking you might convince yourself this works.

Am I being too skeptical?


I would say not skeptical enough.

Generating a result which looks like some combination of the inputs has a little bit of 'Wow' factor if you just look at the pictures and don't think too hard about it. But ultimately it's not an exercise in which they can be wrong, and since the image returned is always going to be a face, they'll always kind of be right.

An impressive result would be if they trained their system on the 300 pictures, and then were reliably able to identify which picture the subject was looking at. That would be quantifiable and testable, and I presume the result would expose that this is all nonsense.


"Since the image returned is always going to be a face, they'll always kind of be right"

So, non-falsifiable...?


From TFA:

"Working with funding from the Yale Provost’s office, Cowen and post doctoral researcher Brice Kuhl, now an assistant professor at New York University, showed six subjects 300 different “training” faces while undergoing fMRI scans. They used the data to create a sort of statistical library of how those brains responded to individual faces. They then showed the six subjects new sets of faces while they were undergoing scans. Taking that fMRI data alone, researchers used their statistical library to reconstruct the faces their subjects were viewing."

So yes, it will always output something like a face. It's more like they are using the fMRI to select among preset options. It's still potentially a great result, but we need more detail than this article provides.
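For intuition, the "statistical library" plus reconstruction step could look roughly like the sketch below; this is my own guess at the shape of it, with an eigenface basis, ridge regression, and invented sizes rather than the authors' actual model:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(1)
    # Placeholder data: 300 training faces plus the voxel pattern each one evoked.
    train_faces = rng.standard_normal((300, 64 * 64))
    train_voxels = rng.standard_normal((300, 1000))

    # The "statistical library": an eigenface basis plus a voxels -> components map.
    pca = PCA(n_components=50).fit(train_faces)
    face_components = pca.transform(train_faces)
    decoder = Ridge(alpha=1.0).fit(train_voxels, face_components)

    # Given only the fMRI pattern for a *new* face, predict its components and
    # render whatever face those components describe.
    new_voxels = rng.standard_normal((1, 1000))
    predicted = decoder.predict(new_voxels)
    reconstructed_face = pca.inverse_transform(predicted)[0].reshape(64, 64)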


You are right that they assume the subject is looking at a face, and so will always produce "face-like" output.

The paper includes a quantitative evaluation, in which they take a set of 30 distractor face images not used elsewhere in the study, and for each of them determine whether the reconstructed face is more similar to it, or to the original face the subject was looking at. On average, the reconstructed face is closer to the correct face than the distractor 62.5% of the time.

So it's better than random, and I think it's pretty cool work, but the quality of the reconstructions is pretty terrible, considering that a randomly chosen distractor will usually be very different from the test face (~half the time opposite sex, frequently different race, different age, etc.). For comparison, it would be interesting to evaluate some simple, obviously terrible reconstructions by the same metric. For example, we could "reconstruct" the face as an image that is a single, solid color, the average RGB value of the pixels in the original face. Another "reconstruction" that it seems would very likely do better under this evaluation metric is something like the "race-gender-age" of perp descriptions in the news ("white male in his thirties").
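Concretely, that forced-choice check, and the solid-color baseline, could be scored something like this (a sketch with pixel distance standing in for whatever similarity measure the paper actually uses):

    import numpy as np

    def identification_accuracy(reconstructions, originals, distractors):
        """Fraction of trials where the reconstruction is closer (smaller pixel
        distance here; the paper uses its own similarity measure) to the face
        actually shown than to a paired distractor face. Chance = 50%."""
        correct = 0
        for recon, orig, dist in zip(reconstructions, originals, distractors):
            correct += np.linalg.norm(recon - orig) < np.linalg.norm(recon - dist)
        return correct / len(reconstructions)

    # The "single solid color" baseline suggested above: every face is
    # "reconstructed" as a uniform image at the original's mean pixel value.
    def solid_color_baseline(originals):
        return [np.full_like(face, face.mean()) for face in originals]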


Yes they do. http://imgur.com/xzNwUTL taken from icegreentea's comment, which was taken from the paper.


Which of the outputs do you think look like the original face? Note: the second column is not output.

Note the cherry-picking of the output for the bottom picture in the Yale press release (fourth row in this image).


They share many features with the original face. No, they are not great, but it's pretty remarkable and far better than random.


By many features do you mean two eyes, nose, mouth?


Just at a glance it got skin color, gender, hair color, and the smile.


Let's stop with the "mind reading" warnings before they get too far out of hand and consider what's really happening: Six subjects were shown a "training" corpus of images first. Then they were shown new images. By comparing the subjects' responses to the new images, the software in the study presumably did its best to create composite images by pulling from the corpus.

So this raises many questions: How diverse were the faces in the training corpus? How close were the new images to those in the corpus? When you're looking at hundreds of images to train the machine, are you also unknowingly being trained to think about images in a certain way? What happens when you try to recreate faces based on the fMRI responses of subjects who didn't contribute to the initial training set?

The implications of the last question are pretty interesting. If different people have different brain responses to looking at the same image, does that help us begin to understand why you and I can be attracted to different types of people? Does it help begin to explain why two people can experience the same event but walk away with two completely different interpretations?


"If use terms like 'mind reading' in our press release we can get picked up by a major news outlet, despite having no real notable scientific result."

Hypothesis confirmed!


My first thought of this would be its use in constructing an image of a wanted criminal, as a way to replace police sketch artists. When I viewed their images, they looked incredibly close, but I don't think they're quite there. I'm really looking forward to seeing this improve as they've stated it will.

I thought the woman looked close enough to be able to identify, but the man was not. Still, very impressive work.


My first thought of this would be its use in constructing an image of a wanted criminal, as a way to replace police sketch artists.

They're reconstructing faces the subject is viewing, not remembering. To replace police sketch artists, the criminal would have to be present...


Ah, I guess I misread. Thanks for the correction.


The interesting bit would be reconstructing the right face from memory rather than the face of one of the interviewers.

Actually, there's an obvious scifi plot: investigator is a serial killer but excluded from memory extraction because the witness just saw them (again).


Soon we will be able to finally do away with that hoary old libertarian canard that the state cannot judge you and punish you for what you think inside your own head. Just imagine how harmony and true equality will blossom then!


You know maybe people wearing foil hats were on to something after all.

In all seriousness, this is a great advance in neuroscience that would help us understand many things about the brain. On the other hand, the potential for misuse is enormous. Could you even prove you had been interrogated if such a device were used on you?


I think it's going to be quite a long time before fMRI machines get to the point that you don't know one is being used on you. Room-temperature superconductors are probably a requirement.


This one is crazy: https://www.youtube.com/watch?v=SjbSEjOJL3U (Mary Lou Jepsen on basically recreating video of what a person is watching through fMRI)


We now have read access to a person's mind and its visual information when dealing with faces. Naturally, write access cannot be far away ...


The faces got into their brains in the first place because we already have write access. Writing to the brain is easy -- it's reading what we've put there that is difficult.


It is possible to reconstruct an image by measuring the activation of V1 (primary visual cortex). The same technique works whether the image is currently in the visual field or the person simply recalls an image from memory.


Writing to a person's visual system is easy - just show them a picture.


We already have write access: subliminal stimuli, and they have been used for a while now! http://en.wikipedia.org/wiki/Subliminal_stimuli


Actually, we're already there. It's called a visual prosthesis.

http://en.wikipedia.org/wiki/Visual_prosthesis


I found it interesting that the researcher thought that there was no possibility of receiving external funding for something like this. I would have thought the opposite. In fact I can think of a bunch of companies that would be only too happy to throw money at something like this.


I wonder whether this will work with a different person, not the one who participated in the machine learning process: someone who has different brain activity when the same faces from the training set are shown. Is it possible that our brains analyze what we see differently?


So... we could eventually create games and movies just by imagining them, without the need to make them by hand and input them into a computer.


Possibly. One thing I'm worried about is that our brains might not be good generative models. We remember the high level details, not a pixel by pixel map of what we imagine. That's probably why humans aren't naturally good at drawing. We can imagine a picture but trying to create an image on paper that matches our imagination is quite difficult.


True. However, if we are able to elevate this type of technology to that level, people will train, and be trained, to focus on such detail and learn to control the output via the feedback loop. At first you'd probably "draw" like a baby, but with enough practice some would become true artists, with the average somewhere in the middle; things would be recognizable and, tada, telepathy.


Great, now I can figure out who I slept with after blacking out.



