This is a great example of how Google could out-do the iPhone. Most non-trivial mobile applications have some processing that happens in the cloud. If Android phones could translate voice in real time during a phone call and the iPhone could not, which phone would you consider buying?
Realistically, though, Google wants their services and applications on all platforms; now if only the App Store would approve these...
Yeah, but to me the quality of any manufacturer other than Apple falls short. I've owned Windows Mobile, Android (HTC Hero) and now an iPhone. The iPhone for me is superior.
Google, I think, needs to make their own device to match the quality/stability of the iPhone!
One step closer to photographing a sign in a foreign country and doing OCR followed by Google Translate, maybe followed by overlaying the translated text back on the sign.
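A minimal sketch of what that pipeline might look like, assuming pytesseract/Pillow for the OCR and drawing steps; the translate() helper here is a hypothetical stand-in for whatever translation service you'd actually call:

```python
# Hypothetical sketch of the sign-translation pipeline: OCR the photo,
# translate each word, and draw the translation back over the sign.
from PIL import Image, ImageDraw
import pytesseract

def translate(text, target="en"):
    # Stand-in: call a real MT service (e.g. Google Translate) here.
    return text

def translate_sign(photo_path):
    img = Image.open(photo_path)
    # Word-level bounding boxes along with the recognized text.
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    draw = ImageDraw.Draw(img)
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue
        x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
        # Blank out the original word and overlay the translation.
        draw.rectangle([x, y, x + w, y + h], fill="white")
        draw.text((x, y), translate(word), fill="black")
    return img
```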
I think this would be a killer feature. I've never had much of a positive response when posting it as an idea previously - is it that if it really would be a killer feature, people would be all over the idea as well as the implementation, or is it possible for a little-supported idea to turn into a killer feature?
(Edit: Or maybe it's just more a European thing where several foreign languages are a few hours drive in any direction?)
One thought I had is that there really aren't enough different signs.
In most foreign countries, I've been able to recognize signs for bathrooms and such, and most place names I can memorize (even if I can't read the language I can still think "okay, I want to go to the one with the squiggly second character").
I actually think this has high "cool factor", but have difficulty really coming up with uses. I think restaurant menus (as mentioned by my sibling comment) is probably the only real use, and even then knowing the name of the dish doesn't guarantee an item I'm able to eat.
Also, when I go abroad, I am definitely not enabling data roaming. In some countries, buying a prepaid SIM requires lots of documentation which is difficult to do if wandering around doing tourist-y things, etc.
(Sorry, I do think the idea is cool, I'm just trying to come up with reasons why it's not the neatest thing since sliced bread.)
Uses would be more about being a tourist and getting a feel for where you are than men's/women's toilets - as you say, you can memorise those fairly quickly.
More like, standing in a subway, there's a poster, it says something about the train something ... wonder what? Walking past a big lake, there's a sign about some project that involves a pipe low down and a pipe further up, but what is it doing - taking thermal energy or using the water as coolant? Walking through the city streets, there's a plaque on the side of a building - something something 1872 something national? people? something something Joan 1st. Eh? What's on these fliers that are being handed out? What are the menu descriptions?
The easier it is to do, the more likely you are to do it - it's only for important things that you would fish out a dictionary and start deciphering; this would be for anything and everything which attracts your interest.
> Also, when I go abroad, I am definitely not enabling data roaming.
I think this is an important point, which is frequently forgotten. So many apps that would be useful in a foreign country are useless because they require internet access. When I went to Bulgaria last year I would have loved to have a translator app for my iPhone, but every single one available required internet access, which there's no way I'm paying for at £3/MB!
If you are with a provider that is everywhere anyway, like Vodafone or T-Mobile, the idea of roaming is very obviously just about gouging money. It's not as if T-Mobile has to pay a third party by the MB to ship data from Germany to the UK; they own all the infrastructure anyway!
True, but without a data connection you're limited to the client's processing power. With analysis on the server, you can use algorithms that require more code, more CPU, or more memory, as well as much larger datasets.
I saw a working demo of this in an iPhone app a few months ago: it's like some kind of dark magic to see the translated words show up in the still-moving picture. I don't know if the guy has released his stuff yet, though.
Confirmed on my MyTouch (worst phone name evar - though very happy with the beast).
The app works fantastically in practice in my tests, though it choked on one case tonight. Had the kids in the car, drove by a movie theater -- no showtimes/reader board visible outside (the theater is in the mall).
Tried to get showtimes by 'reading' the AMC theater logo on the side of the building ("Daddy, why are you taking a picture of that building?"). No go. Google Maps and GPS to the rescue.
Augmented reality starts becoming more than just a pipe dream. Expect a lot more of this.
P.S. Just went to ARdevcamp at Hacker Dojo last Saturday - it was a really exciting event. There were a lot more people than I expected, and lots of interesting discussion about this emerging space.
Fully integrated augmented reality. Once we get mind-machine interfaces working, we could have an application that looks at every object around you, checks which one you're focusing on, and somehow puts the entire Wikipedia article on it straight into your mind.
Can anyone with an Android phone who has tried this out for a few hours/days report on how well it works in practice? This looks pretty fantastic, especially if it can also read stuff like QR codes, etc.
Just gave it a test and it works pretty well. The image needs to be clear, but it was able to read the American Express logo off of my bill, a website address off of a keychain, and the Time Warner logo off of the remote.
Edit: I tried an image of the Centrino and Vista logos side by side on my laptop and its top result was for a German book. It did get the Centrino logo in the Other results, though.
Tried it out: searched a book cover... worked. Searched some text on a page... worked. I wasn't going outside in 20-degree weather to check the locations, but from inside it kinda worked... not bad.
Worked pretty well on logos around the office. Also great for business cards: it recognised all the data, but the add-to-contact step was not selecting the most relevant data.
It did great with QR codes too, but those were already easy to read, and the Barcode Scanner app is faster for them.
This works really well. Picked out the KU logo and Swingline staplers perfectly. Choked on a big box o' Jolly Ranchers. I'll prolly keep futzing with this for the rest of the day now.
Damn you for making me want an Android-based phone, Google! First free turn-by-turn navigation, now this. Argh. Once AT&T gets a good Android phone, I'm switching.
cell phone cams + face recognition + facebook ==> super creepy
EDIT: Since people post so many pics of themselves on Facebook under all sorts of lighting conditions and angles, that might actually make face recognition somewhat feasible from a cell phone cam. Of course, efficiently searching thru a corpus of millions of faces (each taken at several different angles) is an enormous technical challenge. I envision some app like Shazam being developed to recognize faces rather than songs...
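For what it's worth, here's a toy sketch of what a Shazam-for-faces lookup reduces to, assuming some hypothetical embed() model that maps a face crop to a fixed-length vector - the brute-force scan below is exactly the part that wouldn't scale to millions of faces without an approximate index:

```python
# Toy nearest-neighbour face lookup. embed() is a hypothetical
# face-embedding model; at millions of faces the O(n) scan would
# need to be replaced with an approximate nearest-neighbour index.
import numpy as np

def embed(face_image) -> np.ndarray:
    raise NotImplementedError  # stand-in for a real embedding model

class FaceIndex:
    def __init__(self):
        self.names, self.vecs = [], []

    def add(self, name, face_image):
        self.names.append(name)
        self.vecs.append(embed(face_image))

    def lookup(self, face_image, k=5):
        q = embed(face_image)
        m = np.stack(self.vecs)
        # Cosine similarity against every enrolled face -- O(n) per query.
        sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
        best = np.argsort(-sims)[:k]
        return [(self.names[i], float(sims[i])) for i in best]
```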
Yep, FB has an enormous corpus of accurate training data due to its millions of users performing tagging on hundreds or thousands of photos each. I would venture to say that they're the only ones that could do this/enable this to be done.
I did this kind of research 4 years ago, based on general Internet images. It was a failure, but with the rise of Facebook it might work now. Someone has already been working on this for some time: face.com.
I agree that for some specific tasks OpenCV is not particularly useful (I had to write the detector and trainer from scratch even though there is opencv haartraining; the OpenCV crowd is mostly interested in applying it to real-time video processing and puts little effort into Internet-scale problems).
But for assembly optimization it is really not as useful as it used to be. Scalability is more about getting sub-linear time complexity and an efficient communication pattern. Nowadays C compilers generate fairly good assembly, and for low-level optimization humans cannot compete with the machine (how many people know the particular cache-line alignment trick on the old Core i7?). Multimedia instructions (SSE/MMX/3DNow!) are useful, but most of them can be reached through function calls (intrinsics) instead of hand-crafted assembly.
> of course, efficiently searching thru a corpus of millions of faces
There was research cited here a while ago on the high accuracy obtained by constraining the search space to within one's social graph (hundreds vs millions). While it might not identify the stranger at the bar, it might identify their companions and so on. Six degrees of separation in real-time.
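As a sketch of that constraint (friends_of() and match_face() below are hypothetical stand-ins for a social-graph API and a face-similarity score; the point is just that the candidate pool shrinks from millions to the photographer's extended network):

```python
# Sketch: restrict face matching to within two degrees of the viewer.
from collections import deque

def friends_of(person):
    raise NotImplementedError  # stand-in for a social-graph API

def match_face(face_image, person):
    raise NotImplementedError  # stand-in for a face-similarity score

def candidates_within(user, degrees=2):
    # Breadth-first walk of the social graph out to `degrees` hops.
    seen, frontier = {user}, deque([(user, 0)])
    while frontier:
        person, d = frontier.popleft()
        if d == degrees:
            continue
        for friend in friends_of(person):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, d + 1))
    return seen

def identify(face_image, viewer):
    # Match against hundreds of candidates instead of millions.
    pool = candidates_within(viewer, degrees=2)
    return max(pool, key=lambda p: match_face(face_image, p))
```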
If I understand it correctly, it just uses your GPS and magnetometer to work out where you are and which way you're pointing; it doesn't actually use the video data like it does for the still pictures you feed it.
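If that's right, the landmark lookup could be as simple as picking the known landmark nearest the direction you're facing - a guess at the approach, not Google's actual code, with a made-up landmark table and tolerance:

```python
# Guess at GPS+compass landmark picking: choose the known landmark
# whose bearing best matches the phone's heading. LANDMARKS and the
# tolerance are made up for illustration.
import math

LANDMARKS = {"Golden Gate Bridge": (37.8199, -122.4783),
             "Coit Tower": (37.8024, -122.4058)}

def bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360

def landmark_in_view(lat, lon, heading, tolerance=15):
    best, best_off = None, tolerance
    for name, (llat, llon) in LANDMARKS.items():
        # Signed angular difference, wrapped into [-180, 180).
        off = abs((bearing(lat, lon, llat, llon) - heading + 180) % 360 - 180)
        if off <= best_off:
            best, best_off = name, off
    return best
```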
Reminds me of a much more practical version of some of the cool bits from the "Sixth Sense" interface that was presented at TED / all over the news a while ago - http://www.pranavmistry.com/projects/sixthsense/ . There, their prototype was also able to "look at" an object and return information about it immediately (e.g. you hold up a book in front of the camera, a processor recognizes it and pulls some relevant information, and the mini projector next to the camera projects the information back onto the book cover).
The latest iPod Touch has a video camera but no GPS. The same is true of various other non-phone handheld tablets. (Though landmark recognition might not be useful on those, since apparently it actually uses the GPS and compass as input...)
Speculation: because those aren't as mobile as a phone, and because Android needs something to differentiate itself from the iPhone. First seamless turn-by-turn directions, now this.