It doesn't matter if a company takes your data, does a poor job of anonymizing it, and then decides to label it as "training" for their "AI," or if they just stick it all in flat .txt files and process it in Fortran.
It's the exact same thing. You should be mad about the data being saved and used. Splitting hairs over implementation details to try and find a loophole is just a waste of time.
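To make "does a poor job of anonymizing it" concrete: a common weak approach is to hash identifiers and call the result anonymous. Here's a minimal Python sketch, with an invented phone-number format and a made-up pseudonymize helper (not anything any specific company is known to do), showing why that's trivially reversible when the input space is small:

    import hashlib

    def pseudonymize(phone: str) -> str:
        # Hashing looks irreversible, but it's deterministic, and the
        # space of possible inputs here is only 10 million values.
        return hashlib.sha256(phone.encode()).hexdigest()

    # A value found in a leaked "anonymized" dataset (made-up number).
    leaked = pseudonymize("555-867-5309")

    # An attacker just hashes every candidate and compares.
    for n in range(10_000_000):
        candidate = f"555-{n // 10_000:03d}-{n % 10_000:04d}"
        if pseudonymize(candidate) == leaked:
            print("re-identified:", candidate)
            break

Whether those pseudonyms end up in an "AI training" pipeline or a flat file changes nothing about this.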
You could also compare this to hospitals sending voice files to India to be transcribed. This is not automated at all. It's not clear that hospitals are any better at getting informed consent for this than Google.
1) A person going into a hospital, having their voice recorded, and then having the recording sent to another hospital, where it might help treat them and even save their life
vs
2) A company exploiting the lack of regulation, and of public awareness of the dangers of data mining, to harvest personal data for profit with no regard for the safety of the individual
If explicit consent had to be obtained, with the requirement that the person consenting be fully informed on the details of what they're giving up, in which of the scenarios above do you think people would be more likely to refuse consent?
Identity theft, regular theft, harassment, stalking, sexual assault, discrimination, reputational harm, etc.
Example scenario:
I tell a friend that I voted for Trump, my Google Home hears it, and a Google employee eavesdrops and leaks on Twitter that I voted for Trump, along with my home address, the times I'm likely to be home, and even the PIN that disables my alarm. Then a group of left-wing extremists uses that information to harass/rob/murder me.
Alternate scenario:
A Google employee uses their access to find an attractive woman with a Google Home, steal nudes, spy on conversations, etc. That escalates into stalking, and eventually sexual assault and/or murder.
Both of those scenarios are possible today, and we're just supposed to "trust" that Google is being responsible because they say so.
Whether these threats are realistic depends on how good Google's internal controls are. There are probably Internet companies where internal controls are very weak (random Internet of Things companies, say) and others where they're stronger. Stalking cases have happened, so you can say it's "possible," but to assess risk we need to do better than a binary distinction between possible and impossible.
In the case of the contractor described in this article, it sounds like they're pretty well isolated, so I don't see these scenarios happening. On the one hand, the audio snippets are more personal, being recorded in the home. On the other, the contractor will rarely have any idea who they're listening to, the snippets are short, and they're unlikely to hear the same person twice. I don't see them getting enough data to do damage.
You might compare this with a store employee or waitress overhearing a bit of conversation, or someone eavesdropping on your conversation or screen on a bus or plane. People should be on guard, but often they're not, and that kind of eavesdropper can pick up far more of any one person's data.
Other Google employees might have different access (tech support, for example), but it would be foolish to effectively give employees remote root on Google Home devices, and I don't think Google's security team is that foolish.
I don't get your point here. You start off questioning whether the threats are realistic, then whether they're even possible, and you end by saying it's not that bad because waitresses can overhear your conversations too.
1) Those threats are 100% possible and realistic. If you think they're not just because the guy in this article is a contractor, then you're being incredibly naive and shortsighted.
2) Google employees have complete access to this data, and to think that they don't means you've decided to trust their word. Maybe you like Google, and that's fine, but it's not smart to trust them on this whether you're a fan or not. If their internal security policies for this type of data are terrible, they're never going to admit it and will definitely lie about it.
3) What people say in a restaurant and what they say in the privacy of their own homes are completely different. Can't believe I have to explain that.
> but they'd be foolish to basically give employees remote root on Google Home devices, and I don't think Google security is that foolish.
Why would you need remote root access when Google Home already uploads conversations to Google servers by default? That's the only part that matters.
Why do you think "Google employees have full access to this data?"
It seems strange that they would have permission, unless there were some reason it was necessary for the job.
This is sort of like assuming telephone company employees can listen to whatever conversations they want. Wiretaps exist, but it's not like just anyone gets to use them.
> Why do you think "Google employees have full access to this data?"
Because they do. It's literally sitting on their servers. You're assuming they have some really good policies to prevent employees from accessing that data. Maybe they do, I don't know. But it doesn't matter, because those are just internal policies. If some employee says "fuck it" and ignores them and gets caught, they'll just be quietly fired and we'll never hear about it. There's no external audit; this is all unregulated territory.
Since this is HN, I'll give you a scenario that might hit closer to home: let's say you want to apply to work at Google. You send in your perfect application/resume, but you never hear back, because your recruiter peeked into your Google Home files and noticed that you once told your friend that the Dodgers suck. Since your recruiter is a Dodgers fan, they decided to just throw your resume in the trash.
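The difference between an internal policy and an enforced control is the difference between a memo and code that checks a justification and writes to a log someone outside the team can read. Here's a toy Python sketch of the latter; every name, the ticket scheme, and the storage are invented, and I'm not claiming this is how Google actually works:

    import json
    import time

    # Stand-in for an append-only store an external auditor could read.
    AUDIT_LOG = []

    def ticket_owned_by(employee: str, ticket: str) -> bool:
        # Stub: a real system would query the ticketing database.
        return ticket.startswith(employee + "/")

    def fetch_recording(employee: str, recording_id: str, ticket: str | None = None):
        # Release a recording only with a business justification on file,
        # and log the attempt either way -- denied attempts included.
        allowed = ticket is not None and ticket_owned_by(employee, ticket)
        AUDIT_LOG.append(json.dumps({
            "ts": time.time(),
            "employee": employee,
            "recording": recording_id,
            "ticket": ticket,
            "allowed": allowed,
        }))
        if not allowed:
            raise PermissionError("no business justification on file")
        return f"<audio bytes for {recording_id}>"  # stub payload

    fetch_recording("alice", "rec-123", ticket="alice/4711")  # allowed, logged
    try:
        fetch_recording("bob", "rec-123")  # curious employee: denied, logged
    except PermissionError:
        pass
    print(AUDIT_LOG)  # both attempts are on the record

None of this is hard to build; the point is that without someone outside the company reviewing that log, the Dodgers-fan recruiter gets quietly fired at best.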