This classification is very useful for discussing this issue.

The difference between 3 and 4, noble as it is, can come down to feasibility concerns that push people into 3, not just ignorance of the privacy impact. Human labelling of training data sets is a big thing in supervised learning. Methods that dispense with it would be valuable for purely economic reasons beyond privacy - the cost of human labelling of data samples. Yet we don't have them!

Techniques like federated learning or differential privacy can train models on opaque (encrypted or unavailable) data. This is nice, but they assume too much: that the data is already validated and analyzed. In real-life modelling problems, one starts with an exploratory data analysis, the first step of which is looking at data samples. Opaque encrypted datasets also stop ML engineers from doing error analysis (looking at your errors to better target model/dataset improvements), which is an even bigger issue, IMO, as error analysis is crucial when iterating on a model.
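For concreteness, a minimal sketch of what I mean by error analysis, assuming a scikit-learn-style classifier and made-up variable names - the point is that it only works if you can read the raw samples:

    # Pull the most confidently wrong predictions for manual review.
    # This requires access to the raw inputs, which an opaque or
    # encrypted dataset prevents.
    import numpy as np

    def worst_errors(model, X, y, raw_samples, k=20):
        probs = model.predict_proba(X)            # shape (n_samples, n_classes)
        preds = probs.argmax(axis=1)
        wrong = np.where(preds != y)[0]
        # Rank the mistakes by the model's confidence in the wrong answer.
        confidence = probs[wrong, preds[wrong]]
        order = wrong[np.argsort(-confidence)][:k]
        return [(raw_samples[i], int(y[i]), int(preds[i]), float(probs[i, preds[i]]))
                for i in order]

    # for sample, truth, pred, conf in worst_errors(clf, X_val, y_val, raw_val):
    #     print(f"{conf:.2f}  predicted {pred}, actually {truth}: {sample}")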

Even for a model that's already in production, one has to do maintenance work like checking for concept drift, which I can't see how to do on an opaque dataset.
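(To be concrete, a toy sketch of the kind of drift check I mean - a two-sample KS test comparing training-time and live distributions of a feature or of the model's scores; the threshold is arbitrary. Acting on the flag still means going back and looking at recent samples:)

    # Toy concept-drift check: flag when the live distribution of a feature
    # (or of the model's output scores) has shifted away from training time.
    from scipy.stats import ks_2samp

    def drifted(train_values, live_values, alpha=0.01):
        stat, p_value = ks_2samp(train_values, live_values)
        return p_value < alpha, stat

    # shifted, stat = drifted(train_scores, last_week_scores)
    # if shifted:
    #     print(f"distribution shift (KS statistic {stat:.3f}); inspect recent samples")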



It's not wrong for humans to label training data. It's wrong to let humans listen to voice recordings that users believed would be between them and a computer. The solutions are obvious: sell the things with a big sticker that says, "don't say anything private in earshot"; revert to old-fashioned research methods where you pay people to participate in your studies and get their permission; or ask people for permission to send in mis-heard commands, like how Ubuntu asks me if I want to send them my core dumps.


> ask people for permission to send in mis-heard commands

Note that you also want the "correctly" heard commands, because some of them will actually have been misheard. It's frustrating when an assistant gives the "I don't know how to do that" response, but it's even more frustrating to get "OK, doing (the wrong thing)".

Also, another alternative: provide an actual bug reporting channel. "Hey Google, report that as a bug" "Would you like to attach a transcript of the recent interaction? Here's what the transcript looks like." "Yes."


To be fair, the system already has something like that. If you complain to the Home, it'll ask if you want to provide feedback and give you a few seconds to verbally explain what went wrong.

I'm not sure if humans will then review that feedback or if it goes through a speech-to-text algorithm first, but the mechanism for feedback is there.


Yeah, I think I've experienced that. I was driving with Maps directions, and while I was driving, Google decided to show me new things Maps can do.

I tried to voice my way back to directions, unsuccessfully. I said "Fuck you Google."

"I see that you're upset," followed by some instructions on how to give feedback. While I was driving. It sounded almost exactly like "I'm sorry Dave, I can't help you."


iOS voicemail transcription has this.


> like how Ubuntu asks me if I want to send them my core dumps

While I like how Ubuntu does it, I actually like how Fedora does it even better. Not only do they ask to submit core dumps, but they also give you the ability to annotate and inspect what gets sent, as well as a bug report ID which you can use to follow up.


Agreed. I'd like to support Ubuntu development, and I often run it on bleeding-edge hardware I'd like to submit crash reports for, but the inability to sanitise the data means I don't, unless it's a "fresh" device.


Just give participants the choice to opt in for a chance to get early access to new products. Make it invite only to feel exclusive. They will have millions of willing test subjects.


Good point, there's precedent from hospitals w.r.t. IRBs and other infrastructure involved with data gathering. Hospitals/research institutions self-regulate in this regard; it doesn't appear tech does.


Handling the data in an ethical way doesn't need to mean handling the data in a completely anonymous fashion. That would be one solution, but you can also create a trust-based system for how the data being labeled is handled, similar to HIPAA. In addition, there are simple operational methods that could help ensure the data is processed as close to anonymously as possible. For example, with voice data you could filter the voices, work with the data in segments, and ensure that metadata for the samples is only accessible to trusted individuals certified under the above framework.
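A rough sketch of what that kind of pipeline could look like, assuming librosa/soundfile for the audio handling; the parameters and paths are illustrative, not a production design:

    # Rough sketch: pitch-shift a clip and split it into short segments so a
    # single reviewer never hears a full, identifiable voice recording.
    # Metadata (user id, device, timestamps) would live in a separate,
    # access-controlled store keyed only by the opaque segment id.
    import uuid
    import librosa
    import soundfile as sf

    def anonymize_clip(path, n_steps=3, segment_seconds=4):
        audio, sr = librosa.load(path, sr=None)
        shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=n_steps)
        seg_len = int(segment_seconds * sr)
        segment_ids = []
        for start in range(0, len(shifted), seg_len):
            seg_id = uuid.uuid4().hex
            sf.write(f"segments/{seg_id}.wav", shifted[start:start + seg_len], sr)
            segment_ids.append(seg_id)
        return segment_ids  # the mapping back to the source clip stays with trusted staff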


In trust-based systems like HIPAA or clearances, there is a fundamental requirement of two conditions to access data: privilege, and the need to know. Taking data and mining it for valuable insights isn't a "need to know", it's a "need to discover something unknown". This is where the security breaks down. In a conventional HIPAA system, only your doctor needs to access your info. You don't have to worry about some other doctors accessing your information in bulk to try and conduct a study on cancer rates. They don't NEED to know your info, they just WANT to know. When you WANT to know how to accurately fingerprint people by their voice, then obfuscating it is counterproductive.


>You don't have to worry about some other doctors accessing your information in bulk to try and conduct a study on cancer rates.

This not only happens, it's my job (though I'm not a doctor). Of course, it's tightly controlled on my end. I work for the government, but health systems have their own analysts. As part of my job, I have access to sensitive and identifying information.

This isn't to be contrarian. There are existing systems using very personal data in bulk for analysis. The wheel doesn't need to be reinvented.


Is it feasibility, or just laziness?

My car has a little blurb that explains that they collect data to use for training and gives me the choice to participate or not. Opting out doesn’t affect any functionality. Why can’t Google do the same thing?


That should never be an opt-out. It is both ethically and in some regions legally required to be opt-in.


Or just an opt, where you have to make a choice during setup.


Because Google's first allegiance is to its shareholders, and data has value, so it's not in their best interest to make it easy not to share your data.


The shareholder value theory is rubbish, because it has no predictive or descriptive power to explain why one decision was made over another.

I can just as easily say that the best way to maximize shareholder value is to minimize public scandal, scrutiny, and the potential for legislation.

Nearly every single decision, including contradictory ones, made by every single company, everywhere, can be retroactively justified to have been done in the name of shareholder value.


> I can just as easily say that the best way to maximize shareholder value is to minimize public scandal, scrutiny, and the potential for legislation.

Scandals can get free marketing; see, for example, Nike and Colin Kaepernick. For a business, attention is always better than no attention at all. Every single decision is made to increase profit, but there might be many things that need to be accomplished first, so it's hard to see the big picture. For example, a developer might want to improve a feature because they want more people to use their product. A manager gets approval to pay that developer because the investment is deemed a profitable one. What does the person who gave them that money care about the number of users? It's not their invention and they don't even use the service. They give the money because they know that more users = more market share = more ads to sell = a return greater than the initial investment. Until a business can run with people working for free, the person paying for things always dictates what is bought, and thus the direction the company is headed.

Let's say that direction is contrary to the direction another prominent member of the business wants it to go. Whether you want to believe it or not, the same calculus goes on in every person's mind: is the potential payoff of Option A greater than the potential loss of Option B, given the risk?


This is a wonderfully condescending response, but it answers nothing. The question was: why can’t Google do it differently? This doesn’t answer the question. We can plainly see this from the fact that other companies, operating under the same conditions you describe, make different choices.

This is the business equivalent of saying “because physics.” It’s not wrong, it’s just not useful.


Sorry, I didn't mean to be condescending. To answer your question: the reason Google can't do things differently is that they have already established themselves first and foremost as an advertising company, and the best way to do that is to know their audience very intimately. Other businesses like Apple established themselves as hardware companies first, so they aren't as dependent on user data; Apple took advantage of that and positioned itself as the "secure" phone. Google is too large and makes too much money from its core business, which is ad-driven. As long as search and ads are their cash cow, they cannot change in the way you hope.


Right! All the companies doing it differently are also trying to satisfy their shareholders.


That's what is so great about capitalism. If one company starts to take advantage of its users for profit, it opens up a niche for another company to take a different approach.


No it’s not.

Google has many primary concerns it needs to manage. That’s how you get big - by managing lots of concerns successfully.

If they drop one too long, they start going backwards very quickly.


Then explain why they changed to Alphabet. Shareholders were sick of things like Project Loon siphoning cash from Google search. You are extremely naive if you think there are many concerns of higher importance than profit. Everything else is about maintaining and growing profit, even if that means doing an ad campaign convincing people you are fighting the good fight... for profit.


> Then explain why they changed to Alphabet. Shareholders were sick of things like Project Loon siphoning cash from Google search.

Alphabet is still spending billions from Google on "other bets" like Loon, so I don't see how this explains the change.


Because now they have to report to their shareholders where the money is going, so that if the board doesn't like it, they can replace the CEO. Before, since it was all Google, the money went where they said it went; there was no oversight. They had this massive R&D budget that was opaque to investors. Money that could have been paid to shareholders as a dividend or other return was instead spent on projects they had no idea about.



