You're free to have your own opinion, but anything specific beyond "it doesn't w...

RosanaAnaDana · on July 10, 2019

Representative sampling does 'work' in the sense that it may or may not 'prove' whatever it is you had a question about. But the issue is that you are effectively building your assumptions about what is 'representative' into your sample. It's (imo) the central issue in the reproducibility crisis: our assumptions about the world and how they impact the questions we ask about it.

It was previously intractable to do a census rather than a sample, and maybe for your purposes a sample is good enough or a census remains intractable. In my field, this is how things were done for decades (and still largely are), and even though (imo) it did a piss-poor job, it was good enough for some purposes. A piss-poor job is still better than knowing nothing. Maybe this is good enough for your purposes.

There's a third way, however, which is to move beyond sampling and to perform a census. This is the difference I'm speaking of. We're at the point where we don't have to sample because we can measure. Effectively, this is what modern data science is. We've always had the ability to sample and interpolate. It doesn't work very well (imo: https://en.wikipedia.org/wiki/Replication_crisis) and usually is reflecting back to us something about our assumptions in how we sampled. But that's just it. We don't have to rely on a sample if we can take a census.

la_barba · on July 11, 2019

>But the issue is that you are effectively building your assumptions about what is 'representative' into your sample.

Even if I agree with your premise, Google is not going to build a custom voice model for every individual anyway. There will be simplifications made. There will be assumptions made, and they will end up with a representative model anyway. So you're actually just bolstering my point. It makes a ton of sense to record people in a known, controlled environment and tweak variables one by one, such as the size of the room, the location of the microphone, the amount of background chatter, etc. This is how normal science happens all the time, and it has worked for us so far. And we haven't even addressed the ethics of spying on people in such a blatant manner. That is a whole other conversation.

> It doesn't work very well (imo: https://en.wikipedia.org/wiki/Replication_crisis) and usually is reflecting back to us something about our assumptions in how we sampled. But that's just it.

Modelling aggregate human behavior/psychology is not a proper science. The same is true of macroeconomics and other such non-exact fields. Problems in those fields do not carry over to other fields.