> average age that skews older than, say, 30 - to which you unfortunately seem to belong
Haha, unfortunately you would be correct.
> in response to this post there are multiple posts each suggesting their own conspiracy. I wonder if this actually points to something, or if it too is just random noise.
After I posted my original comment full of personal anecdotes, I thought “It’s always my useless posts that get the most responses.” Checking HN later in the day confirmed my suspicion.
As a data scientist, there are certain fields of study concerning the physical world that overwhelm me with the vast number of baseless and likely wrong theories within them: psychology, sociology, and medicine. Basically any field that attempts to study the human body or mind.
We likely have sufficiently advanced ML and inference techniques by this point to discover the true root causes of allergies and autoimmune disease (and all disease for that matter), but unfortunately this is a case where the problem is the data, not the algorithm.
We have no reasonable system in place to accurately collect, process, and store vast amounts of observational health data. And besides the data quality concerns, the privacy implications of attempting to store all of that sensitive data securely are mind-boggling.
Nevertheless, with observations from billions of people, it seems like even simple techniques could extract a lot of signal from the noise, which is just too difficult with the tiny number of low sample, high variance RCT studies we currently use. When you’re trying to infer causation from one or a few variables, it’s simple. When you’re considering thousands of variables at once, it’s next to impossible.
> it seems like even simple techniques could extract a lot of signal from the noise, which is just too difficult with the tiny number of low sample, high variance RCT studies we currently use.
On the other hand, RCTs gather relatively well-conditioned data. And we have enough statistical power across them that we end up with a lot of statistically significant, real findings... that still have effect sizes that aren't clinically meaningful.
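To make the significance-versus-effect-size point concrete, here's a rough sketch on simulated data. Everything in it is made up for illustration: the sample size `n`, the assumed true effect of 0.05 standard deviations, and the helper names `sample_variance` and `compare_arms`. It uses a plain large-sample z-test, not anything a real trial would necessarily use.

```python
import math
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def compare_arms(treated, control):
    """Return (Cohen's d, two-sided p-value) from a large-sample z-test."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    var_t, var_c = sample_variance(treated), sample_variance(control)
    std_err = math.sqrt(var_t / len(treated) + var_c / len(control))
    z = diff / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    cohens_d = diff / math.sqrt((var_t + var_c) / 2)
    return cohens_d, p_value


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000             # hypothetical participants per arm (assumption)
control = [rng.gauss(0.0, 1.0) for _ in range(n)]
treated = [rng.gauss(0.05, 1.0) for _ in range(n)]  # assumed true effect: 0.05 SD

cohens_d, p_value = compare_arms(treated, control)
print(f"Cohen's d = {cohens_d:.3f}, p = {p_value:.2e}")
```

With that many simulated participants the p-value comes out tiny, while Cohen's d stays far below the 0.2 that's conventionally called a "small" effect. The finding is real in the simulation, just not one a clinician would likely act on.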
We even know about effects that are large and controlled by only a few variables, but that are rare enough that it's hard to find the patients they apply to.
And, of course, correlation ain't causation: if we used your system and found a bunch of things, we'd still need to figure out how to reduce them down enough to be something we could test before advising practitioners to do things a certain way.
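A toy illustration of that last point, again on simulated data with made-up variables: a hidden factor `z` drives both an "exposure" `x` and an "outcome" `y`, and `x` has no effect on `y` at all. The helper names `pearson` and `residuals` are mine, and the adjustment shown (correlating residuals after regressing out `z`) only works here because the simulation knows what the confounder is, which is exactly what observational data doesn't tell you.

```python
import math
import random
import statistics


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def residuals(target, covariate):
    """Residuals of a simple least-squares fit of target on covariate."""
    mt, mc = statistics.fmean(target), statistics.fmean(covariate)
    slope = (sum((c - mc) * (t - mt) for c, t in zip(covariate, target))
             / sum((c - mc) ** 2 for c in covariate))
    return [(t - mt) - slope * (c - mc) for c, t in zip(covariate, target)]


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20_000              # hypothetical number of observed people (assumption)
z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # hidden confounder
x = [zi + rng.gauss(0.0, 1.0) for zi in z]    # "exposure": driven by z only
y = [zi + rng.gauss(0.0, 1.0) for zi in z]    # "outcome": driven by z only

raw_corr = pearson(x, y)
adjusted_corr = pearson(residuals(x, z), residuals(y, z))
print(f"raw correlation      = {raw_corr:.3f}")
print(f"adjusted for z       = {adjusted_corr:.3f}")
```

The raw correlation should land around 0.5 even though intervening on `x` would do nothing to `y`; once `z` is regressed out, it drops to roughly zero. With thousands of real variables you don't get to know which one is `z`, so you're back to needing something testable.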