asgraham's comments | Hacker News

re: the arxiv link

Why is it that microarray true positive p-values follow a beta distribution? Following the citations led to a lot of empirical confirmation but I couldn't find any discussion of why.

More to the point of this rebuttal, though: why would we expect the amalgamation of 70k microarray experiments' abstract-reported p-values to follow a single beta distribution? And what about modeling the bias-induced bump of barely-significant results?

If there's some theoretical reason why the meta-study can use the beta-uniform model, then I could see this being only a mild underestimation of the proportion of false positives (14%), but otherwise I'm confused how we can interpret this.
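For anyone else trying to pin the model down: if I'm remembering right, the usual reference is the Pounds & Morris "beta-uniform mixture" (BUM), which treats the observed p-values as a Uniform(0,1) component (true nulls) plus a Beta(a,1) component with a < 1 (true positives). I don't know that this is exactly what the meta-study fits, but a minimal sketch of a BUM fit on simulated p-values looks roughly like this:

    import numpy as np
    from scipy.optimize import minimize

    # Beta-uniform mixture: f(p) = lam + (1 - lam) * a * p**(a - 1), 0 < a < 1.
    # True nulls contribute the uniform part, true positives the Beta(a, 1) part.
    rng = np.random.default_rng(0)
    p = np.concatenate([
        rng.uniform(size=7000),           # true nulls
        rng.beta(0.2, 1.0, size=3000),    # true positives, piled up near 0
    ])

    def neg_log_lik(theta):
        lam, a = theta
        return -np.log(lam + (1 - lam) * a * p ** (a - 1)).sum()

    fit = minimize(neg_log_lik, x0=[0.5, 0.5], method="L-BFGS-B",
                   bounds=[(1e-6, 1 - 1e-6)] * 2)
    lam_hat, a_hat = fit.x
    # f(1) = lam + (1 - lam) * a is the usual upper bound on the null fraction.
    print(f"lambda={lam_hat:.2f}, a={a_hat:.2f}, "
          f"null-fraction upper bound={lam_hat + (1 - lam_hat) * a_hat:.2f}")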


I knew it wasn't for inspectors of physical buildings (though that took a minute to accept; I thought it might be a joke), but I could not figure out what kind of buildings it would be inspecting. Was it some lisp thing I hadn't heard of? I've been coming back to these comments waiting for someone to explain, and it finally hit me: it's for the building of inspectors. Language is weird.


As a chronic premature optimizer my first reaction was, "Is this even possible in vanilla python???" Obviously it's possible, but can you train an LLM before the heat death of the universe? A perceptron, sure, of course. A deep learning model, plausible if it's not too deep. But a large language model? I.e. the kind of LLM necessary for "from vanilla python to functional coding assistant."

But obviously the author already thought of that. The source repo has a great motto: "It don't go fast but it do be goin'" [1]

I love the idea of the project and I'm curious to see what the endgame runtime will be.

[1] https://github.com/bclarkson-code/Tricycle
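For a back-of-envelope on the heat-death question: the usual rule of thumb is ~6 FLOPs per parameter per training token, and naive pure-Python arithmetic manages maybe tens of MFLOP/s. All the concrete numbers below are my own guesses, not anything from the repo:

    # Rough training-time estimate: FLOPs ~= 6 * params * tokens (standard
    # transformer rule of thumb), divided by an assumed throughput.
    # Every number here is a guess for illustration.
    params = 100e6        # a small 100M-parameter model
    tokens = 2e9          # 2B training tokens
    flops_needed = 6 * params * tokens

    throughputs = {
        "pure-Python loops": 10e6,   # ~10 MFLOP/s (assumed)
        "Python + BLAS":     50e9,   # ~50 GFLOP/s if numpy does the matmuls (assumed)
    }
    for label, rate in throughputs.items():
        years = flops_needed / rate / (365 * 86400)
        print(f"{label}: ~{years:,.1f} years")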


Why wouldn't it be possible? You can generate machine code with Python and call into it with ctypes. All your deep learning code is still in Python, but at runtime it gets JIT-compiled into something faster.
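A minimal sketch of that route, in case it's not obvious how little machinery it takes: write C source from Python, shell out to a compiler, and load the result with ctypes. This assumes a cc on your PATH and skips error handling; a real system would use something like cffi or llvmlite instead.

    # Compile a tiny C kernel at runtime and call it via ctypes.
    import ctypes, os, subprocess, tempfile

    C_SRC = r"""
    double dot(const double *a, const double *b, long n) {
        double s = 0.0;
        for (long i = 0; i < n; i++) s += a[i] * b[i];
        return s;
    }
    """

    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "kernel.c")
    lib_path = os.path.join(workdir, "kernel.so")
    with open(src, "w") as f:
        f.write(C_SRC)
    subprocess.run(["cc", "-O3", "-shared", "-fPIC", src, "-o", lib_path], check=True)

    lib = ctypes.CDLL(lib_path)
    lib.dot.restype = ctypes.c_double
    lib.dot.argtypes = [ctypes.POINTER(ctypes.c_double)] * 2 + [ctypes.c_long]

    n = 4
    ArrayT = ctypes.c_double * n
    a = ArrayT(1.0, 2.0, 3.0, 4.0)
    b = ArrayT(5.0, 6.0, 7.0, 8.0)
    print(lib.dot(a, b, n))   # 70.0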


For me it makes it harder to read. I think it's because my skimming muscles are so tuned to both bold and first-sentence text that when presented such a stark opportunity, I naturally try to jump from one bold title to the next. But the bolded "titles" contain no substantive information, so I have to fight that urge, which means I can't skim at all, even though my attention reeeeally wants to jump.


I'm absolutely behind both the message and call to action of this article; the evidence is irrefutable; this comment should not be interpreted as calling into question the headline at all: the underlying data looks solid.

However. Why oh why did this author trash their own credibility by explicitly citing the "even more alarming [surge] at the state level" where "walker deaths had increased a shocking 266.67 percent in Nebraska, 150 percent in New Hampshire, and 87.5 percent in Delaware." They of course give the bare minimum caveat that "[t]hose dramatic numbers are partly explained by those states' small populations," but they are entirely explained by those states' small pedestrian fatality counts. Such as New Hampshire's increase from 2 to 5 fatalities. The other two "alarming" surges are equally unmentionable[1] in the light of actually reading the numbers.

I do have to give them credit for including the relevant numbers directly in an image in the article, so that I didn't have to do any digging to figure out why those states had such high increases. But that just makes it even more baffling that they didn't catch it themselves.

[1] Literally unmentionable, as in: those particular percentages cannot in good conscience be mentioned anywhere near a self-respecting argument for transportation reform.
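To spell out the arithmetic (the 2 -> 5 New Hampshire figure is from the article's own table; the alternative endings are hypothetical, just to show how unstable the percentage is at this scale):

    def pct_increase(before, after):
        return 100 * (after - before) / before

    print(pct_increase(2, 5))   # 150.0 -- the "alarming" New Hampshire surge
    print(pct_increase(2, 4))   # 100.0 -- one fewer death and it still "doubled"
    print(pct_increase(2, 2))   #   0.0 -- at 2 -> 2 there's no story at all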


Yes. "From 2019-2022, pedestrian deaths tripled in Vermont!" sounds alarming and then you see it went from 1 to 3. It's still bad that people died, and bad that it's more, but that would really be reaching for clickbait territory.

And is it even a rate at all? They give number of fatalities, but how does that compare to the number of pedestrians? Or rather number x time spent walking? Car statistics are often given in terms of total miles traveled, for instance. A total is not a rate.

Totally agree our streets are way too dangerous, and being a pedestrian can be terrifying. That really needs to be fixed.

But not in a misleading way. If anything, things like this might scare people out of walking, meaning fewer pedestrians, and therefore less reason to solve it. That's counterproductive.


Imagine if a state had gone from 0 to 1 pedestrian deaths. An infinity percent increase. Walking anywhere would be a death sentence.


Streetsblog has a tendency to use infrequent/irrelevant traffic deaths and such to paint pictures about wider things. Things like, using hit and runs at stop signs to argue about speed limits. They did that a lot in Chicago when the city was pushing to make the speed cameras more sensitive, ticketing at 5 mph over the limit instead of 10. It makes it really difficult to take their writing (about genuine problems) seriously.


> Things like, using hit and runs at stop signs to argue about speed limits.

That doesn't sound irrelevant to me. Drivers around my area have a tendency to not quite stop at stop signs before rolling into the crosswalk. If they're going slower they have more ability to react to a pedestrian stepping out into the crosswalk.
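The stopping-distance arithmetic backs that up: total distance is reaction distance (v*t) plus braking distance (v^2 / 2a). The reaction time and deceleration below are generic textbook values, not measurements:

    # Stopping distance = reaction distance + braking distance = v*t + v^2/(2a).
    # t ~ 1.5 s reaction and a ~ 7 m/s^2 dry-pavement braking are assumptions.
    MPH_TO_MS = 0.44704

    def stopping_distance_m(speed_mph, reaction_s=1.5, decel=7.0):
        v = speed_mph * MPH_TO_MS
        return v * reaction_s + v ** 2 / (2 * decel)

    for mph in (10, 20, 30):
        print(f"{mph} mph -> {stopping_distance_m(mph):.1f} m to stop")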


> infrequent/irrelevant traffic deaths

I’m all on board the accuracy train with you, but I must say that describing any kind of traffic death as “irrelevant” seems very cold. Every person who dies on the road matters.


Of course, but when you stretch the truth, you lose the message and lose the people you need to reach the most. To me, that sort of truth stretching is way colder than I could ever be.


> Why oh why did this author trash their own credibility...

I follow Streetsblog regularly because, like you, I also think the message they are advocating for is important and more or less completely neglected in the United States, though a few cities/states are getting better. Cops regularly victim-blame when pedestrians or cyclists are hit by cars, and people who kill others with their cars often face no consequences whatsoever. When I was still living in the US, I didn't feel my children were safe in traffic, ever. They do this stuff so much better in many other parts of the world.

That being said, that blog's writing style is really grating at times. They often raise these sorts of scandalizing points in posts that are otherwise on point. I don't know why; maybe to produce sound bites, maybe because the authors are so personally invested in the topic they can't help but exaggerate, or whatever.

It's really annoying.


It's hard to say what the hardest constraint will be, at this point. Imaging and scanning are definitely hard obstacles; right now even computational power is a hard obstacle. There are 100 trillion synapses in the brain, none of which are simple. It's reasonable to assume you could need a KB (likely more tbh) to represent each one faithfully (for things like neurotransmitter binding rates on both ends, neurotransmitter concentrations, general morphology, secondary factors like reuptake), none of which is constant. That means 100 petabytes just to represent the brain. Then you have to simulate it, probably at submillisecond resolution. So you'd have 100 petabytes of actively changing values every millisecond or less. That's 100k petaflops (100 exaflops) at a bare, bare, baaaare minimum, and more likely closer to a zettaflop.

This ignores neurons since there are only like 86 billion of them, but they could be sufficiently more complex than synapses that they'd actually be the dominant factor. Who knows.

This also ignores glia, since most people don't know anything about glia and most people assume they don't contribute much to computation. Of course, when we have all the neurons represented perfectly, I'm sure we'll discover the glia need to be in there, too. There are about as many glia as neurons (3x more in the cortex, the part that makes you you, colloquially), and I've never seen any estimate of how many connections they have [1].

Bottom line: we almost certainly need exaflops to simulate a replicated brain, maybe zettaflops to be safe. Even with current exponential growth rates [2] (and assuming brain simulation can be simply parallelized (it can't)), that's like 45 years away. That sounds sorta soon, but I'm way more likely to be underestimating the scale of the problem than overestimating it, and that's how long until we can even begin trying. How long until we can meaningfully use those zettaflops is much, much longer.

[1] I finished my PhD two months ago and my knowledge of glia is already outdated. We were taught glia outnumbered neurons 10-to-1: apparently this is no longer thought to be the case. https://en.wikipedia.org/wiki/Glia#Total_number

[2] https://en.wikipedia.org/wiki/FLOPS#/media/File:Supercompute...
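If anyone wants to poke at the arithmetic, here's the back-of-envelope behind those numbers (every constant is an assumption, which is rather the point):

    # Back-of-envelope for the storage / compute estimates above.
    synapses = 100e12           # ~100 trillion synapses
    bytes_per_synapse = 1e3     # ~1 KB of state each (probably an underestimate)
    state_bytes = synapses * bytes_per_synapse
    print(f"state: {state_bytes / 1e15:.0f} PB")              # ~100 PB

    timestep_s = 1e-3           # submillisecond would only make this worse
    ops_per_byte_per_step = 1   # at minimum, touch every value once per step
    flops = state_bytes * ops_per_byte_per_step / timestep_s
    print(f"compute: {flops / 1e18:.0f} EFLOP/s at minimum")  # ~100 exaflops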


I remember reading a popular science article a while back: apparently we managed to construct the complete neural connectome of C. elegans (a nematode) some years ago, and scientists were optimistic that we would be able to simulate it. The article was about how this had failed to materialize because we don't know how to properly model the neurons and, in particular, how they (and the synapses) evolve over time in response to stimuli.


"...a sorta-popular JavaScript package. I say “sorta” because it’s used by lots of people, but it’s not pervasive. It had 105 million downloads in 2022."

Is there any standard of popularity where 100 million+ downloads in a year is only "sorta" popular?


I think companies don't set up local caches. So all those millions of downloads could just be a few companies running tests on every commit.
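Rough numbers on how little it takes (everything here is invented for illustration):

    # How many uncached CI pipelines does it take to reach ~105M downloads/year?
    # All numbers are made up for illustration.
    builds_per_day = 100        # one busy repo's CI, re-downloading on every build
    per_repo_per_year = builds_per_day * 365   # ~36,500 downloads

    target = 105e6
    print(f"busy repos needed: {target / per_repo_per_year:,.0f}")
    # ~2,900 repos -- a handful of large orgs could cover that on their own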


I think GitHub stars tell you much more about popularity. NPM download counts are skewed toward ubiquitous dependencies like lodash.


90% of my github stars are projects which I skimmed over and thought looked cool and I might download one day.


Which means it is a project you showed some basic level of interest in. A download might be a dependency of a dependency of a dependency you don't really care about and would actually like to get rid of ...


Ditto. I use them to (hopefully) send good vibes to the creator / maintainer - you never know, they may need the tiny endorphin hit!


Indeed. "13% as popular as React". Sheesh.


It might just be a humblebrag


Well, I didn't know about Helmet.js, but I've seen React mentioned many times on HN.


They can't possibly be working with genomes spanning the past 250k years: the oldest known human remains are only estimated to be ~230k years old, and I doubt they have parent/child trios nicely spanning the intervening few hundred thousand years. So they have to be working from inferences based on mutation rates.

From a non-expert reading of the article, their pipeline is more like: modern human population genomes -> estimates of modern mutations' ages -> estimate of average historical parental ages.

The first estimation (genome -> mutation ages) was carried out in a prior study, "GEVA" [1], and this paper's contribution seems to be estimating average parental ages from those previously estimated mutation ages.

I couldn't find any mention of using old DNA samples. The GEVA study pulls from two genome databases, TGP [2] and SGDP [3], both of which seem to be entirely modern genomes. I'm not an expert in the field, so maybe it's obvious to a population geneticist that these databases do include old genomes.

Given that they're only using modern (surviving) genomes, the critique of survivorship bias seems valid.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6992231/

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750478/

[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5161557/
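Not the paper's actual code, but to make the "mutation ages -> parental ages" step concrete, here's a toy version of that kind of inversion: pretend we've calibrated how the de novo mutation spectrum shifts with parental ages (in reality that calibration comes from modern trio data), then solve for the parental ages that best reproduce the spectrum observed in a bin of dated variants. Every coefficient and spectrum below is invented:

    import numpy as np
    from scipy.optimize import least_squares

    # fraction of each mutation class = intercept + b_pat*paternal + b_mat*maternal
    # classes (in order): C>T at CpG, C>T other, T>C, other -- all values invented
    intercept = np.array([0.18,    0.30,    0.25,    0.27])
    b_pat     = np.array([-0.0010, 0.0008,  0.0004, -0.0002])  # per year of father's age
    b_mat     = np.array([ 0.0012, -0.0006, -0.0003, -0.0003]) # per year of mother's age

    def predicted_spectrum(ages):
        pat, mat = ages
        s = intercept + b_pat * pat + b_mat * mat
        return s / s.sum()

    # "Observed" spectrum for some bin of dated variants; generated from the
    # model with paternal ~ 35 and maternal ~ 28, so the fit should land near there.
    observed = np.array([0.179, 0.311, 0.256, 0.254])

    fit = least_squares(lambda ages: predicted_spectrum(ages) - observed,
                        x0=[30.0, 28.0], bounds=([15, 15], [60, 60]))
    print("inferred (paternal, maternal) ages:", fit.x.round(1))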


The author of The Wandering Inn [1], a web serial, has been trying to write less the past few months (the past year?). They peaked around 80k words/week, and have been aiming closer to 30-40k words/week with more breaks. They've had mixed success (e.g. last week they wrote and released 70k).

They only post updates/reflections on their progress towards this goal in the author's notes on the biweekly chapter releases, so I'm not sure how you could follow along, if that's what you're looking for. I guess you could just follow the releases and only read the notes. Or you could try catching up haha.

[1] https://wanderinginn.com/


Sure, this is ideal given unlimited time and money, but hiring dedicated devs for tooling is going to be expensive. And once they fix "the problem," they're still on payroll. Yay they'll keep fixing problems and improving the software, but it's basically the most expensive software subscription model imaginable.


Good thing they invented contracting… several hundred years ago?


Contracting requires the devs to get up to speed on the software and your company needs. On the other hand, a private company selling commercial software has a comparative advantage in already knowing the codebase and user needs.

10 companies each contracting 1 dev for an open-source project is a less-efficient allocation of resources than 1 company hiring 10 devs for their commercial software project.


Lost in the noise when the community is large enough.


I'd appreciate it if you could provide some evidence for your claim that comparative advantage is "lost in the noise."

Efficient allocation of scarce resources is best achieved with comparative advantages. A commercial software team has shared context, management, and knowledge that cannot be as efficiently achieved by a decentralized community of contributors. So the commercial team can produce the same software at a cheaper cost. This is a good thing for the economy.


Design tools are a multi-billion dollar market in just the US, and useful worldwide. Potential resources are not even a bit scarce.

Figma already did the hard work of prod/tech design and fixing browsers, meaning followers will have a much easier path. https://madebyevan.com/figma/building-a-professional-design-...

The same short-term thinking has been espoused at the dawn of every innovation. Thankfully some folks don’t listen.

