Not sure if you should be reminded of how AlphaFold started: it started by winning a competition thought unwinnable by academics. Top labs working in protein structure prediction have fundamentally changed direction after AlphaFold and are working to do the same even better.

This is not the first (or even tenth) time I'm seeing an academic trying to undermine genuine progress, almost to the level of gaslighting. Comparing AlphaFold to conventional homology modeling is disingenuous at its most charitable interpretation.

Not sure what else to say. Structural biology has always been the weirdest field I've seen, the way students are abused (crystallize and publish in Nature or go bust), and how every Nature issue will have three structure papers as if that cures cancer every day. I suppose it warps one's perception of outsiders after being in such a bubble?

signed, someone with a PhD in biomedical engineering who did a ton of bio work.



> Not sure if you should be reminded of how AlphaFold started: it started by winning a competition thought unwinnable by academics. Top labs working in protein structure prediction have fundamentally changed direction after AlphaFold and are working to do the same even better.

Not sure what part of "it does homology modeling 2x better" you didn't see in my comment? AlphaFold scored something like 85% in CASP in 2020; in CASP 2016, I-TASSER had, I think, 42%. So it's ~2x as good as I-TASSER, which is exactly what I said in my comment.

> This is not the first (or even tenth) time I'm seeing an academic trying to undermine genuine progress, almost to the level of gaslighting. Comparing AlphaFold to conventional homology modeling is disingenuous at its most charitable interpretation.

It literally is homology modeling. The deep learning aspect is to boost otherwise unnoticed signal that most homology modeling software couldn't tease out. Also, I don't think I'm gaslighting, but maybe I'm wrong? If anything, I felt gaslit by the language around AlphaFold.

> Not sure what else to say. Structural biology has always been the weirdest field I've seen, the way students are abused (crystallize and publish in Nature or go bust), and how every Nature issue will have three structure papers as if that cures cancer every day. I suppose it warps one's perception of outsiders after being in such a bubble?

What on earth are you even talking about? The vast, VAST majority of structures go unpublished ENTIRELY, let alone published in Nature. There are almost 200,000 structures on deposit in the PDB.


What ramraj is talking about: if you go into a competitive grad program to get a PhD in structural biology, your advisor will probably expect that in 3-4 years you will: crystallize a protein of interest, collect enough data to make a model, and publish that model in a major journal. Many people in my program could not graduate until they had a Nature or Science paper (my advisor was not an asshole, I graduated with just a paper in Biochemistry).

In a sense both of you are right - DeepMind is massively overplaying the value of what they did, trying to expand its impact far beyond what they actually achieved (this is common in competitive biology), but what they did was such an improvement over the state of the art that it's considered a major accomplishment. It also achieved the target of CASP - which was to make predictions whose scores are indistinguishable from those of experimentally determined structures.

I don't think academics thought CASP was unwinnable, but most groups were very surprised that an industrial player using 5-year-old tech did so well.


To add to this, the deep learning field has already moved on towards MSA-less structure prediction. None of this would be possible without building on top of the work open-sourced by DeepMind.

https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1 https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1

To be overly dismissive is to lack imagination.


How do we know these "MSA-less" models aren't cheating (i.e. learning all MSAs implicitly from their training data)? If they are, they would similarly fail on any "novel" AA sequence (i.e. one without known/learned MSAs).


> What ramraj is talking about: if you go into a competitive grad program to get a PhD in structural biology, your advisor will probably expect that in 3-4 years you will: crystallize a protein of interest, collect enough data to make a model, and publish that model in a major journal.

All of that applies to molecular biology in general, and I don't see how the field of structural biology is especially egregious, the way ramraj is making it out to be.


Protein crystallization can be very difficult and there is no general solution. Kits that screen for crystal growth conditions usually help, but optimization is needed in most cases. Then, that crystal must have certain properties that allow for good data acquisition at the X-ray facility. That's another problem in itself, and months or years can pass before you get a suitable protein crystal and an X-ray diffraction dataset from which you can model your structure.


I'm familiar with protein crystallization and the difficulties associated with it. What I don't agree with is the characterization of the field as especially difficult, above and beyond modern biology in general. Nor can I support the assertion that structural biology students are subject to special abuse that regular grad students are not.

> ... can be very difficult and there is no general solution

This is true of pretty much any graduate work in molecular biology.


> Nor can I support the assertion that structural biology students are subject to special abuse that regular grad students are not.

I didn’t say anything regarding that.

> This is true of pretty much any graduate work in molecular biology.

Just to elaborate on my point: the process of protein crystallization is not understood at a level that allows the design of general, reproducible protocols. This inherent obscurity means that every new protein needs to undergo an ad hoc, heuristic, iterative process to obtain high-quality crystals. This is an early methodological hurdle, at a stage where other routine procedures in biochemistry or molecular biology are usually successful.


I said that. We had a saying in grad school, "the very best protein structures are crystallized from postdoc tears".


As a current postdoc (genetics) I think postdoc tears are the fuel that academia runs on - as well as those of our significant others and kids.


> I didn’t say anything regarding that.

I know you didn't - this was one of the claims of ramraj I was responding to.

> The process of protein crystallization is not understood at a level that allows the design of general, reproducible protocols. This inherent obscurity means that every new protein needs to undergo an ad hoc, heuristic, iterative process to obtain high-quality crystals. This is an early methodological hurdle, at a stage where other routine procedures in biochemistry or molecular biology are usually successful.

I don't disagree, though I would suggest that there's just as much grunt work, frustration, and hand-wringing in other fields of molecular biology at the graduate level and above. Even if other fields have reproducible protocols established, that's not what gets papers published. With the possible exception of clinical samples, more often than not we have no clue whether the analyses we're doing will yield anything, and the high-risk zone is where all grad students live.


In most other subfields, publishing doesn't hinge on exactly one endpoint coming to pass. I know I didn't have anything like that, and most of my non-crystallographer friends didn't either.

There are a lot of structural biology apologists here in this thread: happy to crap on DeepMind, but not ready to take criticism of their own field.

For anyone outside of the field wanting to learn more, check out this documentary: https://en.m.wikipedia.org/wiki/Naturally_Obsessed


> In most other subfields, publishing doesn't hinge on exactly one endpoint coming to pass. I know I didn't have anything like that, and most of my non-crystallographer friends didn't either.

How is this a problem unique to structural biology? In every subfield we're hoping to publish interesting results, and that endpoint is defined by the nature of the field. As a geneticist: in the early 90s, sequencing and characterizing a single bacterial gene would have been the focus of an ambitious PhD thesis and would yield multiple papers. Sequencing in that era had a dozen points of failure and was a high-risk goal to set for a thesis. Today, sequencing a whole genome is unlikely to yield even a single publication. If you're setting the ability to crystallize as the single point of failure endpoint, that logic applies to every subfield. We all have something that could potentially derail our plans, and I fail to see how structural biology is unique in that respect.

> There are a lot of structural biology apologists here in this thread: happy to crap on DeepMind, but not ready to take criticism of their own field.

I'm not a structural biologist - I'm a geneticist who disagrees with your characterization of SB. The issues you've mentioned are not unique to SB; they apply to pretty much all subfields. I see grad students in general lament their life choices when their cell cultures fail, their mice die, their protocols just don't work, or their results just don't make sense.


> If you're setting the ability to crystallize as the single point of failure endpoint, that logic applies to every subfield.

I agree that there are other fields with similar issues. What baffles me is how long protein crystallization has been a problem.

I’ll use your example:

Nowadays, sequencing a gene is unlikely to yield a single publication by itself, but it is no longer an early point of failure. It's a solved problem, with protocols that have been thoroughly developed and explained to the point of boredom. New early points of failure arise (sample-related, maybe?).

Nowadays, determining the structure of a protein is unlikely to yield a single publication by itself, but it still has a clear, early, unsolved point of failure. There is no understandable protocol other than buying $creening plate$, fetching cat whiskers, drawing a theoretical phase diagram that tells you nothing, and praying that your crystallization tray doesn't show a scrambled egg tomorrow or in six weeks. This has been an issue for more than fifty years and almost 200k published structures. The jump you mentioned in sequencing hasn't happened yet in protein crystallography, and might never happen, because our understanding of macromolecular crystallization is lacking and thus we cannot predict proper crystallization conditions.


Sure, I agree that crystallization has faced this particular bottleneck for a long time. The field of SB, however, has still managed to advance massively: for example, cryo-EM can do things we could barely imagine a decade ago.

The point I'm trying to make is that from the perspective of a grad student, no field is devoid of risk, and it's surprisingly easy to be stuck by something that's a solved problem on paper. For example, I know of a grad student who's been trying to develop a mouse line for about a year and has now discovered that this strain just won't work for what they have in mind - so they must now recreate the mutant combinations in a different strain, which is at least another year's work, if it even works. I've heard stories of entire mouse lines dying, and you're back to square one - years of work lost.

The other thing that complicates some of these fields is the massive pace of innovation they're undergoing; it is very hard for an individual lab to keep up to date. Grad students are using techniques that were published less than 5 years ago, and there's no locally available expertise to tap into. What remains the same is the level of grunt work grad students and postdocs have to do, even as the techniques get more sophisticated over time.


I did rotations in multiple types of lab as part of my program, and I can't say I ever found that students in regular molecular biology labs had nearly as hard a time as structural biologists; SB is its own class of hell. Given the number of papers published in molecular biology that turn out to be "gel was physically cut and reassembled to show the results the authors desired" (it's much harder to cheat on a protein structure)...


I think this is highly subjective and that every field has its own special hells. For example, in computational biology it's a lot easier to generate results (when things actually work) but conversely it's a lot harder to convince journals. The burden of proof required to publish is sometimes ridiculously high - I had a paper spend almost 3 years in review.


Hear, hear. This is probably the best take.


> Not sure what part of "it does homology modeling 2x better" you didn't see in my comment? AlphaFold scored something like 85% in CASP in 2020; in CASP 2016, I-TASSER had, I think, 42%. So it's ~2x as good as I-TASSER, which is exactly what I said in my comment.

Wait, stop, I don't know anything about proteins but 84% success is not ~2x better than 42%.

It doesn't really make sense to talk about 2x better in terms of success percentages, but if you want a feel, I would measure 1/error instead (a 99% correct system is 10 times better than a 90% correct system), making AlphaFold around 3.6 times better.
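
A quick sanity check, as a minimal Python sketch (using the 84% and 42% figures from upthread):

    # inverse-error metric: 1 / (1 - p)
    af, itasser = 0.84, 0.42
    print((1 / (1 - af)) / (1 / (1 - itasser)))  # ~3.6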


I think odds ratio ( p/(1-p) ) is the thing I'd use here. It gives the right limiting behavior (at p ~= 0, doubling p is twice as good, and at p ~= 1, halving 1-p is twice as good) and it's the natural way to express Bayes' rule, meaning you can say "I'm twice as sure (in odds ratio terms) based on this evidence" and have that be solely a property of the update, not the prior.


For the lazy: this would make AlphaFold 7.25x better than the previous tools.
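
For anyone who wants to check the arithmetic, a minimal Python sketch (assuming the 84% and 42% figures quoted upthread):

    # odds ratio: p / (1 - p)
    def odds(p):
        return p / (1 - p)

    print(odds(0.84) / odds(0.42))  # ~7.25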


Excellent comment. I think the issue is that "better" is underspecified and needs some precisification to be useful. The metric you are using here is the proper response to the question "how many times more surprising is it when method A fails than method B?". This is in many cases what we care about. Probably, it's what we care about here. The odds ratio seems to do a good job of capturing the scale of the achievement.

On the other hand, it's not necessarily the only thing we might care about under that description. If I have a manufacturing process that is 99.99% successful (the remaining 0.01% has to be thrown out), it probably does not strike me as a 10x improvement if the process is improved to 99.999% success. What I care about is the cost to produce the average product that can be sent to market, and this "10x improvement" changes that only a very small amount.
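
To make that concrete, a toy Python sketch (the unit cost here is a made-up assumption):

    unit_cost = 1.00  # hypothetical cost to produce one unit, good or bad
    before = unit_cost / 0.9999   # cost per sellable unit at 99.99% yield
    after = unit_cost / 0.99999   # cost per sellable unit at 99.999% yield
    print(1 - after / before)     # ~0.00009, i.e. only ~0.009% savings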


TIL, thanks for this.


> AlphaFold scored something like 85% in CASP in 2020; in CASP 2016, I-TASSER had, I think, 42%. So it's ~2x as good as I-TASSER

As someone who doesn't know proteins, but is decent at math, I would not describe it this way. You are assuming a linear relationship between effort and value, but more often than not, effort has diminishing returns. 80 dB is not 2x as loud as 40 dB. An 8K image doesn't have 2x the fidelity of a 4K image. If Toyota unveiled a new engine that was 60% efficient tomorrow, no one in their right mind would say "eh, it's just 2x better". If we came out with a CPU that could clock up to 10 GHz, we wouldn't say "meh, that's just 2x what we had".

Without being able to define the relationship here, I could just as well say that 85% is 1000x better than 42%. There's just no way to put a number on it. What we can say is that we completely blew all projections out of the water.

Again, I'm not someone working with proteins, but to me it sounds as revolutionary as a 60%+ efficient engine or a 10 GHz CPU. No one saw it coming or thought it feasible with current technology.


I think the debate between "does amazing on metric X" versus "doesn't really understand the problem" reappears many places and doesn't have any direct way to be resolved.

That's more or less because "really understands the problem" generally winds up being a placeholder for things the system can't do. Which isn't to say it's not important. One thing that is often included in "understanding" is the system knowing the limits of its approach - current AI systems have a harder time giving a certainty value than giving a prediction. But you could have a system that satisfied a metric for this, and other things would pop up - for example, what kind of certainty or uncertainty are we talking about (crucial for decision making under uncertainty).


> Comparing AlphaFold to conventional homology modeling is disingenuous at its most charitable interpretation.

It's really not - have you played around with AF at all? Made mutations to protein structures and asked it to model them? Go look up the predicted structures for important proteins like FOXA1 [1], AR [2], EWSR1 [3], etc. (i.e. pretty much any protein target we really care about and haven't previously solved) and tell me with a straight face that AF has "solved" protein folding - it's just a fancy language model that's pattern-matching to things it's already seen solved before.

signed, someone with a PhD in biochemistry.

[1] https://alphafold.ebi.ac.uk/entry/P55317 [2] https://alphafold.ebi.ac.uk/entry/P10275 [3] https://alphafold.ebi.ac.uk/entry/Q01844


I can see the loops in these structures. I don't see the problem. It still added a structure to every EMBL page, and people are free to judge the predictions themselves. For all I care (ostensibly as the end customer of these structures), I don't mind having a low-confidence structure for any arbitrary protein at all. It's only marginally less useful to actual biology than full-on X-ray structures anyway.


> It's only marginally less useful to actual biology than full-on X-ray structures anyway.

I'm not sure what you're implying here. Are you saying both types of structures are useful, but not as useful as the hype suggests, or that an X-ray crystal (XRC) structure and a low-confidence structure are both very useful, with the XRC being marginally more so?

An XRC structure is great, but it's a very (very) long way from getting me to a drug. Observe the long history of fully crystallized proteins still lacking a good drug. Or this piece on the general failure of purely structure-guided efforts in drug discovery for COVID (https://www.science.org/content/blog-post/virtual-screening-...). I think this tech will certainly be helpful, but for most problems I don't see it being better than a slightly-more-than-marginal gain in our ability to find medicines.

Edit: To clarify, if the current state of the field is "given a well understood structure, I often still can't find a good medicine without doing a ton of screening experiments" then it's hard to see how much this helps us. I can also see several ways in which a less than accurate structure could be very misleading.

FWIW I can see a few ways in which it could be very useful for hypothesis generation too, but we're still talking pretty early stage basic science work with lots of caveats.

Source: PhD Biochemist and CEO of a biotech.


This isn’t a good use of the term gaslighting. Accusing someone of gaslighting takes what we used to call a ‘difference of opinion’ and mutates it into deliberate and wicked psychological warfare.

Incidentally, accusing someone of gaslighting is itself a form of gaslighting.


Well, it can be gaslighting but not always. A knowingly false accusation, repeated often enough and in a way to make the accused question their own perception of reality, would be gaslighting.


Not only is CASP not "unwinnable," it's not even a contest. The criteria involved are rated as "moderately difficult." AlphaFold is a significant achievement, but it sure as hell hasn't "revealed the structure of the protein universe," whatever that means.

Which top labs have changed direction? Because AlphaFold can't predict folds, just identify ones it's seen.



