Thanks, awesome! So what do molecular biologists do with these 3D representations once they have them? Do they literally just see how they fit to other proteins?
There are many uses for structure. Personally, I find the 3D structures to be useful as a mental guide for picturing things, and certainly people do try to "dock" proteins that have complementary structures, but unfortunately, the biophysics of protein complexes suggests that the conformational change on binding is so large that the predicted structures aren't super-helpful.
Certainly, in a corpo like mine (Genentech/Roche) protein structures have a long history of being used in drug discovery, not typically as a simple "dock a ligand to a protein" exercise but more for constructing lab experiments that help elucidate the actual mechanistic biology going on. That is only a tiny part of a much larger process of working on disease targets to come up with effective treatments. Genentech is different from most pharma in that its treatments are themselves typically proteins, rather than small molecules.
I think many people would say that in principle, you could make a QM force field with an accurate enough basis function that an infinitely long simulation would recapitulate the energy landscape of a protein, and that information could be used to predict the kinetically accessible structures the protein adopts.
In practice, the force fields are well understood, but to be computationally efficient they have to approximate just about everything. Examples: since the number of interatomic distance pairs grows as N**2 in the number of atoms, you need tricks to avoid that and instead scale as roughly N log N, or even N if you can manage it. When I started, we just neglected atoms more than 9 angstroms apart, but for highly charged molecules like DNA, that leads to errors in the predicted structure. Next, the force fields typically avoid simulating polarizability (the ability of an atom's electron cloud to be drawn towards another atom with opposite charge), again because it is expensive. They use simplified spring models (literally Hooke's law) for bond lengths and bond angles. The torsions (the angle formed by 4 atoms in a row) have a simplified form. The interatomic relationships are not handled in a principled way, instead treating atoms as mushy spheres...
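To make two of those approximations concrete, here is a minimal Python sketch, not taken from any real force field. It shows a Hooke's-law bond term and a brute-force distance cutoff at 9 angstroms. The function names, the spring constant, the equilibrium length, and the toy coordinates are all made up for illustration, and the pair scan is still O(N**2); production codes use neighbor lists or grid methods to get the better scaling.

```python
import math
from itertools import combinations


def harmonic_energy(x, x0, k):
    """Hooke's-law term: E = 0.5 * k * (x - x0)**2 (used for bonds and angles)."""
    return 0.5 * k * (x - x0) ** 2


def pairs_within_cutoff(coords, cutoff=9.0):
    """Brute-force scan over all pairs; keep only those closer than `cutoff`.

    This is the naive O(N**2) version, shown only to illustrate the cutoff.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) < cutoff
    ]


# Hypothetical toy system: four "atoms", coordinates in angstroms.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 8.0, 0.0), (20.0, 0.0, 0.0)]

n = len(coords)
all_pairs = n * (n - 1) // 2          # number of distinct pairs, grows as N**2
kept = pairs_within_cutoff(coords)    # pairs that survive the 9 angstrom cutoff

# Hypothetical bond: equilibrium length 1.5, stretched to 1.6, k = 300 (arbitrary units).
bond_energy = harmonic_energy(1.6, 1.5, 300.0)

print(all_pairs, kept, bond_energy)
```

The atom at x = 20 drops out of every pair, which is exactly the kind of long-range interaction that matters for highly charged molecules and that a plain cutoff throws away.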
After having made major contributions in this area, I don't think that improvements to force fields are going to be the most effective investment in time and energy. There are other bits of data that can get us to accurate structures with less work.
Yes, that's a fantasy world. I explored this using the Exacycle system at Google and we did actually do a couple of things that nobody else could have at the time, but even that extraordinary amount of computing power really is tiny. The problem is that the "force field" isn't just the enthalpic contributions I listed above, but also depends intimately on much more subtle entropic details: things like the cost of rearranging water into a more ordered structure have to be paid for. Estimating those is very expensive, far worse than just enumerating over large numbers of proteins "in vacuo", and probably cannot be surmounted unless quantum computing somehow becomes much better.
Instead, after spending an inordinate amount of Google's revenue on extra energy, I recommended that Google apply machine learning to protein structure prediction and just do a better job of extracting useful structural information (note: this was around the time CNNs were in vogue, and methods like Transformers didn't exist yet) from the two big databases (all known proteins/their superfamily alignments, and the PDB).
Note that this conclusion was a really hard one for me, since I had dedicated my entire scientific career up to that point to attempting to implement that fantasy world (or a coarse approximation of it), and my attempts at having people develop better force fields (ones that didn't require as much CPU time) using ML weren't successful. What DeepMind did was, in some sense, the most parsimonious incremental step possible to demonstrate their supremacy, which is far more efficient. Also, once you have a trained model, inference is nearly free compared to MD simulations!
That's interesting. Thanks for the info. They're getting better at Quantum.
It's going to be fascinating to see the future of this field and all the potential medicine waiting to be discovered and the lifespan improvements and just sheer biological discoveries.
It feels almost like the new panning for gold. :)
It's pretty crazy to see how human advancement parallels computing power in so many areas.
A structure is basically another tool for producing hypotheses. In my case, I often use structures to predict the effects of genetic lesions. If your protein has a clearly defined active site, you can get a rough sense of where on the enzyme that active site is relative to other mutations. Often residues that are distant in sequence end up right next to each other in the folded structure, so certain residues can have unexpected roles.
It gives a picture of the enzyme as a machine, and lets you look at specific parts and say “this residue is probably doing this job in the whole system”.
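The point above about residues that are distant in sequence ending up next to each other in the fold can be sketched as a simple contact search. This is only an illustration: the coordinates are a made-up hairpin, not a real protein, and the 8 angstrom distance threshold and minimum sequence separation of 6 are assumed values, not a standard taken from this thread.

```python
import math


def long_range_contacts(ca_coords, max_dist=8.0, min_seq_sep=6):
    """Return (i, j) residue index pairs that are far apart in sequence
    (at least `min_seq_sep` positions) but close in space (<= `max_dist`)."""
    contacts = []
    for i in range(len(ca_coords)):
        for j in range(i + min_seq_sep, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= max_dist:
                contacts.append((i, j))
    return contacts


# Hypothetical alpha-carbon trace of a 9-residue hairpin (angstroms):
# residues 0-3 run out along x, residue 4 is the turn, residues 5-8 run back.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (13.0, 2.5, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

contacts = long_range_contacts(hairpin)
print(contacts)
```

In this toy trace, the two ends of the chain come out as contacts even though they are seven or eight positions apart in sequence, which is the kind of relationship you would check when asking whether a mutation sits near an active site.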
Often the ribbons (alpha-helices and beta-sheets) form "protein domains". Canonically, these are stable, folded structures with conserved shapes and functions that serve as the building blocks of proteins, like Lego pieces. These protein domains can be assembled in different ways to form proteins with different functions. Different protein domains that have the same evolutionary origin have conserved structure even when the underlying amino acid sequence, or DNA sequence, has changed beyond recognition over millions of years of evolution.
In other words, molecular biologists use structure as a proxy for function.
Looking at how the same protein domains work in different proteins in different species can give us clues as to how a protein might work in human biology or disease.
Basically, the shape of the protein determines how it interacts with other things. So knowing the structure enables better prediction of how the pathways it is involved in work and how other things (say, potential drugs) would affect that pathway.