That's absolutely true, but also keep in mind the limitations of both the examples you give.
Protein folding works pretty OK for short proteins on the order of ~200 amino acids. It gets considerably less good as the proteins get bigger. This is unsurprising: bigger proteins represent an exponentially bigger search space, and there are fewer and fewer known protein structures as protein size increases, so your library of exemplars is smaller (hurting ML approaches). Let's not even speak of protein complexes or dynamic protein interactions. Even at the end of the prediction, you still have to prove your result by getting an actual crystal which is not always easy.
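To put a rough number on "exponentially bigger", here is a back-of-the-envelope sketch. It assumes each residue can take a small fixed number of backbone conformations; the value 3 and the function name are illustrative assumptions, not measured quantities, and real folding does not sample conformations independently like this.

```python
import math

def log10_conformations(n_residues: int, states_per_residue: int = 3) -> float:
    """Return log10 of the naive conformation count states_per_residue ** n_residues.

    states_per_residue is a hypothetical constant chosen for illustration only.
    """
    return n_residues * math.log10(states_per_residue)

# Doubling the chain length squares the size of the naive search space.
for n in (100, 200, 400):
    print(f"{n} residues: roughly 10^{log10_conformations(n):.0f} conformations")
```

The exact base doesn't matter much. Whatever per-residue count you assume, the exponent grows linearly with length, so the space itself grows exponentially.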
Cell biology simulation is a whole other ball game, and I don't know that area nearly as well. But from the little I understand, we have some very basic mathematical models of a cell that work in a limited way at a very high level. To get those models to work, a number of extremely simplifying (but useful in some contexts) assumptions are made, which limits the broader applications of those simulations. Simulating a whole cell very quickly becomes computationally intractable given the enormous number of entities in the cell. It's a fiendishly hard problem. I hope we get there someday, but I think it'll be a long while.
> Even at the end of the prediction, you still have to prove your result by getting an actual crystal which is not always easy.
From my experience, this might be the largest hurdle. At least in biology, beyond proving your result, a lot of the experimentalists that you're talking to likely think that "all of this simulation stuff is complete bullshit." That might be a generational thing, though.
I think it's a pretty giant hurdle, and I don't think it's a strictly generational thing. Having been on both sides of the computational/experimental divide, I still fundamentally can't trust a result that has no experimental validation. At some point, even if you believe the computation's result, it has to be proved out in the physical world to be useful.
I disagree on protein folding. It seems to me that AlphaGo Zero is a great analogy to the protein folding problem: if you have a good algorithm for folding 100 amino-acid proteins, you can use it to brute-force folds for 110 amino-acid proteins, then use those as a training set to develop a good algorithm for 110 amino-acid proteins, then continue iterating in this way (or so I hope).
I think AlphaGo is a fundamentally different problem from protein folding. I can generate a Go board and a sequence of strictly valid moves because I know the rules for Go. With certain limited high-level exceptions (like the sequence is linear, you can't have bond angles of a certain degree, etc.), those rules don't exist as such for protein folding. In other words, if I generate 1000 protein folds, I don't know which ones are physically valid. That's a problem for the kind of iteration I think you're describing, though I may be misunderstanding.
Another way of seeing this limitation is that we have great models for predicting what the weather will be like a few days from now, almost down to the hour. But when asked to predict whether it will snow a month from now, we can't. That's because models that are highly accurate on short timescales become too noisy on longer timescales. In protein folding, just s/timescale/sequence length/.
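A toy sketch of that effect, for what it's worth. It uses the logistic map, a standard textbook chaotic system, and is in no way a weather or folding model; the function name, starting value and perturbation size are all illustrative assumptions. Two runs start almost identically, and we track how far apart they drift.

```python
def logistic_gap(x0: float, eps: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate the logistic map x -> r*x*(1-x) from x0 and from x0 + eps.

    Returns the absolute gap between the two trajectories after each step.
    All parameter values here are hypothetical choices for illustration.
    """
    a, b = x0, x0 + eps
    gaps = []
    for _ in range(steps):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        gaps.append(abs(a - b))
    return gaps

gaps = logistic_gap(x0=0.2, eps=1e-10, steps=100)
print("gap after 5 steps:  ", gaps[4])
print("largest gap overall:", max(gaps))
```

Over the first few steps the two runs are indistinguishable, which is the "few days from now" regime. Further out, the tiny initial difference has been amplified until the runs are unrelated.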