It sounds like your memory from that class is pretty good, and you’re right: it depends on what we’re trying to solve. The problem in this case is protein folding, so if a neural network spotted a pattern, that’s exactly what we want. Figuring out the generalisable “rules” (i.e. why proteins fold a certain way) isn’t what we’re trying to do with these tools (yet); we’re just on the pattern-finding side. That’s why the developments from AlphaFold are so incredible, and also why they’re limited.
It feels like my job for the next few years is going to be “professional killjoy”. I get people’s excitement, but we can’t properly use these tools if we don’t acknowledge their limitations. If we did, they’d actually become more powerful, because we could develop new and different tools, or go gather experimental data to validate some of the generated structures (or to round out the training data).
I don’t know if this would count as overtraining, because it has so far performed amazingly on structures that are similar to the training data but weren’t in it. The problem is that we don’t have much training data for the tricky parts. That’s fine in itself; it just means the model won’t help us learn much about those areas. But headlines like “AlphaFold predicts the structures of all human proteins” are so misleading.
It’s articles like this that do my head in https://www.scientificamerican.com/article/one-of-the-biggest-problems-in-biology-has-finally-been-solved/
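To make the overtraining point concrete: performing well on unseen-but-similar inputs is the opposite of overfitting; the real gap only shows up where there is no training coverage at all. Here’s a minimal toy sketch of that behaviour (just an illustrative polynomial fit, nothing to do with AlphaFold’s actual architecture or data):

```python
import numpy as np

rng = np.random.default_rng(0)

# The "true" rule we'd like the model to recover (stand-in for folding physics).
def f(x):
    return np.sin(x)

# Training data covers only one region of input space, with a little noise.
x_train = rng.uniform(0, 3, 200)
y_train = f(x_train) + rng.normal(0, 0.05, 200)

# Fit a flexible pattern-matcher: a degree-6 polynomial (purely illustrative).
model = np.poly1d(np.polyfit(x_train, y_train, 6))

def rmse(x):
    """Root-mean-square error of the model against the true rule."""
    return np.sqrt(np.mean((model(x) - f(x)) ** 2))

x_similar = rng.uniform(0, 3, 200)  # unseen, but like the training data
x_novel = rng.uniform(6, 9, 200)    # the "tricky parts": no training coverage

print(f"in-distribution RMSE:     {rmse(x_similar):.3f}")  # small
print(f"out-of-distribution RMSE: {rmse(x_novel):.3f}")    # large
```

The model generalises fine to new inputs drawn from the same region it was trained on, and falls apart where it never saw data, without ever having memorised its training set.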