Summary: Antibody language models typically treat antibodies as simple amino acid strings, an approach that conflates the mutation process with selection effects. A new framework called DASM explicitly separates the two, delivering better performance on functional prediction tasks.
A little over a decade ago, antibody language models did not exist. Today, they are central to computational drug discovery. But a paper published in eLife this April reveals that the entire field has been building on a flawed assumption, one that runs deep into how these models actually learn.
Why treating antibodies as amino acid strings falls short
Antibodies do not appear out of nowhere. Your immune system generates them through a process called V(D)J recombination, then refines them through mutation and selection. These are two fundamentally different biological mechanisms happening in sequence.
Most antibody language models ignore this biology entirely. They treat antibodies as flat strings of amino acids, the same way you might treat a sentence in English. The model learns to predict the next token, or fill in masked positions, using standard language modeling objectives.
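As a toy sketch of that objective (the sequence stretch, mask token, and positions here are illustrative, not drawn from any published model), masked training on an amino acid string looks like this:

```python
def mask_positions(seq, positions, mask="?"):
    """Corrupt a sequence by masking the given positions; a masked
    language model is trained to recover the original residues."""
    masked = list(seq)
    targets = {}
    for i in positions:
        targets[i] = seq[i]
        masked[i] = mask
    return "".join(masked), targets

# A short heavy-chain-like stretch (illustrative, not a real antibody).
seq = "EVQLVESGGGLVQPGGSLRLSCAAS"

# In practice a random ~15% of positions are masked each training step;
# three fixed positions are used here so the example is deterministic.
masked, targets = mask_positions(seq, [3, 12, 19])
print(masked)   # EVQ?VESGGGLV?PGGSLR?SCAAS
print(targets)  # {3: 'L', 12: 'Q', 19: 'L'}
```

The objective never asks *why* a residue sits at a position, only *what* residue is most likely there, which is exactly where the trouble starts.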
But here is the problem. When you train a masked or autoregressive model on antibody sequences, you are not just teaching it about antibody structure and function. You are implicitly folding nucleotide-level mutation processes into the protein-level model. The model cannot tell the difference between an amino acid that appeared because of a random mutational event and one that was kept because it improved antibody function.
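To make the conflation concrete, here is a toy numerical sketch (all probabilities are assumed for illustration, not measured values): the frequency at which a substitution is observed is roughly the product of how often mutation proposes it and how strongly selection keeps it, and a plain sequence model only ever sees that product.

```python
# Two candidate substitutions at one site (numbers assumed for illustration):
mutation_prob = {"S": 0.10, "Y": 0.01}   # S sits in a mutational hotspot
selection = {"S": 1.0, "Y": 8.0}         # but Y is what improves binding

# A plain language model effectively fits the product of the two,
# normalized into a probability over the candidates.
observed = {aa: mutation_prob[aa] * selection[aa] for aa in mutation_prob}
total = sum(observed.values())
observed = {aa: p / total for aa, p in observed.items()}

# S comes out more probable (~0.56 vs ~0.44) purely because of
# mutational bias, even though Y is the functionally better residue.
print(observed)
```

Ranked by such a model, the hotspot residue wins; ranked by function, it should lose. That gap is the corruption the next section describes.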
The hidden cost of conflating mutation and selection
This might sound like a technical subtlety. It is not. When mutation and selection get tangled together inside a language model, the model learns a corrupted representation of what makes an antibody work.
The degradation shows up exactly where it hurts most: predicting how specific mutations affect antibody function. If your model has been learning mutation patterns alongside functional signals, its predictions about mutation effects will be noisy at best and misleading at worst.
This is precisely the task that matters in drug discovery. You want to know: if I change this amino acid, does the antibody bind better or worse? A model that cannot cleanly isolate functional effects from mutational noise will struggle to answer that question reliably.
Enter DASM: factoring out the noise
The eLife paper introduces a framework called the Deep Amino Acid Selection Model, or DASM, which takes a fundamentally different approach. Instead of absorbing mutation processes into its learned representations, DASM explicitly factors them out.
What remains is a model focused purely on selection: the functional consequences of amino acid changes, stripped of mutational bias. According to the eLife assessment, this factorization leads to improved performance over standard protein language model objectives.
The paper also describes DASM as readily interpretable, which matters in a field where black-box predictions are a real obstacle to clinical adoption.
A paradigm shift with details still to explore
eLife assessed the significance of this work as 'Fundamental' and the strength of evidence as 'Convincing'. That is a strong signal from a journal known for rigorous review.
But it is worth being transparent about what we do not yet know from the public abstract alone. Specific benchmark numbers, architecture details, and training procedures are not available in the open assessment. The conceptual framework is clear and compelling, but the quantitative specifics require reading the full paper.
What this means for protein AI going forward
DASM's core idea is simple but profound: if you want to model selection, you have to separate it from mutation. This principle likely extends well beyond antibodies to any protein shaped by evolutionary processes.
The fact that a biologically informed model can be more interpretable and perform better suggests that the field's obsession with scaling up raw sequence models may have been heading in the wrong direction. Sometimes the best way forward is not more data and more parameters, but a clearer understanding of what the data actually represents.
The real question is whether this factorization approach will spread to other areas of protein language modeling. If biology keeps giving us these kinds of shortcuts, maybe we should start listening more carefully. What other biological processes are currently hiding inside our language models, masquerading as learned structure?