Scientists at the California-based Arc Institute have released ‘Evo,’ the first biological foundation model trained on DNA that can be used for both prediction and design.
While other models are trained on text, Evo learns from the information encoded within DNA.
This gives it both interpretive and generative capabilities, and it can now generate DNA sequences of over one million bases.
The new tool has already accurately predicted how DNA changes would affect bacteria, which could prove a major breakthrough for future research.
Evo was originally introduced in a preprint earlier this year, but it has now been published in Science, where the researchers show how it can help build a deeper understanding of biological sequences.
The model has been created by the labs of Brian Hie, Arc Innovation Investigator and Stanford Assistant Professor of Chemical Engineering, and Patrick Hsu, Arc Core Investigator and UC Berkeley Assistant Professor of Bioengineering. Twenty scientists across biological and computational disciplines have been involved.
“Evo deciphers the patterns written into DNA over billions of years of evolution, breaking new ground in our ability to understand and engineer biology,” said Hsu. “Just as generative AI has revolutionized how we work with text, audio, and video, these same creative capabilities can now be applied to life’s fundamental codes.”
“What makes Evo exciting is that it’s a true foundation model for biology,” added Hie. “Being both multimodal and multiscale, it gives us a unified approach for harnessing the immense complexity of living systems.”
Foundation model for biological and DNA research created by Arc Institute
While the model can already generate very long DNA sequences, the team is now looking to go further and hopes to scale Evo to more complex organisms.
The team describes Evo as being just the beginning. “Our next goal is to move beyond single-cell life to understand the multicellular organisms that evolution has created over billions of years. Long-term, we’re working toward a new field of ‘genome design’ where we can create entire cellular pathways and potentially entire organisms,” said Hsu.
“As we scale Evo to more complex datasets and broader scales, we’re working to make that complexity programmable, allowing researchers to leverage these learned rules for biological design in a way that’s never been accessible before,” added Hie.
Featured Image: AI-generated via Ideogram