Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Perspective
  • Published:

Towards multimodal foundation models in molecular cell biology

Abstract

The rapid advent of high-throughput omics technologies has created an exponential growth in biological data, often outpacing our ability to derive molecular insights. Large-language models have shown a way out of this data deluge in natural language processing by integrating massive datasets into a joint model with manifold downstream use cases. Here we envision developing multimodal foundation models, pretrained on diverse omics datasets, including genomics, transcriptomics, epigenomics, proteomics, metabolomics and spatial profiling. These models are expected to exhibit unprecedented potential for characterizing the molecular states of cells across a broad continuum, thereby facilitating the creation of holistic maps of cells, genes and tissues. Context-specific transfer learning of the foundation models can empower diverse applications from novel cell-type recognition, biomarker discovery and gene regulation inference, to in silico perturbations. This new paradigm could launch an era of artificial intelligence-empowered analyses, one that promises to unravel the intricate complexities of molecular cell biology, to support experimental design and, more broadly, to profoundly extend our understanding of life sciences.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Multimodal analytical technologies and their applications.
Fig. 2: Diverse data context in pretraining and iterative improvement by lab-in-the-loop.
Fig. 3: Computational components of multimodal foundation models.
Fig. 4: Potential training tasks and challenges.

Similar content being viewed by others

References

  1. Alberts, B. et al. Molecular Biology of the Cell 6th edn (W. W. Norton, 2020).

  2. Keller, E. F. Making Sense of Life: Explaining Biological Development with Models, Metaphors, and Machines (Harvard Univ. Press, 2002).

  3. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004). A seminal review on network biology, elucidating how molecular interactions shape cellular and organismal function.

    Article  PubMed  Google Scholar 

  4. Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).

    Article  CAS  PubMed  Google Scholar 

  5. Goldberg, A. P. et al. Emerging whole-cell modeling principles and methods. Curr. Opin. Biotechnol. 51, 97–102 (2018).

    Article  CAS  PubMed  Google Scholar 

  6. Johnson, G. T. et al. Building the next generation of virtual cells to understand cellular biology. Biophys. J. 122, 3560–3569 (2023).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  7. Karr, J. R., Takahashi, K. & Funahashi, A. The principles of whole-cell modeling. Curr. Opin. Microbiol. 27, 18–24 (2015).

    Article  CAS  PubMed  Google Scholar 

  8. Freddolino, P. L. & Tavazoie, S. The dawn of virtual cell biology. Cell 150, 248–250 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Georgouli, K., Yeom, J.-S., Blake, R. C. & Navid, A. Multi-scale models of whole cells: progress and challenges. Front. Cell Dev. Biol. 11, 1260507 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  10. Karr, J. R. et al. A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).

    Article  ADS  CAS  Google Scholar 

  12. Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 83 (2017). The potential of multi-omics in uncovering molecular underpinnings of diseases and informing precision medicine.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Regev, A. et al. Science Forum: the Human Cell Atlas. eLife https://doi.org/10.7554/eLife.27041 (2017). An introduction of the HCA initiative, a pivotal project for mapping cellular diversity across human tissues.

  14. Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181, 236–249 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116.e20 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Deng, Y. et al. Spatial-CUT&Tag: spatially resolved chromatin modification profiling at the cellular level. Science 375, 681–686 (2022).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  18. Swanson, E. et al. Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. eLife 10, e63632 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

    Article  ADS  CAS  PubMed  Google Scholar 

  21. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021). An overview of the concept, opportunities and challenges of foundation models for diverse artificial intelligence applications.

  22. Vaswani, A. et al. Attention is all you need. Preprint at https://arxiv.org/abs/1706.03762 (2017). An introduction of the transformer architecture, the cornerstone of modern foundation models.

  23. Brown, T. et al. Language models are few-shot learners. In Proc. 34th International Conference on Neural Information Processing Systems 1877–1901 (Curran Associates Inc., 2020). An introduction of GPT-3, a 175-billion parameter language model demonstrating strong few-shot learning capabilities across diverse natural language processing tasks.

  24. Ouyang, L. et al. Training language models to follow instructions with human feedback. In Proc. 36th International Conference on Neural Information Processing Systems 27730–27744 (Curran Associates Inc., 2022).

  25. Touvron, H. et al. LLaMA: open and efficient foundation language models. Preprint at https://arxiv.org/abs/2302.13971 (2023). An introduction to LLaMA, a suite of open-source language models (7B to 65B parameters) trained on publicly available data.

  26. Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

  27. llama3: The official meta Llama 3 GitHub site. GitHub https://github.com/meta-llama/llama3 (2024).

  28. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10674–10685 (IEEE/CVF, 2022).

  29. Podell, D. et al. SDXL: improving latent diffusion models for high-resolution image synthesis. Preprint at https://arxiv.org/abs/2307.01952 (2023).

  30. Blattmann, A. et al. Stable video diffusion: scaling latent video diffusion models to large datasets. Preprint at https://arxiv.org/abs/2311.15127 (2023).

  31. Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. In Proc. 37th International Conference on Neural Information Processing Systems 34892–34916 (Curran Associates Inc., 2023).

  32. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  33. Li, R., Li, L., Xu, Y. & Yang, J. Erratum to: Machine learning meets omics applications and perspectives. Brief. Bioinform. 23, bbab560 (2022).

    Article  PubMed  Google Scholar 

  34. Klein, D. et al. Mapping cells through time and space with moscot. Nature 638, 1065–1075 (2025).

  35. Brbić, M. et al. MARS: discovering novel cell types across heterogeneous single-cell experiments. Nat. Methods 17, 1200–1206 (2020).

    Article  PubMed  Google Scholar 

  36. Brbić, M. et al. Annotation of spatially resolved single-cell data with STELLAR. Nat. Methods 19, 1411–1418 (2022).

    Article  PubMed  Google Scholar 

  37. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

    Article  CAS  PubMed  Google Scholar 

  38. Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024). A deep learning model integrating gene–gene relationship knowledge graphs to predict transcriptional responses.

  40. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). An introduction to AlphaFold, a deep learning model achieving near-experimental accuracy in predicting protein structures.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  41. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  42. Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. Preprint at bioRxiv https://doi.org/10.1101/2022.07.20.500902 (2022).

  43. ESM3: simulating 500 million years of evolution with a language model. EvolutionaryScale https://www.evolutionaryscale.ai/blog/esm3-release (2024). A frontier language model for biology that simultaneously reasons over the sequence, structure and function of proteins.

  44. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Cui, H. et al. scGPT: towards building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024). The development of scGPT, a generative pre-trained transformer model, leveraging over 33 million single-cell datasets to advance single-cell biology.

    Article  CAS  PubMed  Google Scholar 

  46. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023). A large model pretrained on 30 million single-cell transcriptomes, facilitating accurate predictions in gene network biology.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  47. Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).

    Article  Google Scholar 

  48. Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).

    Article  ADS  CAS  PubMed  Google Scholar 

  49. Sverchkov, Y. & Craven, M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput. Biol. 13, e1005466 (2017).

    Article  ADS  PubMed  PubMed Central  Google Scholar 

  50. Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  51. Foster, A., Ivanova, D. R., Malik, I. & Rainforth, T. Deep adaptive design: amortizing sequential Bayesian experimental design. In Proc. 38th International Conference on Machine Learning Vol. 139 3384–3395 (PMLR, 2021).

  52. Rainforth, T., Foster, A., Ivanova, D. R. & Smith, F. B. Modern Bayesian experimental design. Statist. Sci. 39, 100–114 (2024).

  53. Vanlier, J., Tiemann, C. A., Hilbers, P. A. J. & van Riel, N. A. W. A Bayesian approach to targeted experiment design. Bioinformatics 28, 1136–1142 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  55. Eyler, C. E. et al. Single-cell lineage analysis reveals genetic and epigenetic interplay in glioblastoma drug resistance. Genome Biol. 21, 174 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Chevrier, S. et al. An immune atlas of clear cell renal cell carcinoma. Cell 169, 736–749.e18 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Zhu, C., Preissl, S. & Ren, B. Single-cell multimodal omics: the power of many. Nat. Methods 17, 11–14 (2020).

    Article  CAS  PubMed  Google Scholar 

  58. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).

    Article  CAS  PubMed  Google Scholar 

  60. Battich, N. et al. Sequencing metabolically labeled transcripts in single cells reveals mRNA turnover strategies. Science 367, 1151–1156 (2020).

    Article  ADS  CAS  PubMed  Google Scholar 

  61. Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Qiu, Q. et al. Massively parallel and time-resolved RNA sequencing in single cells with scNT-seq. Nat. Methods 17, 991–1001 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell 185, 690–711.e45 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Zhou, Z. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genome. Preprint at https://arxiv.org/abs/2306.15006 (2023).

  66. Han, H. et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 46, D380–D386 (2018).

    Article  CAS  PubMed  Google Scholar 

  67. Liu, Z.-P., Wu, C., Miao, H. & Wu, H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database 2015, bav095 (2015).

  68. Margolin, A. A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, S7 (2006).

    Article  PubMed  PubMed Central  Google Scholar 

  69. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  70. Badia-I-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).

    Article  CAS  PubMed  Google Scholar 

  71. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Qin, Q. et al. Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biol. 21, 32 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  73. Kim, S. & Wysocka, J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol. Cell 83, 373–392 (2023).

    Article  CAS  PubMed  Google Scholar 

  74. Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  75. Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Proc. 36th International Conference on Neural Information Processing Systems 26711–26722 (Curran Associates Inc., 2022).

  77. Joung, J. et al. A transcription factor atlas of directed differentiation. Cell 186, 209–229.e26 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575.e28 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A. & Teichmann, S. A. The Human Cell Atlas: from vision to reality. Nature 550, 451–453 (2017).

    Article  ADS  CAS  PubMed  Google Scholar 

  81. Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).

    Article  CAS  PubMed  Google Scholar 

  82. Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1897 (2016).

    Article  CAS  PubMed  Google Scholar 

  83. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Tabula Muris Consortium. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

    Article  ADS  Google Scholar 

  85. Yao, Z. et al. A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex. Nature 598, 103–110 (2021).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  86. CZI Single-Cell Biology Program et al. CZ CELL×GENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res. 53, D886–D900 (2025).

  87. Chameleon Team. Chameleon: mixed-modal early-fusion foundation models. Preprint at https://arxiv.org/abs/2405.09818 (2024).

  88. Gage, P. A new algorithm for data compression. C Users J. Arch. 12, 23–38 (1994).

    Google Scholar 

  89. OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

  90. Barnum, G., Talukder, S. & Yue, Y. On the benefits of early fusion in multimodal representation learning. Preprint at https://arxiv.org/abs/2011.07191 (2020). An investigation into early-fusion strategies in multimodal learning, demonstrating that immediate integration of inputs enhances model performance and robustness.

  91. Liu, Z. et al. Swin Transformer: hierarchical vision transformer using Shifted Windows. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9992–10002 (IEEE/CVF, 2021).

  92. Fan, H. et al. Multiscale vision transformers. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6804–6815 (IEEE/CVF, 2021).

  93. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4171–4186 (Association for Computational Linguistics, 2019).

  94. Grill, J.-B. et al. Bootstrap your own latent—a new approach to self-supervised learning. In Proc. 34th International Conference on Neural Information Processing Systems 21271–21284 (Curran Associates Inc., 2020).

  95. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds. Iii, H. D. & Singh, A.) 1597–1607 (PMLR, 2020).

  96. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning Vol. 139 8748–8763 (PMLR, 2021).

  97. AlQuraishi, M. & Sorger, P. K. Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat. Methods 18, 1169–1180 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Pan, S. et al. Unifying large language models and knowledge graphs: a roadmap. IEEE Trans. Knowl. Data Eng. 36, 3580–3599 (2024).

  99. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).

    Article  ADS  CAS  PubMed  Google Scholar 

  100. Fabregat, A. et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

    Article  CAS  PubMed  Google Scholar 

  101. Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (Association for Computing Machinery, 2016).

  102. Hamilton, W. L., Ying, R. & Leskovec, J. Inductive representation learning on large graphs. In Proc. 31st International Conference on Neural Information Processing Systems 1–19 (Curran Associates Inc., 2017).

  103. Zhao, W. X., Liu, J., Ren, R. & Wen, J.-R. Dense text retrieval based on pretrained language models: a survey. ACM Trans. Inf. Syst. Secur. 42, 1–60 (2024).

    Google Scholar 

  104. Jeong, J. et al. Multimodal image-text matching improves retrieval-based chest X-ray report generation. In Proc. Machine Learning Research. Medical Imaging with Deep Learning Vol. 227 (eds Oguz, I. et al.) 978–990 (PMLR, 2024).

  105. Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, bbac409 (2022).

    Article  PubMed  Google Scholar 

  106. Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  107. Lv, L. et al. ProLLaMA: a protein large language model for multi-task protein language processing. Preprint at https://arxiv.org/abs/2402.16445 (2024).

  108. Debus, C., Piraud, M., Streit, A., Theis, F. & Götz, M. Reporting electricity consumption is essential for sustainable AI. Nat. Mach. Intell. 5, 1176–1178 (2023).

    Article  Google Scholar 

  109. Hu, E. J. et al. LoRA: low-rank adaptation of large language models. Preprint at https://arxiv.org/abs/2106.09685 (2021).

  110. Pfeiffer, J. et al. AdapterHub: a framework for adapting transformers. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 46–54 (Association for Computational Linguistics, 2020).

  111. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

    Article  CAS  PubMed  Google Scholar 

  112. Meyer, P. & Saez-Rodriguez, J. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Syst. 12, 636–653 (2021).

    Article  CAS  PubMed  Google Scholar 

  113. Saez-Rodriguez, J. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat. Rev. Genet. 17, 470–486 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Lance, C. et al. Multimodal single cell data integration challenge: results and lessons learned. In Proc. NeurIPS 2021 Competitions and Demonstrations Track Vol. 176 (eds Kiela, D., Ciccone, M. & Caputo, B.) 162–176 (PMLR, 2022).

  115. Liu, Z. et al. KAN: Kolmogorov–Arnold networks. Preprint at https://arxiv.org/abs/2404.19756 (2024).

  116. Maynez, J., Narayan, S., Bohnet, B. & McDonald, R. On faithfulness and factuality in abstractive summarization. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 1906–1919 (Association for Computational Linguistics, 2020).

  117. Ji, Z. et al. Survey of hallucination in natural language generation. ACM Comput. Surv. 55, 248 (2022).

  118. Manakul, P., Liusie, A. & Gales, M. J. F. SelfCheckGPT: zero-resource black-box hallucination detection for generative large language models. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing 9004–9017 (Association for Computational Linguistics, 2023).

  119. Yin, Z. et al. Do large language models know what they don’t know? In Proc. Findings of the Association for Computational Linguistics: ACL 2023 8653–8665 (Association for Computational Linguistics, 2023).

  120. Tian, K., Mitchell, E., Yao, H., Manning, C. D. & Finn, C. Fine-tuning language models for factuality. Preprint at https://arxiv.org/abs/2311.08401 (2023).

  121. Bommasani, R. et al. The foundation model transparency index. Preprint at https://arxiv.org/abs/2310.12941 (2023).

  122. Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT−4. Preprint at https://arxiv.org/abs/2303.12712 (2023).

  123. Rood, J. E., Maartens, A., Hupalowska, A., Teichmann, S. A. & Regev, A. Impact of the Human Cell Atlas on medicine. Nat. Med. 28, 2486–2496 (2022).

    Article  CAS  PubMed  Google Scholar 

  124. Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).

    Article  ADS  CAS  PubMed  Google Scholar 

  125. Baysoy, A., Bai, Z., Satija, R. & Fan, R. The technological landscape and applications of single-cell multi-omics. Nat. Rev. Mol. Cell Biol. 24, 695–713 (2023).

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

We thank our collaborators across institutions for their support and insightful contributions throughout the development of the manuscript.

Author information

Authors and Affiliations

Authors

Contributions

H.C., B.W. and F.J.T. conceptualized the study. H.C. led the development of the manuscript and drafted the initial version. H.C., A.T.-L., M.B., J.S.-R., S.C., H.G., M.L. and F.J.T. contributed to refining the concepts and methodology. A.T.-L., M.B. and H.G. provided substantial input on the sections related to single-cell data and opportunities. J.S.-R. and S.C. contributed to discussions on gene function prediction and regulatory network reconstruction. F.J.T. and B.W. supervised the overall research direction and advised on critical revisions. All authors reviewed, edited and approved the final manuscript.

Corresponding authors

Correspondence to Fabian J. Theis or Bo Wang.

Ethics declarations

Competing interests

F.J.T. consults for Immunai, CytoReason, Cellarity, BioTuring and Genbio.AI, and has an ownership interest in Dermagnostix GmbH and Cellarity. B.W. serves as a scientific advisor to Shift Bioscience, Deep Genomics and Vevo Therapeutics, and acts as a consultant for Arsenal Bioscience. H.G. has an ownership interest in Vevo Therapeutics, and is an advisor to Verge Genomics and Deep Forest Biosciences. J.S.-R. reports funding from GSK, Pfizer and Sanofi, and fees and/or honoraria from Travere Therapeutics, Stadapharm, Astex, Owkin, Pfizer, Grunenthal, Moderna and Tempus. M.L. owns interests in Relation Therapeutics and AIVIVO, and is a scientific cofounder and part-time employee at AIVIVO. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Marinka Zitnik and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cui, H., Tejada-Lapuerta, A., Brbić, M. et al. Towards multimodal foundation models in molecular cell biology. Nature 640, 623–633 (2025). https://doi.org/10.1038/s41586-025-08710-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41586-025-08710-y

Search

Quick links

Nature Briefing: Translational Research

Sign up for the Nature Briefing: Translational Research newsletter — top stories in biotechnology, drug discovery and pharma.

Get what matters in translational research, free to your inbox weekly. Sign up for Nature Briefing: Translational Research