Uncovering Characteristic Traits of Earth’s Microbiomes
Authors:
Marcin P. Joachimiak1*([email protected]), Ziming Yang1, William J. Riehl1, Christopher Neely1, Prachi Gupta1,
Sean P. Jungbluth1, Paramvir S. Dehal1, Adam P. Arkin1,2 (PI)
Institutions:
1Lawrence Berkeley National Laboratory; 2University of California–Berkeley
Goals
The goal of this project is to use a vast, standardized collection of metagenome-sequenced samples from diverse ecosystems to uncover microbial traits predictive of different environmental niches. Using advanced machine learning techniques, the research team aims to identify signature metagenome features (spanning sequence domains, functions, and taxa) that correspond to traits characteristic of specific ecosystems and that indicate underlying ecosystem relationships at the microbial biogeographic scale.
Abstract
Earth’s biosphere is an interconnected, dynamically changing network of ecosystems in which microbes play a significant environmental role. Advances in microbial metagenomics have recently provided extensive data on microbial communities across ecosystems, including biological sequences, taxonomy, and functional annotations. Previous research focusing on 16S taxonomic data has shown promising results and indicated vast uncharacterized biological diversity, but 16S barcoding-based methods are constrained by weak connections to functional annotation data. This project hypothesized that diverse metagenome features, including sequence domains, functions, and taxa, could reveal traits essential for survival in different ecosystems. Using the largest standardized metagenome sample collection across varied ecosystems, the group trained machine learning models to predict the source ecosystem of each metagenome sample. Group members identified optimal metagenome feature types and model parameters, resulting in models that performed well in cross-validation. Training at different ecosystem classification levels improved performance for ecosystems with sparse training data. Model interpretation methods identified signature metagenome features that distinguish 41 ecosystems, leading to insights about traits that are characteristic of specific ecosystems. This collection of traits, which may have adaptive significance, reveals examples of direct linkages between microbial functions and environmental properties and highlights important unknown functions. It also implies ecosystem relationships that align well with established classifications while suggesting that ecosystems are more interlinked than is currently appreciated.
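The workflow described above (train a classifier on metagenome features, evaluate it by cross-validation, then interpret it to find signature features per ecosystem) can be illustrated with a minimal sketch. This is not the project's pipeline: the abstract does not name the model type, so a simple nearest-centroid classifier stands in for it, and the data are synthetic. The ecosystem labels, feature indices, and all function names below are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def make_toy_samples(seed=0, per_class=30, n_features=6):
    """Build synthetic feature-abundance vectors (NOT real metagenome data)."""
    rng = random.Random(seed)
    # Assumption: each hypothetical ecosystem has exactly one enriched feature.
    enriched = {"marine": 0, "soil": 1, "freshwater": 2}
    samples = []
    for label, idx in enriched.items():
        for _ in range(per_class):
            x = [rng.gauss(1.0, 0.3) for _ in range(n_features)]
            x[idx] += 3.0  # strong, artificial enrichment signal
            samples.append((x, label))
    rng.shuffle(samples)
    return samples, enriched


def centroids(samples):
    """Mean feature vector per ecosystem label."""
    sums, counts = {}, defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(cents, x):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(cents[y], x)))


def cross_val_accuracy(samples, k=5):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    folds = [samples[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        cents = centroids(train)
        correct = sum(predict(cents, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


def signature_features(samples):
    """For each ecosystem, pick the feature most enriched relative to the others."""
    cents = centroids(samples)
    n = len(next(iter(cents.values())))
    signatures = {}
    for y, c in cents.items():
        others = [cents[o] for o in cents if o != y]
        diff = [c[j] - sum(o[j] for o in others) / len(others) for j in range(n)]
        signatures[y] = max(range(n), key=lambda j: diff[j])
    return signatures


samples, enriched = make_toy_samples()
accuracy = cross_val_accuracy(samples)
signatures = signature_features(samples)
print(f"cross-validated accuracy: {accuracy:.2f}")
print("signature feature index per ecosystem:", signatures)
```

In this toy setting the signature feature of each ecosystem is simply the one with the largest centroid difference. Real metagenome features are far higher-dimensional and correlated, so a study like the one described would presumably rely on more capable models and dedicated model interpretation methods.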
References
Park, H., et al. 2024. “A Bacterial Sensor Taxonomy Across Earth Ecosystems for Machine Learning Applications,” mSystems 9(1), e0002623. DOI:10.1128/msystems.00026-23.
Funding Information
This work is supported as part of the Genomic Science program of BER. The DOE Systems Biology Knowledgebase (KBase) is funded by the DOE, Office of Science, BER program under DE-AC02-05CH11231, DE-AC02-06CH11357, DE-AC05-00OR22725, and DE-AC02-98CH10886.