Supporting Continental Scale Research in the National Microbiome Data Collaborative Data Portal
Authors:
Alicia Clum1* ([email protected]), Anastasiya Prymolenna2, Antonio Camargo1, Bea Meluch2, Bin Hu3, Brynn Zalmanek2,
Cameron Giberson2, Camilo Posso2, Chien-Chi Lo3, Chris Mungall1, Donny Winston5, Eric Cavanna1, Francie Rodriguez3, Grant Fujimoto2, James Tessmer2, Jeff Baumes4, Jing Cao5, Julia Kelliher3, Kaitlyn Li3, Katherine Heal2, Kaylee Kudish3, Kevin Fox2, Kjiersten Fagnan1, Leah Johnson3, Lee Ann McCue2, Mark Flynn3, Mark Miller1, Mary Salvi4, Michael Thornton1, Michal Babinski3, Migun Shakya3, Mike Nagler4, Montana Smith2, Patrick Chain3, Patrick Kalita1, Paul Piehowski2, Po-E (Paul) Li3, Samuel Purvine2, Set Sarrafan1, Shalki Shrivastava1, Shane Canon1, Shreyas Cholia1, Sierra Moxon1, Simon Roux1, Sujay Patil1, Wendi Lynch1, Yan Xu3, Yuri Corilo2, Emiley Eloe-Fadrosh1 (PI)
Institutions:
1Lawrence Berkeley National Laboratory; 2Pacific Northwest National Laboratory; 3Los Alamos National Laboratory; 4Kitware Inc; 5Polyneme LLC
Abstract
Continental-scale research is important for understanding global processes such as climate change and ecosystem dynamics, and it can reveal spatial variation and temporal trends. The National Microbiome Data Collaborative (NMDC) Data Portal (Eloe-Fadrosh et al. 2022) provides sample and processing information, as well as standardized workflow results, for several continental-scale datasets of high value to the research community. To support a continental-scale understanding of terrestrial ecosystems, the NMDC hosts soil data from National Ecological Observatory Network (NEON) sites, the Environmental Molecular Sciences Laboratory’s (EMSL) 1,000 Soils Research Campaign, and the Earth Microbiome Project 500 (EMP500). To enable continental-scale research on aquatic ecosystems, the NMDC hosts freshwater and benthic data from NEON sites and freshwater samples used to generate the Genome Resolved Open Watershed (GROW) database. These efforts all leverage standardized and documented sampling protocols, enabling comparisons of datasets across sites.
The Data Portal focuses on making multiomics datasets more findable to enable data reuse. It provides search tools to find information by principal investigator or study name; by sample information such as geographic location, collection date, and depth; or by information about how the omics data were generated. Additionally, data can be searched by Kyoto Encyclopedia of Genes and Genomes (KEGG) terms to identify samples by molecular function. Once datasets have been identified, standardized workflow results can be downloaded in bulk via the website.
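The kind of faceted search described above can be illustrated with a minimal sketch. It is not the Data Portal's implementation or API: the `Sample` record, the `search` function, its parameters, and every sample value below are hypothetical placeholders chosen only to show how location, collection date, depth, and KEGG term criteria might be combined over sample metadata.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; field names are illustrative assumptions."""
    sample_id: str
    study: str
    latitude: float
    longitude: float
    collected: date
    depth_m: float
    kegg_terms: frozenset


def search(samples, *, bbox=None, date_range=None, max_depth_m=None, kegg_term=None):
    """Return the samples that satisfy every criterion that was supplied.

    bbox is (min_lat, min_lon, max_lat, max_lon); date_range is (start, end),
    both inclusive. Criteria left as None are ignored.
    """
    hits = []
    for s in samples:
        if bbox is not None:
            min_lat, min_lon, max_lat, max_lon = bbox
            inside = min_lat <= s.latitude <= max_lat and min_lon <= s.longitude <= max_lon
            if not inside:
                continue
        if date_range is not None and not (date_range[0] <= s.collected <= date_range[1]):
            continue
        if max_depth_m is not None and s.depth_m > max_depth_m:
            continue
        if kegg_term is not None and kegg_term not in s.kegg_terms:
            continue
        hits.append(s)
    return hits


# Placeholder records, invented for illustration only (not real portal data).
SAMPLES = [
    Sample("sample-a", "study-1", 40.0, -105.5, date(2020, 6, 1), 0.1,
           frozenset({"K00001", "K00002"})),
    Sample("sample-b", "study-1", 35.9, -84.3, date(2021, 8, 15), 0.3,
           frozenset({"K00002"})),
    Sample("sample-c", "study-2", 64.9, -147.5, date(2020, 7, 20), 1.5,
           frozenset({"K00001"})),
]

if __name__ == "__main__":
    # Shallow samples inside a bounding box that carry a given KEGG term.
    for s in search(SAMPLES, bbox=(30, -110, 45, -80), max_depth_m=0.2, kegg_term="K00001"):
        print(s.sample_id, s.study, s.collected.isoformat())
```

Combining the criteria with a logical AND mirrors how stacked search filters typically narrow a result set; a real query against the portal would rely on its own interface and data model rather than on these in-memory records.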
References
Eloe-Fadrosh, E. A., et al. 2022. “The National Microbiome Data Collaborative Data Portal: An Integrated Multi-Omics Microbiome Data Resource,” Nucleic Acids Research 50(D1), D828–36. DOI:10.1093/nar/gkab990.