Collaboratively Assembling a Toolkit in KBase to Leverage Probabilistic Annotation and Multiomics Data to Improve Mechanistic Modeling of Metabolic Phenotypes
Authors:
José P. Faria1, Filipe Liu1, Patrik D’haeseleer2, Jeremy Jacobson3, Bill Nelson3, Jason McDermott3, Aimee K. Kessell4, Hugh C. McCullough4, Hyun-Seob Song4, Janaka N. Edirisinghe1, Nidhi Gupta1, Samuel Seaver1, Andrew P. Freiburger1, Qizhi Zhang1, Pamela Weisenhorn1, Neal Conrad1, Raphy Zarecki5, Matthew DeJongh6, Aaron A. Best6, KBase Team1,7,8,9, Robert W. Cottingham7, Rhona Stuart2, Kirsten Hofmockel3, Christopher S. Henry1, Adam P. Arkin8 (PI)
Institutions:
1Argonne National Laboratory; 2Lawrence Livermore National Laboratory; 3Pacific Northwest National Laboratory; 4University of Nebraska–Lincoln; 5Newe Ya’ar Research Center, Agricultural Research Organization; 6Hope College; 7Oak Ridge National Laboratory; 8Lawrence Berkeley National Laboratory; 9Brookhaven National Laboratory
Goals
The DOE Systems Biology Knowledgebase (KBase) is a knowledge creation and discovery environment designed for biologists and bioinformaticians. KBase integrates many data and analysis tools from DOE and other public services into an easy-to-use platform that leverages scalable computing infrastructure to perform sophisticated systems biology analyses. KBase is a publicly available and developer-extensible platform that enables scientists to analyze their data within the context of public data and share their findings across the system. This presentation describes a new modeling pipeline developed through a collaboration among KBase, the μBiosphere Science Focus Area (SFA), and the Pacific Northwest National Laboratory (PNNL) Soil SFA.
Abstract
Mechanistic understanding of a biological system begins with accurate functional annotation of the system’s proteins. Unfortunately, in most cases, protein annotations are uncertain and error-prone, while most analytical pipelines treat annotations as either present or absent. Genome-scale metabolic models (GEMs) permit the evaluation of metabolic annotations within the broader context of the living machines they characterize and, thus, are ideal tools for considering and resolving uncertainty to arrive at the optimal set of annotations that best describe all experimental observations about an organism.
Here, researchers describe an ecosystem of metabolic modeling tools collaboratively developed within KBase to accomplish this goal. The system begins by annotating protein sequences using Rapid Annotation using Subsystems Technology (RAST), Protein Data Bank (PDB), Distilled and Refined Annotation of Metabolism (DRAM), Prokka, and GLM4EC. The system also supports the upload of annotations produced outside of KBase. These tools provide a pool of probabilistic protein annotations that this modeling framework will draw upon to mechanistically explain organism phenotypes.
Next, the newly developed ModelSEED2 tool is used to build a draft GEM. This tool offers significant enhancements over the previous reconstruction apps in KBase, including (1) dramatically improved representation of energy metabolism; (2) improved archaea and cyanobacteria reconstruction; and (3) curation of all metabolic pathways with mappings to RAST subsystem annotations (Faria et al. 2023). ModelSEED2 generates larger models with more reactions and genes and fewer gaps. Applying ModelSEED2 to thousands of diverse species reveals conserved patterns in the adenosine triphosphate biosynthesis mechanism across phylogeny and identifies clades where understanding of energy biosynthesis is still poor. Gaps in the draft GEMs also offer a metric to evaluate annotation quality at the systems level.
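As a rough illustration of how gaps can serve as a systems-level quality metric, the sketch below reports, for each pathway, the fraction of reactions that lack a supporting annotation. The pathway names, reaction identifiers, and the `gap_fraction` helper are hypothetical and are not part of ModelSEED2 or KBase.

```python
def gap_fraction(pathways, annotated):
    """Return, per pathway, the fraction of reactions with no supporting annotation."""
    return {
        name: sum(1 for rxn in rxns if rxn not in annotated) / len(rxns)
        for name, rxns in pathways.items()
        if rxns  # skip empty pathways to avoid division by zero
    }

# Hypothetical toy data: two pathways and the set of reactions backed by annotations.
pathways = {
    "glycolysis": ["rxn_a", "rxn_b", "rxn_c", "rxn_d"],
    "atp_synthesis": ["rxn_e", "rxn_f"],
}
annotated = {"rxn_a", "rxn_b", "rxn_d", "rxn_e", "rxn_f"}

gaps = gap_fraction(pathways, annotated)
```

A lower gap fraction across pathways would suggest a more complete annotation set under this simplified measure.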
New ensemble modeling tools then sample from the probabilistic pool of hypothesized protein annotations to produce an ensemble of potential draft GEMs. GEM ensembles are evaluated based on: (1) adenosine triphosphate biosynthesis mechanisms, (2) gap filling needed to replicate observed phenotypes, and (3) agreement of simulated flux with multiomics data. A subset of best-performing models can then be extracted and retained for further analysis. Gap filling is essential to this ecosystem as it selects the most probable annotations that best fit experimental observations (e.g., observed growth phenotypes or multiomics data). The new OMEGGA gap-filling algorithm globally fits a GEM to available phenotype data using reactions associated with the highest probability annotations and genes with expression in multiomics datasets.
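The sampling-and-selection idea can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the KBase implementation: each reaction is included independently with its annotation probability, and a simple count of missing required reactions stands in for the three evaluation criteria listed above. All names (`sample_ensemble`, `count_gaps`, `best_models`) and the probabilities are hypothetical.

```python
import random

def sample_ensemble(annotation_probs, n_models, seed=0):
    """Draw n_models reaction sets, including each reaction with its annotation probability."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ensemble = []
    for _ in range(n_models):
        model = {rxn for rxn, p in annotation_probs.items() if rng.random() < p}
        ensemble.append(model)
    return ensemble

def count_gaps(model, required):
    """Number of required reactions missing from a sampled model (a stand-in score)."""
    return len(required - model)

def best_models(ensemble, required, keep):
    """Retain the `keep` models with the fewest gaps."""
    return sorted(ensemble, key=lambda m: count_gaps(m, required))[:keep]

# Hypothetical annotation probabilities and a hypothetical set of required reactions.
annotation_probs = {"rxn_a": 1.0, "rxn_b": 0.6, "rxn_c": 0.3, "rxn_d": 0.0}
required = {"rxn_a", "rxn_b", "rxn_c"}

ensemble = sample_ensemble(annotation_probs, n_models=50)
top = best_models(ensemble, required, keep=5)
```

In a real workflow, the score would be replaced by flux-based evaluation of each GEM; the gap count here only shows where such a score would plug in.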
With KBase building ModelSEED2, the μBiosphere SFA building the probabilistic annotation system, and the PNNL Soil SFA building OMEGGA, this has been a collaborative endeavor. This project demonstrates the efficacy of the tools by applying them to study isolates and omics datasets from the μBiosphere SFA and PNNL’s Soil Microbiome SFA (McClure et al. 2022). The group annotates each isolate, constructs and optimizes GEMs, and fits the GEMs to phenotype and expression data generated for the isolates. As a result, researchers greatly reduce gaps in GEM pathways and improve isolate annotations.
References
Faria, J. P., et al. 2023. Preprint. “ModelSEED v2: High-Throughput Genome-Scale Metabolic Model Reconstruction with Enhanced Energy Biosynthesis Pathway Prediction,” bioRxiv. DOI:10.1101/2023.10.04.556561.
Henry, C. S., et al. 2010. “High-Throughput Generation, Optimization and Analysis of Genome-Scale Metabolic Models,” Nature Biotechnology 28(9), 977–82. DOI:10.1038/nbt.1672.
McClure, R., et al. 2022. “Interaction Networks are Driven by Community Responsive Phenotypes in a Chitin Degrading Consortium of Soil Microbes,” mSystems 7(5), e00372–22. DOI:10.1128/msystems.00372-22.
Funding Information
This work is supported as part of the Genomic Science program of BER. The DOE Systems Biology Knowledgebase (KBase) and the Science Focus Area-KBase supplements are funded by the DOE, Office of Science, BER program under award numbers DE-AC02-05CH11231, DE-AC02-06CH11357, DE-AC05-00OR22725, DE-AC02-98CH10886, DE-AC05-76RL01830, and DE-AC52-07NA27344.