AI Pilot Project: Oak Ridge National Laboratory
- Principal Investigator: Lianhong Gu (ORNL)
- Scope/Objectives: Develop the first holistic model of photosynthesis to support bioenergy production, scaling from genomics to phenomics, with biological mechanisms embedded in neural networks.
- Potential Impact and Interface with the American Science Cloud (AmSC) and Transformational AI Models Consortium (ModCon): Develop the GPTgp model as a model team; connect to AmSC through the Data Lakehouse.
Summary
The Generative Pretrained Transformer for Genomic Photosynthesis (GPTgp) project seeks to develop the first holistic model of photosynthesis to support abundant bioenergy production. Building on the remarkable progress in natural language processing (NLP), GPTgp postulates that photosynthesis is conducted with genes by species in particular environments, just as texts are written with words by authors in particular social settings. GPTgp will adapt and expand core philosophies, approaches, and architectures from NLP to create a holistic model that scales from genomics to phenomics, with biological mechanisms embedded in neural networks. A lakehouse will be developed to manage heterogeneous photosynthetic data for training and validating GPTgp. GPTgp will allow design-specific training and transfer learning across reactions, pathways, biodesigns, and species. It can be fine-tuned for downstream applications such as predicting the effects of genetic perturbations, optimizing the photosynthetic apparatus for performance, selecting top-performing genotypes for various conditions, and estimating gross primary production at multiple scales. Plant breeders and scientists will be able to use GPTgp as an intelligent assistant to harness complementary natural and engineered genetic variations and accelerate the design of high-performance crop cultivars.