AI Frameworks for Biological Science (DRAFT)

The Genomic Science program supports AI and machine learning capabilities to explore new concepts for genomics-based research that will underpin innovation in biotechnology.

DOE Genesis Mission

DOE has launched the Genesis Mission to dramatically accelerate scientific discovery and engineering through the strategic integration of artificial intelligence. Genesis is building an integrated platform connecting DOE supercomputers, experimental facilities, AI systems, and unique datasets across the sciences, including BER-relevant biotechnology and critical minerals and materials research. Two key Genesis components are (1) the American Science Cloud (AmSC) and (2) the Transformational AI Models Consortium (ModCon). AmSC is developing a secure, federated, science-optimized environment that integrates DOE’s computing and experimental facilities, data resources, and high-performance networks. ModCon will build and deploy self-improving AI models that advance science, engineering, and energy missions by harnessing DOE’s unique data, facilities, and expertise.

AI and Systems Biology

Biology has increasingly become a quantitative science, driven by massive biotechnology advances and the proliferation of related omics data, including genome sequencing, annotation, and imaging information that crosses scales, systems, and environments. An ongoing challenge is to make these stovepiped, nonstandardized data types, libraries, metadata systems, and search systems integrated, accessible, and usable by the broader research community.

The Genesis platform, including data and AI initiatives supported by the Genomic Science program, offers a unique and timely opportunity to address these challenges and enable researchers to develop next-generation tools and models that can transform fundamental biological understanding. These capabilities could uncover insights into how genomic information in plants and microorganisms is translated into molecular function, how function shapes physiology, and how physiology scales to influence macroscale processes.

These insights—derived by combining the power of AI with mature genome sequencing, genome editing, and lab automation capabilities—are expected to unleash and empower a new U.S. bioeconomy. Opportunities include (1) advancing predictive understanding and manipulation of biological systems, (2) enabling researchers to organize and simulate biological processes across vast scales, and (3) facilitating the discovery and design of new behaviors, mechanisms, and biological processes relevant to DOE missions.

BRIDGE: BER Data Lakehouse

With the goal of developing a unified data ecosystem, BER in FY24 launched the primary component of its AI infrastructure: the BRIDGE (Biological and EnviRonmental Infrastructure for Data ManaGement and Exploration) Data Lakehouse.

Data lakehouses merge raw, varied data lake content from multiple sources with the reliability, structure, and governance of a relational data warehouse. With foundational partners from DOE’s Joint Genome Institute (JGI), Systems Biology Knowledgebase (KBase), and National Microbiome Data Collaborative (NMDC), BRIDGE has created a project-driven collaboration between infrastructure developers and researchers. These collaborations are advancing projects with high-impact science goals and empowering early adopters whose research requires a unified data infrastructure. When fully implemented, BRIDGE’s extensible Data Lakehouse will support seamless access to diverse BER datasets and their integration with AI frameworks.

BRIDGE is fundamental to addressing DOE’s Biotechnology AI Lighthouse challenge area, designated by the Genesis Mission as a major AI target to advance science and technology. As part of this effort, DOE has funded several pilot projects that will contribute to the BRIDGE Data Lakehouse.