KBase Research Assistant and Genome Annotation Agent
Authors:
Prachi Gupta1* ([email protected]), William J. Riehl1, Meghan Drake1,2, Sean P. Jungbluth1, Christopher Neely1, Ziming Yang3, Marcin P. Joachimiak1, Mikaela Cashman1, R. Shane Canon1, Paramvir S. Dehal1, Adam P. Arkin1 (PI)
Institutions:
1Lawrence Berkeley National Laboratory; 2Oak Ridge National Laboratory; 3Brookhaven National Laboratory
Goals
The DOE Systems Biology Knowledgebase (KBase) is a knowledge creation and discovery environment designed for both biologists and bioinformaticians. It is a platform for biological research and analysis that provides a breadth of reference data, analysis applications (apps), and resources to the scientific community. App runs and narrative descriptions of their results can be presented together in the KBase Narrative Interface. This poster presents a novel set of artificial intelligence–driven tools to assist researchers in using KBase to analyze their data and publish their findings.
Abstract
Rapid advances are being made in artificial intelligence using large language models (LLMs) as a natural language processing tool for various applications. This project takes advantage of the growth of this technology to produce a KBase Research Assistant that will serve as a guide, facilitating navigation and analysis within the KBase platform. The team envisions the Assistant as a tool that can converse with a KBase user, understand their data and its relationship with public data on the system, and leverage this to reach the user’s analysis goals. This Assistant will also aid in communicating results via static narratives and academic publications.
The initial target for the KBase Assistant is to help a user go from a set of sequenced reads from a microbial isolate to an annotated genome that can be used in further analyses by the community. The general workflow uses KBase apps to assemble the reads and annotate the resulting genome, with quality assurance and control at each step. The Assistant will help interpret the output of each app and craft each subsequent step in the workflow, with variation where needed.
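The stepwise workflow described above, with a quality check after each app, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Assistant's implementation: the `Step` and `StepResult` types, the `assemble` and `annotate` stand-in functions, and every metric name and threshold are hypothetical placeholders and do not correspond to real KBase apps or outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class StepResult:
    name: str
    metrics: Metrics
    passed: bool


@dataclass
class Step:
    name: str
    run: Callable[[Metrics], Metrics]   # hypothetical stand-in for running an app
    check: Callable[[Metrics], bool]    # quality gate applied to that app's output


def run_workflow(steps: List[Step], initial: Metrics) -> List[StepResult]:
    """Run steps in order; stop at the first step whose quality check fails."""
    state = dict(initial)
    results: List[StepResult] = []
    for step in steps:
        metrics = step.run(state)
        passed = step.check(metrics)
        results.append(StepResult(step.name, metrics, passed))
        if not passed:
            break  # a failed check would be surfaced to the user before continuing
        state.update(metrics)
    return results


# Toy stand-ins with made-up numbers, used only to exercise the control flow.
def assemble(state: Metrics) -> Metrics:
    return {"n_contigs": 40.0, "n50": 120000.0}


def annotate(state: Metrics) -> Metrics:
    return {"n_genes": 4200.0}


STEPS = [
    Step("assemble", assemble, lambda m: m["n50"] >= 50000.0),
    Step("annotate", annotate, lambda m: m["n_genes"] > 0),
]

if __name__ == "__main__":
    for result in run_workflow(STEPS, {"n_reads": 1_000_000.0}):
        print(result.name, result.passed, result.metrics)
```

Stopping at the first failed check mirrors the idea that each app's output is interpreted before the next step is chosen; a fuller design might retry with different parameters rather than halt.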
To build this first genome-building assistant, several LLM-related tools are being created to interact with each other and with the user's data in a user-driven workflow. After uploading their reads to a KBase Narrative, a user will be able to activate the Assistant, which will orchestrate different groups of artificial intelligence agents. The first agent will be a modular reasoning, knowledge, and language agent that uses retrieval-augmented generation (RAG) tools to recommend apps. These RAG tools provide knowledge from KBase documentation and tutorials to ensure proper use of KBase apps. A second set of agents will manage running the apps and return results to the Assistant for analysis and interpretation, followed by suggestions for next steps. A third set of agents will ensure that the Narrative is populated properly with the apps and summaries of their results. Once analysis is complete and findings are gathered, a further agent will assist with developing a publication.
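The retrieval step behind RAG-based app recommendation can be illustrated with a toy example. This sketch assumes a simple word-overlap score in place of the embedding-based retrieval a real system would likely use, and it stops at building the prompt rather than calling an LLM. The app names and descriptions in `DOCS` are hypothetical placeholders, not entries from the KBase app catalog or documentation.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical documentation snippets keyed by placeholder app names.
DOCS: Dict[str, str] = {
    "assembly_app": "Assemble sequenced reads from a microbial isolate into contigs.",
    "annotation_app": "Annotate genes and functions on an assembled microbial genome.",
    "quality_app": "Assess the quality, completeness and contamination of a genome or assembly.",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k doc names ranked by word overlap with the query."""
    q = tokenize(query)
    scored = []
    for name, text in docs.items():
        d = tokenize(text)
        overlap = sum(min(q[t], d[t]) for t in q)
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(query: str, docs: Dict[str, str], k: int = 2) -> str:
    """Place the retrieved snippets ahead of the question for an LLM to use."""
    names = retrieve(query, docs, k)
    context = "\n".join(f"- {name}: {docs[name]}" for name in names)
    return f"Context:\n{context}\n\nQuestion: {query}\nRecommend one app."


if __name__ == "__main__":
    print(build_prompt("How do I assemble my reads into contigs?", DOCS, k=1))
```

Grounding the prompt in retrieved documentation is what lets the recommendation reflect how the apps are meant to be used, rather than relying only on what the language model already knows.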
The KBase Research Assistant will be a powerful step forward in enabling KBase users to take advantage of the full breadth of computational tools and public data that KBase provides. Although its initial focus will be on genome annotation, the Assistant will grow to provide insight and utility for other biological analyses.
Funding Information
This work is supported as part of the BER Genomic Science program. The DOE Systems Biology Knowledgebase (KBase) is funded by the DOE, Office of Science, BER program under award numbers DE-AC02-05CH11231, DE-AC02-06CH11357, DE-AC05-00OR22725, and DE-AC02-98CH10886. The Educator Working Groups described are funded by a National Science Foundation RCN-UBE Incubator under award number 2316244.