Deep-Plant: A Deep Learning Platform for Plant Genomics
Authors:
Asa Ben-Hur* (PI)
Institutions:
Colorado State University
Abstract
Gene regulation is governed by a multitude of proteins and RNAs, most notably transcription factors. Transcription factors control gene expression by binding in promoter regions proximal to genes or at distal enhancers. Their binding is modulated by the state of the DNA molecule, namely whether it is accessible or wrapped around histones, and by DNA and histone modifications. In recent years, several databases have been curated from thousands of published studies, providing vast amounts of plant genomics data from various types of assays. These include genome-wide expression, transcription factor binding, histone modifications, and DNA accessibility. Deep learning has demonstrated its value in modeling large and complex genomics compendia in mammals, providing insights into gene regulation in those systems. However, very little such work has been carried out in plants. This initiative proposes to leverage the wealth of data available in plants to create a deep learning framework called DEEP-PLANT to model plant chromatin state and its consequences for gene regulation. More specifically, DEEP-PLANT models will predict transcription factor binding and chromatin state directly from sequence. These models will provide a detailed picture of gene regulation and will support downstream applications including the prediction of gene expression, enhancers, and the effects of genetic variation. This work will be carried out in Arabidopsis and rice; it will shed light on aspects of gene regulation conserved across dicots and monocots and provide plant biologists with tools to form hypotheses about the factors that drive gene expression. The poster will summarize preliminary work towards these goals.