Development of Computational Tools for Integrated, Exascale Analysis of Chromatin Configurations and Epigenomics Datasets for Profiling Host-Pathogen Interactions
Authors:
Cullen Roth1* ([email protected]), Daniel Jacobsen2, Karissa Sanbonmatsu1, Ankush Singhal1, Anna Lappala3,4, David Rogers1, Shounak Banerjee1, Christina R. Steadman1 (PI), Shawn R. Starkenburg1(PI)
Institutions:
1Los Alamos National Laboratory; 2Oak Ridge National Laboratory; 3Massachusetts General Hospital; 4Harvard University
Abstract
Epigenetic mechanisms and associated chromatin structure regulate the functionality of the genome and play profound roles in host-pathogen interactions. However, the field currently lacks systems-level algorithms, methods, and visualization tools to quickly compare and identify pathogen-induced changes to host epigenomes and chromatin structure, which are needed to understand susceptibility and resiliency. In this project, group members are building a single platform that includes tools, pipelines, and software for analysis and visualization of chromatin and epigenomic datasets. Here, the team presents two novel components of this platform: (1) SLUR(M)-py, a Python-based pipeline that leverages the Simple Linux Utility for Resource Management (SLURM) to process paired-end sequencing data from several different sequencing strategies, including whole-genome sequencing, assay for transposase-accessible chromatin with sequencing (ATAC-seq), chromatin immunoprecipitation with sequencing (ChIP-seq), and Hi-C sequencing, and to generate outputs for further analysis and publication; and (2) the 4D Genome Browser (4DGB), a visualization and analysis tool that transforms chromatin conformation data into 3D genome models. The 4DGB workflow takes user Hi-C files, produces 3D chromosome reconstructions, and paints the chromosome structures with epigenetic information.
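To make the "painting" step concrete, the following is a minimal sketch of one way epigenetic signal could be mapped onto the beads of a 3D chromosome model. It is not the 4DGB implementation: the function name `paint_beads`, the bedGraph-style `(start, end, value)` intervals, and the fixed bead size in base pairs are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical input type: (start, end, value), half-open coordinates in base
# pairs, in the style of a bedGraph track for a single chromosome.
Interval = Tuple[int, int, float]


def paint_beads(intervals: List[Interval], n_beads: int, bead_size: int) -> List[float]:
    """Return one signal value per bead (illustrative, not the 4DGB method).

    Each bead covers `bead_size` base pairs. The value assigned to a bead is
    the signal averaged over the whole bead, with bases not covered by any
    interval counted as zero.
    """
    totals = [0.0] * n_beads
    for start, end, value in intervals:
        if end <= start:
            continue  # skip empty or malformed intervals
        first = max(start // bead_size, 0)
        last = min((end - 1) // bead_size, n_beads - 1)
        for bead in range(first, last + 1):
            bead_start = bead * bead_size
            bead_end = bead_start + bead_size
            # Number of base pairs shared by the interval and this bead.
            overlap = min(end, bead_end) - max(start, bead_start)
            if overlap > 0:
                totals[bead] += value * overlap
    return [total / bead_size for total in totals]


# Three beads of 100 bp each; the second interval straddles beads 1 and 2.
track = [(0, 100, 2.0), (150, 250, 6.0)]
print(paint_beads(track, n_beads=3, bead_size=100))  # [2.0, 3.0, 3.0]
```

The per-bead values could then be passed to a color map when rendering the structure. Averaging over the full bead is only one possible choice; taking the maximum or the mean over covered bases would be equally valid under different assumptions.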
The first implementation of the workflow encapsulates data preparation, a molecular dynamics simulation, and a web server for the 4DGB in a single executable. Furthermore, to analyze complex omics datasets and determine the nonlinear relationships among them, the team is conducting additional research on applying explainable artificial intelligence to the SLUR(M)-py output to enable interactive, comparative exploration and predictive dynamic modeling across time. Collectively, the team envisions that these novel analysis tools will propel a deeper understanding of host-pathogen interactions in mammalian and plant systems.