All organisms face three information challenges, and all life on earth, from invisible microbes to the largest plants and most exotic animals, uses the same fundamental biochemical strategies to meet these challenges. First, the organism must encode and store, within each cell, all the instructions needed to build, operate, maintain, and reproduce itself and to respond to varied environmental conditions. DNA, the biochemical solution to this coding and storage problem, is made up of four chemical building blocks (nucleotide bases): adenine (A), thymine (T), cytosine (C), and guanine (G). These building blocks are organized in long chains like chemically linked beads, whose precise order spells out the organism's full set of genetic instructions—its genome. With the advent of whole-genome sequencing, the assembly and study of the entire instruction set have become possible.
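The four-letter coding scheme described above can be illustrated with a minimal sketch. It assumes a DNA sequence is represented as a plain string over the letters A, T, C, and G. The function names `base_counts` and `capacity_bits` are hypothetical, chosen only for this illustration. Because there are four possible bases at each position, each position can carry at most log2(4) = 2 bits of information.

```python
import math
from collections import Counter

BASES = "ATCG"  # adenine, thymine, cytosine, guanine


def base_counts(sequence):
    """Count how often each of the four bases occurs in a DNA string."""
    counts = Counter(sequence.upper())
    unknown = set(counts) - set(BASES)
    if unknown:
        raise ValueError(f"unexpected symbols in sequence: {sorted(unknown)}")
    return {base: counts.get(base, 0) for base in BASES}


def capacity_bits(length):
    """Upper bound on the information a sequence of this length can store.

    Four equally possible symbols per position means log2(4) = 2 bits each.
    """
    return length * math.log2(len(BASES))


if __name__ == "__main__":
    example = "GATTACA"  # an arbitrary illustrative string, not real genome data
    print(base_counts(example))
    print(capacity_bits(len(example)))
```

For instance, `capacity_bits(8)` gives 16.0, so an eight-base stretch can distinguish at most 4^8 = 65,536 different orderings. This is only an upper bound on storage capacity; it says nothing about what a given sequence means biologically.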
But the information stored in DNA is "lifeless" by itself, just as a recipe in a book is not a delectable dessert, nor a musical score a majestic symphonic performance. In the same way, the DNA sequence must be "expressed" to give life to a cell or organism. Furthermore, the sequence alone does not automatically provide understanding of how each segment contributes to the whole cell or organism. The overarching aim of the Genomic Science program is to understand how the information in DNA spells out a living cell or organism.
The second information challenge is to read out the genome's instructions in the proper order, time, and amount for each gene product. The biochemical answer begins with the selective readout (transcription) of each functional segment of DNA sequence (gene) in the form of RNA, which is a close chemical relative of DNA. The set of RNA transcripts generated for a cell is called its transcriptome. RNA, in turn, is the direct molecular instruction for a specific protein's synthesis, accomplished by the cell in a process known as translation. Selective gene readout in the chemical form of RNA, therefore, can govern the identity and quantity of proteins, which are the cell's workhorse molecules and the ultimate physical embodiment of information encoded in DNA. The constellation of proteins in a cell is called its proteome.
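The two-step readout described above, transcription followed by translation, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a model of how a cell does it. It assumes the input is the coding strand of a gene, so the RNA transcript is the same sequence with uracil (U) in place of thymine (T), and it assumes the reading frame starts at the first base. The table below holds only a small subset of the standard genetic code, in which three-base RNA "codons" each specify one amino acid or a stop signal. The function names are hypothetical.

```python
# A small subset of the standard genetic code (RNA codon -> amino acid).
# The full table has 64 codons; only a few are listed for illustration.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AAA": "Lys", "AAG": "Lys",
    "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(dna_coding_strand):
    """Return the RNA transcript for a coding-strand DNA string (T -> U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three bases at a time and return the amino acid chain.

    Assumes the reading frame begins at position 0 and stops at the first
    stop codon. Codons missing from the partial table raise KeyError.
    """
    protein = []
    for start in range(0, len(rna) - 2, 3):
        codon = rna[start:start + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    gene = "ATGGCCAAATGGTAA"  # a made-up sequence for illustration only
    transcript = transcribe(gene)
    print(transcript)
    print(translate(transcript))
```

In a real cell, which genes are transcribed, when, and in what amount is itself regulated; the sketch shows only the mapping from sequence to product.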
Cells are the fundamental working units of living systems. The range of life's complexity varies from invisible bacteria that carry out all functions as single-celled organisms to complex plants and animals containing millions or trillions of cells, many with highly specialized functions. With the availability of gene microarrays containing segments of many different genes coupled with the knowledge of entire genome sequences, scientists now can rapidly monitor the identities and amounts of RNA made from each of thousands of genes in cells and organisms living under hundreds or thousands of varied conditions. This capability may provide insight into how gene readout is regulated. Mapping and modeling the cellular circuitry governing this process is a major goal for the Genomic Science program.
Like DNA and RNA, proteins are synthesized as "beads on a string" but with 20 different kinds of beads (amino acids) rather than the 4 of RNA or DNA. Chemical properties that distinguish different amino acids ultimately cause the protein chains to fold up into specific three-dimensional structures. It is the proteins that meet the third and greatest information challenge, which is to act out the instructions encoded in DNA. Although DNA and RNA are information rich, they are chemically simple and homogeneous. Proteins, by contrast, are chemically complex and diverse, properties that enable them to do so many different jobs. Proteins are "where the action is" in living systems. They are motors, pumps, chemical catalysts, detectors, signals and signalers, conveyers, structural units, gateway keepers, dismantlers, assemblers, and garbage handlers. They regulate cell replication, survival, and even death. Recent progress in whole-genome DNA sequencing and in areas of protein-structure determination has brought investigators to the point of knowing the composition of most proteins from model organisms. The challenge is to know how proteins give cells their capabilities, structure, and higher-order properties.
Proteins rarely solo. More often, they work by assembling into larger multiprotein complexes, some of which have the characteristics of rather complicated protein "machines." These machines, in turn, execute such major functions as protein synthesis and degradation, cell-to-cell signaling, and a host of other operations. The properties of each kind of protein, which cause it to assemble with others into machines and to execute very specific and critical reactions in the cell, are the direct consequence of the protein's amino acid sequence, which dictates its final folded structure. That is, a protein's chemistry and behavior are specified by the gene sequence and by the number and identities of other proteins made in the same cell at the same time, with which it associates and reacts. A major focus for the Genomic Science program, and its first goal, is to learn the repertoire of protein complexes and machines needed to make different kinds of microbes and cell types function. These machines shift and change in composition, making their dynamics a further focus.
Cells do not solo very often, either. Although microbes are single-cell organisms, they typically live in communities composed of more than one kind of microbe, often many different kinds. The Genomic Science program seeks to understand the properties of these cell communities by first learning about the "community" genome and relating it to the community's capabilities to perform processes vital to DOE mission goals. Considering that life is found in virtually every environmental niche from arctic tundra to parched deserts to boiling sea vents on the deepest ocean floor, the global genetic "catalog" encoding all of life's amazingly diverse capabilities must be astonishing, yet very few details are known. The recently discovered Prochlorococcus bacteria, for example, are now thought to be among earth's major photosynthetic organisms, taking up carbon dioxide and releasing life-sustaining oxygen. Scientists believe that harnessing the capabilities of these and other bacteria may offer revolutionary ways to solve environmental challenges related to DOE's missions in, for example, global climate stabilization through carbon reduction, toxic-waste cleanup, and new and efficient energy sources.