Advancements in genomic research reveal alternative transcription initiation sites in thousands of soybean genes

Jianxin Ma (right) and Jingbo Duan, a first author on the recent paper, take samples from a soybean plant in the Lilly Greenhouse. Credit: Purdue Agricultural Communications/Joshua Clark

Rosalind Franklin, James Watson and Francis Crick discovered the structure of DNA—that molecular blueprint for life—over 70 years ago. Today, scientists are still uncovering new ways to read it.

In 2010, Jianxin Ma, a professor of agronomy, and his collaborators built the first reference genome for soybeans, based on the widely studied Williams 82 variety. Thousands of scientists and plant breeders have since used that genome in their own research on the genetic makeup underlying various characteristics, such as seed protein and oil content, plant architecture and productivity, and disease resistance and abiotic stress tolerance in soybeans.

Through the last decade, Ma, who is the Indiana Soybean Alliance Inc. Endowed Chair in Soybean Improvement, has been recognized internationally for his contribution to the soybean genome as well as for his continued research and innovation in the field. His most recent work, published in The Plant Cell, used advancements in genomic research to fill in gaps of the original soybean reference genome.

"The reference genome was like a dictionary when we announced it," Ma said. "Each gene was like a single word. However, there was a piece of critical information lacking: transcription initiation sites for individual genes."

Transcription initiation sites are locations in the DNA where the cell's transcription machinery, recruited by specialized transcription-factor proteins, attaches and begins building an mRNA copy of the gene downstream of it. That mRNA is then read and translated at a ribosome to make proteins, which are essential to the chemical and physical function of every organism.

Knowing where the mRNA begins formation on the DNA strand is a significant part of understanding how genes are expressed. These initiation sites contain regulatory elements and provide information to the cell about when and where to transcribe each gene to make protein, and how frequently to do so at any point in time.

In genetics, it has generally been accepted that each gene has one transcription initiation site, located downstream of a core promoter region and typically around a TATA box—a DNA sequence rich in thymine and adenine repeats. But Ma and his colleagues no longer think this is the case.

"There is a set of predicted transcription start sites for over 50,000 genes in soy, but based on our new study, less than 3% of those predicted transcription initiation sites actually are correct," Ma said.

In 2020, the development of the Survey of TRanscription Initiation at Promoter Elements Sequencing (STRIPE-seq) technique offered Ma's lab an effective, faster and more affordable way to identify transcription initiation sites across the entire soybean genome. It also provided information about the relative abundance of each mRNA, which gives clues as to how strongly a gene is expressed in different tissues and at different times.

Ma and his lab performed STRIPE-seq analyses on eight different tissues in soybean: leaves, stems, stem tips, roots, nodules, flowers, pods and developing seeds. Even though the plant's DNA is consistent across these tissues, the expression of genes differs.

In their recent paper, the Ma lab identified transcription initiation sites for about 40,000 genes in soy. They discovered widespread alternative transcription initiation sites outside of the TATA box region and other sequences thought to be promoters.

Some newly identified sites actually occur in the coding sequence of the gene, the part that becomes an mRNA. Thus, the transcription machinery can bind at several different sections of the gene and begin making mRNA, with each copy differing from those started at other sites. Each alternative transcription site could potentially create a different protein from the same gene.

One specialized subset of transcription initiation sites the group found was in root nodules, structures on legumes' roots that host the interaction between the plant and Rhizobia bacteria. These soil-dwelling microbes fix nitrogen for specialized plants like legumes in return for sugars and protection. This symbiosis increases a plant's survival in nitrogen-deficient soils without the use of nitrogen fertilizers.

"We found these particular transcription initiating sites in nodules, but not in the roots or any other tissues, suggesting they are for tissue-specific transcription and associated with nodule-specific function," said Ma.

In order for DNA to fit within a cell's nucleus, it is wound up around histone proteins to form a structure called "chromatin." Depending on chemical markers placed on these histones, the chromatin can be wound tightly—preventing transcription factors from binding—or loosely, making it accessible for generating mRNA copies.

Ma believes that these "epigenetic" changes are working hand-in-hand with the alternative transcription initiation sites in gene expression. Different transcription initiation sites can become available as a gene is tightened or loosened, and different proteins may be created.

"We have found nearly 7,000 genes that have the alternative transcription initiation within the coding sequences. These alternative transcription initiation sites tend to be tissue-specific and associated with histone modifications," Ma said.

Evolutionarily, these alternative sites may have been beneficial to soybeans and other plants because they allowed for increased complexity and adaptability within a limited genome. Soybeans have experienced two whole-genome duplication events in their history, both millions of years ago. Although some of the duplicated genes have since been lost, Ma thinks the duplication events may have given rise to altered or alternative transcription sites.

"After duplication, the majority of genes are still in pairs; however, they show different expression patterns, and many have functionally diverged to regulate different traits," Ma said. "They start to transcribe from different sites, potentially contributing to their functional divergence."

Currently, Ma is coordinating with USDA Agricultural Research Service scientists Rex Nelson and Jacqueline Campbell on making this research data accessible for others, just as he did with the original reference genome. The group is adding the data to SoyBase, a collaborative online database for soybean research.

Nelson, curator of SoyBase, explained, "Having even a potential transcription start site will aid in the analysis of soybean gene promoter regions. This may shed light on the proteins that interact with promoters and induce transcription."

Campbell, co-curator of the database, added that "the identification of transcription factors that bind promoter regions will allow researchers to identify gene regulatory interaction networks involved in the complex regulation of genes in agronomically important phenotypes."

Ma says, "The database serves as an important resource for both basic and applied research. By making our data available there, we catalyze further research in understanding gene functions, regulatory mechanisms, gene networks and genetic variations associated with specific traits of interest. As we better understand how these alternative transcription sites affect particular traits, the hope is to see this lead to better soybean varieties."

More information: Xutong Wang et al, Noncanonical transcription initiation is primarily tissue specific and epigenetically tuned in paleopolyploid plants, The Plant Cell (2024). DOI: 10.1093/plcell/koae288

Journal information: Plant Cell

Provided by Purdue University