High-quality genome assembly enables prediction of allele-specific gene expression in hybrid poplar (2024)


dc.contributor.authorShi, Tian-Le
dc.contributor.authorJia, Kai-Hua
dc.contributor.authorBao, Yu-Tao
dc.contributor.authorNie, Shuai
dc.contributor.authorTian, Xue-Chan
dc.contributor.authorYan, Xue-Mei
dc.contributor.authorChen, Zhao-Yang
dc.contributor.authorLi, Zhi-Chao
dc.contributor.authorZhao, Shi-Wei
dc.contributor.authorMa, Hai-Yao
dc.contributor.authorZhao, Ye
dc.contributor.authorLi, Xiang
dc.contributor.authorZhang, Ren-Gang
dc.contributor.authorGuo, Jing
dc.contributor.authorZhao, Wei
dc.contributor.authorEl-Kassaby, Yousry Aly
dc.contributor.authorMueller, Niels
dc.contributor.authorVan de Peer, Yves
dc.contributor.authorWang, Xiao-Ru
dc.contributor.authorStreet, Nathaniel Robert
dc.contributor.authorPorth, Ilga
dc.contributor.authorAn, Xinmin
dc.contributor.authorMao, Jian-Feng
dc.date.accessioned2024-08-13T05:52:46Z
dc.date.available2024-08-13T05:52:46Z
dc.date.issued2024-05
dc.descriptionDATA AVAILABILITY: The whole genome sequencing raw data, genome assemblies, and annotations have been deposited in the Genome Sequence Archive in National Genomics Data Center (https://ngdc.cncb.ac.cn/gwh) under the accession number GWHBJXC00000000 (BioProject ID: PRJCA010836). The genome assembly and annotation data for subgenomes A and G have also been deposited in the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) under BioProject accession numbers PRJNA1025943 and PRJNA1025942, respectively. Scripts used for centromere identification are publicly available at: https://github.com/ShuaiNIEgithub/Centromics. Code and data for allele identification, allele-specific gene expression, and XGBoost model construction are available on GitHub (https://github.com/shitianle77/84K_genome) and figshare (https://figshare.com/articles/dataset/Gap-free_genome_assembly_of_hybrid_poplar_84K_/24279211). A computational pipeline for allele identification and allele-specific gene expression with haplotype-resolved diploid genome assembly is available at: https://github.com/shitianle77/Allele_auto.en_US
dc.descriptionSUPPLEMENTARY DATA: SUPPLEMENTARY FIGURE S1. Images of the sequenced individual (the F1 hybrid poplar “84K”). SUPPLEMENTARY FIGURE S2. The schematic diagram illustrates the overall process of the assembly of the poplar “84K” genome and the data required for the assembly process. SUPPLEMENTARY FIGURE S3. Putative centromeres (green boxes) are determined based on the distribution of the tandem repeat with the highest frequency. SUPPLEMENTARY FIGURE S4. Telomere sequences assembled in each chromosome. SUPPLEMENTARY FIGURE S5. Positions of the two gaps located on chromosome 9A (chr09A). SUPPLEMENTARY FIGURE S6. Genome-wide analysis of chromatin interactions in the genome based on Hi-C data. SUPPLEMENTARY FIGURE S7. K-mer frequency distribution estimated from (A) Illumina, (B) HiFi, and (C) ONT sequences after filtering and correction at K-mer size of 17. SUPPLEMENTARY FIGURE S8. Collinearity of 2 haplotype genomes of the poplar clone “84K” with that of P. trichocarpa. SUPPLEMENTARY FIGURE S9. Collinearity of 2 haplotype genomes of the current (this study) with published genomes of “84K” (Qiu et al. 2019). SUPPLEMENTARY FIGURE S10. Distribution of rDNA on chromosomes. SUPPLEMENTARY FIGURE S11. Distribution of rDNA on chromosomes of Salicaceae species. SUPPLEMENTARY FIGURE S12. Gene family evolution and collinearity analyses among Salicaceae species. SUPPLEMENTARY FIGURE S13. Length of structural variation and local sequence differences between the subgenomes A and G (subgenome G for the assembly of P. tremula var. glandulosa and subgenome A for the assembly of P. alba). SUPPLEMENTARY FIGURE S14. Statistics on overlaps between the inversion regions and different TE types (left panel) and between breakpoint region of inversion and different TE types (right panel) in the 2 subgenomes (G for the assembly of P. tremula var. glandulosa and A for the assembly of P. alba). SUPPLEMENTARY FIGURE S15. DNA methylation patterns. SUPPLEMENTARY FIGURE S16. Collinearity of a pair of alleles on 2 parental genomes. SUPPLEMENTARY FIGURE S17. Absolute TPM expression abundance for Diff00, Diff0, Diff2, and Diff8. SUPPLEMENTARY FIGURE S18. GO enrichment analysis of 5 categories of allelic expression bias. SUPPLEMENTARY FIGURE S19. Importance ranking and ROC curves of Model 0 (with 46 predictors/features). SUPPLEMENTARY FIGURE S20. Pair-wise correlation among 46 predictors (features) used in modeling (Model 0). SUPPLEMENTARY FIGURE S21. Ranking of the 15 features in the XGBoost model (Model 2) and the model assessment. SUPPLEMENTARY FIGURE S22. Ranking of the 15 features in the XGBoost model (Model 3) and the model assessment. SUPPLEMENTARY TABLE S1. Statistics of whole genome sequencing data. SUPPLEMENTARY TABLE S2. Summary of the Illumina reads for the genome assembly of “84K”. SUPPLEMENTARY TABLE S3. Statistics of the different versions of genome assembly. SUPPLEMENTARY TABLE S4. Statistics of the genome quality for the final assembly. SUPPLEMENTARY TABLE S5. Mapping rates of Illumina reads, HiFi reads, and ONT reads to the present genome assembly of “84K”. SUPPLEMENTARY TABLE S6. Summary of BUSCO evaluation for genome assembly and gene prediction. SUPPLEMENTARY TABLE S7. Summary statistics of the gene annotation of the “84K” genome. SUPPLEMENTARY TABLE S8. Summary of functional annotation of predicted genes. SUPPLEMENTARY TABLE S9. Summary of the annotated RNA genes. SUPPLEMENTARY TABLE S10. Summary of the repeat elements annotated in the “84K”. SUPPLEMENTARY TABLE S11. Annotated TF gene families in the “84K” genome. SUPPLEMENTARY TABLE S12. Summary of gene family expansion and contraction in the “84K” genome. SUPPLEMENTARY TABLE S13. Summary of identified SVs between 2 parental genomes. SUPPLEMENTARY TABLE S14. Summary of the percentage of methylation sites of CG, CHG, and CHH in DNA methylation. SUPPLEMENTARY TABLE S15. Categories and number of allelic expression biases between 2 parental genomes. SUPPLEMENTARY TABLE S16. 46 features used in the XGBoost machine-learning modeling of ASE. SUPPLEMENTARY TABLE S17. Ranking of the 46 features in the XGBoost model (Model 0). SUPPLEMENTARY TABLE S18. Ranking of the 15 features in the XGBoost models (Model 1, Model 2, and Model 3). SUPPLEMENTARY TABLE S19. Evaluation of the classification XGBoost models (Model 0, Model 1, and Model 2). SUPPLEMENTARY TABLE S20. Evaluation of the regression XGBoost model (Model 3). SUPPLEMENTARY TABLE S21. Statistics of transcriptome assembly by different methods. SUPPLEMENTARY NOTE S1. 46 features used in the XGBoost machine-learning modeling of ASE. SUPPLEMENTARY NOTE S2. Library construction and sequencing. SUPPLEMENTARY NOTE S3. Genome assembly and quality assessment. SUPPLEMENTARY NOTE S4. Gene prediction and functional annotation. SUPPLEMENTARY NOTE S5. Phylogenetics and gene collinearity in the Salicaceae. SUPPLEMENTARY NOTE S6. Variation between the 2 parental genomes. SUPPLEMENTARY NOTE S7. RNA-seq data and allelic gene expression. SUPPLEMENTARY NOTE S8. DNA methylation quantification from ONT long reads. SUPPLEMENTARY NOTE S9. Feature extraction for machine-learning modeling. SUPPLEMENTARY NOTE S10. Model construction.en_US
dc.descriptionSUPPLEMENTARY DATA SET 1. RNA-Seq data used for gene expression analysis. SUPPLEMENTARY DATA SET 2. Statistics of mRNA sequencing data for gene annotation. SUPPLEMENTARY DATA SET 3. Summary of the amount of rDNA on different chromosomes among the Salicaceae species and 2 subgenomes (Subgenomes A and G). SUPPLEMENTARY DATA SET 4. GO enrichment of the significantly expanded gene families in 1 parental genome (P. alba genome, the subgenome A). SUPPLEMENTARY DATA SET 5. GO enrichment of the significantly expanded gene families in one parental genome (the P. tremula var. glandulosa genome, the subgenome G).en_US
dc.description.abstractPoplar (Populus) is a well-established model system for tree genomics and molecular breeding, and hybrid poplar is widely used in forest plantations. However, distinguishing its diploid homologous chromosomes is difficult, complicating advanced functional studies on specific alleles. In this study, we applied a trio-binning design and PacBio high-fidelity long-read sequencing to obtain haplotype-phased telomere-to-telomere genome assemblies for the 2 parents of the well-studied F1 hybrid “84K” (Populus alba × Populus tremula var. glandulosa). Almost all chromosomes, including the telomeres and centromeres, were completely assembled for each haplotype subgenome apart from 2 small gaps on one chromosome. By incorporating information from these haplotype assemblies and extensive RNA-seq data, we analyzed gene expression patterns between the 2 subgenomes and alleles. Transcription bias at the subgenome level was not uncovered, but extensive expression differences were detected between alleles. We developed machine-learning (ML) models to predict allele-specific expression (ASE) with high accuracy and identified underlying genome features most highly influencing ASE. One of our models with 15 predictor variables achieved 77% accuracy on the training set and 74% accuracy on the testing set. ML models identified gene body CHG methylation, sequence divergence, and transposon occupancy both upstream and downstream of alleles as important factors for ASE. Our haplotype-phased genome assemblies and ML strategy highlight an avenue for functional studies in Populus and provide additional tools for studying ASE and heterosis in hybrids.en_US
dc.description.departmentBiochemistryen_US
dc.description.departmentGeneticsen_US
dc.description.departmentMicrobiology and Plant Pathologyen_US
dc.description.librarianhj2024en_US
dc.description.sdgSDG-15:Life on landen_US
dc.description.sponsorshipThe National Key R&D Program of China and National Natural Science Foundation of China.en_US
dc.description.urihttps://academic.oup.com/plphysen_US
dc.identifier.citationTian-Le Shi, Kai-Hua Jia, Yu-Tao Bao, Shuai Nie, Xue-Chan Tian, Xue-Mei Yan, Zhao-Yang Chen, Zhi-Chao Li, Shi-Wei Zhao, Hai-Yao Ma, Ye Zhao, Xiang Li, Ren-Gang Zhang, Jing Guo, Wei Zhao, Yousry Aly El-Kassaby, Niels Müller, Yves Van de Peer, Xiao-Ru Wang, Nathaniel Robert Street, Ilga Porth, Xinmin An, Jian-Feng Mao, High-quality genome assembly enables prediction of allele-specific gene expression in hybrid poplar, Plant Physiology, Volume 195, Issue 1, May 2024, Pages 652–670, https://doi.org/10.1093/plphys/kiae078.en_US
dc.identifier.issn0032-0889 (print)
dc.identifier.issn1532-2548 (online)
dc.identifier.other10.1093/plphys/kiae078
dc.identifier.urihttp://hdl.handle.net/2263/97579
dc.language.isoenen_US
dc.publisherOxford University Pressen_US
dc.rights© The Author(s) 2024. Published by Oxford University Press on behalf of American Society of Plant Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License.en_US
dc.subjectPoplar (Populus)en_US
dc.subjectTree genomicsen_US
dc.subjectMolecular breedingen_US
dc.subjectHybrid poplaren_US
dc.subjectForest plantationsen_US
dc.subjectTrio-binning designen_US
dc.subjectPacBio high-fidelity long-read sequencingen_US
dc.subjectAllele-specific gene expressionen_US
dc.subjectGenome assemblyen_US
dc.subjectPredictionen_US
dc.subjectSDG-15: Life on landen_US
dc.titleHigh-quality genome assembly enables prediction of allele-specific gene expression in hybrid poplaren_US
dc.typeArticleen_US



Files in this item

Name:Shi_HighQualitySu ...

Size:4.624Mb

Format:PDF

Description:Supplementary Data

Name:Shi_HighQualitySu ...

Size:61.16Kb

Format:Microsoft Excel 2007

Description:Supplementary Data ...

This item appears in the following Collection(s)

  • Research Articles (Biochemistry, Genetics and Microbiology (BGM))397
  • Research Articles (University of Pretoria)35875
