
TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads

  • Mengyang Xu
  • Lidong Guo
  • Shengqiang Gu
  • Ou Wang
  • Rui Zhang
  • Brock A. Peters
  • Guangyi Fan
  • Xin Liu
  • Xun Xu
  • Li Deng
  • Yongwei Zhang
  • BGI-Shenzhen
  • University of Chinese Academy of Sciences
  • Complete Genomics Inc.
  • China National GeneBank

Research output: Contribution to journal, Article, peer-reviewed

365 citations (Scopus)

Abstract

Background: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited.

Findings: We developed a software tool to close sequence gaps in genome assemblies, TGS-GapCloser, that uses low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge gap regions between 2 contigs within a scaffold, error corrects only the candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 value of 3 human genome assemblies by 24-fold on average with only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, covering up to 94.8% of gaps with sequence data at 97.7% positive predictive value. These improved assemblies achieve 99.998% (Q46) single-base accuracy with final inserted sequences having 99.97% (Q35) accuracy, despite the high raw error rate of single-molecule reads, enabling high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as the ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data.

Conclusions: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.
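
The Q values quoted alongside the accuracy figures follow the Phred scale, Q = -10 · log10(1 - accuracy). The sketch below shows that conversion, assuming the reported values are truncated to whole numbers. The function name `phred_from_accuracy` is hypothetical and is not part of TGS-GapCloser.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred quality score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


# Accuracies reported in the abstract (the rounding convention is an assumption).
assembly_q = phred_from_accuracy(0.99998)  # whole-assembly single-base accuracy
inserted_q = phred_from_accuracy(0.9997)   # accuracy of the inserted gap sequences

print(math.floor(assembly_q), math.floor(inserted_q))
```

Under this truncation assumption the two accuracies map to the Q46 and Q35 labels given in the abstract.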

Original language: English
Journal: GigaScience
Volume: 9
Issue: 9
DOI
Publication status: Published - 1 Sep 2020
Externally published
