Title: Environmental metagenomics for plant pathogen surveillance: the promise and the practice
Presenter: Prof. Wen Chen
University: Biodiversity, Bioinformatics and Metagenomics, Agriculture & Agri-Food Canada
Time: 9:00-10:00, September 30, 2015
Venue: Room A201, Institute of Microbiology, Chinese Academy of Sciences
Abstract: Accurate species identification is paramount because most trade and quarantine regulations are based on species names. Metagenomics using next-generation sequencing (NGS) allows broad-range detection of pathogens at low concentrations, but the data can be misinterpreted because of insufficient sampling effort, the low discriminatory power of DNA barcodes, imperfect computational tools for data processing, large gaps in the taxonomic coverage of error-prone public reference sequence databases, and the confusing dual nomenclature of fungi. For example, the percent error rate of the widely used 454 pyrosequencing technology is higher than the ITS sequence difference between Tilletia indica (Karnal bunt) and the innocuous Tilletia walkeri, so automated bioinformatics identification pipelines report Karnal bunt false positives in grain samples. Such misclassifications of quarantine species by available data processing tools are not rare and could have significant trade consequences if taken at face value. We will first discuss the major reasons why accurate identification of microbes is difficult with amplicon-based NGS approaches, using close to 100 million DNA metabarcodes generated from commodities and agricultural environments as baseline data. We then propose a proof of concept that uses short signature oligonucleotides (SOs, 18-35 mer) with specificity and fidelity to a targeted taxon group to facilitate species- and subspecies-level classification of NGS data. Two bioinformatics tools, the Automated Oligo Design Pipeline (AODP) and the Oligo Fishing Pipeline (OFP), have been developed and made publicly available. Furthermore, we will demonstrate that accurate estimation of species richness and of fungal community structure depends on the quality and taxonomic breadth of reference sequence databases.
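The signature oligonucleotide idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the AODP or OFP implementation. It only assumes that a signature oligo is a k-mer found in every sequence of the target group and in no sequence outside it. The function names and the toy sequences are hypothetical, and k is set to 8 for brevity rather than the 18-35 mer range quoted above.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_oligos(targets, non_targets, k):
    """Return k-mers shared by every target sequence and absent from all non-targets.

    Assumption for this sketch: exact matching only, with no tolerance for
    sequencing errors, reverse complements, or ambiguous bases.
    """
    if not targets:
        raise ValueError("at least one target sequence is required")
    # k-mers present in every member of the target group (fidelity)
    shared = set.intersection(*(kmers(s, k) for s in targets))
    # k-mers present in any sequence outside the target group
    background = set().union(*(kmers(s, k) for s in non_targets))
    # keep only k-mers never seen outside the target group (specificity)
    return shared - background


# Toy, made-up sequences for illustration only (not real ITS data)
targets = ["ACGTTGCAAGGCT", "TTACGTTGCAAGG"]
non_targets = ["ACGTTGCTAGGCT", "GGACGTAGCAAGG"]

for oligo in sorted(signature_oligos(targets, non_targets, k=8)):
    print(oligo)
```

A read could then be assigned to the target group only if it contains one of these oligos exactly. An exact match over a short diagnostic region is less sensitive to sequencing errors elsewhere in the read than a whole-read similarity score.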
The existence of synonymous genus and/or species names in the reference sequence databases used for NGS data classification can result in taxonomic inflation in biodiversity surveys, skew the representation of fungal populations, and lead to inaccurate assessments of pathogen prevalence and emergence. It is important that reference sequence databases use only one name per fungus, with suppressed and synonymous names cross-referenced to the protected or accepted name. By refining existing bioinformatics tools or developing new ones, and by curating reference sequence databases, we will increase our capacity to monitor regulated and economically important pathogens rapidly and robustly and to identify them accurately. We want to point out that pathogen dispersal pathways can be monitored using NGS ONLY if we maintain reference biological collections and expertise in taxonomy.
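The effect of synonymous names on richness estimates can be shown with a small Python sketch. The synonym table, the function name, and the read assignments below are assumptions made for illustration. They are not a curated nomenclature list and not data from the talk.

```python
from collections import Counter

# Illustrative synonym table (an assumption for this sketch, not a curated
# list): each suppressed or synonymous name maps to one accepted name.
SYNONYM_TO_ACCEPTED = {
    "Gibberella zeae": "Fusarium graminearum",
}


def collapse_synonyms(assignments, synonym_to_accepted):
    """Count reads per taxon after mapping every synonym to its accepted name."""
    return Counter(synonym_to_accepted.get(name, name) for name in assignments)


# Made-up per-read classifications, as a reference database with
# uncurated names might return them
assignments = [
    "Fusarium graminearum",
    "Gibberella zeae",
    "Gibberella zeae",
    "Tilletia walkeri",
]

raw = Counter(assignments)
curated = collapse_synonyms(assignments, SYNONYM_TO_ACCEPTED)

print("taxa before curation:", len(raw))
print("taxa after curation:", len(curated))
```

In this toy case the uncurated names report three taxa where only two are present, and the reads of the one fungus are split across two labels, which understates its abundance under either name.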