Reference-free approach for population genomic analysis using metagenomic and metatranscriptomic data

Amin Madoui (Genoscope CEA)
Thursday, July 5, 2018 - 14:00
Room Aurigny
Talk abstract: 

The availability of large metagenomic sequence datasets offers new opportunities for population genetics research. However, for many non-model species, the lack of reference genomes remains an obstacle to population genomic studies. We took advantage of the discoSnp++ program and first demonstrated its usefulness in the context of metagenomics. We then developed a new reference-free method to identify species, termed metavariant species (MVS) by analogy with metagenomic species (MGS). After biallelic loci are detected directly from metagenomic reads using discoSnp++, MVSs are identified by density-based clustering of these loci on their depth of sequencing coverage across all sampled populations. The allele frequencies of the MVSs can then be used in population genomic analyses to identify population differentiation and loci under natural selection. We applied this method to decipher population structure and differentiation in Tara Oceans metagenomic data. We also combined metatranscriptomic and metagenomic data with discoSnp++ to identify allele-specific expression (ASE) at the population level and detected a strong link between loci under natural selection and loci under ASE.
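
To illustrate the clustering step described in the abstract, the sketch below groups loci by their coverage profile across samples using a minimal DBSCAN, one common density-based clustering algorithm. It is not the speaker's implementation: the abstract does not say which algorithm, distance, or parameters were used, and the coverage values, `eps`, and `min_pts` here are hypothetical. Real data would likely need coverage normalization first.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


# Hypothetical data: each tuple is one biallelic locus, each value its
# sequencing depth in one of three samples.
coverage = [
    (10, 2, 30), (11, 3, 29), (9, 2, 31), (10, 3, 30),   # one coverage profile
    (40, 41, 5), (39, 40, 6), (41, 42, 5), (40, 40, 4),  # a second profile
    (100, 0, 0),                                          # isolated locus
]

# eps and min_pts are illustrative choices, not values from the talk.
labels = dbscan(coverage, eps=3.0, min_pts=3)
print(labels)
```

Loci that share a label have similar coverage across samples and would be candidates for belonging to the same MVS under this simplified view; loci labeled -1 fall in no dense region and are left unassigned.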