Symbiose seminars

  • Reference-free approach for population genomic analysis using metagenomic and metatranscriptomic data

    Amin Madoui (Genoscope CEA)
    Thursday, July 5, 2018 - 14:00
    Room Aurigny
    Talk abstract: 

    The availability of large metagenomic sequence datasets offers new opportunities for population genetics research. However, for many non-model species, the lack of reference genomes remains an obstacle to population genomics studies. We took advantage of the discoSnp++ program and first demonstrated its usefulness in the context of metagenomics. We then developed a new reference-free method to identify species, named metavariant species (MVS) by analogy with metagenomic species (MGS). After biallelic loci are detected directly from metagenomic reads using discoSnp++, MVSs are identified by density-based clustering of these loci on their depth of sequencing coverage across all sampled populations. The allele frequencies of each MVS can then be used in population genomic analyses to identify population differentiation and loci under natural selection. We applied this method to decipher population structure and differentiation in Tara Oceans metagenomic data. We also combined metatranscriptomic and metagenomic data with discoSnp++ to identify allele-specific expression (ASE) at the population level and detected a strong link between loci under natural selection and loci under ASE.
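
    To illustrate the clustering step, here is a minimal sketch of density-based clustering (a plain DBSCAN) applied to per-locus coverage vectors, one value per sampled population. The abstract does not say which clustering algorithm or parameters were used, so the choice of DBSCAN, the eps and min_pts values, and the toy coverage numbers are all assumptions made for illustration.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb_j)
    return labels

# Hypothetical coverage of 7 biallelic loci across 3 sampled populations.
coverage = [
    (10, 20, 5), (11, 19, 5), (9, 21, 6),    # loci co-varying like one species
    (40, 2, 30), (41, 3, 29), (39, 2, 31),   # loci co-varying like another species
    (100, 100, 100),                         # isolated locus
]
labels = dbscan(coverage, eps=3.5, min_pts=3)
print(labels)  # -> [0, 0, 0, 1, 1, 1, -1]
```

    In this toy setting, each cluster of loci with similar coverage profiles plays the role of one candidate MVS, and the isolated locus is left unassigned.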


  • Spectral methods for reconstructing latent orderings, and applications to de novo genome assembly.

    Antoine Recanati (ENS)
    Thursday, June 28, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    The seriation problem seeks to recover a latent ordering from similarity information. We typically observe a matrix measuring pairwise similarity between a set of n elements and assume they have a serial structure, i.e. they can be ordered along a chain where the similarity between elements decreases with their distance within this chain. In the context of de novo prokaryotic genome assembly, within the Overlap-Layout-Consensus paradigm, we collect fragments of DNA (reads) randomly sampled across the genome, with sufficient coverage so that a given read overlaps with the neighboring reads. However, the position of the reads within the genome is unknown, so one has to solve a sort of jigsaw puzzle (the layout). This layout step can fit in the framework of Seriation, where the similarity measures the overlap (if any) between two reads, and we wish to reorder the reads such that two neighboring reads have a large overlap, and two reads far apart do not overlap.

    In this talk, I will present a basic spectral method for Seriation, akin to the Spectral Clustering method widely used in Machine Learning, together with a simple extension that handles circular orderings and improves robustness to noise. I will then present results of this method applied to finding the layout of E. coli reads sequenced with the Oxford Nanopore Technologies MinION device.
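
    As a rough illustration of the basic spectral approach (not the speaker's implementation, and without the circular extension), the sketch below orders items by the Fiedler vector of the Laplacian of the similarity matrix. The Fiedler vector is approximated here by power iteration so that the example needs only the standard library; the toy similarity matrix, the function name spectral_order and the iteration count are assumptions.

```python
def spectral_order(S, iters=5000):
    """Order items by the Fiedler vector of the Laplacian L = D - S.

    The Fiedler vector is found by power iteration on M = c*I - L,
    after projecting out the constant vector (the trivial eigenvector).
    """
    n = len(S)
    deg = [sum(row) for row in S]
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L (Gershgorin)
    v = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # w = M v = (c - deg_i) v_i + sum_j S_ij v_j
        w = [(c - deg[i]) * v[i] + sum(S[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return sorted(range(n), key=lambda i: v[i])

# Toy "reads": item i sits at hidden position true_pos[i] along a chain,
# and similarity (overlap) decreases with distance, vanishing beyond 2 steps.
true_pos = [3, 0, 6, 1, 7, 4, 2, 5]
n = len(true_pos)
S = [[max(0, 3 - abs(true_pos[i] - true_pos[j])) for j in range(n)]
     for i in range(n)]

order = spectral_order(S)
recovered = [true_pos[i] for i in order]
print(recovered)  # the hidden positions in sorted order, possibly reversed
```

    The ordering is only defined up to reversal, since the sign of an eigenvector is arbitrary. With real read overlaps the similarity matrix is noisy, which is the situation the extension mentioned in the talk is meant to address.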


  • Supervised theses in bioinformatics

    Mourad Elloumi (univ. Tunis)
    Thursday, June 21, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    In this talk, I will present the activities of our group,
    the BioInformatics Group (BIG), through the theses we have supervised.

    In particular, I will present our work on:

     . Biclustering of microarray data

     . Error correction of third-generation sequencing data

    I will conclude the talk by presenting my other activities.

  • Genome scale metabolic modeling and study of secondary metabolism of 24 Penicillium species

    Sylvain Prigent
    Thursday, May 31, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    During this presentation I will talk about my three years as a postdoc in Jens Nielsen's lab in Sweden, where I studied the metabolism of Penicillium species. Modelling of metabolism at the genome scale has proved to be an efficient approach for explaining observed phenotypic traits in living organisms. Further, it can be used as a means of predicting the effect of genetic modifications, e.g. for the generation of microbial cell factories. With the increasing amount of genome sequencing data available, there is a need to accurately and efficiently generate such genome scale metabolic models (GEMs) for non-model organisms, for which few data are available. In this talk, I will present a semi-automatic reconstruction approach applied to 24 Penicillium species, which have potential for the production of pharmaceutical secondary metabolites or are used in foods such as cheeses. The models were based on the MetaCyc database and a previously published Penicillium GEM, and gave rise to comprehensive genome scale descriptions of their metabolism. The models showed that while central carbon metabolism is highly conserved, secondary metabolic pathways account for most of the diversity among the species. I will also present some work we did on predicting the production of secondary metabolites from genomic sequence, and some RNA-seq analysis of those 24 species.

    At the end of my presentation I will also talk a little about my current and future work on fruit biology at INRA Bordeaux.


  • BRAvo: A tool for regulatory network assembly through Linked Open Data

    Marie Lefebvre (LN2S (Nantes))
    Thursday, May 24, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    A few years ago, SyMeTRIC health actors proposed personalized medicine approaches in the context of several pathologies and therapeutic approaches such as cancer, transplantation, cardiovascular, respiratory or metabolic diseases. Despite the diversity of these pathologies, these actors share methodological and technological commonalities. In particular, they rely strongly on the exploration and combination of several heterogeneous and massive biological and clinical datasets to discover multi-parameter pathological signatures and new biomarkers.

    In order to assemble data arising from different scales, technologies or localities, huge efforts are being made to organize biological knowledge through linked open databases. These databases are supposed to be automatically queryable in order to reconstruct regulatory and signaling networks. Nevertheless, assembling networks usually involves manual operations because of source-specific identification of biological entities and relationships, multiple life-science databases with redundant information, and the difficulty of recovering the logical flow of a biological pathway.

    In this talk, I will present a framework based on Semantic Web technologies for automating the assembly of regulatory and signaling networks. To this end, I developed BRAvo, which comes as an interactive web tool that lets users interact with the reconstruction process, and as a command-line tool for handling larger-scale models in batch mode.

    Our results show that BRAvo is able to retrieve networks of 5000 nodes from 200 input genes by querying the full PathwayCommons database in less than one hour. BRAvo also provides filters on data sources, reconstruction depth and biological entity type. Thanks to BRAvo, we are now able to address issues of heterogeneous data integration for biological network reconstruction intended for computational and predictive models.
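
    The idea of growing a network from a list of input genes up to a given reconstruction depth can be sketched as a depth-limited breadth-first traversal. This is only an illustration: BRAvo itself queries PathwayCommons through Semantic Web technologies, whereas the sketch below uses a small in-memory edge list, follows edges upstream (from target to regulator) as an assumption, and all gene names and the function name assemble_upstream are hypothetical.

```python
from collections import deque

def assemble_upstream(edges, seeds, max_depth):
    """Collect regulator -> target edges reachable upstream of the seeds.

    edges: iterable of (regulator, target) pairs (stand-in for a database query).
    seeds: input genes; max_depth: how many regulation steps to follow.
    """
    regulators = {}
    for src, tgt in edges:
        regulators.setdefault(tgt, []).append(src)

    depth = {gene: 0 for gene in seeds}
    network = set()
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # depth filter: do not expand beyond max_depth
        for reg in regulators.get(node, []):
            network.add((reg, node))
            if reg not in depth:  # first visit gives the shortest depth
                depth[reg] = depth[node] + 1
                queue.append(reg)
    return network

# Hypothetical regulation edges (regulator, target).
edges = [
    ("TF1", "GENE_A"), ("TF1", "GENE_B"),
    ("TF2", "TF1"), ("TF3", "TF2"),
    ("X", "Y"),  # unrelated to the seeds
]
net = assemble_upstream(edges, ["GENE_A", "GENE_B"], max_depth=2)
print(sorted(net))
# -> [('TF1', 'GENE_A'), ('TF1', 'GENE_B'), ('TF2', 'TF1')]
```

    Raising max_depth to 3 would also pull in the TF3 -> TF2 edge, which shows how a depth filter trades network size against query cost.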

  • From ancestral genome reconstruction to assembly assistance

    Sèverine Bérard (Institut des Sciences de l'Evolution - Montpellier)
    Thursday, May 17, 2018 - 10:30
    Room Minquiers
    Talk abstract: 

    We will present a method for reconstructing ancestral genomes in a phylogenetic context. This approach traces the evolutionary history of gene adjacencies, taking as input the gene orders in extant species and the phylogenetic trees of the genes and species, and optimizing a parsimony criterion. The underlying algorithm is based on dynamic programming, which makes it possible to process large datasets in reasonable time. Several modifications of this initial algorithm have made it possible, among other things, to reconstruct both ancestral and extant gene orders, thereby improving the scaffolding of extant genomes. The method can also assign confidence scores to the predicted adjacencies and take paired sequencing data as input. We will show the application of our method to a dataset of 18 Anopheles mosquitoes, in which we were able to reduce the number of scaffolds by more than 60%. This method is implemented in the DeCoSTAR software (
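
    To give a flavour of parsimony optimized by dynamic programming on a tree, here is a much simplified sketch: a Sankoff-style recursion that counts the minimum number of gains and losses of a single gene adjacency along a species tree, given its presence or absence in extant species. This is not the DeCoSTAR algorithm, which also uses gene trees and handles many adjacencies at once; the tree, species names and unit costs are assumptions.

```python
INF = float("inf")

def sankoff(tree, leaf_state):
    """Return (cost_if_absent, cost_if_present) at the root of `tree`.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_state: dict mapping leaf name -> True/False (adjacency observed or not).
    Each gain or loss of the adjacency along a branch costs 1.
    """
    if isinstance(tree, str):
        return (INF, 0) if leaf_state[tree] else (0, INF)
    costs = [0, 0]
    for child in tree:
        child_costs = sankoff(child, leaf_state)
        for s in (0, 1):  # state at the current node
            # best child state t, paying 1 if the state changes on the branch
            costs[s] += min(child_costs[t] + (s != t) for t in (0, 1))
    return tuple(costs)

# Hypothetical species tree ((sp1, sp2), (sp3, sp4)) and observed adjacency.
tree = (("sp1", "sp2"), ("sp3", "sp4"))
observed = {"sp1": True, "sp2": True, "sp3": False, "sp4": False}

root_costs = sankoff(tree, observed)
print(root_costs, min(root_costs))  # -> (1, 1) 1
```

    Here a single gain or loss event explains the data, whichever state is assumed at the root. A traceback over the same table would give the ancestral states themselves.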

  • Semantic Web of Linked Data

    Olivier Corby (Inria Sophia/Nice)
    Thursday, April 5, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    As an introduction, we very briefly recall the principles of the Semantic Web and the Web of Data.
    We then present two complementary languages that came out of our research on the Semantic Web.
    The first language is STTL, the SPARQL Template Transformation Language. It is used to write transformations from RDF graphs to text formats such as Turtle, RDF/XML, OWL functional syntax, HTML, etc.
    The second is LDScript, the Linked Data Script Language, a DSL dedicated to defining extension functions for SPARQL that requires no compilation. It can execute SPARQL select and construct queries and manipulate their results. The objects of the language are RDF graphs, triples and terms, SPARQL query solutions, and lists of such objects. It offers second-order functions: funcall, apply, map, reduce.

  • Decentralized Data Management for the Semantic Web

    Hala Skaf-Molli and Pascal Molli
    Thursday, March 29, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    The semantic web is an extension of the web where information has a precise meaning. Thousands of linked datasets are available on the web. Important problems concerning quality, deep web access and availability remain unsolved. For data quality, we propose to transform the web of data into a read/write web of data, in which a data consumer will be able to correct an error. Allowing consumers to write to the semantic web raises the problem of data consistency, so we define synchronization algorithms for the RDF data model. To access the deep web, we propose a mediator approach that combines semantic data and deep web data. The problem is to improve the performance of queries in the presence of a large number of data sources. Finally, to ensure availability, we propose a replication model for the web of data. The problem is to optimize federated SPARQL queries in the presence of replicas selected at query execution time.

    In this talk, we will present our contributions concerning deep web access and data availability in the semantic web.

  • A metagenomics (and a few other things) perspective on the "star" diatom Asterionella formosa

    Adrien Villain (IGS CNRS)
    Tuesday, March 27, 2018 - 10:30
    Room Aurigny
    Talk abstract: 

    Diatoms are unicellular microalgae often found in association with numerous bacterial partners. These interactions may be beneficial, neutral or detrimental, and may evolve over time depending on environmental conditions. Here I will describe how we used laboratory techniques, metagenomics and 16S barcoding to characterize the bacterial community of a non-model freshwater species, Asterionella formosa. Emphasis will be put on the technical challenges of metagenomic assembly, an assessment of the pros and cons of the methods we used, and our recent collaborative work on the prediction of metabolic complementarities within the community.