Symbiose seminars

  • bistro: a library to build large-scale workflows in computational biology

    Philippe Veber (LBBE)
    Thursday, June 13, 2019 - 10:30 to 11:00
    Room Aurigny
    Talk abstract: 

    Computational pipelines for analyzing high-throughput genomics datasets typically consist of tens to hundreds of shell commands, generate thousands of files and run for days or weeks. Although they have become rather complex pieces of software, they are still mostly programmed with rudimentary tools like shell scripts, which offer very little help for developing large, reusable programs. Besides being error-prone, implementing computational pipelines as shell scripts leaves many tedious aspects to the programmer, diverting their attention from data analysis. In this work, I propose to leverage a modern, statically typed programming language to provide, as a simple library, a comfortable environment for developing bioinformatics pipelines. This library, named bistro, is written in OCaml. Among other features, it provides dependency tracking, parallel execution, resume-on-failure, automatic naming of intermediate files, and easy deployment of pipelines using Docker or Singularity for enhanced reproducibility. Thanks to the compiler's type checker, errors in file formats or typos in command arguments are detected at compile time, that is, before the pipeline even runs. I'll show various benefits of embedding a pipeline development framework in a general-purpose language; for instance, it becomes very easy to integrate a pipeline into a web server or to write extensible libraries of highly configurable pipelines.
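
    The abstract names dependency tracking, resume-on-failure and automatic naming of intermediate files. The Python sketch below illustrates one common way to combine these features, under stated assumptions: each step's output file is named after a hash of the step's name, parameters and dependencies, and a step runs only if that file is missing. It is not bistro's API (bistro is an OCaml library and also uses the type system to check file formats); `Step`, `run` and the two toy tools are hypothetical.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """Hypothetical pipeline step: an action, its parameters and its upstream steps."""
    name: str
    action: Callable
    params: Tuple = ()
    deps: Tuple["Step", ...] = ()

    def digest(self) -> str:
        # A step's identity covers its name, its parameters and, recursively,
        # the identities of its dependencies: changing anything upstream
        # changes the automatically generated output name.
        h = hashlib.sha256()
        h.update(self.name.encode())
        h.update(repr(self.params).encode())
        for dep in self.deps:
            h.update(dep.digest().encode())
        return h.hexdigest()[:16]


def run(step: Step, cache_dir: str, log: List[str]) -> str:
    """Build `step` after its dependencies, reusing any output that already exists."""
    out = os.path.join(cache_dir, f"{step.name}-{step.digest()}")
    if os.path.exists(out):
        return out  # computed by an earlier, possibly interrupted, run
    inputs = [run(dep, cache_dir, log) for dep in step.deps]
    tmp = out + ".tmp"
    step.action(inputs, tmp, *step.params)
    os.replace(tmp, out)  # publish atomically: a crash never leaves a partial output
    log.append(step.name)
    return out


def make_reads(inputs, out, n):
    # Toy "tool" (hypothetical): write n fake reads, one per line.
    with open(out, "w") as f:
        f.write("ACGT\n" * n)


def count_lines(inputs, out):
    # Toy "tool" (hypothetical): count the lines of its single input file.
    with open(inputs[0]) as f:
        n = sum(1 for _ in f)
    with open(out, "w") as f:
        f.write(str(n))


reads = Step("reads", make_reads, params=(3,))
count = Step("count", count_lines, deps=(reads,))

with tempfile.TemporaryDirectory() as cache:
    log: List[str] = []
    result = run(count, cache, log)
    first_run = list(log)
    run(count, cache, log)  # second run: every output is reused
    with open(result) as f:
        content = f.read()

print(content, first_run, log)  # 3 ['reads', 'count'] ['reads', 'count']
```

    Running the pipeline twice executes each step only once: the second call finds the hashed output of `count` and returns immediately, which is also what lets an interrupted run resume where it stopped. Writing to a temporary file and renaming it keeps a crashed step from leaving a file that would later be mistaken for a finished output.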

  • From QC to isoform characterization: Evaluation and improvements of Nanopore sequencing in an RNA-seq context

    Sophie Lemoine (IBENS)
    Thursday, June 6, 2019 - 10:30 to 11:00
    Room Aurigny
    Talk abstract: 

    Transcript identification is a real challenge with short-read sequencing. With Oxford Nanopore Technologies (ONT), our aim is to sequence full-length cDNAs to directly access isoforms. We successfully validated the analysis of differentially expressed genes in a mouse model of myelination blockage using the standard ONT protocol. The mean read length was 1.2 kb, shorter than the estimated 2 kb mean transcript length, and even further from the mean length of TSL1-tagged transcripts (2.6 kb). To improve our results, we combined the SmartSeq and ONT technologies to synthesize full-length cDNA from total RNA. The cDNAs were barcoded so that multiple samples could be sequenced on a single MinION run and compared in differential expression analyses. The SmartSeq/ONT protocol allowed us to sequence much longer cDNAs: the mean read length rose to about 2.6 kb, and the short reads that made up most of the population with the ONT protocol were eliminated. We detected more differentially expressed targets, and these targets were longer than those detected with the ONT protocol. Overall, the optimized protocol achieved better 5’-3’ transcript coverage, particularly, and not surprisingly, for transcripts longer than 2 kb. Although it does not guarantee full-length cDNAs, it is reliable for cDNA sequencing and can improve isoform annotation and quantification with dedicated pipelines such as FLAIR or Pinfish. The goal of my talk is to give an overview of:
    - the evolution of the protocols we tested and improved;
    - the developments we had to carry out for the QC of our runs;
    - the ongoing evaluation of FLAIR and Pinfish in our context.

  • Genomic approaches to studying the evolution of sex-determination systems in fish

    Yann Guiguen (INRA IPGP)
    Thursday, May 16, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    Fish display a wide variety of sex-determination mechanisms, ranging from purely genetic systems to systems determined entirely or partly by the environment (temperature, density, ...). Curiously, this variability follows no obvious phylogenetic pattern, with rapid transitions between closely related species, and even between different populations of the same species. To better understand this diversity and the mechanisms governing the evolution of sex chromosomes, we applied partial (Rad-Sequencing) or whole (Pool-Sequencing) genome sequencing approaches to a large number of fish species in order to characterize their sex-determination systems, delimit the chromosomal regions of their sex loci and identify candidate master sex-determining genes. These strategies identified the type of sex determination in many species, some with simple monofactorial systems (XX/XY or ZZ/ZW) and others with more complex sex-determination systems. These results also allowed us to identify new master sex-determining genes and to show that they are often "recruited" from a relatively small number of signaling pathways.

  • From alignment-free heuristics to an interactive visualization: V(D)J repertoire analysis in the Vidjil platform

    Mikaël Salson (CRIStAL U. Lille)
    Thursday, April 25, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    The diversity of the immune repertoire is grounded on V(D)J recombinations. Many algorithms and software tools identify these recombinations in high-throughput sequencing data. We introduce new Aho-Corasick-based heuristics to speed up the detection of V(D)J sequences in high-throughput sequencing data, and we show how these heuristics can also speed up the identification of V(D)J recombinations. Our experiments show that the new heuristics reduce the time and memory consumption of our previous algorithm, Vidjil-algo, while keeping its sensitivity and specificity. Such improvements matter when dozens of samples must be analysed, as is commonly the case in a clinical setting. There, users launch their analyses and interpret the results through a web application we designed for this purpose.
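
    The abstract does not detail Vidjil-algo's heuristics. As a generic illustration of the underlying technique, the Python sketch below builds a textbook Aho-Corasick automaton over a set of k-mers and finds all of their occurrences in a read in a single left-to-right pass. The k-mers, the `IGHV-toy`/`IGHJ-toy` labels and the read are made-up toy data, not real germline sequences.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(patterns: Dict[str, str]):
    """Aho-Corasick automaton over `patterns`, which maps each k-mer to a label.

    Returns (goto, fail, out): trie transitions, failure links, and for each
    state the (k-mer, label) pairs that end there."""
    goto: List[Dict[str, int]] = [{}]  # state 0 is the root of the trie
    fail: List[int] = [0]
    out: List[List[Tuple[str, str]]] = [[]]
    for kmer, label in patterns.items():
        s = 0
        for ch in kmer:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append((kmer, label))
    # Breadth-first pass: the failure link of a state points to the state of the
    # longest proper suffix of its string that is also a prefix in the trie.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t].extend(out[fail[t]])  # inherit matches ending at the suffix state
    return goto, fail, out


def search(text: str, automaton) -> List[Tuple[int, str, str]]:
    """Return (start position, k-mer, label) for every k-mer occurrence in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]  # fall back without re-reading the text
        s = goto[s].get(ch, 0)
        for kmer, label in out[s]:
            hits.append((i - len(kmer) + 1, kmer, label))
    return hits


# Toy k-mers and labels (assumption): not real V or J germline sequences.
germline_kmers = {"TACG": "IGHV-toy", "ACGG": "IGHV-toy", "GATT": "IGHJ-toy"}
automaton = build_automaton(germline_kmers)
read = "TTACGGCCGATTA"
hits = search(read, automaton)
labels = {label for _, _, label in hits}
print(hits)  # [(1, 'TACG', 'IGHV-toy'), (2, 'ACGG', 'IGHV-toy'), (8, 'GATT', 'IGHJ-toy')]
```

    The failure links let the scan continue without backtracking in the read, so its cost is linear in the read length plus the number of hits, however many k-mers are indexed. A read carrying both V and J hits, as here, would be a natural candidate for a V(D)J recombination, but that decision rule is only a toy assumption.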

  • Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling

    Gautier Richard (IGEPP INRA)
    Thursday, April 18, 2019 - 10:30
    Room Aurigny
    Talk abstract: 

    Genome rearrangements that occur during evolution pose major challenges to regulatory mechanisms that rely on three-dimensional genome architecture. Here, we developed a scaffolding algorithm and generated chromosome-length assemblies from Hi-C data to study genome topology in three distantly related Drosophila species. We observe extensive genome shuffling between these species, with one synteny breakpoint approximately every six genes. A/B compartments, a set of large gene-dense topologically associating domains (TADs) and spatial contacts between high-affinity sites (HAS) located on the X chromosome are maintained over 40 million years, indicating architectural conservation at various hierarchical levels. Evolutionarily conserved genes cluster in the vicinity of HAS, while HAS locations appear evolutionarily flexible, thus uncoupling the functional requirement of dosage compensation from individual positions on the linear X chromosome. 3D architecture is therefore preserved even across thousands of rearrangements, highlighting its relevance for essential processes such as dosage compensation of the X chromosome.

  • Influence of urbanization on the human gut and oral microbiome

    Laure Ségurel (MNHN)
    Thursday, April 11, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    Industrialization has been associated with a loss of human gut microbiota diversity. As decreased gut microbiome diversity is also correlated with a number of modern diseases, understanding which factors drive this loss is vital for public health. It is also of evolutionary interest to understand how gut bacteria adapt to rapidly changing environments. However, industrialized and non-industrialized populations differ in many ways, making it practically impossible to disentangle the effects of diet, sanitary conditions, medical practices or other factors. Moreover, gut protozoa, which have likely shaped the interactions between humans and their gut microbiota throughout their coevolutionary history but are virtually absent from industrialized populations, are rarely taken into account. Finally, even less is known about the effects of industrialization on other microbiomes, including the oral microbiome, another important health-associated microbial community. To address some of these limitations, we examined the oral and gut microbiomes of 140 individuals from Cameroon along a small-scale urbanization gradient. In addition to metagenetic and metagenomic data, we collected a number of ethnological, medical, sanitary and parasitological parameters in order to identify factors that influence microbiome diversity and variation.

  • Alignments and edit distances with fragmentations, from bioinformatics to computer music

    Mathieu Giraud (U. Lille, CRIStAL)
    Thursday, March 28, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    Sequence comparisons play a major role in bioinformatics, but also in computer music: why and how should one measure the similarity of two DNA sequences or of two melodies? The usual substitution, insertion and deletion operations can be extended to better model these similarities. The Mongeau-Sankoff algorithm (1990) thus introduced fragmentation (and consolidation) operations, which match one note to a set of notes, just as a nucleotide can be matched to a homopolymer resulting from a sequencing error. I will present some results on the study of variations using fragmentations, on the correspondence between alignments and edit distance in this setting, and on the challenges of computing such distances. This work was carried out in collaboration with Henry Boisgibault and Florent Jacquemard, and with Emilios Cambouropoulos and Ken Déguernel.
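
    The fragmentation and consolidation operations fit naturally into the usual edit-distance dynamic programming: besides substitution, insertion and deletion, a cell can be reached by matching one symbol of the first sequence to a run of 2 to `max_run` symbols of the second (fragmentation), or the reverse (consolidation). The Python sketch below assumes toy costs: unit cost for insertion, deletion and substitution, and a fixed `penalty` plus one unit per mismatching symbol for a fragmentation or consolidation. Mongeau and Sankoff instead weight these operations by pitch and duration differences; `max_run` and `penalty` are hypothetical parameters.

```python
from typing import Sequence


def run_cost(symbol, run: Sequence, penalty: float) -> float:
    """Toy cost (assumption) of matching one symbol to a run of symbols:
    a fixed penalty plus one unit per symbol of the run that differs from it."""
    return penalty + sum(1 for s in run if s != symbol)


def fragment_distance(a: Sequence, b: Sequence, max_run: int = 4,
                      penalty: float = 0.5) -> float:
    """Edit distance with substitutions, insertions, deletions, fragmentations
    (one symbol of a -> a run of 2..max_run symbols of b) and consolidations
    (a run of 2..max_run symbols of a -> one symbol of b), with toy costs."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0:
                best = min(best, d[i - 1][j] + 1)  # deletion of a[i-1]
            if j > 0:
                best = min(best, d[i][j - 1] + 1)  # insertion of b[j-1]
            if i > 0 and j > 0:
                sub = 0 if a[i - 1] == b[j - 1] else 1
                best = min(best, d[i - 1][j - 1] + sub)  # match or substitution
                for k in range(2, min(max_run, j) + 1):
                    # fragmentation: a[i-1] is matched to the run b[j-k:j]
                    best = min(best, d[i - 1][j - k]
                               + run_cost(a[i - 1], b[j - k:j], penalty))
                for k in range(2, min(max_run, i) + 1):
                    # consolidation: the run a[i-k:i] is matched to b[j-1]
                    best = min(best, d[i - k][j - 1]
                               + run_cost(b[j - 1], a[i - k:i], penalty))
            d[i][j] = best
    return d[n][m]


print(fragment_distance("ACGT", "AAACGT"))  # 0.5: A -> AAA is one fragmentation
print(fragment_distance(["C4", "D4", "E4"], ["C4", "D4", "D4", "E4"]))  # 0.5: D4 -> D4 D4
```

    With these costs, matching A to the homopolymer AAA, or a note to the same note repeated, costs 0.5 instead of two insertions. Each cell tries at most 2(max_run - 1) run lengths, each scored in O(max_run) time, on top of the three classic operations.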

  • Livestock genome annotation: transcriptome and chromatin structure profiling in cattle, goat, chicken and pig.

    Sarah Djebali-Quelen (INRA GenPhySE)
    Thursday, March 14, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    Functional annotation of livestock genomes is a critical step to decipher the genotype-to-phenotype relationship underlying complex traits. As part of the Functional Annotation of Animal Genomes (FAANG) initiative, the FR-AgENCODE project aims at profiling the landscape of transcription (RNA-seq) and of chromatin accessibility and conformation (ATAC-seq and Hi-C) in four livestock species representing ruminants (cattle, goat), monogastrics (pig) and birds (chicken), using three target samples related to metabolism (liver) and immunity (CD4+ and CD8+ T cells). Standardized protocols were applied to produce transcriptome and chromatin datasets for the four species. RNA-seq assays considerably extended the available catalog of protein-coding and non-coding transcripts. Gene expression profiles were consistent with known metabolic/immune functions and revealed differentially expressed transcripts with unknown function, including new lncRNAs in syntenic regions. The majority of ATAC-seq peaks of chromatin accessibility mapped to putative regulatory regions, with an enrichment of predicted transcription factor binding sites in differentially accessible peaks. Hi-C provided the first set of genome-wide maps of three-dimensional interactions across livestock and showed consistency with results from gene expression and chromatin accessibility in topological compartments of the genomes. We report the first multi-species and multi-assay genome annotation results obtained by a FAANG pilot project. The overall consistency between gene expression and chromatin structure data in these four livestock species is in line with previous findings in model animals. Overall, these results emphasize the value of FAANG for research on domesticated animals and underline the importance of future meta-analyses of the reference datasets being generated by this community on different species.

  • CG-alcode: exploring alternative gene expression

    Jean-Stéphane Varré (CRIStAL U. Lille)
    Thursday, March 7, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    In this talk, we will present the CG-alcode method, which compares two sets of transcripts for a pair of orthologous genes in two species by building a model for each gene and then, using the constructed models, identifies "splicing orthologs" and infers putative transcripts. We will focus on the algorithm that identifies the known and predicted functional signals used to build the model. We will then present two directions that can exploit CG-alcode's modeling results: exploring the set of potential transcripts by identifying "regulators", and identifying alternative transcripts from third-generation sequencing data.

  • Analyses of thousands of molecular events, example of RNA metabolism in Fronto-Temporal Dementias

    Vincent Anquetil (ICM)
    Thursday, February 28, 2019 - 10:30 to 12:00
    Room Aurigny
    Talk abstract: 

    Frontotemporal dementias (FTD) are characterized by progressive behavioral and language changes, associated with atrophy of the frontal and temporal lobes. Amyotrophic lateral sclerosis (ALS) is a rapidly progressive and fatal motor neuron disease. Whereas ALS is poorly heritable (about 10% of cases), up to 50% of FTD cases are genetically transmitted. Mutations in three genes account for most genetic FTD cases: microtubule-associated protein tau (MAPT), progranulin (PGRN) and chromosome 9 open reading frame 72 (C9orf72). Genetic and sporadic FTDs share neuropathological features such as neuronal inclusions of Tubulin-Associated Unit (TAU), TAR DNA-binding protein 43 (TDP43) or Fused in Sarcoma (FUS). TDP43 and FUS neuronal inclusions are common to FTD and ALS. Up to 50% of ALS patients develop FTD symptoms, and around 15% of FTD patients display motor neuron dysfunction typical of ALS. To date, no treatment is available for these disorders, and the molecular mechanisms at stake in the different pathological subtypes remain elusive. We analyzed, at the molecular level, the affected (frontal) and preserved (occipital) cortices of FTD +/- ALS patients. High-throughput RNA sequencing was performed to analyze the transcriptome and splicing profiles and, for a subset of samples, microRNA misregulation. The samples were sorted according to their genetic mutation (C9orf72, MAPT, PGRN), neuropathology (TAU+, TDP+, FUS+) and phenotype (FTD, FTD+ALS, ALS), and compared to a set of controls. Gene expression data distinguished the three phenotypes: pure FTD, pure ALS and FTD/ALS. Hundreds of differential RNA maturation (splicing) profiles were observed between mutations. Overall, less than 10% of the computed changes in RNA processing led to a change in RNA expression.
    Therefore, the differently processed mRNAs can lead to 1) a different ratio of existing proteins, 2) mislocalization of newly synthesized proteins, or 3) the synthesis of aberrant proteins in FTD patients. These diseases, known as proteinopathies, may thus result from an accumulation of RNA-processing defects, which would also make FTD +/- ALS general RNAopathies.
