Filling Scaffolds

Laurent Bulteau (LIGM)
Thursday, June 8, 2017 - 10:30
Room Aurigny
Talk abstract:

The Scaffold Filling problem was introduced by Muñoz et al. with the objective
of using, for genomic distance purposes, not only perfectly sequenced genomes but
also unfinished drafts. Indeed, with the development of NGS technologies, it has
become much faster and cheaper to produce a first draft of any genome. However,
the cost of “polishing” the draft into a complete sequence has not decreased at the
same rate, so many species are left with a genome in scaffold form. In such a form,
a genome is only known as a series of contigs (i.e., contiguous segments of genes),
separated by unknown gaps, sometimes with an indication of the length of the gap.
With the help of a reference genome, that is, the complete genome of a sufficiently
close species, one can hope to fill the gaps. Indeed, Muñoz et al. proposed a
polynomial-time algorithm computing a most parsimonious rearrangement scenario
between a scaffold and a reference genome, thereby completing the scaffold. However,
this approach can only be applied in the absence of duplications – the problem
becomes computationally hard otherwise. Since then, several algorithms have been
proposed to deal with gene duplications in order to compute simplified rearrangement
distances, using both approximation and parameterized techniques. We will
review these methods, as well as possible extensions of the problem.
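
To make the scaffold form concrete, here is a minimal sketch in Python. It assumes a scaffold is stored as a list of contigs, each contig being a list of gene names, and that the reference genome is a single list of genes. The function name `missing_genes` and the toy data are hypothetical and do not come from the talk. The sketch only lists the genes that could be inserted into the gaps; choosing where to place them so as to minimize a rearrangement distance is the actual Scaffold Filling problem and is not attempted here.

```python
from collections import Counter

def missing_genes(scaffold, reference):
    """Return the multiset of reference genes absent from the scaffold.

    scaffold:  list of contigs, each a list of gene names (assumed layout)
    reference: list of gene names for the complete reference genome
    Counters are used so that duplicated genes are counted with multiplicity.
    """
    present = Counter(gene for contig in scaffold for gene in contig)
    # Counter subtraction keeps only the genes with a positive remaining count.
    return Counter(reference) - present

# Hypothetical toy instance: three contigs separated by two gaps.
reference = ["a", "b", "c", "d", "e", "f"]
scaffold = [["a", "b"], ["d"], ["f"]]
candidates = missing_genes(scaffold, reference)
print(sorted(candidates.elements()))
```

With duplicated genes the same call still works, since each copy in the reference is matched against at most one copy in the scaffold.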