Looking inside the dark proteome universe. A tale on protein regions that do not correspond to protein domain families.

Tristan Bitard Feildel (UPMC)
Thursday, November 15, 2018 - 10:30 to 12:00
Room Aurigny
Talk abstract: 

71% of human proteins have at least one domain annotation (Pfam v31), and only 45% of the amino acids in human protein sequences fall within a domain (in a set made of canonical proteins and isoforms). Surprisingly low, isn't it? And these percentages are for the human proteome, (probably) the most extensively studied eukaryotic organism, even though a huge bias has recently been pointed out regarding which genes are studied and which are left out. What do the remaining amino acids correspond to? Very often they are viewed as intrinsically disordered regions (IDRs) or proteins (IDPs): parts of proteins or whole proteins that do not fold into a stable structure. In this presentation, I will present another view of these proteins and protein regions. In particular, I will show that restricting sequence analyses to annotations from protein domain databases is too stringent an approach: to be included in a protein domain database, a sequence must either have been lucky enough to have a corresponding X-ray/NMR structure, or be conserved and old enough to be found in many different proteins and organisms. I will present two examples, an evolutionary analysis of novel (recent) protein domains in insects and a comparative analysis of the dark proteome from UniProt/Swiss-Prot, to illustrate that the dark proteome might not be as dark as thought; as is often the case, it is a matter of which tools to use.

Keywords: protein domain, intrinsically disordered domains, novel domain