Seminar by Fabio Cunial (College of Computing, Georgia Institute of Technology, Atlanta, USA)
Seminar venue:
INRIA Rennes - Bretagne - Atlantique
The subsequence composition of polypeptides
Fabio Cunial
The quantitative underpinning of the information content of biosequences represents an elusive goal and yet also an obvious prerequisite to the quantitative modeling and study of biological function and evolution. Several past studies have addressed the question of what distinguishes biosequences from random strings, the latter being clearly unpalatable to the living cell. Such studies typically analyze the organization of biosequences in terms of their constituent characters or substrings and have, in particular, consistently exposed a tenacious lack of compressibility on the part of biosequences. This research attempts, perhaps for the first time, an assessment of the structure and randomness of polypeptides in terms of newly introduced parameters that relate to the vocabulary of their (suitably constrained) subsequences rather than their substrings. Such parameters capture structural and functional information, and are related to each other under a specific set of rules that span biochemically diverse polypeptides.
Measures on subsequences separate few amino acid strings from their random permutations, but show that the random permutations of most polypeptides amass along specific linear loci.
Keywords: Information content of polypeptides, constrained subsequences, suffix graph.