Mathematical Representations of DNA Sequences and the Planetary Biodiversity Mission

Prof. Lila Kari, Cheriton School of Computer Science, University of Waterloo

Feb 3, 2023, 11:30am EC4-2101A

Even though biologists discover and classify thousands of new species every year, an estimated 95% of the more than 20 million multicellular species on Earth do not yet have a scientific name or classification. The long-term objectives of our research tie in with the Planetary Biodiversity Mission to map all multicellular life on Earth by 2045, and with deciphering the “Rosetta Stone” of genomics, that is, understanding the semantics and utility of the mathematical structure of genomic sequences.

In this talk I discuss several mathematical representations of DNA sequences and their use, in conjunction with supervised machine learning and unsupervised deep learning techniques, for ultrafast, accurate, and scalable genome classification at all taxonomic levels. This effort is part of BIOSCAN, an international project involving over 1,000 researchers from more than 40 countries, which uses DNA-based technologies to map and analyze Earth’s biodiversity.