Using Topology to Explore Virus Mutations


Research activities and initiatives:

Topological data analysis and critical mutations of the coronavirus

  • PD Dr. Andreas Ott (IAG), KIT
  • Michael Bleher M.Sc., Univ. Heidelberg
  • Lukas Hahn M.Sc., Univ. Heidelberg
  • Prof. Dr. Ulrich Bauer, TU Munich
  • Prof. Dr. Raul Rabadan, Columbia Univ.
  • Dr. Juan Patiño-Galindo, Columbia Univ.
  • Dr. Mathieu Carrière, INRIA Nice


This research collaboration, launched in March 2020, is the first to use methods from Topological Data Analysis (TDA) to track critical mutations in the evolution of a virus in real time, in this case the SARS-CoV-2 coronavirus.

The collaboration has developed and implemented an algorithm that detects and analyses critical mutations of the coronavirus genome in very large gene sequence datasets within a very short time, which makes it suitable for real-time tracking. One such dataset is GISAID, which currently holds around 300,000 coronavirus gene sequences from all over the world. The algorithm automatically detects (i) mutations through which the virus adapts to humans or evades countermeasures such as vaccination, and (ii) the emergence of new virus variants through recombination. The knowledge gained in this way is useful for the further development and adaptation of vaccines. It also helps to detect new coronavirus variants at an early stage and to assess their relevance, so that their dangerousness can then be tested in laboratory experiments.
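This page does not describe how the algorithm works internally. As a rough illustration of why topology suits this task: if genomes evolved purely by descent along one tree, with each site mutating at most once, the pairwise mutation counts between them would form a tree metric with no loops. Recombination and recurrent mutations create loops in sequence space, which topological tools such as persistent homology are built to detect. The sketch below is a simplified stand-in and not the project's method. It assumes short, already-aligned sequences given as strings, computes Hamming distances, and flags every group of four sequences that violates the classical four-point condition for tree metrics. The names `hamming` and `four_point_violations` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def four_point_violations(seqs: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Return every quadruple of sequence IDs whose Hamming distances
    cannot come from a tree (the four-point condition fails).

    Illustrative stand-in only; not the project's algorithm.
    """
    names = sorted(seqs)
    # Precompute the symmetric distance table once.
    dist = {}
    for p, q in combinations(names, 2):
        dist[p, q] = dist[q, p] = hamming(seqs[p], seqs[q])

    violations = []
    for w, x, y, z in combinations(names, 4):
        # The three ways of splitting {w, x, y, z} into two pairs.
        sums = sorted([
            dist[w, x] + dist[y, z],
            dist[w, y] + dist[x, z],
            dist[w, z] + dist[x, y],
        ])
        # In a tree metric the two largest of these sums are equal.
        if sums[1] != sums[2]:
            violations.append((w, x, y, z))
    return violations


if __name__ == "__main__":
    # Toy data: a lineage in which each site mutates exactly once.
    tree_like = {"A": "000", "B": "100", "C": "110", "D": "111"}
    # Toy data: all four combinations of two sites, i.e. a loop.
    looped = {"A": "00", "B": "10", "C": "01", "D": "11"}
    print(four_point_violations(tree_like))  # []
    print(four_point_violations(looped))     # [('A', 'B', 'C', 'D')]
```

The brute-force check over all quadruples costs O(n^4) for n sequences, so it would be impractical at GISAID scale; it only shows the kind of non-tree signal that topological methods are designed to capture.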

An important advantage of the novel topological methods used in this project over the usual standard methods in bioinformatics is that the resulting algorithms are optimised by design for the applications described above. This is what makes it possible to process the large and, as the pandemic continues, constantly growing volumes of coronavirus gene sequence data at all, and to do so reliably and within a very short time. This speed is important for the most comprehensive possible real-time tracking of the worldwide pandemic. In the long term, the methods developed in this project can also be used in pandemic prevention for other virus species, such as influenza.