Our research focuses on mathematical models describing the evolution and ecology of microbial genes and genomes.
We use mathematical and computational approaches to understand how the observed diversity of microbes emerged and how bacterial populations cooperate to adapt to their environment.

In population genetics, many theoretical results were developed at a time when little genomic and genetic data was available. Although these theory-driven results remain essential, data-driven discoveries have since dramatically changed our view of evolution and ecology, in particular for prokaryotes. Today we can detect low-frequency variation in genomic data, sequence the genomes of thousands of individuals, even at the level of a single cell, and track the occurrence of mutations and genes over time in experimental evolution.

We work at the interface of these two worlds, combining mathematical population genetics theory, computational biology, and machine learning.

For example, we introduced models that can explain the existence of huge gene reservoirs in bacterial populations (the pangenome) and analyzed the evolution of the CRISPR-Cas immune system against phages. More recently, we started to study how the cooperation of closely related bacterial strains affects genomic diversity and how machine learning methods can improve inference in bacterial population genetics.


In contrast to eukaryotic populations, bacteria frequently gain, lose, and exchange genes. This gives rise to the so-called pangenome: the set of all genes found in a bacterial population, regardless of how many individuals carry them. We aim to understand how the transfer of genetic material and the evolutionary dynamics of gene gain and loss influence the composition of bacterial pangenomes.
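To make the idea of gene gain and loss dynamics concrete, here is a minimal sketch of a gain/loss process along a single bacterial lineage. It is an illustration under simplifying assumptions, not our published model: new genes arrive at a constant rate, each gene present is lost independently at a constant rate, and horizontal transfer between individuals is ignored. The function name `simulate_gene_content` and the parameters `gain_rate`, `loss_rate`, and `t_end` are hypothetical names chosen for this example.

```python
import random


def simulate_gene_content(gain_rate, loss_rate, t_end, seed=0):
    """Gillespie-style simulation of gene gain and loss along one lineage.

    Assumptions (illustrative only): every gained gene is new and unique,
    gains occur at constant rate `gain_rate` (> 0), and each gene present
    is lost independently at rate `loss_rate`.
    """
    rng = random.Random(seed)
    genes = set()        # identifiers of genes currently in the genome
    next_gene_id = 0     # each gain introduces a previously unseen gene
    t = 0.0
    while True:
        total_rate = gain_rate + loss_rate * len(genes)
        t += rng.expovariate(total_rate)   # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < gain_rate / total_rate:
            genes.add(next_gene_id)        # gain event
            next_gene_id += 1
        else:
            # loss event: remove one uniformly chosen gene
            genes.remove(rng.choice(sorted(genes)))
    return genes


if __name__ == "__main__":
    sizes = [len(simulate_gene_content(5.0, 1.0, 20.0, seed=s)) for s in range(200)]
    print("mean number of genes:", sum(sizes) / len(sizes))
```

Under these assumptions the number of genes in the lineage settles around `gain_rate / loss_rate` once enough time has passed, which gives a quick sanity check for the simulation.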

The CRISPR-Cas system, an immune system of bacteria against phages, is an interesting model system for answering fundamental questions about host-parasite co-evolution.
Most CRISPR-Cas systems possess an array of short "spacer" sequences that match the target sequence. We are interested in the evolution of CRISPR spacer arrays, which is shaped by spacer insertions and deletions.
Furthermore, there are many different subtypes of CRISPR systems, some of which may perform functions beyond defense. We use individual-based mathematical models to analyze the fitness costs of different CRISPR systems. These costs can have several causes, such as imperfect self/non-self discrimination or the blocking of beneficial horizontal gene transfer. Ultimately, we hope to explain the maintenance and spread of CRISPR-Cas systems in prokaryotic populations.
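The spacer insertions and deletions mentioned above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the model used in our work: new spacers are always added at one end of the array (the leader end), deletions remove a single uniformly chosen spacer, and the two event types occur with fixed probabilities. The names `evolve_spacer_array` and `p_insert` are hypothetical.

```python
import random


def evolve_spacer_array(n_events, p_insert=0.6, seed=0):
    """Toy model of a CRISPR spacer array changing by insertions and deletions.

    Assumptions (illustrative only): each event is an insertion with
    probability `p_insert`, otherwise a deletion; insertions place a new,
    unique spacer at the front (leader end); deletions remove one spacer
    chosen uniformly at random. An empty array always receives an insertion.
    """
    rng = random.Random(seed)
    array = []      # index 0 is the leader end (most recently acquired spacer)
    next_id = 0     # spacer identifiers increase with acquisition time
    for _ in range(n_events):
        if not array or rng.random() < p_insert:
            array.insert(0, next_id)
            next_id += 1
        else:
            del array[rng.randrange(len(array))]
    return array


if __name__ == "__main__":
    print(evolve_spacer_array(20, seed=1))
```

Because insertions happen only at the leader end, the order of the surviving spacers records their order of acquisition: in this sketch the identifiers always decrease from the front of the array to the back, which is the kind of ordering information that spacer arrays preserve about their own history.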

The massive amount of newly sequenced genetic data gives rise to a variety of interesting applications in the emerging field of machine learning (ML) in population genetics. The main challenge is that sequences are not independent samples but are related through their shared phylogeny. Our aim is to develop, analyze, and apply supervised machine learning tools that exploit this phylogenetic relationship to improve estimation and classification in bacterial genome evolution and human population history.