Data-Driven Population Genetics

Our research focuses on mathematical models describing the evolution and ecology of genes and genomes.
We use mathematical and computational approaches to understand how the observed diversity of microbes emerged, how bacterial populations adapt to their environment, and how they affect our planet and human health.
We use efficient simulations to train machine learning methods for population genetics, which help us analyze the population history of humans, bacteria, plants, and many other species.

In population genetics, many theoretical results were developed at a time when little genomic or genetic data was available. Although these theory-driven results remain essential, data-driven discoveries have since dramatically changed our view of evolution and ecology, particularly for prokaryotes. Today, we can detect low-frequency variants in genomic data, sequence the genomes of thousands of individuals, even down to single cells, and track the occurrence of mutations and genes over time, both in evolution experiments and in ancient samples.

We are working at the interface of these two worlds, combining mathematical population genetics theory, computational biology, and machine learning.

For example, we introduced models that can explain the existence of huge gene reservoirs in bacterial populations (the pangenome), and we analyzed the evolution of CRISPR-Cas, the bacterial immune system against phages.
Our results help in the fight against antibiotic-resistant bacteria. We also study how machine learning can improve inference in population genetics in general and, more specifically, how the ancestral recombination graph, a structure that describes the genealogical relationships in biobanks at a fine scale, can be leveraged to improve genome-wide association studies.