Seminar Machine Learning in Phylogenetics and Population Genetics SS 2024

FIRST MEETING in room A104 at Sand on 18.04.24, 09:00-09:45

Further meetings will be scheduled together at the first meeting.

I am offering the seminar "Machine Learning in Phylogenetics and Population Genetics", in which we will consider data-driven and simulation-based machine learning methods in population genetics across a wide range of research questions and application scenarios. Bioinformaticians, machine learning students, mathematicians, and methodologically interested biologists are invited to participate. Topics will be assigned according to your background.

Learning goals

We will cover machine learning methods for the analysis of genomes in populations, including phylogenetic and demographic inference, analysis of population structure, and detection of selection.
Students become familiar with advanced topics in population genetics theory and the corresponding machine learning applications. They can reflect on current research questions and investigate complex research topics. Students will be able to acquire knowledge about current findings through a comprehensive literature search. They will understand the importance of current topics in computational population genetics and will be aware that many questions remain open.
In particular, students will learn about recent attempts to transfer machine learning concepts (CNNs, GCNs, generative models, etc.) to the field of phylogenetics and population genetics. Students will not only improve their study, reading, and writing skills but will also strengthen their ability to work independently. The teaching method in this seminar aims to boost the students’ confidence (oral presentation), enhance their communication skills, and enable them to give and accept constructive criticism (discussion session and review of the written report following their presentation).

Preparation for a possible Master's Thesis Project

If you are interested in one of the master's thesis projects currently available in the group, we can adjust your seminar topic so that it prepares you for that project. Please let me know in advance if you are interested in joining our group for your master's thesis.

 

Relevant Literature


Kevin Korfmann, Oscar E Gaggiotti, Matteo Fumagalli, "Deep Learning in Population Genetics", Genome Biology and Evolution, Volume 15, Issue 2, 2023, evad008, https://doi.org/10.1093/gbe/evad008
Daniel R. Schrider, Andrew D. Kern, "Supervised Machine Learning for Population Genetics: A New Paradigm", Trends in Genetics, Volume 34, Issue 4, 2018, 301-312, https://doi.org/10.1016/j.tig.2017.12.005
Yu K Mo, Matthew Hahn, Megan L Smith, "Applications of Machine Learning in Phylogenetics", preprint, https://doi.org/10.32942/X2XG7G
Yun S. Song, "Lecture Notes on Computational and Mathematical Population Genetics", github/popgenmethods/lecture_notes
Graham Coop, "Population and Quantitative Genetics", https://github.com/cooplab/popgen-notes

Registration

If you want to participate in the course, please send an email to [email address protected from spambots; JavaScript must be enabled to view it].
You will be contacted later with more details. The course is limited to 12 people on a first-come, first-served basis.

Organization and Evaluation

Credits: 3 ECTS

Language: German or English

 

Dates & Room: The first meeting, including the distribution of topics among the participants, will be on Thursday, April 18th, at 9 am, at Sand in room A104.
All other meetings will be held as one or several block seminar sessions, scheduled so that as many participants as possible can attend. The dates will be decided at the first meeting.
We are aiming for in-person sessions, but virtual and hybrid sessions are possible if you prefer them, e.g. due to childcare.


Evaluation: Presentation (45 min + 10 min Q&A), written review of two other student papers (roughly 1 page each), written seminar paper (5-8 pages)

Only the final seminar paper (50%) and the presentation (50%) are graded.
The reviews and the first version of the paper have to be handed in on time to pass the course.

Tentative Course Plan

Dates & Room: The irregular meeting dates will be chosen at the first meeting.

Week 1: Initial Meeting
Week 2 ff: Start of Individual Working Phase
Week 2: Introduction to Academic Writing and Presenting
Week 3 ff: Joint reading sessions and individual Q&A sessions
Second half of the semester, but no later than 12.07.24: Presentations and discussions, including small exercises or other interactive elements for non-presenters
19.06.24: Deadline for the first hand-in of the seminar paper
31.07.24: Deadline for the revised seminar paper


Available Master's theses in the group


If you are interested in doing a master's thesis in our group, e.g. at the interface of machine learning and population genetics or microbial population dynamics, please contact me.

Currently, the following topics are available:

  • Simulation of genotypes and prediction of phenotypes using machine learning in plant breeding programs (bioinfo or machine learning)
    (in collaboration with computomics)
  • Annual plants with a seed bank: On the evolution of diversity in light of genetic architecture
    (in the group of Max Schmid, plant ecology)
  • Classifying (rare) human disease traits from genealogical neighbors in large biobanks
    (in collaboration with Stephan Ossowski)
  • Simulation-based inference in population genetics (machine learning or bioinfo)
  • Inferring and visualizing parental ancestry proportions (bioinfo)
  • Co-occurrence of defense genes in prokaryotes (bioinfo)
  • Improving maximum likelihood tree reconstruction based on CRISPR arrays (bioinfo)
  • Modeling CRISPR population dynamics (math)
  • Analyzing mutational patterns in CRISPR repeats (bioinfo)
  • Simulating the mutation rate in fluctuating environments (comp bio or bioinfo)
  • CNNs for lottery outcome prediction (really!) (machine learning or informatics)
  • ... your own idea?

In progress:

  • The ancestors of a microbial genome sequence (math)
  • Detecting oversampled strains in pathogen databases (bioinfo)

Past topics (for inspiration):

  • Visualization of ancestral CRISPR spacer array reconstructions (bioinfo)
  • Ancestral gene presence/absence reconstruction (bioinfo)

 

If you are interested in doing a bachelor's thesis in our group, you are of course also welcome to contact us.