panX - Pan-genome Analysis and Exploration
Richard Neher, Wei Ding, and I created a pipeline that analyzes pan-genomes automatically. Its most distinctive feature is an interactive visualization of the pan-genome in the browser, which makes it easy to explore a pan-genome and to search it for particular genes or features.
Have a look at what we have done at pangenome.de.
panicmage (formerly known as IMaGe)
Panicmage is short for "pangenome analyzer for infinitely - which means considerably - many genes".
The name changed from IMaGe to panicmage because "image" is possibly one of the most stupid words to search for on Google. In addition, the new name emphasizes that in our model infinitely many genes could possibly exist, but only a finite number of genes exists in the population at any given time.
panicmage takes as input
- a genealogy,
- the gene frequency spectrum, and
- the number of generations to the most recent common ancestor (optional),
and estimates
- the parameters of a neutral Infinitely Many Genes Model (gene gain and gene loss rates),
- the number of core genes of the whole population,
- the expected number of new genes found in the next sequenced strain,
- the size of the persistent pangenome, and
- the size of the total pangenome.
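To make these quantities concrete, here is a minimal Python sketch. It is not panicmage's code or interface. The first function counts the gene frequency spectrum from presence/absence data, using a hypothetical input format (a dict from gene name to the set of genomes carrying it). The other functions evaluate expectations of the neutral Infinitely Many Genes Model as I recall them from the literature, assuming a random coalescent genealogy rather than the fixed genealogy that panicmage works with. The names `theta` (scaled gene gain rate) and `rho` (scaled gene loss rate) and their time scaling are assumptions of this sketch.

```python
from collections import Counter


def gene_frequency_spectrum(presence):
    """Return [G_1, ..., G_n]; G_k counts genes present in exactly k genomes.

    `presence` maps a gene name to the set of genomes carrying it
    (hypothetical input format, chosen only for this sketch).
    """
    genomes = set().union(*presence.values())
    n = len(genomes)
    counts = Counter(len(carriers) for carriers in presence.values())
    return [counts.get(k, 0) for k in range(1, n + 1)]


def expected_spectrum(n, theta, rho):
    """Expected [G_1, ..., G_n] for n genomes under a random coalescent.

    Assumed formula:
    E[G_k] = theta/k * n(n-1)...(n-k+1) / ((n-1+rho)...(n-k+rho))
    """
    spectrum = []
    ratio = 1.0
    for k in range(1, n + 1):
        ratio *= (n - k + 1) / (n - k + rho)
        spectrum.append(theta / k * ratio)
    return spectrum


def expected_pangenome_size(n, theta, rho):
    """Expected number of distinct genes seen in n genomes."""
    return theta * sum(1.0 / (rho + i) for i in range(n))


def expected_new_genes(n, theta, rho):
    """Expected number of new genes in genome n + 1, given n sequenced."""
    return theta / (n + rho)


# Toy data: four genes in three genomes (made up for illustration).
presence = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1"},
    "geneC": {"s2", "s3"},
    "geneD": {"s3"},
}
spectrum = gene_frequency_spectrum(presence)  # [2, 1, 1]
core_genes_in_sample = spectrum[-1]           # genes found in every genome
```

In this sketch the expected spectrum sums to the expected pangenome size, and the expected number of new genes is the increase in pangenome size when going from n to n + 1 genomes.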
In addition, panicmage can simulate distributed genomes and computes the p-value of a gene frequency spectrum for a given genealogy under neutral evolution. So far, p-values can be computed for neutral evolution and for the presence of sampling bias.
To install panicmage, please visit the panicmage GitHub repository.
I made a Mathematica notebook that computes how many sequenced genomes are needed for a complete picture of the persistent pangenome, i.e. of all genes present in at least 1 percent of the population. You can download the notebook here.
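For a rough feeling of the numbers involved, here is a back-of-the-envelope sketch in Python. It is not the calculation in the notebook. It only asks how many genomes are needed to see one single gene of a given population frequency at least once, assuming genomes are sampled independently. The function name and the detection probability are assumptions made for illustration.

```python
import math


def genomes_needed(frequency, detection_probability):
    """Smallest n such that a gene at the given population frequency is
    seen at least once among n independently sampled genomes with at
    least the given probability: 1 - (1 - frequency)**n >= probability.
    """
    if not 0 < frequency < 1:
        raise ValueError("frequency must lie strictly between 0 and 1")
    if not 0 < detection_probability < 1:
        raise ValueError("detection_probability must lie strictly between 0 and 1")
    n = math.ceil(math.log(1 - detection_probability) / math.log(1 - frequency))
    n = max(n, 1)
    # Guard against floating-point rounding in the logarithms.
    while 1 - (1 - frequency) ** n < detection_probability:
        n += 1
    return n


# A gene present in 1 percent of the population, to be seen with 95 % probability:
n_required = genomes_needed(0.01, 0.95)  # 299
```

Requiring that all such genes are seen at the same time, or accounting for the shared genealogy of the sampled genomes, changes the answer, so this figure is only an order-of-magnitude guide.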