Creating a Phylogenetic Tree of SARS-CoV-2 with UShER

At the time of writing, there are nearly 14 million viral genome sequences in the GISAID EpiCoV™ database. Inferring the phylogenetic relationships of such a huge dataset with traditional maximum-likelihood or Bayesian methods is not feasible in a short time. The UShER package was developed to build ultra-large phylogenetic trees of SARS-CoV-2 genomes. Rather than inferring a tree from scratch, UShER places new samples onto an existing phylogeny using maximum parsimony. It can place given SARS-CoV-2 genome sequences into the GISAID global phylogeny within a couple of hours. This makes it particularly helpful for understanding how newly sequenced SARS-CoV-2 genomes relate to each other and to previously sequenced genomes in a global phylogeny.
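The core idea of parsimony placement can be illustrated with a deliberately simplified sketch. It assumes a toy tree in which each node stores the mutations on the branch leading to it, mutations are written as strings such as "C241T", and no site mutates more than once along a path. The names `TREE`, `path_mutations` and `place_sample` are hypothetical and the tree is invented for illustration. This is not UShER's implementation, which works on a compact protobuf mutation-annotated tree and uses a far more efficient search over millions of nodes.

```python
# Toy sketch of maximum-parsimony placement in the spirit of UShER.
# Assumptions (not UShER's code): each node stores the mutations on the
# branch leading to it, no site mutates twice along a path, and a new sample
# is attached as a child of the node needing the fewest extra mutations.

# Hypothetical toy tree: node -> (parent, mutations on the branch into node)
TREE = {
    "root": (None, set()),
    "A": ("root", {"C241T", "A23403G"}),
    "B": ("A", {"G28881A"}),
    "C": ("root", {"C8782T"}),
}


def path_mutations(tree, node):
    """Collect every mutation on the path from the root down to `node`."""
    muts = set()
    while node is not None:
        parent, branch = tree[node]
        muts |= branch
        node = parent
    return muts


def place_sample(tree, sample_muts):
    """Return (best_node, cost), where cost counts the mutations that must be
    added (sample-only) or reverted (node-only) to attach the sample there."""
    best = None
    for node in tree:
        cost = len(sample_muts ^ path_mutations(tree, node))
        if best is None or cost < best[1]:  # ties keep the first node found
            best = (node, cost)
    return best


if __name__ == "__main__":
    node, cost = place_sample(TREE, {"C241T", "A23403G", "G28881A", "T29000C"})
    print(node, cost)  # B 1
```

Here the new sample shares all three mutations leading to node B and carries one extra mutation, so attaching it under B costs a single added mutation, the most parsimonious choice in this toy tree.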

The UShER package is composed of four programs:

  1. UShER: a program that rapidly places new samples onto an existing phylogeny using maximum parsimony.
  2. matUtils: a toolkit for querying, interpreting and manipulating mutation-annotated trees (MATs).
  3. matOptimize: a program to rapidly and effectively optimize a mutation-annotated tree (MAT) for parsimony using subtree pruning and regrafting (SPR) moves within a user-defined radius.
  4. RIPPLES: a program that uses a phylogenomic technique to rapidly and sensitively detect recombinant nodes and their ancestors in a mutation-annotated tree (MAT).
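To make the idea of a mutation-annotated tree more concrete, the following hypothetical sketch represents each node with the mutations on its incoming branch. It then shows two simple operations in the spirit of the tools above: computing the tree's parsimony score (the total number of branch mutations, which is what a parsimony optimizer tries to lower) and listing the samples in a clade, as a matUtils-style query might. The `Node` class and the helper functions are illustrative assumptions, not the API of matUtils or matOptimize, which operate on protobuf MAT files.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical toy mutation-annotated tree (MAT)."""
    name: str
    mutations: list[str] = field(default_factory=list)  # on branch into node
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach `child` below this node and return it for chaining."""
        self.children.append(child)
        return child


def parsimony_score(node: Node) -> int:
    """Total number of branch mutations in the subtree rooted at `node`."""
    return len(node.mutations) + sum(parsimony_score(c) for c in node.children)


def leaves_under(node: Node) -> list[str]:
    """Names of the samples (leaves) descending from `node`."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves_under(child)]


def build_example() -> Node:
    """Small invented tree used only for illustration."""
    root = Node("root")
    a = root.add(Node("a", ["C241T"]))
    a.add(Node("s1", ["A23403G"]))
    a.add(Node("s2", ["A23403G", "G28881A"]))
    root.add(Node("s3", ["C8782T"]))
    return root


if __name__ == "__main__":
    root = build_example()
    print(parsimony_score(root))  # 5
    print(leaves_under(root.children[0]))  # ['s1', 's2']
```

In this toy tree, A23403G appears on the branches to both s1 and s2. Pruning s1 and regrafting it onto s2's branch, so that both share a new internal node carrying A23403G, would lower the score from 5 to 4. Finding rearrangements of this kind, within a user-defined radius, is the sort of improvement that SPR-based optimization searches for.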

The MAT generated by UShER can be visualized with taxoniumtools, which converts it into a file that the Taxonium website can display.