Phylogenetic Analysis of SARS-CoV-2 with UShER

Create a multiple sequence alignment of your sequences

Use the SARS-CoV-2 isolate Wuhan-Hu-1 (NC_045512.2) as the reference.
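
One possible way to build such an alignment is to add your sequences to the reference with MAFFT. The source does not name an alignment tool, so MAFFT is an assumption here, and the function name `align_to_reference` and all file names are hypothetical. A minimal sketch:

```shell
# align_to_reference QUERY_FASTA REFERENCE_FASTA OUTPUT_FASTA
# Hypothetical helper; assumes MAFFT is installed and on PATH.
align_to_reference() {
  query=$1 reference=$2 output=$3
  # Fail early if either input file is missing or unreadable.
  [ -r "$query" ] && [ -r "$reference" ] || {
    echo "input FASTA not readable" >&2
    return 2
  }
  command -v mafft >/dev/null 2>&1 || {
    echo "mafft not found in PATH" >&2
    return 127
  }
  # --addfragments adds the query sequences to the reference;
  # --keeplength keeps the alignment at the reference length
  # (29,903 nt for NC_045512.2), so columns stay in reference coordinates.
  mafft --auto --keeplength --addfragments "$query" "$reference" > "$output"
}
```

It could be called as `align_to_reference my-sequences.fasta NC_045512.2.fasta aligned.fasta`.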

Download metadata

Metadata from GISAID

Data are available from EpiCoV.

Fields of the GISAID metadata:

| # | Column | Example |
|---|---|---|
| 1 | Virus name | hCoV-19/Ireland/LS-NVRL-M35IRL03088/2021 |
| 2 | Last vaccinated | |
| 3 | Passage details/history | Original |
| 4 | Type | betacoronavirus |
| 5 | Accession ID | EPI_ISL_3848896 |
| 6 | Collection date | 2021-08-07 |
| 7 | Location | Europe / Ireland / Laois |
| 8 | Additional location information | |
| 9 | Sequence length | 29715 |
| 10 | Host | Human |
| 11 | Patient age | 21-30 |
| 12 | Gender | unknown |
| 13 | Clade | GK |
| 14 | Pango lineage | AY.4 |
| 15 | Pango version | PLEARN-v1.18 |
| 16 | Variant | VOC Delta GK (B.1.617.2+AY.*) first detected in India |
| 17 | AA Substitutions | (N_G215C,NSP3_A1711V,Spike_T95I,…,Spike_L452R) |
| 18 | Submission date | 2021-07-16 |
| 19 | Is reference? | |
| 20 | Is complete? | |
| 21 | Is high coverage? | True |
| 22 | Is low coverage? | True |
| 23 | N-Content | |
| 24 | GC-Content | 0.379589505862 |

Included hosts:

# Host name GenBank common name Common name
1 Aonyx cinereus Asian small-clawed otter otter
2 Arctictis binturong binturong binturong
3 canis lupus gray wolf wolf
4 Canis lupus gray wolf wolf
5 Canis lupus familiaris gray wolf wolf
6 Chaetophractus villosus large hairy armadillo armadillo
7 Chiroptera bats bat
8 Chlorocebus sabaeus green monkey monkey
9 Crocuta crocuta spotted hyena hyena
10 Cygnus columbianus tundra swan swan
11 Environment environment
12 Felis catus domestic cat cat
13 Felis Catus domestic cat cat
14 Foreign human
15 Foreing human
16 Gorilla gorilla gorilla
17 Gorilla gorilla western gorilla gorilla
18 Gorilla gorilla gorilla western lowland gorilla gorilla
19 Hippopotamus amphibius hippopotamus hippo
20 Human human human
21 Humano human human
22 Laboratory derived lab
23 Lynx lynx familiaris Eurasian lynx lynx
24 Manis javanica Malayan pangolin pangolin
25 Manis pentadactyla Chinese pangolin pangolin
26 Mesocricetus auratus golden hamster hamster
27 Mus musculus house mouse mouse
28 Mustela furo domestic ferret ferret
29 Mustela putorius furo domestic ferret ferret
30 Nasua nasua ring-tailed coati coati
31 Neogale vison American mink mink
32 Neovison vison American mink mink
33 Odocoileus virginianus white-tailed deer deer
34 Panthera leo lion lion
35 Panthera tigris tiger tiger
36 Panthera tigris jacksoni Malayan tiger tiger
37 Panthera tigris sondaica Javan tiger tiger
38 Panthera tigris tigris Bengal tiger tiger
39 Panthera uncia snow leopard leopard
40 Phodopus roborovskii desert hamster hamster
41 Prionailurus bengalensis euptilurus Amur leopard cat cat
42 Prionailurus viverrinus fishing cat cat
43 Puma concolor puma puma
44 Rhinolophus affinis intermediate horseshoe bat bat
45 Rhinolophus bat horseshoe bat bat
46 Rhinolophus malayanus Malayan horseshoe bat bat
47 Rhinolophus marshalli Marshall’s horseshoe bat bat
48 Rhinolophus pusillus Least horseshoe bat bat
49 Rhinolophus shameli Shamel’s horseshoe bat bat
50 Rhinolophus sinicus Chinese rufous horseshoe bat bat
51 Rhinolophus stheno Lesser brown horseshoe bat bat
52 unknown unknown

Generate metadata for visualization:

# Get Virus name (1), Accession ID (5), Collection date (6), Location (7), Host (10)
# Pango lineage (14)
cut -f 1,5,6,7,10,14 gisaid-metadata.tsv > gisaid-metadata-plot.tsv

# Remove leading `hCoV-19/` from Virus name column
perl -lpi -e 's/^hCoV\-19\///' gisaid-metadata-plot.tsv

# Extract country from Location field

# Rebuild virus name to `<virus name>|<Accession number>|<collection date>` format
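
The last two steps could be done in a single awk pass. The sketch below assumes the six-column file produced by the `cut` command above, a header on the first line, and a Location value of the form `Continent / Country / Region`. The function name `reformat_gisaid_metadata` and the output header names are hypothetical.

```shell
# reformat_gisaid_metadata [FILE]   (reads stdin when FILE is omitted)
# Input columns (assumed): name, accession, date, location, host, lineage.
reformat_gisaid_metadata() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Replace the header; these column names are illustrative only.
  NR == 1 { print "strain", "accession", "date", "country", "host", "pango_lineage"; next }
  {
    # Location looks like "Europe / Ireland / Laois": country is the second part.
    n = split($4, loc, " / ")
    country = (n >= 2) ? loc[2] : ""
    # Harmless if the hCoV-19/ prefix was already removed by the perl step.
    sub(/^hCoV-19\//, "", $1)
    # First column becomes <virus name>|<accession>|<collection date>.
    print $1 "|" $2 "|" $3, $2, $3, country, $5, $6
  }' "$@"
}
```

For example, `reformat_gisaid_metadata gisaid-metadata-plot.tsv > gisaid-metadata-final.tsv`, where the output file name is only a placeholder.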

UShER pre-processed data

The pre-processed mutation-annotated tree that UShER publishes for public SARS-CoV-2 sequences provides these files:

  • Protobuf file for use with usher --load-mutation-annotated-tree:
    • public-latest.all.masked.pb[.gz]
  • Variant Call Format (VCF) file containing mutations in public sequences, generated from public-latest.all.masked.pb with matUtils extract.
    • public-latest.all.masked.vcf.gz
  • Newick tree file:
    • public-latest.all.nwk.gz
  • Information about each public sequence, e.g. collection date, location, Nextstrain clade and Pango lineage.
    • public-latest.metadata.tsv.gz

      Dates and locations are not available for some sequences.

  • A brief description including date, sources and number of sequences.
    • public-latest.version.txt

Columns of the metadata:

| # | Column | Example (GenBank) | Example (other) |
|---|---|---|---|
| 1 | strain | CHN/20221209-188/2022\|OQ048281.1\|2022-12-09 | 100002\|LR824035.1\|2020-03-05 |
| 2 | genbank_accession | OQ048281.1 | LR824035.1 |
| 3 | date | 2022-12-09 | 2020-03-05 |
| 4 | country | China | Switzerland |
| 5 | host | Homo sapiens | Homo sapiens |
| 6 | completeness | | |
| 7 | length | 29769 | 29903 |
| 8 | Nextstrain_clade | 22B | 20A |
| 9 | pangolin_lineage | BF.5.1 | B.1 |
| 10 | Nextstrain_clade_usher | 22B (Omicron) | 20A |
| 11 | pango_lineage_usher | BF.5.1 | B.1 |
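
As with the GISAID file, a smaller table for visualization can be cut from these columns. The sketch below picks columns by header name rather than by position, assuming the file is tab-separated with the header names listed above. `select_columns` is a hypothetical helper.

```shell
# select_columns "name1,name2,..." < metadata.tsv
# Prints the requested columns, in the requested order, header included.
select_columns() {
  awk -v want="$1" 'BEGIN { FS = OFS = "\t"; n = split(want, names, ",") }
  NR == 1 {
    # Remember the position of every header name.
    for (i = 1; i <= NF; i++) pos[$i] = i
    # Stop with an error if a requested column does not exist.
    for (j = 1; j <= n; j++) if (!(names[j] in pos)) {
      print "missing column: " names[j] > "/dev/stderr"
      exit 1
    }
  }
  {
    line = ""
    for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(pos[names[j]])
    print line
  }'
}
```

A possible call is `gzip -dc public-latest.metadata.tsv.gz | select_columns "strain,date,country,host,pango_lineage_usher" > usher-metadata-plot.tsv`, where the output file name is a placeholder.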

Included hosts:

# Host name GenBank common name Common name
1 unknown
2 Bos taurus cattle cattle
3 Canis lupus gray wolf wolf
4 Canis lupus familiaris gray wolf wolf
5 Capra hircus goat goat
6 Chlorocebus aethiops grivet monkey
7 Chlorocebus sabaeus green monkey monkey
8 Cricetinae hamsters hamster
9 Crocuta crocuta spotted hyena hyena
10 Environment environment
11 Feliformia cats cat
12 Felis catus domestic cat cat
13 Gorilla gorilla gorilla
14 Gorilla gorilla gorilla western lowland gorilla gorilla
15 Homo humans human
16 Homo sapiens human human
17 Iguania iguanian lizards lizard
18 Mesocricetus auratus golden hamster hamster
19 Mus musculus house mouse mouse
20 Mustela lutreola European mink mink
21 Mustela putorius furo domestic ferret ferret
22 Neogale vison American mink mink
23 Odocoileus virginianus white-tailed deer deer
24 Ovis aries sheep sheep
25 Panthera leo lion lion
26 Panthera leo persica Asiatic lion lion
27 Panthera tigris tiger tiger
28 Panthera tigris jacksoni Malayan tiger tiger
29 Prionailurus bengalensis euptilurus Amur leopard cat cat
30 Rodentia rodent
31 Sus scrofa pig
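
Both host tables map many raw host values onto a shorter list of common names. One way to apply such a mapping is sketched below. It assumes a two-column, tab-separated lookup file (raw host name, common name) built by hand from the tables above, and metadata with a header line and the host in column 5, as in both metadata files described here. `map_hosts` and `host-map.tsv` are hypothetical names. Matching ignores case, so `Felis catus` and `Felis Catus` resolve to the same entry.

```shell
# map_hosts MAP_TSV [HOST_COLUMN] < metadata.tsv
# Replaces the host column with its common name when a mapping exists.
map_hosts() {
  awk -v col="${2:-5}" 'BEGIN { FS = OFS = "\t" }
  # First input (the lookup file): remember raw host -> common name.
  NR == FNR { common[tolower($1)] = $2; next }
  # Second input (stdin): pass the header through unchanged.
  FNR == 1 { print; next }
  {
    key = tolower($col)
    # Unmapped hosts are left as they are.
    if (key in common) $col = common[key]
    print
  }' "$1" -
}
```

For example, `map_hosts host-map.tsv < gisaid-metadata-plot.tsv`.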

Generate a phylogenetic tree with UShER

Use UCSC hgPhyloPlace, which generates a phylogenetic tree in Newick format.

Tip (taxon) names in this tree have the form:

`<virus name>|<Accession number>|<collection date>`


- `CHN/YN-0306-466/2020|MT396241.1|2020-03-06`
- `Ireland/LS-NVRL-M35IRL03088/2021|EPI_ISL_3848896|2021-08-07`
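
To join tip names back to a metadata table, it can help to split them into their three parts. The sketch below assumes one tip name per line and treats the last two `|`-separated fields as accession and collection date. `split_tip_name` is a hypothetical helper.

```shell
# split_tip_name < tip-names.txt
# Prints: virus name, accession, collection date (tab-separated).
split_tip_name() {
  awk 'BEGIN { FS = "|"; OFS = "\t" }
  # Lines with fewer than three fields are skipped silently.
  NF >= 3 {
    # Rejoin any extra leading fields in case the virus name contains "|".
    name = $1
    for (i = 2; i <= NF - 2; i++) name = name "|" $i
    print name, $(NF - 1), $NF
  }'
}
```

For example, `printf '%s\n' 'CHN/YN-0306-466/2020|MT396241.1|2020-03-06' | split_tip_name` prints the three parts separated by tabs.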

Visualize the phylogenetic tree with Taxonium
