Phylogenetic Analysis of SARS-CoV-2 with UShER
Create a multiple sequence alignment of your sequences
Use the SARS-CoV-2 isolate Wuhan-Hu-1 (NC_045512.2) as the reference.
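One common way to build such an alignment is MAFFT's `--keeplength --addfragments` mode, which places each new sequence onto the reference without changing the reference coordinates. The sketch below is only an illustration: the function names and file arguments are placeholders, and it assumes `mafft` is installed. Because NC_045512.2 is 29,903 nt long, every aligned record should have that length, which the second function checks.

```sh
# Sketch only: file names are placeholders, and mafft must be installed.
# Align new sequences to the reference while keeping reference coordinates.
align_to_reference() {
    # $1 = new sequences (FASTA), $2 = reference FASTA (NC_045512.2), $3 = output
    mafft --auto --keeplength --addfragments "$1" "$2" > "$3"
}

# Print name and length of every record whose aligned length differs from
# the expected one (29903 nt for NC_045512.2); prints nothing if all match.
check_alignment_lengths() {
    # $1 = aligned FASTA, $2 = expected length (default 29903)
    awk -v want="${2:-29903}" '
        function report() { if (name != "" && len != want) print name "\t" len }
        /^>/ { report(); name = substr($0, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report() }
    ' "$1"
}
```

A possible call sequence would be `align_to_reference samples.fa NC_045512.2.fa aligned.fa` followed by `check_alignment_lengths aligned.fa`; an empty result from the check means all records share the reference length.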
Download metadata
Metadata from GISAID
Data are available from EpiCoV.
Fields of GISAID metadata:
# | Column | Example |
---|---|---|
1 | Virus name | hCoV-19/Ireland/LS-NVRL-M35IRL03088/2021 |
2 | Last vaccinated | |
3 | Passage details/history | Original |
4 | Type | betacoronavirus |
5 | Accession ID | EPI_ISL_3848896 |
6 | Collection date | 2021-08-07 |
7 | Location | Europe / Ireland / Laois |
8 | Additional location information | |
9 | Sequence length | 29715 |
10 | Host | Human |
11 | Patient age | 21-30 |
12 | Gender | unknown |
13 | Clade | GK |
14 | Pango lineage | AY.4 |
15 | Pango version | PLEARN-v1.18 |
16 | Variant | VOC Delta GK (B.1.617.2+AY.*) first detected in India |
17 | AA Substitutions | (N_G215C,NSP3_A1711V,Spike_T95I,…,Spike_L452R) |
18 | Submission date | 2021-07-16 |
19 | Is reference? | |
20 | Is complete? | |
21 | Is high coverage? | True |
22 | Is low coverage? | True |
23 | N-Content | |
24 | GC-Content | 0.379589505862 |
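Since GISAID exports can change over time, it may be safer to confirm the column positions from the header row instead of relying on the numbers in the table. The helper names below are hypothetical; the sketch assumes a tab-separated file whose first line is the header.

```sh
# List every column of a tab-separated file with its 1-based index,
# so the numbers passed to `cut -f` can be checked against the header.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR "\t" $0 }'
}

# Print the index of the column whose header equals $2 exactly.
column_index() {
    head -n 1 "$1" | tr '\t' '\n' | awk -v name="$2" '$0 == name { print NR; exit }'
}
```

For example, `cut -f "$(column_index gisaid-metadata.tsv 'Pango lineage')" gisaid-metadata.tsv` would select the lineage column by name rather than by a hard-coded position.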
Included hosts:
# | Host name | GenBank common name | Common name |
---|---|---|---|
1 | Aonyx cinereus | Asian small-clawed otter | otter |
2 | Arctictis binturong | binturong | binturong |
3 | canis lupus | gray wolf | wolf |
4 | Canis lupus | gray wolf | wolf |
5 | Canis lupus familiaris | dog | dog |
6 | Chaetophractus villosus | large hairy armadillo | armadillo |
7 | Chiroptera | bats | bat |
8 | Chlorocebus sabaeus | green monkey | monkey |
9 | Crocuta crocuta | spotted hyena | hyena |
10 | Cygnus columbianus | tundra swan | swan |
11 | Environment | environment | |
12 | Felis catus | domestic cat | cat |
13 | Felis Catus | domestic cat | cat |
14 | Foreign | human | |
15 | Foreing | human | |
16 | Gorilla | gorilla | gorilla |
17 | Gorilla gorilla | western gorilla | gorilla |
18 | Gorilla gorilla gorilla | western lowland gorilla | gorilla |
19 | Hippopotamus amphibius | hippopotamus | hippo |
20 | Human | human | human |
21 | Humano | human | human |
22 | Laboratory derived | lab | |
23 | Lynx lynx familiaris | Eurasian lynx | lynx |
24 | Manis javanica | Malayan pangolin | pangolin |
25 | Manis pentadactyla | Chinese pangolin | pangolin |
26 | Mesocricetus auratus | golden hamster | hamster |
27 | Mus musculus | house mouse | mouse |
28 | Mustela furo | domestic ferret | ferret |
29 | Mustela putorius furo | domestic ferret | ferret |
30 | Nasua nasua | ring-tailed coati | coati |
31 | Neogale vison | American mink | mink |
32 | Neovison vison | American mink | mink |
33 | Odocoileus virginianus | white-tailed deer | deer |
34 | Panthera leo | lion | lion |
35 | Panthera tigris | tiger | tiger |
36 | Panthera tigris jacksoni | Malayan tiger | tiger |
37 | Panthera tigris sondaica | Javan tiger | tiger |
38 | Panthera tigris tigris | Bengal tiger | tiger |
39 | Panthera uncia | snow leopard | leopard |
40 | Phodopus roborovskii | desert hamster | hamster |
41 | Prionailurus bengalensis euptilurus | Amur leopard cat | cat |
42 | Prionailurus viverrinus | fishing cat | cat |
43 | Puma concolor | puma | puma |
44 | Rhinolophus affinis | intermediate horseshoe bat | bat |
45 | Rhinolophus bat | horseshoe bat | bat |
46 | Rhinolophus malayanus | Malayan horseshoe bat | bat |
47 | Rhinolophus marshalli | Marshall’s horseshoe bat | bat |
48 | Rhinolophus pusillus | Least horseshoe bat | bat |
49 | Rhinolophus shameli | Shamel’s horseshoe bat | bat |
50 | Rhinolophus sinicus | Chinese rufous horseshoe bat | bat |
51 | Rhinolophus stheno | Lesser brown horseshoe bat | bat |
52 | unknown | unknown | |
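The host list above contains case variants of the same name (for example `Felis catus` and `Felis Catus`). A minimal sketch for tallying hosts while folding such variants together is shown here; the function name is hypothetical, and the default column index of 10 is taken from the GISAID field table.

```sh
# Count records per host, folding case variants such as
# "Felis catus" / "Felis Catus" into one key.
# $1 = metadata TSV, $2 = 1-based index of the Host column (default 10)
count_hosts() {
    awk -F '\t' -v col="${2:-10}" '
        NR > 1 {                      # skip the header row
            host = tolower($col)
            if (host == "") host = "unknown"
            n[host]++
        }
        END { for (h in n) print n[h] "\t" h }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Lower-casing only merges case differences. Misspellings and translations such as `Foreing` or `Humano` would still need an explicit mapping table.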
Generate metadata for visualization with Taxonium (taxonium.org):
# Get Virus name (1), Accession ID (5), Collection date (6), Location (7), Host (10)
# Pango lineage (14)
cut -f 1,5,6,7,10,14 gisaid-metadata.tsv > gisaid-metadata-plot.tsv
# Remove leading `hCoV-19/` from Virus name column
perl -lpi -e 's/^hCoV\-19\///' gisaid-metadata-plot.tsv
# Extract country from Location field
# Rebuild virus name to `<virus name>|<Accession number>|<collection date>` format
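The last two steps above are given only as comments. One possible way to carry them out with `awk` is sketched below. It assumes the six-column file produced by the `cut` and `perl` commands, a `Location` value of the form `Europe / Ireland / Laois`, and output header names (`strain`, `country`, `host`, `pango_lineage`) that are illustrative rather than required by any tool.

```sh
# Reads the six-column file produced by the cut/perl steps above:
#   1 Virus name, 2 Accession ID, 3 Collection date, 4 Location, 5 Host, 6 Pango lineage
# Writes: strain, country, host, pango_lineage (header names are illustrative).
rebuild_plot_metadata() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { print "strain", "country", "host", "pango_lineage"; next }
        {
            # Location looks like "Europe / Ireland / Laois"; country is the 2nd part
            n = split($4, loc, / *\/ */)
            country = (n >= 2) ? loc[2] : ""
            print $1 "|" $2 "|" $3, country, $5, $6
        }
    ' "$1"
}
```

A call such as `rebuild_plot_metadata gisaid-metadata-plot.tsv > gisaid-metadata-taxonium.tsv` (output name hypothetical) would then yield names like `Ireland/LS-NVRL-M35IRL03088/2021|EPI_ISL_3848896|2021-08-07`. Rows whose location holds only a continent get an empty country.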
UShER pre-processed data
UShER publishes a pre-processed mutation-annotated tree for public SARS-CoV-2 sequences, together with these files:
- Protobuf file for use with `usher --load-mutation-annotated-tree`:
  - public-latest.all.masked.pb[.gz]
- Variant Call Format (VCF) file containing mutations in public sequences, generated from public-latest.all.masked.pb with `matUtils extract`:
  - public-latest.all.masked.vcf.gz
- Newick tree file:
  - public-latest.all.nwk.gz
- Information about each public sequence, e.g. collection date, location, Nextstrain clade and Pango lineage. Dates and locations are not available for some sequences.
  - public-latest.metadata.tsv.gz
- A brief description including date, sources and number of sequences:
  - public-latest.version.txt
Columns of the metadata:
# | Column | Example GenBank | Example Other |
---|---|---|---|
1 | strain | `CHN/20221209-188/2022\|OQ048281.1\|2022-12-09` | `100002\|LR824035.1\|2020-03-05` |
2 | genbank_accession | OQ048281.1 | LR824035.1 |
3 | date | 2022-12-09 | 2020-03-05 |
4 | country | China | Switzerland |
5 | host | Homo sapiens | Homo sapiens |
6 | completeness | | |
7 | length | 29769 | 29903 |
8 | Nextstrain_clade | 22B | 20A |
9 | pangolin_lineage | BF.5.1 | B.1 |
10 | Nextstrain_clade_usher | 22B (Omicron) | 20A |
11 | pango_lineage_usher | BF.5.1 | B.1 |
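Because this metadata file is large and gzip-compressed, it can be convenient to stream it and keep only the rows of interest. The following is a minimal sketch, with a hypothetical function name, that selects rows by header name so it does not depend on the column order shown above.

```sh
# Keep the header plus rows where the named column equals a value.
# Reads TSV on stdin, so it composes with `gzip -dc`.
filter_by_column() {
    # $1 = column name (e.g. host), $2 = value to keep (e.g. "Homo sapiens")
    awk -F '\t' -v name="$1" -v value="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "no such column: " name > "/dev/stderr"; exit 1 }
            print; next
        }
        $col == value
    '
}
```

For instance, `gzip -dc public-latest.metadata.tsv.gz | filter_by_column country Switzerland > switzerland.tsv` (output name hypothetical) would keep the header and all Swiss records.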
Included hosts:
# | Host name | GenBank common name | Common name |
---|---|---|---|
1 | unknown | | |
2 | Bos taurus | cattle | cattle |
3 | Canis lupus | gray wolf | wolf |
4 | Canis lupus familiaris | dog | dog |
5 | Capra hircus | goat | goat |
6 | Chlorocebus aethiops | grivet | monkey |
7 | Chlorocebus sabaeus | green monkey | monkey |
8 | Cricetinae | hamsters | hamster |
9 | Crocuta crocuta | spotted hyena | hyena |
10 | Environment | environment | |
11 | Feliformia | cats | cat |
12 | Felis catus | domestic cat | cat |
13 | Gorilla | gorilla | gorilla |
14 | Gorilla gorilla gorilla | western lowland gorilla | gorilla |
15 | Homo | humans | human |
16 | Homo sapiens | human | human |
17 | Iguania | iguanian lizards | lizard |
18 | Mesocricetus auratus | golden hamster | hamster |
19 | Mus musculus | house mouse | mouse |
20 | Mustela lutreola | European mink | mink |
21 | Mustela putorius furo | domestic ferret | ferret |
22 | Neogale vison | American mink | mink |
23 | Odocoileus virginianus | white-tailed deer | deer |
24 | Ovis aries | sheep | sheep |
25 | Panthera leo | lion | lion |
26 | Panthera leo persica | Asiatic lion | lion |
27 | Panthera tigris | tiger | tiger |
28 | Panthera tigris jacksoni | Malayan tiger | tiger |
29 | Prionailurus bengalensis euptilurus | Amur leopard cat | cat |
30 | Rodentia | rodent | |
31 | Sus scrofa | pig | pig |
Generate a phylogenetic tree with UShER
Use UCSC hgPhyloPlace, the web interface to UShER, which generates a phylogenetic tree in Newick format.
Tip (taxon) names in this tree have the form:
`<virus name>|<Accession number>|<collection date>`
e.g.,
- `CHN/YN-0306-466/2020|MT396241.1|2020-03-06`
- `Ireland/LS-NVRL-M35IRL03088/2021|EPI_ISL_3848896|2021-08-07`
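To cross-check a tree against a metadata table, it can help to pull the tip names out of the Newick file and split them into their three parts. The sketch below uses hypothetical function names and assumes unquoted labels that contain none of the characters `( ) , : ;`, and names that have exactly the three `|`-separated parts shown above.

```sh
# Print one tip label per line from a Newick tree read on stdin.
# Tip labels follow "(" or ","; internal node labels follow ")" and are skipped.
# Assumes labels are unquoted and contain none of ( ) , : ;
newick_tips() {
    tr -d '\n' | grep -oE "[(,][^(),:;]+" | sed 's/^[(,]//'
}

# Split "<virus name>|<accession>|<date>" into three tab-separated columns.
split_tip_names() {
    awk -F '|' -v OFS='\t' 'NF >= 3 { print $1, $2, $3 }'
}
```

With a downloaded tree this could be run as `gzip -dc public-latest.all.nwk.gz | newick_tips | split_tip_names`. The first output column then holds the virus name, the second the accession and the third the collection date.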