Phylogenetic Analysis of SARS-CoV-2 with UShER
Create a multiple sequence alignment of your sequences
Use SARS-CoV-2 isolate Wuhan-Hu-1 (NC_045512.2) as the reference.
Download metadata
Metadata from GISAID
Data are available from EpiCoV.
Fields of GISAID metadata:
| # | Column | Example |
|---|---|---|
| 1 | Virus name | hCoV-19/Ireland/LS-NVRL-M35IRL03088/2021 |
| 2 | Last vaccinated | |
| 3 | Passage details/history | Original |
| 4 | Type | betacoronavirus |
| 5 | Accession ID | EPI_ISL_3848896 |
| 6 | Collection date | 2021-08-07 |
| 7 | Location | Europe / Ireland / Laois |
| 8 | Additional location information | |
| 9 | Sequence length | 29715 |
| 10 | Host | Human |
| 11 | Patient age | 21-30 |
| 12 | Gender | unknown |
| 13 | Clade | GK |
| 14 | Pango lineage | AY.4 |
| 15 | Pango version | PLEARN-v1.18 |
| 16 | Variant AA | VOC Delta GK (B.1.617.2+AY.*) first detected in India |
| 17 | Substitutions | (N_G215C,NSP3_A1711V,Spike_T95I,…,Spike_L452R) |
| 18 | Submission date | 2021-07-16 |
| 19 | Is reference? | |
| 20 | Is complete? | |
| 21 | Is high coverage? | True |
| 22 | Is low coverage? | True |
| 23 | N-Content | |
| 24 | GC-Content | 0.379589505862 |
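The column numbers in this table matter when fields are selected by position with `cut -f`. A minimal sketch for checking them against your own download follows. It assumes the export is tab-separated with a header row, and `list_columns` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Print each header column of a tab-separated file with its 1-based index.
# Usage: list_columns FILE
list_columns() {
  head -n 1 "$1" |
    awk -F '\t' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}
```

Running `list_columns gisaid-metadata.tsv` should print one line per column, which makes it easy to spot a shifted column order if the export format changes.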
Included hosts:
| # | Host name | GenBank Common name | Common name |
|---|---|---|---|
| 1 | Aonyx cinereus | Asian small-clawed otter | otter |
| 2 | Arctictis binturong | binturong | binturong |
| 3 | canis lupus | gray wolf | wolf |
| 4 | Canis lupus | gray wolf | wolf |
| 5 | Canis lupus familiaris | dog | dog |
| 6 | Chaetophractus villosus | large hairy armadillo | armadillo |
| 7 | Chiroptera | bats | bat |
| 8 | Chlorocebus sabaeus | green monkey | monkey |
| 9 | Crocuta crocuta | spotted hyena | hyena |
| 10 | Cygnus columbianus | tundra swan | swan |
| 11 | Environment | environment | |
| 12 | Felis catus | domestic cat | cat |
| 13 | Felis Catus | domestic cat | cat |
| 14 | Foreign | human | |
| 15 | Foreing | human | |
| 16 | Gorilla | gorilla | gorilla |
| 17 | Gorilla gorilla | western gorilla | gorilla |
| 18 | Gorilla gorilla gorilla | western lowland gorilla | gorilla |
| 19 | Hippopotamus amphibius | hippopotamus | hippo |
| 20 | Human | human | human |
| 21 | Humano | human | human |
| 22 | Laboratory derived | lab | |
| 23 | Lynx lynx familiaris | Eurasian lynx | lynx |
| 24 | Manis javanica | Malayan pangolin | pangolin |
| 25 | Manis pentadactyla | Chinese pangolin | pangolin |
| 26 | Mesocricetus auratus | golden hamster | hamster |
| 27 | Mus musculus | house mouse | mouse |
| 28 | Mustela furo | domestic ferret | ferret |
| 29 | Mustela putorius furo | domestic ferret | ferret |
| 30 | Nasua nasua | ring-tailed coati | coati |
| 31 | Neogale vison | American mink | mink |
| 32 | Neovison vison | American mink | mink |
| 33 | Odocoileus virginianus | white-tailed deer | deer |
| 34 | Panthera leo | lion | lion |
| 35 | Panthera tigris | tiger | tiger |
| 36 | Panthera tigris jacksoni | Malayan tiger | tiger |
| 37 | Panthera tigris sondaica | Javan tiger | tiger |
| 38 | Panthera tigris tigris | Bengal tiger | tiger |
| 39 | Panthera uncia | snow leopard | leopard |
| 40 | Phodopus roborovskii | desert hamster | hamster |
| 41 | Prionailurus bengalensis euptilurus | Amur leopard cat | cat |
| 42 | Prionailurus viverrinus | fishing cat | cat |
| 43 | Puma concolor | puma | puma |
| 44 | Rhinolophus affinis | intermediate horseshoe bat | bat |
| 45 | Rhinolophus bat | horseshoe bat | bat |
| 46 | Rhinolophus malayanus | Malayan horseshoe bat | bat |
| 47 | Rhinolophus marshalli | Marshall’s horseshoe bat | bat |
| 48 | Rhinolophus pusillus | Least horseshoe bat | bat |
| 49 | Rhinolophus shameli | Shamel’s horseshoe bat | bat |
| 50 | Rhinolophus sinicus | Chinese rufous horseshoe bat | bat |
| 51 | Rhinolophus stheno | Lesser brown horseshoe bat | bat |
| 52 | unknown | unknown | |
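The host table contains several spelling variants of the same host (for example `canis lupus` and `Canis lupus`, or `Foreign` and `Foreing`), so it can help to tally the raw values before plotting. The sketch below assumes a tab-separated file with a header row and defaults to column 10 (Host); `count_column_values` is a hypothetical helper name.

```shell
# Tally the distinct values in one column of a tab-separated file (header skipped).
# Prints "<count><TAB><value>", most frequent first; ties are ordered by value.
# Usage: count_column_values FILE [COLUMN]   (COLUMN defaults to 10)
count_column_values() {
  awk -F '\t' -v col="${2:-10}" '
    NR > 1 { n[$col]++ }
    END { for (v in n) printf "%d\t%s\n", n[v], v }
  ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For example, `count_column_values gisaid-metadata.tsv 10` lists every host string with its frequency, so near-duplicates can be merged by hand or with a lookup table.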
Generate metadata for Taxonium.org visualization:
# Get Virus name (1), Accession ID (5), Collection date (6), Location (7), Host (10)
# Pango lineage (14)
cut -f 1,5,6,7,10,14 gisaid-metadata.tsv > gisaid-metadata-plot.tsv
# Remove leading `hCoV-19/` from Virus name column
perl -lpi -e 's/^hCoV\-19\///' gisaid-metadata-plot.tsv
# Extract country from Location field
# Rebuild virus name to `<virus name>|<Accession number>|<collection date>` format
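The last two comment lines describe steps without giving commands. One possible way to carry them out is sketched below with `awk`. It assumes the six-column file produced by the `cut` command above (Virus name, Accession ID, Collection date, Location, Host, Pango lineage), that Location has the form `Region / Country / ...`, and that the `hCoV-19/` prefix has already been removed. The function name `reformat_gisaid_metadata` and the replacement header `Country` are illustrative choices, not names from GISAID or Taxonium.

```shell
# Rewrite the six-column plot file:
#   column 1 becomes <virus name>|<accession>|<collection date>
#   column 4 (Location) is reduced to its second " / "-separated part (the country)
# Usage: reformat_gisaid_metadata FILE > OUTPUT
reformat_gisaid_metadata() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { $4 = "Country"; print; next }   # rename the header cell only
    {
      n = split($4, loc, " / ")               # e.g. "Europe / Ireland / Laois"
      $4 = (n >= 2) ? loc[2] : ""             # leave empty if no country part
      $1 = $1 "|" $2 "|" $3                   # rebuild the tip-style name
      print
    }
  ' "$1"
}
```

A call such as `reformat_gisaid_metadata gisaid-metadata-plot.tsv > gisaid-metadata-taxonium.tsv` (output file name hypothetical) keeps the original file untouched, which makes it easier to re-run the step if the Location format turns out to differ.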
UShER pre-processed data
UShER publishes a pre-processed mutation-annotated tree object for public SARS-CoV-2 sequences, which provides these files:
- Protobuf file for use with `usher --load-mutation-annotated-tree`:
  - public-latest.all.masked.pb[.gz]
- Variant Call Format (VCF) file containing mutations in public sequences, generated from public-latest.all.masked.pb with matUtils extract:
  - public-latest.all.masked.vcf.gz
- Newick tree file:
  - public-latest.all.nwk.gz
- Information about each public sequence, e.g. collection date, location, Nextstrain clade and Pango lineage. Dates and locations are not available for some sequences.
  - public-latest.metadata.tsv.gz
- A brief description including date, sources and number of sequences:
  - public-latest.version.txt
Columns of the metadata:
| # | Column | Example GenBank | Example Other |
|---|---|---|---|
| 1 | strain | CHN/20221209-188/2022\|OQ048281.1\|2022-12-09 | 100002\|LR824035.1\|2020-03-05 |
| 2 | genbank_accession | OQ048281.1 | LR824035.1 |
| 3 | date | 2022-12-09 | 2020-03-05 |
| 4 | country | China | Switzerland |
| 5 | host | Homo sapiens | Homo sapiens |
| 6 | completeness | | |
| 7 | length | 29769 | 29903 |
| 8 | Nextstrain_clade | 22B | 20A |
| 9 | pangolin_lineage | BF.5.1 | B.1 |
| 10 | Nextstrain_clade_usher | 22B (Omicron) | 20A |
| 11 | pango_lineage_usher | BF.5.1 | B.1 |
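In both examples the `strain` value packs three pieces of information separated by `|`: a name, an accession and a date. A small sketch for splitting it apart follows. It assumes a tab-separated file with a header row and `strain` in column 1; rows that do not have exactly three parts are reported on standard error rather than guessed at. `split_strain` and the output header names are hypothetical.

```shell
# Split column 1 ("strain") of a tab-separated metadata file into
# name, accession and date. Rows without exactly three |-separated
# parts are reported on stderr and skipped.
# Usage: split_strain FILE
split_strain() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { print "name", "accession", "date"; next }
    {
      n = split($1, part, "|")
      if (n == 3) print part[1], part[2], part[3]
      else print "unexpected strain format: " $1 > "/dev/stderr"
    }
  ' "$1"
}
```

Since the published file is compressed, it would need to be unpacked first, for example with `gzip -dc public-latest.metadata.tsv.gz > public-latest.metadata.tsv`.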
Included hosts:
| # | Host name | GenBank common name | Common name |
|---|---|---|---|
| 1 | unknown | | |
| 2 | Bos taurus | cattle | cattle |
| 3 | Canis lupus | gray wolf | wolf |
| 4 | Canis lupus familiaris | dog | dog |
| 5 | Capra hircus | goat | goat |
| 6 | Chlorocebus aethiops | grivet | monkey |
| 7 | Chlorocebus sabaeus | green monkey | monkey |
| 8 | Cricetinae | hamsters | hamster |
| 9 | Crocuta crocuta | spotted hyena | hyena |
| 10 | Environment | environment | |
| 11 | Feliformia | cats | cat |
| 12 | Felis catus | domestic cat | cat |
| 13 | Gorilla | gorilla | gorilla |
| 14 | Gorilla gorilla gorilla | western lowland gorilla | gorilla |
| 15 | Homo | humans | human |
| 16 | Homo sapiens | human | human |
| 17 | Iguania | iguanian lizards | lizard |
| 18 | Mesocricetus auratus | golden hamster | hamster |
| 19 | Mus musculus | house mouse | mouse |
| 20 | Mustela lutreola | European mink | mink |
| 21 | Mustela putorius furo | domestic ferret | ferret |
| 22 | Neogale vison | American mink | mink |
| 23 | Odocoileus virginianus | white-tailed deer | deer |
| 24 | Ovis aries | sheep | sheep |
| 25 | Panthera leo | lion | lion |
| 26 | Panthera leo persica | Asiatic lion | lion |
| 27 | Panthera tigris | tiger | tiger |
| 28 | Panthera tigris jacksoni | Malayan tiger | tiger |
| 29 | Prionailurus bengalensis euptilurus | Amur leopard cat | cat |
| 30 | Rodentia | rodent | |
| 31 | Sus scrofa | pig | |
Generate a phylogenetic tree with UShER
Use UCSC hgPhyloPlace, which generates a phylogenetic tree in Newick format.
Tip (taxon) names in this tree follow the format:
`<virus name>|<Accession number>|<collection date>`
e.g.,
- `CHN/YN-0306-466/2020|MT396241.1|2020-03-06`
- `Ireland/LS-NVRL-M35IRL03088/2021|EPI_ISL_3848896|2021-08-07`
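To check that the tip names match the names in a metadata file, the tip labels can be pulled out of the Newick text. The sketch below is a rough text-based approach, not a full Newick parser. It assumes unquoted labels that contain none of the characters `( ) , : ;`, and it skips internal node labels, which follow a closing parenthesis. `list_tips` is a hypothetical helper name.

```shell
# Print one tip label per line from Newick text read on standard input.
# A tip label is taken to be the text that directly follows "(" or ","
# and runs up to the next "(", ")", ",", ":" or ";".
# Usage: list_tips < tree.nwk
list_tips() {
  grep -oE '[(,][^(),:;]+' | cut -c 2-
}
```

Because the function reads standard input, a compressed tree can be piped in, for example `gzip -dc public-latest.all.nwk.gz | list_tips | head`.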