Arch Linux WSL Installation and Configuration

Installing Windows Subsystem for Linux (WSL) from the Microsoft Store is perhaps the easiest approach, but by default this method installs the Linux distribution (such as Ubuntu) on the Windows system drive. That uses up space on the system drive, so sometimes we want to install WSL on another disk partition instead. Here, taking Arch Linux WSL as an example, we introduce the general process of installing WSL.

Phylogenetic Analysis of SARS-CoV-2 With UShER
Create a multiple sequence alignment of your sequences
Use the SARS-CoV-2 isolate Wuhan-Hu-1 (NC_045512.2) as the reference.

Download metadata
Metadata from GISAID: the data are available from EpiCoV. Fields of the GISAID metadata:

#   Column                            Example
1   Virus name                        hCoV-19/Ireland/LS-NVRL-M35IRL03088/2021
2   Last vaccinated
3   Passage details/history           Original
4   Type                              betacoronavirus
5   Accession ID                      EPI_ISL_3848896
6   Collection date                   2021-08-07
7   Location                          Europe / Ireland / Laois
8   Additional location information
9   Sequence length                   29715
10  Host                              Human
11  Patient age                       21-30
12  Gender                            unknown
13  Clade                             GK
14  Pango lineage                     AY.
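As a minimal sketch of working with this metadata in Python, the code below reads a tab-separated metadata export and keeps only the records from one country. It assumes the file has a header row using the column names listed above and that `Location` follows the `Continent / Country / Region` pattern of the example row. The function name `rows_from_country` is hypothetical, and the demo input is built only from that example row.

```python
import csv
import io
from typing import Iterator, TextIO


def rows_from_country(handle: TextIO, country: str) -> Iterator[dict]:
    """Yield metadata rows whose 'Location' names the given country.

    Assumes a tab-separated file whose header row uses the GISAID column
    names (e.g. 'Accession ID', 'Location') and a 'Location' value shaped
    like 'Continent / Country / Region'.
    """
    # QUOTE_NONE: treat quote characters as ordinary data, as plain TSV does.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # 'Europe / Ireland / Laois' -> ['Europe', 'Ireland', 'Laois']
        parts = [p.strip() for p in (row.get("Location") or "").split("/")]
        if len(parts) > 1 and parts[1] == country:
            yield row


# Demo input built from the example row of the metadata fields.
sample = (
    "Virus name\tAccession ID\tCollection date\tLocation\n"
    "hCoV-19/Ireland/LS-NVRL-M35IRL03088/2021\tEPI_ISL_3848896\t"
    "2021-08-07\tEurope / Ireland / Laois\n"
)
ids = [r["Accession ID"] for r in rows_from_country(io.StringIO(sample), "Ireland")]
print(ids)  # ['EPI_ISL_3848896']
```

Because the function is a generator, it streams through the file one row at a time, which keeps memory use flat even for very large metadata exports.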

Create a Phylogenetic Tree of SARS-CoV-2 With UShER

There are nearly 14 million viral genome sequences in the GISAID EpiCoV™ database right now. Inferring the phylogenetic relationships of such a huge dataset with traditional maximum likelihood or Bayesian methods in a short time is not feasible. The UShER package was developed to generate ultra-large phylogenetic trees of SARS-CoV-2 genomes. Rather than building a tree from scratch, the UShER program places new samples onto an existing phylogeny using the maximum parsimony method, which lets it place given SARS-CoV-2 genome sequences into the GISAID global phylogeny within a couple of hours. The program is particularly helpful for understanding how newly sequenced SARS-CoV-2 genomes relate to each other and to previously sequenced genomes in the global phylogeny.
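To illustrate the idea of parsimony placement, here is a toy Python sketch. It is not UShER's actual implementation. It models a mutation-annotated tree as a hypothetical dictionary mapping each node to its parent and the mutations on the branch leading into it, then attaches a new sample as a child of whichever node requires the fewest additional mutations. UShER also considers placements partway along a branch and uses optimized data structures for huge trees; this sketch additionally assumes that no site mutates more than once along any root-to-node path.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy MAT: node name -> (parent name, mutations on the branch into it).
Tree = Dict[str, Tuple[Optional[str], List[str]]]


def haplotype(tree: Tree, node: str) -> Set[str]:
    """Collect the mutations (relative to the reference) carried by a node.

    Walks from the node up to the root. Assumes no back or recurrent
    mutations, so the union of branch mutations is the node's genotype.
    """
    muts: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        parent, branch_muts = tree[current]
        muts.update(branch_muts)
        current = parent
    return muts


def place_sample(tree: Tree, sample_muts: Set[str]) -> Tuple[str, int]:
    """Return the node where attaching the sample as a new child adds the
    fewest mutations (its parsimony score), together with that score."""
    best_node, best_score = "", -1
    for node in tree:
        # Mutations the new branch needs: those the sample has but the node
        # lacks, plus those the node has but the sample lacks.
        score = len(haplotype(tree, node) ^ sample_muts)
        if best_score < 0 or score < best_score:
            best_node, best_score = node, score
    return best_node, best_score


# Hypothetical four-node tree for demonstration only.
toy_tree: Tree = {
    "root": (None, []),
    "A": ("root", ["C241T"]),
    "B": ("A", ["C3037T", "A23403G"]),
    "C": ("root", ["G11083T"]),
}
node, score = place_sample(toy_tree, {"C241T", "C3037T", "A23403G", "C14408T"})
print(node, score)  # B 1
```

When several nodes tie for the lowest score, this sketch simply keeps the first one it visits. With real data, equally parsimonious placements are a genuine source of uncertainty and are worth reporting rather than silently picking one.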

The UShER package is composed of four programs:

  1. UShER: a program that rapidly places new samples onto an existing phylogeny using maximum parsimony.
  2. matUtils: a toolkit for querying, interpreting and manipulating the mutation-annotated trees (MATs).
  3. matOptimize: a program to rapidly and effectively optimize a mutation-annotated tree (MAT) for parsimony using subtree pruning and regrafting (SPR) moves within a user-defined radius.
  4. RIPPLES: a program that uses a phylogenomic technique to rapidly and sensitively detect recombinant nodes and their ancestors in a mutation-annotated tree (MAT).

The taxoniumtools package and the Taxonium website are used to visualize the MATs generated by UShER.

Bulk Load a TSV File Into an SQLite Database

The Tab-Separated Values (TSV) format is a simple, widely supported text format. Data are stored in a tabular structure: each record in the table is one line of the text file, and the field values within a record are separated by tab characters.

TSV files are easy to process in most programming languages. However, when a file is very large, for instance millions of lines, working with it directly becomes cumbersome. In such cases, loading the TSV file into an SQLite3 database makes further operations much more convenient.
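One way to do this is with Python's standard `sqlite3` and `csv` modules. The sketch below streams rows into `executemany` inside a single transaction, so the whole file never has to fit in memory. It assumes the first line is a header, declares every column as TEXT, and uses a hypothetical function name `load_tsv`; the file and table names in the example are placeholders.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_tsv(conn: sqlite3.Connection, handle: TextIO, table: str) -> int:
    """Create `table` from the TSV header and bulk-insert all remaining rows.

    Every column is declared TEXT. The table and column names are pasted
    into the SQL text, so they must come from a trusted source.
    """
    # QUOTE_NONE: treat quote characters as ordinary data, as plain TSV does.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')

    placeholders = ", ".join("?" for _ in header)
    sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
    # One transaction, committed when the `with` block exits (rolled back on
    # error). The generator streams rows from the file and skips blank lines.
    with conn:
        cursor = conn.executemany(sql, (row for row in reader if row))
    return cursor.rowcount  # executemany sums the affected rows


# Example with an in-memory database and a small inline TSV.
# For a real file: open("data.tsv", newline="", encoding="utf-8") (placeholder name).
conn = sqlite3.connect(":memory:")
data = io.StringIO("id\tname\n1\talpha\n2\tbeta\n")
inserted = load_tsv(conn, data, "records")
print(inserted)  # 2
print(conn.execute("SELECT name FROM records WHERE id = ?", ("2",)).fetchone())  # ('beta',)
```

Wrapping all inserts in one transaction matters for speed, because committing after every row forces a separate write to disk each time. For a quick load without custom processing, the `sqlite3` command-line shell's `.import` command (after `.mode tabs`) is another option.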
