Install and Set Up MetaPhlAn 3.0
MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
This post describes how to install and set up MetaPhlAn 3.0 on Debian 11 (bullseye).
Last updated: 2021-12-17.
1. Install MetaPhlAn 3.0
The officially recommended installation method is conda via the Bioconda channel.
On Linux, download and install Miniconda first. At the time of writing, the latest version is conda 4.10.3 with Python 3.9.5, released on July 21, 2021.
Then set up the bioconda and conda-forge channels. It is important to add them in this order so that the priority is set correctly (that is, conda-forge has the highest priority).
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
The conda-forge channel contains many general-purpose packages that are not available in the defaults channel. Without it, the database setup step will fail.
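To confirm that the priority order took effect, you can look at the output of conda config --show channels, which lists channels from highest to lowest priority. The sketch below wraps that check in a small helper; the function name check_channel_order is hypothetical, and it only checks the relative order of the three channels added above (other channels may also be present).

```shell
# Hypothetical helper: reads the output of "conda config --show channels"
# on stdin and checks that conda-forge > bioconda > defaults in priority
# (conda lists channels from highest to lowest priority).
check_channel_order() {
  awk '
    /^ *- / {
      sub(/^ *- */, "")   # strip the YAML list marker
      pos[$0] = ++n       # remember each channel position
    }
    END {
      if (pos["conda-forge"] && pos["bioconda"] && pos["defaults"] &&
          pos["conda-forge"] < pos["bioconda"] && pos["bioconda"] < pos["defaults"]) {
        print "channel order OK"
        exit 0
      }
      print "unexpected channel order" > "/dev/stderr"
      exit 1
    }'
}

# Usage:
#   conda config --show channels | check_channel_order
```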
Now create an isolated conda environment named mpa and install MetaPhlAn into it:
$ conda create --name mpa -c bioconda python=3.7 metaphlan
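Before moving on, it can help to confirm that the new environment actually provides the expected commands. This is a minimal sketch: require_tools is a hypothetical helper name, and it assumes that both metaphlan and bowtie2 (which conda installs as a MetaPhlAn dependency) should be on PATH once the environment is activated.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns non-zero if any of them is missing.
require_tools() {
  local missing=0 t
  for t in "$@"; do
    if command -v "$t" >/dev/null 2>&1; then
      echo "found: $t -> $(command -v "$t")"
    else
      echo "missing: $t" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after "conda activate mpa":
#   require_tools metaphlan bowtie2 && metaphlan --version
```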
2. Download and set up the database
MetaPhlAn needs the clade markers and the database to be downloaded locally. The default MetaPhlAn 3.0 database folder is $HOME/miniconda3/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/. However, it is recommended to install the database in a folder outside the conda environment, so that it is not deleted if the environment is removed or recreated.
# Database dir: $HOME/db/metaphlan_databases
$ cd
$ mkdir -p db/metaphlan_databases
Then download and install the latest database:
# Activate the `mpa` environment first
$ conda activate mpa
# Download & install latest database
(mpa) $ metaphlan --install --bowtie2db ~/db/metaphlan_databases
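After the download and index build finish, you can check that the database folder looks complete before running any samples. This is a sketch under assumptions: check_mpa_db is a hypothetical helper, and it assumes that mpa_latest holds the index name and that an installed index consists of <index>.pkl plus Bowtie 2 index files (<index>.1.bt2, or <index>.1.bt2l for a large index).

```shell
# Hypothetical helper: sanity-check a MetaPhlAn database folder.
# Assumes mpa_latest contains the index name, and that an installed
# index has a .pkl file plus a Bowtie 2 index (.1.bt2 or .1.bt2l).
check_mpa_db() {
  local db_dir="$1" index
  if [ ! -s "$db_dir/mpa_latest" ]; then
    echo "missing $db_dir/mpa_latest" >&2
    return 1
  fi
  index=$(head -n 1 "$db_dir/mpa_latest" | tr -d '[:space:]')
  if [ ! -f "$db_dir/$index.pkl" ]; then
    echo "missing $index.pkl" >&2
    return 1
  fi
  if [ ! -f "$db_dir/$index.1.bt2" ] && [ ! -f "$db_dir/$index.1.bt2l" ]; then
    echo "missing Bowtie 2 index for $index" >&2
    return 1
  fi
  echo "$index"   # print the installed index name on success
}

# Usage:
#   check_mpa_db ~/db/metaphlan_databases
```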
You can also download the related files manually from:
Download only the .tar and .md5 files plus the mpa_latest file, and place them all in the metaphlan_databases folder (~/db/metaphlan_databases).
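When downloading manually, it is worth verifying the archive against its .md5 file before installing. A minimal sketch, assuming the first whitespace-separated field of the .md5 file is the expected checksum and that mpa_latest contains the index name; verify_mpa_db is a hypothetical helper name.

```shell
# Hypothetical helper: compare <index>.tar against <index>.md5.
# Assumes the first field of the .md5 file is the expected MD5 checksum.
verify_mpa_db() {
  local db_dir="$1" index="$2" expected actual
  expected=$(awk '{print $1; exit}' "$db_dir/$index.md5")
  actual=$(md5sum "$db_dir/$index.tar" | awk '{print $1}')
  if [ "$expected" = "$actual" ]; then
    echo "OK: $index.tar"
  else
    echo "MISMATCH: $index.tar (expected $expected, got $actual)" >&2
    return 1
  fi
}

# Usage:
#   db=~/db/metaphlan_databases
#   verify_mpa_db "$db" "$(cat "$db/mpa_latest")"
```

If the checksums do not match, delete the .tar file and download it again before running the install step.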
3. Test the installation
Create a new folder named metaphlan_analysis and enter it:
(mpa) $ mkdir metaphlan_analysis
(mpa) $ cd metaphlan_analysis
Then download a sample file:
(mpa) $ curl -LO https://github.com/biobakery/biobakery/raw/master/demos/biobakery_demos/data/metaphlan3/input/SRS014476-Supragingival_plaque.fasta.gz
# or
(mpa) $ wget https://github.com/biobakery/biobakery/raw/master/demos/biobakery_demos/data/metaphlan3/input/SRS014476-Supragingival_plaque.fasta.gz
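If a download URL is wrong, curl or wget can silently save an HTML error page under the .fasta.gz name. The sketch below is one way to catch that; check_fasta_gz is a hypothetical helper that tests the gzip stream and counts FASTA records (lines starting with ">").

```shell
# Hypothetical helper: confirm the file is valid gzip and contains FASTA records.
check_fasta_gz() {
  local f="$1" n
  if ! gzip -t "$f" 2>/dev/null; then
    echo "$f is not a valid gzip file (possibly an HTML error page)" >&2
    return 1
  fi
  # grep -c prints 0 and exits 1 when nothing matches, hence "|| true"
  n=$(gzip -dc "$f" | grep -c '^>' || true)
  if [ "$n" -eq 0 ]; then
    echo "$f contains no FASTA records" >&2
    return 1
  fi
  echo "$n sequences in $f"
}

# Usage:
#   check_fasta_gz SRS014476-Supragingival_plaque.fasta.gz
```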
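Run MetaPhlAn on a single sample. Because the database was installed outside the default folder, pass the same --bowtie2db path used during installation:
(mpa) $ metaphlan SRS014476-Supragingival_plaque.fasta.gz --input_type fasta --bowtie2db ~/db/metaphlan_databases > SRS014476-Supragingival_plaque_profile.txt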
This produces two files:
- SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt: contains the intermediate mapping results to unique sequence markers.
- SRS014476-Supragingival_plaque_profile.txt: contains the final computed organism abundances.
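To get a quick overview of the profile, you can pull out just the species-level rows sorted by abundance. This sketch assumes the default MetaPhlAn 3 profile layout: tab-separated, header lines starting with #, the "|"-delimited clade name in column 1 and the relative abundance in column 3. species_table is a hypothetical helper name.

```shell
# Hypothetical helper: print "species<TAB>relative_abundance" for every
# species-level row (clade name contains s__ but no t__), highest first.
species_table() {
  awk -F '\t' '
    /^#/ { next }
    $1 ~ /\|s__/ && $1 !~ /\|t__/ {
      n = split($1, ranks, "|")
      sub(/^s__/, "", ranks[n])
      printf "%s\t%s\n", ranks[n], $3
    }' "$1" | sort -t $'\t' -k2,2gr
}

# Usage:
#   species_table SRS014476-Supragingival_plaque_profile.txt | head
```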
For more details, see the MetaPhlAn 3.0 Tutorial and Wiki.