Setup BD Single Cell Genomics Rhapsody Analysis


This post provides instructions for installing the BD Genomics Rhapsody™ Analysis pipeline on a local Linux server.

1. Minimal System Requirements

  • Operating system:
    • macOS® X or Linux®.
    • Microsoft® Windows® is not supported.
  • 8-core processor (>16-core recommended)
  • 32 GB RAM (128 GB recommended)
  • 250 GB free disk space
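You can check these requirements from the shell before going further. The following is a minimal sketch for Linux only: it uses nproc, /proc/meminfo and df, so it will not run on macOS. The function names meets_minimum and check_this_host are hypothetical. Treating 1 GB as 10^9 bytes is an assumption about how the requirements are meant.

```shell
# Minimal sketch (Linux only): compare this host against the minimums
# listed above. Assumption: 1 GB = 10^9 bytes.

# meets_minimum CORES RAM_GB DISK_GB
# Returns 0 when all three values reach 8 cores, 32 GB RAM, 250 GB disk.
meets_minimum() {
    [ "$1" -ge 8 ] && [ "$2" -ge 32 ] && [ "$3" -ge 250 ]
}

# check_this_host [DIR]
# Measures cores, total RAM and the free space of DIR (default: the
# current directory), prints them, then applies meets_minimum.
check_this_host() {
    local dir="${1:-.}" cores ram_kb ram_gb disk_gb
    cores=$(nproc)
    # MemTotal in /proc/meminfo is given in KiB
    ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    ram_gb=$(( ram_kb * 1024 / 1000000000 ))
    # df -P -k reports 1024-byte blocks; field 4 is the available space
    disk_gb=$(df -P -k "$dir" | awk 'NR == 2 {print int($4 * 1024 / 1000000000)}')
    echo "cores=$cores ram=${ram_gb}GB free_disk=${disk_gb}GB"
    meets_minimum "$cores" "$ram_gb" "$disk_gb"
}
```

For example, assuming the functions are saved in a file named check_requirements.sh (a hypothetical name):

$ . ./check_requirements.sh
$ check_this_host .

Decimal gigabytes are used so that a 32 GB machine is not rejected just because the kernel reserves part of its memory.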

In this tutorial, we will work on Ubuntu 18.04.5 LTS Server amd64.

Note:

  • This pipeline has been tested on Ubuntu 18.04 LTS, Ubuntu 20.04 LTS and CentOS 7.x.
  • It might not work on other Linux distributions.
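To see whether your server runs one of the tested releases, you can read the standard /etc/os-release file. The sketch below is minimal and assumes that file exists (it does on current Ubuntu and CentOS releases). The helpers is_tested_release and check_os_release are hypothetical names.

```shell
# Minimal sketch: warn when the OS is not one of the releases listed
# above as tested. Assumes the standard /etc/os-release file.

# is_tested_release ID VERSION_ID
# Returns 0 for Ubuntu 18.04, Ubuntu 20.04 and CentOS 7.x.
is_tested_release() {
    case "$1:$2" in
        ubuntu:18.04|ubuntu:20.04) return 0 ;;
        centos:7|centos:7.*)       return 0 ;;
        *)                         return 1 ;;
    esac
}

# check_os_release [FILE]
# Reads ID and VERSION_ID from FILE (default /etc/os-release).
check_os_release() {
    local file="${1:-/etc/os-release}" os_id os_version
    # Source the file in subshells so its variables stay out of this shell
    os_id=$(. "$file" && echo "$ID")
    os_version=$(. "$file" && echo "$VERSION_ID")
    if is_tested_release "$os_id" "$os_version"; then
        echo "$os_id $os_version: tested"
    else
        echo "$os_id $os_version: not tested, the pipeline might not work" >&2
        return 1
    fi
}
```

On the server used in this tutorial, check_os_release should report "ubuntu 18.04: tested".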

2. Install necessary software

$ sudo apt update
$ sudo apt install git cwltool

Note: the cwltool package from the Ubuntu repository works well, so it is not necessary to install cwltool/cwl-runner via pip.
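To confirm that both tools landed on your PATH, you can use a small check. This is a minimal sketch built on the POSIX command -v builtin; missing_tools is a hypothetical helper name.

```shell
# missing_tools NAME...
# Prints each named command that cannot be found on PATH (one per line)
# and returns 1 if any of them are missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, the following prints the cwltool version only when both git and cwltool are installed:

$ missing_tools git cwltool && cwltool --version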

3. Install Docker

3.1 Uninstall old versions of Docker if necessary

$ sudo apt remove docker docker-engine docker.io containerd runc

3.2 Install from repository

  1. First, install the prerequisite packages:
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
  2. Add Docker's official GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
  3. Set up the stable repository:
$ echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  4. Install Docker Engine:
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
  5. Verify the installation:
$ sudo docker run hello-world

This command downloads a test image and runs it in a container.

  6. Manage Docker as a normal user:
$ sudo usermod -aG docker $USER
$ newgrp docker

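Now you can run docker commands without sudo. Note that newgrp applies the new group only to the current shell; log out and back in to apply it to other sessions.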

$ docker run hello-world
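Running docker without sudo needs two things: membership in the docker group and a reachable daemon. The following minimal sketch checks both. in_group and check_docker_access are hypothetical helpers; id -nG and docker info are standard commands.

```shell
# in_group GROUP
# Returns 0 if GROUP appears in the current user's group list. A group
# added with usermod only shows up in new login sessions or after newgrp.
in_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# check_docker_access
# Confirms docker group membership, then that the daemon answers
# without sudo.
check_docker_access() {
    if ! in_group docker; then
        echo "user $(id -un) is not in the docker group yet" >&2
        return 1
    fi
    if ! docker info >/dev/null 2>&1; then
        echo "in the docker group, but the Docker daemon is not reachable" >&2
        return 1
    fi
    echo "docker is usable without sudo"
}
```

Run check_docker_access in a new login session to confirm that the usermod change took effect.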

4. Download the BD Genomics Rhapsody image

All bdgenomics/rhapsody images are available from Docker Hub. Here we pull version 1.9.1, the most recent one at the time of writing.

$ docker pull bdgenomics/rhapsody:1.9.1

This may take a while, depending on your network speed. Once the download completes, verify it with:

$ docker images
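A script can also ask Docker about one image directly instead of reading the whole docker images listing. This is a minimal sketch. RHAPSODY_IMAGE, RHAPSODY_VERSION and image_present are hypothetical names; docker image inspect is a standard Docker CLI command. Keeping the tag in one variable helps the image and the CWL files (section 5) refer to the same pipeline version.

```shell
# Record the image reference once and reuse it.
RHAPSODY_IMAGE="bdgenomics/rhapsody:1.9.1"
# Everything after the last ":" is the tag, here 1.9.1
RHAPSODY_VERSION="${RHAPSODY_IMAGE##*:}"

# image_present IMAGE
# Returns 0 if IMAGE already exists locally. docker image inspect fails
# for unknown images, and also when docker itself is unavailable.
image_present() {
    docker image inspect "$1" >/dev/null 2>&1
}
```

For example, the following pulls the image only if it is not already present:

$ image_present "$RHAPSODY_IMAGE" || docker pull "$RHAPSODY_IMAGE"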

5. Download CWL and YML files

These files are available from the BD Genomics repository on Bitbucket.

$ git clone https://bitbucket.org/CRSwDev/cwl/ bd-cwl

The files we need are in the subdirectory v1.9.1/, which matches the version of the Docker image pulled above.

The BD Genomics Rhapsody Analysis pipeline is now fully installed.
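To confirm the clone contains what the run step needs, you can check for the workflow and template files of one version. This minimal sketch assumes rhapsody_wta_1.9.1.cwl sits in v1.9.1/ next to template_wta_1.9.1.yml, based on the file names used later in this post. check_cwl_files is a hypothetical helper.

```shell
# check_cwl_files REPO_DIR VERSION
# Checks that REPO_DIR/vVERSION/ holds the WTA workflow and its
# template, reporting each missing file on stderr.
check_cwl_files() {
    local dir="$1/v$2" version="$2" f status=0
    for f in "rhapsody_wta_${version}.cwl" "template_wta_${version}.yml"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, run this from the directory where you cloned bd-cwl:

$ check_cwl_files bd-cwl 1.9.1 && ls bd-cwl/v1.9.1/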

6. A pseudo-run

Here is a sample YML file, demo.yml, for the Rhapsody pipeline:

#!/usr/bin/env cwl-runner

cwl:tool: rhapsody

Reads:
 - class: File
   location: "demo_S1_L001_R1_001.fq.gz"
 - class: File
   location: "demo_S1_L001_R2_001.fq.gz"

Reference_Genome:
   class: File
   location: "GRCh38-gencodev29.tar.gz"

Transcriptome_Annotation:
   class: File
   location: "gencodev29.gtf"

Sample_Tags_Version: human
Subsample_Tags: 0.2

See the template file for more details and a description of each field:

bd-cwl/v1.9.1/template_wta_1.9.1.yml
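Before a long run, it is worth confirming that every input file named in the YML file exists. This minimal sketch assumes each location value is written in double quotes on its own line, as in demo.yml. It resolves relative paths against the YML file's own directory, which is how CWL resolves relative locations in a job file. missing_inputs is a hypothetical helper.

```shell
# missing_inputs YML
# Prints each double-quoted "location:" value in YML that is not an
# existing file, and returns 1 if any are missing.
missing_inputs() {
    local base
    base=$(dirname "$1")
    # Extract the quoted value of every "location:" line, then check each
    # one in a subshell whose exit status becomes the function's status
    sed -n 's/^[[:space:]]*location:[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" | (
        status=0
        while IFS= read -r loc; do
            case "$loc" in
                /*) path="$loc" ;;          # absolute path: use as is
                *)  path="$base/$loc" ;;    # relative: next to the YML
            esac
            if [ ! -f "$path" ]; then
                echo "$path"
                status=1
            fi
        done
        exit "$status"
    )
}
```

For example, the following prints nothing and succeeds when the FASTQ files, the reference archive and the GTF file are all in place:

$ missing_inputs demo.yml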

Next, launch the pipeline:

$ cwltool \
    --parallel \
    --tmpdir-prefix tmp_ \
    --outdir result/ \
    rhapsody_wta_1.9.1.cwl \
    demo.yml > demo.log 2>&1

Note:

  • This pipeline generates many temporary files, so don’t forget to use the option --tmpdir-prefix. Otherwise, all temporary files and directories are stored in the /tmp directory, which may not have enough space.
  • The output directory result/ needs to be created beforehand.
  • The final message “Final process status is success” in demo.log indicates that the pipeline completed successfully.
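The launch command and the notes above can be combined into a small wrapper. It creates result/, runs cwltool with the same options and checks the log for the success message. This is a minimal sketch using only the cwltool options already shown in this post. run_demo and pipeline_succeeded are hypothetical names, and the log name is derived from the YML name (demo.yml gives demo.log).

```shell
# pipeline_succeeded LOG
# Returns 0 if the cwltool log contains the final success message.
pipeline_succeeded() {
    grep -q "Final process status is success" "$1"
}

# run_demo CWL YML
# Creates result/, keeps temporary directories under ./tmp_* rather than
# /tmp, and writes cwltool's output to a log named after YML.
run_demo() {
    local cwl="$1" yml="$2" log
    log="${2%.yml}.log"
    mkdir -p result
    if cwltool --parallel --tmpdir-prefix tmp_ --outdir result/ \
            "$cwl" "$yml" > "$log" 2>&1 && pipeline_succeeded "$log"; then
        echo "pipeline finished, outputs are in result/"
    else
        echo "pipeline failed, see $log" >&2
        return 1
    fi
}
```

For example:

$ run_demo rhapsody_wta_1.9.1.cwl demo.yml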

A. References

  1. BD Single Cell Genomics Analysis Setup User Guide
  2. Docker Documentation: Install Docker Engine on Ubuntu