Install ChIP-Seq training command line software

From BITS wiki



The tools described here are used in the ChIP-Seq training (2014). This page shows how to install them on your own computer running a Unix or Mac OS X operating system.

Windows users need additional third-party software to run Unix (a good choice is VirtualBox, provided you have enough free resources on your machine [1]). A detailed protocol for installing VirtualBox is given on the wiki page Create a virtual machine running a Linux distribution using VirtualBox.

BITS laptops were installed with Linux Mint 13, aka Maya [2]. More recent versions of Mint were tried on the BITS laptops, but so far only Maya did the job; you can try newer versions at your own risk. Other Unix operating systems will also do the job: Ubuntu [3] is the most popular among human beings (Mint is derived from it), while CentOS [4] is favored by computing specialists because it is widely used in corporate computing infrastructures. You may pick any of these; they are all free.

Mint and other Linux systems come with a built-in 'software manager' (apt-get on Mint and Ubuntu, yum on CentOS) that makes it easy to install the many system resources required for biocomputing in general, but not necessarily the 'bioware' specifically required during the BITS trainings. Always use the 'software manager' to add the dependencies required by 'bioware', as it takes care of storing them in the right place on your system.
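A minimal sketch of this advice: the hypothetical helper `bioware_deps_cmd` below only prints the install command for build dependencies that the tools on this page commonly need (C/C++ compiler, make, zlib and ncurses headers, Python headers, wget, unzip). The package names are our assumption for apt-get (Mint, Ubuntu) and yum (CentOS), not an official list; read the build error messages if something is still missing. Nothing is installed until you run the printed command yourself.

```shell
# Hypothetical helper: print (do not run) the command that installs common
# build dependencies. Package names are ASSUMED typical Debian/Ubuntu/Mint
# and CentOS names; adjust them to what your build errors ask for.
bioware_deps_cmd() {
  case "$1" in
    apt-get) echo "sudo apt-get install build-essential zlib1g-dev libncurses5-dev python-dev wget unzip" ;;
    yum)     echo "sudo yum install gcc gcc-c++ make zlib-devel ncurses-devel python-devel wget unzip" ;;
    *)       echo "no package list for '$1'" >&2; return 1 ;;
  esac
}

# use the first supported package manager found on this machine
for pm in apt-get yum; do
  if command -v "$pm" >/dev/null 2>&1; then
    bioware_deps_cmd "$pm"
    break
  fi
done
```

Copy the printed line into the terminal to actually install the packages.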

Mac OS X users benefit from the built-in Darwin Unix environment that is part of OS X. They will, however, first need to install the Apple Developer tools and a package installer such as MacPorts before they can build 'bioware' and install it on their computer. This is described in the companion wiki page Turn you Apple laptop into a Unix data crunching beast.

Once your Unix system is up and running, you can add the 'bioware' applications used in the training, as described in the remaining part of this document.

Downloading and installing CLI tools

[edited, 2014-02-26]

The versions of the tools described below were selected when this page was written; please check the respective developer pages for more recent stable versions.

The training requires a number of tools that we will now install on your computer. The one exception is the web tool RSAT, which is used from a web browser.


Standalone tool with graphical interface

Compiled packages or Source code to build tools

From now on, we work in a terminal and type commands to interact with the operating system.


We first create a folder where we will install all packages. This folder is kept separate from the system's default installation folder so that we do not interfere with our system or take any risks. We create it in our own HOME folder, where we have full privileges and can install and run files without restrictions.

# define a default folder where to store and build bioware
export BIOWARE=$HOME/bioware
# create a folder where to decompress all source packages
# (named 'download', as used by all the commands below)
mkdir -p ${BIOWARE}/download
# also create a bin folder where programs will be aliased
mkdir -p ${BIOWARE}/bin

Download Samtools, decompress, build

Samtools [5] is the reference software for processing alignment data (SAM/BAM); Picard tools [6] offer a Java implementation of similar functions. Both tools perform similarly, but samtools is often easier to use in command-line pipelines.

cd ${BIOWARE}/download/
wget --no-check-certificate <samtools-0.1.19.tar.bz2 download link>
tar -xjvf samtools-0.1.19.tar.bz2
# the result is a folder named <samtools-0.1.19>
cd ${BIOWARE}/download/samtools-0.1.19
# build samtools
make
# if this fails, your machine is lacking some vital parts;
# reading the error messages should help identify what is missing,
# which you can then install with your package installer (yum, apt-get, MacPorts, ...)
# one more command to also make 'razip'
make razip
# test the make result by running samtools from the current folder
./samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-44428cd
Usage:   samtools <command> [options]
Command: view        SAM<->BAM conversion
         sort        sort alignment file
         mpileup     multi-way pileup
         depth       compute the depth
         faidx       index/extract FASTA
         tview       text alignment viewer
         index       index alignment
         idxstats    BAM index stats (r595 or later)
         fixmate     fix mate information
         flagstat    simple stats
         calmd       recalculate MD/NM tags and '=' bases
         merge       merge sorted alignments
         rmdup       remove PCR duplicates
         reheader    replace BAM header
         cat         concatenate BAMs
         bedcov      read depth per BED region
         targetcut   cut fosmid regions (for fosmid pool only)
         phase       phase heterozygotes
         bamshuf     shuffle and group alignments by name
# test razip as well
./razip -h
Usage:   razip [options] [file] ...
Options: -c      write on standard output, keep original files unchanged
         -d      decompress
         -l      list compressed file contents
         -b INT  decompress at INT position in the uncompressed file
         -s INT  decompress INT bytes in the uncompressed file
         -h      give this help
# finally we can test the samtools 'manpage' with
man ./samtools.1
# the source (.c, .h) and library (.a) files are no longer needed and may be deleted now, but they can also be left here
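To illustrate how samtools fits into a pipeline, here is a minimal sketch (not part of the original training material) that turns a SAM file produced by a mapper into a sorted, indexed BAM file. It assumes the samtools 0.1.19 syntax built above, where `view -S` declares SAM input and `sort` takes an output prefix; samtools 1.x changed the `sort` syntax. The function name `sam_to_sorted_bam` and the file `sample.sam` are hypothetical.

```shell
# Hypothetical helper, written for the samtools 0.1.19 command syntax.
sam_to_sorted_bam() {
  sam=$1                          # e.g. sample.sam (hypothetical file)
  prefix=${sam%.sam}              # strip the .sam extension
  # -b: write BAM output, -S: input is SAM (needed in 0.1.19)
  samtools view -bS "$sam" > "$prefix.bam" || return 1
  # 0.1.19 'sort' takes an output PREFIX and writes <prefix>.sorted.bam
  samtools sort "$prefix.bam" "$prefix.sorted" || return 1
  # write the .bai index next to the sorted BAM
  samtools index "$prefix.sorted.bam"
}

# usage: sam_to_sorted_bam sample.sam
#   -> sample.bam, sample.sorted.bam and sample.sorted.bam.bai
```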

Download Bedtools, decompress, build

BedTools [7] is the mother of all tools in NGS analysis. You will quickly become highly dependent on it to perform all kinds of tasks. Have a look at the nice online Read the Docs manual [8].

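cd ${BIOWARE}/download/
wget --no-check-certificate <bedtools2 archive download link> -O <archive file>
# decompress <archive file> (with unzip or tar, depending on its format)
# the result is a folder named <bedtools2-master>
cd ${BIOWARE}/download/bedtools2-master
# build the tools
make
# several pages of compiler output scroll by if all goes well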
# check that things were made
ls bin/
annotateBed  bedToIgv	 complementBed	genomeCoverageBed  mapBed	      nucBed	  slopBed	  windowBed
bamToBed     bedpeToBam  coverageBed	getOverlap	   maskFastaFromBed   pairToBed   sortBed	  windowMaker
bamToFastq   bedtools	 expandCols	groupBy		   mergeBed	      pairToPair  subtractBed
bed12ToBed6  closestBed  fastaFromBed	intersectBed	   multiBamCov	      randomBed   tagBam
bedToBam     clusterBed  flankBed	linksBed	   multiIntersectBed  shuffleBed  unionBedGraphs
# quite a few tools there, nice!

Download cutadapt, decompress, install

cutadapt v1.3 is a complete command that finds adaptor sequences in short reads and treats them as they deserve (the user chooses how). The command-line application can be downloaded [9] and was described in a short EMBnet.journal publication [10].

Below are the installation steps and an example command that clips adaptors from 'infected' reads, leaving the remaining sequence untouched. Please read the command help for the rich list of options.

# download
cd ${BIOWARE}/download/
wget --no-check-certificate <cutadapt-1.3.tar.gz download link>
# decompress it
tar -xzvf cutadapt-1.3.tar.gz
# the result is a folder named <cutadapt-1.3>
cd cutadapt-1.3
# install the python package (this may need root access)
python2.7 setup.py install
# print the command help with all options
cutadapt --help
# run example syntax
cutadapt -e ERROR-RATE -a ADAPTER-SEQUENCE input.fastq > output.fastq

Download Bowtie1, decompress

Bowtie1 [11] is the original version of the mapper for NGS reads and is preferred when working with short reads (<50 bp), as in ChIP-Seq. It can be tuned to ignore specific base positions (known to be bad) and to tolerate more or fewer mismatches as well as alternative mappings. Bowtie1 comes already built, so we only need to decompress it. Bowtie also needs a reference index for each genome you plan to map against; building specific bowtie indexes is detailed in a separate document, Indexing genomes for bowtie.

cd ${BIOWARE}/download/
# alt for unix computers:
wget --no-check-certificate <bowtie-1.0.0 zip archive download link>
# unzip
unzip <bowtie-1.0.0 zip archive>
# the result is a folder named <bowtie-1.0.0>

Download Bowtie2, decompress

Bowtie2 [12] is the NGS read mapper used by most current NGS pipelines (see the list of tools using Bowtie(2) [13]). It needs reads (obviously) as well as the reference genome; pipelines built on top of it may also need the annotation file corresponding to your transcript model of choice. We only install Bowtie2 here and do not take care of the reference genome and annotations (done elsewhere). Bowtie2 comes already built, so we only need to decompress it. Like Bowtie1, it needs a reference index for each genome you plan to map against, but Bowtie2 indexes differ from Bowtie1 indexes. Building bowtie1 indexes is detailed in a separate document, Indexing genomes for bowtie, and the method is very similar with bowtie2-build.

cd ${BIOWARE}/download/
# alt for unix computers:
wget --no-check-certificate <bowtie2-2.2.0 zip archive download link>
# unzip
unzip <bowtie2-2.2.0 zip archive>
# the result is a folder named <bowtie2-2.2.0>

Download MACS, decompress, install

MACS 1.4 [14] is a Python application and needs to be added to your Python environment using the provided installer. A development version of MACS (aka 2.0) is also available but is not yet stable enough to be trusted; check it again in the future, as it looks promising.

cd ${BIOWARE}/download/
wget --no-check-certificate <MACS-1.4.2-1.tar.gz download link>
# decompress
tar -xzvf MACS-1.4.2-1.tar.gz
# the result is a folder named <MACS-1.4.2>
cd MACS-1.4.2
# the next step may need root access (prefix it with sudo)
python2.7 setup.py install

Alias all executable commands into the bin folder and add bin to your PATH

This step is very important: it ensures that all built programs are found in one place by your system and can be executed from the terminal without typing their full address (aka path).

# move to the 'bin' folder
cd ${BIOWARE}/bin
# make symbolic links to all executable files present in the neighbor 'download' folder
# now comes the complex command
find ${BIOWARE}/download -type f ! -name Makefile -perm -u+x -exec ln -s -f {} ${BIOWARE}/bin/ \;
# dissecting the find command into pieces gives:
# finds in '${BIOWARE}/download'
# looks for files '-type f'
# does not look for files named 'Makefile' as they are only needed to build
# looks only at executable files '-perm -u+x' (the older '-perm +111' spelling
# is deprecated in GNU find and may fail on recent versions)
# for each found file (-exec), create an alias (symbolic link) to it in ${BIOWARE}/bin: 'ln -s -f {} ${BIOWARE}/bin/'
# -f (force) replaces existing alias when running this again
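Before running the find command on your real ${BIOWARE} folder, you can try it in a throw-away directory. The minimal sketch below builds a miniature download folder holding a hypothetical program `mytool`, a Makefile and a README, then links the executables into a demo bin folder. It selects executables with `-perm -u+x` (user execute bit set), a POSIX form understood by both GNU find on Linux and BSD find on Mac OS X.

```shell
# Throw-away demo of the 'link every executable into bin' step.
# 'mytool' is a hypothetical program standing in for samtools, bowtie2, ...
demo=$(mktemp -d)
mkdir -p "$demo/download/mytool-1.0" "$demo/bin"

# an executable program, an (executable) Makefile and a plain text file
printf '#!/bin/sh\necho "hello from mytool"\n' > "$demo/download/mytool-1.0/mytool"
chmod +x "$demo/download/mytool-1.0/mytool"
printf 'all:\n\techo build\n' > "$demo/download/mytool-1.0/Makefile"
chmod +x "$demo/download/mytool-1.0/Makefile"   # made executable on purpose
echo "read me" > "$demo/download/mytool-1.0/README"

# same command as on this page, pointed at the demo folders
find "$demo/download" -type f ! -name Makefile -perm -u+x \
  -exec ln -s -f {} "$demo/bin/" \;

ls -l "$demo/bin"      # only the 'mytool' link is listed
PATH="$demo/bin:$PATH"
mytool                 # prints: hello from mytool
```

Only `mytool` ends up in the demo bin folder: the Makefile is excluded by name even though it is executable here, and the README is not executable.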

The PATH variable is the address book where your system looks for executable commands (programs). Now that we have put aliases to our newly built commands in the bin folder, we can add bin to the PATH in one go. We add bin before the rest of PATH so that our version of an executable is found first, even if an older version already exists elsewhere on the computer. Note the absence of '$' when defining a variable and its presence when using the value of that variable.
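The 'first found wins' rule can be checked with two versions of a hypothetical command `mytool`, one in a stand-in 'system' folder and one in a stand-in bin folder. This is only a sketch in throw-away folders; the PATH change lasts for the current shell session only.

```shell
# Two versions of a hypothetical 'mytool': an "old" system one and our "new" one.
demo=$(mktemp -d)
mkdir -p "$demo/system" "$demo/bioware_bin"
printf '#!/bin/sh\necho old\n' > "$demo/system/mytool"
printf '#!/bin/sh\necho new\n' > "$demo/bioware_bin/mytool"
chmod +x "$demo/system/mytool" "$demo/bioware_bin/mytool"

# the stand-in system folder is already on PATH ...
PATH="$demo/system:$PATH"
# ... and our bin folder is put in FRONT of it
PATH="$demo/bioware_bin:$PATH"

command -v mytool    # prints the path inside bioware_bin
mytool               # prints: new
```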

The next command adds the path for the current terminal session only; if you open a new terminal, this addition to your PATH is lost. One way to make it permanent is to paste the text below at the end of the file named .bashrc in your home folder. This file is executed each time a new terminal window opens.

# add bin to the PATH
export BIOWARE=$HOME/bioware
export PATH=${BIOWARE}/bin:$PATH
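Pasting by hand works; as an alternative, the sketch below appends the two export lines to a startup file only if they are not there yet, so running it twice does not duplicate them. The function name `add_bioware_to_rc` is hypothetical. On Mac OS X, Terminal opens login shells, which read ~/.bash_profile rather than ~/.bashrc, so you may want to pass that file instead.

```shell
# Hypothetical helper: make the PATH change permanent without duplicating lines.
add_bioware_to_rc() {
  rc=$1
  touch "$rc"
  # single quotes keep $HOME, ${BIOWARE} and $PATH unexpanded in the file,
  # so they are evaluated each time a new terminal starts
  for line in 'export BIOWARE=$HOME/bioware' 'export PATH=${BIOWARE}/bin:$PATH'; do
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  done
}

add_bioware_to_rc "$HOME/.bashrc"
```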

We can test the final step by asking the system where to find one of our programs:

which bowtie2
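`which` checks one program at a time. To check every command used in the training at once, a small loop such as the one below can help. The helper name `check_tools` is hypothetical, and the command names (`macs14` for MACS 1.4, `bowtie` for Bowtie1, and so on) are our assumption about what each package installs.

```shell
# Report which of the given commands can be found on the PATH.
# Returns the number of missing commands (0 = everything found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if location=$(command -v "$tool"); then
      echo "found:   $tool -> $location"
    else
      echo "MISSING: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# command names assumed for the tools installed on this page
check_tools samtools razip bedtools cutadapt bowtie bowtie2 macs14 \
  || echo "some tools are missing; check the corresponding steps above"
```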

