Install ChIP-Seq training command line software
Introduction
The tools described here are used in the ChIP-Seq training (2014). We show here how to install them on your own personal computer running a Unix or Mac OS X operating system.
Windows PC users need a third-party addition to emulate Unix (VirtualBox [1] is a good choice, provided you have enough free resources on your machine). A detailed protocol for installing VirtualBox is provided on the WIKI in Create a virtual machine running a Linux distribution using VirtualBox.
BITS laptops were installed with Mint 13, aka Maya [2] (more recent versions of Mint were tried on the BITS laptops but only Maya did the job so far; you can try newer versions at your own risk). Other Unix operating systems will also do the job: Ubuntu [3] is the most popular among human beings (and is related to Mint), while CentOS [4] is favored by computing specialists because it is widely used in corporate computing infrastructures. You may pick any of these; they are all free.
Mint (and other systems) come with a built-in 'software manager' (apt-get on Mint and Ubuntu, yum on CentOS) that allows easy installation of the many system resources required for biocomputing in general, but not necessarily of the 'bioware' specifically required during the BITS trainings. You should always use the 'software manager' to add the dependencies required by 'bioware', as it will take care of storing them in the right place on your system.
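As an illustration of how the 'software manager' is typically called, the sketch below detects which manager is present and prints (without running it) a command that would install common build prerequisites. The function name is hypothetical and the package names are assumptions for Debian-like, Red Hat-like and MacPorts systems; they are not taken from the training material and may differ on your distribution.

```shell
# Hypothetical helper: print, but do not run, an install command
# for build prerequisites (compiler, zlib, ncurses, unzip/wget).
# Package names are assumptions and may differ on your system.
suggest_install_cmd() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install build-essential zlib1g-dev libncurses5-dev unzip wget"
  elif command -v yum >/dev/null 2>&1; then
    echo "sudo yum install gcc gcc-c++ make zlib-devel ncurses-devel unzip wget"
  elif command -v port >/dev/null 2>&1; then
    echo "sudo port install zlib ncurses wget"
  else
    echo "no known package manager found (apt-get, yum, port)"
  fi
}

suggest_install_cmd
```

Review the printed command before pasting it in your terminal; installing system packages requires administrator rights.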
Mac OS X users benefit from the built-in Darwin Unix environment that is part of OS X. They will however first need to install the Apple Developer tools and a package installer like MacPorts before they can build 'bioware' and install it on their computer. This operation is described in the companion WIKI page Turn you Apple laptop into a Unix data crunching beast.
Now that your Unix system is up and running you can add 'bioware' applications used in the training. To do so, please refer to the remaining part of this document.
Downloading and installing CLI tools
[edited, 2014-02-26]
The versions of the tools described below were current at the time of writing of this page; please check whether more recent stable versions exist on the respective developer pages.
The training requires a number of tools that will now be installed on your computer. One exception is the web tool RSAT, which is used from the web browser.
WebTool(s)
- RSAT peak motifs (http://rsat.ulb.ac.be/peak-motifs_form.cgi [web-tool URL])
Standalone tool with graphical interface
- FastQC v0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc [Win/Unix Mac and source]). FastQC is also available for use at the command line, as illustrated in Perform basic read QC at command line prior to mapping. Please install both versions and decide later which one you prefer.
Compiled packages or Source code to build tools
- samtools v0.1.19 (http://sourceforge.net/projects/samtools/files/latest/download) [source]
- Bedtools2 v2.19 (https://github.com/arq5x/bedtools2 [source])
- Bowtie v1.0.0 original version preferred for short reads (http://bowtie-bio.sourceforge.net/index.shtml, http://sourceforge.net/projects/bowtie-bio/files/bowtie/1.0.0/ [source])
- Bowtie2 v2.2.0 NEW from February 10 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml | https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.0 [source])
- MACS v1.4.2 (http://liulab.dfci.harvard.edu/MACS/ | http://liulab.dfci.harvard.edu/MACS/Download.html [source]) - MACS 1.4 needs Python 2.6 or 2.7 installed on your machine (the install commands below call python2.7). Installing Python can be tricky and is not detailed here. If Python is not installed, or is not the right version, the MACS install will fail and complain.
- cutadapt v1.3 (https://code.google.com/p/cutadapt/wiki/documentation [doc and source]) - clip or filter out reads containing adaptor sequence provided by the preliminary FastQC run.
From now on, we work in the terminal and type commands to interact with the operating system.
We first create a folder where we will install all packages. This folder is kept distinct from the system default installation folder so that we do not interfere with our system and do not take risks. We create it in our own HOME folder, where we have full privileges and can install and run files without restrictions.
    # define a default folder where to store and build bioware
    export BIOWARE=$HOME/bioware

    # create a folder where to decompress all source packages
    # (named 'download', as used by all commands further down this page)
    mkdir -p ${BIOWARE}/download

    # also create a bin folder where programs will be aliased
    mkdir -p ${BIOWARE}/bin
Download Samtools, decompress, build
Samtools [5] is the reference software to process alignment data (SAM | BAM). Picard tools [6] offers comparable functions as a Java implementation. Both tools perform similarly, but samtools is sometimes easier to use in command-line pipelines.
    cd ${BIOWARE}/download/
    wget --no-check-certificate \
      http://downloads.sourceforge.net/project/samtools/samtools/0.1.19/samtools-0.1.19.tar.bz2
    tar -xjvf samtools-0.1.19.tar.bz2
    # the result is a folder named <samtools-0.1.19>
    cd ${BIOWARE}/download/samtools-0.1.19
    make
    # if this fails, your machine is lacking some vital parts,
    # reading the error messages should help define what is missing
    # and install it using your package installer ( yum, apt-get, macport, ...)

    # one more command to also make 'razip'
    make razip

    # test the make result by running samtools from the current folder
    ./samtools

    Program: samtools (Tools for alignments in the SAM format)
    Version: 0.1.19-44428cd

    Usage:   samtools <command> [options]

    Command: view        SAM<->BAM conversion
             sort        sort alignment file
             mpileup     multi-way pileup
             depth       compute the depth
             faidx       index/extract FASTA
             tview       text alignment viewer
             index       index alignment
             idxstats    BAM index stats (r595 or later)
             fixmate     fix mate information
             flagstat    simple stats
             calmd       recalculate MD/NM tags and '=' bases
             merge       merge sorted alignments
             rmdup       remove PCR duplicates
             reheader    replace BAM header
             cat         concatenate BAMs
             bedcov      read depth per BED region
             targetcut   cut fosmid regions (for fosmid pool only)
             phase       phase heterozygotes
             bamshuf     shuffle and group alignments by name

    ./razip

    Usage:   razip [options] [file] ...

    Options: -c      write on standard output, keep original files unchanged
             -d      decompress
             -l      list compressed file contents
             -b INT  decompress at INT position in the uncompressed file
             -s INT  decompress INT bytes in the uncompressed file
             -h      give this help

    # finally we can test the samtools 'manpage' with
    man ./samtools.1

    # all source files ending with .a .h .c may be deleted now but they can also be left here
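To give an idea of how samtools fits in a command-line pipeline, here is a minimal sketch that converts a SAM file to a sorted and indexed BAM file. It uses the argument order of samtools 0.1.x, where 'sort' takes an output prefix (later samtools releases changed this). The function name and file names are hypothetical and not part of the training material.

```shell
# Hypothetical helper: SAM -> sorted, indexed BAM (samtools 0.1.x syntax).
# Usage: sam_to_sorted_bam sample.sam
sam_to_sorted_bam() {
  sam="$1"
  prefix="${sam%.sam}"          # strip the .sam extension
  if [ ! -f "$sam" ]; then
    echo "input not found: $sam" >&2
    return 1
  fi
  # view -bS : read SAM (-S) and write BAM (-b)
  # sort     : writes ${prefix}.sorted.bam (the .bam suffix is added by samtools 0.1.x)
  # index    : creates ${prefix}.sorted.bam.bai
  samtools view -bS "$sam" > "${prefix}.bam" &&
  samtools sort "${prefix}.bam" "${prefix}.sorted" &&
  samtools index "${prefix}.sorted.bam"
}
```

The function only runs samtools when the input file exists, so it can be pasted safely before any data is available.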
Download Bedtools, decompress, build
BedTools [7] is the mother of all tools in NGS analysis. You will quickly become highly dependent on it to perform all kinds of tasks. Have a look at the nice online readthedocs manual [8].
    cd ${BIOWARE}/download/
    wget --no-check-certificate \
      https://github.com/arq5x/bedtools2/archive/master.zip -O bedtools.zip
    unzip bedtools.zip
    # the result is a folder named <bedtools2-master>
    cd ${BIOWARE}/download/bedtools2-master
    make
    # wait for several pages if all goes right

    # check that things were made
    ls bin/

    annotateBed  bedToIgv    complementBed  genomeCoverageBed  mapBed             nucBed      slopBed         windowBed
    bamToBed     bedpeToBam  coverageBed    getOverlap         maskFastaFromBed   pairToBed   sortBed         windowMaker
    bamToFastq   bedtools    expandCols     groupBy            mergeBed           pairToPair  subtractBed
    bed12ToBed6  closestBed  fastaFromBed   intersectBed       multiBamCov        randomBed   tagBam
    bedToBam     clusterBed  flankBed       linksBed           multiIntersectBed  shuffleBed  unionBedGraphs

    # quite many tools there? nice!
Download cutadapt, decompress, install
cutadapt v1.3 is a complete command able to find adaptor sequences in short reads and treat them as they deserve (choice of the user). The command line application can be downloaded [9] and was described in a short EMBnet.journal publication [10].
Installation and an example command to clip adaptors from 'infected' reads, leaving the remaining sequence untouched. Please read the command help for the rich list of options.
    # download
    cd ${BIOWARE}/download/
    wget --no-check-certificate https://cutadapt.googlecode.com/files/cutadapt-1.3.tar.gz

    # decompress it
    tar -xzvf cutadapt-1.3.tar.gz
    # the result is a folder named <cutadapt-1.3>

    # move into that folder, where setup.py is located
    cd cutadapt-1.3

    # install the python package
    python2.7 setup.py install

    # empty run to get command details
    cutadapt

    # run example syntax
    cutadapt -e ERROR-RATE -a ADAPTER-SEQUENCE input.fastq > output.fastq
Download Bowtie1, decompress
Bowtie1 [11] is the original version of the mapper for NGS reads and is preferred when working with short reads (<50bp) as in ChIP-Seq. It can be tuned to omit specific base positions (known to be bad) and to tolerate more or fewer mismatches as well as alternative mappings. Bowtie1 comes already built; we only need to decompress it. Bowtie also needs a reference index for each genome you plan to map against. Building specific bowtie indexes is detailed in a separate document, Indexing genomes for bowtie.
    cd ${BIOWARE}/download/

    # alt for unix computers:
    # http://sourceforge.net/projects/bowtie-bio/files/bowtie/1.0.0/bowtie-1.0.0-linux-x86_64.zip/download

    # -O names the saved file; without it wget would save the archive as 'download'
    wget --no-check-certificate \
      -O bowtie-1.0.0-macos-x86_64.zip \
      http://sourceforge.net/projects/bowtie-bio/files/bowtie/1.0.0/bowtie-1.0.0-macos-x86_64.zip/download

    # unzip bowtie-1.0.0-linux-x86_64.zip
    unzip bowtie-1.0.0-macos-x86_64.zip
    # the result is a folder named <bowtie-1.0.0>
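The tuning mentioned above (ignoring bad base positions, limiting mismatches and alternative mappings) can be sketched with a few standard bowtie options. The function name, the index name and the file names are hypothetical, and the numeric values are arbitrary illustrations rather than the settings used in the training.

```shell
# Hypothetical helper: map short single-end reads with bowtie1.
# Usage: map_short_reads hg19_index reads.fastq mapped.sam
map_short_reads() {
  index="$1"; reads="$2"; out="$3"
  if [ ! -f "$reads" ]; then
    echo "reads not found: $reads" >&2
    return 1
  fi
  # -q        : input reads are in FASTQ format
  # -S        : write alignments in SAM format
  # -n 2      : allow at most 2 mismatches in the seed
  # -m 1      : discard reads that have more than 1 reportable alignment
  # --trim3 5 : ignore the last 5 bases at the 3' end of each read
  bowtie -q -S -n 2 -m 1 --trim3 5 "$index" "$reads" "$out"
}
```

Run 'bowtie --help' for the full list of options before settling on values for your own data.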
Download Bowtie2, decompress
Bowtie2 [12] is the mapper for NGS reads used by most current NGS pipelines (list of tools using Bowtie(2) [13]). It needs reads (obviously) as well as the reference genome and the annotation file corresponding to your transcript model of choice. We only install Bowtie2 here and do not take care of the reference genome and annotations (done elsewhere). Bowtie2 comes already built; we only need to decompress it. Bowtie2 also needs a reference index for each genome you plan to map against, and Bowtie2 indexes differ from Bowtie1 indexes. Building bowtie1 indexes is detailed in a separate document, Indexing genomes for bowtie; the method is very similar with bowtie2-build.
    cd ${BIOWARE}/download/

    # alt for unix computers:
    # http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.0/bowtie2-2.2.0-linux-x86_64.zip

    wget --no-check-certificate \
      http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.0/bowtie2-2.2.0-macos-x86_64.zip

    # unzip bowtie2-2.2.0-linux-x86_64.zip
    unzip bowtie2-2.2.0-macos-x86_64.zip
    # the result is a folder named <bowtie2-2.2.0>
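As a rough sketch of how a Bowtie2 index is built with bowtie2-build and then used for mapping, the function below chains both steps for single-end reads. The function name and all file names are hypothetical; building an index for a full genome takes a long time and is normally done once, separately from mapping.

```shell
# Hypothetical helper: build a Bowtie2 index, then map single-end reads.
# Usage: build_and_map genome.fa genome_index reads.fastq mapped.sam
build_and_map() {
  genome_fa="$1"; prefix="$2"; reads="$3"; out="$4"
  for f in "$genome_fa" "$reads"; do
    if [ ! -f "$f" ]; then
      echo "file not found: $f" >&2
      return 1
    fi
  done
  # bowtie2-build writes the index files as ${prefix}.*.bt2
  bowtie2-build "$genome_fa" "$prefix" &&
  # -x : index prefix, -U : unpaired reads, -S : SAM output file
  bowtie2 -x "$prefix" -U "$reads" -S "$out"
}
```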
Download MACS, decompress, install
MACS 1.4 [14] is a Python application and needs to be added to your Python environment using the provided installer. A developer version of MACS (aka 2.0) is also available but is not yet stable enough to be trusted; check it in the future as it looks promising.
    cd ${BIOWARE}/download/
    wget --no-check-certificate https://github.com/downloads/taoliu/MACS/MACS-1.4.2-1.tar.gz

    # decompress
    tar -xzvf MACS-1.4.2-1.tar.gz
    # the result is a folder named <MACS-1.4.2>
    cd MACS-1.4.2

    # the next may need root access
    python2.7 setup.py install
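The cutadapt and MACS install commands on this page both call python2.7. A quick way to confirm that this interpreter is available before running them could look like the sketch below; the function name is hypothetical.

```shell
# Hypothetical helper: report whether the python2.7 interpreter
# called by the install commands is present on the PATH.
check_python27() {
  if command -v python2.7 >/dev/null 2>&1; then
    echo "found: $(command -v python2.7)"
  else
    echo "python2.7 not found"
  fi
}

check_python27
```

If the interpreter is not found, install it with your package installer before retrying the 'setup.py install' steps.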
Alias all executable commands to the bin folder and add bin to your PATH
This step is very important: it ensures that all built programs are found in one place by your system and can be executed from the terminal without having to provide their full address (aka path).
    # move to the 'bin' folder
    cd ${BIOWARE}/bin

    # make symbolic links to all executable files present in the neighbor 'download' folder
    # now comes the complex command
    find ${BIOWARE}/download -type f ! -name Makefile -perm +111 -exec ln -s -f {} ${BIOWARE}/bin/ \;

    # dissecting the find command into pieces gives:
    # finds in '${BIOWARE}/download'
    # looks for files '-type f'
    # does not look for files named 'Makefile' as they are only needed to build
    # looks only at executable files '-perm +111'
    #   (recent versions of GNU find, as found on Linux, reject '+111'; use '-perm /111' there)
    # for each found file (-exec), create an alias to it in bin 'ln -s -f {} ${BIOWARE}/bin/'
    # -f (force) replaces existing alias when running this again
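To see what the find command does without touching your real bioware folder, the sketch below replays it in a throw-away folder containing one executable file and one ordinary file. All names are hypothetical. It uses '-perm -100' (owner execute bit set), a slightly stricter variant of the test above that is accepted by both GNU and BSD find.

```shell
# build a throw-away folder mimicking the bioware layout
demo=$(mktemp -d)
mkdir -p "$demo/download/tool-1.0" "$demo/bin"

# one executable script and one ordinary text file
printf '#!/bin/sh\necho hello\n' > "$demo/download/tool-1.0/mytool"
chmod 755 "$demo/download/tool-1.0/mytool"
echo "not a program" > "$demo/download/tool-1.0/README"
chmod 644 "$demo/download/tool-1.0/README"

# link every executable regular file into bin (Makefiles excluded)
find "$demo/download" -type f ! -name Makefile -perm -100 \
  -exec ln -s -f {} "$demo/bin/" \;

# only 'mytool' should be listed, as a symbolic link
ls -l "$demo/bin"
```

Delete the folder stored in $demo once you have looked at the result.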
The PATH variable is the address book where your system looks for executable commands (programs). Now that we have put aliases to our newly built commands into the bin folder, we can add bin to the PATH. We add bin before the rest of PATH so that our version of an executable is found first, even if an older version already exists elsewhere on that computer. Note the absence of '$' when defining a variable and its presence when using the value of that variable.
The next command adds bin to the PATH for the current terminal session only; if you open a new terminal you will lose this addition. One way to make it permanent is to paste the text below at the end of a file named .bashrc, already present in your home folder. This file gets executed by the system each time a new terminal window opens.
    # add bin to the PATH
    export BIOWARE=$HOME/bioware
    export PATH=${BIOWARE}/bin:$PATH
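Because .bashrc may be read more than once (for example by nested shells), the plain export can end up adding the same folder several times. A possible variant that adds bin only when it is missing is sketched below; the function name path_prepend is hypothetical.

```shell
# Hypothetical helper: put a folder in front of PATH unless it is already listed.
path_prepend() {
  case ":${PATH}:" in
    *":$1:"*) ;;                  # already present, leave PATH unchanged
    *) PATH="$1:${PATH}" ;;       # not present, add it in front
  esac
}

BIOWARE="${BIOWARE:-$HOME/bioware}"
path_prepend "${BIOWARE}/bin"
export BIOWARE PATH
```

Calling path_prepend again with the same folder leaves PATH unchanged.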
We can test the final step by asking where to find one of our programs
    which bowtie2

    /Users/bits/bioware/bin/bowtie2
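The same test can be repeated for all tools at once. The sketch below loops over the command names and reports where each one is found. The list of names is an assumption (in particular, MACS 1.4 is assumed to install its executable as 'macs14'); adapt it to what you actually installed.

```shell
# Hypothetical helper: report the location of each installed tool.
check_tools() {
  for tool in samtools bedtools cutadapt bowtie bowtie2 macs14; do
    if location=$(command -v "$tool" 2>/dev/null); then
      echo "$tool: $location"
    else
      echo "$tool: NOT FOUND"
    fi
  done
}

check_tools
```

Any tool reported as NOT FOUND is either not installed or not linked in a folder listed in your PATH.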
References:
- [1] https://www.virtualbox.org
- [2] http://www.linuxmint.com/oldreleases.php
- [3] http://www.ubuntu.com/download/desktop
- [4] http://www.centos.org/download/
- [5] http://samtools.sourceforge.net
- [6] http://picard.sourceforge.net/index.shtml
- [7] https://github.com/arq5x/bedtools2
- [8] http://bedtools.readthedocs.org/en/latest/
- [9] https://code.google.com/p/cutadapt/downloads/detail?name=cutadapt-1.3.tar.gz
- [10] http://journal.embnet.org/index.php/embnetjournal/article/view/200
- [11] http://bowtie-bio.sourceforge.net/index.shtml
- [12] http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
- [13] http://bowtie-bio.sourceforge.net/bowtie2/other_tools.shtml
- [14] http://liulab.dfci.harvard.edu/MACS/Download.html