Turn your Apple laptop into a Unix data-crunching beast

From BITS wiki




Install Apple developer tools, XQuartz and a package management system

2014-02-21, initial version 1.0

This page is dedicated to installing the required developer files on an Apple laptop running OS X. This will then allow you to build biocomputing applications from source or to install open-source packages developed for different Unix flavours, including Apple's 'Darwin' flavour.

Warning: Please follow each step carefully, as a mistake could lead to data loss on your machine or, even worse, break your system. Use Time Machine to make a full backup before attempting this (but you already do this on a daily basis, right?).

apple_vs_cluster.png

Introduction

Welcome to the world of the command line. After a period of pain and fear, it will open new dimensions to your work if you try hard enough.

First of all, you need ADMIN access to your machine (the right to run commands as root with sudo). Check with your IT department if this is not the case, as it is mandatory.

Second, follow the introductory Linux BITS training, as you WILL need it.

Finally, you need at least 5GB of free space on your Mac to work comfortably with additional installs (and at least 10GB for your future data; 30GB would be better).
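
The free-space figures above can be checked from the terminal. The following is a minimal sketch, not part of the original instructions: the function names are made up for illustration, and the 5/15/30 GB thresholds are one reading of the numbers quoted above (5 GB for installs, 10 GB more for data, 30 GB to be comfortable).

```shell
#!/bin/sh
# Free space, in whole GB (rounded down), on the volume holding the given path.
free_gb() {
    # -P asks for the portable one-line-per-volume layout, -k for 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "${1:-/}" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Classify a free-space figure (in GB) against the assumed thresholds.
space_verdict() {
    if   [ "$1" -ge 30 ]; then echo "comfortable"
    elif [ "$1" -ge 15 ]; then echo "workable"
    elif [ "$1" -ge 5 ];  then echo "installs only"
    else                       echo "insufficient"
    fi
}

gb=$(free_gb /)
echo "Free space on /: ${gb} GB ($(space_verdict "$gb"))"
```

If your data will live on another volume, pass its mount point to free_gb instead of /.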

Minimal requirements

You should not attempt this if you own an old laptop with less than 2GB RAM. For NGS data analysis prefer 4GB or more (if you can), especially if you plan to map reads to a mammalian or large plant reference genome.

  • You will not perform full genome mapping for human or mouse reads on a laptop; for that you need a much stronger machine with multiple CPUs and lots of RAM (8 cores and 48+ GB RAM is a reasonable minimum configuration for serious work with human or mouse reads; it will allow mapping one Illumina genome sequencing sample in ~1 day, doing nothing else in the meantime!).
  • You can, however, map your reads on a strong computer and copy the mapping results back to your laptop for the remainder of the NGS analysis.
  • For exome sequencing, reasonably sized RNA sequencing, and small insect or microorganism genomes, OS X >= Snow Leopard with 2GB will do for the full workflow (and will of course be slow for heavy tasks).
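
Copying mapping results back from a stronger machine, as suggested in the second point above, is commonly done with rsync over SSH. The sketch below only builds and prints the command so that it can be inspected first. The host name, both directories and the function name are hypothetical placeholders, and it assumes you have SSH access to that machine.

```shell
#!/bin/sh
# All three values are hypothetical placeholders; replace them with your own.
REMOTE="user@bigserver.example.org"
REMOTE_DIR="/data/project/mapping"
LOCAL_DIR="$HOME/ngs/mapping"

# Build the rsync command as text so it can be checked before it is run.
# -a keeps timestamps and permissions, -v lists the files,
# --partial and --progress help when large files get interrupted.
fetch_cmd() {
    printf 'rsync -av --partial --progress %s:%s/ %s/\n' "$1" "$2" "$3"
}

fetch_cmd "$REMOTE" "$REMOTE_DIR" "$LOCAL_DIR"
```

Once the printed line looks right, paste it into the terminal to start the transfer. Running it again later only copies files that are new or have changed.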

We assume here that you own a recent MacBook Pro with >=4GB RAM, at least two cores, and a lot of free hard disk space (30GB or more).
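
Whether a given laptop matches this configuration can be read from the terminal. This is a rough sketch: the function names and the wording of the verdicts are invented for illustration, and the thresholds simply restate the 2 GB and 4 GB / two-core figures from this section. On OS X, sysctl reports the installed memory (hw.memsize, in bytes) and the number of cores (hw.ncpu).

```shell
#!/bin/sh
# Convert a byte count to whole gigabytes (rounded down).
bytes_to_gb() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Compare RAM (GB) and core count with the figures quoted in this section.
check_hw() {
    if [ "$1" -ge 4 ] && [ "$2" -ge 2 ]; then
        echo "OK: matches the configuration assumed here"
    elif [ "$1" -ge 2 ]; then
        echo "LIMITED: fine for small genomes, slow for heavy tasks"
    else
        echo "TOO SMALL: less than 2 GB RAM"
    fi
}

if [ "$(uname -s)" = "Darwin" ]; then
    ram_gb=$(bytes_to_gb "$(sysctl -n hw.memsize)")
    cores=$(sysctl -n hw.ncpu)
    echo "RAM: ${ram_gb} GB, cores: ${cores}"
    check_hw "$ram_gb" "$cores"
else
    echo "Not running on OS X: hardware check skipped"
fi
```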

Adding required developer tools to your basic Mac OS X installation


The instructions below are kept to a bare minimum. Please read the online manuals and detailed instructions for each step.

Apple developer tools

appstore_install_xcode.png

  • Create a new account on the Apple developer page [1] and keep your credentials at hand for later downloads (if you already have an Apple ID, you may use it together with the associated password).
  • Log in with this account and download Xcode 5.0.2 (3+ GB!). On Mavericks, you may have to do this through the App Store (in the Apple menu).
  • Install Xcode and reboot your machine. Check for updates after rebooting and apply any. You will never need to know more about Xcode; it has to be there, that's all.

After installing Xcode, you should also install the command line tools. This can be done from the terminal with the following command:

xcode-select --install

The command will trigger the download of a new package that you should install.

command-line-tools.png
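
Once the package is installed, a quick check confirms that the build tools are reachable from the terminal. This is a minimal sketch: have_tool is a helper name invented here, and the list of tools is only a sample of what the command line tools provide.

```shell
#!/bin/sh
# True if the named program can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each build tool as found (with its location) or missing.
for tool in xcode-select clang make git; do
    if have_tool "$tool"; then
        echo "found:   $tool ($(command -v "$tool"))"
    else
        echo "missing: $tool"
    fi
done

# Print the active developer directory when xcode-select is available.
if have_tool xcode-select; then
    xcode-select -p
fi
```

If a tool is reported missing, re-running the install command above is a reasonable first step.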

X11 (older OS X) or XQuartz (Mavericks)


Most command-line programs save their results to file(s), but some produce graphical output, for which you need an X11 (X Window System) server running alongside the terminal. XQuartz [2] replaces the defunct X11 on Mavericks and is required to generate graphical output from the command line. It also allows building interfaces to command-line programs and helps interactivity between the user, the program, and the data.

xquartz_package.png

  • Click on the dmg link and install.
  • Add the Terminal app to your Dock from /Applications/Utilities/Terminal.app.
  • Add the freshly installed XQuartz.app (X11.app) to your Dock.

terminal_x11.png

You will use these two quite often.
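
To see whether the terminal can reach XQuartz, look at the DISPLAY environment variable, which X11 programs use to find the server. The sketch below uses a helper name invented for illustration. It assumes that XQuartz sets DISPLAY for you, which typically requires logging out and back in once after the installation.

```shell
#!/bin/sh
# Describe the state of the X11 connection from a DISPLAY value.
x11_status() {
    if [ -z "$1" ]; then
        echo "DISPLAY is not set"
    else
        echo "DISPLAY is set to $1"
    fi
}

x11_status "${DISPLAY:-}"

# With DISPLAY set and XQuartz installed, a small X11 client such as
# xclock should open a window (uncomment to try):
# xclock &
```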

MacPorts - or a similar package installer

You may install everything from source, but for that you will need Unix sysadmin knowledge and will have to take risks. Alternatively, you can rely on a package installer to install things for you. This is what we suggest, using MacPorts.


MacPorts will help you maintain most of your installations, ranging from bio-software to thousands of other command-line utilities (The MacPorts Project currently distributes 18185 ports, organized across 87 different categories [3]). Alternatives to MacPorts include Fink (older) and Homebrew [4], which is reported to be lighter than MacPorts but has a more restricted choice of available packages. Google 'MacPorts vs. Homebrew' for a side-by-side comparison of the two (read this post [5] for more info on installing Homebrew).

Please take the time to read through the manual on the MacPorts site [6] once. It may seem cryptic at first, but it contains a lot of important information that you will later be glad to have read and will know where to find.

download_macport.png

Important: follow the instructions to install the 'command line tools' from the link at the top.

Then follow the next link to download and install the Port system (Port is like Windows 'software updates' but dedicated to command-line things).

install_macport.png

The link downloads the package; install it with admin access (you will be asked for your password).

macport_installer.png

Now that port is installed, you should always first try to install things using it, and move to 'building from source' only when this does not work. Port is smarter at putting files in the right place on your system, while 'building from source' fully depends on what you provide and could lead to issues (or simply fail).

Open Terminal and type the following two commands. sudo means the command is run as root with all privileges, including that of fully erasing the content of the hard disk ;-) should you enter such a command.

bits@bits:~$ sudo port selfupdate
Password:
--->  Updating MacPorts base sources using rsync
MacPorts base version 2.2.1 installed,
MacPorts base version 2.2.1 downloaded.
--->  Updating the ports tree
--->  MacPorts base is already the latest version
 
The ports tree has been updated. To upgrade your installed ports, you should run
  port upgrade outdated
 
bits@bits:~$ sudo port upgrade outdated

The result will either be empty (Nothing to upgrade.) or show a number of installed packages being updated. You should run this routine from time to time to keep up with the latest version of all installed bio software.

Please read the Documentation for other Port commands (list, install, uninstall, deactivate, and many more options).
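
As a rough illustration of everyday use, the sketch below looks up a package with port and prints the install and uninstall commands rather than running them. The package name samtools is only an example chosen here; confirm with port search that a port of that name exists before installing. The guard simply skips everything when port is not found.

```shell
#!/bin/sh
# Example package name (an assumption); confirm it with 'port search'.
PKG="samtools"

if command -v port >/dev/null 2>&1; then
    port search "$PKG"      # find ports whose name or description match
    port info "$PKG"        # version, description and dependencies
    port installed          # list everything MacPorts has installed so far
    echo "To install:   sudo port install $PKG"
    echo "To uninstall: sudo port uninstall $PKG"
else
    echo "port not found: check that /opt/local/bin is in your PATH"
fi
```

Searching and listing do not need sudo; only commands that change the system (install, uninstall, upgrade, selfupdate) do.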

What's next?

Installing specific Biocomputing packages

In order to process NGS reads, you will need a number of applications to map the reads, compute coverage, perform QC, ... What we just installed does not yet include any bioware; this now needs to be added to your machine depending on your needs.

Biological data analysis is covered during different BITS trainings as well as hands-on practical sessions. The command-line programs used to perform this work can partly or fully be installed using the now available Apple developer tools and MacPorts resources. Other tools will require building from source code obtained from third-party websites.

Building tools is an art that will not be detailed here. It can range from the classic four commands shown below to half a day of searching around for missing libraries.

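# typical case of building from a C/C++ source
cd path/to/src
./configure        # detect compilers and libraries, generate the Makefile
make               # compile the sources
make test          # run the test suite, if the package provides one (some use 'make check')
sudo make install  # copy the results into system directories, as root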

Always read as much as you can before starting such a process, as it can destabilize your system. Trust big developer sites and look for instructions specific to your operating system. A couple of excellent places to find information are SeqAnswers [7] for NGS advice, BioStar [8] for biological questions, and Stack Overflow [9] for hard and soft coding issues. Other great pages will help you with R/Bioconductor or with issues specific to a given application (often through the corresponding mailing list).
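
One common way to limit the risk to your system is to install source builds into a directory you own instead of the system directories. The sketch below is an illustration under assumptions: the location $HOME/local and the helper add_to_path are hypothetical, and it relies on the package's configure script accepting --prefix, as most scripts generated by autoconf do.

```shell
#!/bin/sh
# Hypothetical private install location; any directory you own will do.
PREFIX="$HOME/local"

# In the source directory, the build would then become:
#   ./configure --prefix="$PREFIX"
#   make
#   make install        # no sudo: the target is inside your home directory
#
# Prepend a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present, do nothing
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

add_to_path "$PREFIX/bin"
echo "$PATH"
```

To keep the change across terminal sessions, the PATH line would go into your shell start-up file. Removing such a tool later is a matter of deleting files under the private directory, which never touches the rest of the system.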

Follow-up tutorials for installing specific applications

For ChIPSeq, please refer to the accompanying training page 'Install ChIP-Seq training command line software', which describes some packages used during a recent BITS session (February 24, 2014).

Statistics and data processing

After generating data, the next step is to analyze it and draw conclusions. This requires statistics, and the most efficient (and cheapest) way to proceed is to learn and use R and Bioconductor.

Install R [10] and RStudio [11] on your Mac, then add any of the many available Bioconductor [12] packages from within this amazing graphical environment.


References:
  1. https://developer.apple.com/register/index.action
  2. http://xquartz.macosforge.org/trac/wiki/X112.7.5
  3. http://www.macports.org/ports.php
  4. http://brew.sh
  5. http://coolestguidesontheplanet.com/setting-up-os-x-mavericks-and-homebrew/
  6. http://www.macports.org
  7. http://seqanswers.com
  8. http://www.biostars.org
  9. http://stackoverflow.com
  10. http://cran.r-project.org
  11. http://www.rstudio.com/ide/download/
  12. http://www.bioconductor.org

