Introduction to Linux for bioinformatics

From BITS wiki

Linux is a very popular operating system in bioinformatics. In this training you will learn why that is and how it can help you with your bioinformatics analysis. After this training you will be able to:

  • install software on Linux
  • use the command line to run tools
  • use the command line to handle files
  • write small scripts to automate your analysis

Training material

Additional information

Exercises during the training

Exercises: Part 1

During the training, an Ubuntu Linux installation is available in a Google cloud environment. To access it, we use Google Chrome and the 'VNC Viewer for Google Chrome' application.
When you launch the application, you have to enter an IP address, which will be given during the training.

Installing Linux

Installing software

Install the tools from the presentation slides. Note that when you install something, everyone in the training has access to that tool!
Here are some exercises to try on your personal Linux installation:

Command line

File system


Exercises: Part 2

Text mining, scripting and 'for' loops

NGS intro



Bioinformatics oneliners

Sneak preview of the duplication rate of reads

gunzip -dc fastq.gz | head -n 1000000 | awk '{ if(NR%4==2) { print $1 } }' | sort | uniq -c | sort -g > sorted_duplicated
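
To see what this one-liner produces, here is a minimal sketch on a made-up three-read FASTQ file. The file names, the reads and the final summary step are assumptions added for illustration; in particular, the duplicate fraction is computed here as (total reads - distinct sequences) / total reads, which is only one possible definition.

```shell
# Build a tiny gzipped FASTQ (hypothetical data): r1 and r2 share a sequence.
dup_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTGA\n+\nIIII\n' \
  | gzip > "$dup_dir/toy.fastq.gz"

# Same idea as the one-liner: keep line 2 of every 4-line record (the
# sequence), then count how often each distinct sequence occurs.
gunzip -c "$dup_dir/toy.fastq.gz" \
  | awk 'NR % 4 == 2 { print $1 }' \
  | sort | uniq -c | sort -n > "$dup_dir/sorted_duplicated"

# Each line is now "<count> <sequence>", most frequent sequence last.
cat "$dup_dir/sorted_duplicated"

# Assumed summary: total reads, distinct sequences, duplicate fraction.
dup_summary=$(awk '{ total += $1; distinct++ }
  END { printf "%d %d %.2f", total, distinct, (total - distinct) / total }' \
  "$dup_dir/sorted_duplicated")
echo "$dup_summary"
```

On the toy data the summary reports 3 reads, 2 distinct sequences and a duplicate fraction of 0.33. The `head -n 1000000` step in the original limits the estimate to the first 250,000 reads, which keeps the sort fast on large files.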

Convert fastq to fasta

paste - - - - < in.fq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > out.fa
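
The conversion can be checked on a small made-up file. The sketch below assumes every record spans exactly four lines; the file names and reads are hypothetical. The second read has a quality string that starts with '@', which shows why the records are first joined with paste: after joining, only the header sits at the start of the line, so sed cannot touch the quality string.

```shell
# Two hypothetical reads; the quality line of read2 begins with '@'.
fa_dir=$(mktemp -d)
printf '@read1 sample\nACGT\n+\nIIII\n@read2\nGGCC\n+\n@III\n' > "$fa_dir/in.fq"

# paste joins each 4-line record into one tab-separated line,
# cut keeps header and sequence, sed turns the leading '@' into '>',
# and tr puts header and sequence back on separate lines.
paste - - - - < "$fa_dir/in.fq" \
  | cut -f 1,2 \
  | sed 's/^@/>/' \
  | tr "\t" "\n" > "$fa_dir/out.fa"

cat "$fa_dir/out.fa"
```

The resulting file contains the two headers `>read1 sample` and `>read2`, each followed by its sequence.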

Count all the variants called in all the vcf files

cat *.vcf | grep -v '^#' | wc -l
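
As a quick check of this count, the sketch below creates two made-up VCF files and counts their records, first in total and then per file with a 'for' loop. The file names and variants are assumptions for illustration only.

```shell
# Two hypothetical VCF files: a.vcf holds two variants, b.vcf holds one.
vcf_dir=$(mktemp -d)
vcf_header='##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf '%b' "${vcf_header}chr1\t100\t.\tA\tG\t50\tPASS\t.\nchr1\t200\t.\tC\tT\t50\tPASS\t.\n" > "$vcf_dir/a.vcf"
printf '%b' "${vcf_header}chr2\t50\t.\tG\tA\t50\tPASS\t.\n" > "$vcf_dir/b.vcf"

# Total over all files: drop header lines (starting with '#'), count the rest.
total=$(cat "$vcf_dir"/*.vcf | grep -v '^#' | wc -l)
echo "total variants: $total"

# Per-file counts: grep -c counts lines, -v inverts the match.
per_file=$(for f in "$vcf_dir"/*.vcf; do
  printf '%s\t%s\n' "$(basename "$f")" "$(grep -vc '^#' "$f")"
done)
echo "$per_file"
```

Note that the total counts records, so a variant called in several files is counted once per file.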

Count the variants that are present in all three vcf files

cat *.raw.vcf | grep -v '^#' | awk '{print $1 "\t" $2 "\t" $5}' | sort | uniq -c | grep ' 3 ' | wc -l
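
The following sketch runs the same pipeline on three made-up files to show how it works: each variant is reduced to chromosome, position and alternative allele, and a variant counts as shared when that key occurs three times. It assumes exactly three *.raw.vcf files in the directory and no duplicated records within a file; the file names and variants are hypothetical.

```shell
# Three hypothetical VCF files. chr1:100 A>G is in all three,
# chr1:200 C>T is in two of them, chr2:50 G>A is in one.
shared_dir=$(mktemp -d)
hdr='##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
v1='chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
v2='chr1\t200\t.\tC\tT\t50\tPASS\t.\n'
v3='chr2\t50\t.\tG\tA\t50\tPASS\t.\n'
printf '%b' "$hdr$v1$v2$v3" > "$shared_dir/x.raw.vcf"
printf '%b' "$hdr$v1$v2"    > "$shared_dir/y.raw.vcf"
printf '%b' "$hdr$v1"       > "$shared_dir/z.raw.vcf"

# Key each variant by CHROM, POS and ALT, count identical keys,
# and keep only the keys that were seen exactly three times.
shared=$(cat "$shared_dir"/*.raw.vcf \
  | grep -v '^#' \
  | awk '{print $1 "\t" $2 "\t" $5}' \
  | sort | uniq -c \
  | grep ' 3 ' \
  | wc -l)
echo "variants shared by all three files: $shared"
```

On the toy data only one variant is shared by all three files. A stricter filter would be `awk '$1 == 3'` in place of `grep ' 3 '`, because it compares the count field numerically instead of matching text.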

Software installation exercises