Introduction to Linux for bioinformatics



Access to Linux on the cloud

During the training we work on an Ubuntu Linux installation in a Google Cloud environment. To access this Linux machine you have to use Google Chrome and install the app 'VNC Viewer for Google Chrome'.
When you launch the application, you have to enter an address, which will be given during the training. You can leave 'Picture Quality' at its default setting, 'Automatic'.

Installing Linux

Installing software

Install the tools from the presentation slides. Note that when you install something, everyone in the training has access to that tool!
Here are some exercises to try on your personal Linux installation:

Command line

File system



Warming up

Text mining, scripting and 'for' loops

NGS intro


Bioinformatics oneliners

Sneak preview of the duplication rate of reads

gunzip -dc fastq.gz | head -n 1000000 | awk '{ if(NR%4==2) { print $1 } }' | sort | uniq -c | sort -g > sorted_duplicated
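
To see what this pipeline produces, it can help to run it on a tiny file first. The sketch below is only an illustration: the file names demo.fastq.gz and demo_sorted_duplicated and the three reads are made up, and the final awk summary is an addition that is not part of the one-liner above.

```shell
# Build a tiny gzipped FASTQ file with three reads, two of which share a sequence
# (file name and read contents are invented for this demo).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n' | gzip > demo.fastq.gz

# Same idea as the one-liner: keep the sequence line of every 4-line record,
# then count how often each distinct sequence occurs.
gunzip -c demo.fastq.gz | awk 'NR % 4 == 2 { print $1 }' | sort | uniq -c | sort -g > demo_sorted_duplicated

# Summarise: share of reads that are extra copies of an already seen sequence.
awk '{ total += $1; distinct++ }
     END { printf "%d reads, %d distinct, duplication rate %.2f\n", total, distinct, (total - distinct) / total }' demo_sorted_duplicated
```

Each line of the output file holds a count followed by a sequence, sorted so that the most duplicated sequences end up at the bottom.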

Convert fastq to fasta

paste - - - - < in.fq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > out.fa
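
The conversion works because paste - - - - joins every four lines into one tab-separated line, so the header and sequence become fields 1 and 2. A small sketch with two invented reads (demo.fq and demo.fa are hypothetical file names) shows why this matters: the quality line of the second read starts with '@', yet it is not mistaken for a header.

```shell
# Two made-up reads; the quality string of the second read starts with '@'.
printf '@read1 sample\nACGT\n+\nIIII\n@read2\nGGCC\n+\n@III\n' > demo.fq

# Join each 4-line record into one line, keep header and sequence,
# turn the leading '@' of the header into '>', and split back into two lines.
paste - - - - < demo.fq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > demo.fa

cat demo.fa
```

Because sed only sees the joined line, the ^@ pattern can match the header alone; the quality field has already been dropped by cut.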

Count all the variants called in all the vcf files

cat *.vcf | grep -v '^#' | wc -l

Count the variants that occur in exactly three vcf files

cat *.raw.vcf | grep -v '^#' | awk '{print $1 "\t" $2 "\t" $5}' | sort | uniq -c | grep ' 3 ' | wc -l
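
The grep ' 3 ' step relies on the way uniq -c pads its counts with spaces. As a possible alternative, the sketch below does the counting in awk and compares the count numerically. The three small files are invented and truncated to five columns for the demo, and the approach assumes that each variant is listed at most once per file.

```shell
# Work in a scratch directory so the *.raw.vcf glob only sees the demo files.
demo_dir=$(mktemp -d)

# Three made-up call sets (only chromosome, position and ALT are used below).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\n' > "$demo_dir/s1.raw.vcf"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t300\t.\tG\tA\n' > "$demo_dir/s2.raw.vcf"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\n' > "$demo_dir/s3.raw.vcf"

# Count chromosome/position/ALT combinations that were seen exactly three times.
shared=$(cat "$demo_dir"/*.raw.vcf | grep -v '^#' |
  awk -F'\t' '{ seen[$1 FS $2 FS $5]++ }
              END { for (k in seen) if (seen[k] == 3) n++; print n + 0 }')
echo "variants present in all three files: $shared"
```

In this demo only the variant at chr1 position 100 is present in all three files; the one at position 200 occurs in two files and is therefore not counted.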



Software installation exercises

Additional explanation on covered topics