Unix toolbox

From BITS wiki

Bash and other system commands you should know


The command line seems boring at first, but after some time using it you will become addicted to its simplicity and power.

Do not use Office programs to filter and parse large data files. Use the power of Unix and a few simple commands instead to reduce the data to a human-readable size.

A number of tools give you control over NGS data at different levels. The ones presented below overlap to some extent, but each also has unique features.

How to get help under UNIX?

Most Unix executables have built-in help that can be called in various ways.

# man pages constitute the most sophisticated level of built-in help. 
# their content is stored in separate files and accessed by the system.
man <command-name>
# intrinsic help
<command-name> -h (--help | -? | ?)
# or often simply by typing 

Built-In Unix commands

These are only a few of the many, but they are absolute essentials.

  • cat to print the content of files to the screen or combine several files into one
  • grep to filter huge files and keep only the 'substantifique moelle' (the essential part)
  • ls is the equivalent of the good old DOS 'dir' and returns a list of files and folders for the provided path
  • sort to order lines based on one or more columns
  • split to cut big files into manageable chunks
  • tr to replace one character by another in a whole file at once
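
As an illustration, the commands above can be chained on a small test file. This is only a minimal sketch: the file names and the gene counts are made up for the example, and everything is written to a temporary folder.

```shell
# Work in a throwaway directory so no existing file is touched.
workdir=$(mktemp -d)
tab=$(printf '\t')

# Two small tab-separated files with made-up gene counts (name, chromosome, count).
printf 'geneA\tchr2\t15\ngeneB\tchr1\t7\ngeneC\tchr2\t42\ngeneD\tchr1\t3\n' > "$workdir/part1.tsv"
printf 'geneE\tchr3\t11\ngeneF\tchr2\t8\n' > "$workdir/part2.tsv"

# cat: combine both files into one.
cat "$workdir/part1.tsv" "$workdir/part2.tsv" > "$workdir/all.tsv"

# grep: keep only the lines that mention chr2 as a whole word.
grep -w 'chr2' "$workdir/all.tsv" > "$workdir/chr2.tsv"

# sort: order by chromosome (column 2), then by count (column 3), largest first.
sort -t "$tab" -k2,2 -k3,3nr "$workdir/all.tsv" > "$workdir/sorted.tsv"

# split: cut the combined file into chunks of 2 lines (chunk_aa, chunk_ab, ...).
split -l 2 "$workdir/all.tsv" "$workdir/chunk_"

# tr: replace every tab by a comma.
tr '\t' ',' < "$workdir/all.tsv" > "$workdir/all.csv"

# ls: list what was produced.
ls "$workdir"
```

Each command reads plain text and writes plain text, which is why they combine so easily with pipes and redirections.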

File editing applications

These must-have programs make your day by doing the heavy lifting on the data for you.

parse files

  • awk, your best friend when it comes to playing with tabular files


  • column presents columnar data nicely padded with spaces
  • transpose swaps rows and columns. A must-have utility that pairs well with column.
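
To give an idea of what this looks like in practice, here is a minimal sketch on a made-up tab-separated table. The awk transposition at the end is only a stand-in for the transpose utility, which may not be installed on your machine, and column is called only when it is available.

```shell
tab=$(printf '\t')
data=$(mktemp)

# A small table with a header line (made-up values).
printf 'gene\tchr\tcount\ngeneA\tchr2\t15\ngeneB\tchr1\t7\ngeneC\tchr2\t42\n' > "$data"

# awk: skip the header and print name and count for rows with a count above 10.
high=$(awk -F'\t' 'NR > 1 && $3 > 10 { print $1, $3 }' "$data")
echo "$high"

# awk: sum the counts per chromosome (sort makes the output order stable).
per_chr=$(awk -F'\t' 'NR > 1 { sum[$2] += $3 } END { for (c in sum) print c, sum[c] }' "$data" | sort)
echo "$per_chr"

# column: pad the columns with spaces for easy reading, if the tool is present.
if command -v column >/dev/null 2>&1; then
  column -t -s "$tab" "$data"
fi

# Stand-in for transpose: store every cell, then print column by column.
transposed=$(awk -F'\t' '
  { for (i = 1; i <= NF; i++) cell[NR, i] = $i; if (NF > ncol) ncol = NF }
  END {
    for (i = 1; i <= ncol; i++) {
      line = cell[1, i]
      for (r = 2; r <= NR; r++) line = line "\t" cell[r, i]
      print line
    }
  }' "$data")
echo "$transposed"
```

The awk transposition keeps the whole table in memory, so it suits small and medium files rather than very large ones.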

query text files

  • filo, Useful FILe and stream Operations (includes groupBy, shuffle and stats)

File transfer

get BIG Data from the internet

  • wget will happily slurp down anything within reach of its greedy claws
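
A typical download could be wrapped as follows. This is a sketch only: the function name fetch_big_file and the URL in the comment are hypothetical, and the retry count is an arbitrary choice.

```shell
# fetch_big_file URL [FOLDER]
# Download URL into FOLDER (default: current folder).
fetch_big_file() {
  url=$1
  dest=${2:-.}
  # --continue resumes a partial download instead of starting over,
  # --tries sets how many times wget retries after a failure,
  # --directory-prefix chooses where the file is saved.
  wget --continue --tries=5 --directory-prefix="$dest" "$url"
}

# Example call (hypothetical URL, not run here):
# fetch_big_file "https://example.org/data/reads.fastq.gz" downloads
```

Resuming with --continue is what makes wget comfortable for very large files on unreliable connections.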

System/Job management applications

create scripts with intelligent interfaces

  • Expect helps you script interactive programs, asking for input and processing it automatically within script pipelines

run jobs in parallel

  • parallel, the GNU tool to run commands in parallel and replace for loops to save time.
  • ppss to perform parallel tasks from a folder of similar input files.
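
The sketch below shows how a for loop could be replaced by GNU parallel, here to compress a few made-up text files. It assumes gzip is installed and falls back to the plain for loop when GNU parallel is not found, so the result is the same either way.

```shell
# Create a few small input files in a throwaway directory.
workdir=$(mktemp -d)
for i in 1 2 3 4; do
  printf 'sample %s\n' "$i" > "$workdir/sample_$i.txt"
done

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # GNU parallel: one gzip job per file, several jobs running at the same time.
  parallel gzip ::: "$workdir"/sample_*.txt
else
  # Fallback: the serial for loop that parallel is meant to replace.
  for f in "$workdir"/sample_*.txt; do
    gzip "$f"
  done
fi

# Each input file has been replaced by its .gz version.
ls "$workdir"
```

The check on the version string is there because another, unrelated program named parallel exists and does not understand the ::: syntax.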

run jobs while you are Offline

This is particularly useful if you work on a remote computer and wish to disconnect and go home while the heavy lifting takes place. It is also a great way to split your applications across different screens and let them live their own lives while you do something else.

  • screen to start experiments in separate 'screens' and be able to log out without losing them.
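
A typical screen workflow could look like the sketch below. The helper name run_detached, the session name and the script in the comments are hypothetical; the interactive steps are shown as comments because they need a terminal.

```shell
# run_detached NAME COMMAND [ARGS...]
# Start COMMAND in a new screen session called NAME, already detached.
run_detached() {
  name=$1
  shift
  screen -dmS "$name" "$@"
}

# Example use (not run here):
#   run_detached mapping ./run_mapping.sh   # start the job in the background
#   screen -ls                              # list the sessions that are alive
#   screen -r mapping                       # reattach to look at the output
#   (press Ctrl-a then d to detach again and log out safely)
```

Naming the session makes it much easier to find the right one again when several jobs are running.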

list running processes

You will often need to evaluate the workload on your favorite server and identify nasty jobs before killing them.

  • top and htop monitor your system and show what is running and how many resources are in use.
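
Both top and htop are interactive, so the sketch below uses ps and kill as scriptable stand-ins to show the same idea: find a job, then stop it. The 'nasty job' is simulated here by a harmless sleep command.

```shell
# Simulate a long-running job in the background and remember its process ID.
sleep 300 &
pid=$!

# Look the job up by PID (interactively you would spot it in top or htop).
if command -v ps >/dev/null 2>&1; then
  ps -p "$pid" -o pid,comm || true
fi

# Ask the job to stop, then wait until it is really gone.
kill "$pid"
wait "$pid" 2>/dev/null

# kill -0 sends no signal, it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
  status=running
else
  status=gone
fi
echo "job $pid is $status"
```

A plain kill sends a polite termination request first, which gives the program a chance to clean up after itself.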

