
Manipulate FastQ and BAM NGS data

Similar to: Samtools, Picard, BamUtils


NGSUtils [1][2] is a suite of software tools for working with next-generation sequencing datasets. Starting in 2009, we (Liu Lab @ Indiana University School of Medicine) began working with next-generation sequencing data. We initially wrote custom code for each project in a one-off manner. It quickly became apparent that this was an inefficient way to work, so we started assembling smaller utilities that could be combined into larger, more complicated workflows. We have used them for Illumina, SOLiD and 454 sequencing data, and for DNA and RNA resequencing, ChIP-Seq, CLIP-Seq, and targeted resequencing (Agilent exome capture and PCR targeting). These tools are also used heavily in our in-house DNA and RNA mapping pipelines.

These tools have been of great use within our lab group, so we are happy to make them available to the greater community.

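NGSUtils is made up of 50+ programs, mainly written in Python. These are separated into modules based on the type of file to be analyzed. There are four modules:

    bamutils   - tools for BAM files
    bedutils   - tools for BED files
    fastqutils - tools for FASTQ files
    gtfutils   - tools for GTF/GFF annotation files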

Each of these modules contains many commands for manipulating, filtering, converting, or analyzing these types of files. Check out the documentation for each module for more information about some of the commands available.

Getting started Installation

To get help, type: ngsutils help 'command-name'


Usage: bamutils COMMAND
    basecall      - Base/variant caller
    count         - Calculates counts/FPKM for genes/BED regions/repeats (also CNV)
    best          - Filter out multiple mappings for a read, selecting only the best
    convertregion - Converts region mapping to genomic mapping
    export        - Export reads, mapped positions, and other tags
    expressed     - Finds regions expressed in a BAM file
    extract       - Extracts reads based on regions in a BED file
    filter        - Removes reads from a BAM file based on criteria
    innerdist     - Calculate the inner mate-pair distance from two BAM files
    keepbest      - Parses BAM file and keeps the best mapping for reads that have multiple mappings
    merge         - Combine multiple BAM files together (taking best-matches)
    pair          - Given two separately mapped paired files, re-pair the files
    peakheight    - Find the size (max height, width) of given peaks (BED) in a BAM file
    renamepair    - Postprocesses a BAM file to rename pairs that have an extra /N value
    split         - Splits a BAM file into smaller pieces
    stats         - Calculates simple stats for a BAM file
    tag           - Update read names with a suffix (for merging)
    tobed         - Convert BAM reads to BED regions
    tobedgraph    - Convert BAM coverage to bedGraph (for visualization)
    tofasta       - Convert BAM reads to FASTA sequences
    tofastq       - Convert BAM reads back to FASTQ sequences
    check         - Checks a BAM file for corruption
    cleancigar    - Fixes BAM files where the CIGAR alignment has a zero length element
Run 'bamutils help CMD' for more information about a specific command
ngsutils 0.5.5-2232b67
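
To make the cleancigar entry above more concrete, here is a minimal Python sketch of what removing zero-length CIGAR elements can look like. It is not the NGSUtils implementation: the function name clean_cigar is hypothetical, and merging neighbouring operations of the same type after a zero-length element is dropped is an assumption made for illustration.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM format.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def clean_cigar(cigar):
    """Drop zero-length CIGAR elements (hypothetical helper, not NGSUtils code).

    Neighbouring elements that share an operation after the drop are merged,
    which is an assumption of this sketch.
    """
    cleaned = []
    for length_text, op in CIGAR_ELEMENT.findall(cigar):
        length = int(length_text)
        if length == 0:
            continue  # this is the zero-length element we want to remove
        if cleaned and cleaned[-1][1] == op:
            # Same operation as the previous element: extend it.
            cleaned[-1] = (cleaned[-1][0] + length, op)
        else:
            cleaned.append((length, op))
    return "".join(f"{length}{op}" for length, op in cleaned)


print(clean_cigar("10M0I5M"))
```

For an actual BAM file, the bamutils cleancigar command listed above is the tool to use; the sketch only shows the string-level idea.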


Usage: bedutils COMMAND
    clean        - Cleans a BED file (score should be integers)
    extend       - Extends BED regions (3')
    reduce       - Merges overlapping BED regions
    refcount     - Given a number of BED files, calculate the number of samples that overlap regions in a reference BED file
    sizes        - Extract the sizes of BED regions
    sort         - Sorts a BED file (in place)
    stats        - Calculates simple stats for a BED file
    subtract     - Subtracts one set of BED regions from another
    annotate     - Annotate BED files by adding / altering columns
    frombasecall - Converts a file in basecall format to BED3 format
    fromprimers  - Converts a list of PCR primer pairs to BED regions
    fromvcf      - Converts a file in VCF format to BED6
    tobed3       - Removes extra columns from a BED (or BED compatible) file
    tobed6       - Removes extra columns from a BED (or BED compatible) file
    tobedgraph   - BED to BedGraph
    tofasta      - Extract BED regions from a reference FASTA file
    cleanbg      - Cleans up a bedgraph file
Run 'bedutils help CMD' for more information about a specific command
ngsutils 0.5.5-2232b67
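
As an illustration of what "merges overlapping BED regions" (the reduce entry above) means, the following Python sketch merges intervals per chromosome. It is a sketch under stated assumptions, not the bedutils code: the function name reduce_regions is hypothetical, regions are plain (chrom, start, end) tuples using BED's 0-based half-open coordinates, strand is ignored, and regions that merely touch are merged as well.

```python
def reduce_regions(regions):
    """Merge overlapping (chrom, start, end) regions.

    Hypothetical helper for illustration only. Assumes 0-based, half-open
    BED coordinates and also merges regions that touch (end == next start).
    """
    merged = []
    # Sorting by (chrom, start, end) places candidates for merging side by side.
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


example = [
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
for region in reduce_regions(example):
    print(*region, sep="\t")
```

The two overlapping chr1 regions collapse into one spanning 100 to 300, while the other two regions pass through unchanged.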


Usage: fastqutils COMMAND
    barcode_split - Splits a FASTQ/FASTA file based on sequence barcodes
    filter        - Filter out reads using a number of metrics
    merge         - Merges paired FASTQ files into one file
    names         - Write out the read names
    properpairs   - Find properly paired reads (when fragments are filtered separately)
    revcomp       - Reverse compliment a FASTQ file
    sort          - Sorts a FASTQ file by name or sequence
    split         - Splits a FASTQ file into N chunks
    stats         - Calculate summary statistics for a FASTQ file
    tag           - Adds a prefix or suffix to the read names in a FASTQ file
    tile          - Splits long FASTQ reads into smaller (tiled) chunks
    trim          - Remove 5' and 3' linker sequences (slow, S/W aligned)
    truncate      - Truncates reads to a maximum length
    unmerge       - Unmerged paired FASTQ files into two (or more) files
    convertqual   - Converts qual values from Illumina to Sanger scale
    csencode      - Converts color-space FASTQ file to encoded FASTQ
    fromfasta     - Converts (cs)FASTA/qual files to FASTQ format
    fromqseq      - Converts Illumina qseq (export/sorted) files to FASTQ
    tofasta       - Converts to FASTA format (seq or qual)
Run 'fastqutils help CMD' for more information about a specific command
ngsutils 0.5.5-2232b67
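
Two of the entries above, revcomp and convertqual, are simple enough to sketch in a few lines of Python. This is not how fastqutils implements them: the function names are hypothetical, a read is represented as a plain (name, seq, qual) tuple, and the quality conversion assumes the input uses the Illumina Phred+64 encoding and the output the Sanger Phred+33 encoding.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp_record(name, seq, qual):
    """Reverse-complement one read (hypothetical helper).

    The quality string is reversed as well, so each score stays attached
    to its base.
    """
    return name, seq.translate(_COMPLEMENT)[::-1], qual[::-1]


def illumina_to_sanger(qual):
    """Shift quality characters from Phred+64 to Phred+33 (assumed encodings)."""
    return "".join(chr(ord(char) - 31) for char in qual)


name, seq, qual = revcomp_record("read1", "AACGT", "IIIH#")
print(name, seq, qual)
print(illumina_to_sanger("hhg@"))
```

The offset of 31 is simply the difference between the two ASCII offsets (64 and 33); the numeric quality scores themselves are unchanged.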


Usage: gtfutils COMMAND
    add_isoform - Appends isoform annotation from UCSC isoforms file
    add_reflink - Appends isoform/name annotation from RefSeq/refLink
    add_xref    - Appends name annotation from UCSC Xref file
    annotate    - Annotates genomic positions based on a GTF model
    filter      - Filter annotations from a GTF file
    genesize    - Extract genomic/transcript sizes for genes
    junctions   - Build a junction library from FASTA and GTF model
    query       - Query a GTF file by coordinates
    tobed       - Convert a GFF/GTF file to BED format
Run 'gtfutils help CMD' for more information about a specific command
ngsutils 0.5.5-2232b67
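
The main subtlety in a GTF-to-BED conversion such as the tobed entry above is the coordinate system: GTF positions are 1-based and inclusive, while BED positions are 0-based and half-open, so the start shifts by one and the end stays the same. The Python sketch below shows this for a single line. It is an illustration under assumptions, not the gtfutils code: the function name gtf_line_to_bed6 is hypothetical, the gene_id attribute is used as the BED name, and a missing score is written as 0. The gene and transcript identifiers in the example are made up.

```python
def gtf_line_to_bed6(line):
    """Convert one tab-delimited GTF line to a BED6 line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    chrom, _source, _feature, start, end, score, strand, _frame, attributes = fields

    # Use gene_id as the BED name (an assumption of this sketch).
    name = "."
    for attribute in attributes.split(";"):
        attribute = attribute.strip()
        if attribute.startswith("gene_id "):
            name = attribute.split(" ", 1)[1].strip('"')
            break

    bed_score = "0" if score == "." else score
    # GTF is 1-based inclusive; BED is 0-based half-open: start - 1, end unchanged.
    return "\t".join([chrom, str(int(start) - 1), end, name, bed_score, strand])


gtf_line = 'chr1\texample\texon\t1001\t1200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";'
print(gtf_line_to_bed6(gtf_line))
```

The region length is preserved by the conversion: 1001 to 1200 inclusive covers 200 bases, as does the half-open BED interval from 1000 to 1200.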

  1. Marcus R Breese, Yunlong Liu
    NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets.
    Bioinformatics: 2013, 29(4);494-6
    [PubMed:23314324]

