CIGAR

Source1 Source2 (pdf)

CIGAR (Compact Idiosyncratic Gapped Alignment Report) is a compact, run-length encoded format for pairwise alignments. It is useful for representing long (e.g. genomic) pairwise alignments, and the SAM format uses it to describe how each read aligns to the reference genome sequence.

An 'extended' CIGAR string must match the following pattern: ([0-9]+[MIDNSHPX=])+|\*. Each operator character (M, I, D, N, S, H, P, = or X) is preceded by a number giving the count of bases that the operation covers. A lone * means that no CIGAR is available.
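The pattern can be checked and split into its run-length pairs with a regular expression. The following is a minimal sketch, not a reference implementation: the function name `parse_cigar` is made up for illustration, and the pattern used here also accepts = and X, which are listed among the extended operators.

```python
import re

# Whole-string pattern: either '*' or one or more <count><operator> pairs.
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHPX=])+")
# Pattern for a single <count><operator> pair.
OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")

def parse_cigar(cigar):
    """Return a list of (count, operator) tuples; '*' yields an empty list."""
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    if cigar == "*":
        return []
    return [(int(count), op) for count, op in OP_RE.findall(cigar)]

print(parse_cigar("3S8M1D6M4S"))
# [(3, 'S'), (8, 'M'), (1, 'D'), (6, 'M'), (4, 'S')]
```

Validating the whole string first and extracting the pairs afterwards keeps malformed input such as a trailing count without an operator from being silently truncated.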

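Standard CIGAR:

  • M alignment match (the aligned bases may be identical or mismatched)
  • I insertion relative to the reference
  • D deletion relative to the reference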

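Extended CIGAR:

  • N skipped region in the reference (e.g. an intron)
  • S soft clipping (clipped bases are still present in the read sequence)
  • H hard clipping (clipped bases are removed from the read sequence)
  • P padding (silent deletion from a padded reference)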
  • = sequence match
  • X sequence mismatch
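Each operator advances the read, the reference, both, or neither. In SAM, M, I, S, = and X consume read bases, while M, D, N, = and X consume reference bases; H and P consume neither. As a rough sketch (the helper name `cigar_lengths` is hypothetical), the read length and the reference span implied by a CIGAR string can be computed like this:

```python
import re

QUERY_OPS = set("MIS=X")  # operators that consume bases of the read
REF_OPS = set("MDN=X")    # operators that consume bases of the reference

def cigar_lengths(cigar):
    """Return (read_length, reference_span) implied by a CIGAR string."""
    query_len = ref_len = 0
    for count, op in re.findall(r"([0-9]+)([MIDNSHPX=])", cigar):
        if op in QUERY_OPS:
            query_len += int(count)
        if op in REF_OPS:
            ref_len += int(count)
    return query_len, ref_len

print(cigar_lengths("3S8M1D6M4S"))  # (21, 15)
```

The first number should equal the length of the stored read sequence, and the second is the number of reference positions covered, starting at the alignment position.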

Complete Genomics data specific:

  • B move back (a Complete Genomics read consists of several contiguous stretches separated by gaps, and the first two stretches may overlap)

Example

    REF: AGCTAGCATCGTGTCGCCCGTCTAGCATACGCATGATCGACTGTCAGCTAGTCAGACTAGTCGATCGATGTG
    READ:       gggGTGTAACC-GACTAGgggg

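The CIGAR for this alignment is 3S8M1D6M4S: 3 soft-clipped bases (lowercase in the read), 8 aligned bases, 1 reference base deleted from the read, 6 aligned bases, and 4 soft-clipped bases. The aligned M stretches include mismatches, since M only states that a read base is paired with a reference base.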
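The CIGAR above can be derived mechanically from the two rows of the example. The sketch below assumes the display conventions of the example rather than anything defined by SAM: lowercase read bases are soft-clipped, '-' marks a gap, and the read starts 7 columns after the start of the reference row (an offset read from the spacing of the display). The function name `cigar_from_alignment` is hypothetical.

```python
from itertools import groupby

def cigar_from_alignment(ref, read):
    """Derive a CIGAR string from two equal-length gapped alignment rows."""
    if len(ref) != len(read):
        raise ValueError("alignment rows must have equal length")
    ops = []
    for r, q in zip(ref, read):
        if q == "-":
            ops.append("D")  # reference base with no read base
        elif r == "-":
            ops.append("I")  # read base with no reference base
        elif q.islower():
            ops.append("S")  # assumed convention: lowercase = soft-clipped
        else:
            ops.append("M")  # aligned pair, match or mismatch
    # Run-length encode consecutive identical operators.
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))

REF = "AGCTAGCATCGTGTCGCCCGTCTAGCATACGCATGATCGACTGTCAGCTAGTCAGACTAGTCGATCGATGTG"
READ = "gggGTGTAACC-GACTAGgggg"
OFFSET = 7  # column of the first read character, taken from the display

window = REF[OFFSET:OFFSET + len(READ)]
print(cigar_from_alignment(window, READ))  # 3S8M1D6M4S
```

Note that in a SAM record the alignment position refers to the first aligned (non-clipped) base, so the soft-clipped prefix shifts the reported position three bases to the right of the display offset.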