CIGAR
"Cigar" (Compact Idiosyncratic Gapped Alignment Report) format is a compressed (run-length encoded) pairwise alignment format. It is useful for representing long (e.g. genomic) pairwise alignments. It is used in SAM format to represent alignments of reads to a reference genome sequence.
An 'extended' CIGAR string must match the following pattern: \*|([0-9]+[MIDNSHPX=])+. Each operation letter (M, I, D, N, S, H, P, = or X) is preceded by a number giving the number of bases that the operation covers. A lone * means that no CIGAR is available.
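As an illustration of the pattern above, here is a minimal Python sketch that validates a CIGAR string and splits it into (count, operation) pairs. The function name `parse_cigar` is hypothetical, and the character class used here admits = and X alongside MIDNSHP, since both appear in the operation list on this page.

```python
import re

# Whole-string pattern: either '*' or one or more <count><operation> pairs.
# Assumption: '=' and 'X' are accepted in addition to MIDNSHP.
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHPX=])+")
# Pattern for a single <count><operation> pair.
OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")


def parse_cigar(cigar):
    """Return a list of (count, operation) pairs; '*' gives an empty list."""
    if CIGAR_RE.fullmatch(cigar) is None:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    if cigar == "*":
        return []
    return [(int(count), op) for count, op in OP_RE.findall(cigar)]


print(parse_cigar("3S8M1D6M4S"))
# [(3, 'S'), (8, 'M'), (1, 'D'), (6, 'M'), (4, 'S')]
```

Validating the whole string first means that a malformed input such as "M8" raises an error instead of being silently split into partial matches.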
Standard CIGAR:
- M alignment match (the aligned bases may be identical or mismatched)
- I insertion
- D deletion
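Extended CIGAR:
- N skipped region on the reference (e.g. an intron)
- S soft clipping (clipped bases are still present in the read sequence)
- H hard clipping (clipped bases are not present in the read sequence)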
- P padding
- = sequence match
- X sequence mismatch
Complete Genomics data specific:
- B move back (one Complete Genomics read consists of several contiguous stretches separated by gaps, of which the first two may overlap)
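The operations differ in whether they advance along the read, along the reference, or both. In the SAM convention, M, I, S, = and X consume read bases, while M, D, N, = and X consume reference bases; H and P consume neither. The Python sketch below uses this to compute the read length and the reference span implied by a CIGAR string. The function name `cigar_lengths` is hypothetical, and the Complete Genomics B operation is deliberately left out.

```python
import re

# Operations that advance along the read and along the reference
# (SAM convention; the Complete Genomics 'B' operation is not handled).
CONSUMES_READ = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def cigar_lengths(cigar):
    """Return (read_length, reference_span) implied by a CIGAR string."""
    read_length = 0
    reference_span = 0
    for count, op in re.findall(r"([0-9]+)([MIDNSHPX=])", cigar):
        if op in CONSUMES_READ:
            read_length += int(count)
        if op in CONSUMES_REFERENCE:
            reference_span += int(count)
    return read_length, reference_span


print(cigar_lengths("3S8M1D6M4S"))  # (21, 15)
```

For 3S8M1D6M4S the read is 3 + 8 + 6 + 4 = 21 bases long and covers 8 + 1 + 6 = 15 reference bases.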
Example
REF:  AGCTAGCATCGTGTCGCCCGTCTAGCATACGCATGATCGACTGTCAGCTAGTCAGACTAGTCGATCGATGTG
READ:        gggGTGTAACC-GACTAGgggg
The CIGAR for this alignment is: 3S8M1D6M4S.
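The CIGAR in this example can be derived by run-length encoding the READ row. The Python sketch below assumes the conventions the example appears to use: lower-case letters are soft-clipped bases (S), a dash is a base deleted from the read (D), and upper-case letters are aligned bases (M). The names `op_for` and `cigar_from_read_row` are hypothetical. Insertions cannot be recovered from the READ row alone, because they would show up as gaps in the REF row.

```python
from itertools import groupby


def op_for(symbol):
    """Map one column of the READ row to a CIGAR operation."""
    if symbol == "-":
        return "D"  # base present in the reference but missing in the read
    if symbol.islower():
        return "S"  # soft-clipped base (assumed lower-case convention)
    return "M"      # aligned base, whether matching or mismatching


def cigar_from_read_row(row):
    """Run-length encode the per-column operations into a CIGAR string."""
    return "".join(
        f"{len(list(group))}{op}" for op, group in groupby(row, key=op_for)
    )


print(cigar_from_read_row("gggGTGTAACC-GACTAGgggg"))  # 3S8M1D6M4S
```

The runs are 3 soft-clipped bases, 8 aligned bases, 1 deleted base, 6 aligned bases and 4 soft-clipped bases, which gives 3S8M1D6M4S.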