The .ACE format is produced by phrap and also by other assemblers, including Arachne, TIGR Assembler, and CAP. An abbreviated excerpt (sequences and quality values truncated):
CO 1 30502 510 273 U CCTCTCC*GTAGAGTTCAACCGAAGCCGGTAGAGTTTTATCACCCCTCCC
BQ 20 20 20 20 20 20 20 20 20 20 20 20 20
AF TBEOG48.y1 C 1
BS 1 137 TBEOG48.y1
RD TBEOG48.y1 619 0 0 CCTCTCC*GTAGAGTTCAACCGAAGCCGGTAGAGTTTTATCACCCCTCCC
QA 1 619 1 619
- Contig lines (starting with CO) list the contig ID ("1" in the example), the number of bases (30502), the number of reads (510), and the number of base segments (273), followed by whether the contig is in the forward orientation (U, uncomplemented) or reversed (C, complemented). In general, an assembler outputs all contigs as "U".
- The consensus sequence follows immediately after the CO line. It is padded: gaps are represented as * characters rather than dashes.
- Consensus quality values follow immediately after the BQ line. There is one value per unpadded base; pads (*) have no quality value. The values are phred-style quality scores.
- Each AF line (one per aligned read) indicates whether the read is complemented (C) or not (U), followed by the 1-based offset of the read in the (padded) consensus sequence. The offset refers to the beginning of the entire read in the alignment, not just its clear range. Thus a read acaggATTGA whose first base aligns to consensus position 1 has an offset of 1, even though its clear range (ATTGA) only begins at consensus position 6.
- The BS lines indicate which read was used to calculate the consensus between the specified coordinates. These lines can, in general, be ignored as they are an artifact of the algorithms used to compute the consensus sequence.
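- The sequence of each read is given explicitly after its RD line. The sequence is padded with * characters and is already complemented when the read's AF line marks it as complemented (C), so it can be compared directly with the consensus.
- The QA line following each read contains two 1-based ranges. The first is the quality-clipped range; the second is the clear range of the read (the portion aligned to the consensus). Both are expressed in coordinates of the read sequence as given in the RD record, that is, padded and, if applicable, complemented.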
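A minimal Python sketch of splitting a CO line into the fields listed above, assuming whitespace-separated fields in that order. `ContigHeader` and `parse_co_line` are hypothetical names used only for illustration. Because the excerpt shows the consensus on the same line as the CO header, any fields after the orientation are ignored.

```python
from dataclasses import dataclass


@dataclass
class ContigHeader:
    """Fields of a CO line, in the order described above (hypothetical helper)."""
    contig_id: str
    num_bases: int
    num_reads: int
    num_segments: int    # number of base segments (BS lines)
    complemented: bool   # True for "C", False for "U"


def parse_co_line(line: str) -> ContigHeader:
    """Parse 'CO <id> <bases> <reads> <segments> <U|C> ...' into a ContigHeader."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "CO":
        raise ValueError(f"not a CO line: {line!r}")
    orientation = fields[5]
    if orientation not in ("U", "C"):
        raise ValueError(f"unexpected orientation: {orientation!r}")
    return ContigHeader(
        contig_id=fields[1],
        num_bases=int(fields[2]),
        num_reads=int(fields[3]),
        num_segments=int(fields[4]),
        complemented=(orientation == "C"),
    )
```

For the excerpt's CO line, `parse_co_line` yields ID "1", 30502 bases, 510 reads, 273 segments, and `complemented=False`.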
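Because BQ supplies one quality value per unpadded base, matching qualities to the padded consensus means skipping the * positions. The sketch below assumes the BQ values have already been read into a list of integers; `padded_qualities` is a hypothetical helper, and it uses `None` to mark pads, which have no quality value.

```python
from typing import List, Optional


def padded_qualities(padded_consensus: str, bq_values: List[int]) -> List[Optional[int]]:
    """Align BQ values (one per non-pad base) to the padded consensus.

    Pads ('*') receive None, since BQ has no entry for them.
    """
    quals: List[Optional[int]] = []
    values = iter(bq_values)
    for base in padded_consensus:
        if base == "*":
            quals.append(None)  # gap: no quality value in BQ
        else:
            try:
                quals.append(next(values))
            except StopIteration:
                raise ValueError("fewer BQ values than unpadded bases") from None
    if next(values, None) is not None:
        raise ValueError("more BQ values than unpadded bases")
    return quals
```

For example, `padded_qualities("AC*GT", [20, 30, 25, 40])` returns `[20, 30, None, 25, 40]`.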
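Since the AF offset marks where the read's first base lands in the padded consensus, a position in the padded read (as given after RD) maps to the consensus by simple addition. This sketch assumes 1-based, inclusive coordinates throughout; the function names are hypothetical. It reproduces the acaggATTGA case: offset 1 with a clear range starting at read position 6 places the clear range at consensus position 6.

```python
from typing import Tuple


def read_pos_to_consensus(af_offset: int, read_pos: int) -> int:
    """Map a 1-based padded read position to a 1-based padded consensus position.

    af_offset is the AF value, i.e. the consensus position of the read's first base.
    """
    return af_offset + read_pos - 1


def clear_range_on_consensus(af_offset: int, clear_start: int, clear_end: int) -> Tuple[int, int]:
    """Project a clear range (QA second range, read coordinates) onto the consensus."""
    return (read_pos_to_consensus(af_offset, clear_start),
            read_pos_to_consensus(af_offset, clear_end))


# acaggATTGA with AF offset 1: clear range ATTGA is read positions 6..10
print(clear_range_on_consensus(1, 6, 10))  # (6, 10)
```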
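Since the QA clear range is expressed in the coordinates of the padded (and possibly complemented) sequence from the RD record, it can be sliced out directly. This is a sketch assuming 1-based inclusive bounds; `clear_range_sequence` and its `keep_pads` flag are hypothetical, and the pads are optionally stripped afterwards to recover the unpadded bases.

```python
def clear_range_sequence(padded_read: str, clear_start: int, clear_end: int,
                         keep_pads: bool = False) -> str:
    """Return the clear-range portion of a read sequence given after its RD line.

    clear_start/clear_end are the QA line's second range (1-based, inclusive),
    in padded read coordinates. Pads ('*') are removed unless keep_pads is True.
    """
    if not (1 <= clear_start <= clear_end <= len(padded_read)):
        raise ValueError("clear range lies outside the read")
    segment = padded_read[clear_start - 1:clear_end]  # convert to 0-based slice
    return segment if keep_pads else segment.replace("*", "")
```

For a padded read `"acaggATT*GA"` with clear range 6 to 11, this returns `"ATTGA"`, or `"ATT*GA"` with `keep_pads=True`.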