Seq crumbs


Remove contaminant adapter sequences from your reads prior to other NGS processing

Similar to: cutadapt, FastX_toolkit, PrinSeq, Trimmomatic




All Seq Crumbs try to share a consistent interface. By default, most Seq Crumbs read from standard input and write to standard output, allowing them to be easily combined using Unix pipes. Alternatively, several input sequence files can be provided as a list of arguments. Output can also be directed to a specific file with the -o parameter (or --outfile).
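The interface convention described above can be sketched in Python. This is a toy filter written for illustration, not seq_crumbs code: the function name run_crumb and the upper-casing behaviour are hypothetical, and only the -o/--outfile option is taken from the text.

```python
import argparse
import sys


def run_crumb(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Toy filter (hypothetical): upper-case FASTA sequence lines."""
    parser = argparse.ArgumentParser(description="Toy crumb-like filter")
    # Zero or more input files; with none given, read standard input.
    parser.add_argument("inputs", nargs="*")
    # -o/--outfile mirrors the option named in the text; default is stdout.
    parser.add_argument("-o", "--outfile", default=None)
    args = parser.parse_args(argv)

    in_handles = [open(path) for path in args.inputs] or [stdin]
    out_handle = open(args.outfile, "w") if args.outfile else stdout
    try:
        for handle in in_handles:
            for line in handle:
                # Header lines pass through; sequence lines are upper-cased.
                out_handle.write(line if line.startswith(">") else line.upper())
    finally:
        for handle in in_handles:
            if handle is not stdin:
                handle.close()
        if out_handle is not stdout:
            out_handle.close()
```

Because a tool written this way reads standard input and writes standard output when no files are named, several such tools can be chained with Unix pipes; a script would call run_crumb(sys.argv[1:]).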

seq_crumbs supports gzip, BGZF and bzip2 compressed files. Compressed input files are detected automatically, and the output can also be written compressed.
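One common way to autodetect compression is to inspect the first bytes of a file. The sketch below uses only the Python standard library and is an assumption about how such detection can work, not a description of the seq_crumbs implementation; the names detect_compression and open_maybe_compressed are hypothetical.

```python
import bz2
import gzip


def detect_compression(header: bytes) -> str:
    """Classify leading bytes as 'bgzf', 'gzip', 'bzip2' or 'plain'."""
    if header[:2] == b"\x1f\x8b":
        # BGZF blocks are gzip members with the FEXTRA flag set and a
        # 'BC' extra subfield right after the 2-byte XLEN field.
        has_extra = len(header) > 3 and header[3] & 0x04
        if has_extra and header[12:14] == b"BC":
            return "bgzf"
        return "gzip"
    if header[:3] == b"BZh":
        return "bzip2"
    return "plain"


def open_maybe_compressed(path):
    """Open path for text reading, decompressing when the magic bytes say so."""
    with open(path, "rb") as raw:
        kind = detect_compression(raw.read(14))
    if kind in ("gzip", "bgzf"):
        # BGZF is a valid multi-member gzip stream, so gzip reads it sequentially.
        return gzip.open(path, "rt")
    if kind == "bzip2":
        return bz2.open(path, "rt")
    return open(path, "r")
```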

The input sequence formats accepted by seq_crumbs are those supported by Biopython's SeqIO module. For output, only fasta and Sanger and Illumina fastq files are supported.
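The Sanger and Illumina fastq variants differ in how quality scores are encoded as ASCII characters: Sanger uses an offset of 33, and Illumina 1.3 to 1.7 uses an offset of 64. The sketch below shows a simple heuristic for telling them apart and a conversion between the two. The function names are hypothetical, and the heuristic can misclassify a Sanger file that contains only very high qualities.

```python
SANGER_OFFSET = 33    # Sanger encoding (Phred+33)
ILLUMINA_OFFSET = 64  # Illumina 1.3 to 1.7 encoding (Phred+64)


def guess_fastq_offset(quality_lines):
    """Guess the ASCII offset from the lowest quality character seen."""
    lowest = min(ord(char) for line in quality_lines for char in line)
    # Characters below '@' (ASCII 64) cannot occur in a Phred+64 file.
    return SANGER_OFFSET if lowest < ILLUMINA_OFFSET else ILLUMINA_OFFSET


def illumina_to_sanger(quality):
    """Re-encode a Phred+64 quality string as Phred+33."""
    shift = ILLUMINA_OFFSET - SANGER_OFFSET
    return "".join(chr(ord(char) - shift) for char in quality)
```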

seq_crumbs can take advantage of multiprocessor computers by splitting the computational load into several processes.
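Splitting the load into several processes usually means cutting the input into chunks and handing each chunk to a worker. The following is a minimal sketch of that pattern with Python's multiprocessing module. It is not taken from seq_crumbs; the GC-fraction worker and all function names are illustrative assumptions.

```python
from itertools import islice
from multiprocessing import Pool


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def gc_fractions(seqs):
    """Worker: GC fraction of every sequence in one chunk."""
    return [(s.count("G") + s.count("C")) / len(s) if s else 0.0 for s in seqs]


def parallel_gc(seqs, processes=2, chunk_size=1000):
    """Fan chunks out to worker processes and flatten results in input order."""
    with Pool(processes) as pool:
        results = pool.map(gc_fractions, chunked(seqs, chunk_size))
    return [value for chunk in results for value in chunk]
```

On platforms that start workers by spawning a fresh interpreter, parallel_gc should be called from inside an `if __name__ == "__main__":` guard.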

The filtering seq crumbs can be made aware of paired reads and can filter both reads of pairs at once.
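Pair-aware filtering can be pictured as deciding on both mates together instead of on each read alone. The sketch below assumes interleaved input, a mean-quality criterion, and a policy in which one failing mate drops the whole pair. All three are assumptions made for illustration, and the function names are hypothetical.

```python
def mean_quality(qualities):
    """Mean of a list of Phred scores (0.0 for an empty read)."""
    return sum(qualities) / len(qualities) if qualities else 0.0


def filter_pairs(reads, min_quality, paired=True):
    """Filter (name, phred_scores) tuples; mates are adjacent when paired=True."""
    step = 2 if paired else 1
    kept = []
    for start in range(0, len(reads), step):
        group = reads[start:start + step]
        # Keep the group only if every read passes; a failing mate drops both.
        if all(mean_quality(quals) >= min_quality for _, quals in group):
            kept.extend(group)
    return kept
```

With paired=False each read is judged on its own, which can leave orphaned mates in the output.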

You can find more information about seq_crumbs on the seq_crumbs website [1].

Available Crumbs

sff_extract Extracts reads from an SFF file used by 454 and Ion Torrent.
split_matepairs Splits mate-pairs separated by an oligo sequence.
filter_by_quality Filters sequences according to mean quality.
filter_by_length Filters sequences according to maximum and minimum length thresholds.
filter_by_name Filters sequences with a list of names given in a file.
filter_by_blast Filters the sequences using BLAST.
filter_by_complexity Filters sequences according to their complexity.
filter_by_bowtie2 Filters the sequences using bowtie2.
trim_by_case Trims sequences according to case.
trim_edges Removes a fixed number of residues from sequence edges.
trim_quality Removes, using a sliding window, regions of low quality in the edges.
trim_blast_short Removes oligonucleotides by using the blast-short algorithm.
convert_format Converts between the different supported sequence formats.
guess_seq_format Guesses the format of a file, including Sanger and Illumina fastq formats.
cat_seqs Concatenates one or several input sequence files, possibly in different formats, into one output.
seq_head Outputs only the first sequences of the given input.
sample_seqs Outputs a random sampling of the input sequences.
count_seqs Counts the sequences in the input files.
change_case Modifies the case of sequences. Case can be converted to lower or upper, or swapped.
pair_matcher Filters out orphaned read pairs.
interleave_pairs Interleaves two ordered paired read files.
deinterleave_pairs Splits an ordered file of paired reads into two files, one for each end.
calculate_stats Generates basic statistics for the given sequence files.
orientate_transcripts Reverse complements transcripts according to polyA, ORF or BLAST hits.
fastqual_to_fastq Converts fasta and qual files to a fastq format file.
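
To make the interleaving and deinterleaving of paired reads concrete, here is a minimal sketch that works on in-memory records. It illustrates the idea only and is not the code behind interleave_pairs or deinterleave_pairs; the function names and the error handling are assumptions.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking the shorter input


def interleave(forward, reverse):
    """Alternate records from two equally ordered inputs: f1, r1, f2, r2, ..."""
    for fwd, rev in zip_longest(forward, reverse, fillvalue=_MISSING):
        if fwd is _MISSING or rev is _MISSING:
            raise ValueError("inputs contain different numbers of reads")
        yield fwd
        yield rev


def deinterleave(records):
    """Split interleaved records back into (forward, reverse) lists."""
    records = list(records)
    if len(records) % 2:
        raise ValueError("odd number of records; a mate is missing")
    return records[0::2], records[1::2]
```

Both functions rely on the two ends appearing in the same order, which is why the crumbs above are described as working on ordered paired read files.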

References:
  1. http://bioinf.comav.upv.es/seq_crumbs/


