In this exercise we will refactor the processor modules we have written so far. The idea is to make the classes inherit from Bio::Seq::BaseSeqProcessor, which in turn is an implementation of Bio::Factory::SequenceProcessorI. Have a look at the docs of the latter module (perldoc, CPAN or the BioPerl web site). Basically, it lets you chain processors into a pipeline, so that every sequence in the stream is processed by every processor in the pipeline. The processing is done in the process_seq() method of every node in the pipeline (and lucky us, we already called the processing method process_seq in the different modules ;-).
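To make the chaining idea concrete, here is a minimal plain-Perl sketch that does not use BioPerl at all. The Sketch::* packages and run_pipeline() are hypothetical names invented for this illustration, and the "sequences" are plain strings. The real Bio::Seq::BaseSeqProcessor is more elaborate, so read this as an approximation of the principle rather than a copy of its implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence stream such as Bio::SeqIO:
# hands out plain strings one at a time.
package Sketch::ListStream {
    sub new {
        my ($class, @items) = @_;
        return bless { items => [@items] }, $class;
    }
    sub next_seq {
        my ($self) = @_;
        return shift @{ $self->{items} };    # undef once exhausted
    }
}

# Hypothetical base class playing the role of Bio::Seq::BaseSeqProcessor:
# it pulls one item from its source and passes it through process_seq().
package Sketch::BaseProcessor {
    sub new {
        my ($class, %args) = @_;
        return bless { source => $args{-source_stream} }, $class;
    }
    sub next_seq {
        my ($self) = @_;
        my $seq = $self->{source}->next_seq;
        return undef unless defined $seq;
        return $self->process_seq($seq);
    }
    # Default behaviour: pass the item through unchanged.
    sub process_seq { my ($self, $seq) = @_; return $seq }
}

# Two toy processors that only override process_seq().
package Sketch::Upper {
    our @ISA = ('Sketch::BaseProcessor');
    sub process_seq { my ($self, $seq) = @_; return uc $seq }
}

package Sketch::Tag {
    our @ISA = ('Sketch::BaseProcessor');
    sub process_seq { my ($self, $seq) = @_; return "$seq*" }
}

package main;

# Build the chain: each processor wraps the previous stream.
sub run_pipeline {
    my @input  = @_;
    my $stream = Sketch::ListStream->new(@input);
    $stream = Sketch::Upper->new(-source_stream => $stream);
    $stream = Sketch::Tag->new(-source_stream => $stream);

    my @out;
    while (defined(my $seq = $stream->next_seq)) {
        push @out, $seq;    # no explicit process_seq() call anywhere
    }
    return @out;
}

print join(',', run_pipeline('acgt', 'ttga')), "\n";    # prints ACGT*,TTGA*
```

Asking the last processor for the next item makes every processor ask the one before it, so each item passes through every process_seq() in the order in which the chain was built.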
Steps to take:
- Make sure that the different processor modules BITS::Training::SeqProcessor::* inherit from Bio::Seq::BaseSeqProcessor (we set up inheritance before, e.g. in Bioperl Training Exercise 9).
- The constructor is inherited from Bio::Seq::BaseSeqProcessor (the so-called superclass), so in principle we do not need one. However, for some modules we want to pass arguments to the constructor. One solution is to override the constructor, as long as the constructor of the superclass is still called. To do that without hardcoding the superclass name, use the pseudoclass SUPER:
my $self = $class->SUPER::new(@args);
The SUPER pseudoclass lets a method redispatch a call to the next available method in one of its parent classes. The search starts from the parents of the package in which the calling method was defined (the current package), which is not necessarily the invocant's own class.
See also perldoc perltoot and look for 'superclass' (on recent Perl versions the same material is covered in perldoc perlootut and perldoc perlobj).
- Refactor the original process_all.pl processing script (Bioperl Training Exercise 14). Set up the processor pipeline.
- Remove all explicit process_seq() calls and sit back to watch what happens (it's magic!).
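The constructor step above can be tried out without BioPerl. The sketch below uses hypothetical Sketch::* packages as stand-ins for the real classes. It assumes, as is usual for Perl constructors, that the superclass blesses the object with the two-argument form of bless.

```perl
use strict;
use warnings;

# Hypothetical superclass standing in for Bio::Seq::BaseSeqProcessor.
package Sketch::Base {
    sub new {
        my ($class, %args) = @_;
        # Two-argument bless: the object belongs to the class that was
        # actually asked for, which may be a subclass.
        return bless { source => $args{-source_stream} }, $class;
    }
}

package Sketch::Species {
    our @ISA = ('Sketch::Base');
    sub new {
        my ($class, %args) = @_;
        # Redispatch to the parent constructor without naming it.
        my $self = $class->SUPER::new(%args);
        $self->{taxon} = $args{-taxon};
        return $self;    # already blessed into $class by the parent
    }
}

# A further subclass that inherits new() from Sketch::Species unchanged.
package Sketch::Species::Strict {
    our @ISA = ('Sketch::Species');
}

package main;

my $obj = Sketch::Species::Strict->new(
    -source_stream => 'dummy',
    -taxon         => 'Saccharomyces cerevisiae',
);
print ref($obj), "\n";       # prints Sketch::Species::Strict
print $obj->{taxon}, "\n";   # prints Saccharomyces cerevisiae
```

SUPER::new is looked up from the parents of Sketch::Species, the package where that new() was written, while $class still carries the name of the class the caller asked for. Returning $self as-is keeps that class. Ending the constructor with a one-argument bless $self would instead re-bless the object into the package where new() was compiled, so a subclass would silently get an object of the wrong class.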
process_all.pl
|
#!/usr/bin/perl

use strict;
use Bio::SeqIO;
use BITS::Training::SeqProcessor::Fuzzpro;
use BITS::Training::SeqProcessor::Species;
use BITS::Training::SeqProcessor::Reference;
use Bio::Annotation::Reference;

# io object to read in the fasta from 'proteins.fa'
my $in = Bio::SeqIO->new(-format => 'fasta', -file => '< proteins.fa');

my $reference = Bio::Annotation::Reference->new(
    -title    => 'BITS/VIB Bioperl Training',
    -location => 'Direct submission',
    -authors  => '<your name>',
);

# processor pipeline
my $stream = BITS::Training::SeqProcessor::Fuzzpro->new(-source_stream => $in);
$stream = BITS::Training::SeqProcessor::Species->new(-source_stream => $stream,
                                                     -taxon => 'Saccharomyces cerevisiae');
$stream = BITS::Training::SeqProcessor::Reference->new(-source_stream => $stream,
                                                       -reference => $reference);

# io object to write genbank to STDOUT
my $out = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT);

# for every sequence
while (my $seq = $stream->next_seq) {
    # write the sequence, process_seq() of all processors
    # in the pipeline will be automagically called
    $out->write_seq($seq);
}
|
BITS::Training::SeqProcessor::Species
|
package BITS::Training::SeqProcessor::Species;

use strict;
use base 'Bio::Seq::BaseSeqProcessor';
use Bio::DB::Taxonomy;

my $TAXON_DB = Bio::DB::Taxonomy->new( -source => 'entrez' );

sub new {
    my ( $class, @args ) = @_;
    my $self = $class->SUPER::new(@args);
    my ($query_value) = $self->_rearrange([qw/TAXON/], @args);

    # presume it is a taxon name
    my $query_key = '-name';
    # if only digits, presume it is a taxon id
    if ( $query_value =~ /^\d+$/ ) {
        $query_key = '-taxonid';
    }
    my $taxon = $TAXON_DB->get_taxon( $query_key => $query_value );
    $self->{_species} = $taxon;

    # the superclass constructor already blessed the object into $class
    return $self;
}

sub process_seq {
    my ( $self, $seq ) = @_;
    $seq->species( $self->{_species} );
    # return sequence object
    return $seq;
}

1;
|
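The Species constructor picks the -taxon value out of the argument list with _rearrange() and then decides whether it is a name or an id. The sketch below imitates that lookup in plain Perl. rearrange_sketch() is a hypothetical, simplified stand-in written for this page, not BioPerl's own code, and it assumes that keys are matched without the leading dash and without regard to case.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the named-argument lookup done by
# _rearrange(): return the values for the requested keys, in the order asked.
sub rearrange_sketch {
    my ($order, @args) = @_;
    my %param;
    while (@args) {
        my ($key, $value) = splice(@args, 0, 2);
        $key =~ s/^-//;                # drop the leading dash
        $param{ uc $key } = $value;    # compare keys case-insensitively
    }
    return map { $param{ uc $_ } } @$order;
}

my ($taxon, $source) = rearrange_sketch(
    [qw/TAXON SOURCE_STREAM/],
    -source_stream => 'dummy stream',
    -taxon         => '4932',
);

# Same decision as in the Species constructor: digits only means a taxon id.
my $query_key = $taxon =~ /^\d+$/ ? '-taxonid' : '-name';
print "$query_key => $taxon\n";    # prints -taxonid => 4932
```

Because the values come back in the order of the requested keys, the caller can pass the named arguments in any order.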
BITS::Training::SeqProcessor::Fuzzpro
|
package BITS::Training::SeqProcessor::Fuzzpro;

use strict;
use base 'Bio::Seq::BaseSeqProcessor';
use Bio::Factory::EMBOSS;
use Bio::Tools::GFF;

# N-glycosylation pattern argument for fuzzpro
my $PATTERN = 'N-{P}-[ST]';

sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    $self->{_app} = Bio::Factory::EMBOSS->new->program('fuzzpro');

    # the superclass constructor already blessed the object into $class
    return $self;
}

sub process_seq {
    my ($self, $seq) = @_;

    # create temporary file for fuzzpro output
    my ($fh, $gffile) = $self->{_app}->io->tempfile(UNLINK=>0);

    # run fuzzpro
    $self->{_app}->run({
        -sequence => $seq,
        -pattern  => $PATTERN,
        -rformat  => 'GFF',
        -outfile  => $gffile,
    });

    # create reader/parser GFF object
    my $gffio = Bio::Tools::GFF->new(-fh => $fh);

    # loop over the feature stream
    while (my $feature = $gffio->next_feature) {
        # attach feature to sequence
        $seq->add_SeqFeature($feature);
        # change feature type
        $feature->primary_tag('protein_match');
        # add notes
        $feature->add_tag_value(note => 'algorithm:fuzzpro');
        $feature->add_tag_value(note => "pattern:$PATTERN");
        # add label
        $feature->add_tag_value(label => 'N-glycosylation site');
    }

    # close stream
    $gffio->close;

    # return sequence object
    return $seq;
}

1;
|
BITS::Training::SeqProcessor::Reference
|
package BITS::Training::SeqProcessor::Reference;

use strict;
use base 'Bio::Seq::BaseSeqProcessor';

sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($reference) = $self->_rearrange([qw/REFERENCE/], @args);
    $self->{_ref} = $reference;

    # the superclass constructor already blessed the object into $class
    return $self;
}

sub process_seq {
    my ($self, $seq) = @_;
    my $collection = $seq->annotation;
    $collection->add_Annotation(reference => $self->{_ref});
    # return sequence object
    return $seq;
}

1;
|