This exercise is very similar to Bioperl Training Exercise 12. The goal is to create a module (e.g. BITS::Training::SeqProcessor::Species) that adds the species/taxon information to every sequence in the stream. The taxon information is fetched from Entrez; either a taxon id or a taxon name is given as the query. Use, for example, the BioPerl website as a starting point for your search.
- The constructor takes one argument: a taxon id (e.g. 4932) or a taxon name (e.g. 'Saccharomyces cerevisiae'). Using that information, set up the correct Entrez taxon query to fetch the taxon object.
- In the process_seq() method, attach the taxon object to the sequence object (the methods script is your friend again). Return the sequence object (why will become clear later).
fuzzpro.pl
|
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use BITS::Training::SeqProcessor::Fuzzpro;
use BITS::Training::SeqProcessor::Species;
use Data::Dumper;

# io object to read in the fasta from 'proteins.fa'
my $in = Bio::SeqIO->new(-format => 'fasta', -file => '< proteins.fa');

my $fuzzpro = BITS::Training::SeqProcessor::Fuzzpro->new;
my $taxon_processor = BITS::Training::SeqProcessor::Species->new('Saccharomyces cerevisiae');

# io object to write genbank to STDOUT
my $out = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT);

# for every sequence
while (my $seq = $in->next_seq) {
    $fuzzpro->process_seq($seq);
    $taxon_processor->process_seq($seq);
    $out->write_seq($seq);
}
|
BITS::Training::SeqProcessor::Species
|
package BITS::Training::SeqProcessor::Species;
use strict;
use Bio::DB::Taxonomy;

# one shared Entrez taxonomy handle for all processor objects
my $TAXON_DB = Bio::DB::Taxonomy->new( -source => 'entrez' );

sub new {
    my ( $class, @args ) = @_;

    # looks stupid but is for later use ;-)
    my ($query_value) = @args;

    # in the other case, presume it is a taxon name
    my $query_key = '-name';

    # if only digits, presume it is a taxon id
    if ( $query_value =~ /^\d+$/ ) {
        $query_key = '-taxonid';
    }

    my $taxon = $TAXON_DB->get_taxon( $query_key => $query_value );
    my $self = { _species => $taxon };

    # bless into $class (not the current package) so subclasses work
    return bless $self, $class;
}

sub process_seq {
    my ( $self, $seq ) = @_;
    $seq->species( $self->{_species} );

    # return sequence object
    return $seq;
}

1;
|