Bioperl Training Exercise 11

Input: a protein FASTA file.
Save the file File:Proteins.txt as proteins.fa.
Launch EMBOSS' fuzzpro from your script to look for a pattern (e.g. a potential N-glycosylation site). The output should be in GFF format and written to a temporary file. Hints:
1. use Data::Dumper to figure out what kind of object your io object actually is
2. use the methods script to find a method to create a temporary file.
3. use perldoc to figure out how that method works
4. the fuzzpro analysis run() method takes a hash as its argument. The keys should be the actual EMBOSS fuzzpro option names. Some of the possible keys can be obtained using the acd() method. However, the key that sets the report format (GFF) is missing from that list; it's -rformat2 (see the fuzzpro EMBOSS documentation).
5. The pattern for an N-glycosylation site is N-{P}-[ST]
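
To make the pattern notation in hint 5 concrete, here is a minimal sketch in plain Perl (no BioPerl or EMBOSS involved) that translates N-{P}-[ST] into a regular expression and reports where it matches. The helper names prosite_to_regex and find_sites and the test sequence are made up for illustration; fuzzpro does its own pattern handling, so this only shows what the notation means.

```perl
use strict;
use warnings;

# Translate a PROSITE-style pattern into a Perl regex string.
# Handles only the elements used here: single residues, {..} exclusions,
# [..] alternatives, and 'x' wildcards. (Hypothetical helper.)
sub prosite_to_regex {
    my ($pattern) = @_;
    my @parts;
    for my $element (split /-/, $pattern) {
        if ($element =~ /^\{([A-Z]+)\}$/) {      # {P}  -> [^P]
            push @parts, "[^$1]";
        }
        elsif ($element =~ /^\[([A-Z]+)\]$/) {   # [ST] -> [ST]
            push @parts, "[$1]";
        }
        elsif ($element eq 'x') {                # x    -> any residue
            push @parts, '.';
        }
        elsif ($element =~ /^[A-Z]$/) {          # N    -> N
            push @parts, $element;
        }
        else {
            die "Unsupported pattern element: $element\n";
        }
    }
    return join '', @parts;
}

# Return the 1-based start positions of all matches, including overlapping
# ones (the lookahead keeps each match zero-width). (Hypothetical helper.)
sub find_sites {
    my ($sequence, $regex) = @_;
    my @starts;
    while ($sequence =~ /(?=$regex)/g) {
        push @starts, pos($sequence) + 1;
    }
    return @starts;
}

my $regex = prosite_to_regex('N-{P}-[ST]');
my @sites = find_sites('MNGTANPSNKS', $regex);   # made-up test sequence
print "regex: $regex\n";
print "sites: @sites\n";
```

For the made-up sequence this prints the regex N[^P][ST] and the start positions 2 and 9; the N at position 6 is skipped because it is followed by P.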
Convert the GFF to features and add them to the sequence. Change every found feature:
1. add note tag to indicate the used program (fuzzpro)
2. add note tag to indicate the used pattern
3. add label 'N-glycosylation'
4. change the feature type to the Sequence Ontology term 'protein_match'
Write the annotated sequence to a GenBank file.
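
The feature changes listed above can be pictured with a small plain-Perl sketch that works on a single GFF2 line. In the exercise itself Bio::Tools::GFF does the parsing and the feature object stores the tags; the hash layout, the helper names parse_gff2_line and annotate, and the sample line below are assumptions for illustration only, and real fuzzpro output may differ in detail.

```perl
use strict;
use warnings;

# Split one GFF2 line into its nine tab-separated columns.
# The attribute column (column 9) is ignored in this sketch.
sub parse_gff2_line {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line, 9;
    die "Expected 9 columns, got " . scalar(@col) . "\n" unless @col == 9;
    my %feature;
    @feature{qw(seqname source type start end score strand frame)} = @col[0 .. 7];
    $feature{tags} = {};   # tag name => array ref of values
    return \%feature;
}

# Apply the four changes from the exercise to a parsed feature.
sub annotate {
    my ($feature, $program, $pattern) = @_;
    $feature->{type} = 'protein_match';                      # Sequence Ontology term
    push @{ $feature->{tags}{note} },  "algorithm:$program"; # program used
    push @{ $feature->{tags}{note} },  "pattern:$pattern";   # pattern used
    push @{ $feature->{tags}{label} }, 'N-glycosylation';    # label
    return $feature;
}

# Hypothetical GFF2 line, not actual fuzzpro output.
my $line = join "\t", 'seq1', 'fuzzpro', 'misc_feature', 2, 4,
                      '3.000', '+', '.', 'Sequence "seq1.1"';
my $feature = annotate(parse_gff2_line($line), 'fuzzpro', 'N-{P}-[ST]');
print join("\t", @{$feature}{qw(seqname type start end)}), "\n";
print "note: $_\n" for @{ $feature->{tags}{note} };
```

Keeping each tag as a list of values mirrors the fact that a feature can carry the same tag (here note) more than once.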

fuzzpro.pl

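#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Bio::SeqIO;
use Bio::Factory::EMBOSS;
use Bio::Tools::GFF;

# io object to read in the fasta from 'proteins.fa'
my $in = Bio::SeqIO->new(-format => 'fasta', -file => '< proteins.fa');
# io object to write the annotated sequences as genbank to STDOUT
my $out = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT);

# N-glycosylation pattern argument for fuzzpro
my $pattern = 'N-{P}-[ST]';
my $fuzzpro = Bio::Factory::EMBOSS->new->program('fuzzpro');
#print Data::Dumper->Dump([$fuzzpro->acd],['fuzzpro']);exit;

# assumed loop over the input sequences (missing from the listing)
while (my $seq = $in->next_seq()) {
    # create temporary file for fuzzpro output
    # (assumed: tempfile() as inherited from Bio::Root::IO)
    my ($fh, $filename) = $in->tempfile();

    # assumed call, built from hint 4: keys are the EMBOSS fuzzpro options
    $fuzzpro->run({
        -sequence => $seq,
        -pattern  => $pattern,
        -outfile  => $filename,
        -rformat2 => 'gff',
    });

    # read the GFF report back and convert each line to a feature
    my $gffio = Bio::Tools::GFF->new(-fh => $fh, -gff_version => 2);
    while (my $feature = $gffio->next_feature()) {
        $feature->primary_tag('protein_match');
        $feature->add_tag_value(note => 'algorithm:fuzzpro');
        $feature->add_tag_value(note => "pattern:$pattern");
        $feature->add_tag_value(label => 'N-glycosylation site');
        # assumed: attach the feature to the sequence
        $seq->add_SeqFeature($feature);
    }
    # assumed: write the annotated sequence
    $out->write_seq($seq);
}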
