Bioperl introductory training

From BITS wiki

Bioperl

Bioperl is a collection of perl modules that facilitate the development of perl scripts for bioinformatics applications. As such, it does not include ready-to-use programs in the sense that many commercial packages and free web-based interfaces (e.g. Entrez, SRS) do. Instead, bioperl provides reusable perl modules for sequence manipulation, for accessing databases in a range of data formats, and for running and parsing the results of various molecular biology programs, including Blast, clustalw, TCoffee, genscan, ESTscan and HMMER. Consequently, bioperl enables developing scripts that can analyze large quantities of sequence data in ways that are typically difficult or impossible with web-based systems.
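
As a rough illustration of the sequence manipulation mentioned above, here is a minimal sketch using the Bio::Seq module. It assumes bioperl is installed; the sequence and the id demo_seq are made up for this example and are not part of the training material.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Seq;    # requires a bioperl installation

# Toy sequence invented for illustration only.
my $seq = Bio::Seq->new(
    -id       => 'demo_seq',
    -seq      => 'ATGAAATAG',
    -alphabet => 'dna',
);

# Each method returns a value or a new sequence object; the original is unchanged.
print 'length:             ', $seq->length,         "\n";
print 'reverse complement: ', $seq->revcom->seq,    "\n";
print 'translation:        ', $seq->translate->seq, "\n";
```

The same object can be handed to other bioperl modules, which is what makes the modules reusable across scripts.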

Training outline

See [1]

Exercises

Day 1

Bioperl Training Exercise 1: perldoc

Bioperl Training Exercise 2: thou shalt not forget

Bioperl Training Exercise 3: arrays

Bioperl Training Exercise 4: hashes

Bioperl Training Exercise 5: packages and modules 1

Bioperl Training Exercise 6: packages and modules 2

Bioperl Training Exercise 7: complex data structures

Bioperl Training Exercise 8: OOP

Bioperl Training Exercise 9: inheritance, polymorphism

Bioperl Training Exercise 10: aggregation, delegation

Scratch pad

  • references: create a module containing a subroutine that sorts arrays by length. Show the Schwartzian Transform solution as well.
  • A pragma is a module which influences some aspect of the compile-time or run-time behaviour of Perl, e.g. use strict.
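
One possible reading of the two scratch pad notes above, as a sketch rather than the official exercise solution: the package name SortByLength and both subroutine names are hypothetical, and "sorting arrays on length" is assumed to mean ordering a list of array references by their element count. The strict and warnings pragmas are switched on at the top.

```perl
#!/usr/bin/env perl
use strict;      # pragma: refuse undeclared variables and symbolic references
use warnings;    # pragma: report suspicious constructs

# Hypothetical package name; in the exercise this would live in its own .pm file.
package SortByLength;

# Plain approach: the length is recomputed on every comparison.
sub sort_naive {
    my ($arrays) = @_;    # reference to an array of array references
    return [ sort { scalar(@$a) <=> scalar(@$b) } @$arrays ];
}

# Schwartzian Transform: compute each length once, sort on it, then unwrap.
sub sort_schwartzian {
    my ($arrays) = @_;
    return [
        map  { $_->[1] }                 # 3. keep only the original array ref
        sort { $a->[0] <=> $b->[0] }     # 2. sort on the cached length
        map  { [ scalar @$_, $_ ] }      # 1. pair each array ref with its length
        @$arrays
    ];
}

package main;

my @data   = ( [ 1, 2, 3 ], [4], [ 5, 6 ] );
my $sorted = SortByLength::sort_schwartzian( \@data );

# Print the element count of each array in sorted order.
print join( ' ', map { scalar @$_ } @$sorted ), "\n";
```

For a key as cheap as an array length the transform gains little; it pays off when the sort key is expensive to compute, since each key is then calculated only once.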

Day 2

  • bioperl ships with a number of ready-to-use scripts. Their names all start with bp_. On the command line, type bp_ and press <tab> to list them all.
    Use regular perldoc for information, e.g. perldoc bp_split_seq
    Scripts that might be useful:
    • bp_sreformat: sequence and alignment format conversion. bp_seqconvert is similar but only allows sequence conversions.
    • bp_search2table: generate table output from e.g. blast report
    • bp_taxid4species: return a list of taxa ids for requested organisms
    • bp_mutate: randomly mutagenize a single protein or DNA sequence
    • etc ...
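
Where tab completion is not available, the list of bp_ scripts can be approximated from Perl itself. The sketch below uses only core modules; the subroutine name scripts_with_prefix is hypothetical, and it assumes the scripts are installed as executable files somewhere on the PATH.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Return the sorted, unique names of executable files in the given
# directories whose names start with the given prefix.
sub scripts_with_prefix {
    my ( $prefix, @dirs ) = @_;
    my %seen;
    for my $dir (@dirs) {
        next unless defined $dir && length $dir;
        opendir( my $dh, $dir ) or next;    # skip missing or unreadable directories
        for my $name ( readdir $dh ) {
            next unless index( $name, $prefix ) == 0;
            my $path = File::Spec->catfile( $dir, $name );
            $seen{$name} = 1 if -f $path && -x $path;
        }
        closedir $dh;
    }
    my @names = sort keys %seen;
    return @names;
}

# File::Spec->path returns the directories listed in the PATH variable.
print "$_\n" for scripts_with_prefix( 'bp_', File::Spec->path );
```

Each name printed can then be passed to perldoc as described above.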

Bioperl Training Exercise 11: IO, add annotation, run EMBOSS application

Bioperl Training Exercise 12: Create a fuzzpro processor module

Bioperl Training Exercise 13: Create a taxon processor module

Bioperl Training Exercise 14: Create a reference processor module

Bioperl Training Exercise 15: Magical processor module

References, Sources, ...

Dependencies

  • Class::Inspector
  • Getopt::Long
  • Pod::Usage
  • perltidy
  • Geany
  • EMBOSS

Author

Marc Logghe

Inspired by