Subroutines and libraries

From BITS wiki

A program often has to repeat the same series of actions several times. Copying the same lines of code each time is inconvenient, especially if you want to be able to modify the code easily. This is what subroutines are for. Even when no code is duplicated, many professional programmers use subroutines to keep the "main program" short and easy to read.

Note that (again) Perl is very flexible:

  • In most programming languages the number and type of the arguments that can be passed to a subroutine are fixed in advance (functions for formatted printing, such as fprintf in C, are a notable exception). In Perl you can pass any number of values to a subroutine. They arrive in the special array @_ (which, you might say, is for the subroutine what @ARGV is for the program itself).
  • By default a variable is "global": it is "visible" in the complete program and in all its subroutines. It is however possible (and recommended) to declare "local" variables, which are only visible inside a subroutine; this avoids unexpected interference with a variable of the same name somewhere else in the program. You declare such a variable with my.
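
Both points can be tried out with a minimal sketch. The subroutine name total and the variable names are hypothetical, chosen only for this illustration; the sketch assumes nothing beyond a standard Perl installation.

```perl
#!/usr/bin/perl

$sum = 100;                  # global variable of the main program

# Hypothetical helper: adds up however many numbers it receives.
sub total {
  my $sum = 0;               # local variable, hides the global $sum inside total only
  foreach my $value (@_) {   # @_ holds all the arguments of the call
    $sum += $value;
  }
  return $sum;
}

print total(1, 2, 3), "\n";      # three arguments
print total(4, 5, 6, 7), "\n";   # four arguments, same subroutine
print "$sum\n";                  # the global $sum is still 100
```

If you remove the my inside total, the subroutine works on the global $sum instead, and the last line no longer prints 100.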

As an example we will write and use a subroutine that does the following: given a collection of strings of text (which may be sequences), provided as an array, find the position of the shortest one and give a warning if it is shorter than a given threshold. Write and try out the following program:

#!/usr/bin/perl

@strings = ('AACGT', 'CGT', 'CCTGAC');
$threshold = 5;
($pos, $warn) = &shortest($threshold, @strings);
print "shortest is $strings[$pos]\n";
if ($warn) { print "   is shorter than $threshold !\n"; }

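sub shortest {
  my ($thr, @str) = @_;      # first argument is the threshold, the rest are the strings
  my $pos = 0;               # position of the shortest string found so far
  my $n = 0;                 # position of the string being examined
  my $warn;                  # stays undefined unless the shortest string is below the threshold
  my $len = length $str[0];  # length of the shortest string found so far
  foreach my $str (@str) {   # my keeps the loop variable local as well
    if (length $str < $len) {
      $pos = $n;
      $len = length $str;
    }
    $n++;
  }
  if ($len < $thr) { $warn = 1; }
  return $pos, $warn;
}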

Some explanation:

  • You can assign values to a whole list of variables in a single statement, e.g. by writing ($x, $y) = ($a, $b). This is what we use when we write
($pos, $warn) = &shortest($threshold, @strings);

which calls the subroutine shortest and stores the values "returned" by the subroutine in a list of variables. Note, however, that the first array in the list on the left "eats up" all the remaining values of the list on the right. So you can put only one array in such a list, and it should be in the last position.

  • A user-defined subroutine can be called with an ampersand '&' before its name, as is done here; in current versions of Perl the ampersand is optional when the call is written with parentheses. The lines
  ($pos, $warn) = &shortest($threshold, @strings);
  .....
    my ($thr, @str) = @_;
  .....
  return $pos, $warn;
    do the following:
    • the value of variable $threshold is assigned to the local variable $thr of subroutine shortest
    • the values in array @strings are assigned to the local array @str of subroutine shortest
    • the value of local variable $pos is returned by subroutine shortest and assigned to global variable $pos (which is not the same variable!)
    • the value of local variable $warn is returned by subroutine shortest and assigned to global variable $warn (again, not the same variable!)
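
The list assignment used in the explanation can also be tried out on its own. The following is only a sketch with made-up values and hypothetical variable names; it shows what happens when the array is in the last position and when it is not.

```perl
#!/usr/bin/perl

# Hypothetical values, only for illustration.
($first, @rest) = (10, 20, 30);   # $first gets 10, @rest gets (20, 30)
(@all, $last)   = (10, 20, 30);   # @all takes everything, nothing is left for $last

print "first=$first rest=@rest\n";
print "all=@all\n";
print "last is undefined\n" unless defined $last;
```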

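Note that, as a consequence, you can pass only one array to a subroutine in this way: all arguments are flattened into the single list @_, so the subroutine cannot tell where one array ends and the next begins. Note also that shortest works on copies of the values, made by my ($thr, @str) = @_, so it does not alter the variables of the main program (unless you write code inside the subroutine that operates directly on the elements of @_ or on a "global" variable).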
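
A small sketch may make this concrete. The subroutine inspect and the arrays @dna and @rna are hypothetical names invented for this illustration.

```perl
#!/usr/bin/perl

# Hypothetical subroutine: changes its own copy and reports how many values arrived.
sub inspect {
  my @values = @_;          # copy of the complete argument list
  $values[0] = 'changed';   # alters only the local copy
  return scalar @values;    # number of values received
}

@dna = ('ACGT', 'TTGA');
@rna = ('ACGU', 'UUGA', 'GGCC');
$count = inspect(@dna, @rna);   # the two arrays arrive as one list

print "received $count values\n";
print "first element of dna is still $dna[0]\n";
```

Inside inspect there is no way to see that the five values came from two different arrays.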

If you are interested, you can make the program longer and call the subroutine "shortest" more than once. You could also try to make the code of "shortest" more efficient (note that it calls the length function more often than strictly needed).
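
One possible way to do both is sketched below; it is not the only solution. The name shortest_once is hypothetical, and the sketch assumes the same arguments and return values as shortest. It computes the length of each string only once and is called twice with different thresholds.

```perl
#!/usr/bin/perl

# Hypothetical variant of shortest: same arguments and return values,
# but length is called only once for every string.
sub shortest_once {
  my ($thr, @str) = @_;
  my $pos = 0;
  my $len = length $str[0];
  foreach my $n (1 .. $#str) {     # the first string is already handled
    my $cur = length $str[$n];
    if ($cur < $len) {
      $pos = $n;
      $len = $cur;
    }
  }
  my $warn;
  if ($len < $thr) { $warn = 1; }
  return $pos, $warn;
}

@strings = ('AACGT', 'CGT', 'CCTGAC');
foreach $threshold (5, 3) {        # call the subroutine more than once
  ($pos, $warn) = shortest_once($threshold, @strings);
  print "shortest is $strings[$pos]\n";
  if ($warn) { print "   is shorter than $threshold !\n"; }
}
```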

Now, what if you have written a nice subroutine that you want to use in more than one program, or that you might even want to share with your friends? Well, that is what library modules are for.

You should now spread your code over two different files (and add some extra lines). Ask the teacher for advice if you do not find a way to do this efficiently (without retyping everything). The subroutine should go into a file called exercises.pm with the following content:

sub shortest {
  my ($thr, @str) = @_;
  my $pos = 0;               # start at 0, so that a defined position is always returned
  my $n = 0;
  my $warn;
  my $len = length $str[0];
  foreach my $str (@str) {
    if (length $str < $len) {
      $pos = $n;
      $len = length $str;
    }
    $n++;
  }
  if ($len < $thr) { $warn = 1; }
  return $pos, $warn;
}

1;   # a module file must end with a true value

And the program itself becomes:

#!/usr/bin/perl

use exercises;

@strings = ('AACGT', 'CGT', 'CCTGAC');
$threshold = 5;
($pos, $warn) = &shortest($threshold, @strings);
print "shortest is $strings[$pos]\n";
if ($warn) { print "   is shorter than $threshold !\n"; }

A use statement is executed at compile time, that is, before any of the ordinary statements of the program. use exercises means that the file exercises.pm (pm stands for "Perl module") is loaded and its content is executed. Note that a module usually contains only declarations of variables, subroutines and objects, so that actually nothing is executed. Note also that the module must return a 'true' value, which is why the file ends with a statement that evaluates to 1. The developers of Perl foresaw that it might be useful to perform some tests to make sure that the module works properly. In practice this feature is almost never used.
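
That use really takes effect before the rest of the program can be observed with a small sketch. It assumes only the core module List::Util, which is loaded here purely as an example; the array @order is a hypothetical name.

```perl
#!/usr/bin/perl

push @order, 'main program';   # runs at run time, although it comes first in the file

use List::Util;                # core module, loaded at compile time
BEGIN {
  # BEGIN blocks also run at compile time; by now List::Util is registered in %INC
  push @order, 'compile time' if $INC{'List/Util.pm'};
}

print join(' -> ', @order), "\n";   # prints: compile time -> main program
```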

One problem remains: in its current state the program only works if exercises.pm is in the current directory. You can easily convince yourself of this by running the program from a different working directory. You will probably get an error message like the following:

Can't locate exercises.pm in @INC (@INC contains: C:/Dwimperl/perl/site/lib C:/Dwimperl/perl/vendor/lib .C:/Dwimperl/perl/lib) at Perl\shortest.pl line 3.
BEGIN failed--compilation aborted at Perl\shortest.pl line 1.

The Perl interpreter searches for library modules in a list of directories defined in the Perl installation (the array @INC shown in the error message). In older versions of Perl this list also contains the current directory; since Perl 5.26 it no longer does by default. So, do the following: create a directory mylib inside the directory exercises-perl, move exercises.pm into it and add a new line at the beginning of your program, so that it becomes:

#!/usr/bin/perl

use lib "/Users/BITS/My Documents/Perl/mylib";
use exercises;
...

The Perl module lib.pm is part of the standard Perl distribution. The command use lib xxx,yyy,... adds the directories xxx, yyy, ... to the front of the list of directories where Perl searches for libraries. Note that use lib ... must be written before the use statements of the modules that are stored in these directories.
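
If you prefer not to write a full path in the program, a commonly used alternative is the core module FindBin, which provides the directory of the running script in the variable $Bin. The following is only a sketch; it assumes that the directory mylib is located next to the script.

```perl
#!/usr/bin/perl

use FindBin qw($Bin);    # $Bin is the directory that contains the running script
use lib "$Bin/mylib";    # assumption: mylib sits in the same directory as the script

# The directory has been added to the search list @INC.
print "modules are searched first in: $INC[0]\n";
```

With this variant the program keeps working when the whole directory is moved or when the program is started from another working directory.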

Before we move to the next exercise, one more remark about local and global variables. The "scope" of a "local" variable declared with my is actually limited to the "block" of code, delimited by curly braces {...}, in which the variable is declared. Perl programmers often write use strict; at the beginning of the program. "strict" is a so-called "pragma", which modifies the way the Perl interpreter works. In this case it produces an error if you try to use a variable that has not been declared. This protects you from the common error of mistyping a variable name, so that you do not operate on the variable you intended but silently create a new one instead. If you work under "strict" and want a variable to be "global", you must declare it with our instead of my.
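
The block scope of my and the effect of our can be seen in the following sketch. The variable names are hypothetical and serve only as illustration.

```perl
#!/usr/bin/perl
use strict;

our $global = 'visible everywhere';   # "global" variable, allowed under strict
my $outer = 'outside';

{
  my $inner = 'inside the block';     # exists only until the closing brace
  my $outer = 'shadow';               # a new variable that hides the outer one here
  print "$inner / $outer / $global\n";
}

print "$outer / $global\n";           # the outer $outer was never modified
# print $inner;   # if uncommented, compilation fails: $inner is not declared here
```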

Modify the program so that it becomes:

#!/usr/bin/perl

use lib "/Users/BITS/My Documents/Perl/mylib";
use exercises;
use strict;

my @strings = ('AACGT', 'CGT', 'CCTGAC');
my $threshold = 5;
my ($pos, $warn) = &shortest($threshold, @strings);
print "shortest is $strings[$pos]\n";
if ($warn) { print "   is shorter than $threshold !\n"; }

You can check that omitting a my now produces an error message. Instead of my you can also write our.
