The Perl associative arrays

From BITS wiki

Besides the arrays already introduced in the exercise My_first_Perl_program, Perl also has so-called "associative arrays", usually just called hashes, in which each element is accessed not by a number but by a name, called the "key". The notation is:

	%xxx			the hash named xxx
	$xxx{yyy}		the element of %xxx associated with the key yyy
	exists $xxx{yyy}	returns "true" if the hash %xxx has an element with
				key yyy, even if the value of that element is undef
	keys %xxx		returns a list of the keys of %xxx, in no predictable
				order (hence it is often combined with the sort function)
	values %xxx		returns a list of the values of %xxx
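The short script below shows these functions at work. It is a minimal sketch with an illustrative hash %xxx whose keys (yyy, zzz, empty) are made up for the example. It contrasts exists, which only asks whether the key is present, with defined, which looks at the stored value, and sorts the keys so that the output is reproducible. Unlike the exercise programs, it uses `use strict` and declares variables with my.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative hash; the element with key 'empty' deliberately holds undef.
my %xxx = (yyy => 'first', zzz => 'second', empty => undef);

# exists is true because the key 'empty' is present, even though its value is undef ...
print "exists empty: ", (exists $xxx{empty} ? "yes" : "no"), "\n";
# ... whereas defined tests the value itself.
print "defined empty: ", (defined $xxx{empty} ? "yes" : "no"), "\n";
# A key that was never stored does not exist at all.
print "exists missing: ", (exists $xxx{missing} ? "yes" : "no"), "\n";

# keys returns the keys in no predictable order, so sort them before printing.
my @sorted_keys = sort keys %xxx;
print "keys: @sorted_keys\n";

# In scalar context, values returns the number of elements in the hash.
my $n_values = scalar(values %xxx);
print "number of values: $n_values\n";
```

The output is "exists empty: yes", "defined empty: no", "exists missing: no", "keys: empty yyy zzz" and "number of values: 3".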

You can assign key/value pairs to several elements of a hash at once with a statement such as:

%DBlongnames = ('sw' => 'SwissProt', 'em' => 'EMBL');

$DBlongnames{sw} will then return SwissProt.
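A small sketch of this kind of list assignment follows. The extra key gb and the hash %dup are illustrations only, not part of the exercise. It shows that the "fat comma" => is just a comma that also quotes a simple word on its left, that single elements can be added afterwards, and that a repeated key keeps only its last value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two key/value pairs stored with one list assignment. The "fat comma" =>
# works like a comma but also quotes a simple word on its left, so
# (sw => 'SwissProt') means the same as ('sw', 'SwissProt').
my %DBlongnames = ('sw' => 'SwissProt', 'em' => 'EMBL');

print "$DBlongnames{sw}\n";          # prints "SwissProt"

# One more element is added by a plain assignment to a new key (illustrative).
$DBlongnames{gb} = 'GenBank';

# If a key appears twice in a list assignment, the later value wins.
my %dup = (sw => 'first', sw => 'second');
print scalar(keys %dup), " $dup{sw}\n";   # prints "1 second"
```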

We will now use hashes to write a small program that counts how often each word occurs in a text, and try it out on the file text4perl. First, let us check that we can read a text and split it into separate words. Create a file wc.pl with the following content:

#!/usr/bin/perl
$text = `type $ARGV[0]`;
@words = split /\W+/, $text;
foreach $word (@words) {
  print "$word\n";
}

Note: if you are working under Mac OS X or Linux rather than Windows, write "cat" instead of "type".
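If you would rather not depend on an external command at all, Perl can also read the file itself. The sketch below is an alternative that the original exercise does not use, and the helper name read_file is hypothetical. It opens the file with open and reads it in "slurp" mode (with the input record separator $/ undefined), so the whole file ends up in one string; a line such as `$text = read_file($ARGV[0]);` could then take the place of the backquote line, on any platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the whole content of a file as one string,
# without calling "type" or "cat".
sub read_file {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
    local $/;                 # undefine the record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Only run the word-splitting part when a file name is given on the command line.
if (@ARGV) {
    my $text = read_file($ARGV[0]);
    print "$_\n" for split /\W+/, $text;
}
```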

Some explanation:

  • An expression `XXX` (enclosed in backquotes) makes the computer execute the command XXX as if you had typed it at the command prompt, and returns the text that the command would have printed on the screen. Note that on some keyboards, to obtain the backquote character '`' you must press the appropriate key while holding down <Alt Gr> and then press another key (e.g. the <space bar>).
  • The function split takes as input a regular expression and a string (here $text, a variable which, remember, in principle always contains text). It finds every substring that matches the regular expression, treats these substrings as separators, and returns the pieces of text between them. Here \W+ matches any run of non-word characters, so a word is a run of letters, digits and underscores (\W is the opposite of \w, which matches letters, digits and the underscore). If the text begins with a separator, the first piece returned is an empty string.
  • The Perl foreach syntax resembles the foreach loop of the UNIX C shell (csh). Here $word is the loop variable and (@words) a list of values; the statements between the {...} are executed once for each value in the list, with $word set to that value each time.
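To see split and foreach at work without a file, the sketch below splits an illustrative sentence held in a variable. The sample text is made up so that it starts with a separator (a quote mark) and contains an underscore and a digit. It shows the empty first piece that split returns in that case, that the_program counts as a single word because _ is a word character, and one simple way to skip empty pieces inside the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative text that begins with a separator (the opening quote mark).
my $text = "\"Hello, world!\" said the_program 2 times.";

# Any run of non-word characters is a separator.
my @words = split /\W+/, $text;

# Because the text starts with a separator, the first piece is an empty string.
# (Trailing empty pieces, e.g. after the final ".", are dropped by split.)
print "first piece is empty\n" if $words[0] eq '';

# foreach sets $word to each element of @words in turn.
foreach my $word (@words) {
    next if $word eq '';      # skip the empty leading piece
    print "$word\n";
}
```

This prints "first piece is empty" followed by Hello, world, said, the_program, 2 and times, each on its own line.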

Run the program:

perl wc.pl text4perl

Now complete the program so that it becomes:

#!/usr/bin/perl
$text = `type $ARGV[0]`;
@words = split /\W+/, $text;
foreach $word (@words) {
  $wc{$word}++;
}
foreach $word (sort keys %wc) {
  print "$word $wc{$word}\n";
}

And try it out! Note the increment operator ++, borrowed from the C language ($x++ is equivalent to $x = $x + 1). The first time a word is found, a new element of %wc springs into existence: its key is the word, and its value, initially undef (which ++ treats as 0), is immediately incremented to 1. Each time the same word is found again, $wc{$word} is incremented by 1. Admire how simple this program is! Consider how difficult it would be to program the same functionality in some other language you are familiar with.
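The same counting technique can be tried on a string in memory. In the sketch below, the subroutine name count_words and the sample sentence are hypothetical. It counts words exactly as wc.pl does (skipping the empty piece split can produce at the start) and prints the result twice: alphabetically, as in wc.pl, and, as an extra variant not in the exercise, sorted by decreasing count with ties broken alphabetically.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each word occurs in a string.
sub count_words {
    my ($text) = @_;
    my %wc;
    foreach my $word (split /\W+/, $text) {
        next if $word eq '';  # ignore the empty piece from a leading separator
        $wc{$word}++;         # undef counts as 0, so a new word starts at 1
    }
    return %wc;
}

my %wc = count_words("the cat and the dog and the bird");

# Alphabetical listing, as in wc.pl.
foreach my $word (sort keys %wc) {
    print "$word $wc{$word}\n";
}

# Variant: most frequent words first; equal counts are sorted alphabetically.
foreach my $word (sort { $wc{$b} <=> $wc{$a} || $a cmp $b } keys %wc) {
    print "$word $wc{$word}\n";
}
```

The first loop prints "and 2", "bird 1", "cat 1", "dog 1", "the 3"; the second prints "the 3", "and 2", "bird 1", "cat 1", "dog 1".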
