Introduction to Perl one-liners

From BITS wiki

You have now learned that you can put a Perl program in a file xxx.pl and run it by typing the command perl xxx.pl in a DOS box or a UNIX terminal. It is also possible to type perl -e "XXX", where XXX is a short Perl program written on one line. Some people dislike Perl one-liners, others love them, and some are so proficient that they write one-liners nobody else can understand. Still, one-liners can be very useful. UNIX offers tools such as awk, cut, sort and sed for manipulating text files, but DOS and Windows have no comparable tools readily available, so there Perl is very handy. Even under UNIX it can be convenient to always use Perl: you only have to learn one tool, and Perl can handle tasks that are hard to express with the simpler tools.

We will give a few examples. First, to use Perl as a calculator, type in a DOS box:

perl -e "print 9**9"

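to find out what 9 to the power 9 is (** is Perl's exponentiation operator). Note that under DOS the program must be surrounded by double quotes ", while under UNIX you can choose between double quotes " and single quotes '. Under UNIX, single quotes are usually the better choice: inside double quotes the shell itself interprets characters such as $, so a one-liner that uses $_ or $F[0] would be altered before Perl sees it. Note also that if there are double quotes inside your program, you must escape them with \ so that they are not mistaken for the terminating ", e.g.: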

perl -e "print \"Perl is fun !\n\""

The command perl -ne "XXX" yyy, where yyy is a file, reads the file line by line, puts each line in the special variable $_ and runs the code XXX on it. The command perl -nae "XXX" yyy additionally splits each line into words (separated by whitespace) and puts them in the array @F. Try the following:

perl -ne "print" text4perl

to get the complete content of the file

perl -nae "print \"$F[0]\n\"" text4perl

to get the first word on each line
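Written out as a full script, the two switches correspond roughly to the loop below: according to perlrun, -n wraps your code in while (<>) { ... } and -a adds an implicit split(' ') into @F at the start of each iteration. This is a minimal sketch that reads from an in-memory string instead of a file so that it runs on its own; the sub name first_words and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Roughly what  perl -nae "print \"$F[0]\n\"" file  does, packaged as a sub
# that reads any filehandle. first_words is a hypothetical name.
sub first_words {
    my ($fh) = @_;
    my @words;
    local $_;                     # keep the caller's $_ untouched
    while (<$fh>) {               # -n: each line is read into $_
        my @F = split ' ', $_;    # -a: split $_ on whitespace into @F
        push @words, $F[0] // ''; # a blank line gives an empty @F
    }
    return @words;
}

# Made-up sample lines, read through an in-memory filehandle.
my $sample = "alpha beta gamma\n  delta epsilon\nzeta\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
print "$_\n" for first_words($fh);    # alpha, delta, zeta on separate lines
close $fh;
```

Note that split ' ' (a single-space string) is a special case: like awk, it ignores leading whitespace, which is why the indented second line yields delta rather than an empty first field.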

perl -ne "if (5..8) {print}" text4perl

to get lines 5 to 8
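In the condition of an if, 5..8 is not a list but Perl's flip-flop operator; perlop documents that when an operand is a constant, it is compared with $., the current input line number. The sketch below, using made-up numbered lines, selects lines both with the flip-flop and with an explicit comparison of $., so the two can be compared.

```perl
use strict;
use warnings;

# Twelve made-up numbered lines, read through an in-memory filehandle.
my $text = join '', map { "line $_\n" } 1 .. 12;
open my $fh, '<', \$text or die "cannot open: $!";

my (@flipflop, @explicit);
while (<$fh>) {
    # What  perl -ne "if (5..8) {print}"  tests: in boolean context, 5..8
    # is the flip-flop operator, and constant operands are compared with $.
    push @flipflop, $_ if 5 .. 8;
    # The same selection spelled out with the line number $.
    push @explicit, $_ if $. >= 5 && $. <= 8;
}
close $fh;

print @flipflop;    # prints "line 5" to "line 8", one per line
```

The explicit form is easier to read for beginners; the flip-flop becomes more useful when its operands are patterns, e.g. to select everything from one marker line to another.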

perl -ne "$N++ ; if (eof) {print \"$N\n\"}" human_mouse_equivalents

to count how many lines the file has (replace human_mouse_equivalents with the name you gave to the output file of exercise 5). eof returns true once the end of the file has been reached, so $N is printed only once, after the last line.
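As a script, the counting one-liner is a loop that increments a counter and checks eof after every line. A minimal sketch, assuming an in-memory sample instead of the real output file of exercise 5; count_lines is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines the way
#   perl -ne "$N++ ; if (eof) {print \"$N\n\"}" file
# does, by incrementing a counter and reporting it at end-of-file.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    local $_;
    while (<$fh>) {
        $n++;
        return $n if eof $fh;   # true right after the last line was read
    }
    return $n;                  # empty input: the loop body never runs
}

my $sample = ">seq1\nACGT\n>seq2\nGGCC\n";   # made-up 4-line sample
open my $fh, '<', \$sample or die "cannot open sample: $!";
print count_lines($fh), "\n";              # prints 4
close $fh;
```

Perl also keeps the current line number in the special variable $., so at end-of-file $. holds the same count; the explicit counter simply mirrors the one-liner.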

perl -ne "if (/^Model Selected:/) {print}" modelgenerator.out

to get the lines that match a regular expression

perl -nae "if (/^Model Selected:/) {print \"$F[2]\n\"}" modelgenerator.out

to see only the names of the models ($F[2] is the third word on the line, because array indices start at 0)
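The same two steps written out as a script: skip lines that do not match the pattern, then split the rest into words and keep the third one. This is a sketch only; the sample text is made up and does not reproduce real modelgenerator.out output, and selected_models is a hypothetical name.

```perl
use strict;
use warnings;

# Mirrors  perl -nae "if (/^Model Selected:/) {print \"$F[2]\n\"}" file
# selected_models is a hypothetical helper name.
sub selected_models {
    my ($fh) = @_;
    my @models;
    local $_;
    while (<$fh>) {
        next unless /^Model Selected:/;   # ^ anchors the match at line start
        my @F = split ' ', $_;            # "Model", "Selected:", name, ...
        push @models, $F[2];              # index 2 is the third word
    }
    return @models;
}

# Made-up input: only lines that start with "Model Selected:" count.
my $sample = "some header\n"
           . "Model Selected: modelA\n"
           . "  Model Selected: indented, so no match\n"
           . "Model Selected: modelB\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
print "$_\n" for selected_models($fh);    # modelA, then modelB
close $fh;
```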

And now a more complex example. The file domains.fasta contains a list of sequence ranges that might have been extracted from a database. We want to give it as input to a program, but there is a problem: some sequences have the same name, and some programs do not accept that. We will therefore append a number to each name. Start by taking a look at the file domains.fasta. You can do this with Notepad, with the DOS command type, or of course with:

perl -ne "print" domains.fasta

We want to count how many times each sequence identifier occurs. Thinking back to exercise 4, you can do:

perl -ne "if (/^>/) {$id{$_}++; print $id{$_}}" domains.fasta

For each line that starts with >, we use the content of the line (which is in $_) as the key of a hash that stores how many times we have read that line so far. You should get a long string of 1s, with a 2 or a 3 here and there. Now we must append an underscore _ followed by that number to the end of each header line:

perl -ne "if (/^>/) {$id{$_}++;s/$/_$id{$_}/};print" domains.fasta
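The complete renaming one-liner written out as a script, to show where the hash and the substitution sit inside the loop. A minimal sketch, assuming a small made-up FASTA sample rather than the real domains.fasta; number_headers is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper doing what
#   perl -ne "if (/^>/) {$id{$_}++;s/$/_$id{$_}/};print" domains.fasta
# does: each header line (starting with >) gets _N appended, where N counts
# how often that exact header has been seen so far. Other lines pass through.
sub number_headers {
    my ($fh) = @_;
    my %id;                       # header line => times seen so far
    my @out;
    local $_;
    while (<$fh>) {
        if (/^>/) {
            $id{$_}++;            # the key is the whole line, newline included
            s/$/_$id{$_}/;        # $ matches just before the trailing newline
        }
        push @out, $_;
    }
    return @out;
}

my $sample = ">seqA\nACGT\n>seqB\nGGTT\n>seqA\nACGA\n";   # made-up sample
open my $fh, '<', \$sample or die "cannot open sample: $!";
print number_headers($fh);    # >seqA_1, ACGT, >seqB_1, GGTT, >seqA_2, ACGA
close $fh;
```

Only the hash of header lines is kept in memory; sequence lines are passed through one at a time.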

If you want to print every line of the file you are processing, possibly after modifying it, you can simply add the switch -p to the perl command instead of putting a print statement at the end of the program:

perl -npe "" text4perl
perl -npe "if (/^>/) {$id{$_}++;s/$/_$id{$_}/}" domains.fasta
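perlrun describes -p as wrapping your code in the same while (<>) loop as -n (a -p overrides a -n, so the n is optional here), with print placed in a continue block: the possibly modified $_ is printed after your code has run, even when the code uses next. The sketch below imitates that loop; run_like_p is a hypothetical helper that collects the lines instead of printing them, and the FASTA sample is made up.

```perl
use strict;
use warnings;

# Imitates the loop that -p puts around the code given to -e.
# run_like_p is a hypothetical helper: it collects what -p would print.
sub run_like_p {
    my ($code, $fh) = @_;
    my @printed;
    local $_;
    while (<$fh>) {
        $code->();              # your one-liner code, working on $_
    } continue {
        push @printed, $_;      # -p does: print or die "-p destination: $!\n"
    }
    return @printed;
}

# Equivalent of: perl -npe "if (/^>/) {$id{$_}++;s/$/_$id{$_}/}" domains.fasta
my %id;
my $sample = ">seqA\nACGT\n>seqA\nACGA\n";    # made-up sample
open my $fh, '<', \$sample or die "cannot open sample: $!";
print run_like_p(sub { if (/^>/) { $id{$_}++; s/$/_$id{$_}/ } }, $fh);
close $fh;
```

With an empty code block, as in perl -npe "" text4perl, every line is printed unchanged.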

If the data are produced by a running program, we can process them on the fly with a Perl one-liner and store only the result in a file, instead of first storing the data in an intermediate file. Try:

type domains.fasta | perl -npe "if (/^>/) {$id{$_}++; s/$/_$id{$_}/}" > mydomains.fasta

On a DOS (or UNIX) command line, the pipe character | means that the output of the first program is not displayed on the screen and that the second program does not wait for input from the keyboard; instead, the output of the first program is passed as input to the second program. The > means that the output of the program is not displayed on the screen but written into a file. (Under UNIX, use cat instead of the DOS command type.)

Note the advantage of combining pipes and Perl one-liners: if the list of sequences had been very large, and produced by some database retrieval system rather than read from a file, we would avoid writing it to a big file and then reading that file back. Also, Perl keeps only the hash (associative array) of identifiers in memory, not the complete file content, which is what would happen if you opened the file in an interactive text editor.

If you search for "perl one liners" in Google or some other search engine, you will certainly find lots of other examples, some quite intricate.
