Perl overview and quick reference

Author: John M. Gabriele

    The official Perl documentation is world-class. That said, here's a pile of quick reminders to glance at when you've been using some other language for a while and want to quickly refresh your Perl-fu.

    Other places to look are:

    The Perl docs

    You can get at your perl docs using the perldoc command. Also, the docs are online at http://perldoc.perl.org/.

    You can always jump straight to the docs for a given function using the -f option, for example: perldoc -f sort. You can read the perldocs of a local file like so: perldoc ./myfile.pl (you can use the -F option here to speed things up).

    See perldoc perl to get the big table of contents of all the available perldoc pages.

    Fundamentals

    More on operators

    A little more on operators:

    For more, read perldoc perlop.

    Built-ins

    Perl comes with a pretty large number of built-in functions. They're also called operators when you don't put parentheses around their arguments. When you use a built-in that only takes one arg, and you don't use parentheses, it's called a named unary operator.

    Standard library

    Perl comes with many modules in its standard library. To see the list of them, run perldoc perlmodlib.

    Keywords

    The details are in perldoc perlsyn. Among others, you'll find info on:

    Note, you can label loops, if you like.

    LINE:
    for my $line ( @lines ) {
        #...
        if ( $thus_and_so ) { next LINE }
    }
    

    Incidentally, a bare block is the same as a loop that only loops once. Its contents have their own lexical scope, and you can exit it using last (or even start it over using redo). if and unless blocks are, of course, not loops.
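
    A minimal sketch of a bare block acting as a loop that runs once, with redo and last controlling it. The names $attempts and @log are made up for the illustration.

```perl
use strict;
use warnings;

my $attempts = 0;
my @log;

{
    $attempts++;
    push @log, "pass $attempts";
    redo if $attempts < 3;     # start the bare block over
    last if $attempts == 3;    # leave the bare block early
    push @log, "never reached";
}

print "$_\n" for @log;
```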

    When looping over a list, the variable being assigned to each item ($_ being the default) actually aliases the item. So, you can change the list on the fly.
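
    As a small illustration of that aliasing (the arrays @prices and @names are just sample data), modifying the loop variable modifies the underlying array:

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# $price aliases each element, so assigning to it changes @prices itself.
for my $price (@prices) {
    $price *= 2;
}

# The same holds for the default loop variable $_.
my @names = ('alice', 'bob');
$_ = ucfirst($_) for @names;

print "@prices\n";    # 20 40 60
print "@names\n";     # Alice Bob
```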

    Lexicals, globals, and scoping

    Perl provides 2 kinds of namespaces: "package" (i.e. "symbol tables"), and lexical. Package variables are globals (aka "package globals"), are dynamically scoped, and live in symbol tables. Lexicals are locals and live in unnamed lexical scopes. File scope is the largest possible lexical scope.

    When one subroutine calls another sub, the 2nd (the callee) is in the dynamic scope of the 1st (caller). When one block is inside another the inner block is in the lexical scope of the outer one. Note however that when you get to the end of a block, you leave the current lexical and dynamic scopes.

    Symbol tables are actually global hashes, and they contain the names of the variables in them. Lexicals are attached to an associated block and are unnamed. All the built-in globals (like @ARGV, %ENV, $$, etc.) are located in the main symbol table.

    Incidentally, within each namespace there's a sub-namespace for each sigil (that's why $foo and @foo are 2 different variables). If you want to refer to every "foo" in a symbol table, regardless of sigil, you use a "typeglob". Perl uses typeglobs to implement importing modules.

    A fully-qualified package variable name, like $Foods::Veg::Tomato::variety, shows you the structure of the nested symbol tables -- the most deeply-nested of which contains the variety scalar. It also indicates that the file path leading to Tomato.pm looks like Foods/Veg/Tomato.pm.
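
    A rough sketch of the ideas above, with the package declared inline instead of in its own Tomato.pm file (an assumption made to keep the example self-contained). It shows the separate slots per sigil, a fully-qualified name, and the symbol table being an ordinary hash:

```perl
use strict;
use warnings;

package Foods::Veg::Tomato;
our $variety = 'Roma';
our @variety = ('Roma', 'Beefsteak');    # same name, different sigil

package main;

# Fully-qualified names work from any package.
print "$Foods::Veg::Tomato::variety\n";
print scalar(@Foods::Veg::Tomato::variety), "\n";

# The symbol table is a hash named after the package plus a trailing "::".
my $has_entry = exists $Foods::Veg::Tomato::{variety} ? 1 : 0;
print "$has_entry\n";
```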

    When you call use, it imports package symbols into the current package (or else gives the compiler some hints (as "pragmas") for the current lexical scope). A package declaration is also lexically scoped, and declares the name of the current default package until the end of the current lexical scope (usually the file).

    Recall that calls to use happen at compile-time. Calls to require happen at run-time.
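
    To make the difference concrete, here is a short sketch using two core modules (List::Util and Scalar::Util were picked only because they ship with perl). The use line is handled while the file compiles and imports max; the require line runs when execution reaches it and imports nothing:

```perl
use strict;
use warnings;

# Compile time: load List::Util and import max into the current package.
use List::Util qw(max);

# Run time: load Scalar::Util when this statement executes.
# Nothing is imported, so the function is called by its full name.
require Scalar::Util;

my $biggest   = max(3, 9, 4);
my $is_number = Scalar::Util::looks_like_number('42') ? 1 : 0;

print "$biggest $is_number\n";
```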

    Aside from package and use (and require), the three operators dealing with lexicals and package globals are my, our, and local:

    my

    declares a lexically-scoped variable. Its name and value are both stored locally, only.

    our

    declares a lexically scoped name that refers to a package global. Using the above example, your Tomato.pm file would contain our $variety; in it. our does not create values -- it just gives you access to the global. You can, however, give access and create a value at the same time: our $foo = 7;.
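
    A small sketch contrasting my and our. The package Tomato is declared inline here, and get_secret is a hypothetical accessor added only for the demonstration. The our variable shows up in the package's symbol table; the my variable does not, and is reachable only through code in its lexical scope:

```perl
use strict;
use warnings;

{
    package Tomato;
    our $variety = 'Roma';      # package global: $Tomato::variety
    my  $secret  = 'hidden';    # lexical: lives only in this block
    sub get_secret { return $secret }
}

my $global_in_table  = exists $Tomato::{variety} ? 1 : 0;
my $lexical_in_table = exists $Tomato::{secret}  ? 1 : 0;

print "$Tomato::variety\n";                       # Roma
print "$global_in_table $lexical_in_table\n";     # 1 0
print Tomato::get_secret(), "\n";                 # hidden
```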

    local

    sets up a temporary value for a package global for only the current dynamic scope. That is, if you use local in a sub, and then call another sub, in that 2nd sub you're still in the same dynamic scope, and so still will see that localized value. Once the 1st sub returns, you're back to the pre-localized value. Of course, same thing happens if you come out of a lexical scope -- you're back to the value that the package global had before the scope you were just in.
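
    The behavior of local can be sketched as follows; the variable $level and the two subs are invented for the example. The callee sees the localized value, and the old value comes back once the localizing sub returns:

```perl
use strict;
use warnings;

our $level = 'outer';

sub show_level { return $level }

sub with_local {
    local $level = 'localized';    # temporary value for this dynamic scope
    return show_level();           # the callee still sees the localized value
}

my $during = with_local();
my $after  = show_level();         # the earlier value is back after the return

print "$during $after\n";          # localized outer
```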

    Context

    When perl is evaluating an expression, what type of value it expects to find (scalar or list) depends upon context.

    Context is determined at compile-time when perl parses your source code.

    If a scalar is put into a list context, it usually produces a one-element list.

    If a list-producing expression is put into a scalar context, it hopefully evaluates to something useful. For example, an array will yield the number of elements it contains.

    You can force a scalar context by using the scalar operator.

    In the docs, when you see something like sort LIST, it means that the sort operator provides a list context to its arguments. Furthermore, if the operator provides a list context to an argument, it also provides a list context to the elements of that list argument.
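
    A brief sketch of the same array evaluated in scalar and list context (@items is sample data):

```perl
use strict;
use warnings;

my @items = ('a', 'b', 'c');

my $count   = @items;            # scalar context: number of elements
my ($first) = @items;            # list context: first element of the list
my $forced  = scalar(@items);    # scalar context requested explicitly

print "$count $first $forced\n";        # 3 a 3
print "Have " . @items . " items\n";    # concatenation supplies scalar context
```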

    Commonly-used variables

    Besides $_, a few of them are:

    See chap. 28 of the Camel for more, or read perldoc perlvar.

    Quoting

    If you need a multi-line string, use the heredoc syntax:

    my $long_string = <<"END_OF_STRING";   # Note explicit quoting.
    la dee da
    va va va $voom
    ok, done
    END_OF_STRING
    

    Various built-in functions

    Perl comes with a healthy selection of built-in functions. For a great summary, see chap. 29 of the Camel.

    For dealing with regexes

    Remember that, inside the current regex (i.e. during the match), you use \1. Outside the match, you use $1.
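
    For instance, a sketch that finds a doubled word (the sample string is made up): \1 is used inside the pattern, and $1 is read after the match succeeds.

```perl
use strict;
use warnings;

my $text = 'this is is a test';
my $doubled;

# \1 refers back to the first capture group while the match is running.
if ( $text =~ /\b(\w+) \1\b/ ) {
    # $1 holds that same capture once the match has succeeded.
    $doubled = $1;
    print "doubled word: $doubled\n";
}
```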

    The /g regex modifier is for globally finding matches. That is, the pattern match yields the following:

    Resulting value after an attempted match

    In scalar context

    In list context

    Regarding s///

    You can use /g with s/// as well, and it does what you'd expect. Regardless, s/// returns a number telling how many times it succeeded in doing a replacement. But note, s///g in scalar context is not progressive like m//g -- you need to manually loop for something like that.
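
    A short sketch of /g in practice, using a made-up sample string: m//g in list context, m//g in scalar context (progressive, tracked with pos), and the count returned by s///g.

```perl
use strict;
use warnings;

my $text = 'cat hat bat';

# List context: m//g returns all the captures at once.
my @letters = $text =~ /(\w)at/g;

# Scalar context: m//g is progressive; each attempt resumes after the last match.
my @positions;
while ( $text =~ /at/g ) {
    push @positions, pos($text);
}

# s///g replaces everything in one go and returns the number of replacements.
my $changed = ( $text =~ s/at/og/g );

print "@letters\n";      # c h b
print "@positions\n";    # 3 7 11
print "$changed $text\n";
```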

    There's a lot more to regexes, of course. For details, see chapter 5 of the Camel, and/or perldoc perlre.

    Files

    Regarding open:

    open( my $in_file,  '<', 'input.txt' )  or die "Agh!! $!";
    open( my $out_file, '>', 'output.txt' ) or die "Oof!! $!";
    
    my $one_line  = <$in_file>;    # Scalar context: reads the next line.
    my @all_lines = <$in_file>;    # List context: reads all the remaining lines.
    
    print {$out_file} @all_lines;   # Note extra braces and lack of comma.
    

    You can do tests on files, using their filename and the various -x tests, such as -r (is readable), -w (is writable), -e (exists), and so on.
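
    A sketch of a few file tests. It creates a scratch file with File::Temp (a core module, chosen here so the example has a real file to inspect); the ".nope" suffix is just a name assumed not to exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file; it is removed automatically when the program exits.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "hello\n";
close $fh or die "Cannot close $filename: $!";

my $exists   = -e $filename        ? 1 : 0;
my $readable = -r $filename        ? 1 : 0;
my $size     = -s $filename;                  # size in bytes, false if empty
my $missing  = -e "$filename.nope" ? 1 : 0;

print "exists=$exists readable=$readable missing=$missing\n";
```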

    Note that glob has some special magic: if it's in the condition of a while, for, or until loop, each time through it'll give you the next filename. This is analogous to the magic of the line input operator.
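
    A sketch of glob handing back one filename per pass through a while loop. The scratch directory and file names are invented for the demo, and the example assumes the temporary directory path contains no whitespace (glob splits its pattern on spaces).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory with a few empty files.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt c.log)) {
    open( my $fh, '>', "$dir/$name" ) or die "Cannot create $dir/$name: $!";
    close $fh or die "Cannot close $dir/$name: $!";
}

# In the while condition, glob returns the next matching filename each time.
my @found;
while ( my $path = glob("$dir/*.txt") ) {
    push @found, $path;
}

print scalar(@found), " .txt files\n";
```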

    Processes

    Exception handling

    See perldoc -f eval and perldoc -f die.
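
    As a minimal sketch of the usual pattern (the sub risky and its error message are hypothetical), die raises the exception and a block eval catches it, leaving the message in $@:

```perl
use strict;
use warnings;

sub risky {
    my ($n) = @_;
    # A trailing newline keeps perl from appending "at FILE line N".
    die "negative input\n" if $n < 0;
    return sqrt($n);
}

my $result;
my $error = '';

# The block returns 1 on success, so a false value means it died.
my $ok = eval { $result = risky(-1); 1 };
$error = $@ unless $ok;
chomp $error;

print $ok ? "result: $result\n" : "caught: $error\n";
```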

    POD

    Perl 5 POD is pretty no-frills. You indent what you want verbatim. You can get I<italic>, B<bold>, and C<monospace>. Headings are made with =headn where n is 1, 2, 3, or 4. Lists are made with =over, =item * (or =item foo), and =back. End POD with =cut. It works well for manpage-style docs.

    Perl 6 Pod is available now, for Perl 5, and, as its creator writes: "Compared to Perl 5 POD, Perldoc's Pod dialect is much more uniform, somewhat more compact, and considerably more expressive." This quick reference is written in Perl 6 Pod.

    Packages

    A package is a namespace. At the top of your file you can specify the current package name:

    package MyPackage;
    

    and that sets the name of the default package for whatever follows.

    You generally use CamelCase for package names. You can specify more than one package per file, but it's simpler to just have one per file. That is, one .pm file == one module == one package.

    You name your file the same as the tail end of the package name, but with .pm at the end. Further, if the package name contains ::, you place the file in the corresponding directory. That is, for package Foo::Bar::Baz;, you'd have ~/perllib/Foo/Bar/Baz.pm. For your own simple modules, there'll usually be no colons in the package name, and you'll just drop the files directly into your ~/perllib.

    You'll need to have use lib '/home/you/perllib' in your source file for it to find your own modules.

    Using modules

    To use modules, whether they come with the standard library or from elsewhere, you just name them near the top of your file like so:

    use Foo::Bar;
    use Foo::Baz qw( func1 func2 );
    

    In the above case, you can call any functions that are exported by default from Foo::Bar, and you can call func1 and func2 from Foo::Baz (if that module allows those functions to be exported).

    You'll sometimes see something like:

    use Foo::Qux -moo;
    

    The two things going on there are:

    1. Putting a dash in front of a bareword does a little magic and makes it into a string which starts with a dash.
    2. Putting a lone string where a list is expected evaluates to a one-item list.
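
    Both points can be checked with a small sketch. My::Flags is a hypothetical package defined inline, and its import method is called directly (use calls import on the module for you) so that no separate .pm file is needed:

```perl
use strict;
use warnings;

{
    package My::Flags;    # hypothetical module, defined inline for the demo
    our @seen;
    sub import {
        my ( $class, @args ) = @_;
        @seen = @args;    # remember what was passed in
        return;
    }
}

# Unary minus on a bareword yields the string "-moo", even under strict.
my @args = ( -moo );

# Roughly what "use My::Flags -moo;" would pass along to import.
My::Flags->import( -moo );

print "$args[0]\n";
print scalar(@My::Flags::seen), " arg: $My::Flags::seen[0]\n";
```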

    Some Perl idioms

    Other tips and best practices

    The best place to look for a list of Perl best practices is the book "Perl Best Practices", by Damian Conway. If you haven't already read it, you probably want to read it. Here are some tips:

    Modules to make use of