
Perl overview and quick reference
Author: John M. Gabriele
- The Perl docs
- Fundamentals
- More on operators
- Built-ins
- Standard library
- Keywords
- Lexicals, globals, and scoping
- Context
- Commonly-used variables
- Quoting
- Various built-in functions
- For dealing with regexes
- Files
- Processes
- Exception handling
- POD
- Packages
- Some Perl idioms
- Other tips and best practices
---
The official Perl documentation is world-class. That said, here's a pile of quick reminders to glance at when you've been using some other language for a while and want to quickly refresh your Perl-fu.
Other places to look are:
- the Reference material in the Camel book (also chapter 24)
- the perlfaqs
- the tutorials section at Perlmonks
- Randal's columns
- The Perl Cookbook (aka "the Ram")
- Damian's Perl Best Practices (aka "PBP") book
The Perl docs
You can get at your perl docs using the perldoc command. Also, the
docs are online at http://perldoc.perl.org/.
You can always jump straight to the docs for a given function using
the -f option, for example: perldoc -f sort. You can read the
perldocs of a local file like so: perldoc ./myfile.pl (you can use
the -F option here to speed things up).
See perldoc perl to get the big table of contents of all the
available perldoc pages.
Fundamentals
- Expressions are bits of code that perl evaluates to some value. They are made up of terms and operators. Statements tell the interpreter to do something, and are made up of expressions. Declarations are like statements, but only tell the interpreter to learn something. Blocks are one or more statements separated by semicolons and delimited as a whole by braces.
- The $, @, and % sigils are for scalar, array, and hash expressions, respectively. Variables look like $foo, @bar, and %baz (though you can also write ${foo}, @{bar}, and %{baz} and it also works). Also note the following funky syntax:
    my @foo = qw( food for thought );
    print $foo[2], "\n";      # The usual syntax. "$foo[2]\n" works also.
    print ${foo[2]}, "\n";    # Though, this works too (but don't).
    print ${foo}[2], "\n";    # Same here.
- Variables represent the value itself -- they are not "references" to the values unless you explicitly make a reference. When you do @b = @a;, you're making a copy of @a.
- Strings can be modified in-place (for example, using s///, chomp, and substr).
- $_ is the default arg in a good number of places. For example:
  - default item in a for (<>) loop
  - default item in a while (<>) loop
  - default arg to chomp, print, and others
  - /foo/ (or m/foo/) matches against it unless you use the binding operator (=~).
- Note the difference between expressions as they appear in your source code, and the values that the interpreter evaluates them to (at runtime).
- Double-quotish strings allow escapes and variable interpolation. You usually write them "like this", or qq{this}. You can interpolate with curlies in there too if necessary, "like ${this}tastic.".
- There's a different set of operators for working with strings: lt, le, eq, ge, gt, ne, and cmp. Also . and x.
- Use the dot "." to do string concatenation.
- => is the fat comma. No need to quote what's to the left of it if it's just a simple identifier.
- .. is the range operator.
- <> is the line input operator, a.k.a. the angle operator, a.k.a. the readline function.
- You can put underscores in number literals, as in 1_000_000, 0x0000_1111, 0b11_00_11_00, etc.
- undef, 0, '0', and q{} (empty string) are all false. All other scalars are true.
- A list is something that exists at runtime, in the Perl interpreter. Unfortunately, in English, when we see a number of things separated by commas, we tend to call that a list. When discussing Perl compile-time expressions, it's more accurate to call a bunch of things separated by commas a "comma expression".
- Empty arrays and empty hashes are false. If either has anything in it, it's true.
- print only prints what you tell it to. You need to include a "\n" if you want one.
- By default, arrays interpolate into strings as their elements separated by spaces. Hashes don't interpolate.
- Parentheses are used for all grouping, lists, and hashes. You also need to put parens around an if, while, for, etc. condition expression.
- Hashes: my %h = (foo => 2, bar => 4);, $h{baz} = 3;, my $bar = $h{foo};. In list context, a hash unwinds into one long flat list of key/value pairs: my @a = %h;. Invert a hash like so: %inverted_h = reverse %h.
- A handful of built-in functions/operators modify what you call them on (such as chomp and splice). Most, however, return something new.
- Define a function like so: sub foo { ... }. Some quick notes:
  - Within a sub, you can just refer to globals normally.
  - Last expression evaluated is what gets returned. Though there's also a return operator.
  - Args get passed in by reference via @_. If you want local copies of them, do: my ( $foo, $bar ) = @_. Although @_ is a local, its contents ($_[0], $_[1], etc.) refer to the variables in the caller's scope -- they are aliases to them.
  - Subroutines are package globals.
- You can call built-ins as functions or as operators. If you call them as functions (with explicit parentheses), they have very high precedence. If you call them as operators (no parens) they have very low precedence.
- Take a reference by adding a backslash in front of the variable: my $foo_ref = \@my_array;. The special syntax for a literal array ref is [...], and for a hash ref is {...}. Dereference like so:
    my $foo_ref = \@my_array;      # $foo_ref is a reference.
    my @a2      = @{ $foo_ref };   # Dereferencing $foo_ref here.
  That is, you put a reference inside ${}, @{}, or %{} to dereference it. The braces are sometimes optional, but I like to always include them, for clarity.
  There are shortcuts for dereferencing. Observe:
    my %foods = (
        'good' => ['beets', 'spinach', 'carrots'],
        'bad'  => ['twinkies', 'devil dogs'],
        'ugly' => ['gruel', 'slop'],
    );
    my $f = ${ $foods{good} }[1];   # Not using any shortcuts.
    my $g = $foods{bad}->[0];       # Using the arrow shortcut to dereference.
    my $h = $foods{ugly}[1];        # Perl lets you omit the arrow here.
    # Also:
    my @ar = ( ['a', 'b', 'c'], [1, 2, 3], ['foo', 'bar'] );
    my $s1 = ${ $ar[0] }[1];        # Not using any shortcuts.
    my $s2 = $ar[1]->[2];           # Using the arrow shortcut.
    my $s3 = $ar[2][1];             # Perl lets you omit the arrow here.
- Use map and grep to easily build lists from other lists.
- You can stash data at the end of your file after a line that has __END__ on it. Access that data via the DATA filehandle. If it's binary data, base64 encode it first (MIME::Base64).
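As a rough illustration of the copy, reference, and @_ aliasing points above, here is a minimal sketch. The sub names double_in_place and double_copy are made up for this example.

```perl
use strict;
use warnings;

# Assigning one array to another copies the elements.
my @a = (1, 2, 3);
my @b = @a;
$b[0] = 99;                      # @a is untouched

# A reference points at the original data instead.
my $a_ref = \@a;
$a_ref->[1] = 42;                # changes $a[1]

# Elements of @_ are aliases to the caller's variables.
sub double_in_place { $_[0] *= 2; return }
my $n = 5;
double_in_place($n);             # $n is now 10

# Copying @_ first gives the sub its own local values.
sub double_copy { my ($x) = @_; $x *= 2; return $x }
my $m = 5;
my $doubled = double_copy($m);   # $m stays 5

print "@a | @b | $n | $m | $doubled\n";   # 1 42 3 | 99 2 3 | 10 | 5 | 10
```

Copying @_ into lexicals at the top of a sub is the usual choice; modifying $_[0] directly is best kept for the rare case where you really want to change the caller's variable.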
More on operators
A little more on operators:
- They come in three flavors: unary, binary, and trinary.
- The things operators work on are called "terms".
- Autoincrement and autodecrement have a little extra magic when dealing with alphanumeric strings.
- The -> is a binary infix dereference operator when used like so:
    $ar->[0]
    $hr->{foo}
    $sr->('bar')
  Otherwise, it's used for method calls, like:
    my $f = $Foo->new();
    $f->bar();
- Use ** for raising a number to a power.
- =~ is the regex binding operator. =~ is for "match", !~ is for "doesn't match". These binding operators have a pretty high precedence.
- You can get a list of n things like so: my @a = ('whatever') x $n;
- Among the assignment operators, note the presence of ||=, //=, .=, and x=.
- In list context, the comma is just a separator. In scalar context, it's an operator, but not one you'd regularly use.
- You can make a reference to an anonymous array like so: [ qw( foo bar baz ) ]
For more, read perldoc perlop.
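A few of the operators mentioned above, in a short sketch. The variable names are only for illustration, and //= assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $id = 'aa9';
$id++;                          # magic string increment: 'ab0'

my $zz = 'Zz';
$zz++;                          # the carry adds a character: 'AAa'

my @blanks = ('-') x 3;         # list repetition: ('-', '-', '-')
my $rule   = '=' x 10;          # string repetition: '=========='

my %opt;
$opt{color} //= 'red';          # assign only if currently undefined
$opt{count} ||= 1;              # assign only if currently false

my $cube = 2 ** 3;              # 8
```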
Built-ins
Perl comes with a pretty large number of built-in functions. They're also called operators when you don't put parentheses around their arguments. When you use a built-in that only takes one arg, and you don't use parentheses, it's called a named unary operator.
Standard library
Perl comes with many modules in its standard library.
To see the list of them, run perldoc perlmodlib.
Keywords
The details are in perldoc perlsyn. Among others, you'll
find info on:
- if, unless, elsif, else
- for, while, until
- next, last, redo
- continue (see perldoc -f continue)
Note, you can label loops, if you like.
LINE:
for my $line ( @lines ) {
#...
if ( $thus_and_so ) { next LINE }
}
Incidentally, a bare block is the same as a loop that only loops once.
Its contents have their own lexical scope, and you can exit it using
last (or even start it over using redo). if and unless
blocks are, of course, not loops.
When looping over a list, the variable being assigned to each item
($_ being the default) actually aliases the item. So, you can
change the list on the fly.
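Here is a small sketch of the loop-variable aliasing and the bare-block behavior described above. The data is invented for the example.

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# $price aliases each element, so assigning to it changes @prices.
for my $price (@prices) {
    $price *= 2;
}

# The same applies to the default $_.
my @names = ('ann', 'bob');
$_ = ucfirst($_) for @names;

# A bare block acts like a loop that runs once, so "last" exits it.
my $reached_end = 0;
{
    last if @prices;      # leaves the block here
    $reached_end = 1;     # never runs
}

print "@prices | @names | $reached_end\n";   # 20 40 60 | Ann Bob | 0
```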
Lexicals, globals, and scoping
Perl provides 2 kinds of namespaces: "package" (i.e. "symbol tables"), and lexical. Package variables are globals (aka "package globals"), are dynamically scoped, and live in symbol tables. Lexicals are locals and live in unnamed lexical scopes. File scope is the largest possible lexical scope.
When one subroutine calls another sub, the 2nd (the callee) is in the dynamic scope of the 1st (caller). When one block is inside another the inner block is in the lexical scope of the outer one. Note however that when you get to the end of a block, you leave the current lexical and dynamic scopes.
Symbol tables are actually global hashes, and they contain the names of the variables in them. Lexicals are attached to an associated block and are unnamed. All the built-in globals (like @ARGV, %ENV, $$, etc.) are located in the main symbol table.
Incidentally, within each namespace there's a sub-namespace for each sigil (that's why $foo and @foo are 2 different variables). If you want to refer to every "foo" in a symbol table, regardless of sigil, you use a "typeglob". Perl uses typeglobs to implement importing modules.
A fully-qualified package variable name, like
$Foods::Veg::Tomato::variety, shows you the structure of the nested
symbol tables -- the most deeply-nested of which contains the
variety scalar. It also indicates that the file path leading to
Tomato.pm looks like Foods/Veg/Tomato.pm.
When you call use, it imports package symbols (or else gives the
compiler some hints (as "pragmas")) for the current lexical scope. A
package declaration is also lexically scoped, and declares the name
of the current default package until the end of the current lexical
scope (usually the file).
Recall that calls to use happen at compile-time. Calls to require happen at run-time.
Aside from package and use (and require), the three operators
dealing with lexicals and package globals are my, our, and
local:
- my -- declares a lexically-scoped variable. Its name and value are both stored locally, only.
- our -- declares a lexically scoped name that refers to a package global. Using the above example, your Tomato.pm file would contain our $variety; in it. our does not create values -- it just gives you access to the global, though you can give access to and create at the same time: our $foo = 7;.
- local -- sets up a temporary value for a package global for only the current dynamic scope. That is, if you use local in a sub, and then call another sub, in that 2nd sub you're still in the same dynamic scope, and so still will see that localized value. Once the 1st sub returns, you're back to the pre-localized value. Of course, same thing happens if you come out of a lexical scope -- you're back to the value that the package global had before the scope you were just in.
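The difference between my, our, and local can be sketched as follows. The names $level, show_level, and with_local are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

our $level = 'global';              # package global, i.e. $main::level

sub show_level { return $level }    # sees whatever value is current

sub with_local {
    local $level = 'localized';     # temporary value for this dynamic scope
    return show_level();            # the callee sees 'localized'
}

my $inside = with_local();          # 'localized'
my $after  = show_level();          # back to 'global'

my $x = 'outer';
{
    my $x = 'inner';                # a new lexical, visible only in this block
}
# $x is still 'outer' here

print "$inside | $after | $x\n";    # localized | global | outer
```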
Context
What type of value perl expects an expression to produce (scalar or
list) depends upon context. Context is determined at compile-time
when perl parses your source code.
- If perl is expecting a given expression to be a scalar, it tries to evaluate it so it provides a scalar.
- If perl is expecting a given expression to be a list, it tries to evaluate it so it provides a list.
If a scalar is put into a list context, it usually produces a one-element list.
If a list-producing expression is put into a scalar context, it hopefully evaluates to something useful. For example, an array will yield the number of elements it contains.
You can force a scalar context by using the scalar operator.
In the docs, when you see something like sort LIST, it means that
the sort operator provides a list context to its arguments.
Furthermore, if the operator provides a list context to an argument,
it also provides a list context to the elements of that list
argument.
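A quick sketch of scalar versus list context. It uses the built-in wantarray, which this overview does not otherwise cover, and the sub name ctx is made up.

```perl
use strict;
use warnings;

my @items = qw( a b c );

my $count   = @items;            # scalar context: number of elements, 3
my ($first) = @items;            # list context: first element, 'a'
my $forced  = scalar(@items);    # explicit scalar context, 3

# localtime returns a list in list context and a string in scalar context.
my @parts = localtime(0);        # nine fields
my $stamp = localtime(0);        # a human-readable date string

# A sub can ask which context it was called in with wantarray.
sub ctx { return wantarray ? 'list' : 'scalar' }
my @l = ctx();                   # ('list')
my $s = ctx();                   # 'scalar'
```

Note the parentheses in my ($first): they are what put the right-hand side into list context.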
Commonly-used variables
Besides $_, a few of them are:
- @ARGV -- args passed into this script. Recall, the program's name is stored in $0, not in $ARGV[0].
- %ENV -- holds environment variables.
- %INC
- @INC
- %SIG -- to set up signal handlers.
See chap. 28 of the Camel for more, or read perldoc perlvar.
Quoting
- qq{} (""), q{} ('')
- qx{} (``)
- qw(), qv()
- m//, s///, tr///
- qr{}
If you need a multi-line string, use the heredoc syntax:
my $long_string = <<"END_OF_STRING"; # Note explicit quoting.
la dee da
va va va $voom
ok, done
END_OF_STRING
Various built-in functions
Perl comes with a healthy selection of built-in functions. For a great summary, see chap. 29 of the Camel.
For dealing with regexes
- m//, s///
- Captured groups go in $1, $2, ...
- There's also $`, $&, and $' for pre-match, match, and post-match.
Remember that, inside the current regex (i.e. during the match),
you use \1. Outside the match, you use $1.
The /g regex modifier is for globally finding matches. That is,
the pattern match yields the following:
Resulting value after an attempted match
In scalar context
- Without /g:
  - If a match, returns true (1).
  - If no match, returns the empty string (false).
- With /g ("progressive match"):
  - If a match or no match, same as without /g. However, each subsequent request for a match moves the position pointer to just after the previous match.
In list context
- Without /g:
  - If a match, returns the list of matches captured by the grouping parentheses (if there are no grouping parens, then returns (1)).
  - If no match, returns the null list.
- With /g:
  - If a match, and no grouping parentheses, returns a list of all matches found. If there are parens, returns the strings captured.
  - If no match, same as without /g.
Regarding s///
You can use /g with s/// as well, and it does what you'd expect.
Regardless, s/// returns the number of replacements it made (or
false if it made none). But note, s///g in scalar context is not
progressive like m//g -- you need to manually loop for something
like that.
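The match results described above can be seen in a short sketch. The sample string and variable names are invented.

```perl
use strict;
use warnings;

my $text = 'x=1, y=22, z=333';

# List context, no /g: the captures from the first match.
my ($name, $value) = $text =~ m/(\w)=(\d+)/;      # ('x', '1')

# List context with /g and one capture group: every captured string.
my @values = $text =~ m/=(\d+)/g;                 # ('1', '22', '333')

# Scalar context with /g: each attempt resumes after the previous match.
my @names;
while ( $text =~ m/(\w)=/g ) {
    push @names, $1;                              # 'x', then 'y', then 'z'
}

# s///g returns the number of replacements made.
my $copy    = $text;
my $changed = ( $copy =~ s/\d+/N/g );             # 3; $copy is 'x=N, y=N, z=N'
```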
There's a lot more to regexes, of course. For details, see chapter 5
of the Camel, and/or perldoc perlre.
Files
- open, close, chdir, glob, unlink, rename, mkdir, rmdir, ...
Regarding open:
open( my $in_file, '<', 'input.txt' ) or die "Agh!!";
open( my $out_file, '>', 'output.txt' ) or die "Oof!!";
my $one_line = <$in_file>;
my @all_lines = <$in_file>;
print {$out_file} @all_lines; # Note extra braces and lack of comma.
You can do tests on files, using their filename and the various -x
tests, such as -r (is readable), -w (is writable), -e
(exists), and so on.
Note that glob has some special magic: in scalar context, such as
the condition of a while loop, each time through it'll give you
the next filename (in list context, as in a for loop, it returns
all the filenames at once). This is analogous to the magic of the
line input operator.
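To make the file tests and the two glob behaviors concrete, here is a sketch that works in a scratch directory. It uses the core File::Temp module, the file names a.txt and b.txt are made up, and it assumes the temp directory path contains no spaces (glob splits its pattern on whitespace).

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Scratch directory, removed automatically when the program exits.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( a.txt b.txt )) {
    open( my $fh, '>', "$dir/$name" ) or die "Can't write $dir/$name: $!";
    print {$fh} "hello\n";
    close $fh or die "Can't close $dir/$name: $!";
}

# List context: glob returns every match at once.
my @found = glob("$dir/*.txt");

# File tests take a filename (or a filehandle).
my $exists   = -e "$dir/a.txt";
my $readable = -r "$dir/a.txt";
my $size     = -s "$dir/a.txt";     # size in bytes

# Scalar context: each call hands back the next filename.
my $seen = 0;
while ( my $file = glob("$dir/*.txt") ) {
    $seen++;
}
```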
Processes
- system
- The backtick for running a shell program and capturing its output (ex. my $foo = `date`). What's between the backticks is double-quotish.
Exception handling
See perldoc -f eval and perldoc -f die.
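One common way to combine the two is sketched below. The sub risky is hypothetical, and ending the eval block with a true value is just one convention for telling success from failure.

```perl
use strict;
use warnings;

sub risky {
    my ($n) = @_;
    die "negative input\n" if $n < 0;   # trailing newline suppresses "at ... line N"
    return sqrt $n;
}

# eval BLOCK traps the die; the trailing 1 makes success distinguishable.
my $result;
my $ok    = eval { $result = risky(-4); 1 };
my $error = $ok ? '' : $@;              # "negative input\n"

my $ok2 = eval { $result = risky(9); 1 };   # succeeds, $result is 3
```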
POD
Perl 5 POD is pretty no-frills. You indent what you want verbatim.
You can get I<italic>, B<bold>, and C<monospace>. Headings
are made with =headn where n is 1, 2, 3, or 4. Lists are made
with =over, =item * (or =item foo), and =back. End POD
with =cut. It works well for manpage-style docs.
Perl 6 Pod is available now, for Perl 5, and, as its creator writes: "Compared to Perl 5 POD, Perldoc's Pod dialect is much more uniform, somewhat more compact, and considerably more expressive." This quick reference is written in Perl 6 Pod.
Packages
A package is a namespace. At the top of your file you can specify the current package name:
package MyPackage;
and that sets the name of the default package for whatever follows.
You generally use CamelCase for package names. You can specify more than one package per file, but it's simpler to just have one per file. That is, one .pm file == one module == one package.
You name your file the same as the tail end of the package name, but
with .pm at the end. Further, if the package name contains ::,
you place the file in the corresponding directory. That is, for
package Foo::Bar::Baz;, you'd have ~/perllib/Foo/Bar/Baz.pm. For
your own simple modules, there'll usually be no colons in the package
name, and you'll just drop the files directly into your ~/perllib.
You'll need to have use lib '/home/you/perllib' in your source file
for it to find your own modules.
Using modules
To use modules, whether they come with the standard library or from elsewhere, you just name them near the top of your file like so:
use Foo::Bar;
use Foo::Baz qw( func1 func2 );
In the above case, you can call any functions that are exported by
default from Foo::Bar, and you can call func1 and func2 from
Foo::Baz (if that module allows those functions to be exported).
You'll sometimes see something like:
use Foo::Qux -moo;
The two things going on there are:
- Putting a dash in front of a bareword does a little magic and makes it into a string which starts with a dash.
- Putting a lone string where a list is expected evaluates to a one-item list.
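Those two points can be checked with a small sketch. The package My::Demo and its seen function are hypothetical stand-ins for a real module, and the import method is called by hand here because the package lives in the same file.

```perl
use strict;
use warnings;

# A hypothetical import routine that just records what it was given.
package My::Demo;
my @seen;
sub import { my ($class, @args) = @_; @seen = @args; return }
sub seen   { return @seen }

package main;

# "use Foo::Qux -moo;" ends up calling Foo::Qux->import(-moo) at compile time.
My::Demo->import(-moo);
my @args = My::Demo::seen();     # ('-moo'), a one-item list holding a string
```

What a module does with a string like -moo is entirely up to its own import routine.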
Some Perl idioms
When you need to do a number of search-replace operations on a given string:
for ( $st ) {
    s/foo/bar/;
    s/baz/moo/;
    s/qux/quux/;
}
Randomly picking one item from a list:
my @words = qw(foo bar baz);
my $word = $words[ rand( @words ) ];
Other tips and best practices
The best place to look for a list of Perl best practices is the book "Perl Best Practices", by Damian Conway. If you haven't already read it, you probably want to read it. Here are some tips:
- Always use strict and use warnings.
- Use my for variables you want to be local, our for ones you want to be global. You generally don't want to have too many globals. :)
- Always use parentheses when calling non-built-in functions.
- Only use the unless statement modifier when the part that comes first is the usual case, and the modifier is for an "oh, by the way" situation. For example:
    go_to_store() unless $hurricane_outside;
  You can also use modifiers with the various loop operators:
    while (<>) {
        next if m/$bad_coffee_cake/;
        last if m/$tomato_too_soft/;
        #...
    }
- Regex tips:
  - Except for very simple matches, always use /xms. Then \A and \z are beginning and end of string.
  - Prefer m//xms to //xms.
  - Use non-capturing parentheses ((?:...)) when you don't intend on capturing anything.
  - Consider using canned regexen via Regexp::Common sometimes.
- Don't use constant;. use Readonly; instead. You'll need to grab it from CPAN.
- Always quote your heredoc markers (after the <<).
- Use the fat comma for pairing.
- If you need to change the value of a punctuation variable, always localize it first.
- use English for the less-familiar punctuation variables.
- When you really do need to know indexes of values in a list:
    my @ar = qw( foo bar baz moo poo qux );
    for my $i ( 0 .. $#ar ) {
        my $word = $ar[$i];
        #...
    }
- A named lexical iterator variable in a while loop looks like:
    while ( my $line = <> ) {
        next if $line =~ m/^#/;
    }
- Label your loops if you're using next, last, or redo in them.
- Always use a block with map and grep.
- Call your own functions with parentheses and without &.
- When writing a function that takes more than 3 args, use a hashref to pass them in with names. As in:
    sub foo {
        my ( $arg_ref ) = @_;
        # $arg_ref->{bar}
        # $arg_ref->{baz}
        # $arg_ref->{moo}
        # $arg_ref->{qux}
        #...
    }
    foo({
        bar => 'the bar',
        baz => 'shirley temple',
        moo => 'love boat',
        qux => 'Isaac',
    });
- Check for hash key presence like so:
    my $ans = exists $q_for{ans} ? $q_for{ans} : 42;
- Always return from subs using an explicit return.
- Use a bare return to return failure.
- To read in the contents of a whole file as one long string: my $file_contents = do { local $/; <$infile> };. Or just use Perl6::Slurp.
Modules to make use of
- Scalar::Util, List::Util, and List::MoreUtils (but not Hash::Util) (PBP chp. 8, p. 170)
- You can use IO::Interactive's is_interactive() function, or just have IO::Prompt take care of everything for you.
- Carp
- Getopt::Long for command-line option processing
- Moose for OO development.
- For version numbers, use 3-part numbers that go like revision.version.subversion (ex. qv('2.0.1')) using the version standard module. Don't use vstrings (ex. "v2.0.1") or floats (ex. "2.000_01"). Common practice is for stable or maintenance versions of a distribution to have an even version number component, while development versions are odd. Generally, if you break API compatibility, you bump the version number component. Major redesigns are usually accompanied by an incremented revision number component.
- Use Module::Starter for creating your own modules. Don't forget to check out the Module::Starter::PBP plug-in for it.
- Module::Build (used in your Build.PL file)
- Config::Std or Config::Tiny.
- Test::Simple or Test::More.
- See Appendix D of PBP