One of many backonyms for Perl is “Practical Extraction and Reporting Language”. The ability to create reports complete with headers, pagination and justified output is a feature built into the language yet I’m going to hazard a guess that few Perl programmers have ever used it. While you can go a long way with sprintf
, producing justified output, in particular, is non-trivial. If you need it, you may as well let Perl do the grunt work for you.
Declaring and using formats
Formats are declared with the format
keyword and formatted output is produced using the write
function (note: not the same function as the system call; that’s syswrite
). A format
declaration takes the following form:
format NAME = @field @field ... "picture" line $variable, $variable ... argument line # Comment line .
There can be any number of picture and argument lines. Each pair defines the pattern for a line of output. The @field
is the ‘@’ character followed by by one or more ‘<‘, ‘>’, ‘|’ or ‘#’ characters. Details (and examples) are:
- @<<<<<
Six characters, left justified - @>>>>
Five characters, right justified - @|||
Four characters, centre justified - @###.##
Decimal number with four digits in front of the decimal point and two after.
Note that the ‘@’ character is included in the character count.
The second line is a list of variable names (or constants) that hold the field values. You may see examples that show these space-separated like the field declarations. Follow this example if you like deprecation warnings, otherwise comma separate them. The format declaration is terminated by a lone ‘.’ at the start of the line. Here’s a simple script that will print its input centre-justified:
#!/usr/bin/perl -w use strict; format STDOUT = @||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| $_ . write while (<>);
Note the name “STDOUT”; this is the default name for a format and may be omitted. To define a page header create a format with the name <NAME>_TOP
, for example, “STDOUT_TOP”.
Full program
We’re going to create a program that will create a formatted report of the errno definitions for your system. Each line in the output will show the numeric value, right justified, the constant declaration and the error description.
errno definitions
These are found in /usr/include/errno.h
:
$ cat /usr/include/errno.h ...License text #include <sys/errno.h>
Seems like it’s not going to be that easy. sys/errno.h
includes a bunch of other files each of which include other files and so on. We could do this the hard way or we could note that there is already a program that can handle files which include other files which include yet other files: the C preprocessor. If you are using gcc
, the invocation that shows preprocessor output is gcc -E
. If you are using the Microsoft compiler this will be cl /E
:
$ echo '#include <errno.h>' | gcc -E - ...Some output # 1 "/usr/include/errno.h" 1 3 4 # 23 "/usr/include/errno.h" 3 4 # 1 "/usr/include/sys/errno.h" 1 3 4 # 72 "/usr/include/sys/errno.h" 3 4 ...More output
The interesting pattern is:
# <LINENO> "<FILENAME>" 1 ...
This is emitted when the preprocessor opens a file. Inside each include file, we’re looking for two types of line:
#define EAGAIN 35 /* Blah blah */ ... #define EWOULDBLOCK EAGAIN /* Blah blah */
We now have enough information to write the bit of our program that gathers data:
#!/usr/bin/perl # Produces formatted report of errno definitions use strict; use warnings qw/all/; # errno.h is typically a whole bunch of other # includes; rather than trying to process # it ourselves, let's get the C-preprocessor # to tell us where the includes are my $child; open($child, "-|", "echo '#include <errno.h>' | gcc -E -") // die("Can't exec gcc: $!"); my @includes; while(<$child>) { if (/^# \d* "(\/[^"]*errno[^"]*)" 1/) { # For example: # 1 "/usr/include/errno.h" 1 3 4 # 1 after the filename means "new file" push(@includes, $1); } } close($child); my @errno; my %no = (); my ($fh, $line); # As we grind over the error definitions, let's # keep track of the maximum field length; # this is for the purpose of constructing # formats later my ($maxN, $maxC, $maxD) = (0, 0, 0); my ($const, $val, $desc); for my $inc (@includes) { open($fh, "<", $inc) or die("Unable to read $inc: $!"); while ($line = <$fh>) { if ($line =~ /^#define\s*([A-Z]*)\s*(\d*)\s*\/\*\s*([^\*]*)\*/) { # For example: # define EPERM 1 /* ... */ ($const, $val, $desc) = ($1, $2 + 0, $3); # Lose whitespace from the end # of the description field $desc =~ s/\s*$//; $no{$const} = $val; push(@errno, { const => $const, errno => $val, desc => $desc }); } elsif ($line =~ /^#define\s*([A-Z]*)\s*([A-Z]*)\s*\/\*\s*([^\*]*)\*/) { # For example: #define EWOULDBLOCK EAGAIN /* ... */ ($const, $val, $desc) = ($1, $2, $3); if (exists($no{$val})) { # Resolve the errno $val = $no{$val}; $desc =~ s/\s*$//; push(@errno, { const => $const, errno => $val, desc => $desc }); } } else { next; } if (length($val) > $maxN) { $maxN = length($val); } if (length($const) > $maxC) { $maxC = length($const); } if (length($desc) > $maxD) { $maxD = length($desc); } } close($fh); } @errno = sort { $a->{errno} <=> $b->{errno} } @errno;
Apologies for those regular expressions that pick out the #define
statements. The principal cause of line noise is that we’re looking for C comments, /* ... */
, which means four backslash escapes.
Creating the format definitions
Although we know our field widths, we face a problem: the picture lines in a format declaration require you to know the field widths ahead of time. Fortunately, we can get round this by building up the format declaration programmatically and then eval
‘ing it:
# Build up formats; # field is right aligned my $fmt = '@' . ('>' x ($maxN - 1)) . " @" . ('<' x ($maxC - 1)) . " @" . ('<' x ($maxD - 1)) . "\n\$val, \$const, \$desc\n."; my $fmtHdr = '@' . ('>' x ($maxN - 1)) . " @" . ('<' x ($maxC - 1)) . " @" . ('<' x ($maxD - 1)) . "\n'#', '#define', 'Description'\n" . ('-' x ($maxN + $maxC + $maxD + 4)) . "\n."; eval("format STDOUT =\n$fmt\n"); eval("format STDOUT_TOP =\n$fmtHdr\n");
What we’re eval
‘ing looks something like this:
format STDOUT = @>> @<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<< $val, $const, $desc .
The header format that we build up is similar with the addition of an underline.
Producing output
Perl paginates report output adding a page break and a header after every sixty lines of output unless told otherwise. For our purposes, this is not desirable behaviour. We want a header but not paginated output. We can get round this by changing the variable that controls the number of lines per page, $=
:
$= = 2 + scalar(@errno);
We need to add 2 because the header is two lines. Now it’s just a matter of looping over our error information:
for my $errInf (@errno) { $val = $errInf->{errno}; $const = $errInf->{const}; $desc = $errInf->{desc}; write(); }
Let’s try it out:
$ ./errno2fmt # #define Description ------------------------------------------- 1 EPERM Operation not permitted 2 ENOENT No such file or directory ... 106 ELAST Must be equal largest errno
The output will be different depending on your operating system. You can download the complete script here.
Finding out more
This post has barely scratched the surface of Perl’s reporting capabilities. For the full lowdown, check out the perldoc
.