At last year's NESUG, Marge Scerbo presented an interesting paper showing
how a few simple SAS® DATA step statements could be used to generate
powerful and customizable reports. As I read through the paper, I wondered,
"Gee, I could do most of this in Perl. Or can I?" This paper is a response
to that thought.
The
following is an outline
of the paper:
After reading the paper, you should
have a good overview of
Perl's reporting capabilities and hopefully be encouraged to create
your own reports with this command language.
Perl was developed by Larry Wall
starting in 1986. It officially
stands for Practical Extraction and Report Language.
[But there are those who say that like SAS it is a group of letters
with no meaning in itself. You
be the judge.]
Perl is a powerful command language
that has elements of C, UNIX shells, awk, sed, and much
more. The result is a self-contained
portable language. Perl is now almost a de facto standard with
UNIX system administrators. [It
also is used internally at
the SAS Institute.]
Part of Perl's appeal is that it is distributed with source and is
available free as part of GNU public software. It can be obtained via
e-mail or from various anonymous ftp sites. Perl can now be found
under AmigaOS, Atari OS, DOS [it runs fine under MS-Windows], Macintosh,
UNIX, and VMS.
Perl contains many different elements:
-- Over 100 built-in functions
-- A rich built-in library
-- Networking capabilities
-- Database capabilities
-- C interfaces
-- A debugger
-- Report capabilities
-- Converters (awk, sed, and C header libraries to Perl)
Many utilities and interfaces have been built with Perl. These include
interfaces to Oracle, Sybase, Curses, and X Windows.
Here are some places to look:
* A free man (help) document has
over 100 pages on Perl. A formatted copy can be obtained from
the anonymous ftp site chem.bu.edu.
* Various conferences give tutorials
on Perl. These include USENIX,
SUG (SUN), and DECUS (DEC).
* The Usenet group comp.lang.perl
is a treasure trove of Perl tips. Perl's creator Larry Wall is
actively posting useful messages
there.
* Once a month, a FAQ (frequently asked questions) list is
posted on comp.lang.perl.
* The Wall and Schwartz book (see references) is considered
the source on Perl. An advanced Perl book is planned.
* German's book (see references) covers Perl portability
and has a healthy number of Perl references.
Before looking at our first Perl
report, it is helpful to understand the following:
* Perl keywords and built-in function names must be in lowercase.
Names you choose yourself, such as filehandles and subroutines, may use
either case.
* Perl statements must end with a semicolon. [Making SAS users feel
right at home.]
* A series of statements may be processed as a block. A block is
contained within braces (i.e., {}).
* Comments begin with a #.
* Perl supports a number of data types, each with its own unique identifier:
- $ -- Scalar variables may contain numbers (including decimals),
characters, or Boolean values (1, 0). Scalars also may hold the elements
of simple and associative arrays.
examples:
$a = 1; #Assigned a number
$a = "dog"; #Assigned a string
- @ -- Simple arrays. Can contain elements with numbers or characters.
Each element is designated by a numeric key marking the position in the
array.
examples:
@array1 #Entire array
$array1[0] #First element in array
$array1[$#array1] #Last element in array
- % -- Associative arrays. Can contain elements with numbers or
characters. Each element is designated by a numeric OR character key
rather than by its position in the array. Associative arrays are beyond
the scope of this paper.
* The following are some of the functions that are used in these examples:
- close. Closes an open file.
- die. If a condition is met, then die (end the program) with an
optional message. A warn function is also available.
- open. A powerful command. May open a file for reading (default),
writing, or both! An alias (the filehandle) for the file is assigned by
the user. (Like SAS's libref or fileref component in a LIBNAME or FILENAME
statement.) Also may be used like SAS's LIBNAME PIPE/FILENAME PIPE
statements to pipe data to or from an operating system command.
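The data types listed above can be sketched in a few lines. This is a minimal illustration only: it assumes a current Perl 5 interpreter (which postdates the examples in this paper), and every variable name in it is invented for the sketch. The associative array appears just long enough to show the % identifier.

```perl
use strict;
use warnings;

# Scalar: holds one number or one string.
my $count = 6;
my $name  = "Cornell";

# Simple array: elements are addressed by numeric position, starting at 0.
my @univs = ("Brown University", "Cornell", "UCLA");
my $first = $univs[0];          # first element
my $last  = $univs[$#univs];    # $#univs is the index of the last element

# Associative array: elements are addressed by a string key.
my %city  = ("Cornell" => "Ithaca", "UCLA" => "Los Angeles");
my $where = $city{$name};       # look up by key

print "$count schools listed; first=$first last=$last\n";
print "$name is in $where\n";
```

Note that a single element of an array or associative array is itself a scalar, which is why it is written with a $ rather than @ or %.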
[Do note that all examples shown are "standard Perl" and should be
portable across operating systems. I created these examples on MS-DOS or
a Macintosh and ran them on UNIX "as is!"]
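As a rough sketch of how open, die, and close fit together, the following writes two lines to a scratch file and reads them back. The file name demo.txt and the filehandle DEMO are made up for illustration, and the code assumes a current Perl 5 interpreter.

```perl
use strict;
use warnings;

my $file = "demo.txt";    # hypothetical scratch file

# ">" opens for writing; die reports the system error held in $! on failure.
open(DEMO, ">$file") || die "Can't open $file for writing: $!\n";
print DEMO "Brown University\n";
print DEMO "Cornell\n";
close(DEMO);

# With no prefix, open reads the file (the default).
open(DEMO, $file) || die "Can't open $file for reading: $!\n";
my @lines = <DEMO>;       # read every line into an array
close(DEMO);

print "Read ", scalar(@lines), " lines\n";
unlink($file);            # remove the scratch file
```

Checking the result of every open in this way costs one line and turns a silent failure into a readable message.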
Data may be input in two different ways: interactively and
non-interactively.
The following is a simple program that takes user input and appends it
to a file. The chop function removes the trailing newline.
open(FILE1,">>input.txt") || die "Can't open input.txt $!\n";
$cnt = 1;
un:
print "Enter the NAME of the University\n";
$univ=substr(<STDIN>,0,21);
chop($univ);
cy:
print "Enter the CITY of the University\n";
$city=substr(<STDIN>,0,16);
chop($city);
printit:
print FILE1 "$univ $city \n";
print "Do you wish to enter another record? Y/N\n";
$choice=substr(<STDIN>,0,1);
if ($choice eq "Y") {$cnt++; goto un;}
else {close(FILE1); die "$cnt records added\n";}
This approach is ideal for small
databases. A rich range of data checking is possible.
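As one hedged illustration of the kind of data checking meant here, a small subroutine could reject empty or oddly formed entries before they are written. The subroutine name check_field and its rules (non-empty, no longer than a given width, letters, spaces, and periods only) are assumptions made for this sketch and are not part of the program above. It assumes a current Perl 5 interpreter.

```perl
use strict;
use warnings;

# Return the cleaned value, or nothing (undef) if the entry fails a check.
sub check_field {
    my ($value, $width) = @_;
    $value =~ s/^\s+//;                      # strip leading whitespace
    $value =~ s/\s+$//;                      # strip trailing whitespace and newline
    return if $value eq "";                  # nothing was entered
    return if length($value) > $width;       # too long for the field
    return unless $value =~ /^[A-Za-z. ]+$/; # letters, spaces, periods only
    return $value;
}

foreach my $entry ("Cornell\n", "   ", "Univ. of Maryland\n", "R2-D2\n") {
    my $ok = check_field($entry, 20);
    print defined($ok) ? "accepted: $ok\n" : "rejected\n";
}
```

In an interactive program, a rejected entry would normally lead back to the prompt rather than being written to the file.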
For smaller files, you can pre-build an array that contains values:
@array1= ("Brown University Providence","Cornell Ithaca");
For larger files, it is recommended to use compressed files or DBM files:
Compressed (Binary) Files: Files with variable-length records are
compressed and uncompressed using the pack/unpack functions. This is shown
a little later in the paper. They can also be set up as random-access files.
DBM files: DBM stands for Data Base Management. DBM is available in some
format for all Perl interpreters except the Amiga and the Macintosh.
This is done using associative arrays and is beyond the scope
of this paper.
Report #1 -- A Simple List
The following report should be produced:
BROWN UNIVERSITY
PROVIDENCE
CORNELL
ITHACA
UNIV OF MARYLAND
BALTIMORE
UCLA
LOS ANGELES
COLUMBIA
NYC
SYRACUSE UNIV.
SYRACUSE
To do this, the program will also 1) split the "fields" of the "record"
to appear on two lines and 2) convert the values of these fields to
uppercase, regardless of the original case of the value.
Here is the program that creates both the input record and the report:
#########################
# a. Create an array #
#########################
$fileo = "ex1.txt"; #Set value for file
#Fields are separated by two spaces (an assumption; the original spacing was lost)
@array1= ("Brown University  Providence",
"Cornell  Ithaca",
"Univ of Maryland  Baltimore",
"UCLA  Los Angeles",
"Columbia  NYC",
"Syracuse Univ.  Syracuse");
##############################
# b. Open a file for writing #
##############################
open(EX1,">$fileo") || die "Can't open $fileo $!\n";
foreach $cnt (0 .. $#array1) {
#########################################
# c. Split the "record" into two fields #
#########################################
#Split on two or more spaces so that names containing one space stay whole
($univ,$loc) = split(/\s\s+/,$array1[$cnt]);
####################################
# d. Translate record to uppercase #
####################################
($university = $univ) =~ tr/a-z/A-Z/;
($location = $loc) =~ tr/a-z/A-Z/;
######################################
# e. Write out record and close file #
######################################
print EX1 "$university\n$location\n";
}
close(EX1);
Note that a scalar variable contains the value of the file name. This
allows you to easily change a file name IN ONE PLACE ONLY when needed.
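The uppercase step in the program relies on an idiom worth spelling out: the parenthesized assignment copies the value first, and tr then changes only the copy. A small sketch, using made-up variable names and assuming a current Perl 5 interpreter:

```perl
use strict;
use warnings;

my $orig = "Univ of Maryland";
my $upper;

# Assign first, then translate the left-hand variable in place.
($upper = $orig) =~ tr/a-z/A-Z/;

print "$orig -> $upper\n";    # the original value is left untouched
```

Without the parentheses, tr would be applied to $orig itself and $upper would receive the count of characters changed rather than the text.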
A formatted list like the one below can also be created with Perl.
| BROWN UNIVERSITY | PROVIDENCE |
| CORNELL | ITHACA |
| UNIV OF MARYLAND | BALTIMORE |
| UCLA | LOS ANGELES |
| COLUMBIA | NYC |
| SYRACUSE UNIV | SYRACUSE |
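One way a bordered list like this might be produced is with sprintf (the string-returning sibling of printf) and left-justified field widths. The widths of 20 and 15, the sample rows, and the variable names are assumptions chosen for the sketch, which also assumes a current Perl 5 interpreter.

```perl
use strict;
use warnings;

my @rows = (["Brown University", "Providence"],
            ["UCLA",             "Los Angeles"]);

my @out;
foreach my $row (@rows) {
    my ($univ, $loc) = @$row;
    # %-20s pads on the right, so the text is left-justified in its column.
    push @out, sprintf("| %-20s | %-15s |", uc($univ), uc($loc));
}
print "$_\n" foreach @out;
```

Because every value is padded to the same width, the vertical bars line up regardless of how long each name is.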
Note that it would be easy to add
the UNIV text as in Marge's
example. The following part creates the binary file:
#Example 2 -- Fixed Records (Use Pack/Unpack) Input Part
####################################
#a. Create an array #
####################################
@univs = ( "Brown University", "Providence",
"Cornell", "Ithaca",
"Univ of Maryland", "Baltimore",
"UCLA", "Los Angeles",
"Columbia", "NYC",
"Syracuse Univ.", "Syracuse");
####################################
#b. Open a file for writing #
####################################
open (EX2,">ex2.txt") || die "Can't open ex2.txt $!\n"; #exception handling
####################################
#c. Go through array #
####################################
foreach $i (0 .. $#univs) {
####################################
#d. If university (even index), then assign to $university. #
####################################
if ($i % 2 == 0) {
$university = $univs[$i];
}
####################################
#e. If location (odd index), then assign to $location. #
# write out "packed" record #
####################################
else {
$location = $univs[$i];
$line = pack("A20 A15",$university,$location);
print EX2 $line;
}
}
####################################
#f. Close file #
####################################
close(EX2);
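Because the template "A20 A15" pads each field with spaces to a fixed width, every packed record is 35 bytes long, and that fixed length is what lets a reader step through the data by offset. A minimal round-trip sketch, with invented sample values and variable names, assuming a current Perl 5 interpreter:

```perl
use strict;
use warnings;

my $template = "A20 A15";    # 20-byte name field, 15-byte location field

# Pack two records back to back, with no separator between them.
my $data = pack($template, "Cornell", "Ithaca")
         . pack($template, "UCLA", "Los Angeles");

my $reclen = length(pack($template, "", ""));   # bytes per record
my @records;
for (my $offset = 0; $offset < length($data); $offset += $reclen) {
    # "A" fields drop their trailing pad spaces when unpacked.
    my ($univ, $loc) = unpack($template, substr($data, $offset, $reclen));
    push @records, "$univ/$loc";
}
print "record length: $reclen\n";
print "$_\n" foreach @records;
```

Deriving the record length from the template, rather than typing it in by hand, keeps the reader and the writer in step if the field widths ever change.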
The following part retrieves and unpacks the records from the file and
creates the report:
#########################################
# a. open file and retrieve packed line #
#########################################
file_part:
open (EXP2,"ex2.txt") || die "Can't open ex2.txt $!\n";
while (<EXP2>) {
$line = $_; #No chop here: the packed records were written without a newline
}
close(EXP2);
###############################################################
# b. Loop through line: #
# 1) Unpack line, 2) Strip leading characters, 3) Rejoin line #
# 4) Set line to uppercase, 5) Print line #
###############################################################
rpt_part:
$len = length($line);
for($offset=0;($offset<$len);$offset=$offset+35) #35 = 20 + 15 bytes per packed record
{
$lin = substr($line,$offset);
($univ,$loc) = unpack("A20 A15",$lin);
@univ=split(' ',$univ);
@loc=split(' ',$loc);
$unn = join(' ',@univ);
($univ= $unn) =~ tr/a-z/A-Z/; #Change to uppercase
$lon = join(' ',@loc);
($loc= $lon) =~ tr/a-z/A-Z/;
printf "%20s %15s\n",$univ,$loc; #formatted print
}
Perl has a powerful report facility that can do pretty much anything SAS
can with PUT statements. Here is a simple example:
University List
University           State Zip
BROWN UNIVERSITY     RI
UNIV. OF MARYLAND    MD  21201
UCLA                 CA
COLUMBIA             NY  10005
SYRACUSE UNIV        NY  13112
This is the data as stored in the input file: [Note the * as a field
delimiter]
Brown University*ri*
Univ. of Maryland*md*21201
UCLA*CA*
Columbia*ny*10005
Syracuse Univ*ny*13112
This is the Perl script that generated it: [Note that you first create a
template and then use it.]
#Create a header format. Period = end of format.
format HEAD1=
University List
University           State Zip
.
#Define report format. Tilde (~) = suppress blank line
format EX3B=
~
#<<< -- Place holder and left justification
@<<<<<<<<<<<<<<<<<<< @<< @<<<<<
#Variables in report
$un, $st, $zip
.
open(EX3A,"ex3a.txt") || (die "cant open ex3a.txt $!\n");
open(EX3B, ">ex3b.txt") || (die "cant open ex3b.txt $!\n");
# System Variables $^ -- header format name
# $~ -- report format name
select (EX3B); $^ = "HEAD1";
$~ = "EX3B";
while (<EX3A>) {
chop;
($unn,$stt,$zipp) = split(/\*/,$_); #Parse fields
($un= $unn) =~ tr/a-z/A-Z/; #Set to Uppercase
($st= $stt) =~ tr/a-z/A-Z/;
($zip= $zipp) =~ tr/a-z/A-Z/;
write(EX3B); #Write out report
}
close(EX3A);
close(EX3B);
Here is a list of report variables:
$| -- 0 (default): output is buffered. >0: writes out the buffer after
every write or print.
$% -- Current page number.
$= -- Current page length. Default=60.
$- -- Number of lines left on a page available for writing.
$~ -- Current report format.
$^ -- Current header format.
Many other capabilities are possible, such as sorting records, changing
lines per page, and generating footers. Unfortunately, it would take far
more pages than I have to cover that material.
This can only be the briefest of introductions to Perl's reporting
capabilities. It offers a strong (and free) alternative to SAS in doing
simple reports. The reader is encouraged to try the examples and read the
suggested references. Posters in future years may discuss some of Perl's
advanced reporting capabilities and how to create interactive Perl
applications.
Getting in touch with me/Trademarks
Hallett German
GTE Laboratories Inc
40 Sylvan Road
Waltham, Ma 02254
SAS® and all other SAS products mentioned are registered trademarks of
the SAS Institute.
References [Annotated]
Bates, Douglas "Data Manipulation in Perl" Unpublished Paper pp. 1-6.
[Strongly recommended. Has a good section on how to use Perl to clean up
datafiles. Some of this capability was added into the 6.07 release.]
German, Hallett Command Language Cookbook 1992 Van Nostrand Reinhold
pp. 247-305
[Has plenty of Perl references and a good discussion on Perl portability.]
Scerbo, Marge "Data Step Reporting" NESUG 91 Proceedings 1991 pp.
[If you want to see how to generate the same examples using SAS, look at
Marge's paper.]
Wall, Larry and Randal L. Schwartz Programming Perl 1991 O'Reilly &
Associates. pp. 1-42, 106-118
[The Perl "bible". Also called the Camel book because of what is on the
cover. A reference, tutorial, and code ideas book all in one place.
Strongly recommended.]