Report Writing on a Budget: Using Perl

Hallett German

GTE Laboratories, Inc.


Copyright (c) 1992 All Rights Reserved



Introduction

At last year's NESUG, Marge Scerbo presented an interesting paper showing how a few simple SAS® DATA step statements could be used to generate powerful and customizable reports. As I read through the paper, I wondered "Gee, I could do most of this in Perl. Or can I?" This paper is a response to that thought. The following is an outline of the paper:

  1. What is Perl?
  2. How can I learn more about Perl?
  3. Perl Concepts
  4. Basic Reports -- SAS vs. Perl
  5. Conclusions
  6. References

After reading the paper, you should have a good overview of Perl's reporting capabilities and hopefully be encouraged to create your own reports with this command language.

What is Perl?

Perl was developed by Larry Wall starting in 1986. It officially stands for Practical Extraction and Report Language. [But there are those who say that like SAS it is a group of letters with no meaning in itself. You be the judge.]

Perl is a powerful command language that has elements of C, UNIX shells, awk, sed, and much more. The result is a self-contained portable language. Perl is now almost a de facto standard with UNIX system administrators. [It also is used internally at the SAS Institute.]

Part of Perl's appeal is that it is distributed with source and is available free as part of GNU public software. It can be obtained via e-mail or from various anonymous ftp sites. Perl can now be found under AmigaOS, Atari OS, DOS [it runs fine under MS-Windows], Macintosh, UNIX, and VMS.

Perl contains many different elements:

-- Over 100 built-in functions
-- A rich built-in library
-- networking capabilities
-- database capabilities
-- C interfaces
-- debugger
-- report capabilities
-- converters (awk, sed, C header libraries to Perl)

Many utilities and interfaces have been built with Perl. These include interfaces to Oracle, Sybase, Curses, and X Windows.

How can I learn more about Perl?

Here are some places to look:

* A free man (help) document has over 100 pages on Perl. A formatted copy can be obtained from the anonymous ftp site chem.bu.edu.
* Various conferences give tutorials on Perl. These include USENIX, SUG (SUN), and DECUS (DEC).
* The Usenet group comp.lang.perl is a treasure trove of Perl tips. Perl's creator Larry Wall is actively posting useful messages there.
* Once a month, a FAQ (frequently asked questions) list is posted on comp.lang.perl.
* The Wall and Schwartz book (see references) is considered the source on Perl. An advanced Perl book is planned.
* The German book covers Perl portability and has a healthy number of Perl references.

Perl Concepts

Before looking at our first Perl report, it is helpful to understand the following:

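* Perl's built-in functions and keywords must be written in lowercase. Names that you choose yourself, such as file aliases (filehandles) and subroutines, may use any case, but Perl treats uppercase and lowercase letters as different.
* Perl statements must end with a semicolon. [Making SAS users feel right at home.]
* A series of statements may be processed as a block. A block is contained within braces, i.e. { }.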
* Comments begin with a #.
* Perl supports a number of data types each with its own unique identifier:

- $ -- Scalar variables may contain numbers (including decimals), characters, or Boolean values (1, 0). Scalars also may hold the elements of simple and associative arrays.

examples:

$a = 1; #Assigned a number
$a = "dog"; #Assigned string

- @ -- Simple arrays. Can contain elements with numbers or characters. Each element is designated by a numeric key marking the position in the array.

examples:

@array1 #Entire array
$array1[0] #First element in array
$array1[$#array1] #Last element in array.

- % -- Associative arrays. Can contain elements with numbers or characters. Each element is designated by a key, which may be a number OR a character string, rather than by its position in the array. Associative arrays are beyond the scope of this paper.

* The following are some of the functions that are used in these examples:

- close. Closes an open file.

- die. Ends the program with an optional message. It is usually written after a test, so that the program ends only if the test fails. A warn function, which prints the message without ending the program, is also available.

- open. A powerful command. May open a file for reading (default), writing, or both! An alias for the file is assigned by the user. (Like SAS's libref or fileref component in a LIBNAME or FILENAME statement.) Also may be used like SAS's LIBNAME PIPE/FILENAME PIPE statements to pipe data to or from an operating system command.
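
The data types and functions described above can be combined as in the following minimal sketch. It is written for a current Perl 5 interpreter rather than the Perl 4 used for the examples in this paper, so "my" (declares a local variable), the three-argument open, and chomp (a safer relative of chop) are assumptions about your interpreter. The file name concepts.txt and the subroutine names save_list and load_list are made up for illustration.

```perl
use strict;
use warnings;

# Write each element of a simple array to a file, one element per line.
sub save_list {
    my ($file, @items) = @_;
    open(my $out, '>', $file) || die "Can't open $file for writing: $!\n";
    foreach my $item (@items) {
        print $out "$item\n";
    }
    close($out) || die "Can't close $file: $!\n";
    return scalar(@items);            # number of elements written
}

# Read the file back into a simple array, removing each newline.
sub load_list {
    my ($file) = @_;
    open(my $in, '<', $file) || die "Can't open $file for reading: $!\n";
    my @items = <$in>;
    chomp(@items);
    close($in);
    return @items;
}

my $file  = "concepts.txt";           # scalar holding a (hypothetical) file name
my @univs = ("Brown University", "Cornell");
save_list($file, @univs);
my @back = load_list($file);
print "First: $back[0]\n";            # first element
print "Last:  $back[$#back]\n";       # last element
unlink($file);                        # remove the scratch file
```

If the file cannot be opened, die ends the program and $! supplies the operating system's reason.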

Basic Reports -- SAS vs Perl: Input Forms

[Do note that all examples shown are "standard Perl" and should be portable across operating systems. I created these examples on MS-DOS or a Macintosh and ran them on UNIX "as is!"]

Data may be input in two different ways: interactively and non-interactively.

Interactively:

The following is a simple program that takes user input and writes it to a file. The chop function removes the newline.

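open(FILE1,">>input.txt") || die "Can't open input.txt $!\n";
$cnt = 1;

un:
print "Enter the NAME of the University\n";
$univ=substr(<STDIN>,0,21); #At most 20 characters plus the newline
chop($univ); #Remove the newline (or the 21st character)

cy:
print "Enter the CITY of the University\n";
$city=substr(<STDIN>,0,16); #At most 15 characters plus the newline
chop($city);

printit:
print FILE1 "$univ $city \n";
print "Do you wish to enter another record? Y/N\n";
$choice=substr(<STDIN>,0,1); #First character of the answer
if ($choice eq "Y") {$cnt++; goto un;}
else {close(FILE1); die "$cnt records added\n";}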

This approach is ideal for small databases. A rich range of data checking is possible.
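
As a rough illustration of such data checking, the sketch below tests one value before it would be written to the file. The subroutine name check_field and its three rules (not blank, not too long, no asterisk) are assumptions chosen for this illustration, and the code is written for a current Perl 5 interpreter.

```perl
use strict;
use warnings;

# Return an error message for a bad value, or an empty string if it is acceptable.
# The rules are illustrative only; a real application would have its own.
sub check_field {
    my ($name, $value, $max) = @_;
    return "$name must not be blank"               if $value !~ /\S/;
    return "$name is longer than $max characters"  if length($value) > $max;
    return "$name must not contain an asterisk"    if $value =~ /\*/;
    return "";
}

foreach my $try ("Brown University", "   ", "A University With A Very Long Name") {
    my $err = check_field("NAME", $try, 20);
    my $msg = ($err eq "") ? "ok: $try" : "rejected: $err";
    print "$msg\n";
}
```

In the interactive program, a test like this could sit right after each chop, with a goto back to the prompt whenever a message is returned.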

Non-interactively:

For smaller files, you can pre-build an array that contains values:

@array1= ("Brown University Providence","Cornell Ithaca");

For larger files, it is recommended to use compressed files or dbm files:

Compressed (Binary) Files: Records with variable-length values are compressed into fixed-length form and uncompressed using the pack/unpack functions. This is shown a little later in the paper. These files can also be set up as random-access files.

DBM files: DBM stands for Data Base Management. DBM is available in some form for all Perl interpreters except those for the Amiga and the Macintosh. DBM files are accessed using associative arrays and are beyond the scope of this paper.

Basic Reports -- SAS vs Perl: Reports

Report #1 -- A Simple List

The following report should be produced:

BROWN UNIVERSITY
PROVIDENCE
CORNELL
ITHACA
UNIV OF MARYLAND
BALTIMORE
UCLA
LOS ANGELES
COLUMBIA
NYC
SYRACUSE UNIV.
SYRACUSE

To do this, the program will also: 1) split the "fields" of the "record" to appear on two lines and 2) convert the values of these fields to uppercase regardless of their original case.

Here is the program that creates both the input record and the report:

Example 1 -- Standard Approach.

#########################################
# a. Create an array                    #
#    (a tab, \t, separates the fields)  #
#########################################
$fileo = "ex1.txt"; #Set value for file
@array1= ("Brown University\tProvidence",
          "Cornell\tIthaca",
          "Univ of Maryland\tBaltimore",
          "UCLA\tLos Angeles",
          "Columbia\tNYC",
          "Syracuse Univ.\tSyracuse");

#########################################
# b. Open a file for writing            #
#########################################
open(EX1,">$fileo") || die "Can't open $fileo $!\n";
foreach $cnt (0 .. $#array1) {

    #########################################
    # c. Split the "record" into two fields #
    #    at the tab, so that names of more  #
    #    than one word stay together        #
    #########################################
    ($univ,$loc) = split(/\t/,$array1[$cnt]);

    #########################################
    # d. Translate record to uppercase      #
    #########################################
    ($university = $univ) =~ tr/a-z/A-Z/;
    ($location = $loc) =~ tr/a-z/A-Z/;

    #########################################
    # e. Write out record and close file    #
    #########################################
    print EX1 "$university\n$location\n";
}
close(EX1);

Note that a scalar variable contains the value of the file name. This allows you to easily change a file name IN ONE PLACE ONLY when needed.
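
The splitting step is worth trying on its own, because split(' ', ...) breaks a record at every run of blanks, so a name of more than one word such as "Brown University" needs an explicit field delimiter to stay in one piece. The following is a minimal sketch, assuming a tab as the delimiter. The tab and the subroutine name split_record are illustrative choices, and the code is written for a current Perl 5 interpreter.

```perl
use strict;
use warnings;

# Split a tab-delimited "record" into two fields and return them in uppercase.
sub split_record {
    my ($record) = @_;
    my ($univ, $loc) = split(/\t/, $record, 2);   # at most two fields
    $univ = "" unless defined $univ;              # tolerate an empty record
    $loc  = "" unless defined $loc;               # tolerate a missing location
    (my $university = $univ) =~ tr/a-z/A-Z/;      # copy, then translate the copy
    (my $location   = $loc)  =~ tr/a-z/A-Z/;
    return ($university, $location);
}

# Splitting on blanks yields one element per word.
my @words = split(' ', "Brown University Providence");
print scalar(@words), " words when splitting on blanks\n";

# Splitting on the tab keeps the two fields intact.
my ($u, $l) = split_record("Brown University\tProvidence");
print "$u\n$l\n";
```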

Report #2 -- A Formatted List

A formatted list like the one below can also be created with Perl.

    BROWN UNIVERSITY      PROVIDENCE
             CORNELL          ITHACA
    UNIV OF MARYLAND       BALTIMORE
                UCLA     LOS ANGELES
            COLUMBIA             NYC
       SYRACUSE UNIV        SYRACUSE

Note that it would be easy to add the UNIV text as in Marge's example. The following part creates the binary file:

#Example 2 -- Fixed Records (Use Pack/Unpack) Input Part

##################################################
# a. Create an array                             #
##################################################
@univs = ( "Brown University", "Providence",
           "Cornell", "Ithaca",
           "Univ of Maryland", "Baltimore",
           "UCLA", "Los Angeles",
           "Columbia", "NYC",
           "Syracuse Univ.", "Syracuse");

##################################################
# b. Open a file for writing                     #
##################################################
open (EX2,">ex2.txt")
    || die "Can't open ex2.txt $!\n"; #exception handling

##################################################
# c. Go through array                            #
##################################################
foreach $i (0 .. $#univs) {

    ##################################################
    # d. If university (even position), then assign  #
    #    to $university.                             #
    ##################################################
    if ($i % 2 == 0) { #university
        $university = $univs[$i];
    }

    ##################################################
    # e. If location (odd position), then assign to  #
    #    $location and write out "packed" record     #
    ##################################################
    if ($i % 2 == 1) { #location
        $location = $univs[$i];
        $line = pack("A20 A15",$university,$location); #35-character record
        print EX2 $line;
    }
}

##################################################
# f. Close file                                  #
##################################################
close(EX2);

The following part retrieves and unpacks the records from the file and creates the report:

Example 2 -- Fixed Records (Use Pack/UnPack) Report Part

##################################################
# a. Open file and retrieve packed line          #
#    (the file is one long line with no newline, #
#    so nothing needs to be chopped)             #
##################################################
file_part:
open (EXP2,"ex2.txt") || die "Can't open ex2.txt $!\n";
while (<EXP2>) {
    $line = $_;
}
close(EXP2);

##################################################
# b. Loop through line:                          #
#    1) Unpack record, 2) Strip extra blanks,    #
#    3) Rejoin words, 4) Set to uppercase,       #
#    5) Print line                               #
##################################################
rpt_part:
$len = length($line);
for ($offset=0; $offset<$len; $offset=$offset+35) { #35 = 20 + 15
    $lin = substr($line,$offset,35);
    ($univ,$loc) = unpack("A20 A15",$lin);
    @univ = split(' ',$univ); #Trim extra blanks
    @loc = split(' ',$loc);
    $unn = join(' ',@univ);
    ($univ = $unn) =~ tr/a-z/A-Z/; #Change to uppercase
    $lon = join(' ',@loc);
    ($loc = $lon) =~ tr/a-z/A-Z/;
    printf "%20s %15s\n",$univ,$loc; #formatted print
}
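
The key to fixed records is that the step used when reading must equal the record length produced by pack. The following minimal sketch derives that length from the template instead of typing it in by hand, and does the whole round trip in memory. The names $TEMPLATE, $RECLEN, pack_records, and unpack_records are made up for illustration, and the code is written for a current Perl 5 interpreter.

```perl
use strict;
use warnings;

my $TEMPLATE = "A20 A15";                        # 20-character name, 15-character location
my $RECLEN   = length(pack($TEMPLATE, "", ""));  # record length implied by the template

# Pack a flat list (name, location, name, location, ...) into one string.
sub pack_records {
    my @fields = @_;
    my $data = "";
    while (@fields) {
        my ($univ, $loc) = splice(@fields, 0, 2);   # take the next pair
        $loc = "" unless defined $loc;
        $data .= pack($TEMPLATE, $univ, $loc);
    }
    return $data;
}

# Step through the string one record at a time and unpack each record.
sub unpack_records {
    my ($data) = @_;
    my @rows;
    for (my $offset = 0; $offset < length($data); $offset += $RECLEN) {
        my ($univ, $loc) = unpack($TEMPLATE, substr($data, $offset, $RECLEN));
        push(@rows, [$univ, $loc]);                 # one (name, location) pair per row
    }
    return @rows;
}

my $data = pack_records("Brown University", "Providence", "UCLA", "Los Angeles");
print "Record length: $RECLEN, data length: ", length($data), "\n";
foreach my $row (unpack_records($data)) {
    printf "%-20s %-15s\n", $row->[0], $row->[1];
}
```

The A format pads each value with blanks when packing and strips the trailing blanks again when unpacking, so a correctly aligned record needs no further trimming.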

Report #3 -- Creating a Formatted Report Using Perl

Perl has a powerful report facility that can do pretty much anything SAS can with PUT statements. Here is a simple example:

University List
University           State Zip
BROWN UNIVERSITY     RI
UNIV. OF MARYLAND    MD  21201
UCLA                 CA
COLUMBIA             NY  10005
SYRACUSE UNIV        NY  13112

This is the data as stored in the input file: [Note the * as a field delimiter]

Brown University*ri*
Univ. of Maryland*md*21201
UCLA*CA*
Columbia*ny*10005
Syracuse Univ*ny*13112

This is the Perl script that generated it: [Note that you first create a template and then use it.]

Example 3 -- Using Formatted Reports

#Create a header format. Period = end of format.
format HEAD1 =
University List
University           State Zip
.
#Define report format.
#@<<< -- Place holder and left justification
#The line after the place holders lists the variables in the report
#A ~ (tilde) in a format line suppresses that line when its fields are empty
format EX3B =
@<<<<<<<<<<<<<<<<<<< @<< @<<<<<
$un, $st, $zip
.
open(EX3A,"ex3a.txt") || (die "Can't open ex3a.txt $!\n");
open(EX3B, ">ex3b.txt") || (die "Can't open ex3b.txt $!\n");

# System Variables $^ -- header format name
#                  $~ -- report format name
select (EX3B); $^ = "HEAD1"; $~ = "EX3B";
while (<EX3A>) {
    chop;
    ($unn,$stt,$zipp) = split(/\*/,$_); #Parse fields
    ($un= $unn) =~ tr/a-z/A-Z/; #Set to Uppercase
    ($st= $stt) =~ tr/a-z/A-Z/;
    ($zip= $zipp) =~ tr/a-z/A-Z/;
    write(EX3B); #Write out report
}
close(EX3A);
close(EX3B);
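
For readers who want to try the template idea without first creating an input file, here is a minimal self-contained sketch that takes its records from a list. The format names SKETCH_TOP and SKETCH, the subroutine write_report, and the file name sketch_report.txt are made up for illustration. The code is written for a current Perl 5 interpreter, where uc is the built-in uppercase function and "our" declares the variables that the format reads.

```perl
use strict;
use warnings;

our ($un, $st, $zip);            # formats read these package variables

format SKETCH_TOP =
University           State Zip
.

format SKETCH =
@<<<<<<<<<<<<<<<<<<< @<< @<<<<<
$un,                 $st, $zip
.

# Write one report line per "name*state*zip" record to the named file.
sub write_report {
    my ($file, @records) = @_;
    open(my $out, '>', $file) || die "Can't open $file: $!\n";
    my $old = select($out);      # $^ and $~ apply to the selected file
    $^ = "SKETCH_TOP";           # header format name
    $~ = "SKETCH";               # report format name
    select($old);                # restore the previous selection
    foreach my $record (@records) {
        my @f = split(/\*/, $record, 3);
        # Uppercase each field; a missing field becomes an empty string.
        ($un, $st, $zip) = map { defined($_) ? uc($_) : "" } @f[0 .. 2];
        write($out);
    }
    close($out);
}

write_report("sketch_report.txt",
             "Brown University*ri*", "Univ. of Maryland*md*21201");
open(my $in, '<', "sketch_report.txt") || die "Can't open sketch_report.txt: $!\n";
print while <$in>;               # show the finished report
close($in);
unlink("sketch_report.txt");     # remove the scratch file
```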

Here is a list of report variables:

$|   0 (default): output is held in a buffer and written out when the buffer fills.
     Nonzero: the buffer is written out after every write or print.
$%   Current page number.
$=   Current page length in lines. Default=60.
$-   Number of lines left on the page available for writing.
$~   Current report format.
$^   Current header format.
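
As a rough illustration of these variables, the sketch below shortens the page so that the header, with its page number, is repeated. The format names PAGED_TOP and PAGED, the subroutine write_paged, and the file name sketch_paged.txt are made up for illustration, and the code is written for a current Perl 5 interpreter.

```perl
use strict;
use warnings;

our $name;                       # formats read this package variable

format PAGED_TOP =
University List       Page @<
$%
-----------------------------
.

format PAGED =
@<<<<<<<<<<<<<<<<<<<
$name
.

# Write the names to $file with $page_len lines per page (header included).
sub write_paged {
    my ($file, $page_len, @names) = @_;
    open(my $out, '>', $file) || die "Can't open $file: $!\n";
    my $old = select($out);      # the variables below apply to the selected file
    $^ = "PAGED_TOP";            # header format
    $~ = "PAGED";                # report format
    $= = $page_len;              # lines per page
    $| = 1;                      # write the buffer after every write
    select($old);
    foreach my $n (@names) {
        $name = $n;
        write($out);
    }
    close($out);
}

write_paged("sketch_paged.txt", 5, "BROWN", "CORNELL", "UCLA", "COLUMBIA", "SYRACUSE");
open(my $in, '<', "sketch_paged.txt") || die "Can't open sketch_paged.txt: $!\n";
print while <$in>;               # show the finished report
close($in);
unlink("sketch_paged.txt");      # remove the scratch file
```

Perl starts each page after the first with a form feed character, so the second header appears on a new page when the file is printed.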

Many other capabilities are possible such as sorting records, changing lines per page, and generating footers. Unfortunately, it would take far more pages than I have to cover that material.
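
To give a taste of one of these capabilities, the following minimal sketch sorts "*"-delimited records by state and then by university name before they would be written to a report. The subroutine name sort_records is made up for illustration, the sketch assumes every record has at least a name and a state, and the code is written for a current Perl 5 interpreter.

```perl
use strict;
use warnings;

# Sort "name*state*zip" records by state, then by university name,
# ignoring case in both comparisons.
sub sort_records {
    my @records = @_;
    return sort {
        my ($name_a, $state_a) = split(/\*/, $a);
        my ($name_b, $state_b) = split(/\*/, $b);
        lc($state_a) cmp lc($state_b) or lc($name_a) cmp lc($name_b);
    } @records;
}

my @sorted = sort_records("Brown University*ri*",
                          "Univ. of Maryland*md*21201",
                          "UCLA*CA*",
                          "Columbia*ny*10005",
                          "Syracuse Univ*ny*13112");
foreach my $record (@sorted) {
    print "$record\n";
}
```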

Conclusions

This can only be the briefest of introductions to Perl's reporting capabilities. Perl offers a strong (and free) alternative to SAS for doing simple reports. The reader is encouraged to try the examples and read the suggested references. Posters in future years may discuss some of Perl's advanced reporting capabilities and how to create interactive Perl applications.

Getting in touch with me/Trademarks

Hallett German
GTE Laboratories Inc
40 Sylvan Road
Waltham, Ma 02254
SAS® and all other SAS products mentioned are registered trademarks of the SAS Institute.

References [Annotated]

Bates, Douglas "Data Manipulation in Perl" Unpublished Paper pp1-6.
[Strongly recommended. Has a good section on how to use Perl to clean up datafiles. Some of this capability was added into the 6.07 release.]

German, Hallett Command Language Cookbook 1992 Van Nostrand Reinhold pp. 247-305
[Has plenty of Perl references and a good discussion on Perl portability.]

Scerbo, Marge "Data Step Reporting" NESUG 91 Proceedings 1991 pp.
[If you want to see how to generate the same examples using SAS, look at Marge's paper.]

Wall, Larry and Randall L. Schwartz Programming Perl 1991 O'Reilly & Associates. pp 1-42, 106-118
[The Perl "bible". Also called the Camel book because of what is on the cover. A reference, tutorial, and code ideas book all in one place. Strongly recommended.]