

Perl One-Liner of the Month: The Count of Corpus Christi (TX)

By Ben Okopnik

When I first published these notes, there seemed to be some small public interest in seeing them continue; later, I thought I perceived a waning of this interest, and abandoned the project. Since then, on regular occasions, I have received explosives in the mail... promises to extract my kidney through my nose... several gentle reminders that restarting the process would not be taken amiss; therefore, I once again take up my weary pen in the hope that my small effort shall please our readers.

In order to acquaint those who are unfamiliar with the tale of The Great Internet Detective, I here reprint the notice that was posted at the top of the very first Woomert and Frink story that ever saw light. Some may see it as a cowardly version of "don't shoot me, I'm only the messenger!" - but I assure you that every word is as true as pseudology and logodaedaly can make it. And that's how the matters lie at the present.

(Reprinted from http://linuxgazette.net/issue84/okopnik.html)

A Reporter's Note

Recently, I became acquainted with a set of documents which journal the adventures and experiences of none other than Woomert Foonly, the world-famous but strangely reticent Hard-Nosed Computer Detective. To the best of my knowledge, the information they contain is authentic. My anonymous correspondent - whom I suspect to be Frink Ooblick, the great man's sidekick and confidant, although my sworn promise forbids me to investigate - had emailed me a short note which engaged my interest, then sent me an encrypted file which took several nights of concerted hacking effort to decrypt (he seems to think that this indicates a sense of humor on his part). This has become the pattern: once in a while, I'll receive a file from a sender whose name matches a complex regular expression (the procmail recipe for this has grown to several pages, and now seems to be mutating on its own). I then have to drop whatever I'm doing and begin hacking at high speed - the encryption method is, in some manner which I've been unable to puzzle out, time-dependent, and becomes much more difficult to break in a few short hours. In our early exchanges, I had been granted permission to publish this material. My correspondent has stated that he chooses to keep his identity private, and is satisfied to receive no credit for his work. I present it here, though I can't claim authorship, in the hope of casting some light on the work of that great detective whose exploits had until now been shrouded in deepest mystery.

I have little to add to the above, save a few words of caution: since the code used in these adventures often depends on "side effects" rather than the core features of Perl, it should not be used for production or be incorporated in working code. This kind of thing is referred to as Perl Golf - i.e., an entertaining game for Perl enthusiasts that abuses standard programming practice to demonstrate the power and flexibility of Perl at the cost of readability - a tradeoff that should never be made or accepted in code intended for use by other people. It is, however, fun to explore and useful for one-time tools - and the experience of parsing it can be instructive for those who are trying to read someone else's "code from hell".

Let the wise reader enjoy but beware.

Ben Okopnik
On board S/V "Ulysses", Dec 22 2006


When Woomert walked into the breakfast nook, he spotted Frink, stretched out in the big armchair and apparently completely at ease. Every once in a while, the young man would reach over to his laptop and type a few characters, examine the result, and lie back with a satisfied grin. Woomert tapped the 'start' button on the gleaming stove to heat some water for his tea, and glanced over at Frink's screen.

- "Good morning, Frink. Homework?"

- "Good morning, Woomert! Yeah, it's our last school assignment before the holidays. I decided to come over to your place to do it, since I figured I'd use Perl for a lot of the repetitive stuff, and didn't have all the modules installed over at my place. It's going pretty well, though; so far, I've just been using the core functions, and haven't needed any modules at all."

- "Hmm. Let me have your laptop a moment, please." Woomert rapidly typed several lines in the console, hit 'Enter', and passed the laptop back to Frink. "There. Not that I don't want you around, partner, but you should have all these modules installed on your machine as well; when you get back home, just check your email and follow the instructions."

Frink glanced at the console, then scrolled up past the long list of module names to see what Woomert had typed.

su -c 'perl -MCPAN -weautobundle; cp /root/.cpan/Bundle/Snapshot*pm ~frink' &&
name=`echo Snapshot*pm`; mail -a $name -s "Modules" frink@frink.com <<!
Dear Frink -

Just run 

perl -MCPAN -e 'install Bundle::${name%.pm}'

as root wherever you save the attached file, and all the modules will be
installed.


Woomert
!

- "Thanks, Woomert! I've used the CPAN module for installing modules from the Comprehensive Perl Archive Network, but I'd obviously missed a bit of its functionality. I guess the 'autobundle' function writes a list of the modules installed on the system, and allows installing an identical list elsewhere?"

- "Bingo." Woomert spooned some pearls of top-grade Taiwanese tea into his cup, warmed it with a quick rinse of the steaming-hot spring water from the tea-kettle, and poured in water until the cup was nearly full. "Clearly, you already realize that anything following the '-e' option in Perl is treated as a command, so I didn't need to use the single quotes around the function name; after that, I simply copied the name of the file into a variable, and sent you an email with the instructions, using the shell's "heredoc" mechanism. Would you like some tea?"

Frink sat up, his eyes gleaming.

- "Woomert, where are we going? Clearly, you have something lined up. You don't usually drink that Ti Kuan Yin unless you're getting ready for action - it's pretty stimulating stuff! - and I noticed that your favorite typing gloves aren't on their rack, which means they're in your travel case and ready for action." He held up a hand to forestall any arguments. "Before you say anything - I'm almost done with my homework. In fact, here..." He bent down to the laptop, clicked a window running a "Vim" session, and issued a few rapid-fire Perl commands. "There, now I'm all done."

Woomert laughed and nodded, acknowledging Frink's point, then turned the laptop a bit so he could see what was going on and typed 'q:' so he could see the recent command history in Vim. The last couple of lines looked like this:

:'<,'>!perl -lane'my$s;$s+=$_ for@F;$t+=$s;$c+=@F;printf"$_: \%.2f\n",$s/@F}{printf"Avg: \%.2f\n",$t/$c'
:'<,'>!perl -walpe'/\$(\d+\.\d+)\s*$/;$s+=$1 if$1;END{printf"\%s\n\%-22s\$\%.2f\n","-"x32,"Total:",$s}'

- "Well, well. That looks quite appropriate for today, considering what we're about to do. Before I get into that, partner, do you want to explain what you've done here? I just want to see if you're ready for the road; we've got quite a challenge ahead of us."

Frink smiled. "Sure, Woomert. I've been slacking off a bit this morning, since I knew I could get this stuff completed in jig time with Perl; as you can see it worked out fine. We were given a problem involving a traveling salesman... no, no - it's not the classic Traveling Salesman Problem!" He laughed. "If I could solve NP-hard problems... well, never mind. Anyway, what we have are the records of a salesman who has been visiting a list of cities, a score for each of his appointments, and the costs associated with the trip. We were supposed to get an average for each city - he made 10 appointments in each one, but some of them didn't happen for various reasons, which resulted in an empty space for that entry - and the overall average; we were also supposed to total up his overall costs. Pretty basic stuff, but, as I'd said, it was the last one before the holidays - the teacher was willing to let us off easy. Here's the first bunch of data:"

  5 1 2   3 1 5   4
5 1 4 2 1 4 1 3 2 5
  1 2   5 2 1 4 1 3
3 4 4 2 3   5 4 4 3
1 3 5   5 2 5   3  
5   4 2 3 1 1 2 1 3
3 1 2 3 3 2 4 5 2 5
1 3 2   1 2 5   3 5
2 3 1 2 3 1 3 2 5  
4 2     1 1 2 4 5 1
4 2   1 2 4 5 2   5
  4 5 4       1 1 2
2 1 3 2 1 3 3   5 5
2 4 5 4 1 4 2   2 5
  3 2 1 3       3 2
  4 2 1 2 1 3 1 2 2
  4 4 2 4 4 2 2 2  
  2 5 5 1     2 5 2
1 2 2 3 5 3 3 5   1
  1 3 1 1 5   5 1 1
4   5 1 5     2 1 1
2 2 3 2 2 3 3 1 5 1
5 2 4   3 2 2 2    
4 2     2 3     4  
4 1 4     5 5 2    
2 2   5     5   1  
5 3   3 5 3 3 1   5
5 2 3   2 5 2 4 1 5
5 2 1 2 1 5 3 1 5 5
2 3 5 5 3 4 1 1 4 4

"Doing the totals by hand would be pretty gruelling, but with Perl, it was a snap - once I did the right thing in Vim, that is. I used 'Shift-V' to select all those lines, then "filtered" them through Perl. That's rather easy in Vim: once you type ':', it will bring up the command line with the markers indicating the visually-selected area ('<,'>). At that point, I type '!', which - given a selection - means 'filter the text', and supply it with the filter - which, in this case, is Perl."

- "Sounds pretty good so far. And what did you do in the script itself?"

- "Oh, that wasn't too tough. Here, I'll walk through it:"

perl -lane'my$s;$s+=$_ for@F;$t+=$s;$c+=@F;printf"$_: \%.2f\n",$s/@F}{printf"Avg: \%.2f\n",$t/$c'
-l	Enable line-end processing - i.e., strip off newlines as they're read, add them when printing
-a	Autosplit each line of input into @F (since "-F" isn't specified, use whitespace as a delimiter)
-n	Loop over the input, one line at a time (non-printing mode)
-e	Execute the list of commands as a script

"By the way, even though it's usually the right thing to do, I didn't use the '-w' switch to avoid warnings about the blank spots (the missed appointments) in the score list. Heck, I know they're not numbers, but I don't need all the noise as output!"

"Now, the script itself: I'm saving the sum of all the numbers in the line in $s, but I don't want the value of $s to persist between lines - otherwise, it'll just keep growing - so I declare it as a private (lexical) variable with the 'my' operator, which will reset its value to 'undef' between iterations. Next, I loop over the result of the 'autosplit' - this is saved in the @F variable - and increment the value in $s for each of the elements. Last, since I need a total score for the overall average, I increment the value in $t by $s for each line, and add the number of elements in @F - remember that blanks are going to be ignored for that purpose, since we're splitting on whitespace - to $c."

- "Wait a minute, Frink. Are you sure? You appear to be adding a array, '@F', to a scalar, $c. Isn't that wrong?"

Frink laughed.

- "Woomert, you're not going to catch me out on anything that basic. Perl is context-sensitive; if I use an array in scalar context - that is, in an operation that requires a scalar, such as addition - then that array will be treated as a scalar, which in the case of arrays returns the count of elements. You taught me that yourself."

Woomert made a gesture indicating both 'Touché!' and an invitation to go on.

"Fine, then. Next, I defined a template for 'printf' and included some simple things that I didn't need to format: the $_ variable, which represents the current line - I needed to print that back out, since '-n' is a non-printing loop - followed by a template marker. Oh, one note here: since I was doing this in Vim, the '%' sign on the command line could cause a bit of conflict (it means 'current filename' in Vim and vi), so I had to escape it by preceding it with a backslash. It doesn't actually matter to Perl, since any escaped character in a double-quoted string is exactly equal to itself, but Vim will now ignore it. The important part was this: the template defined the output - which I wanted to be the ratio of the sum and the count - as a floating-point number with a 2-digit mantissa, which is how we're supposed to present the line averages."

"After that, though, I used a trick that I learned while reading the comp.lang.perl.misc newsgroup, ages ago - since '-n' assumes a while(){} loop around the code, adding a '}' to it just terminates the opening brace at that point. Of course, you now have to "pair" the other brace to avoid a syntax error, but anything after that '{' is executed in the same way that a an empty block following a while loop would be - that is, after all the lines have been read and the loop has terminated."

- "Excellent, Frink! So what do you do at that point?"

- "Oh, that's easy. After all the looping, I just print a separate line showing the average - which I compute as a ratio of $t and $c. Here, I'll run it over the data I just showed you..." Frink quickly ran 'undo' in Vim, reselected the lines of data, pulled up the command from history, and executed it.

  5 1 2   3 1 5   4: 3.00
5 1 4 2 1 4 1 3 2 5: 2.80
  1 2   5 2 1 4 1 3: 2.38
3 4 4 2 3   5 4 4 3: 3.56
1 3 5   5 2 5   3  : 3.43
5   4 2 3 1 1 2 1 3: 2.44
3 1 2 3 3 2 4 5 2 5: 3.00
1 3 2   1 2 5   3 5: 2.75
2 3 1 2 3 1 3 2 5  : 2.44
4 2     1 1 2 4 5 1: 2.50
4 2   1 2 4 5 2   5: 3.12
  4 5 4       1 1 2: 2.83
2 1 3 2 1 3 3   5 5: 2.78
2 4 5 4 1 4 2   2 5: 3.22
  3 2 1 3       3 2: 2.33
  4 2 1 2 1 3 1 2 2: 2.00
  4 4 2 4 4 2 2 2  : 3.00
  2 5 5 1     2 5 2: 3.14
1 2 2 3 5 3 3 5   1: 2.78
  1 3 1 1 5   5 1 1: 2.25
4   5 1 5     2 1 1: 2.71
2 2 3 2 2 3 3 1 5 1: 2.40
5 2 4   3 2 2 2    : 2.86
4 2     2 3     4  : 3.00
4 1 4     5 5 2    : 3.50
2 2   5     5   1  : 3.00
5 3   3 5 3 3 1   5: 3.50
5 2 3   2 5 2 4 1 5: 3.22
5 2 1 2 1 5 3 1 5 5: 3.00
2 3 5 5 3 4 1 1 4 4: 3.20
Avg: 2.86

Woomert applauded quietly.

- "Very, very good! Mr. Ooblick, I do believe you're getting to be quite the adept; your Perl-Fu is growing. What was that next line, then?"

- "Well, this time, I just needed to add up the totals - all the dollar amounts. However, this included a little extra challenge: some of the descriptions of the expenses actually included dollar amounts, but I didn't want to count those, only the actual totals at the end of the lines. Also, not every line ended with a total, some of them had varying amounts of whitespace on the end, and some were completely empty. So, this is what I started with:"

Airfare:
	Flights            $6419.20
	Travel agent's fee  $430.00

Hotels:
	Corpus Christi, TX  $134.81
	Little Rock, AR     $106.48
	Escondido, CA        $63.14
	Santa Rosa, CA      $115.65
	Torrington, CT       $90.84
	Daytona Beach, FL    $99.71
	Fort Myers, FL       $68.03
	Savannah, GA        $130.70
	Des Moines, IA      $127.44
	Northbrook, IL       $98.43
	Summit, IL           $77.73
	Highland, IN         $88.71
	Bowling Green, KY    $60.55
	Lexington, KY       $148.78
	Framingham, MA       $79.33
	Gardner, MA         $147.02
	Belle Mead, NJ      $115.94
	Toms River, NJ       $70.04
	Binghamton, NY      $132.74
	Pendleton, NY       $138.87
	Wickliffe, OH        $70.08
	Bethlehem, PA       $128.59
	Collegeville, PA     $69.75
	San Antonio, TX      $87.77
	Charlottesville, VA $146.58
	Roanoke, VA         $128.79
	Stephens City, VA    $70.40
	Renton, WA          $149.88
	Oconomowoc, WI      $125.79
	Huntington, WV       $74.96

Cars:
	Rental             $3247.34
	Gas/parking         $640.63

Per diem:
	$44.50 x 30        $1335.00

Travel bonus:
	                   $5000.00

"The options to Perl were a little different this time:"

perl -walpe'/\$(\d+\.\d+)\s*$/;$s+=$1 if$1;END{printf"\%s\n\%-22s\$\%.2f\n","-"x32,"Total:",$s}'
-w	Enable warnings
-a	Autosplit each line of input into @F (since "-F" isn't specified, use whitespace as a delimiter)
-l	Enable line-end processing - i.e., strip off newlines as they're read, add them when printing
-p	Loop over the input, one line at a time (printing mode)
-e	Execute the list of commands as a script

"All I did in the main body of the code is use a regular expression that matched a literal dollar sign which was followed by one or more digits, which were followed by a literal period and two more digits. Since the dollar amounts could have been followed by whitespace, I matched zero or more occurrences of it with '\s*' - and that brought me to the end of the line, which 'nailed down' the location of the string I was looking for."

By this point, Frink had clearly warmed up to his teaching role; he was gesturing at the screen in grand style, flushed and smiling with confidence in his ability.

"The important part of that regex, though, is the set of parentheses. They're often used simply for grouping characters into a string, but the interesting part of their behavior is what call "capturing" or "memory parentheses"; anything that is matched within them is "remembered" by Perl in sequentially-named variables starting with $1. In this case, $1 is the only variable we have - but it holds everything necessary, the actual amount following the literal dollar sign. So, in the next statement - '$s+=1 if $1' - I check to see if $1 exists for a given line, and increment the sum variable, $s, by that amount if it does. When I'm finished reading all the lines, I execute the END{} clause - that is, the last thing that the script will do before exiting. In it, I print a line of dashes" - here, Frink smiled bashfully - "I decided to get just a little fancy, Woomert - followed by a newline, a string that said "Total:", and the value of $s preceded by a literal dollar sign. Oh, and I formatted everything with a 'printf' template, much like before. How's that?"

Woomert stood up and slapped Frink on the back, at the same time guiding him toward the door.

- "All I can say, Frink, is that you've come a long, long way since the early days - and I'm proud to have been there for it. I'm glad to have you as a partner.

Now, as to the upcoming case: this is absolutely top secret, "for your eyes only" stuff... oh, and before I forget - "


Correspondent's Note: I'm afraid that my record of this exchange between the Great Internet Detective and his partner cuts off at this point. Perhaps this case, like those of The Mysterious Incident at the Palace or The Networked Diadem, is slated to remain forever hidden from public scrutiny - or, as occasionally happens, it may come to light in some future recount of their adventures.




Ben is the Editor-in-Chief for Linux Gazette and a member of The Answer Gang.

Ben was born in Moscow, Russia in 1962. He became interested in electricity at the tender age of six, promptly demonstrated it by sticking a fork into a socket and starting a fire, and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory. He would gladly pay good money to any psychologist who can cure him of the recurrent nightmares.

His subsequent experiences include creating software in nearly a dozen languages, network and database maintenance during the approach of a hurricane, and writing articles for publications ranging from sailing magazines to technological journals. After a seven-year Atlantic/Caribbean cruise under sail and passages up and down the East coast of the US, he is currently anchored in St. Augustine, Florida. He works as a technical instructor for Sun Microsystems and a private Open Source consultant/Web developer. His current set of hobbies includes flying, yoga, martial arts, motorcycles, writing, and Roman history; his Palm Pilot is crammed full of alarms, many of which contain exclamation points.

He has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.


Copyright © 2007, Ben Okopnik. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 134 of Linux Gazette, January 2007
