Digital Media

Moshell - Spring 99

Lecture 2: Introducing CGI Programming and PERL

This lecture is based on Chapters 1, 2 and 3 of the Castro text: "PERL and CGI for the World Wide Web". This text is designed for folks who have not programmed before, and so you should be able to inhale it in great chunks.

Flash - I just got word that the bookstore doesn't HAVE IT! Oh curses and abominations. I'll have to provide you with photocopies of the early chapters, but that makes the read-ahead process pretty lame for Tuesday, doesn't it? Everyone should DASH RIGHT OUT and order the sucker through Amazon.com. We gotta have the book to proceed.

JMM

Before we begin with content, some process matters:

1. Circulation of an attendance list and seating chart, so I can find out who's really here and needs an account.

2. Announcements of various sorts.

3. CGI Buddies. Everyone who already has TELNET and FTP please raise your hands. Everyone needs to note the location of one such person. Also all please exchange phone numbers with at least 2 fellow classmates. When you're not here, don't call me to ask what's up - call your CGI Buddy (and look on the Web!)

Now, on with PERL and CGI.

The Upside:

PERL: A language specialized in text processing; that's exactly what is needed to process HTML.

PERL is freely distributed, open-source software (in fact it's a classic example of Internet Culture: there's a large body of folks who maintain and extend PERL for free). Its syntax somewhat resembles that of the C language. PERL is usually interpreted rather than compiled. PERL 5.0 and up have object-oriented features, like most modern languages.

PERL runs on many platforms but its basic "mentality" is oriented towards Unix systems. We'll be running it on a Sun SPARCstation this semester. In fact upwards of 80% of all the traffic that runs through the Internet runs on Sun systems (according to Sun). This is because they've long specialized in Internet systems, and because the Solaris operating system running on Sun hardware is the nearest thing to failure-free software we know of.

Appendix D (Page 239) of the Castro book contains a reference for the most common Unix commands.

CGI: Common Gateway Interface - a protocol, or agreed-upon set of ways of communicating, between a Web server (e. g. Apache, typically on an Internet Service Provider's computer) and a script running under its care. The server hands your script the request that came from the client (your PC) and sends the script's output back.

HTML: We assume you know the fundamentals of HTML. If not, go get a book.

The Downside:

PERL tries but does not always succeed in giving comprehensible error messages. Many times I've found it necessary to "back up" one version in my development, because some syntax error is producing screwy error messages. This is only possible if you get into the habit of saving a separate version of the source code for every change. Seems weird but it works.

Here's how I do it. Let's say I'm building a server named JoeBeans. My first version is named Joebeanszz.pl. I upload that to the server, try it, note the mistakes. Then I open up Joebeanszz.pl (I usually use MS Word for editing PERL) and save it immediately as Joebeanszy.pl. Then I make the corrections, upload it and continue.

Why work backwards? Because the most recent version is always on top of the list. I'll be using FTP to send it up to the server for testing, and FTP sorts its lists (more often than Windows does, for sure!)

FTP? Huh?

Two tools you will need to acquire immediately (if you don't have 'em) and learn how to use are Telnet and FTP. Telnet is built into Windows; just open the RUN window and type Telnet. (Before you do this, start up your Internet connection.) Then specify othrys.dml.cs.ucf.edu (or another server whose name I'll provide to you). Telnet makes your PC into a remote terminal for the Unix host.

FTP (File Transfer Protocol) is how you copy files from your PC to the Unix host, or vice versa.

If you don't have (or don't know if you have) these tools, grab one of the people who will raise their hands when I ask who does have 'em. Invite them over for dinner, compliment their skills and get their help in setting you up.
 

CHAPTER 1: Getting Started With PERL

First fact: PERL is Case Sensitive. So variables $mike and $Mike are different!

Variables: Scalars look like $address (they start with a dollar sign.) They are untyped - that is, you can assign integers, real numbers, text strings or logical values to any variable. Actually we use 1 for True and 0 for False. $ looks like S which stands for Scalar.

Array variables start out with @, like @thislist. Surprisingly we don't use many arrays. When they are used, they are often called lists. @ looks like a which stands for array.

Hashes, or associative arrays, are the meat-and-potatoes of PERL. A hash is a kind of lookup table in which the usual role of the subscript (in a list) is replaced by a key. The key can be a number or a string. Hash variables begin with %, as in %NameTable. % has two parts, to remind us of the value and key in each pair in the hash.

The Whole versus the Part. The above conventions for @ and % are used when we refer to the WHOLE array or hash at once (e. g. when we're searching it or doing other things to all the data.) If we have a hash named %FORM, we access individual elements by references like $FORM{"Moshell"}. This is because one element of an array or hash is really a scalar, represented by the "S" in $. If there is no data value associated with the key Moshell, the value returned is undefined, which prints as the null string.
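Here is a small sketch of the Whole versus the Part idea. The data values and the key "Nobody" are made up for illustration, and the "use strict" and "my" lines are just a cautious habit, not something the text requires.

```perl
use strict;
use warnings;

# Whole array uses @, one element uses $ and square brackets.
my @thislist = ("red", "green", "blue");
my $first    = $thislist[0];

# Whole hash uses %, one element uses $ and curly braces.
my %FORM = ("Moshell" => "instructor", "Castro" => "author");
my $role = $FORM{"Moshell"};

# A key with no entry yields undef, so check with exists before using it.
my $missing = exists $FORM{"Nobody"} ? $FORM{"Nobody"} : "";

print "First color: $first\n";
print "Moshell is the $role\n";
print "Nobody maps to '$missing'\n";
```

Checking with exists keeps the "warnings" setting from complaining about an undefined value when the key is absent.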

Operators are pretty numerous. The obvious + - * / are joined by others that search strings, substitute stuff in strings, etc. String concatenation (very important) is represented by the period. So if

$firstname="Mike";
$secondname="Moshell";
print $firstname.$secondname;

you would see

MikeMoshell

in the output.

Query 2.1: Concoct a print statement using the variables $firstname and $secondname which would emit

Mike Moshell

with a nice space between the names. But don't put a space into either variable ($firstname or $secondname.)

Boolean Operators are very important, and tricky because the string operators are eq, ne, lt, le, gt, ge - whereas the corresponding numeric operators are ==, !=, <, <=, >, >=. The logical connectors are ! (not), && (and), || (or). Parentheses are always a good idea.
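To see why the two families of operators matter, here is a minimal sketch that compares the same two values both ways. The variable names and values are invented for the example.

```perl
use strict;
use warnings;

my $x = "10";
my $y = "9";

# Numeric comparison: 10 is greater than 9.
my $numeric_greater = ($x > $y) ? 1 : 0;

# String comparison goes character by character: "1" sorts before "9".
my $string_greater = ($x gt $y) ? 1 : 0;

print "numeric: $numeric_greater, string: $string_greater\n";

# Logical connectors, with parentheses to keep the grouping obvious.
my $name = "Mike";
if (($name eq "Mike") && !($x == $y)) {
    print "Name matches and the numbers differ\n";
}
```

The same pair of values gives opposite answers, so picking the wrong operator produces a bug with no error message at all.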

Functions are usually named with lowercase strings, as in shift(@somelist).

Quote Marks are also very important. If I want to print out 'Mike', I would say

print "'Mike'";

If I wanted to print "Mike", I'd say

print '"Mike"';

That is, each kind of quote mark can "capture" the other. But they aren't identical. Consider the following:

$name="Herman";
print "This guy is named $name";

The result would be

This guy is named Herman

Whereas if I did this:

print 'This guy is named $name';

what I would see is

This guy is named $name

Newline Characters must be explicitly supplied, or the next print statement will pile up on the previous one. So we normally terminate each print statement with "\n", like this:

print "This guy is named $name\n";

And lastly: Comments begin with a pound sign # and run for the rest of the line.

Escapes. You saw an example here of an "escape sequence" in which the backslash is used to modify the meaning of a character. If we want to put anything that's not alphanumeric into a string, this is the best way to do it. There are a bunch of special escapes in PERL, particularly for string matching. I haven't seen a good summary of them in this text, so I'll copy a table from the previous (out of print) text for you.
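Until that table arrives, here is a short sketch of a few escapes that work inside double-quoted strings. It is only a sample, not a complete list, and the variable names are made up.

```perl
use strict;
use warnings;

my $price = 5;

my $tabbed = "Name:\tMike\n";         # \t is a tab, \n is a newline
my $quoted = "He said \"hello\"\n";   # \" puts a double quote inside double quotes
my $dollar = "Cost: \$$price\n";      # \$ is a literal dollar sign; $price still interpolates
my $slash  = "C:\\perl\n";            # \\ is a single backslash

print $tabbed, $quoted, $dollar, $slash;
```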

CHAPTER 2: CREATING A PERL SCRIPT

Scripts are launched by the Unix operating system, and so the first line must always tell Unix what kind of script it is (unless it's written in the default scripting language of the "shell", the user interface (hah!) manager being used). Perl scripts begin with

#!/usr/local/bin/perl

which tells Unix to look in that directory, run perl, and let it interpret the rest of the script.

Our author calls this the "shebang" line, which I like.

Creating output for the Web Browser

Unix programs use "standard input" and "standard output" as very friendly conventions for where data comes from and goes to. The basic flow of info in a Web transaction is like this:

Your PC:Netscape  --> Your PC: Internet connection --> ISP:TCP/IP server --> Web Server (e. g. Apache) --> Standard In of your script.

The reverse path is, well, the reverse of this. The standard out of your script is routed right back to your browser, which presumably has just done something to request input - so it's expecting a flow of characters. We'll look at how this whole transaction is handled, later.

Part of the MIME convention for e-mail is used here to tell the browser what's coming, as follows:

print "Content-type: text/html\n\n";
print "Aha, hi there world!\n";

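This Content stuff must ALWAYS be the first thing your script prints, and it must be followed by a blank line (that's why there are two \n's). Whatever follows gets interpreted by the browser as HTML. Plain naked text just gets displayed.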
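Putting the pieces together, a complete script file might look like the sketch below. The helper name cgi_header is my own invention for the example, and the perl path is the one used above; it may differ on other servers.

```perl
#!/usr/local/bin/perl
# Minimal CGI script: the header has to be the first thing printed.

# Build the header: the content type, then a blank line (hence two \n).
# cgi_header is a hypothetical helper name, not part of PERL itself.
sub cgi_header {
    return "Content-type: text/html\n\n";
}

print cgi_header();
print "Aha, hi there world!\n";
```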

Trying out the script

FTP the sucker up there, into public_html, cgi_bin or whatever subdirectory your Internet service provider directs you to use. (If it's a secure site it won't be public_html, but details differ by ISP.)

Give the file an extension of .pl or .cgi. Normally I save .cgi for compiled C files which I'm using for CGI trickery, and .pl for the ones that are in PERL.

UNIX Permissions must be set properly. The text describes how to do this with the chmod command, as follows (assuming your script is myscript.pl):

chmod 755 myscript.pl

Then type ls -l (that's ell ess, space, minus ell) to see a listing of the whole directory with permissions. The three octal digits stand for Owner, Group and EveryoneElse, in that order, and each digit is interpreted as follows:

4=read
2=write
1=execute

Add 'em up. For instance, 4+1=5 which means "read and execute" privileges. 7="everything" privileges. So 755 gives the owner everything, and gives the group and everyone else read and execute.
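The add-'em-up arithmetic can be sketched in PERL itself. The function names digit_to_rwx and mode_to_rwx are made up for this example; the output uses the same r, w, x letters that the ls listing shows.

```perl
use strict;
use warnings;

# Turn one octal digit (0-7) into the familiar rwx letters.
sub digit_to_rwx {
    my ($digit) = @_;
    my $n = int($digit);               # make sure we are working with a number
    my $text = "";
    $text .= ($n & 4) ? "r" : "-";     # 4 = read
    $text .= ($n & 2) ? "w" : "-";     # 2 = write
    $text .= ($n & 1) ? "x" : "-";     # 1 = execute
    return $text;
}

# Decode a three-digit mode such as "755": owner, group, everyone else.
sub mode_to_rwx {
    my ($mode) = @_;
    return join("", map { digit_to_rwx($_) } split(//, $mode));
}

print mode_to_rwx("755"), "\n";   # prints rwxr-xr-x
```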

Testing the Script.

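The best thing to do at this point is to use your Telnet connection and just type the name of the script at the Unix prompt. The > below stands for the prompt; don't type it yourself. If Unix answers "command not found", type ./myscript.pl instead.

>myscript.pl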

You will see the first few syntax errors, if you made any, come up on the screen. If none, you will see standard output come up on the screen, like

Content-type: text/html

Aha, hi there world!

REALLY testing the script.

Just get your browser running and type the path to the script, like this:

www.dml.cs.ucf.edu/~joebloe/myscript.pl

And maybe it'll work. If not, where's your CGI BUDDY?

Queries for Lecture 2

Query 2.2: List the relational operators for PERL numbers and strings.
Query 2.3: List the PERL logical connectives.
Query 2.4: Put some basic HTML tags into your test program and get it running on the Digital Media server.
 

CHAPTER 3: FORMS

At this point we explored Chapter 3 in the Castro book, via the mechanism of looking at the source HTML for the forms in an online registration I currently maintain at www.regmaster.com/chi99.html. You are invited to explore that link.

The tags we discussed were

<form ...> which declares where the server is and what kind of protocol (GET or POST) is to be used
<input ...> which produces text boxes
<select ...> which produces pulldown menus
<input type=checkbox ...> which is obviously a checkbox
<input type="hidden" > which produces values as though an input had occurred.
<input type="submit" value="CONTINUE"> which produces a control button labelled CONTINUE, which when clicked actually submits the form to the server.

Standard form stuff. Your responsibility is to be able to use any tag found in the above example, in your own code when we get to that point.
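As a rough sketch of how these tags fit together, here is a PERL script that prints a small form. Everything specific in it is an assumption for illustration: the action path /cgi-bin/register.pl, the field names and the menu choices are invented, and this is not the code behind the registration site mentioned above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical script that would receive the submitted form.
my $action = "/cgi-bin/register.pl";

# Everything up to END_FORM is one long double-quoted string,
# so $action gets filled in.
my $form = <<"END_FORM";
<form method="POST" action="$action">
  Name: <input type="text" name="fullname">
  Meal: <select name="meal">
    <option>Chicken</option>
    <option>Vegetarian</option>
  </select>
  <input type="checkbox" name="student" value="yes"> Student
  <input type="hidden" name="event" value="demo">
  <input type="submit" value="CONTINUE">
</form>
END_FORM

print "Content-type: text/html\n\n";
print "<html><body>\n$form</body></html>\n";
```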
