1) We consider each of the important ideas in Chapter 2 of the CGI Programming text. I assume that you have read the chapter as instructed, and will ask questions as we go along.
2) When we come to a piece of Perl code, we discuss the new constructs introduced therein.
3) I provide Queries to exercise your understanding of key features.
The syllabus says we'll talk about debugging. I did a little of that last time, when showing you how to build a trace-file to store the values that occur during your program. More ideas will come up as we go along.
Here we go:
Environment Variables
CGI scripts can only execute in environments that are set up to support them. Unix is the original model. Environment variables can be considered as global variables of the operating system. The operating system is assisted by the Web server, because CGI is intimately involved with the HTTP (Hypertext Transfer Protocol) which is the basis of the Web.
When an HTTP message arrives at the server, the server takes the message apart and places some of its parts into environment variables. Your CGI script then picks them up and uses them. The program on page 17, and our similar online example named systell.pl, display these variables. If you just run the Perl program from a telnet window, you can see that some of the environment variables (those associated with the Web server) have no values.
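Here is a minimal sketch of the idea, not the book's program and not systell.pl itself. The subroutine name describe_env and the "(not set)" marker are my own inventions for illustration. It shows how a script can look at a few environment variables and notice which ones the Web server never filled in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one "NAME = value" line per requested variable.
# Variables that were never set are reported as "(not set)".
sub describe_env {
    my @names = @_;
    my @lines;
    foreach my $name (@names) {
        my $value = defined $ENV{$name} ? $ENV{$name} : '(not set)';
        push @lines, "$name = $value";
    }
    return @lines;
}

# A few of the variables a Web server normally provides to a CGI script.
my @report = describe_env('SERVER_SOFTWARE', 'REQUEST_METHOD',
                          'QUERY_STRING', 'HTTP_USER_AGENT');
foreach my $line (@report) {
    print $line, "\n";
}
```

Run from a telnet window, most of these lines should come back as "(not set)", since no Web server is involved.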
Checking the Client Browser
On pages 18-19, the book examines one of the environment variables, HTTP_USER_AGENT. The Perl command that is used is as follows:
if ($client_browser =~ /$nongraphics_browsers/)
1) the variable $client_browser already has the contents of $ENV{'HTTP_USER_AGENT'};
2) the variable $nongraphics_browsers contains the string 'Lynx|CERN-LineMode' and the vertical bar means, to a Perl search, that either the string before or the string after is acceptable.
3) The operator =~ can be read in English as "contains a match for". It means to apply the search operation to the named variable instead of the $_ default variable.
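To see the three points working together, here is a small sketch. The subroutine name is_text_browser and the messages are my own, not the book's; the pattern string is the one quoted above.

```perl
use strict;
use warnings;

# The vertical bar means "either the string before or the string after".
my $nongraphics_browsers = 'Lynx|CERN-LineMode';

# Return 1 when the user-agent string contains either browser name, else 0.
sub is_text_browser {
    my ($client_browser) = @_;
    return $client_browser =~ /$nongraphics_browsers/ ? 1 : 0;
}

# Outside a Web server HTTP_USER_AGENT has no value, so fall back to ''.
my $client_browser = defined $ENV{'HTTP_USER_AGENT'} ? $ENV{'HTTP_USER_AGENT'} : '';
if (is_text_browser($client_browser)) {
    print "text-only browser\n";
} else {
    print "graphical (or unknown) browser\n";
}
```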
The required HTML Header
The demo program at the bottom of page 19 begins with the required line
print "Content-type: text/html", "\n\n";
This produces the content type line and then a blank line. Every CGI script that returns an HTML document must begin its output exactly like this; the blank line marks the end of the header.
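As a quick sketch (the subroutine name html_response is mine, purely for illustration), the whole reply can be thought of as header line, blank line, then the HTML itself:

```perl
use strict;
use warnings;

# Build a complete reply: header line, blank line, then the HTML body.
sub html_response {
    my ($body) = @_;
    return "Content-type: text/html\n\n" . $body;
}

print html_response("<HTML><BODY>Hello from CGI</BODY></HTML>\n");
```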
A Bit of File I/O
The program on page 21 shows some code with this included:
if (open (HTML, "<".$html_document))
{ while (<HTML>) { print; } etc ...
The < sign indicates that the file is being opened for reading. The file name is being provided in a variable; if for instance the filename was 'mytext.html', the entire parameter would look like "<mytext.html". The "while (<HTML>)" reads the next line into $_ and is interpreted as True as long as more lines are available in the file we just opened. The 'print' will, of course, print each line.
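Here is a runnable sketch of the same open-and-loop idea. It is not the program from page 21: the subroutine name read_document is my own, and the file 'mytext.html' is just the example name used above, so expect the "could not read" message if you have no such file.

```perl
use strict;
use warnings;

# Return the lines of the named file, or an empty list if it cannot be opened.
sub read_document {
    my ($html_document) = @_;
    my @lines;
    if (open(HTML, "<" . $html_document)) {
        while (<HTML>) {          # each pass puts the next line into $_
            push @lines, $_;
        }
        close(HTML);
    }
    return @lines;
}

my @lines = read_document('mytext.html');
if (@lines) {
    print @lines;
} else {
    print "Could not read mytext.html (or it was empty)\n";
}
```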
Now you may wonder why we want to use a CGI script to print a simple HTML file. As the text makes clear, this program switches between documents, depending on who the caller is.
Simple Interaction: Query Strings
There are two ways to send info along with a CGI call. You can tack it onto the URL itself, or you can send it separately in the body of the request. The first method is called a GET, and puts the data into the query string, which is tacked onto the URL after a question mark. Like:
http://some.machine/cgi-bin/name.pl?fortune
The script on page 25 executes either Unix's 'fortune', 'finger' or 'date' services depending on what string is sent to it in the query string, which winds up in $ENV{'QUERY_STRING'}.
Note that the test for equality between strings is 'eq', but for numbers it's '=='. I always forget this distinction! Likewise, 'ne' and !=.
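A small sketch of the 'eq' test in action. This is not the script from page 25: the subroutine name pick_command is mine, the three program paths are guesses that will differ from system to system, and the sketch only reports what it would run instead of running it.

```perl
use strict;
use warnings;

# Choose a program based on the query string.
# The paths are assumptions for illustration; check where yours live.
sub pick_command {
    my ($query_string) = @_;
    if    ($query_string eq 'fortune') { return '/usr/games/fortune'; }
    elsif ($query_string eq 'finger')  { return '/usr/bin/finger'; }
    elsif ($query_string eq 'date')    { return '/bin/date'; }
    return;    # anything else is refused (undef)
}

my $query_string = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
my $command = pick_command($query_string);
if (defined $command) {
    print "Would run: $command\n";
} else {
    print "Unknown request\n";
}

# The two kinds of comparison really do give different answers:
print "numerically equal\n" if "1.0" == "1";    # == compares as numbers
print "different strings\n" if "1.0" ne "1";    # eq and ne compare as text
```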
The Good Stuff: FORMS
The HTML on the top of page 26 contains the essence of a form. The key elements are the form tag itself, the input tag, and the submit tag. In the form tag, the key information is pretty clear:
- where's the server application? /cgi-bin/unix.pl
- how shall the information be sent? by the GET method
The Input tag is similarly direct. It specifies
- what kind of input gadget should we display? A text window.
- what label should we use for the resulting information? "command"
- how big should the window be? 40 characters.
The submit tag generates a button, with the caption specified by "Value". It is possible (and sometimes very useful) to have more than one submit button. We will revisit this idea later.
If the user typed 'fortune', the query string would come back containing the string
command=fortune
so, all your Perl has gotta do is to split it at the equals sign, and use the parts. The code on page 27 does just that.
($field_name, $command) = split (/=/, $query_string);
After this operation, $field_name would contain "command" and $command would contain "fortune". Unfortunate choice of example, but you get the idea.
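Here is the same split wrapped in a tiny sketch you can run. The subroutine name parse_pair is my own. I also pass a third argument of 2 to split, which is my addition rather than the book's: it stops splitting after the first equals sign, so a value that itself contains '=' stays in one piece.

```perl
use strict;
use warnings;

# Split "name=value" at the first equals sign only.
sub parse_pair {
    my ($query_string) = @_;
    my ($field_name, $value) = split(/=/, $query_string, 2);
    return ($field_name, $value);
}

my ($field_name, $command) = parse_pair('command=fortune');
print "field: $field_name, value: $command\n";
```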
The Other Way: POST
Data comes as a stream (which might include end-of-lines), instead of as a line of text. And it arrives into standard-in, rather than into an environment variable. The example in the text requires the user to deal with the number of characters (given in the CONTENT_LENGTH environment variable), but you can also read STDIN line-by-line until the end of file.
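A sketch of the count-the-characters approach, under my own assumptions: the subroutine name read_post_data is invented, and a single read call is assumed to be enough, which may not hold for large submissions. It only touches STDIN when the script really was called by POST.

```perl
use strict;
use warnings;

# Read exactly $length characters from the given filehandle.
sub read_post_data {
    my ($fh, $length) = @_;
    my $buffer = '';
    read($fh, $buffer, $length) if $length > 0;
    return $buffer;
}

my $method = defined $ENV{'REQUEST_METHOD'} ? $ENV{'REQUEST_METHOD'} : '';
if ($method eq 'POST') {
    # The server reports the size of the posted data in CONTENT_LENGTH.
    my $length = $ENV{'CONTENT_LENGTH'} || 0;
    my $data = read_post_data(\*STDIN, $length);
    print "Content-type: text/plain\n\n";
    print "Received $length characters: $data\n";
}
```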
Query 8.1: At the bottom of Page 28, the author says "... the CGI programmer can't control which method the program will be called by." Criticise this statement. How are CGI scripts usually called? How could a hacker do unexpected things to your CGI script?
Encoded Data
Data that comes via GET is not allowed to contain spaces, end-of-lines or other special characters. It is, after all, supposed to look like a single (sometimes very long) line. So there's an 'escape technique' used. When a three character sequence beginning with a percent sign arrives, the two characters after the percent sign are interpreted as a hexadecimal number and transformed into the character with that code. Thus %2F represents the slash (/), etc.
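The decoding step can be sketched in a few lines. The subroutine name url_decode is my own. The sketch also turns plus signs back into spaces, which is how browsers encode spaces in form data; the paragraph above does not mention that detail.

```perl
use strict;
use warnings;

# Undo the escape technique: '+' becomes a space, %XX becomes one character.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # plus signs stand for spaces
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # two hex digits -> character
    return $text;
}

print url_decode('%2Fcgi-bin%2Fname.pl'), "\n";    # prints /cgi-bin/name.pl
```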
We will skip all this jazz (pages 31-36) about other languages.
Final Topic: Associative Arrays.
The "Examining Environment Variables" program on page 37 does pretty much what the initial "systell" example did - but it uses a more interesting Perl construct - namely, the "associative array". We normally think of arrays as mappings from a set of integer indices (0,1,..) into a set of values. However, an associative array allows us to use ANYTHING as indices. (Practically speaking, "anything" means "any string.") Some languages call these critters "dictionaries". They are usually implemented using hash-coding, and so some folks call them "hashes".
Associative arrays in Perl are recognized by their names which begin with the percent sign, like %dict. Individual elements in such a dictionary are referred to by the index, in curly brackets, like $dict{"Moshell"}. We don't see that syntax in this example, but it's coming soon.
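A tiny sketch of that syntax, using made-up entries (the values stored under each index are my own inventions):

```perl
use strict;
use warnings;

# The whole array is %dict; a single element is $dict{...}.
my %dict = ('Moshell' => 'instructor', 'Perl' => 'language');

$dict{'CGI'} = 'gateway';            # add a new index-value pair
print $dict{'Moshell'}, "\n";        # look up one element by its index

if (exists $dict{'Java'}) {
    print "found Java\n";
} else {
    print "no entry for Java\n";
}

my $count = keys %dict;              # in scalar context: the number of indices
print "$count entries\n";
```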
Look carefully at the program on pages 37 and 38. On 37, the array named %list is initialized with index-value pairs, beginning with 'SERVER_SOFTWARE' and 'The server software is: '. This means that the value of
$list{'SERVER_SOFTWARE'}
is the string
'The server software is: '
Now you can't go through an associative array in "numeric" order, like $list[0], $list[1], etc. because you don't know what order they're stored in. But you can get at all the elements, one by one, via a special kind of operator called 'each'. We see it in action on page 38:
while ( ($env_var, $info) = each %list )
{ print $info, "<B>", $ENV{$env_var}, "</B>", "<BR>", "\n"; }
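Here is a cut-down sketch of the same loop that you can run as it stands. It is not the book's full program: the subroutine name env_report is mine, the second label ('The request method is: ') is made up for illustration, and variables with no value are shown as empty strings.

```perl
use strict;
use warnings;

my %list = (
    'SERVER_SOFTWARE' => 'The server software is: ',
    'REQUEST_METHOD'  => 'The request method is: ',    # made-up label
);

# Walk the array with 'each' and build one line of HTML per entry.
# The order of the lines is whatever order Perl stores the pairs in.
sub env_report {
    my ($labels) = @_;
    my @html;
    while (my ($env_var, $info) = each %$labels) {
        my $value = defined $ENV{$env_var} ? $ENV{$env_var} : '';
        push @html, $info . "<B>" . $value . "</B><BR>\n";
    }
    return @html;
}

print "Content-type: text/html\n\n";
print env_report(\%list);
```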
Query 8.2: Write down the first eight lines of the HTML that this program
would emit.
Query 8.3: Think about this associative array idea. Then write Perl code which would read a file with the following information in it:
name=gloves
size=medium
price=$24.95
stock=34
unit=pair
manufacturer=Corduray Gloves Inc.
.. and store the information into an associative array called %salesItem,
such that
$salesItem{"name"} contains the value "gloves", etc.
We'll discuss your code in class next Tuesday and see if you got it
right. (You can even try it at home on your new Perl system, if you wish.)