Digital Media
Moshell -Spring 98
Lecture 18
Databases - From S. Chapter 9
Chapter 9 covers four topics, but we will pay attention to only one and a half of them: databases, and searching with fgrep (a Unix utility.) Andrew Ziekowitz already talked about sendmail, and we don't need to look at Man pages.

SQL is the standard query language for relational databases. Here are some typical commands:

To define a table with columns named Player, Years, etc:
create table Player_Info
{
    Player            character varying (30) not null,
    Years            integer,
    Points            integer,
    Rebounds      integer,
    Assists           integer,
    Championships    integer
};
To select all the rows (records):
select *
    from Player_Info;

To select one column
select Player
    from Player_Info;

To select one column and include only specific rows:

select Player
    from Player_Info
    where (Points > 25);

To select records having a text field that includes some string:

select *
    from Player_Info
        where Player LIKE '% Johnson';

To change a value in one or more records:

update Player_Info
    set Championships = 1
    where Player = "Shaquille O'Neal";

To drop a column from the database

alter table Player_Info
    drop column Championships;

etc, etc. We now know enough to use the small SQL DBMS (Database Management System) named Sprite, which the author provides us.

Self-Declaring Data

A very useful technique is introduced on page 227, without much fanfare. The "employee database" consists of rows of text with individual fields delimited by commas. The first row defines the column heads. Sprite therefore uses the first row to decide how many columns to put into the database, and how to label them. In this example, the author chooses not to allow this degree of flexibility, for security reasons.

Unlike more general DBMS, Sprite treats all fields as text. Here's some example data

Last, First, Job_Title, Department, EMail, Phone
Moshell, Michael, Professor, Computer Science, moshell@cs.ucf.edu,407 823 5341
many more rows would follow here.

The Example
 
The Employee Database will allow the user to specify any combination of the six fields above, and will then list all the records that match those fields. It begins with a simple form, which is parsed via &parse_form_data into an associative array named %FORM, into buckets named $FORM{"First"} etc.

We'll put the program together so we can see it.

# Employee Database Example from S-Text
#!/usr/local/bin/perl5
use Sprite;
$webmaster="shishir\@bu\.edu";
$query = undef;

&parse_form_data(*FORM);
$fields = "(Last|First|Job_Title|Department|EMail|Phone)";    # Will be used for security checking in a bit.

foreach $key (keys %FORM)
{
    if (($key !~ /\b$fields\b/o) || $FORM{$key} =~ /[^\w\-\(\) ]/) )
                # IF key isn't in the fields list, or FORM contains any non-alphanumerics
                # \b=word boundary; /$fields/o means scan fields; ^=beginning of line;\w=alphanumeric
                # also accepts (,),- and space.
         { &return_error(500,"Bad person!","Junk in the form.")}
    else
         {
        $FORM{$key} =~ s/(\W)/\\$1/g;    # Escape any (,),- or blanks in the FORM data
                                                                # \W means any NON-alphanumeric. complement of \w
                                                                # Notice use of the $1 for "the thing we matched, put ithere.".
        if ($FORM{$key}                            # If it's non-null at this point
            {                                                    # Building the query
                $query = join (" and ", $query, "($key =~ /$FORM{$key}/i)");    # What if you used ' not "?
            }
        }
 }
 
# At this point the query will look something like this:
# and (Last =~ /Martin/i) and (First =~ /Robert/i) and (Department =~ /Sales/i)
# So we can see that it's not REALLY SQL, more like SQperl.
# We have to trim off that first 'and'.
if ($query)    # For a non-null query:
{$query =~ s?^ and //;    # Query 18.1: Explain this simple operation
}
    else {return_error (500, "Whoa", "No query was entered.");}

#Query 18.2: Will this query also match Martinski? Why or why not?

# Alternative code for "Building the query" above:

    $FORM($key) = s/([' "])/\\$1/g;    # Query 18.3: So what does THAT do?
                                                        # (aside: Why would there be quotes in $FORM?)
        $query = join (" and ",$query, "($key = '$FORM{$key}'););
                # Query 18.4: What is the difference in this and the "building the query" line above?

# Now for the direct call to the database system:

    $rdb = new Sprite ();    # Instantiate an object
    $rdb->set_delimiter ("Read, ",")    # Read the DB, look for commas between fields
    @data = $rdb->dsql("select * from phone.db where $query");

# Data array comes back with first field =1 to indicate success, 0 for failure.
# We assume that the query asked for professors.
# $data[0] = 1
# $data[1] = Moshell\0Michael\0Professor\0Computer Science\0moshell@cs.ucf.edu\0407 823 5341
#etc

$status = shift (@data);     # Strip off the first element into a separate variable.
$no_elements = @ data;    # This returns the number of elements left in the array.

# Do error messages if $status=0 or $no_elements=0
# I skip 'em here.
#There is now some code to format the contents of @data and print it out as HTML.
# I don't understand how $_ the default variable gets data from @data. Anyone?

-------------------

Example 2: A Search Engine (pages 255 forward.)

You may (you SHOULD) be asking, "why do you need a search engine, when you have database capability?" Well, the obvious reason would be that you might have a large and dynamically growing Web structure (say, the maintenance manuals for your company's products). It consists of hundreds of HTML files that change daily. You want to be able to quickly produce a list of URLs to link to any file referring to, say,"carburetor".

The example uses a Unix utility named fgrep. A simple input  form provides asingle query string. Any non-alphanumeric characters result in an error.

Here's the essential code:

$document_root = $ENV{'DOCUMENT_ROOT'} ; # Root directory of where script is located

# Skipping the error checking. Prepare a header

print "Content-type: text/html", "\n\n";
print "<HTML>","\n";
print "<HEAD><TITLE>Search Results</TITLE></HEAD>";
print "<BODY>","\n";
print "<H1> Results of searching for: ",$query, $</Hl>";
print "<HR>";

open(SEARCH, "$fgrep -A2 -B2 -i -n -e $query $document_root/* |");
# A2, B2: two lines before and after the target
# -i: ignore case. -n: Display line numbers. -s: suppress error messages

Assume that the text we were seeking was "Moshell" and the body of text looked like this

Once there was a man
with too much legs, who
was his own greatest fan
Moshell, they called him
and it was said, that they
broke the mold before they
made him.

The output will look like:

/abc/cde/filename.abc-57-with too much legs, who
/abc/cde/filename.abc-58-was his own greatest fan
/abc/cde/filename.abc-59-Moshell, they called him
/abc/cde/filename.abc-60-and it was said, that they
/abc/cde/filename.abc-61-broke the mold before they

We will look at these pages of code via photocopy of the text pages.

The desired output is going to be an HREF to the file, and the above running text is included in the output, with the match-line in bold. (Page 260).

Query 18.5: It is clear that the use of Regular Expressions is the key to much of this Perl stuff.
Make sure you understand all the RE's used in this section. Make a personal dictionary of each
of the string match operators (use your web-sources for Perl if you don't know some of them.)