$Id: CoreLangParser.lhs,v 1.15 1998/03/19 07:18:13 leavens Exp $

AUTHOR: Gary T. Leavens

This Haskell module defines the lexical and context-free grammars,
and a parser, for Dave Schmidt's "Core Language"; see chapter 1 of
"The Structure of Typed Programming Languages" (MIT Press, 1994).

> module CoreLangParser where
> import ParserFunctions
> import LexerTools

The semicolon operator in the abstract syntax associates to the right.
That is, c1 `Semi` c2 `Semi` c3 means c1 `Semi` (c2 `Semi` c3).
(This declaration has to be at the top of the module in Haskell.)

> infixr `Semi`

The following infix declarations allow the abstract syntax trees to be
printed out by Haskell in an appealing way.

> infix `Assign`
> infixl `Plus`
> infix `Lt`, `Gt`, `Le`, `Ge`
> infixl `Equals`
> infixl `And`
> infixl `Or`
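
To see the effect of such fixity declarations in isolation, here is a
small standalone sketch (the subR and subL functions are hypothetical,
used only for illustration; they are not part of this module):

```haskell
-- A fixity declaration changes how chains of a backtick operator
-- are grouped.  Both functions below are subtraction; only the
-- declared associativity differs.
infixr 6 `subR`
infixl 6 `subL`

subR, subL :: Integer -> Integer -> Integer
subR = (-)
subL = (-)

-- 10 `subR` 3 `subR` 2  means  10 - (3 - 2)  =  9
-- 10 `subL` 3 `subL` 2  means  (10 - 3) - 2  =  5
```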


               MICROSYNTAX (LEXICAL SCANNER)

The microsyntax for the core language is defined by the following
definitions, which use the function "scanning" from the LexerTools module.

> scanner :: [Char] -> [Token]
> scanner = scanning keywords symbols []

> keywords, symbols :: [String]
> keywords = ["if", "then", "else", "fi",
>             "while", "do", "od",
>             "skip", "not", "true", "false", "and", "or",
>             "loc"]
> symbols = ["(", ")", ":=", ";", "@", "+", "=", "<", ">", "<=", ">="]
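
The LexerTools module is not reproduced here.  As a rough, hedged
illustration of what keyword/symbol scanning involves, here is a
minimal standalone sketch; the Tok type and the scan function below
are assumptions for illustration, not the real LexerTools interface:

```haskell
import Data.Char (isAlpha, isDigit, isSpace)
import Data.List (isPrefixOf, sortBy)
import Data.Ord  (comparing, Down(..))

-- A simplified token type (hypothetical; LexerTools' Token differs).
data Tok = Key String | Sym String | Ident String | Number Integer
  deriving (Eq, Show)

-- Longest-match scanning over given keyword and symbol tables.
scan :: [String] -> [String] -> String -> [Tok]
scan kws syms = go
  where
    -- Try longer symbols first, so "<=" wins over "<".
    syms' = sortBy (comparing (Down . length)) syms
    go [] = []
    go s@(c:cs)
      | isSpace c = go cs
      | isDigit c = let (ds, rest) = span isDigit s
                    in Number (read ds) : go rest
      | isAlpha c = let (w, rest) = span isAlpha s
                    in (if w `elem` kws then Key w else Ident w) : go rest
      | otherwise =
          case [sym | sym <- syms', sym `isPrefixOf` s] of
            (sym:_) -> Sym sym : go (drop (length sym) s)
            []      -> error ("scan: unexpected character " ++ show c)
```

For instance, scan ["loc"] ["<=", "<", ":="] "loc 1 <= 2" tokenizes
to [Key "loc", Number 1, Sym "<=", Number 2].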


                 ABSTRACT SYNTAX

The abstract syntax of the core language is as follows.

> type C = Command
> type E = Expression
> type L = Location
> type N = Numeral
> type B = Boolean
>
> data Command = L `Assign` E | Skip
>              | C `Semi` C | If E C C
>              | While E C
>              deriving Show
> data Expression = Num N | Deref L
>                 | E `Plus` E | E `Lt` E | E `Gt` E | E `Le` E | E `Ge` E
>                 | Not E | E `Equals` E
>                 | BoolLit B | And E E | Or E E
>                 deriving Show
> data Location = Loc Integer deriving Show
> type Numeral = Integer
> type Boolean = Bool
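
As a concrete illustration, here is a standalone sketch that
re-declares a subset of the types above (with Eq and Show derived only
so the example can be checked) and builds the abstract syntax tree for
the command  loc 1 := @loc 1 + 2 ; skip :

```haskell
infixr `Semi`
infix  `Assign`
infixl `Plus`

-- A subset of the abstract syntax, re-declared so this example
-- is self-contained.
data Command    = Location `Assign` Expression | Skip
                | Command `Semi` Command
                deriving (Eq, Show)
data Expression = Num Integer | Deref Location
                | Expression `Plus` Expression
                deriving (Eq, Show)
data Location   = Loc Integer
                deriving (Eq, Show)

-- loc 1 := @loc 1 + 2 ; skip
example :: Command
example = (Loc 1 `Assign` (Deref (Loc 1) `Plus` Num 2)) `Semi` Skip
```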


                   CONCRETE SYNTAX

The concrete syntax and parser for the core language are given below.
They use the parser combinators found in the ParserFunctions module.

Because this is top-down (recursive descent) parsing, we have to avoid
left recursion.  For example, we can't use the production
command ::= command ; command, as that would cause the parser to loop.
The standard trick is to break the grammar up into levels (see, for
example, Aho and Ullman's book).  Here a stmt is a command without
top-level semicolons, and sequences of stmts associate to the right,
due to the use of foldr1.
We add parenthesized commands so that users can group them in other ways
if desired.
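
ParserFunctions itself is not shown in this file.  As a hedged,
minimal sketch of the list-of-successes style it suggests, here is a
standalone approximation; the definitions of p_lit, $|, $~, using, and
some_sep_by below imitate, but are not, the real module's combinators:

```haskell
-- A parser consumes a list of tokens and returns every way it can
-- succeed, pairing a result with the remaining input.
type Parser t a = [t] -> [(a, [t])]

-- Recognize one literal token.
p_lit :: Eq t => t -> Parser t t
p_lit t (u:us) | t == u = [(t, us)]
p_lit _ _               = []

-- Alternation: try both parsers.
($|) :: Parser t a -> Parser t a -> Parser t a
(p $| q) ts = p ts ++ q ts

-- Sequencing: pair up the results.
($~) :: Parser t a -> Parser t b -> Parser t (a, b)
(p $~ q) ts = [((a, b), rest') | (a, rest) <- p ts, (b, rest') <- q rest]

-- Apply a function to each parse result.
using :: Parser t a -> (a -> b) -> Parser t b
(p `using` f) ts = [(f a, rest) | (a, rest) <- p ts]

-- One or more p's separated by sep, keeping only the p results.
some_sep_by :: Parser t b -> Parser t a -> Parser t [a]
some_sep_by sep p =
      ((p $~ (sep $~ some_sep_by sep p)) `using` (\(x, (_, xs)) -> x : xs))
   $| (p `using` (: []))
```

For example, head (some_sep_by (p_lit ';') (p_lit 'a') "a;a;a") is
("aaa", ""): the first, fully-consuming parse comes first in the list.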

> command :: Parser Token Command
> command = (some_sep_by (p_check ";") stmt)          `using` (foldr1 Semi)
>  where stmt = (location $~ (p_check ":=" /$~ expr)) `using` (uncurry Assign)
>            $| p_check "skip"                        `p_return` Skip
>            $| ((p_check "while" /$~ expr) $~
>                (p_check "do" /$~ command) $~/
>                p_check "od")                        `using` (uncurry While)
>            $| ((p_check "if" /$~ expr $~
>                (p_check "then" /$~ command)) $~
>                (p_check "else" /$~ command) $~/
>                 p_check "fi")                  `using` (uncurry (uncurry If))
>            $| (p_check "(" /$~ command $~/ p_check ")")
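
The right-nesting that foldr1 Semi produces can be checked in
isolation; a minimal standalone sketch with a toy type standing in
for Command:

```haskell
infixr `Semi`

-- A toy stand-in for the Command type above.
data C = S Int | C `Semi` C
  deriving (Eq, Show)

-- foldr1 nests to the right:
--   foldr1 Semi [S 1, S 2, S 3]  ==  S 1 `Semi` (S 2 `Semi` S 3)
nested :: C
nested = foldr1 Semi [S 1, S 2, S 3]
```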

For expressions, we have the following precedence levels and
associativities, listed from highest to lowest precedence:

     operator      associativity
     ---------------------------
     @, not        none
     +             left
     <, >, >=, <=  none
     =             left
     and           left
     or            left

Note that +, =, and, and or all associate to the left, due to the use
of foldl1.  Note that not is followed by a factor, not an expr; this
makes not bind very tightly to what follows it.
Again, we've added parenthesized expressions to allow the user to make
other groupings.

> expr :: Parser Token Expression
> expr = (some_sep_by (p_check "or") disj)             `using` (foldl1 Or)
>  where disj = (some_sep_by (p_check "and") conj)     `using` (foldl1 And)
>        conj = (some_sep_by (p_check "=") reltd)      `using` (foldl1 Equals)
>        reltd = ((term $~ comp) $~ term)     `using` (\((t1,f),t2) -> f t1 t2)
>               $| term
>        term = (some_sep_by (p_check "+") factor)     `using` (foldl1 Plus)
>        factor = p_Number                             `using` Num
>               $| (p_check "@" /$~ location)          `using` Deref
>               $| (p_check "not" /$~ factor)          `using` Not
>               $| boolean                             `using` BoolLit
>               $| (p_check "(" /$~ expr $~/ p_check ")")
>
>        comp = (p_check "<")                          `p_return` Lt
>            $| (p_check ">")                          `p_return` Gt
>            $| (p_check ">=")                         `p_return` Ge
>            $| (p_check "<=")                         `p_return` Le
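
Dually, the left-nesting produced by foldl1 can be checked in
isolation; a minimal standalone sketch with a toy type standing in
for Expression:

```haskell
infixl `Plus`

-- A toy stand-in for the Expression type above.
data E = N Int | E `Plus` E
  deriving (Eq, Show)

-- foldl1 nests to the left:
--   foldl1 Plus [N 1, N 2, N 3]  ==  (N 1 `Plus` N 2) `Plus` N 3
flat :: E
flat = foldl1 Plus [N 1, N 2, N 3]
```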

> location :: Parser Token Location
> location = (p_check "loc" /$~ p_Number)              `using` Loc

> boolean :: Parser Token Boolean
> boolean = (p_check "true")                         `p_return` True
>        $| (p_check "false")                        `p_return` False


             READER (READ INSTANCE)

The read function is overloaded; the instances below let commands,
expressions, and locations be read in from a String.
This allows you to type at the interpreter prompt things like:
     (read "skip" :: Command)
and have the abstract syntax tree for that command returned.
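
The readsPrec instances below follow the Prelude's standard
list-of-successes convention, which can be seen with the built-in
reads (this behavior is standard Haskell, not specific to this module):

```haskell
-- reads returns each possible parse paired with the unconsumed
-- input; read succeeds when there is a unique complete parse.
parses :: [(Int, String)]
parses = reads "42 rest"
-- parses == [(42, " rest")]
```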

> instance Read Command where
>    readsPrec n input = first_parse_by command input

> instance Read Expression where
>    readsPrec n input = first_parse_by expr input

> instance Read Location where
>    readsPrec n input = first_parse_by location input

The helper function first_parse_by takes the first parse
that consumes all of the input.  If no such parse exists, it fails
(returns no parses).

> first_parse_by :: Parser Token a -> Parser Char a
> first_parse_by = first_all_consuming scanner

> first_all_consuming :: (String -> [Token]) -> Parser Token a -> Parser Char a
> first_all_consuming scanner p input =
>       let parses = p (scanner input)
>       in case filter (null . snd) parses of
>            ((lt,[]):_) -> [(lt,[])]
>            _ -> []
