$Id: WhileNParser.lhs,v 1.7 1998/03/19 07:18:13 leavens Exp $

AUTHOR: Gary Leavens

This Haskell module defines the lexical and context free grammar,
and a parser for the language "WhileN" of Chapter 2 of Dave Schmidt's book
"The Structure of Typed Programming Languages" (MIT Press, 1994).
In essence, this is from section 2.15, although locations are added to allow
parsing of examples from earlier in the chapter before variable names
are added.
Recursive abstractions are not supported, although they may be easy to add.

Note that the numeral domain doesn't contain identifier expressions;
this is because the parser has to decide where to put identifiers used
as expressions; we choose to make them fun invocations, not const invocations.
Because of this, the syntax for const declarations is wider than it should be;
hence the type rules will have to check that the expression has one of the
allowed forms, const I = N or const I = X.  (Optionally, one could allow
const declarations to be eagerly-evaluated expression abstractions.)

(See WhileNUnParser for the unparser for this grammar.)

> module WhileNParser where
> import ParserFunctions
> import LexerTools

The semicolon operator in the abstract syntax associates to the right.
That is, c1 `Semi` c2 `Semi` c3 means c1 `Semi` (c2 `Semi` c3).
(This declaration has to be at the top of the module in Haskell.)

> infixr `Semi`

Similarly, the semicolon and comma operators for declarations
associate to the right.

> infixr `Then`
> infixr `Also`

The dot operator for identifier expressions associates to the left,
as does the + operator for expressions.

> infixl `Dot`
> infixl `Plus`
> infixl `Equals`

The following aren't associative, but it's nice to have Haskell print them
as infix anyway.

> infix `In`


                  MICROSYNTAX (LEXICAL GRAMMAR)

The microsyntax is defined by the following definition,
which uses the function "scanning" from the LexerTools module.

> scanner :: [Char] -> [Token]
> scanner = scanning keywords symbols []
>    where keywords = ["if", "then", "else", "fi",
>                      "while", "do", "od",
>                      "skip", "not", "true", "false",
>                      "loc", "in", "alias", "fun", "const", "proc",
>                      "var", "class", "module", "import",
>                      "newint", "record", "end", "call"]
>          symbols = ["(", ")", "{", "}", ":=", ";", ",", "@", "+", ".", "-",
>                     "=", ":"]


                  ABSTRACT SYNTAX

The abstract syntax of the core language is as follows.

> type P = Program
> type D = Declaration
> type T = TypeStructure
> type C = Command
> type E = Expression
> type L = Variable
> type X = IdentifierExpr
> type I = Identifier
> type N = Numeral
> type B = Bool

> data Program = D `In` C
> data Declaration =
>        D `Also` D | D `Then` D
>      | Fun I E | Const I E | Proc I C
>      | Var I T | Alias I L | Class I T | Module I D
>      | Import X 
> data TypeStructure = NewInt | Record D
>                    | TId X
> data Command = L `Assign` E | Skip
>	       | C `Semi` C | If E C C
>	       | While E C | Call X
> data Expression = Num N | Deref L
>	       | E `Plus` E
>	       | Not E | E `Equals` E
>              | Ident X | BoolLit B
> data Variable = Varref X | Loc Integer
> data IdentifierExpr = Name I | X `Dot` I
> type Identifier = String
> type Numeral = Integer
> type Boolean = Bool


                   CONCRETE SYNTAX (PARSER)

The concrete syntax and parser for the core language is the following.
It uses the parser combinators found in the ParserFunctions module.

> program :: Parser Token Program
> program = (declaration $~ (p_check "in" /$~ command)) `using` (uncurry In)

For declarations, semicolon (;) binds less tightly than comma (,),
and both bind less tightly than other kinds of declaration forms.
We add parentheses to allow the specification of arbitrary trees.

> declaration :: Parser Token Declaration
> declaration = (some_sep_by (p_check ";") rds)       `using` (foldr1 Then)
>  where rds = (some_sep_by (p_check ",") decl)       `using` (foldr1 Also)
>        decl = ((p_check "fun" /$~ p_Id) $~
>                (p_check "=" /$~ expr))              `using` (uncurry Fun)
>            $| ((p_check "const" /$~ p_Id) $~
>                (p_check "=" /$~  expr))             `using` (uncurry Const)
>            $| ((p_check "proc" /$~ p_Id) $~
>                (p_check "=" /$~  command))          `using` (uncurry Proc)
>            $| ((p_check "var" /$~ p_Id) $~
>                (p_check ":" /$~  typeStruct))       `using` (uncurry Var)
>            $| ((p_check "alias" /$~ p_Id) $~
>                (p_check "=" /$~  variable))         `using` (uncurry Alias)
>            $| ((p_check "class" /$~ p_Id) $~
>                (p_check "=" /$~  typeStruct))       `using` (uncurry Class)
>            $| ((p_check "module" /$~ p_Id) $~
>                (p_check "=" /$~ p_check "{" /$~
>                 declaration $~/ p_check "}"))       `using` (uncurry Module)
>            $| (p_check "import" /$~ identifierExpr) `using` Import
>            $| (p_check "(" /$~ declaration $~/ p_check ")")

> typeStruct :: Parser Token TypeStructure
> typeStruct = (p_check "newint")                  `p_return` NewInt
>           $| (p_check "record" /$~ declaration $~/
>               p_check "end")                        `using` Record
>           $| identifierExpr                         `using` TId

In parsing commands, we use stmt as a command without semicolons,
and make stmts associate to the right (by using folrdr1).
This makes semicolon (;) bind less tightly than other statement forms.

> command :: Parser Token Command
> command = (some_sep_by (p_check ";") stmt)          `using` (foldr1 Semi)
>  where stmt = (variable $~ (p_check ":=" /$~ expr)) `using` (uncurry Assign)
>            $| p_check "skip"                     `p_return` Skip
>            $| ((p_check "while" /$~ expr) $~
>                (p_check "do" /$~ command) $~/
>                p_check "od")                        `using` (uncurry While)
>            $| ((p_check "if" /$~ expr $~
>                (p_check "then" /$~ command)) $~
>                (p_check "else" /$~ command) $~/
>                 p_check "fi")                  `using` (uncurry (uncurry If))
>            $| (p_check "call" /$~ identifierExpr)    `using` Call
>            $| (p_check "(" /$~ command $~/ p_check ")")

For expressions, we make "=" have lowest precedence, and "+" next highest,
with "@" and "not" binding more tightly still.
Note that "=" and "+" associate to the left, due to the use of foldl1.
Note that not is followed by a factor, not an expr; this makes the
not bind tightly to what follows it.
Again, we've added parenthesized expressions to allow the user to make
other groupings.

> expr :: Parser Token Expression
> expr = (some_sep_by (p_check "=") term)              `using` (foldl1 Equals)
>  where term = (some_sep_by (p_check "+") factor)     `using` (foldl1 Plus)
>        factor = numeral                              `using` Num
>               $| (p_check "@" /$~ variable)          `using` Deref
>               $| (p_check "not" /$~ factor)          `using` Not
>               $| identifierExpr                      `using` Ident
>               $| boolean                             `using` BoolLit
>               $| (p_check "(" /$~ expr $~/ p_check ")")

> variable :: Parser Token Variable
> variable = (p_check "loc" /$~ p_Number)              `using` Loc
>          $| identifierExpr                           `using` Varref

> numeral :: Parser Token Numeral
> numeral = ((p_check "-") /$~ p_Number)               `using` negate
>        $| p_Number

> boolean :: Parser Token Boolean
> boolean = (p_check "true")                        `p_return` True
>        $| (p_check "false")                       `p_return` False

> identifierExpr :: Parser Token IdentifierExpr
> identifierExpr = (some_sep_by (p_check ".") p_Id)
>                               `using` (\(id:ids) -> foldl Dot (Name id) ids)


             READER (READ INSTANCE)


The read function is an overloaded function that can be used to read in
commands, expressions, or locations from a String.

> instance Read Program where
>    readsPrec n = first_parse_by program

> instance Read Declaration where
>    readsPrec n = first_parse_by declaration

> instance Read TypeStructure where
>    readsPrec n = first_parse_by typeStruct

> instance Read Command where
>    readsPrec n = first_parse_by command

> instance Read Expression where
>    readsPrec n = first_parse_by expr

> instance Read IdentifierExpr where
>    readsPrec n = first_parse_by identifierExpr

The helping function first_parse_by takes the first parse
that eats all of the input.  If no such parse is available, it fails.

> first_parse_by :: Parser Token a -> Parser Char a
> first_parse_by = first_all_consuming scanner

> first_all_consuming :: (String -> [Token]) -> Parser Token a -> Parser Char a
> first_all_consuming scanner p input =
>       let parses = p (scanner input)
>       in case filter (\(t,rest) -> rest == []) parses of
>            ((lt,[]):_) -> [(lt,[])]
>            _ -> []