COP 4020 Lecture -*- Outline -*-

* Natural Language Parsing and Grammars (9.4 and 9.5)

** flexible parsing of ambiguous grammars

Relational programs can be written to parse ambiguous grammars
   No grammar restrictions, except that left-recursive rules make
     the depth-first search loop forever
   But if the grammar is very ambiguous, parsing can be very slow,
     since the search may backtrack through many alternatives...

*** encoding of grammars as relations

------------------------------------------
            PARSING PROTOCOL

For each nonterminal NT:
  write a procedure
     {NT Input ?Unparsed}
  that takes the input list (Input)
  and binds Unparsed to the part of the input it did not consume

Example:

  <IntransVerb> ::= lives

becomes

  proc {IntransVerb Input ?Unparsed}
     Input=lives|Unparsed
  end

Parsing the whole thing:

  {SolveAll proc {$ Rest}
               {Sentence [mary lives] Rest}
               Rest=nil    % the whole input must be consumed
            end}
------------------------------------------

Q: How would you write a parser for <Sentence> ::= <NounPhrase> <VerbPhrase> ?

proc {Sentence Input ?Unparsed}
   Rest in
   {NounPhrase Input Rest}      % Rest is what the noun phrase leaves over
   {VerbPhrase Rest Unparsed}
end

------------------------------------------
            PARSING CHOICES

  <Noun> ::= man | woman

becomes

  proc {Noun Input ?Unparsed}
     choice Input=man|Unparsed
     []     Input=woman|Unparsed
     end
  end
------------------------------------------

Q: How would you write a parser for
   <VerbPhrase> ::= <TransVerb> <NounPhrase> | <IntransVerb> ?

proc {VerbPhrase I ?U}
   choice
      local R in
         {TransVerb I R}
         {NounPhrase R U}
      end
   [] {IntransVerb I U}
   end
end

*** creating parse trees

------------------------------------------
           MAKING PARSE TREES

Besides checking for validity,
return a parse tree (data structure).

Parsers for nonterminals are now functions that
 - take subtrees (or placeholders) as arguments
 - take the input list
 - bind the unparsed part
 - return a new parse tree
------------------------------------------

------------------------------------------
        EXAMPLE WITH PARSE TREES

% A grammar for simple expressions.
% The parsers return parse trees.  Compare figures 9.5 and 9.6 of CTM.
% <Expression> ::= <BasicExp>
%               |  <BasicExp> <Operator> <Expression>
%               |  'if' <Expression> 'then' <Expression> 'else' <Expression>
%               |  'if' <Expression> 'then' <Expression>
% <Operator> ::= '+' | '*' | '-' | '>'
% <BasicExp> ::= <Atom> | <Number>

declare
fun {Expression I ?U}
   choice {BasicExp I U}
   [] local R1 R2 LeftTree OpTree RightTree in
         LeftTree={BasicExp I R1}
         % RightTree is still a placeholder when the operator node is built
         OpTree={Operator LeftTree RightTree R1 R2}
         RightTree={Expression R2 U}
         OpTree
      end
   [] local K1 K2 K3 R1 R2 TestTree ThenTree ElseTree in
         {Check 'if' I K1}
         TestTree={Expression K1 R1}
         {Check 'then' R1 K2}
         ThenTree={Expression K2 R2}
         {Check 'else' R2 K3}
         ElseTree={Expression K3 U}
         ifExp(TestTree ThenTree ElseTree)
      end
   [] local K1 K2 R TestTree ThenTree in
         {Check 'if' I K1}
         TestTree={Expression K1 R}
         {Check 'then' R K2}
         ThenTree={Expression K2 U}
         ifExp(TestTree ThenTree numExp(0))   % no 'else': use numExp(0)
      end
   end
end

fun {Operator LeftTree RightTree I ?U}
   choice {Check '+' I U} plusExp(LeftTree RightTree)
   []     {Check '*' I U} timesExp(LeftTree RightTree)
   []     {Check '-' I U} subExp(LeftTree RightTree)
   []     {Check '>' I U} gtExp(LeftTree RightTree)
   end
end

fun {BasicExp I ?U}
   What in
   I=What|U
   choice {IsAtom What}=true   idExp(What)
   []     {IsNumber What}=true numExp(What)
   end
end

% A helper procedure to check for reserved words
proc {Check KW I ?U}
   I=KW|U
end
------------------------------------------

This is in ExpressionParser.oz
See also ExpressionParserTest.oz

------------------------------------------
       SOME TESTS SHOWING AMBIGUITY

\insert 'ExpressionParser.oz'
\insert 'SolveAll.oz'
\insert 'TestingNoStop.oz'

% A helper function for tests: all parse trees that consume all of LoA
fun {Parses LoA}
   {SolveAll proc {$ PTree} PTree={Expression LoA nil} end}
end

{Test {Parses [3]} '==' [numExp(3)]}
{Test {Parses [foo]} '==' [idExp(foo)]}
{Test {Parses [foo '+' 3]} '==' [plusExp(idExp(foo) numExp(3))]}
{Test {Parses [7 '*' foo '+' 3]}
 '==' [timesExp(numExp(7) plusExp(idExp(foo) numExp(3)))]}
{Test {Parses ['if' 6 '>' 7 '*' foo 'then' 3 'else' bar]}
 '==' [ifExp(gtExp(numExp(6) timesExp(numExp(7) idExp(foo)))
             numExp(3)
             idExp(bar))]}
{Test {Parses ['if' baz 'then' 4 'else' bar]}
 '==' [ifExp(idExp(baz) numExp(4) idExp(bar))]}

% The classic dangling-else ambiguity: two parse trees
{Test {Parses ['if' 6 '>' 7 '*' foo 'then' 'if' baz 'then' 4 'else' bar]}
 '==' [ifExp(gtExp(numExp(6) timesExp(numExp(7) idExp(foo)))
             ifExp(idExp(baz) numExp(4) numExp(0))
             idExp(bar))
       ifExp(gtExp(numExp(6) timesExp(numExp(7) idExp(foo)))
             ifExp(idExp(baz) numExp(4) idExp(bar))
             numExp(0))]}
------------------------------------------

See ExpressionParserTest.oz

This is essentially a unification grammar, or definite clause grammar.

** grammar interpreter (9.5)

Q: Can we make a generic parser that takes the grammar as data?

Sure, see figure 9.8 of CTM, although the data isn't too pleasant...
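To give the flavor, here is a minimal sketch of such an interpreter. It is not the code or the grammar encoding of figure 9.8: the record format (a list of alternatives per nonterminal, with t(W) for a terminal word W and n(N) for a nonterminal N) and the names Grammar, ParseNT, ParseAlts, and ParseSeq are hypothetical, chosen for illustration. It follows the same parsing protocol as above (each call binds the unparsed rest of the input). It assumes Mozart 1.x, whose Search.base.all returns the list of all solutions of a one-argument procedure; the SolveAll used above should work the same way.

```oz
declare
% Hypothetical grammar encoding (an assumption, NOT CTM figure 9.8):
% each feature is a nonterminal; its value is a list of alternatives,
% each alternative a list of symbols: t(W) = terminal W, n(N) = nonterminal N.
Grammar = grammar(sentence:   [[n(nounPhrase) n(verbPhrase)]]
                  nounPhrase: [[t(mary)] [t(john)]]
                  verbPhrase: [[t(lives)] [n(transVerb) n(nounPhrase)]]
                  transVerb:  [[t(loves)]])

% Parse one NT of grammar G from Input; bind Unparsed to the rest.
proc {ParseNT G NT Input ?Unparsed}
   {ParseAlts G G.NT Input Unparsed}
end

% Nondeterministically try each alternative in the list Alts.
proc {ParseAlts G Alts Input ?Unparsed}
   case Alts
   of nil then fail                   % no alternative left: this branch fails
   [] Alt|Others then
      choice {ParseSeq G Alt Input Unparsed}
      []     {ParseAlts G Others Input Unparsed}
      end
   end
end

% Parse a sequence of symbols left to right, threading the unparsed input.
proc {ParseSeq G Syms Input ?Unparsed}
   case Syms
   of nil then Unparsed=Input
   [] t(W)|Rest then
      local R in
         Input=W|R                    % the next input word must be W
         {ParseSeq G Rest R Unparsed}
      end
   [] n(N)|Rest then
      local R in
         {ParseNT G N Input R}        % parse a whole N, leaving R
         {ParseSeq G Rest R Unparsed}
      end
   end
end

% Usage: all ways to parse the whole input (assumes Mozart 1.x Search)
{Browse {Search.base.all proc {$ Rest}
                            {ParseNT Grammar sentence [mary loves john] Rest}
                            Rest=nil
                         end}}
% should show [nil]: exactly one parse consumes all of the input
```

Like the hand-written parsers, this interpreter explores alternatives depth first, so a left-recursive rule such as <E> ::= <E> '+' <E> would make it loop forever.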