COP 4020 Lecture 5 Overview
Syntax
1/23/02

Syntax is the specification of what is valid in a language, while semantics is the meaning of expressions in a language. Strings in a language are called sentences or statements. The smallest unit of a language is called a lexeme. Lexemes can be combined to make statements. A token is a category of lexemes. For example, 24 is a lexeme whose token is integer literal.

A language recognizer is a tool that reads in a statement and decides whether or not it belongs to a language L. A language generator is a tool that generates valid statements in a language L.

It turns out that the syntax of most programming languages can be described by a context-free grammar (CFG). (In the book, the notation used to write such grammars is referred to as Backus-Naur form.) A CFG has four components:

1) an alphabet
2) a start symbol
3) a list of variables
4) a list of productions, or rules

The alphabet is the set of valid lexemes. A language is simply a set of strings, where a string is the concatenation of 0 or more elements of the alphabet.

Here is an example grammar:

alphabet = {a, b}
Start symbol = S
Variables = {S, A, B}
Rules:
S -> aA | Bb
A -> Ab | a
B -> Ba | b

Now, this set of rules is essentially a generator for the language of the grammar. You follow the rules like so:

S => aA => aAb => aAbb => aabb

Here the rules S -> aA, A -> Ab, A -> Ab, and A -> a are applied in turn. Whenever a variable appears in the string derived so far, you can replace it with the right-hand side of any rule for that variable. Once all the variables are gone, what is left is a string in the language. If there is some way to follow the rules from the start symbol and arrive at a string w, then w is in the language described by the grammar.

Based upon the derivation of a string, a parse tree for that derivation can be drawn. (This was done in class.) If a string in the language has two different parse trees, then the grammar is ambiguous. This is because the semantics of an expression come from its parse tree.
If the expression has two different parse trees, it can be interpreted in two different ways. Many ambiguous grammars can be turned into equivalent unambiguous grammars. Examples are in the text.
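
The ideas of derivation, recognition, and ambiguity can be tried out in a few lines of Python. What follows is only a rough sketch, not something from the lecture or the text. It encodes the example grammar above as a dictionary and counts the leftmost derivations of a string, each of which corresponds to one parse tree. The names count_parse_trees, in_language, LECTURE_GRAMMAR, and AMBIGUOUS_GRAMMAR are made up for this illustration, and the grammar E -> E+E | a is a hypothetical ambiguous grammar added for contrast. The search assumes that no rule has an empty right-hand side and that no rule rewrites a variable to a single variable. Without those assumptions it might not terminate.

```python
# Example grammar from the notes: dictionary keys are the variables,
# every other character is treated as an alphabet symbol.
LECTURE_GRAMMAR = {
    "S": ["aA", "Bb"],
    "A": ["Ab", "a"],
    "B": ["Ba", "b"],
}

# Hypothetical ambiguous grammar (an assumption, not from the notes).
AMBIGUOUS_GRAMMAR = {"E": ["E+E", "a"]}


def count_parse_trees(grammar, start, target):
    """Count the leftmost derivations of target from start.

    Each leftmost derivation corresponds to exactly one parse tree.
    Assumes no empty right-hand sides and no variable -> variable rules,
    so the string being derived never shrinks and the search stops.
    """
    def expand(form):
        # Locate the leftmost variable in the string derived so far.
        pos = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if pos is None:
            # No variables left: this derivation either hit the target or not.
            return 1 if form == target else 0
        # Prune forms that are already too long, or whose leading
        # alphabet symbols disagree with the target.
        if len(form) > len(target) or form[:pos] != target[:pos]:
            return 0
        # Try every rule for the leftmost variable.
        return sum(
            expand(form[:pos] + rhs + form[pos + 1:])
            for rhs in grammar[form[pos]]
        )

    return expand(start)


def in_language(grammar, start, target):
    """A very simple recognizer: is there at least one derivation?"""
    return count_parse_trees(grammar, start, target) > 0


if __name__ == "__main__":
    print(in_language(LECTURE_GRAMMAR, "S", "aabb"))
    print(in_language(LECTURE_GRAMMAR, "S", "abab"))
    print(count_parse_trees(AMBIGUOUS_GRAMMAR, "E", "a+a+a"))
```

Under these assumptions, a count of 0 means the string is not in the language, a count of 1 means it has exactly one parse tree, and a count of 2 or more for any string shows that the grammar is ambiguous. This brute-force search is only practical for short strings.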