COP 4020 Lecture 2 Overview
Basics of Architecture, Compiler Design (1.4 - 1.8) & Some Semantic Issues (5.1 - 5.3)

Influences on Language Design
-----------------------------
The von Neumann architecture is well suited to imperative languages.
Data and programs are stored in the same memory, while the CPU is
separate from the memory. To execute an instruction, the instruction and
the data it needs must be loaded into the CPU, the instruction is
executed, and then generally some result must be stored back into memory.

Central features of imperative languages that the von Neumann
architecture supports well:

1) variables
2) assignment statements
3) iteration

By contrast, the standard architecture does not handle heavy recursion
efficiently.

Program Design Methodologies
----------------------------
In the early 70s the bulk of computing cost moved from hardware to
software, so people began to analyze software development more closely.
As programmers tackled more complex problems, emphasis shifted toward
top-down design and stepwise refinement.

Deficiencies in PLs of the time:

1) incomplete type checking
2) lack of adequate control statements

In the late 70s, the focus shifted from a process-oriented to a
data-oriented approach. This is where Abstract Data Types (ADTs) were
first explored.

In the early 80s, research shifted to object-oriented design.

Language Categories
-------------------
Most PLs fit into one of four categories:

1) Imperative (what you are used to: algorithmically oriented, lots of
   detail, control structures, functions, variables, etc.)
   Our example language: C

2) Functional (symbolic manipulation through lists and lots of recursion)
   Our example language: ML

3) Logic (a rule-based language, where you follow productions)
   Our example language: Prolog

4) Object Oriented (everything uses classes, inheritance must be present)
   Our example language: C++ (though technically, this is just object
   based)

Language Design Tradeoffs
-------------------------
Note that some goals in a PL are at odds with each other. For example,
any time you make a PL more reliable, it is probably going to be more
costly. Similarly, if you try to cut costs, the PL will probably be less
reliable.

In general, when deciding what capabilities a language will have, you
will always have tradeoffs to weigh.

Implementation methods
----------------------
There are layers of software that run a computer. The lowest level of
code is machine code. Above that is a macroinstruction interpreter that
translates OS code into machine code. On top of that, there are
compilers of various types that convert a high-level language,
eventually, to machine code. These compilers allow the machine to act as
a number of different virtual machines.

Process of Compilation
----------------------
1) lexical analyzer
2) syntax analyzer
3) intermediate code generation + type checking
4) optimization -> produce intermediate code
5) final code generation

Typically, this code must also be linked to prewritten code and to other
users' code. See the picture on page 28 of the text.

Examples: C, Pascal

Process of Interpretation
-------------------------
Grab some statements from the source while "executing", convert them to
machine code, and execute them. Continue doing this until the program
has finished running.

Examples: UNIX shell scripts, JavaScript

Interpretation is slower than executing a compiled program.

The middle ground between these two options is a hybrid implementation
system, such as Perl or Java.
----------------------------------------------------------------------
Here, rather than producing machine code when "compiling", only
intermediate code is produced. In Java, this intermediate code is
bytecode for the Java Virtual Machine (JVM), which is very similar to
assembly language. This aspect of Java makes it portable, because
bytecode files can be interpreted on any computer with a Java
interpreter.

Fundamental Semantic Issues
---------------------------
Variables have 6 attributes:

1) name
2) address
3) value
4) type
5) lifetime
6) scope

Name
----
Concerns about variable names: How long should they be allowed to be?
Do you allow connector characters? Should they be case sensitive? What
about reserved words?

Difference between a keyword and a reserved word: a reserved word has
only one meaning and cannot be used as a name, whereas a keyword takes
its special meaning only in certain contexts.

Address
-------
Addresses are not as easy to determine as they seem, because the same
variable can reside at different addresses during the run of a program,
and different variables can be stored at the same address at different
times during the run. The address is sometimes known as the l-value of a
variable.

Aliases
-------
Two or more names for the same memory address. In the past, the
justification for aliases was to save memory. Nowadays, since memory is
fairly plentiful, that particular justification does not hold.

Type
----
Determines the range of values a variable can hold and the set of
operations defined on it.

Value
-----
We think of a floating-point number as occupying one memory location,
but it really takes up several contiguous bytes (typically 4 for single
precision). Other data structures may take up some other number of
contiguous bytes. The value is sometimes called the r-value of the
variable.
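
The address, alias, and value attributes above can be seen directly in
C, the imperative example language of these notes. The following is only
a minimal sketch: the function names (store_through_alias, is_alias,
alias_demo, float_size_in_bytes) are hypothetical, invented for
illustration, and the size of a float is implementation-defined, so the
code asks the compiler with sizeof rather than assuming 4. No main is
included, so the functions can be dropped into any program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores a value through a pointer, i.e. it uses
   the l-value (address) of whatever variable the caller passed in. */
void store_through_alias(int *alias, int new_value) {
    *alias = new_value;   /* *alias names a storage location (l-value) */
}

/* Returns 1 when both pointers hold the same memory address, which
   means *p and *q are aliases for one variable; returns 0 otherwise. */
int is_alias(const int *p, const int *q) {
    return p == q;
}

/* Walks through name, address, value, and aliasing for one local. */
int alias_demo(void) {
    int count = 3;         /* name: count, type: int, r-value: 3 */
    int *alias = &count;   /* &count is the address (l-value) of count */

    assert(is_alias(alias, &count));

    /* On the right-hand side, count is read for its r-value (3);
       the store then goes through the alias to count's address. */
    store_through_alias(alias, count + 4);

    return count;          /* count was changed through its alias */
}

/* Number of contiguous bytes this implementation uses for a float. */
size_t float_size_in_bytes(void) {
    return sizeof(float);
}
```

Calling alias_demo() returns 7, because the assignment made through
alias changes the same memory location that the name count refers to.
float_size_in_bytes() is commonly 4 on current desktop compilers, but
the C language does not guarantee that.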