The University of Queensland Homepage
School of ITEE ITEE Main Website

 COMP4403/COMP7402 - Compilers and Interpreters

Last updated: Sun 27 Nov 2011 17:09:10 GMT.

Welcome to the COMP4403/7402 (Compilers and Interpreters) course homepage. This page contains links to course materials and important announcements. It will be updated throughout the semester, so be sure to visit it regularly.

Course Profiles

THIS IS LAST YEAR'S COURSE PROFILE. This year's will be similar but the dates etc will change. The course profiles contains vital information about COMP4403/7402, including course aims, syllabus, and assessment requirements.
The course profiles for COMP4403 and COMP7402 differ in the assessment requirements.

General Course Information

Course Newsgroup: uq.itee.comp4403

This is a valuable forum for discussion of COMP4403/7402 course material. If you're having difficulties this is a good place to get help from both the teaching staff and other students.

For people who want nicer newsgroup reading than my.UQ, the news server is at (you will need to connect using ssl):

    sslnews.eait.uq.edu.au
Some instructions (about news groups) can be found here:

Contacting the Teaching Staff

If you have a problem you wish to discuss with someone, but it is inappropriate for posting to the newsgroup, get in touch with Ian.

Contact Details

Professor Ian Hayes Office: 78-326 Ph:(07) 3365-2386 Email: ianh@itee.uq.edu.au
Consultation: I'm generally available for consultation when in my office and not otherwise engaged. Around assignment deadlines I'll provide a list of times when I'll be in my office via the course news group. You can also email me to arrange a specific consultation time.

Course Overview

The invention of (high-level) programming languages in the 1950s was one of the most significant advances for making computers easier to use --- compare writing programs in machine or assembly language with, say, Java. Implementing programming languages was a major challenge taken on in the 1960s and 1970s that led to the development of modern day compilers and the theory and practice that we will study in this course. A significant advantage of using a programming language over machine/assembly language is machine independence.

To ensure that the implementations of a programming language on different machines are consistent one needs a machine-independent definition of the language. To achieve this we first define the syntax of the language (i.e., what are the legal strings in the language) and then their semantics (or meaning). The syntax of the language is defined via a grammar written in terms of the basic symbols (such as keywords, identifiers, and operators), which are called lexical tokens and the tokens are defined in terms of regular expressions over character strings. Not all strings allowed by the grammar are valid programs in the language because for exmaple, identifiers cannot be multiply defined within a single scope and expressions must be of the correct type for their context. We use a set of type checking rules to define the programs that are well formed. Finally we need to define what the result of an expression evaluation should be and what the effect of executing a statement should be.

A programming language is implemented by either a compiler or interpreter. These are themselves both just (rather sophisticated) programs. A compiler implements a programming language by translating a high-level program (e.g., a Java program represented simply as a text file (which is itself represented as a stream of characters (which are represented by bits))) to a machine-level program (represented as a sequence of machine instructions (e.g., Pentium machine instructions (that are themselves represented as bits))). [Aside: you have to be able to handle lots of nesting in this course.]

The tasks undertaken by a compiler can be divided into the following.

  • Recognising the input program and detecting errors in the program, which in turn is composed of
    • lexical analysis: recognising the basic symbols (such as keywords, identifiers and operators) of the language, and
    • parsing: recognising the structure of programs composed of these basic symbols.
  • Constructing an abstract internal representation of the program in terms of
    • an abstract syntax tree, and
    • a symbol table.
  • Checking the well formedness of a program according to its static semantics, e.g., checking that assignment statements are type correct.
  • Generating code, which can include changing the representation of the program to forms more suitable for code generation and code optimisation (both machine independent at the source code level and machine dependent at the target machine code level).
The generated machine code can then be executed on the target machine to run the program. Due to time constraints within the course we don't spend a lot of time on code optimisation.

An interpreter is a simpler (and usually less efficient) way of implementing a programming language. It acts in the same way as a compiler in the early phases, but does not have a code generation phase. Instead it directly interprets its internal representation of the program. The main advantage of interpreters over compilers is that they do not depend on the target machine, and hence are more portable than compilers. Interpreters are used for simple languages, like operating system command shells, where the overheads of generating machine code outweigh the slower execution time of the interpreted program, or for very high-level languages (e.g., functional and logic programming languages) where the individual operations are so powerful that the execution-time cost of interpretation over compilation is not all that significant. In this course we focus on compiling to a simple interpreted stack machine (similar to but simpler than the Java Virtual Machine (JVM)).

The language recognition phases of compilers are common to many applications of computers that involve a language of some form. For example, a web broswer recognises a document written in the Hyper Text Markup Language (HTML) and interprets it by generating a display of the document on the screen.

Early compilers were hand-crafted programs designed to implement a particular language for a particular target machine. However, in the late 1960s it was recognised that many of the aspects of writing a compiler were not dependent on the particular programming language being compiled. In particular, the early phases of recognising a program and creating an abstract representation of it were dependent mainly on the grammar of the language. The result was the creation of compiler generators (lexical analyser and parser generators really). A parser generator is yet another program that accepts a description of a programming language (in the form of a grammar for the language) and generates a program, that is a parser for that language. Modern compiler generators are often implemented as suites of cooperating tools that generate compiler components from a range of notations, each of which specifies a different aspect of a language (and its implementation). These activities include not only lexical analysis and parsing, but also error checking and even code generation.