All computer languages are translated from something that tends to be easy for a human to understand (source code) into something that is executed on a computer (machine instructions). Traditionally, translators fall into two classes: interpreters and compilers.
Interpreters
An interpreter translates source code into activities (which may comprise groups of machine instructions) and immediately executes those activities. BASIC, for example, has been a popular interpreted language. Traditional BASIC interpreters translate and execute one line at a time, and then forget that the line has been translated. This makes them slow, since they must re-translate any repeated code. BASIC has also been compiled, for speed. More modern interpreters, such as those for the Perl language, translate the entire program into an intermediate language, which is then executed by a much faster interpreter [14].
Interpreters have many advantages. The transition from writing code to executing code is almost immediate, and the source code is always available, so the interpreter can be much more specific when an error occurs. The benefits often cited for interpreters are ease of interaction and rapid development (but not necessarily execution) of programs.
Interpreted languages often have severe limitations when building large projects (Perl seems to be an exception to this). The interpreter (or a reduced version) must always be in memory to execute the code, and even the fastest interpreter may introduce unacceptable speed restrictions. Most interpreters require that the complete source code be brought into the interpreter all at once. Not only does this introduce a space limitation, it can also cause more difficult bugs if the language doesn’t provide facilities to localize the effect of different pieces of code.
Compilers
A
compiler translates source code directly into assembly language or machine
instructions. The eventual end product is a file or files containing machine
code. This is an involved process, and usually takes several steps. The
transition from writing code to executing code is significantly longer with a
compiler.
Depending on the acumen of the compiler writer, programs generated by a compiler tend to require much less space to run, and they run much more quickly. Although size and speed are probably the most often cited reasons for using a compiler, in many situations they aren’t the most important reasons. Some languages (such as C) are designed to allow pieces of a program to be compiled independently. These pieces are eventually combined into a final executable program by a tool called the linker. This process is called separate compilation.
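As a minimal sketch of how this might look in practice (the file names and the g++ commands in the comments are illustrative, not prescriptive):

// mathutil.h (hypothetical): a declaration shared by both pieces
int twice(int x);

// mathutil.cpp: one independently compiled piece
#include "mathutil.h"
int twice(int x) { return x * 2; }

// main.cpp: another piece that refers to twice()
#include <iostream>
#include "mathutil.h"
int main() {
  std::cout << twice(21) << std::endl;  // prints 42
  return 0;
}

// Each source file is compiled to its own object module, then linked:
//   g++ -c mathutil.cpp    (produces mathutil.o)
//   g++ -c main.cpp        (produces main.o)
//   g++ mathutil.o main.o -o program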
Separate compilation has many benefits. A program that, taken all at once, would exceed the limits of the compiler or the compiling environment can be compiled in pieces. Programs can be built and tested a piece at a time. Once a piece is working, it can be saved and treated as a building block. Collections of tested and working pieces can be combined into libraries for use by other programmers. As each piece is created, the complexity of the other pieces is hidden. All these features support the creation of large programs [15].
Compiler debugging features have improved significantly over time. Early compilers only generated machine code, and the programmer inserted print statements to see what was going on. This is not always effective. Modern compilers can insert information about the source code into the executable program. This information is used by powerful source-level debuggers to show exactly what is happening in a program by tracing its progress through the source code.
Some compilers tackle the compilation-speed problem by performing in-memory compilation. Most compilers work with files, reading and writing them in each step of the compilation process. In-memory compilers keep the program in RAM. For small programs, this can seem as responsive as an interpreter.
The compilation process
To program in C and C++, you need to understand the steps and tools in the compilation process. Some languages (C and C++, in particular) start compilation by running a preprocessor on the source code. The preprocessor is a simple program that replaces patterns in the source code with other patterns the programmer has defined (using preprocessor directives). Preprocessor directives are used to save typing and to increase the readability of the code. (Later in the book, you’ll learn how the design of C++ is meant to discourage much of the use of the preprocessor, since it can cause subtle bugs.) The pre-processed code is often written to an intermediate file.
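For instance (a minimal illustration; the names are made up):

#define PI 3.14159
#define SQUARE(x) x * x

int main() {
  double radius = 1.0;
  // The preprocessor substitutes text before the compiler runs, so
  // this line reaches the compiler as: 2.0 * 3.14159 * radius
  double circumference = 2.0 * PI * radius;
  // Blind textual substitution is also where the subtle bugs come from:
  int n = SQUARE(1 + 2);  // expands to 1 + 2 * 1 + 2, which is 5, not 9
  return 0;
}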
Compilers usually do their work in two passes. The first pass parses the pre-processed code. The compiler breaks the source code into small units and organizes it into a structure called a tree. In the expression “A + B” the elements ‘A’, ‘+’ and ‘B’ are leaves on the parse tree.
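One way such a tree might be represented, sketched here purely for illustration (real compilers use far richer node types):

// An illustrative parse-tree node: each node holds a token and
// pointers to its children (null for a leaf).
struct Node {
  char token;
  Node* left;
  Node* right;
};

int main() {
  // The tree for "A + B": '+' is the root, 'A' and 'B' are its leaves.
  Node a    = { 'A', 0, 0 };
  Node b    = { 'B', 0, 0 };
  Node plus = { '+', &a, &b };
  return 0;
}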
A global optimizer is sometimes used between the first and second passes to produce smaller, faster code.
In
the second pass, the
code
generator
walks through the parse tree and generates either assembly language code or
machine code for the nodes of the tree. If the code generator creates assembly
code, the assembler must then be run. The end result in both cases is an object
module
(a file that typically has an extension of
.o
or
.obj).
A
peephole
optimizer
is sometimes used in the second pass to look for pieces of code containing
redundant assembly-language statements.
The
use of the word “object”
to describe chunks of machine code is an unfortunate artifact. The word came
into use before object-oriented programming was in general use.
“Object” is used in the same sense as “goal” when
discussing compilation, while in object-oriented programming it means “a
thing with boundaries.”
The
linker
combines a list of object modules into an executable program that can be loaded
and run by the operating system. When a function in one object module makes a
reference to a function or variable in another object module, the linker
resolves these references – it makes sure that all the external functions
and data you claimed existed during compilation actually do exist.
The linker also adds a special object module to perform start-up activities.
The
linker can search through special files called
libraries
in order to resolve all its references. A library
contains a collection of object modules in a single file. A library is created
and maintained by a program called a
librarian.
Static type checking
The
compiler performs
type
checking
during the first pass. Type checking tests for the proper use of arguments in
functions, and prevents many kinds of programming errors. Since type checking
occurs during compilation rather than when the program is running, it is called
static
type checking.
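For example (a minimal sketch; the function name is made up), the compiler rejects misuses of f() before the program ever runs:

void f(int x) {}     // f() takes exactly one int argument

int main() {
  f(10);             // OK: the argument is an int
  f("ten");          // Compile-time error: a string is not an int
  f(1, 2);           // Compile-time error: too many arguments
  return 0;
}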
Some object-oriented languages (notably Java) perform some type checking at run time (dynamic type checking). If combined with static type checking, dynamic type checking is more powerful than static type checking alone. However, it also adds overhead to program execution.
C++ uses static type checking because the language cannot assume any particular run-time support for bad operations. Static type checking notifies the programmer about misuses of types during compilation, and thus maximizes execution speed. As you learn C++, you will see that most of the language design decisions favor the same kind of high-speed, production-oriented programming the C language is famous for.
You
can disable static type checking in C++. You can also do your own dynamic type
checking – you just need to write the code.
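A minimal sketch of what writing that code yourself might look like, using a hand-rolled type tag checked at run time (the class names are made up):

#include <iostream>

enum ShapeType { CIRCLE, SQUARE };

class Shape {
public:
  Shape(ShapeType t) : type(t) {}
  ShapeType type;  // tag recording the object's actual type
};

class Circle : public Shape {
public:
  Circle(double r) : Shape(CIRCLE), radius(r) {}
  double radius;
};

void describe(Shape* s) {
  // The hand-written run-time check: test the tag before downcasting.
  if (s->type == CIRCLE) {
    Circle* c = (Circle*)s;
    std::cout << "circle of radius " << c->radius << std::endl;
  } else {
    std::cout << "some other shape" << std::endl;
  }
}

int main() {
  Circle c(2.0);
  describe(&c);  // prints: circle of radius 2
  return 0;
}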
[14] The boundary between compilers and interpreters can be fuzzy, especially with Perl, which has many of the features and power of a compiled language but the quick turnaround of an interpreted language.
[15] Perl is again an exception, since it also provides separate compilation.