All computer languages are translated from something that tends to be easy for a human to understand (source code) into something that is executed on a computer (machine instructions). Traditionally, translators fall into two classes: interpreters and compilers.
Interpreters
An interpreter translates source code into activities (which may comprise groups of machine instructions) and immediately executes those activities. BASIC, for example, has been a popular interpreted language. Traditional BASIC interpreters translate and execute one line at a time, and then forget that the line has been translated. This makes them slow, since they must re-translate any repeated code. BASIC has also been compiled, for speed. More modern interpreters, such as those for the Perl language, translate the entire program into an intermediate language, which is then executed by a much faster interpreter [14].
Interpreters have many advantages. The transition from writing code to executing code is almost immediate, and the source code is always available, so the interpreter can be much more specific when an error occurs. The benefits often cited for interpreters are ease of interaction and rapid development (but not necessarily execution) of programs.
Interpreted languages often have severe limitations when building large projects (Perl seems to be an exception to this). The interpreter (or a reduced version) must always be in memory to execute the code, and even the fastest interpreter may introduce unacceptable speed restrictions. Most interpreters require that the complete source code be brought into the interpreter all at once. Not only does this introduce a space limitation, it can also cause more difficult bugs if the language doesn’t provide facilities to localize the effect of different pieces of code.
Compilers
A
compiler translates source code directly into assembly language or machine
instructions. The eventual end product is a file or files containing machine
code. This is an involved process, and usually takes several steps. The
transition from writing code to executing code is significantly longer with a
compiler.
Depending on the acumen of the compiler writer, programs generated by a compiler tend to require much less space to run, and they run much more quickly. Although size and speed are probably the most often cited reasons for using a compiler, in many situations they aren’t the most important reasons. Some languages (such as C) are designed to allow pieces of a program to be compiled independently. These pieces are eventually combined into a final executable program by a tool called the linker. This process is called separate compilation.
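As a minimal sketch of how this might look in practice (the file names and the g++ commands in the comments are illustrative, not prescriptive):

// mathutil.h (hypothetical): a declaration shared by both pieces
int twice(int x);

// mathutil.cpp: one independently compiled piece
#include "mathutil.h"
int twice(int x) { return x * 2; }

// main.cpp: another piece that refers to twice()
#include <iostream>
#include "mathutil.h"
int main() {
  std::cout << twice(21) << std::endl;  // prints 42
  return 0;
}

// Each source file is compiled to its own object module, then linked:
//   g++ -c mathutil.cpp    (produces mathutil.o)
//   g++ -c main.cpp        (produces main.o)
//   g++ mathutil.o main.o -o program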
Separate compilation has many benefits. A program that, taken all at once, would exceed the limits of the compiler or the compiling environment can be compiled in pieces. Programs can be built and tested a piece at a time. Once a piece is working, it can be saved and treated as a building block. Collections of tested and working pieces can be combined into libraries for use by other programmers. As each piece is created, the complexity of the other pieces is hidden. All these features support the creation of large programs [15].
Compiler debugging features have improved significantly over time. Early compilers only generated machine code, and the programmer inserted print statements to see what was going on. This is not always effective. Modern compilers can insert information about the source code into the executable program. This information is used by powerful source-level debuggers to show exactly what is happening in a program by tracing its progress through the source code.
Some compilers tackle the compilation-speed problem by performing in-memory compilation. Most compilers work with files, reading and writing them in each step of the compilation process. In-memory compilers keep the program in RAM. For small programs, this can seem as responsive as an interpreter.
The compilation process
To program in C and C++, you need to understand the steps and tools in the compilation process. Some languages (C and C++, in particular) start compilation by running a preprocessor on the source code. The preprocessor is a simple program that replaces patterns in the source code with other patterns the programmer has defined (using preprocessor directives). Preprocessor directives are used to save typing and to increase the readability of the code. (Later in the book, you’ll learn how the design of C++ is meant to discourage much of the use of the preprocessor, since it can cause subtle bugs.) The pre-processed code is often written to an intermediate file.
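For instance (a minimal illustration; the names are made up):

#define PI 3.14159
#define SQUARE(x) x * x

int main() {
  double radius = 1.0;
  // The preprocessor substitutes text before the compiler runs, so
  // this line reaches the compiler as: 2.0 * 3.14159 * radius
  double circumference = 2.0 * PI * radius;
  // Blind textual substitution is also where the subtle bugs come from:
  int n = SQUARE(1 + 2);  // expands to 1 + 2 * 1 + 2, which is 5, not 9
  return 0;
}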
Compilers usually do their work in two passes. The first pass parses the pre-processed code. The compiler breaks the source code into small units and organizes it into a structure called a tree. In the expression “A + B” the elements ‘A’, ‘+’ and ‘B’ are leaves on the parse tree.
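One way such a tree might be represented, sketched here purely for illustration (real compilers use far richer node types):

// An illustrative parse-tree node: each node holds a token and
// pointers to its children (null for a leaf).
struct Node {
  char token;
  Node* left;
  Node* right;
};

int main() {
  // The tree for "A + B": '+' is the root, 'A' and 'B' are its leaves.
  Node a    = { 'A', 0, 0 };
  Node b    = { 'B', 0, 0 };
  Node plus = { '+', &a, &b };
  return 0;
}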
A global optimizer is sometimes used between the first and second passes to produce smaller, faster code.
In
the second pass, the
code
generator
walks through the parse tree and generates either assembly language code or
machine code for the nodes of the tree. If the code generator creates assembly
code, the assembler must then be run. The end result in both cases is an object
module
(a file that typically has an extension of
.o
or
.obj).
A
peephole
optimizer
is sometimes used in the second pass to look for pieces of code containing
redundant assembly-language statements.
The
use of the word “object”
to describe chunks of machine code is an unfortunate artifact. The word came
into use before object-oriented programming was in general use.
“Object” is used in the same sense as “goal” when
discussing compilation, while in object-oriented programming it means “a
thing with boundaries.”
The
linker
combines a list of object modules into an executable program that can be loaded
and run by the operating system. When a function in one object module makes a
reference to a function or variable in another object module, the linker
resolves these references – it makes sure that all the external functions
and data you claimed existed during compilation actually do exist.
The linker also adds a special object module to perform start-up activities.
The
linker can search through special files called
libraries
in order to resolve all its references. A library
contains a collection of object modules in a single file. A library is created
and maintained by a program called a
librarian.
Static type checking
The
compiler performs
type
checking
during the first pass. Type checking tests for the proper use of arguments in
functions, and prevents many kinds of programming errors. Since type checking
occurs during compilation rather than when the program is running, it is called
static
type checking.
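For example (a minimal sketch; the function name is made up), the compiler rejects misuses of f() before the program ever runs:

void f(int x) {}     // f() takes exactly one int argument

int main() {
  f(10);             // OK: the argument is an int
  f("ten");          // Compile-time error: a string is not an int
  f(1, 2);           // Compile-time error: too many arguments
  return 0;
}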
Some object-oriented languages (notably Java) perform some type checking at run time (dynamic type checking). If combined with static type checking, dynamic type checking is more powerful than static type checking alone. However, it also adds overhead to program execution.
C++ uses static type checking because the language cannot assume any particular run-time support for bad operations. Static type checking notifies the programmer about misuses of types during compilation, and thus maximizes execution speed. As you learn C++, you will see that most of the language design decisions favor the same kind of high-speed, production-oriented programming the C language is famous for.
You
can disable static type checking in C++. You can also do your own dynamic type
checking – you just need to write the code.
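A minimal sketch of what writing that code yourself might look like, using a hand-rolled type tag checked at run time (the class names are made up):

#include <iostream>

enum ShapeType { CIRCLE, SQUARE };

class Shape {
public:
  Shape(ShapeType t) : type(t) {}
  ShapeType type;  // tag recording the object's actual type
};

class Circle : public Shape {
public:
  Circle(double r) : Shape(CIRCLE), radius(r) {}
  double radius;
};

void describe(Shape* s) {
  // The hand-written run-time check: test the tag before downcasting.
  if (s->type == CIRCLE) {
    Circle* c = (Circle*)s;
    std::cout << "circle of radius " << c->radius << std::endl;
  } else {
    std::cout << "some other shape" << std::endl;
  }
}

int main() {
  Circle c(2.0);
  describe(&c);  // prints: circle of radius 2
  return 0;
}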
[14] The boundary between compilers and interpreters can be fuzzy, especially with Perl, which has many of the features and power of a compiled language but the quick turnaround of an interpreted language.
[15] Perl is again an exception, since it also provides separate compilation.