MFC Programmer's SourceBook : Thinking in C++
Bruce Eckel's Thinking in C++, 2nd Ed

Tools for separate compilation

Separate compilation is particularly important when building large projects. In C and C++, a program can be created in small, manageable, independently tested pieces. The most fundamental tool for breaking a program up into pieces is the ability to create named subroutines or subprograms. In C and C++, a subprogram is called a function, and functions are the pieces of code that can be placed in different files, enabling separate compilation. Put another way, the function is the atomic unit of code, since you cannot have part of a function in one file and another part in a different file – the entire function must be placed in a single file (although files can and do contain more than one function).

When you call a function, you typically pass it some arguments, which are values you’d like the function to work with during its execution. When the function is finished, you typically get back a return value, a value that the function hands back to you as a result. It’s also possible to write functions that take no arguments and return no values.

To create a program with multiple files, functions in one file must access functions and data in other files. When compiling a file, the C or C++ compiler must know about the functions and data in the other files, in particular their names and proper usage. The compiler ensures that functions and data are used correctly. This process of “telling the compiler” the names of external functions and data and what they should look like is called declaration. Once you declare a function or variable, the compiler knows how to check to make sure it is used properly.

Declarations vs. definitions

It’s important to understand the difference between declarations and definitions because these terms will be used precisely throughout the book. Essentially all C and C++ programs require declarations. Before you can write your first program, you need to understand the proper way to write a declaration.

A declaration introduces a name – an identifier – to the compiler. It tells the compiler “this function or this variable exists somewhere, and here is what it should look like.” A definition, on the other hand, says: “make this variable here” or “make this function here.” It allocates storage for the name. This meaning works whether you’re talking about a variable or a function; in either case, at the point of definition the compiler allocates storage. For a variable, the compiler determines how big that variable is and causes space to be generated in memory to hold the data for that variable. For a function, the compiler generates code, which ends up occupying storage in memory.

You can declare a variable or a function in many different places, but there must be only one definition in C and C++ (this is sometimes called the ODR: one-definition rule). When the linker is uniting all the object modules, it will usually complain if it finds more than one definition for the same function or variable.

A definition can also be a declaration. If the compiler hasn’t seen the name x before and you write the definition int x; the compiler both treats the name as declared and allocates storage for it, all at once.
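Here is a minimal sketch of the distinction, kept in a single file for brevity. The names counter and next_id are hypothetical and chosen only for illustration. Each name is declared more than once but defined exactly once:

```cpp
// Declarations: they introduce the names and their types but allocate nothing.
extern int counter;   // variable declaration (no storage yet)
int next_id();        // function declaration (no code yet)

// Repeating a declaration is legal as long as it agrees with the earlier one.
extern int counter;
int next_id();

// Definitions: exactly one of each in the whole program.
int counter = 100;    // storage for counter is allocated here

int next_id() {       // code for next_id is generated here
  return ++counter;
}
```

If a second definition of counter or next_id( ) appeared in another file of the same program, you would normally expect the linker to report a duplicate definition.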

Function declaration syntax

A function declaration in C and C++ gives the function name, the argument types passed to the function, and the type of the function’s return value. For example, here is a declaration for a function called func1( ) that takes two integer arguments (integers are denoted in C/C++ with the keyword int) and returns an integer:

int func1(int,int);

The first keyword you see is the type of the return value, all by itself: int. The arguments are enclosed in parentheses after the function name, in the order they are used. The semicolon indicates the end of a statement; in this case, it tells the compiler “that’s all – there is no function definition here!”

C and C++ declarations attempt to mimic the form of the item’s use. For example, if a is another integer the above function might be used this way:

a = func1(2,3);

Since func1( ) returns an integer, the C or C++ compiler will check the use of func1( ) to make sure that a can accept the return value and that the arguments are appropriate.

Arguments in function declarations may have names. The compiler ignores the names but they can be helpful as mnemonic devices for the user. For example, we can declare func1( ) in a different fashion that has the same meaning:

int func1(int length, int width);
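The following sketch shows that the named and unnamed declarations refer to the same function, and that the compiler checks each call against them. The body of func1( ) and the helper use_func1( ) are assumptions made up for this illustration; the text does not say what func1( ) actually computes:

```cpp
int func1(int, int);                // declaration without argument names
int func1(int length, int width);   // same function; the names are only documentation

// Hypothetical definition so the example is complete: here func1 computes an area.
int func1(int length, int width) {
  return length * width;
}

int use_func1() {
  int a;
  a = func1(2, 3);          // OK: two int arguments, int result stored in an int
  // a = func1("two", 3);   // error if uncommented: a string literal is not an int
  return a;
}
```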

A gotcha

There is a significant difference between C and C++ for functions with empty argument lists. In C, the declaration:

int func2();

means “a function with any number and type of argument.” That prevents type-checking, so C++ gives the same declaration a different meaning: “a function with no arguments.”
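A small sketch of the C++ meaning follows. The return value 42 and the helper call_func2( ) are arbitrary choices for the illustration. In C++ the two declarations below are the same declaration written two ways:

```cpp
int func2();        // in C++: takes no arguments
int func2(void);    // in C++ this redeclares the very same function

int func2() {
  return 42;        // arbitrary value chosen for the illustration
}

int call_func2() {
  // return func2(1, 2);  // error in C++ if uncommented: too many arguments
  return func2();
}
```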

Function definitions

Function definitions look like function declarations except they have bodies. A body is a collection of statements enclosed in braces. Braces denote the beginning and ending of a block of code. To give func1( ) a definition which is an empty body (a body containing no code), write this:

int func1(int length, int width) { }

Notice that in the function definition, the braces replace the semicolon. Since braces surround a statement or group of statements, you don’t need a semicolon. Notice also that the arguments in the function definition must have names if you want to use the arguments in the function body (since they are never used here, they are optional). The empty body only illustrates the syntax: a working function that is declared to return an int must also contain a return statement.

Variable declaration syntax

The meaning attributed to the phrase “variable declaration” has historically been confusing and contradictory, and it’s important that you understand the correct definition so you can read code properly. A variable declaration tells the compiler what a variable looks like. It says “I know you haven’t seen this name before, but I promise it exists someplace, and it’s a variable of X type.”

In a function declaration, you give a type (the return value), the function name, the argument list, and a semicolon. That’s enough for the compiler to figure out that it’s a declaration, and what the function should look like. By inference, a variable declaration might be a type followed by a name. For example:

int a;

could declare the variable a as an integer, using the above logic. Here’s the conflict: there is enough information in the above code for the compiler to create space for an integer called a, and that’s what happens. To resolve this dilemma, a keyword was necessary for C and C++ to say “this is only a declaration; it’s defined elsewhere.” The keyword is extern. It can mean the definition is external to the file, or that the definition occurs later in the file.

Declaring a variable without defining it means using the extern keyword before a description of the variable, like this:

extern int a;

extern can also apply to function declarations. For func1( ), it looks like this:

extern int func1(int length, int width);

This statement is equivalent to the previous func1( ) declarations. Since there is no function body, the compiler must treat it as a function declaration rather than a function definition. The extern keyword is thus superfluous and optional for function declarations. It is probably unfortunate that the designers of C did not require the use of extern for function declarations; it would have been more consistent and less confusing (but would have required more typing, which probably explains the decision).
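To see extern at work on a variable whose definition occurs later in the same file, consider this sketch. The helper twice_a( ), the value 21, and the body of func1( ) are assumptions added for the illustration:

```cpp
extern int a;                              // declaration only: no storage here
extern int func1(int length, int width);   // extern is optional for functions

int twice_a() {
  return func1(a, 2);   // both names are usable because they have been declared
}

// The definitions may come later in the same file, or live in another file.
int a = 21;             // arbitrary value for the illustration

int func1(int length, int width) {
  return length * width;
}
```

Without the extern declaration of a at the top, the compiler would reject twice_a( ) because the name a would not yet be known at that point.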

Here are some more examples of declarations:

//: C03:Declare.cpp
// Declaration & definition examples
extern int i; // Declaration without definition
extern float f(float); // Function declaration

float b;  // Declaration & definition
float f(float a) {  // Definition
  return a + 1.0;
}

int i; // Definition
int h(int x) { // Declaration & definition
  return x + 1;
}

int main() {
  b = 1.0;
  i = 2;
  f(b);
  h(i);
} ///:~ 

In the function declarations, the argument identifiers are optional. In the definitions, they are required in C but not in C++: a C++ definition may leave an argument unnamed, although the body then has no way to refer to it.
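As a brief sketch of an unnamed argument in a C++ definition, consider the hypothetical function scale( ) below, whose second argument is accepted but never used:

```cpp
// Declaration: argument identifiers omitted.
int scale(int, int);

// Definition in C++: the second argument is deliberately left unnamed
// because the body never uses it.
int scale(int value, int /* reserved */) {
  return value * 10;
}
```

Callers must still pass both arguments, for example scale(4, 0). Leaving an argument unnamed is a common way to keep a fixed function signature while typically avoiding compiler warnings about unused arguments.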

Including headers

Most libraries contain significant numbers of functions and variables. To save work and ensure consistency when making the external declarations for these items, C and C++ use a device called the header file. A header file is a file containing the external declarations for a library; it conventionally has a file name extension of ‘h’, such as headerfile.h. (You may also see code using different extensions, such as .hxx or .hpp.)

The programmer who creates the library provides the header file. To declare the functions and external variables in the library, the user simply includes the header file. To include a header file, use the #include preprocessor directive. This tells the preprocessor to open the named header file and insert its contents where the include statement appears. Files may be named in an include statement in two ways: in double quotes, or in angle brackets ( < > ). File names in double quotes, such as:

#include "local.h"

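tell the preprocessor to search for the file in an implementation-defined way, typically starting in the directory of the file that contains the #include (often the current directory). If the file is not found there, the directive is processed again as if the name had been written in angle brackets, and an error is reported only if that search fails as well. File names in angle brackets tell the preprocessor to look through a search path specified in the environment or on the compiler command line. The mechanism for setting the search path varies between machines, operating systems and C++ implementations, and may require some investigation on your part.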

To include the iostream header file, you say:

#include <iostream>

The preprocessor will find the iostream header file (often in a subdirectory called “include”) and insert it.

Standard C++ include format

As C++ evolved, different compiler vendors chose different extensions for file names. In addition, various operating systems have different restrictions on file names, in particular on name length. These issues caused source-code portability problems. To smooth over these rough edges, the standard uses a format that allows file names longer than the notorious eight characters and eliminates the extension. For example, instead of the old style of including iostream.h, which looks like this:

#include <iostream.h>

you can now say:

#include <iostream>

The translator can implement the include statements in a way to suit the needs of that particular compiler and operating system, if necessary truncating the name and adding an extension. Of course, you can also copy the headers given you by your compiler vendor to ones without extensions if you want to use this style before a vendor has provided support for it.

The libraries that have been inherited from C are still available with the traditional ‘.h’ extension. However, you can also use them with the more modern C++ include style by prepending a “c” to the name and dropping the extension. Thus:

#include <stdio.h>
#include <stdlib.h> 

become:

#include <cstdio>
#include <cstdlib> 

And so on, for all the Standard C headers. This provides a nice distinction to the reader indicating when you’re using C versus C++ libraries.
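Here is a short sketch that uses both of these headers in the C++ style. The function format_abs( ) is a hypothetical helper written for this illustration. It assumes a compiler that supports C++11 or later, since that is when std::snprintf became part of <cstdio>:

```cpp
#include <cstdio>    // C's <stdio.h>, C++ style
#include <cstdlib>   // C's <stdlib.h>, C++ style
#include <string>

// Formats n and its absolute value using two C library functions, reached
// through the std:: names that the <c...> headers declare.
std::string format_abs(int n) {
  char buffer[32];
  std::snprintf(buffer, sizeof buffer, "|%d| = %d", n, std::abs(n));
  return std::string(buffer);
}
```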

Linking

The linker collects object modules (which often use file name extensions like .o or .obj), generated by the compiler, into an executable program the operating system can load and run. It is the last phase of the compilation process.

Linker characteristics vary from system to system. Generally, you just tell the linker the names of the object modules and libraries you want linked together, and the name of the executable, and it goes to work. Some systems require you to invoke the linker yourself. With most C++ packages you invoke the linker through the C++ compiler. In many situations, the linker is invoked for you, invisibly.

Some older linkers won’t search object files and libraries more than once, and they search through the list you give them from left to right. This means that the order of object files and libraries can be important. If you have a mysterious problem that doesn’t show up until link time, one possibility is the order in which the files are given to the linker.

Using libraries

Now that you know the basic terminology, you can understand how to use a library. To use a library:

  1. Include the library’s header file
  2. Use the functions and variables in the library
  3. Link the library into the executable program

These steps also apply when the object modules aren’t combined into a library. Including a header file and linking the object modules are the basic steps for separate compilation in both C and C++.
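The three steps can be sketched in a single file, with comments marking where the separate files would normally begin. The file names shapes.h, shapes.cpp and user.cpp, the functions rectangle_area( ) and total_area( ), and the GCC-style commands in the comments are all assumptions made for this illustration; your own tools may be invoked differently:

```cpp
// Step 1 would normally be:  #include "shapes.h"   (hypothetical header)
// Its contents are written out here so the sketch fits in one file:
int rectangle_area(int length, int width);   // declaration from the "header"

// Step 2: use the function. Only the declaration is needed to compile this.
int total_area() {
  return rectangle_area(2, 3) + rectangle_area(4, 5);
}

// Step 3: link. The definition below stands in for the library's object
// module. In a real build it would be compiled separately and handed to
// the linker, for example with a GCC-style driver:
//   g++ -c user.cpp shapes.cpp
//   g++ user.o shapes.o -o program
int rectangle_area(int length, int width) {
  return length * width;
}
```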

How the linker searches a library

When you make an external reference to a function or variable in C or C++, the linker, upon encountering this reference, can do one of two things. If it has not already encountered the definition for the function or variable, it adds the identifier to its list of “unresolved references.” If the linker has already encountered the definition, the reference is resolved.

If the linker cannot find the definition in the list of object modules, it searches the libraries. Libraries have some sort of indexing so the linker doesn’t need to look through all the object modules in the library – it just looks in the index. When the linker finds a definition in a library, the entire object module, not just the function definition, is linked into the executable program. Note that the whole library isn’t linked, just the object module in the library that contains the definition you want (otherwise programs would be unnecessarily large). If you want to minimize executable program size, you might consider putting a single function in each source code file when you build your own libraries. This requires more editing [16], but it can be helpful to the user.

Because the linker searches files in the order you give them, you can pre-empt the use of a library function by inserting a file with your own function, using the same function name, into the list before the library name appears. Since the linker will resolve any references to this function by using your function before it searches the library, your function is used instead of the library function.

Secret additions

When a C or C++ executable program is created, certain items are secretly linked in. One of these is the startup module, which contains initialization routines that must be run any time a C or C++ program begins to execute. These routines set up the stack and initialize certain variables in the program.
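One visible effect of this hidden startup work, at least on typical implementations, is that global objects are already initialized by the time main( ) starts, even though main( ) never asks for it. The names make_greeting and greeting in this sketch are hypothetical, and exactly which part of the runtime performs the initialization varies between systems:

```cpp
#include <string>

// An ordinary function whose result is used to initialize a global object.
std::string make_greeting() {
  return std::string("ready");
}

// No code in main() calls make_greeting(). On typical implementations the
// initialization is carried out before the first statement of main() runs.
std::string greeting = make_greeting();
```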

The linker always searches the standard library for the compiled versions of any “standard” functions called in the program. Because the standard library is always searched, you can use anything in that library by simply including the appropriate header file in your program – you don’t have to tell it to search the standard library. The iostream functions, for example, are in the Standard C++ library. To use them, you just include the <iostream> header file.

If you are using an add-on library, you must explicitly add the library name to the list of files handed to the linker.

Using plain C libraries

Just because you are writing code in C++, you are not prevented from using C library functions. In fact, the entire C library is included by default into Standard C++. There has been a tremendous amount of work done for you in these functions, so they can save you a lot of time.
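As a small sketch of calling a C library function from C++, the following uses qsort( ) from the Standard C library. The helper names compare_ints and sort_ints are hypothetical wrappers written for this illustration:

```cpp
#include <cstdlib>   // std::qsort and std::size_t, inherited from the C library

// Comparison callback in the form qsort expects: negative, zero, or positive.
int compare_ints(const void* lhs, const void* rhs) {
  int a = *static_cast<const int*>(lhs);
  int b = *static_cast<const int*>(rhs);
  return (a > b) - (a < b);   // avoids the overflow that a - b could cause
}

// Sorts count integers in place, in ascending order, using the C library.
void sort_ints(int* values, std::size_t count) {
  std::qsort(values, count, sizeof(int), compare_ints);
}
```

No extra library has to be named at link time for this to work, because qsort( ) is part of the standard library that the linker searches automatically.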

This book will use Standard C++ (and thus also Standard C) library functions when convenient, but only standard library functions will be used, to ensure the portability of programs. In the few cases where library functions must be used that are not in the C++ standard, all attempts will be made to use POSIX-compliant functions. POSIX is a standard based on a Unix standardization effort which includes functions that go beyond the scope of the C++ library. You can generally expect to find POSIX functions on Unix (in particular, Linux) platforms, and often under DOS/Windows.


[16] I would recommend using Perl to automate this task as part of your library-packaging process.

© Copyright 1997-1999 CodeGuru