A
tiny C-like library
A
small library usually starts out as a collection of functions, but those of you
who have used third-party C libraries know that there’s usually more to
it than that because there’s more to life than behavior, actions and
functions. There are also characteristics (blue, pounds, texture, luminance),
which are represented by data. And when you start to deal with a set of
characteristics in C, it is very convenient to clump them together into a
struct,
especially if you want to represent more than one similar thing in your problem
space. Then you can make a variable of this
struct
for each thing.
Thus,
most C libraries have a set of
structs
and a set of functions that act on those
structs.
As an example of what such a system looks like, consider a programming tool
that acts like an array, but whose size can be established at run-time, when it
is created. I’ll call it a
CStash.
Although it’s written in C++, it has the style of what you’d write
in C:
//: C04:Lib.h
// Header file for a C-like library
// Array-like entity created at run-time
typedef struct CStashTag {
int size; /* Size of each space */
int quantity; /* Number of storage spaces */
int next; /* Next empty space */
/* Dynamically allocated array of bytes: */
unsigned char* storage;
} CStash;
void initialize(CStash* s, int size);
void cleanup(CStash* s);
int add(CStash* s, void* element);
void* fetch(CStash* s, int index);
int count(CStash* s);
void inflate(CStash* s, int increase);
/* ///:~ */
The
tag name for
the
struct
is generally used in case you need to reference the
struct
inside
itself. For example, when creating a linked list, you need a pointer to the next
struct.
But almost universally in a C library you’ll see the
typedef
as shown above, on every
struct
in the library. This is done so you can treat the
struct
as if it were a new type and define variables of that
struct
like this:
Note
that the function declarations use the Standard C style of function
prototyping, which is much safer and clearer than the “old” C
style. You aren’t just introducing a function name; you’re also
telling the compiler what the argument list and return value look like.
The
storage
pointer is an
unsigned
char*
.
This is the smallest piece of storage a C compiler supports, although
on some machines it can be the same size as the largest. It’s
implementation dependent. You might think that because the
CStash
is designed to hold any type of variable, a
void*
would be more appropriate here. However, the purpose is not to treat this
storage as a block of some unknown type, but rather as a block of contiguous
bytes.
The
source code for the implementation file (which you may not get if you buy a
library commercially – you might get only a compiled OBJ or LIB or DLL,
etc.) looks like this:
/*: C04:Lib.c {O}
Implementation of example C library */
/* Declare structure and functions: */
#include "Lib.h"
/* Error testing macros: */
#include <assert.h>
/* Dynamic memory allocation functions: */
#include <stdlib.h>
#include <string.h> /* memcpy() */
#include <stdio.h>
void initialize(CStash* s, int sz) {
s->size = sz;
s->quantity = 0;
s->storage = 0;
s->next = 0;
}
void cleanup(CStash* s) {
if(s->storage) {
puts("freeing storage");
free(s->storage);
}
}
int add(CStash* s, void* element) {
/* enough space left? */
if(s->next >= s->quantity)
inflate(s, 100);
/* Copy element into storage,
starting at next empty space: */
memcpy(&(s->storage[s->next * s->size]),
element, s->size);
s->next++;
return(s->next - 1); /* Index number */
}
void* fetch(CStash* s, int index) {
if(index >= s->next || index < 0)
return 0; /* Not out of bounds? */
/* Produce pointer to desired element: */
return &(s->storage[index * s->size]);
}
int count(CStash* s) {
/* Number of elements in CStash */
return s->next;
}
void inflate(CStash* s, int increase) {
void* v =
realloc(s->storage,
(s->quantity + increase)
* s->size);
/* Was it successful? */
assert(v != 0);
s->storage = (unsigned char*)v;
s->quantity += increase;
} /* ///:~ */
initialize( )
performs the necessary setup for
struct
CStash
by setting the internal variables to appropriate values. Initially, the
storage
pointer is set to zero, and the
size
indicator is also zero – no initial storage is allocated.
The
add( )
function inserts an element into the
CStash
at the next available location. First, it checks to see if there is any
available space left. If not, it expands the storage using the
inflate( )
function, described later.
Because
the compiler doesn’t know the specific type of the variable being stored
(all the function gets is a
void*),
you can’t just do an assignment, which would certainly be the convenient
thing. Instead, you must use the Standard C library function
memcpy( )
to copy the variable byte-by-byte. The first argument is the destination
address where
memcpy( )
is to start copying bytes. It is produced by the expression:
&(s->storage[s->next
* s->size])
This
indexes from the beginning of the block of storage to the
next
available piece. This number, which is simply a count of the number of pieces
used plus one, must be multiplied by the number of bytes occupied by each piece
to produce the offset in bytes. This doesn’t produce the address, but
instead the byte at the address. To produce the address, you must use the
address-of operator
&.
The
second and third arguments to
memcpy( )
are the starting address of the variable to be copied and the number of bytes
to copy, respectively. The
next
counter is incremented, and the index of the value stored is returned, so the
programmer can use it later in a call to
fetch( )
to select that element.
fetch( )
checks
to see that the index isn’t out of bounds and then returns the address of
the desired variable, calculated the same way as it was in
add( ). count( )
may look a bit strange at first to a seasoned C programmer. It seems like a lot
of trouble to go through to do something that would probably be a lot easier to
do by hand. If you have a
struct
CStash
called
intStash,
for example, it would seem much more straightforward to find out how many
elements it has by saying
intStash.next
instead of making a function call (which has overhead) like
count(&intStash).
However, if you wanted to change the internal representation of
CStash
and thus the way the count was calculated, the function call interface allows
the necessary flexibility. But alas, most programmers won’t bother to
find out about your “better” design for the library. They’ll
look at the
struct
and grab the
next
value directly, and possibly even change
next
without your permission. If only there were some way for the library designer
to have better control over things like this! (Yes, that’s foreshadowing.)
Dynamic
storage allocation
You
never know the maximum amount of storage you might need for a
CStash,
so the memory pointed to by
storage
is allocated from the
heap.
The heap
is a big block of memory used for allocating smaller pieces at run-time. You
use the heap when you don’t know the size of the memory you’ll need
while you’re writing a program. That is, only at run-time will you find
out that you need space to hold 200
airplane
variables instead of 20. Dynamic-memory allocation functions are part of the
Standard C library and include
malloc( ),
calloc( ),
realloc( ),
and
free( ). The
inflate( )
function uses
realloc( )
to get a bigger chunk of space for the
CStash.
realloc( )
takes as its first argument the address of the storage that’s already
been allocated and that you want to resize. (If this argument is zero –
which is the case just after
initialize( )
has
been called – it allocates a new chunk of memory.) The second argument is
the new size that you want the chunk to be. If the size is smaller,
there’s no chance the block will need to be copied, so the heap manager
is simply told that the extra space is free. If the size is larger, as in
inflate( ),there
may not be enough contiguous space, so a new chunk might be allocated and the
memory copied. The
assert( )
checks to make sure that the operation was successful. (
malloc( ),
calloc( )
and
realloc( )
all return zero if the heap is exhausted.)
Note
that the C heap manager is fairly primitive. It gives you chunks of memory and
takes them back when you
free( )
them. There’s no facility for
heap
compaction
,
which compresses the heap to provide bigger free chunks. If a program allocates
and frees heap storage for a while, you can end up with a heap that has lots of
memory free, just not anything big enough to allocate the size of chunk
you’re looking for at the moment. However, a heap compactor moves memory
chunks around, so your pointers won’t retain their proper values. Some
operating environments have heap compaction built in, but they require you to
use special memory
handles
(which can be temporarily converted to pointers, after locking the memory so
the heap compactor can’t move it) instead of pointers.
assert( )
is a preprocessor macro in
<cassert>.
assert( )
takes a single argument, which can be any expression that evaluates to true or
false. The macro says, “I assert this to be true, and if it’s not,
the program will exit after printing an error message.” When you are no
longer debugging, you can define a flag so asserts are ignored. In the
meantime, it is a very clear and portable way to test for errors.
Unfortunately, it’s a bit abrupt in its handling of error situations:
“Sorry, mission control. Our C program failed an assertion and bailed
out. We’ll have to land the shuttle on manual.” In Chapter 16,
you’ll see how C++ provides a better solution to critical errors with
exception
handling. When
you create a variable on the stack at compile-time, the storage for that
variable is automatically created and freed by the compiler. It knows exactly
how much storage it needs, and it knows the lifetime of the variables because
of scoping. With dynamic memory allocation, however, the compiler doesn’t
know how much storage you’re going to need,
and
it doesn’t know the lifetime of that storage. It doesn’t get
cleaned up automatically. Therefore, you’re responsible for releasing the
storage using
free( ),
which
tells the heap manager that storage can be used by the next call to
malloc( ),
calloc( )
or
realloc( ).
The logical place for this to happen in the library is in the
cleanup( )
function because that is where all the closing-up housekeeping is done.
To
test the library, two
CStashes
are created. The first holds
ints
and the second holds arrays of 80
chars.
(You could almost think of this as a new data type. But that happens later.)
/*: C04:Libtestc.c
//{L} Lib
Test demonstration library */
#include <stdio.h>
#include <assert.h>
#include "Lib.h"
#define BUFSIZE 80
int main() {
CStash intStash, stringStash;
int i;
FILE* file;
char buf[BUFSIZE];
char* cp;
/* .... */
initialize(&intStash, sizeof(int));
for(i = 0; i < 100; i++)
add(&intStash, &i);
/* Holds 80-character strings: */
initialize(&stringStash,
sizeof(char) * BUFSIZE);
file = fopen("Libtestc.c", "r");
assert(file);
while(fgets(buf, BUFSIZE, file))
add(&stringStash, buf);
fclose(file);
for(i = 0; i < count(&intStash); i++)
printf("fetch(&intStash, %d) = %d\n", i,
*(int*)fetch(&intStash, i));
i = 0;
while((cp = (char*)fetch(&stringStash,i++))!=0)
printf("fetch(&stringStash, %d) = %s",
i - 1, cp);
putchar('\n');
cleanup(&intStash);
cleanup(&stringStash);
return 0;
} /* ///:~ */
At
the beginning of
main( ),
the variables are defined, including the two
CStash
structures. Of course, you must remember to initialize these later in the
block. One of the problems with libraries is that you must carefully convey to
the user the importance of the initialization and
cleanup functions.
If these functions aren’t called, there will be a lot of trouble.
Unfortunately, the user doesn’t always wonder if initialization and
cleanup are mandatory. They know what
they
want to accomplish, and they’re not as concerned about you jumping up and
down saying, “Hey, wait, you have to do
this
first!” Some users have even been known to initialize the elements of the
structure themselves. There’s certainly no mechanism to prevent it (more
foreshadowing).
The
intStash
is filled up with integers, and the
stringStash
is filled with character arrays. These character arrays are produced by opening
the source code file,
Libtest.c,
and reading the lines from it into the
stringStash.
Notice something interesting here: The Standard C library functions for opening
and reading files use the same techniques as in the
Stash
library!
fopen( )
returns
a pointer to a
FILE
struct,
which it creates on the heap, and this pointer is passed to any function that
refers to that file (
fgets( ),
in
this case). One of the things
fclose( )
does
is release the
FILE
struct
back to the heap. Once you start noticing this pattern of a C library
consisting of
structs
and associated functions, you see it everywhere!
After
the two
Stashes
are loaded, you can print them out. The
intStash
is printed using a
for
loop, which uses
count( )
to establish its limit. The
stringStash
is printed with a
while,
which breaks out when
fetch( )
returns zero to indicate it is out of bounds.
There
are a number of other things you should understand before we look at the
problems in creating a C library. (You may already know these because
you’re a C programmer.) First, although header files are
used here because it’s good practice, they aren’t essential.
It’s possible in C to call a function that you haven’t declared. A
good compiler will warn you that you probably ought to declare a function
first, but it isn’t enforced. This is a dangerous practice, because the
compiler can assume that a function that you call with an
int
argument has an argument list containing
int,
and it will treat it accordingly – a very difficult bug to find.
Note
that the
Lib.h
header file
must
be included in any file that refers to
CStash
because the compiler can’t even guess at what that structure looks like.
It can guess at functions, even though it probably shouldn’t, but
that’s part of the history of C.
Each
separate C file is a
translation
unit.
That is, the compiler is run separately on each translation unit, and when it
is running it is aware of only that unit. Thus, any information you provide by
including header files is quite important because it provides the
compiler’s understanding of the rest of your program. Declarations in
header files are particularly important, because everywhere the header is
included, the compiler will know exactly what to do. If, for example, you have
a declaration in a header file that says
void
func(float);
,
the compiler knows that if you call it with an integer argument, it should
promote
the
int
to a
float.
Without the declaration, the compiler would simply assume that a function
func(int)
existed, and it wouldn’t do the promotion.
For
each translation unit, the compiler creates an object file, with an extension of
.o
or
.obj
or something similar. These object files, along with the necessary start-up
code, must be collected by the linker into
the executable program. During linking, all the external references must be
resolved. For example, in
Libtest.c,
functions like
initialize( )
and
fetch( )
are declared (that is, the compiler is told what they look like) and used, but
not defined. They are defined elsewhere, in
Lib.c.
Thus, the calls in
Libtest.c
are external references. The linker must, when it puts all the object files
together, take the unresolved external references and
find the addresses they actually refer to. Those addresses are put in to
replace the external references.
It’s
important to realize that in C, the references are simply function names,
generally with an underscore in front of them. So all the linker has to do is
match up the function name where it is called and the function body in the
object file, and it’s done. If you accidentally made a call that the
compiler interpreted as
func(int)
and there’s a function body for
func(float)
in some other object file, the linker will see
_func
in one place and
_func
in another, and it will think everything’s OK. The
func( )
at the calling location will push an
int
onto the stack, and the
func( )
function body will expect a
float
to be on the stack. If the function only reads the value and doesn’t
write to it, it won’t blow up the stack. In fact, the
float
value it reads off the stack might even make some kind of sense. That’s
worse because it’s harder to find the bug.
Go to CodeGuru.com
Contact: webmaster@codeguru.com
© Copyright 1997-1999 CodeGuru