Composite type creation
The fundamental data types and their variations are essential, but rather primitive. C and C++ provide tools that allow you to compose more sophisticated data types from the fundamental data types. As you’ll see, the most important of these is struct, which is the foundation for class in C++. However, the simplest way to create more sophisticated types is simply to alias a name to another name via typedef.
Aliasing names with typedef
This keyword promises more than it delivers: typedef suggests “type definition” when “alias” would probably have been a more accurate description, since that’s all it really does. The syntax is:
typedef existing-type-description alias-name
People often use typedef when data types get slightly complicated, just to prevent extra keystrokes. Here is a commonly-used typedef:
typedef unsigned long ulong;
Now if you say ulong the compiler knows that you mean unsigned long. You might think that this could as easily be accomplished using preprocessor substitution, but there are key situations where the compiler must be aware that you’re treating a name as if it were a type, so typedef is essential.
You can argue that it’s more explicit and therefore more readable to avoid typedefs for primitive types, and indeed programs rapidly become difficult to read when many typedefs are used. However, typedefs become especially important in C when used with struct.
Combining variables with struct
A struct is a way to collect a group of variables into a structure. Once you create a struct, then you can make many instances of this “new” type of variable you’ve invented. For example:
//: C03:SimpleStruct.cpp
struct Structure1 {
char c;
int i;
float f;
double d;
};
int main() {
struct Structure1 s1, s2;
s1.c = 'a'; // Select an element using a '.'
s1.i = 1;
s1.f = 3.14;
s1.d = 0.00093;
s2.c = 'a';
s2.i = 1;
s2.f = 3.14;
s2.d = 0.00093;
} ///:~
The struct declaration must end with a semicolon.
In main( ), two instances of Structure1 are created: s1 and s2. Each of these has its own separate versions of c, i, f and d. So s1 and s2 represent clumps of completely independent variables. To select one of the elements within s1 or s2, you use a ‘.’, syntax you’ve seen in the previous chapter when using C++ class objects; since classes evolved from structs, this is where that syntax arose.
One thing you’ll notice is the awkwardness of the use of Structure1. In C, you can’t just say Structure1 when you’re defining variables; you must say struct Structure1 (C++ does not require this, but the example is written in the C style). This is where typedef becomes especially handy:
//: C03:SimpleStruct2.cpp
// Using typedef with struct
typedef struct {
char c;
int i;
float f;
double d;
} Structure2;
int main() {
Structure2 s1, s2;
s1.c = 'a';
s1.i = 1;
s1.f = 3.14;
s1.d = 0.00093;
s2.c = 'a';
s2.i = 1;
s2.f = 3.14;
s2.d = 0.00093;
} ///:~
By using typedef in this way, you can pretend that Structure2 is a built-in type, like int or float, when you define s1 and s2 (but notice it only has data, that is, characteristics, and does not include behavior, which is what we get with real objects in C++). You’ll notice that the struct name has been left off at the beginning, because the goal is to create the typedef. However, there are times when you might need to refer to the struct during its definition. In those cases, you can actually repeat the name of the struct as the struct name and as the typedef:
//: C03:SelfReferential.cpp
// Allowing a struct to refer to itself
typedef struct SelfReferential {
int i;
SelfReferential* sr; // Head spinning yet?
} SelfReferential;
int main() {
SelfReferential sr1, sr2;
sr1.sr = &sr2;
sr2.sr = &sr1;
sr1.i = 47;
sr2.i = 1024;
} ///:~
If you look at this for a while, you’ll see that sr1 and sr2 point to each other, as well as each holding a piece of data.
Actually, the struct name does not have to be the same as the typedef name, but it is usually done this way as it tends to keep things simpler.
Pointers and structs
In the above examples, all the structs are manipulated as objects. However, like any piece of storage you can take the address of a struct object (as seen in SelfReferential.cpp above). To select the elements of a particular struct object, you use a ‘.’, as seen above. However, if you have a pointer to a struct object, you must select an element of that object using a different operator: the ‘->’. Here’s an example:
//: C03:SimpleStruct3.cpp
// Using pointers to structs
typedef struct Structure3 {
char c;
int i;
float f;
double d;
} Structure3;
int main() {
Structure3 s1, s2;
Structure3* sp = &s1;
sp->c = 'a';
sp->i = 1;
sp->f = 3.14;
sp->d = 0.00093;
sp = &s2; // Point to a different struct object
sp->c = 'a';
sp->i = 1;
sp->f = 3.14;
sp->d = 0.00093;
} ///:~
In main( ), the struct pointer sp is initially pointing to s1, and the members of s1 are initialized by selecting them with the ‘->’ (and you use this same operator in order to read those members). But then sp is pointed to s2, and those variables are initialized the same way. So you can see that another benefit of pointers is that they can be dynamically redirected to point to different objects; this provides more flexibility in your programming, as you shall learn.
For now, that’s all you need to know about structs, but you’ll become much more comfortable with structs (and especially their more potent successors, classes) as the book progresses.
Clarifying programs with enum
An enumerated data type is a way of attaching names to numbers, thereby giving more meaning to anyone reading the code. The enum keyword (from C) automatically enumerates any list of words you give it by assigning them values of 0, 1, 2, etc. You can declare enum variables (which are always represented as integral values). The declaration of an enum looks similar to a struct declaration.
An enumerated data type is very useful when you want to keep track of some sort of feature:
//: C03:Enum.cpp
// Keeping track of shapes.
enum ShapeType {
circle,
square,
rectangle
}; // Must end with a semicolon like a struct
int main() {
ShapeType shape = circle;
// Activities here....
// Now do something based on what the shape is:
switch(shape) {
case circle: /* circle stuff */ break;
case square: /* square stuff */ break;
case rectangle: /* rectangle stuff */ break;
}
} ///:~
shape is a variable of the ShapeType enumerated data type, and its value is compared with the value in the enumeration. Since shape is really just an int, however, it can be any value an int can hold (including a negative number). You can also compare an int variable with a value in the enumeration.
If you don’t like the way the compiler assigns values, you can do it yourself, like this:
enum ShapeType {
circle = 10, square = 20, rectangle = 50
};
If you give values to some names and not to others, the compiler will use the next integral value. For example,
enum snap { crackle = 25, pop };
The compiler gives pop the value 26.
You can see how much more readable the code is when you use enumerated data types. However, to some degree this is still an attempt (in C) to accomplish the things that we can do with a class in C++, so you’ll see enum used less in C++.
Saving memory with union
Sometimes a program will handle different types of data using the same variable. In this situation, you have two choices: you can create a struct containing all the possible different types you might need to store, or you can use a union. A union piles all the data into a single space; it figures out the amount of space necessary for the largest item you’ve put in the union, and makes that the size of the union. Use a union to save memory.
Anytime you place a value in a union, the value always starts in the same place at the beginning of the union, but only uses as much space as is necessary. Thus, you create a “super-variable,” capable of holding any of the union variables. All the addresses of the union variables are the same (in a class or struct, the addresses are different).
Here’s a simple use of a union. Try removing various elements and see what effect it has on the size of the union. Notice that it makes no sense to declare more than one instance of a single data type in a union (unless you’re just doing it to use a different name).
//: C03:Union.cpp
// The size and simple use of a union
#include <iostream>
using namespace std;
union packed { // Declaration similar to a class
char i;
short j;
int k;
long l;
float f;
double d;
// The union will be the size of a
// double, since that's the largest element
}; // Semicolon ends a union, like a struct
int main() {
cout << "sizeof(packed) = "
<< sizeof(packed) << endl;
packed x;
x.i = 'c';
cout << x.i << endl;
x.d = 3.14159;
cout << x.d << endl;
} ///:~
The compiler performs the proper assignment according to the union member you select.
Once you perform an assignment, the compiler doesn’t care what you do with the union. In the above example, you could assign a floating-point value to x through a floating-point member such as x.f and then send it to the output through an integer member such as x.k, as if it were an int. This would produce garbage.
Arrays
Arrays are a kind of composite type because they allow you to clump a lot of variables together, one right after the other, under a single identifier name. If you say:
int a[10];
You create storage for 10 int variables stacked on top of each other, but without unique identifier names for each variable. Instead, they are all lumped under the name a. To access one of these array elements, you use the same square-bracket syntax that you use to define an array:
a[5] = 47;
However, you must remember that even though the size of a is 10, you select array elements starting at zero (this is sometimes called zero indexing), so you can only select the array elements 0-9, like this:
//: C03:Arrays.cpp
#include <iostream>
using namespace std;
int main() {
int a[10];
for(int i = 0; i < 10; i++) {
a[i] = i * 10;
cout << "a[" << i << "] = " << a[i] << endl;
}
} ///:~
Array access is extremely fast. However, if you index past the end of the array, there is no safety net; you’ll step on other variables. The other drawback is the fact that you must define the size of the array at compile time; if you want to change the size at run-time you can’t do it with the above syntax (C does have a way to create an array dynamically, but it’s significantly messier). The C++ vector, introduced in the previous chapter, provides an array-like object that automatically resizes itself, so it is usually a much better solution if your array size cannot be constant.
You can make an array of any type, even of structs:
//: C03:StructArray.cpp
// An array of struct
typedef struct {
int i, j, k;
} ThreeDpoint;
int main() {
ThreeDpoint p[10];
for(int i = 0; i < 10; i++) {
p[i].i = i + 1;
p[i].j = i + 2;
p[i].k = i + 3;
}
} ///:~
Notice how the struct identifier i is independent of the for loop’s i. To see that each element of an array is contiguous with the next, you can print out the addresses like this:
//: C03:ArrayAddresses.cpp
#include <iostream>
using namespace std;
int main() {
int a[10];
cout << "sizeof(int) = " << sizeof(int) <<endl;
for(int i = 0; i < 10; i++)
cout << "&a[" << i << "] = "
<< (long)&a[i] << endl;
} ///:~
When you run this program, you’ll see that each element is one int size away from the previous one. That is, they are stacked one on top of the other.
Pointers and arrays
The
identifier of an array is unlike the identifiers for ordinary variables. For
one thing, an array identifier is not an lvalue – you cannot assign to
it. It’s really just a hook into the square-bracket syntax, and when you
give the name of an array, without square brackets, what you get is the
starting address of the array:
//: C03:ArrayIdentifier.cpp
#include <iostream>
using namespace std;
int main() {
int a[10];
cout << "a = " << a << endl;
cout << "&a[0] =" << &a[0] << endl;
} ///:~
When you run this program you’ll see that the two addresses (which will be printed in hexadecimal, since there is no cast to long) are the same.
So one way to look at the array identifier is as a read-only pointer to the beginning of an array. And although we can’t change the array identifier to point somewhere else, we can create another pointer and use that to move around in the array. In fact, the square-bracket syntax works with regular pointers, as well:
//: C03:PointersAndBrackets.cpp
int main() {
int a[10];
int* ip = a;
for(int i = 0; i < 10; i++)
ip[i] = i * 10;
} ///:~
The fact that naming an array produces its starting address turns out to be quite important when you want to pass an array to a function. If you declare an array as a function argument, what you’re really declaring is a pointer. So in the following example, func1( ) and func2( ) effectively have the same argument lists:
//: C03:ArrayArguments.cpp
#include <iostream>
#include <string>
using namespace std;
void func1(int a[], int size) {
for(int i = 0; i < size; i++)
a[i] = i * i - i;
}
void func2(int* a, int size) {
for(int i = 0; i < size; i++)
a[i] = i * i + i;
}
void print(int a[], string name, int size) {
for(int i = 0; i < size; i++)
cout << name << "[" << i << "] = "
<< a[i] << endl;
}
int main() {
int a[5], b[5];
// Probably garbage values:
print(a, "a", 5);
print(b, "b", 5);
// Initialize the arrays:
func1(a, 5);
func1(b, 5);
print(a, "a", 5);
print(b, "b", 5);
// Notice the arrays are always modified:
func2(a, 5);
func2(b, 5);
print(a, "a", 5);
print(b, "b", 5);
} ///:~
Even though func1( ) and func2( ) declare their arguments differently, the usage is the same inside the function. There are some other issues that this example reveals: arrays cannot be passed by value [21], that is, you never automatically get a local copy of the array that you pass into a function. Thus, when you modify an array, you’re always modifying the outside object. This can be a bit confusing at first, if you’re expecting the pass-by-value provided with ordinary arguments.
You’ll notice that print( ) uses the square-bracket syntax for array arguments. Even though the pointer syntax and the square-bracket syntax are effectively the same when passing arrays as arguments, the square-bracket syntax makes it clearer to the reader that you mean for this argument to be an array.
Also note that the size argument is passed in each case. Just passing the address of an array isn’t enough information; you must always be able to know how big the array is inside your function, so you don’t run off the end of that array.
Arrays can be of any type, including arrays of pointers. In fact, when you want to pass command-line arguments into your program, C and C++ have a special argument list for main( ) which looks like this:
int main(int argc, char* argv[]) { // ...
The first argument is the number of elements in the array which is the second argument. The second argument is always an array of char*, because the arguments are passed from the command line as character arrays (and remember, an array can only be passed as a pointer). Each whitespace-delimited cluster of characters on the command line is turned into a separate array argument. The following program prints out all its command-line arguments by stepping through the array:
//: C03:CommandLineArgs.cpp
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
cout << "argc = " << argc << endl;
for(int i = 0; i < argc; i++)
cout << "argv[" << i << "] = "
<< argv[i] << endl;
} ///:~
You’ll notice that argv[0] is the path and name of the program itself. This allows the program to discover information about itself. It also adds one more to the array of program arguments, so a common error when fetching command-line arguments is to grab argv[0] when you want argv[1].
You are not forced to use argc and argv as identifiers in main( ); those identifiers are only conventions (but it will confuse people if you don’t use them). Also, there is an alternate way to declare argv:
int main(int argc, char** argv) { // ...
Both forms are equivalent, but I find the form used in this book to be the most intuitive when reading the code, since it says, directly, “this is an array of character pointers.” (A third spelling, char argv[][], is not legal, because only the first dimension of an array argument may be left unspecified.)
All you get from the command-line is character arrays; if you want to treat an argument as some other type, you are responsible for converting it, inside your program. To facilitate the conversion to numbers, there are some helper functions in the Standard C library, declared in <cstdlib>. The simplest ones to use are atoi( ), atol( ) and atof( ), to convert an ASCII character array to an int, long and double floating-point value, respectively. Here’s an example using atoi( ) (the other two functions are called the same way):
//: C03:ArgsToInts.cpp
// Converting command-line arguments to ints
#include <iostream>
#include <cstdlib>
using namespace std;
int main(int argc, char* argv[]) {
for(int i = 1; i < argc; i++)
cout << atoi(argv[i]) << endl;
} ///:~
In this program, you can put any number of arguments on the command line. You’ll notice that the for loop starts at the value 1 to skip over the program name at argv[0]. Also, if you put a floating-point number containing a decimal point on the command line, atoi( ) only takes the digits up to the decimal point. If you put non-numbers on the command line, these come back from atoi( ) as zero.
Pointer arithmetic
If all you could do with a pointer that points at an array is treat it as if it were an alias for that array, pointers into arrays wouldn’t be very interesting. However, pointers are more flexible than this, since they can be moved around (and remember, the array identifier cannot be moved).
Pointer arithmetic refers to the application of some of the arithmetic operators to pointers. The reason pointer arithmetic is a separate subject from ordinary arithmetic is that pointers must conform to special constraints in order to make them behave properly. For example, a common operator to use with pointers is ++, which “adds one to the pointer.” What this actually means is that the pointer is changed to move to “the next value,” whatever that means. Here’s an example:
//: C03:PointerIncrement.cpp
#include <iostream>
using namespace std;
int main() {
int i[10];
double d[10];
int* ip = i;
double* dp = d;
cout << "ip = " << (long)ip << endl;
ip++;
cout << "ip = " << (long)ip << endl;
cout << "dp = " << (long)dp << endl;
dp++;
cout << "dp = " << (long)dp << endl;
} ///:~
For one run on my machine, the output is:
ip = 6684124
ip = 6684128
dp = 6684044
dp = 6684052
What’s interesting here is that even though the operation ++ appears to be the same operation for both the int* and the double*, you can see that the pointer has been changed only 4 bytes for the int* but 8 bytes for the double*. Not coincidentally, these are the sizes of int and double on my machine. And that’s the trick of pointer arithmetic: the compiler figures out the right amount to change the pointer so that it’s pointing to the next element in the array (pointer arithmetic is only meaningful within arrays). This even works with arrays of structs:
//: C03:PointerIncrement2.cpp
#include <iostream>
using namespace std;
typedef struct {
char c;
short s;
int i;
long l;
float f;
double d;
long double ld;
} Primitives;
int main() {
Primitives p[10];
Primitives* pp = p;
cout << "sizeof(Primitives) = "
<< sizeof(Primitives) << endl;
cout << "pp = " << (long)pp << endl;
pp++;
cout << "pp = " << (long)pp << endl;
} ///:~
The output for one run on my machine was:
sizeof(Primitives) = 40
pp = 6683764
pp = 6683804
So you can see the compiler also does the right thing for pointers to structs (and classes and unions).
Pointer arithmetic also works with the operators --, + and -, but the latter two operators are limited: you cannot add two pointers, and if you subtract two pointers the result is the number of elements between them. However, you can add or subtract an integral value and a pointer. Here’s an example demonstrating the use of pointer arithmetic:
//: C03:PointerArithmetic.cpp
#include <iostream>
using namespace std;
#define P(EXP) \
cout << #EXP << ": " << EXP << endl;
int main() {
int a[10];
for(int i = 0; i < 10; i++)
a[i] = i; // Give it index values
int* ip = a;
P(*ip);
P(*++ip);
P(*(ip + 5));
int* ip2 = ip + 5;
P(*ip2);
P(*(ip2 - 4));
P(*--ip2);
} ///:~
It begins with another macro, but this one uses a preprocessor feature called stringizing (implemented with the ‘#’ sign before an expression) which takes any expression and turns it into a character array. This is quite convenient, since it allows the expression to be printed, followed by a colon, followed by the value of the expression. In main( ) you can see the useful shorthand that is produced.
Although pre- and postfix versions of ++ and -- are valid with pointers, only the prefix versions are used in this example because they are applied before the pointers are dereferenced in the above expressions, so they allow us to see the effects of the operations. Note that only integral values are being added and subtracted; if two pointers were added together, the compiler would not allow it.
Here is the output of the above program:
*ip: 0
*++ip: 1
*(ip + 5): 6
*ip2: 6
*(ip2 - 4): 2
*--ip2: 5
In all cases, the pointer arithmetic results in the pointer being adjusted to point to the “right place,” based on the size of the elements being pointed to.
If pointer arithmetic seems a bit overwhelming at first, don’t worry. Most of the time you’ll only need to create arrays and index into them with [ ], and the most sophisticated pointer arithmetic you’ll usually need is ++ and --. Pointer arithmetic is generally reserved for more clever and complex programs, and many of the containers in the Standard C++ library hide most of these clever details so you don’t have to worry about them.
[21] Unless you take the very strict approach that “all argument passing in C/C++ is by value, and the ‘value’ of an array is what is produced by the array identifier: its address.” This can be seen as true from the assembly-language standpoint, but I don’t think it helps when trying to work with higher-level concepts. The addition of references in C++ makes the “all passing is by value” argument more confusing, to the point where I feel it’s more helpful to think in terms of “passing by value” vs. “passing addresses.”
© Copyright 1997-1999 CodeGuru