Introduction to data types
Data types define the way you utilize storage (memory) in the programs that you write. By specifying a data type, you tell the compiler how to create a particular piece of storage, and also how to manipulate that storage.
Data types can be built-in or abstract. A built-in data type is one that the compiler intrinsically understands, one that is wired directly into the compiler. The built-in data types are almost identical in C and C++. In contrast, a user-defined data type is one that you or another programmer create as a class. These are commonly referred to as abstract data types.
The compiler knows how to handle built-in types when it starts up; it “learns” how to handle abstract data types by reading header files containing class declarations (you’ll learn about this in later chapters).
Basic built-in types
The Standard C specification for built-in types (which C++ inherits) doesn’t say how many bits each of the built-in types must contain. Instead, it stipulates the minimum and maximum values that the built-in type must be able to hold. When a machine is based on binary, this maximum value can be directly translated into a minimum number of bits necessary to hold that value. However, if a machine uses, for instance, binary-coded decimal (BCD) to represent numbers, then the amount of space in the machine required to hold the maximum numbers for each data type will be different. The minimum and maximum values that can be stored in the various data types are defined in the system header files limits.h and float.h (in C++ you will generally #include <climits> and <cfloat> instead).
C and C++ have four basic built-in data types, described here for binary-based machines. A char is for character storage and uses a minimum of 8 bits (one byte) of storage. An int stores an integral number and uses a minimum of two bytes of storage. The float and double types store floating-point numbers, usually in IEEE floating-point format. float is for single-precision floating point and double is for double-precision floating point.

As previously mentioned, you can define variables anywhere in a scope, and you can define and initialize them at the same time. Here’s how to define variables using the four basic data types:
//: C03:Basic.cpp
// Defining the four basic data
// types in C and C++

int main() {
  // Definition without initialization:
  char protein;
  int carbohydrates;
  float fiber;
  double fat;
  // Simultaneous definition & initialization:
  char pizza = 'A', pop = 'Z';
  int dongdings = 100, twinkles = 150,
    heehos = 200;
  float chocolate = 3.14159;
  // Exponential notation:
  double fudge_ripple = 6e-4;
} ///:~
The first part of the program defines variables of the four basic data types without initializing them. If you don’t initialize a variable defined inside a function, the Standard says that its contents are indeterminate (usually, this means they contain garbage); variables defined outside all functions are the exception, since they start out as zero. The second part of the program defines and initializes variables at the same time (it’s always best, if possible, to provide an initialization value at the point of definition). Notice the use of exponential notation in the constant 6e-4, meaning “6 times 10 to the minus fourth power.”
bool, true, & false
Before bool became part of Standard C++, everyone tended to use different techniques in order to produce Boolean-like behavior. These produced portability problems and could introduce subtle errors.
The Standard C++ bool type can have two states expressed by the built-in constants true (which converts to an integral one) and false (which converts to an integral zero). All three names are keywords. In addition, some language elements have been adapted:
Element              Usage with bool
&& || !              Take bool arguments and return bool.
< > <= >= == !=      Produce bool results.
if, for, while, do   Conditional expressions convert to bool values.
? :                  First operand converts to bool value.
Because there’s a lot of existing code that uses an int to represent a flag, the compiler will implicitly convert from an int to a bool (nonzero values become true, and zero becomes false). Ideally, the compiler will give you a warning as a suggestion to correct the situation.
An idiom that falls under “poor programming style” is the use of ++ to set a flag to true. This is still allowed, but deprecated, which means that at some time in the future it will be made illegal (C++17 has in fact removed it). The problem is that you’re making an implicit type conversion from bool to int, incrementing the value (perhaps beyond the range of the normal bool values of zero and one), and then implicitly converting it back again.
Pointers (which will be introduced later in this chapter) will also be automatically converted to bool when necessary.
Specifiers
Specifiers modify the meanings of the basic built-in types and expand the built-in types to a much larger set. There are four specifiers: long, short, signed and unsigned.

long and short modify the maximum and minimum values that a data type will hold. A plain int must be at least the size of a short. The size hierarchy for integral types is: short int, int, long int. All the sizes could conceivably be the same, as long as they satisfy the minimum/maximum value requirements. On a machine with a 64-bit word, for instance, all the data types might be 64 bits.
The size hierarchy for floating-point numbers is: float, double, and long double. “Long float” is not a legal type. There are no short floating-point numbers.
The signed and unsigned specifiers tell the compiler how to use the sign bit with integral types and characters (floating-point numbers always contain a sign). An unsigned number does not keep track of the sign and thus has an extra bit available, so it can store positive numbers twice as large as the positive numbers that can be stored in a signed number. signed is the default and is only necessary with char; char may or may not default to signed. By specifying signed char, you force the sign bit to be used.
The following example shows the size of the data types in bytes by using the sizeof( ) operator, introduced later in this chapter:
//: C03:Specify.cpp
// Demonstrates the use of specifiers
#include <iostream>
using namespace std;

int main() {
  char c;
  unsigned char cu;
  int i;
  unsigned int iu;
  short int is;
  short iis; // Same as short int
  unsigned short int isu;
  unsigned short iisu;
  long int il;
  long iil; // Same as long int
  unsigned long int ilu;
  unsigned long iilu;
  float f;
  double d;
  long double ld;
  cout
    << "\n char = " << sizeof(c)
    << "\n unsigned char = " << sizeof(cu)
    << "\n int = " << sizeof(i)
    << "\n unsigned int = " << sizeof(iu)
    << "\n short = " << sizeof(is)
    << "\n unsigned short = " << sizeof(isu)
    << "\n long = " << sizeof(il)
    << "\n unsigned long = " << sizeof(ilu)
    << "\n float = " << sizeof(f)
    << "\n double = " << sizeof(d)
    << "\n long double = " << sizeof(ld)
    << endl;
} ///:~
Be aware that the results you get by running this program will probably be different from one machine/operating system/compiler to the next, since (as previously mentioned) the only thing that must be consistent is that each different type hold the minimum and maximum values specified in the Standard.
When you are modifying an int with short or long, the keyword int is optional, as shown above.
Introduction to pointers
Whenever you run a program, it is first loaded (typically from disk) into the computer’s memory. Thus, all elements of your program are located somewhere in memory. Memory is typically laid out as a sequential series of memory locations; we usually refer to these locations as eight-bit bytes, but actually the size of each space depends on the architecture of that particular machine and is usually called that machine’s word size. Each space can be uniquely distinguished from all other spaces by its address. For the purposes of this discussion, we’ll just say that all machines use bytes that have sequential addresses starting at zero and going up to however much memory you have in your computer.
Since your program lives in memory while it’s being run, every element of your program has an address. Suppose we start with a simple program:
//: C03:YourPets1.cpp
#include <iostream>
using namespace std;

int dog, cat, bird, fish;

void f(int pet) {
  cout << "pet id number: " << pet << endl;
}

int main() {
  int i, j, k;
} ///:~
Each of the elements in this program has a location in storage when the program is running. You can visualize it like this:

All the variables, and even the function, occupy storage. As you’ll see, it turns out that what an element is and the way you define it usually determines the area of memory where that element is placed.
There is an operator in C and C++ that will tell you the address of an element. This is the ‘&’ operator. All you do is precede the identifier name with ‘&’ and it will produce the address of that identifier. YourPets1.cpp can be modified to print out the addresses of all its elements, like this:
//: C03:YourPets2.cpp
#include <iostream>
using namespace std;

int dog, cat, bird, fish;

void f(int pet) {
  cout << "pet id number: " << pet << endl;
}

int main() {
  int i, j, k;
  cout << "f(): " << (long)&f << endl;
  cout << "dog: " << (long)&dog << endl;
  cout << "cat: " << (long)&cat << endl;
  cout << "bird: " << (long)&bird << endl;
  cout << "fish: " << (long)&fish << endl;
  cout << "i: " << (long)&i << endl;
  cout << "j: " << (long)&j << endl;
  cout << "k: " << (long)&k << endl;
} ///:~
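The (long) is a cast. It says “don’t treat this as its normal type; instead, treat it as a long.” The cast isn’t essential, but if it weren’t there the data addresses would have been printed out in hexadecimal instead (and the address of f would print as 1, because there is no stream output operator for function pointers, so the pointer is converted to bool). Casting to a long makes things a little more readable. Be aware, though, that on platforms where a long is narrower than a pointer, such as 64-bit Windows, this cast can lose the upper bits of the address or be rejected by the compiler.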
The results of this program will vary depending on your computer, OS and all sorts of other factors, but they will still give you some interesting insights. For a single run on my computer, the results looked like this:
You can see how the variables that are defined inside main( ) are in a different area than the variables defined outside of main( ); you’ll understand why as you learn more about the language. Also, f( ) appears to be in its own area; code is typically separated from data in memory.
Another interesting thing to note is that variables defined one right after the other appear to be placed contiguously in memory. They are separated by the number of bytes that are required by their data type. Here, the only data type used is int, and cat is four bytes away from dog, bird is four bytes away from cat, etc. So it would appear that, on this machine, an int is four bytes long.
Other than this interesting experiment showing how memory is mapped out, what can you do with an address? The most important thing you can do is store it inside another variable for later use. C and C++ have a special type of variable that holds an address. This variable is called a pointer. The operator that defines a pointer is the same as the one used for multiplication: ‘*’. The compiler knows that it isn’t multiplication because of the context in which it is used, as you shall see.
When you define a pointer, you must specify the type of variable it points to. You start out by giving the type name, then instead of immediately giving an identifier for the variable, you say “wait, it’s a pointer” by inserting a star between the type and the identifier. So a pointer to an int looks like this:

int* ip; // ip points to an int variable
The association of the ‘*’ with the type looks sensible and reads easily, but it can actually be a bit deceiving. Your inclination might be to say “int pointer” as if it is a single discrete type. However, with an int or other basic data type, it’s possible to say:

int a, b, c;

whereas with a pointer, you’d like to say:

int* ipa, ipb, ipc;
C syntax (and by inheritance, C++ syntax) does not allow such sensible expressions. In the above definitions, only ipa is a pointer, but ipb and ipc are ordinary ints (you can say that “* binds more tightly to the identifier”). Consequently, the best results can be achieved by using only one definition per line; you still get the sensible syntax without the confusion:

int* ipa;
int* ipb;
int* ipc;
Since a general guideline for C++ programming is that you should always initialize a variable at the point of definition, this form actually works better. For example, the above variables are not initialized to any particular value; they hold garbage. It’s much better to say something like:

int a = 47;
int* ipa = &a;
Now both a and ipa have been initialized, and ipa holds the address of a. Once you have an initialized pointer, the most basic thing you can do with it is to use it to modify the value it points to. To access a variable through a pointer, you dereference the pointer using the same operator that you used to define it, like this:

*ipa = 100;

Now a contains the value 100 instead of 47.
These are the basics of pointers: you can hold an address, and you can use that address to modify the original variable. But the question still remains: why do you want to modify one variable using another variable as a proxy?
For this introductory view of pointers, we can put the answer into two broad categories:
- To change “outside objects” from within a function. This is perhaps the most basic use of pointers, and it will be examined here.
- To achieve many other clever programming techniques, which you’ll learn about in portions of the rest of the book.
Modifying the outside object
Ordinarily, when you pass an argument to a function, a copy of that argument is made inside the function. This is referred to as pass-by-value. You can see the effect of pass-by-value in the following program:
//: C03:PassByValue.cpp
#include <iostream>
using namespace std;

void f(int a) {
  cout << "a = " << a << endl;
  a = 5;
  cout << "a = " << a << endl;
}

int main() {
  int x = 47;
  cout << "x = " << x << endl;
  f(x);
  cout << "x = " << x << endl;
} ///:~
In f( ), a is a local variable, so it only exists for the duration of the function call to f( ). Because it’s a function argument, the value of a is initialized by the arguments that are passed when the function is called; in main( ) the argument is x, which has a value of 47, so this value is copied into a when f( ) is called.
When you run this program you’ll see:
x = 47
a = 47
a = 5
x = 47
Initially, of course, x is 47. When f( ) is called, temporary space is created to hold the variable a for the duration of the function call, and a is initialized by copying the value of x, which is verified by printing it out. Of course, you can change the value of a and show that it is changed. But when f( ) is completed, the temporary space that was created for a disappears, and we see that the only connection that ever existed between a and x happened when the value of x was copied into a.

When you’re inside f( ), x is the outside object (my terminology), and changing the local variable does not affect the outside object, naturally enough, since they are two separate locations in storage. But what if you do want to modify the outside object? This is where pointers come in handy. In a sense, a pointer is an alias for another variable. So if we pass a pointer into a function instead of an ordinary value, we are actually passing an alias to the outside object, enabling the function to modify that outside object, like this:
//: C03:PassAddress.cpp
#include <iostream>
using namespace std;

void f(int* p) {
  cout << "p = " << p << endl;
  cout << "*p = " << *p << endl;
  *p = 5;
  cout << "p = " << p << endl;
}

int main() {
  int x = 47;
  cout << "x = " << x << endl;
  cout << "&x = " << &x << endl;
  f(&x);
  cout << "x = " << x << endl;
} ///:~
Now f( ) takes a pointer as an argument and dereferences the pointer during assignment, and this causes the outside object x to be modified. The output is (the addresses will differ on your machine):
x = 47
&x = 0065FE00
p = 0065FE00
*p = 47
p = 0065FE00
x = 5
Notice that the value contained in p is the same as the address of x; the pointer p does indeed point to x. If that isn’t convincing enough, when p is dereferenced to assign the value 5, we see that the value of x is now changed to 5 as well.
Thus, passing a pointer into a function will allow that function to modify the outside object. You’ll see plenty of other uses for pointers later, but this is arguably the most basic and possibly the most common use.
Introduction to C++ references
Pointers work roughly the same in C and in C++, but C++ adds an additional way to pass an address into a function. This is pass-by-reference, and it exists in several other programming languages, so it was not a C++ invention.
Your initial perception of references may be that they are unnecessary, that you could write all your programs without references. In general, this is true, with the exception of a few important places which you’ll learn about later in the book. You’ll also learn more about references later in the book, but the basic idea is the same as the above demonstration of pointer use: you can pass the address of an argument using a reference. The difference between references and pointers is that calling a function that takes references is cleaner, syntactically, than calling a function that takes pointers (and it is exactly this syntactic difference that makes references essential in certain situations). If PassAddress.cpp is modified to use references, you can see the difference in the function call in main( ):
//: C03:PassReference.cpp
#include <iostream>
using namespace std;

void f(int& r) {
  cout << "r = " << r << endl;
  cout << "&r = " << &r << endl;
  r = 5;
  cout << "r = " << r << endl;
}

int main() {
  int x = 47;
  cout << "x = " << x << endl;
  cout << "&x = " << &x << endl;
  f(x); // Looks like pass-by-value,
        // is actually pass by reference
  cout << "x = " << x << endl;
} ///:~
In f( )’s argument list, instead of saying int* to pass a pointer, you say int& to pass a reference. Inside f( ), if you just say ‘r’ (which would produce the address if r were a pointer) you get the value in the variable that r references. If you assign to r, you actually assign to the variable that r references. In fact, the only way to get the address that’s held inside r is with the ‘&’ operator.
In main( ), you can see the key effect of references in the syntax of the call to f( ), which is just f(x). Even though this looks like an ordinary pass-by-value, the effect of the reference is that it actually takes the address and passes it in, rather than making a copy of the value. The output is:
x = 47
&x = 0065FE00
r = 47
&r = 0065FE00
r = 5
x = 5
So you can see that pass-by-reference allows a function to modify the outside object, just like passing a pointer does (you can also observe that the reference obscures the fact that an address is being passed; this will be examined later in the book). Thus, for this simple introduction you can assume that references are just a syntactically different way (sometimes referred to as “syntactic sugar”) to accomplish the same thing that pointers do: allow functions to change outside objects.
Pointers and references as modifiers
So far, you’ve seen the basic data types char, int, float and double, along with the specifiers signed, unsigned, short and long, which can be used with the basic data types in almost any combination. Now we’ve added pointers and references, which are orthogonal to the basic data types and specifiers, so the possible combinations have just tripled:
//: C03:AllDefinitions.cpp
// All possible combinations of basic data types,
// specifiers, pointers and references
#include <iostream>
using namespace std;

void f1(char c, int i, float f, double d);
void f2(short int si, long int li, long double ld);
void f3(unsigned char uc, unsigned int ui,
  unsigned short int usi, unsigned long int uli);
void f4(char* cp, int* ip, float* fp, double* dp);
void f5(short int* sip, long int* lip,
  long double* ldp);
void f6(unsigned char* ucp, unsigned int* uip,
  unsigned short int* usip,
  unsigned long int* ulip);
void f7(char& cr, int& ir, float& fr, double& dr);
void f8(short int& sir, long int& lir,
  long double& ldr);
void f9(unsigned char& ucr, unsigned int& uir,
  unsigned short int& usir,
  unsigned long int& ulir);

int main() {} ///:~
Pointers and references also work when passing objects into and out of functions; you’ll learn about this in a later chapter.
There’s one other type that works with pointers: void. If you state that a pointer is a void*, it means that any type of address at all can be assigned to that pointer (whereas if you have an int*, you can only assign the address of an int variable to that pointer). For example:
//: C03:VoidPointer.cpp
int main() {
  void* vp;
  char c;
  int i;
  float f;
  double d;
  // The address of ANY type can be
  // assigned to a void pointer:
  vp = &c;
  vp = &i;
  vp = &f;
  vp = &d;
} ///:~
Once you assign to a void*, you lose any information about what type it is. This means that before you can use the pointer, you must cast it to the correct type:
//: C03:CastFromVoidPointer.cpp
int main() {
  int i = 99;
  void* vp = &i;
  // Can't dereference a void pointer:
  // *vp = 3; // Compile-time error
  // Must cast back to int before dereferencing:
  *((int*)vp) = 3;
} ///:~
The cast (int*)vp takes the void* and tells the compiler to treat it as an int*, and thus it can be successfully dereferenced. You might observe that this syntax is ugly, and it is, but it’s worse than that: the void* introduces a hole in the language’s type system. That is, it allows, or even promotes, the treatment of one type as another type. In the above example, I treat an int as an int by casting vp to an int*, but there’s nothing that says I can’t cast it to a char* or double*, which would modify a different amount of storage than had been allocated for the int, possibly crashing your program. In general, void pointers should be avoided, and only used in rare special cases, the likes of which you won’t be ready to consider until significantly later in the book.
You cannot have a void reference, for reasons that will be explained in a future chapter.
© Copyright 1997-1999 CodeGuru