vector
The
vector
is intentionally made to look like a souped-up array, since it has array-style
indexing but also can expand dynamically.
vector
is so fundamentally useful that it was introduced in a very primitive way early
in this book, and used quite regularly in previous examples. This section will
give a more in-depth look at
vector. To
achieve maximally-fast indexing and iteration, the
vector
maintains its storage as a single contiguous array of objects. This is a
critical point to observe in understanding the behavior of
vector.
It means that indexing and iteration are lighting-fast, being basically the
same as indexing and iterating over an array of objects. But it also means that
inserting an object anywhere but at the end (that is, appending) is not really
an acceptable operation for a
vector.
It also means that when a
vector
runs out of pre-allocated storage, in order to maintain its contiguous array it
must allocate a whole new (larger) chunk of storage elsewhere and copy the
objects to the new storage. This has a number of unpleasant side effects.
Cost
of overflowing allocated storage
A
vector
starts
by grabbing a block of storage, as if it’s taking a guess at how many
objects you plan to put in it. As long as you don’t try to put in more
objects than can be held in the initial block of storage, everything is very
rapid and efficient (note that if you
do
know
how many objects to expect, you can pre-allocate storage using
reserve( )).
But eventually you will put in one too many objects and, unbeknownst to you, the
vector
responds
by:
- Allocating
a new, bigger piece of storage
- Copying
all the objects from the old storage to the new (using the copy-constructor)
- Destroying
all the old objects (the destructor is called for each one)
- Releasing
the old memory
For
complex objects, this copy-construction and destruction can end up being very
expensive if you overfill your vector a lot. To see what happens when
you’re filling a
vector,
here is a class that prints out information about its creations, destructions,
assignments and copy-constructions:
//: C20:Noisy.h
// A class to track various object activities
#ifndef NOISY_H
#define NOISY_H
#include <iostream>
class Noisy {
static long create, assign, copycons, destroy;
long id;
public:
Noisy() : id(create++) {
std::cout << "d[" << id << "]";
}
Noisy(const Noisy& rv) : id(rv.id) {
std::cout << "c[" << id << "]";
copycons++;
}
Noisy& operator=(const Noisy& rv) {
std::cout << "(" << id << ")=[" <<
rv.id << "]";
id = rv.id;
assign++;
return *this;
}
friend bool
operator<(const Noisy& lv, const Noisy& rv) {
return lv.id < rv.id;
}
friend bool
operator==(const Noisy& lv, const Noisy& rv) {
return lv.id == rv.id;
}
~Noisy() {
std::cout << "~[" << id << "]";
destroy++;
}
friend std::ostream&
operator<<(std::ostream& os, const Noisy& n) {
return os << n.id;
}
friend class NoisyReport;
};
struct NoisyGen {
Noisy operator()() { return Noisy(); }
};
// A singleton. Will automatically report the
// statistics as the program terminates:
class NoisyReport {
static NoisyReport nr;
NoisyReport() {} // Private constructor
public:
~NoisyReport() {
std::cout << "\n-------------------\n"
<< "Noisy creations: " << Noisy::create
<< "\nCopy-Constructions: "
<< Noisy::copycons
<< "\nAssignments: " << Noisy::assign
<< "\nDestructions: " << Noisy::destroy
<< std::endl;
}
};
// Because of these this file can only be used
// in simple test situations. Move them to a
// .cpp file for more complex programs:
long Noisy::create = 0, Noisy::assign = 0,
Noisy::copycons = 0, Noisy::destroy = 0;
NoisyReport NoisyReport::nr;
#endif // NOISY_H ///:~
Each
Noisy
object has its own identifier, and there are
static
variables to keep track of all the creations, assignments (using
operator=),
copy-constructions and destructions. The
id
is initialized using the
create
counter inside the default constructor; the copy-constructor and assignment
operator take their
id
values from the rvalue. Of course, with
operator=
the lvalue is already an initialized object so the old value of
id
is printed before it is overwritten with the
id
from the rvalue.
In
order to support certain operations like sorting and searching (which are used
implicitly by some of the containers),
Noisy
must have an
operator<
and
operator==.
These simply compare the
id
values. The
operator<<
for
ostream
follows the standard form and simply prints the
id. NoisyGen
produces a function object (since it has an
operator( ))
that is used to automatically generate
Noisy
objects during testing.
NoisyReport
is a type of class called a
singleton,
which is a “design pattern” (these are covered more fully in
Chapter XX). Here, the goal is to make sure there is one and only one
NoisyReport
object, because it is responsible for printing out the results at program
termination. It has a
private
constructor so no one else can make a
NoisyReport
object, and a single static instance of
NoisyReport
called
nr.
The only executable statements are in the destructor, which is called as the
program exits and the static destructors are called; this destructor prints out
the statistics captured by the
static
variables in
Noisy. The
one snag to this header file is the inclusion of the definitions for the
statics
at the end. If you include this header in more than one place in your project,
you’ll get multiple-definition errors at link time. Of course, you can
put the
static
definitions in a separate
.cpp
file and link it in, but that is less convenient, and since
Noisy
is just intended for quick-and-dirty experiments the header file should be
reasonable for most situations.
Using
Noisy.h,
the following program will show the behaviors that occur when a
vector
overflows its currently allocated storage:
//: C20:VectorOverflow.cpp
// Shows the copy-construction and destruction
// That occurs when a vector must reallocate
// (It maintains a linear array of elements)
#include <vector>
#include <iostream>
#include <string>
#include <cstdlib>
#include "Noisy.h"
#include "../require.h"
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
int size = 1000;
if(argc >= 2) size = atoi(argv[1]);
vector<Noisy> vn;
Noisy n;
for(int i = 0; i < size; i++)
vn.push_back(n);
cout << "\n cleaning up \n";
} ///:~
You
can either use the default value of 1000, or use your own value by putting it
on the command-line.
When
you run this program, you’ll see a single default constructor call (for
n),
then a lot of copy-constructor calls, then some destructor calls, then some
more copy-constructor calls, and so on. When the vector runs out of space in
the linear array of bytes it has allocated, it must (to maintain all the
objects in a linear array, which is an essential part of its job) get a bigger
piece of storage and move everything over, copying first and then destroying
the old objects. You can imagine that if you store a lot of large and complex
objects, this process could rapidly become prohibitive.
There
are two solutions to this problem. The nicest one requires that you know
beforehand how many objects you’re going to make. In that case you can use
reserve( )
to tell the vector how much storage to pre-allocate, thus eliminating all the
copies and destructions and making everything very fast (especially random
access to the objects with
operator[
]
).
Note that the use of
reserve( )
is different from the using the
vector
constructor with an integral first argument; the latter initializes each
element using the default copy-constructor.
However,
in the more general case you won’t know how many objects you’ll
need. If
vector
reallocations are slowing things down, you can change sequence containers. You
could use a
list,
but
as
you’ll see, the
deque
allows speedy insertions at either end of the sequence, and never needs to copy
or destroy objects as it expands its storage. The
deque
also allows random access with
operator[
]
,
but it’s not quite as fast as
vector’s
operator[
].
So in the case where you’re creating all your objects in one part of the
program and randomly accessing them in another, you may find yourself filling a
deque,
then creating a
vector
from the
deque
and using the
vector
for rapid indexing. Of course, you don’t want to program this way
habitually, just be aware of these issues (avoid premature optimization).
There
is a darker side to
vector’s
reallocation of memory, however. Because
vector
keeps its objects in a nice, neat array (allowing, for one thing,
maximally-fast random access), the iterators used by
vector
are generally just pointers. This is a good thing – of all the sequence
containers, these pointers allow the fastest selection and manipulation.
However, consider what happens when you’re holding onto an iterator (i.e.
a pointer) and then you add the one additional object that causes the
vector
to reallocate storage and move it elsewhere. Your pointer is now pointing off
into nowhere:
//: C20:VectorCoreDump.cpp
// How to break a program using a vector
#include <vector>
#include <iostream>
using namespace std;
int main() {
vector<int> vi(10, 0);
ostream_iterator<int> out(cout, " ");
copy(vi.begin(), vi.end(), out);
vector<int>::iterator i = vi.begin();
cout << "\n i: " << long(i) << endl;
*i = 47;
copy(vi.begin(), vi.end(), out);
// Force it to move memory (could also just add
// enough objects):
vi.resize(vi.capacity() + 1);
// Now i points to wrong memory:
cout << "\n i: " << long(i) << endl;
cout << "vi.begin(): " << long(vi.begin());
*i = 48; // Access violation
} ///:~
If
your program is breaking mysteriously, look for places where you hold onto an
iterator while adding more objects to a
vector.
You’ll need to get a new iterator after adding elements, or use
operator[
]
instead
for element selections. If you combine the above observation with the awareness
of the potential expense of adding new objects to a
vector,
you may conclude that the safest way to use one is to fill it up all at once
(ideally, knowing first how many objects you’ll need) and then just use
it (without adding more objects) elsewhere in the program. This is the way
vector
has been used in the book up to this point.
You
may observe that using
vector
as
the “basic” container the book in earlier chapters may not be the
best choice in all cases. This is a fundamental issue in containers, and in
data structures in general: the “best” choice varies according to
the way the container is used. The reason
vector
has been the “best” choice up until now is that it looks a lot like
an array, and was thus familiar and easy for you to adopt. But from now on
it’s also worth thinking about other issues when choosing containers.
Inserting
and erasing elements
The
vector
is most efficient if:
- You
reserve( )
the correct amount of storage at the beginning so the
vector
never has to reallocate.
- You
only add and remove elements from the back end.
It
is possible to insert and erase elements from the middle of a
vector
using an iterator, but the following program demonstrates what a bad idea it is:
//: C20:VectorInsertAndErase.cpp
// Erasing an element from a vector
#include <iostream>
#include <vector>
#include <algorithm>
#include "Noisy.h"
using namespace std;
int main() {
vector<Noisy> v;
v.reserve(11);
cout << "11 spaces have been reserved" << endl;
generate_n(back_inserter(v), 10, NoisyGen());
ostream_iterator<Noisy> out(cout, " ");
cout << endl;
copy(v.begin(), v.end(), out);
cout << "Inserting an element:" << endl;
vector<Noisy>::iterator it =
v.begin() + v.size() / 2; // Middle
v.insert(it, Noisy());
cout << endl;
copy(v.begin(), v.end(), out);
cout << "\nErasing an element:" << endl;
// Cannot use the previous value of it:
it = v.begin() + v.size() / 2;
v.erase(it);
cout << endl;
copy(v.begin(), v.end(), out);
cout << endl;
} ///:~
When
you run the program you’ll see that the call to
reserve( )
really does only allocate storage – no constructors are called. The
generate_n( )
call is pretty busy: each call to
NoisyGen::operator( )
results in a construction, a copy-construction (into the
vector)
and a destruction of the temporary. But when an object is inserted into the
vector
in the middle, it must shove everything down to maintain the linear array and
– since there is enough space – it does this with the assignment
operator (if the argument of
reserve( )
is 10 instead of eleven then it would have to reallocate storage). When an
object is erased from the
vector,
the assignment operator is once again used to move everything up to cover the
place that is being erased (notice that this requires that the assignment
operator properly cleans up the lvalue). Lastly, the object on the end of the
array is deleted.
You
can imagine how enormous the overhead can become if objects are inserted and
removed from the middle of a
vector
if the number of elements is large and the objects are complicated. It’s
obviously a practice to avoid.
Go to CodeGuru.com
Contact: webmaster@codeguru.com
© Copyright 1997-1999 CodeGuru