MFC Programmer's SourceBook : Thinking in C++

vector

The vector is intentionally made to look like a souped-up array, since it has array-style indexing but also can expand dynamically. vector is so fundamentally useful that it was introduced in a very primitive way early in this book, and used quite regularly in previous examples. This section will give a more in-depth look at vector.

To achieve maximally-fast indexing and iteration, the vector maintains its storage as a single contiguous array of objects. This is a critical point to observe in understanding the behavior of vector. It means that indexing and iteration are lighting-fast, being basically the same as indexing and iterating over an array of objects. But it also means that inserting an object anywhere but at the end (that is, appending) is not really an acceptable operation for a vector. It also means that when a vector runs out of pre-allocated storage, in order to maintain its contiguous array it must allocate a whole new (larger) chunk of storage elsewhere and copy the objects to the new storage. This has a number of unpleasant side effects.

Cost of overflowing allocated storage

A vector starts by grabbing a block of storage, as if it’s taking a guess at how many objects you plan to put in it. As long as you don’t try to put in more objects than can be held in the initial block of storage, everything is very rapid and efficient (note that if you do know how many objects to expect, you can pre-allocate storage using reserve( )). But eventually you will put in one too many objects and, unbeknownst to you, the vector responds by:

Allocating a new, bigger piece of storage
Copying all the objects from the old storage to the new (using the copy-constructor)
Destroying all the old objects (the destructor is called for each one)
Releasing the old memory

For complex objects, this copy-construction and destruction can end up being very expensive if you overfill your vector a lot. To see what happens when you’re filling a vector, here is a class that prints out information about its creations, destructions, assignments and copy-constructions:

//: C20:Noisy.h
// A class to track various object activities
#ifndef NOISY_H
#define NOISY_H
#include <iostream>

class Noisy {
  static long create, assign, copycons, destroy;
  long id;
public:
  Noisy() : id(create++) { 
    std::cout << "d[" << id << "]"; 
  }
  Noisy(const Noisy& rv) : id(rv.id) {
    std::cout << "c[" << id << "]";
    copycons++;
  }
  Noisy& operator=(const Noisy& rv) {
    std::cout << "(" << id << ")=[" <<
      rv.id << "]";
    id = rv.id;
    assign++;
    return *this;
  }
  friend bool 
  operator<(const Noisy& lv, const Noisy& rv) {
    return lv.id < rv.id;
  }
  friend bool 
  operator==(const Noisy& lv, const Noisy& rv) {
    return lv.id == rv.id;
  }
  ~Noisy() {
    std::cout << "~[" << id << "]";
    destroy++;
  }
  friend std::ostream& 
  operator<<(std::ostream& os, const Noisy& n) {
    return os << n.id;
  }
  friend class NoisyReport;
};

struct NoisyGen {
  Noisy operator()() { return Noisy(); }
};

// A singleton. Will automatically report the
// statistics as the program terminates:
class NoisyReport {
  static NoisyReport nr;
  NoisyReport() {} // Private constructor
public:
  ~NoisyReport() {
    std::cout << "\n-------------------\n"
      << "Noisy creations: " << Noisy::create
      << "\nCopy-Constructions: " 
      << Noisy::copycons
      << "\nAssignments: " << Noisy::assign
      << "\nDestructions: " << Noisy::destroy
      << std::endl;
  }
};

// Because of these this file can only be used
// in simple test situations. Move them to a 
// .cpp file for more complex programs:
long Noisy::create = 0, Noisy::assign = 0,
  Noisy::copycons = 0, Noisy::destroy = 0;
NoisyReport NoisyReport::nr;
#endif // NOISY_H ///:~

Each Noisy object has its own identifier, and there are static variables to keep track of all the creations, assignments (using operator=), copy-constructions and destructions. The id is initialized using the create counter inside the default constructor; the copy-constructor and assignment operator take their id values from the rvalue. Of course, with operator= the lvalue is already an initialized object so the old value of id is printed before it is overwritten with the id from the rvalue.

In order to support certain operations like sorting and searching (which are used implicitly by some of the containers), Noisy must have an operator< and operator==. These simply compare the id values. The operator<< for ostream follows the standard form and simply prints the id.

NoisyGen produces a function object (since it has an operator( )) that is used to automatically generate Noisy objects during testing.

NoisyReport is a type of class called a singleton, which is a “design pattern” (these are covered more fully in Chapter XX). Here, the goal is to make sure there is one and only one NoisyReport object, because it is responsible for printing out the results at program termination. It has a private constructor so no one else can make a NoisyReport object, and a single static instance of NoisyReport called nr. The only executable statements are in the destructor, which is called as the program exits and the static destructors are called; this destructor prints out the statistics captured by the static variables in Noisy.

The one snag to this header file is the inclusion of the definitions for the statics at the end. If you include this header in more than one place in your project, you’ll get multiple-definition errors at link time. Of course, you can put the static definitions in a separate .cpp file and link it in, but that is less convenient, and since Noisy is just intended for quick-and-dirty experiments the header file should be reasonable for most situations.

Using Noisy.h, the following program will show the behaviors that occur when a vector overflows its currently allocated storage:

//: C20:VectorOverflow.cpp
// Shows the copy-construction and destruction
// That occurs when a vector must reallocate
// (It maintains a linear array of elements)
#include <vector>
#include <iostream>
#include <string>
#include <cstdlib>
#include "Noisy.h"
#include "../require.h"
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  int size = 1000;
  if(argc >= 2) size = atoi(argv[1]);
  vector<Noisy> vn;
  Noisy n;
  for(int i = 0; i < size; i++)
    vn.push_back(n);
  cout << "\n cleaning up \n";
} ///:~

You can either use the default value of 1000, or use your own value by putting it on the command-line.

When you run this program, you’ll see a single default constructor call (for n), then a lot of copy-constructor calls, then some destructor calls, then some more copy-constructor calls, and so on. When the vector runs out of space in the linear array of bytes it has allocated, it must (to maintain all the objects in a linear array, which is an essential part of its job) get a bigger piece of storage and move everything over, copying first and then destroying the old objects. You can imagine that if you store a lot of large and complex objects, this process could rapidly become prohibitive.

There are two solutions to this problem. The nicest one requires that you know beforehand how many objects you’re going to make. In that case you can use reserve( ) to tell the vector how much storage to pre-allocate, thus eliminating all the copies and destructions and making everything very fast (especially random access to the objects with operator[ ] ). Note that the use of reserve( ) is different from the using the vector constructor with an integral first argument; the latter initializes each element using the default copy-constructor.

However, in the more general case you won’t know how many objects you’ll need. If vector reallocations are slowing things down, you can change sequence containers. You could use a list, but as you’ll see, the deque allows speedy insertions at either end of the sequence, and never needs to copy or destroy objects as it expands its storage. The deque also allows random access with operator[ ] , but it’s not quite as fast as vector’s operator[ ]. So in the case where you’re creating all your objects in one part of the program and randomly accessing them in another, you may find yourself filling a deque, then creating a vector from the deque and using the vector for rapid indexing. Of course, you don’t want to program this way habitually, just be aware of these issues (avoid premature optimization).

There is a darker side to vector’s reallocation of memory, however. Because vector keeps its objects in a nice, neat array (allowing, for one thing, maximally-fast random access), the iterators used by vector are generally just pointers. This is a good thing – of all the sequence containers, these pointers allow the fastest selection and manipulation. However, consider what happens when you’re holding onto an iterator (i.e. a pointer) and then you add the one additional object that causes the vector to reallocate storage and move it elsewhere. Your pointer is now pointing off into nowhere:

//: C20:VectorCoreDump.cpp
// How to break a program using a vector
#include <vector>
#include <iostream>
using namespace std;

int main() {
  vector<int> vi(10, 0);
  ostream_iterator<int> out(cout, " ");
  copy(vi.begin(), vi.end(), out);
  vector<int>::iterator i = vi.begin();
  cout << "\n i: " << long(i) << endl;
  *i = 47;
  copy(vi.begin(), vi.end(), out);
  // Force it to move memory (could also just add
  // enough objects):
  vi.resize(vi.capacity() + 1);
  // Now i points to wrong memory:
  cout << "\n i: " << long(i) << endl;
  cout << "vi.begin(): " << long(vi.begin());
  *i = 48;  // Access violation
} ///:~

If your program is breaking mysteriously, look for places where you hold onto an iterator while adding more objects to a vector. You’ll need to get a new iterator after adding elements, or use operator[ ] instead for element selections. If you combine the above observation with the awareness of the potential expense of adding new objects to a vector, you may conclude that the safest way to use one is to fill it up all at once (ideally, knowing first how many objects you’ll need) and then just use it (without adding more objects) elsewhere in the program. This is the way vector has been used in the book up to this point.

You may observe that using vector as the “basic” container the book in earlier chapters may not be the best choice in all cases. This is a fundamental issue in containers, and in data structures in general: the “best” choice varies according to the way the container is used. The reason vector has been the “best” choice up until now is that it looks a lot like an array, and was thus familiar and easy for you to adopt. But from now on it’s also worth thinking about other issues when choosing containers.

Inserting and erasing elements

The vector is most efficient if:

You reserve( ) the correct amount of storage at the beginning so the vector never has to reallocate.
You only add and remove elements from the back end.

It is possible to insert and erase elements from the middle of a vector using an iterator, but the following program demonstrates what a bad idea it is:

//: C20:VectorInsertAndErase.cpp
// Erasing an element from a vector
#include <iostream>
#include <vector>
#include <algorithm>
#include "Noisy.h"
using namespace std;

int main() {
  vector<Noisy> v;
  v.reserve(11);
  cout << "11 spaces have been reserved" << endl;
  generate_n(back_inserter(v), 10, NoisyGen());
  ostream_iterator<Noisy> out(cout, " ");
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "Inserting an element:" << endl;
  vector<Noisy>::iterator it = 
    v.begin() + v.size() / 2; // Middle
  v.insert(it, Noisy());
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "\nErasing an element:" << endl;
  // Cannot use the previous value of it:
  it = v.begin() + v.size() / 2;
  v.erase(it);
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << endl;
} ///:~

When you run the program you’ll see that the call to reserve( ) really does only allocate storage – no constructors are called. The generate_n( ) call is pretty busy: each call to NoisyGen::operator( ) results in a construction, a copy-construction (into the vector) and a destruction of the temporary. But when an object is inserted into the vector in the middle, it must shove everything down to maintain the linear array and – since there is enough space – it does this with the assignment operator (if the argument of reserve( ) is 10 instead of eleven then it would have to reallocate storage). When an object is erased from the vector, the assignment operator is once again used to move everything up to cover the place that is being erased (notice that this requires that the assignment operator properly cleans up the lvalue). Lastly, the object on the end of the array is deleted.

You can imagine how enormous the overhead can become if objects are inserted and removed from the middle of a vector if the number of elements is large and the objects are complicated. It’s obviously a practice to avoid.

Contents | Prev | Next

Go to CodeGuru.com
Contact: webmaster@codeguru.com
� Copyright 1997-1999 CodeGuru