Associative
containers
The
set,
map,
multiset
and
multimap
are called
associative
containers
because they associate
keys
with
values.
Well, at least
maps
and
multimaps
associate keys to values, but you can look at a
set
as a
map
that has no values, only keys (and they can in fact be implemented this way),
and the same for the relationship between
multiset
and
multimap.
So, because of the structural similarity
sets
and
multisets
are lumped in with associative containers.
The
most important basic operations with associative containers are putting things
in, and in the case of a
set,
seeing if something is in the set. In the case of a
map,
you want to first see if a key is in the
map,
and if it exists you want the associated value for that key to be returned. Of
course, there are many variations on this theme but that’s the
fundamental concept. The following example shows these basics:
//: C20:AssociativeBasics.cpp
// Basic operations with sets and maps
#include <iostream>
#include <set>
#include <map>
#include "Noisy.h"
using namespace std;
int main() {
Noisy na[] = { Noisy(), Noisy(), Noisy(),
Noisy(), Noisy(), Noisy(), Noisy() };
// Add elements via constructor:
set<Noisy> ns(na, na+ sizeof na/sizeof(Noisy));
// Ordinary insertion:
Noisy n;
ns.insert(n);
cout << endl;
// Check for set membership:
cout << "ns.count(n)= " << ns.count(n) << endl;
if(ns.find(n) != ns.end())
cout << "n(" << n << ") found in ns" << endl;
// Print elements:
copy(ns.begin(), ns.end(),
ostream_iterator<Noisy>(cout, " "));
cout << endl;
cout << "\n-----------\n";
map<int, Noisy> nm;
for(int i = 0; i < 10; i++)
nm[i]; // Automatically makes pairs
cout << "\n-----------\n";
for(int j = 0; j < nm.size(); j++)
cout << "nm[" << j <<"] = " << nm[j] << endl;
cout << "\n-----------\n";
nm[10] = n;
cout << "\n-----------\n";
nm.insert(make_pair(47, n));
cout << "\n-----------\n";
cout << "\n nm.count(10)= "
<< nm.count(10) << endl;
cout << "nm.count(11)= "
<< nm.count(11) << endl;
map<int, Noisy>::iterator it = nm.find(6);
if(it != nm.end())
cout << "value:" << (*it).second
<< " found in nm at location 6" << endl;
for(it = nm.begin(); it != nm.end(); it++)
cout << (*it).first << ":"
<< (*it).second << ", ";
cout << "\n-----------\n";
} ///:~
The
set<Noisy>
object
ns
is created using two iterators into an array of
Noisy
objects, but there is also a default constructor and a copy-constructor, and
you can pass in an object that provides an alternate scheme for doing
comparisons. Both
sets
and
maps
have an
insert( )
member function to put things in, and there are a couple of different ways to
check to see if an object is already in an associative container:
count( ),
when given a key, will tell you how many times that key occurs (this can only
be zero or one in a
set
or
map,
but it can be more than one with a
multiset
or
multimap).
The
find( )
member function will produce an iterator indicating the first occurrence (with
set
and
map,
the
only
occurrence) of the key that you give it, or the past-the-end iterator if it
can’t find the key. The
count( )
and
find( )
member functions exist for all the associative containers, which makes sense.
The associative containers also have member functions
lower_bound( ),
upper_bound( )
and
equal_range( ),
which actually only make sense for
multiset
and
multimap,
as you shall see (but don’t try to figure out how they would be useful for
set
and
map,
since they are designed for dealing with a range of duplicate keys, which those
containers don’t allow).
Designing
an
operator[
]
always produces a little bit of a dilemma because it’s intended to be
treated as an array-indexing operation, so people don’t tend to think
about performing a test before they use it. But what happens if you decide to
index out of the bounds of the array? One option, of course, is to throw an
exception, but with a
map
“indexing out of the array” could mean that you want an entry
there, and that’s the way the STL
map
treats it. The first
for
loop after the creation of the
map<int,
Noisy> nm
just “looks up” objects using the
operator[
]
,
but this is actually creating new
Noisy
objects! The
map
creates a new key-value pair (using the default constructor for the value) if
you look up a value with
operator[
]
and it isn’t there. This means that if you really just want to look
something up and not create a new entry, you must use
count( )
(to see if it’s there) or
find( )
(to get an iterator to it).
The
for
loop that prints out the values of the container using
operator[
]
has a number of problems. First, it requires integral keys (which we happen to
have in this case). Next and worse, if all the keys are not sequential,
you’ll end up counting from 0 to the size of the container, and if there
are some spots which don’t have key-value pairs you’ll
automatically create them, and miss some of the higher values of the keys.
Finally, if you look at the output from the
for
loop you’ll see that things are
very
busy, and it’s quite puzzling at first why there are so many
constructions and destructions for what appears to be a simple lookup. The
answer only becomes clear when you look at the code in the
map
template for
operator[
]
,
which will be something like this:
mapped_type& operator[] (const key_type& k) {
value_type tmp(k,T());
return (*((insert(tmp)).first)).second;
}
Following
the trail, you’ll find that
map::value_type
is:
typedef
pair<const Key, T> value_type;
Now
you need to know what a
pair
is, which can be found in
<utility>:
template <class T1, class T2>
struct pair {
typedef T1 first_type;
typedef T2 second_type;
T1 first;
T2 second;
pair();
pair(const T1& x, const T2& y)
: first(x), second(y) {}
// Templatized copy-constructor:
template<class U, class V>
pair(const pair<U, V> &p);
};
It
turns out this is a very important (albeit simple)
struct
which is used quite a bit in the STL. All it really does it package together
two objects, but it’s very useful, especially when you want to return two
objects from a function (since a
return
statement only takes one object). There’s even a shorthand for creating a
pair called
make_pair( ),
which is used in
AssociativeBasics.cpp. So
to retrace the steps,
map::value_type
is a
pair
of the key and the value of the map – actually, it’s a single entry
for the map. But notice that
pair
packages its objects by value, which means that copy-constructions are
necessary to get the objects into the
pair.
Thus, the creation of
tmp
in
map::operator[
]
will involve at least a copy-constructor call and destructor call for each
object in the
pair.
Here, we’re getting off easy because the key is an
int.
But if you want to really see what kind of activity can result from
map::operator[
]
,
try running this:
//: C20:NoisyMap.cpp
// Mapping Noisy to Noisy
#include <map>
#include "Noisy.h"
using namespace std;
int main() {
map<Noisy, Noisy> mnn;
Noisy n1, n2;
cout << "\n--------\n";
mnn[n1] = n2;
cout << "\n--------\n";
cout << mnn[n1] << endl;
cout << "\n--------\n";
} ///:~
You’ll
see that both the insertion and lookup generate a lot of extra objects, and
that’s because of the creation of the
tmp
object. If you look back up at
map::operator[
]
you’ll see that the second line calls
insert( )
passing it
tmp
– that is,
operator[
]
does an insertion every time. The return value of
insert( )
is a different kind of
pair,
where
first
is an iterator pointing to the key-value
pair
that was just inserted, and
second
is a
bool
indicating whether the insertion took place. You can see that
operator[
]
grabs
first
(the iterator), dereferences it to produce the
pair,
and then returns the
second
which is the value at that location.
So
on the upside,
map
has this fancy “make a new entry if one isn’t there”
behavior, but the downside is that you
always
get a lot of extra object creations and destructions when you use
map::operator[
]
.
Fortunately,
AssociativeBasics.cpp
also demonstrates how to reduce the overhead of insertions and deletions, by
not using
operator[
]
if you don’t have to. The
insert( )
member function is slightly more efficient than
operator[
]
.
With a
set
you only hold one object, but with a
map
you hold key-value pairs, so
insert( )
requires a
pair
as its argument. Here’s where
make_pair( )
comes in handy, as you can see.
For
looking objects up in a
map,
you can use
count( )
to see whether a key is in the map, or you can use
find( )
to produce an iterator pointing directly at the key-value pair. Again, since the
map
contains
pairs
that’s what the iterator produces when you dereference it, so you have to
select
first
and
second.
When you run
AssociativeBasics.cpp
you’ll notice that the iterator approach involves no extra object
creations or destructions at all. It’s not as easy to write or read,
though.
If
you use a
map
with large, complex objects and discover there’s too much overhead when
doing lookups and insertions (don’t assume this from the beginning
– take the easy approach first and use a profiler to discover
bottlenecks), then you can use the counted-handle approach shown in Chapter XX
so that you are only passing around small, lightweight objects.
Of
course, you can also iterate through a
set
or
map
and operate on each of its objects. This will be demonstrated in later examples.
Generators
and fillers
for
associative containers
You’ve
seen how useful the
fill( ),
fill_n( ),
generate( )
and
generate_n( )
function templates in
<algorithm>
have been for filling the sequential containers (
vector,
list
and
deque)
with data. However, these are implemented by using
operator=
to
assign values into the sequential containers, and the way that you add objects
to associative containers is with their respective
insert( )
member functions. Thus the default “assignment” behavior causes a
problem when trying to use the “fill” and “generate”
functions with associative containers.
One
solution is to duplicate the “fill” and “generate”
functions, creating new ones that can be used with associative containers. It
turns out that only the
fill_n( )
and
generate_n( )
functions can be duplicated (
fill( )
and
generate( )
copy
in between two iterators, which doesn’t make sense with associative
containers), but the job is fairly easy, since you have the
<algorithm>
header file to work from (and since it contains templates, all the source code
is there):
//: C20:assocGen.h
// The fill_n() and generate_n() equivalents
// for associative containers.
#ifndef ASSOCGEN_H
#define ASSOCGEN_H
template<class Assoc, class Count, class T>
void
assocFill_n(Assoc& a, Count n, const T& val) {
for (; 0 < n; --n)
a.insert(val);
}
template<class Assoc, class Count, class Gen>
void assocGen_n(Assoc& a, Count n, Gen g) {
for (; 0 < n; --n)
a.insert(g());
}
#endif // ASSOCGEN_H ///:~
You
can see that instead of using iterators, the container class itself is passed
(by reference, of course, since you wouldn’t want to make a local copy,
fill it, and then have it discarded at the end of the scope).
This
code demonstrates two valuable lessons. The first lesson is that if the
algorithms don’t do what you want, copy the nearest thing and modify it.
You have the example at hand in the STL header, so most of the work has already
been done.
The
second lesson is more pointed: if you look long enough, there’s probably
a way to do it
without
inventing anything new. The present problem can instead be solved by using an
insert_iterator
(produced by a call to
inserter( )),
which calls
insert( )
to place items in the container instead of
operator=.
This is
not
simply a variation of
front_insert_iterator
(produced by a call to
front_inserter( ))
or
back_insert_iterator
(produced by a call to
back_inserter( )),
since those iterators use
push_front( )
and
push_back( ),
respectively. Each of the insert iterators is different by virtue of the member
function it uses for insertion, and
insert( )
is the one we need. Here’s a demonstration that shows filling and
generating both a
map
and a
set
(of course, it can also be used with
multimap
and
multiset).
First, some templatized, simple generators are created (this may seem like
overkill, but you never know when you’ll need them; for that reason
they’re placed in a header file):
//: C20:SimpleGenerators.h
// Generic generators, including one
// that creates pairs
#include <iostream>
// A generator that increments its value:
template<typename T>
class IncrGen {
T i;
public:
IncrGen(T ii) : i (ii) {}
T operator()() { return i++; }
};
// A generator that produces an STL pair<>:
template<typename T1, typename T2>
class PairGen {
T1 i;
T2 j;
public:
PairGen(T1 ii, T2 jj) : i(ii), j(jj) {}
std::pair<T1,T2> operator()() {
return std::pair<T1,T2>(i++, j++);
}
};
// A generic global operator<< for printing
// any STL pair<>:
template<typename Pair>
std::ostream&
operator<<(std::ostream& os, const Pair& p) {
return os << p.first << "\t"
<< p.second << std::endl;
} ///:~
Both
generators expect that
T
can be incremented, and they simply use
operator++
to generate new values from whatever you used for initialization.
PairGen
creates an STL
pair
object as its return value, and that’s what can be placed into a
map
or
multimap
using
insert( ). The
last function is a generalization of
operator<<
for
ostreams,
so that any
pair
can be printed, assuming each element of the
pair
supports
a stream
operator<<.
As you can see below, this allows the use of
copy( )
to output the
map:
//: C20:AssocInserter.cpp
// Using an insert_iterator so fill_n() and
// generate_n() can be used with associative
// containers
#include <iterator>
#include <iostream>
#include <algorithm>
#include <set>
#include <map>
#include "SimpleGenerators.h"
using namespace std;
int main() {
set<int> s;
fill_n(inserter(s, s.begin()), 10, 47);
generate_n(inserter(s, s.begin()), 10,
IncrGen<int>(12));
copy(s.begin(), s.end(),
ostream_iterator<int>(cout, "\n"));
map<int, int> m;
fill_n(inserter(m, m.begin()), 10,
make_pair(90,120));
generate_n(inserter(m, m.begin()), 10,
PairGen<int, int>(3, 9));
copy(m.begin(), m.end(),
ostream_iterator<pair<int,int> >(cout,"\n"));
} ///:~
The
second argument to
inserter
is an interator, which actually isn’t used in the case of associative
containers since they maintain their order internally, rather than allowing you
to tell them where the element should be inserted. However, an
insert_iterator
can be used with many different types of containers so you must provide the
iterator.
Note
how the
ostream_iterator
is created to output a
pair;
this wouldn’t have worked if the
operator<<
hadn’t been created, and since it’s a template it is automatically
instantiated for
pair<int,
int>
.
The
magic of maps
An
ordinary array uses an integral value to index into a sequential set of
elements of some type. A
map
is an
associative
array
,
which means you associate one object with another in an array-like fashion, but
instead of selecting an array element with a number as you do with an ordinary
array, you look it up with an object! The example which follows counts the
words in a text file, so the index is the
string
object representing the word, and the value being looked up is the object that
keeps count of the strings.
In
a single-item container like a
vector
or
list,
there’s only one thing being held. But in a
map,
you’ve got two things: the
key
(what you look up by, as in
mapname[key])
and the
value
that results from the lookup with the key. If you simply want to move through
the entire
map
and list each key-value pair, you use an iterator, which when dereferenced
produces a
pair
object containing both the key and the value. You access the members of a
pair
by selecting
first
or
second. This
same philosophy of packaging two items together is also used to insert elements
into the map, but the
pair
is created as part of the instantiated
map
and is called
value_type,
containing the key and the value. So one option for inserting a new element is
to create a
value_type
object, loading it with the appropriate objects and then calling the
insert( )
member function for the
map.
Instead, the following example makes use of the aforementioned special feature
of
map:
if you’re trying to find an object by passing in a key to
operator[
]
and that object doesn’t exist,
operator[
]
will automatically insert a new key-value pair for you, using the default
constructor for the value object. With that in mind, consider an implementation
of a word counting program:
//: C20:WordCount.cpp
//{L} StreamTokenizer
// Count occurrences of words using a map
#include <string>
#include <map>
#include <iostream>
#include <fstream>
#include "../require.h"
#include "StreamTokenizer.h"
using namespace std;
class Count {
int i;
public:
Count() : i(0) {}
void operator++(int) { i++; } // Post-increment
int& val() { return i; }
};
typedef map<string, Count> WordMap;
typedef WordMap::iterator WMIter;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
ifstream in(argv[1]);
assure(in, argv[1]);
StreamTokenizer words(in);
WordMap wordmap;
string word;
while((word = words.next()).size() != 0)
wordmap[word]++;
for(WMIter w = wordmap.begin();
w != wordmap.end(); w++)
cout << (*w).first << ": "
<< (*w).second.val() << endl;
} ///:~
The
need for the
Count
class is to contain an
int
that’s automatically initialized to zero. This is necessary because of
the crucial line:
This
finds the word that has been produced by
StreamTokenizer
and
increments the
Count
object associated with that word, which is fine as long as there
is
a key-value pair for that
string.
If there isn’t, the
map
automatically inserts a key for the word you’re looking up, and a
Count
object, which is initialized to zero by the default constructor. Thus, when
it’s incremented the
Count
becomes 1.
Printing
the entire list requires traversing it with an iterator (there’s no
copy( )
shortcut for a
map
unless you want to write an
operator<<
for
the
pair
in the map). As previously mentioned, dereferencing this iterator produces a
pair
object, with the
first
member the key and the
second
member the value. In this case
second
is a
Count
object, so its
val( )
member must be called to produce the actual word count.
If
you want to find the count for a particular word, you can use the array index
operator, like this:
cout
<< "the: " << wordmap["the"].val() << endl;
You
can see that one of the great advantages of the
map
is the clarity of the syntax; an associative array makes intuitive sense to the
reader (note, however, that if “the” isn’t already in the
wordmap
a new entry will be created!).
A
command-line argument tool
A
problem that often comes up in programming is the management of program
arguments that you can specify on the command line. Usually you’d like to
have a set of defaults that can be changed via the command line. The following
tool expects the command line arguments to be in the form
flag1=value1
with no spaces around the ‘
=‘
(so it will be treated as a single argument). The
ProgVal
class simply inherits from
map<string,
string>
:
//: C20:ProgVals.h
// Program values can be changed by command line
#ifndef PROGVALS_H
#define PROGVALS_H
#include <map>
#include <iostream>
#include <string>
class ProgVals
: public std::map<std::string, std::string> {
public:
ProgVals(std::string defaults[][2], int sz);
void parse(int argc, char* argv[],
std::string usage, int offset = 1);
void print(std::ostream& out = std::cout);
};
#endif // PROGVALS_H ///:~
The
constructor expects an array of
string
pairs (as you’ll see, this allows you to initialize it with an array of
char*)
and the size of that array. The
parse( )
member function is handed the command-line arguments along with a
“usage” string to print if the command line is given incorrectly,
and the “offset” which tells it which command-line argument to
start with (so you can have non-flag arguments at the beginning of the command
line). Finally,
print( )
displays the values. Here is the implementation:
//: C20:ProgVals.cpp {O}
#include "ProgVals.h"
using namespace std;
ProgVals::ProgVals(
std::string defaults[][2], int sz) {
for(int i = 0; i < sz; i++)
insert(make_pair(
defaults[i][0], defaults[i][1]));
}
void ProgVals::parse(int argc, char* argv[],
string usage, int offset) {
// Parse and apply additional
// command-line arguments:
for(int i = offset; i < argc; i++) {
string flag(argv[i]);
int equal = flag.find('=');
if(equal == string::npos) {
cerr << "Command line error: " <<
argv[i] << endl << usage << endl;
continue; // Next argument
}
string name = flag.substr(0, equal);
string value = flag.substr(equal + 1);
if(find(name) == end()) {
cerr << name << endl << usage << endl;
continue; // Next argument
}
operator[](name) = value;
}
}
void ProgVals::print(ostream& out) {
out << "Program values:" << endl;
for(iterator it = begin(); it != end(); it++)
out << (*it).first << " = "
<< (*it).second << endl;
} ///:~
The
constructor uses the STL
make_pair( )
helper function to convert each pair of
char*
into a
pair
object that can be inserted into the
map.
In
parse( ),
each command-line argument is checked for the existence of the telltale ‘
=‘
sign (reporting an error if it isn’t there), and then is broken into two
strings, the
name
which appears before the ‘
=‘,
and the
value
which appears after. The
operator[ ]
is then used to change the existing value to the new one.
Here’s
an example to test the tool:
//: C20:ProgValTest.cpp
//{L} ProgVals
#include "ProgVals.h"
using namespace std;
string defaults[][2] = {
{ "color", "red" },
{ "size", "medium" },
{ "shape", "rectangular" },
{ "action", "hopping"},
};
const char* usage = "usage:\n"
"ProgValTest [flag1=val1 flag2=val2 ...]\n"
"(Note no space around '=')\n"
"Where the flags can be any of: \n"
"color, size, shape, action \n";
// So it can be used globally:
ProgVals pvals(defaults,
sizeof defaults / sizeof *defaults);
class Animal {
string color, size, shape, action;
public:
Animal(string col, string sz,
string shp, string act)
:color(col),size(sz),shape(shp),action(act){}
// Default constructor uses program default
// values, possibly change on command line:
Animal() : color(pvals["color"]),
size(pvals["size"]), shape(pvals["shape"]),
action(pvals["action"]) {}
void print() {
cout << "color = " << color << endl
<< "size = " << size << endl
<< "shape = " << shape << endl
<< "action = " << action << endl;
}
// And of course pvals can be used anywhere
// else you'd like.
};
int main(int argc, char* argv[]) {
// Initialize and parse command line values
// before any code that uses pvals is called:
pvals.parse(argc, argv, usage);
pvals.print();
Animal a;
cout << "Animal a values:" << endl;
a.print();
} ///:~
This
program can create
Animal
objects with different characteristics, and those characteristics can be
established with the command line. The default characteristics are given in the
two-dimensional array of
char*
called
defaults
and, after the
usage
string you can see a global instance of
ProgVals
called
pvals
is
created; this is important because it allows the rest of the code in the
program to access the values.
Note
that
Animal’s
default constructor uses the values in
pvals
inside its constructor initializer list. When you run the program you can try
creating different animal characteristics.
Many
command-line programs also use a style of beginning a flag with a hyphen, and
sometimes they use single-character flags.
The
STL
map
is used in numerous places throughout the rest of this book.
Multimaps
and duplicate keys
A
multimap
is a
map
that can contain duplicate keys. At first this may seem like a strange idea,
but it can occur surprisingly often. A phone book, for example, can have many
entries with the same name.
Suppose
you are monitoring wildlife, and you want to keep track of where and when each
type of animal is spotted. Thus, you may see many animals of the same kind, all
in different locations and at different times. So if the type of animal is the
key, you’ll need a
multimap.
Here’s what it looks like:
//: C20:WildLifeMonitor.cpp
#include <vector>
#include <map>
#include <string>
#include <algorithm>
#include <iostream>
#include <sstream>
#include <ctime>
using namespace std;
class DataPoint {
int x, y; // Location coordinates
time_t time; // Time of Sighting
public:
DataPoint() : x(0), y(0), time(0) {}
DataPoint(int xx, int yy, time_t tm) :
x(xx), y(yy), time(tm) {}
// Synthesized operator=, copy-constructor OK
int getX() { return x; }
int getY() { return y; }
time_t* getTime() { return &time; }
};
string animal[] = {
"chipmunk", "beaver", "marmot", "weasel",
"squirrel", "ptarmigan", "bear", "eagle",
"hawk", "vole", "deer", "otter", "hummingbird",
};
const int asz = sizeof animal/sizeof *animal;
vector<string> animals(animal, animal + asz);
// All the information is contained in a
// "Sighting," which can be sent to an ostream:
typedef pair<string, DataPoint> Sighting;
ostream&
operator<<(ostream& os, const Sighting& s) {
return os << s.first << " sighted at x= " <<
s.second.getX() << ", y= " << s.second.getY()
<< ", time = " << ctime(s.second.getTime());
}
// A generator for Sightings:
class SightingGen {
vector<string>& animals;
static const int d = 100;
public:
SightingGen(vector<string>& an) :
animals(an) { srand(time(0)); }
Sighting operator()() {
Sighting result;
int select = rand() % animals.size();
result.first = animals[select];
result.second = DataPoint(
rand() % d, rand() % d, time(0));
return result;
}
};
typedef multimap<string, DataPoint> DataMap;
typedef DataMap::iterator DMIter;
int main() {
DataMap sightings;
generate_n(
inserter(sightings, sightings.begin()),
50, SightingGen(animals));
// Print everything:
copy(sightings.begin(), sightings.end(),
ostream_iterator<Sighting>(cout, ""));
// Print sightings for selected animal:
while(true) {
cout << "select an animal or 'q' to quit: ";
for(int i = 0; i < animals.size(); i++)
cout <<'['<< i <<']'<< animals[i] << ' ';
cout << endl;
string reply;
cin >> reply;
if(reply.at(0) == 'q') return 0;
istringstream r(reply);
int i;
r >> i; // Converts to int
i %= animals.size();
// Iterators in "range" denote begin, one
// past end of matching range:
pair<DMIter, DMIter> range =
sightings.equal_range(animals[i]);
copy(range.first, range.second,
ostream_iterator<Sighting>(cout, ""));
}
} ///:~
All
the data about a sighting is encapsulated into the class
DataPoint,
which is simple enough that it can rely on the synthesized assignment and
copy-constructor. It uses the Standard C library time functions to record the
time of the sighting.
In
the array of
string
animal,
notice that the
char*
constructor is automatically used during initialization, which makes
initializing an array of
string
quite convenient. Since it’s easier to use the animal names in a
vector,
the length of the array is calculated and a
vector<string>
is initialized using the
vector(iterator,
iterator)
constructor.
The
key-value pairs that make up a
Sighting
are the
string
which names the type of animal, and the
DataPoint
that says where and when it was sighted. The standard
pair
template combines these two types and is typedefed to produce the
Sighting
type. Then an
ostream
operator<<
is created for
Sighting;
this will allow you to iterate through a map or multimap of
Sightings
and
print it out.
SightingGen
generates random sightings at random data points to use for testing. It has the
usual
operator( )
necessary for a function object, but it also has a constructor to capture and
store a reference to a
vector<string>,
which is where the aforementioned animal names are stored.
A
DataMap
is a
multimap
of
string-DataPoint
pairs, which means it stores
Sightings.
It is filled with 50
Sightings
using
generate_n( ),
and printed out (notice that because there is an
operator<<
that takes a
Sighting,
an
ostream_iterator
can be created). At this point the user is asked to select the animal that they
want to see all the sightings for. If you press
‘q’
the program will quit, but if you select an animal number, then the
equal_range( )
member function is invoked. This returns an iterator (
DMIter)
to the beginning of the set of matching pairs, and one indicating past-the-end
of the set. Since only one object can be returned from a function,
equal_range( )
makes use of
pair.
Since the
range
pair has the beginning and ending iterators of the matching set, those
iterators can be used in
copy( )
to print out all the sightings for a particular type of animal.
Multisets
You’ve
seen the
set,
which only allows one object of each value to be inserted. The
multiset
is odd by comparison since it allows more than one object of each value to be
inserted. This seems to go against the whole idea of “setness,”
where you can ask “is ‘it’ in this set?” If there can
be more than one of ‘it’, then what does that question mean?
With
some thought, you can see that it makes no sense to have more than one object
of the same value in a set if those duplicate objects are
exactly
the same (with the possible exception of counting occurrences of objects, but
as seen earlier in this chapter that can be handled in an alternative, more
elegant fashion). Thus each duplicate object will have something that makes it
unique from the other duplicates – most likely different state
information that is not used in the calculation of the value during the
comparison. That is, to the comparison operation, the objects look the same but
they actually contain some differing internal state.
Like
any STL container that must order its elements, the
multiset
template uses the
less
template by default to determine element ordering. This uses the contained
classes’
operator<,
but you may of course substitute your own comparison function.
Consider
a simple class that contains one element that is used in the comparison, and
another that is not:
//: C20:MultiSet1.cpp
// Demonstration of multiset behavior
#include <iostream>
#include <set>
#include <algorithm>
#include <ctime>
using namespace std;
class X {
char c; // Used in comparison
int i; // Not used in comparison
// Don't need default constructor and operator=
X();
X& operator=(const X&);
// Usually need a copy-constructor (but the
// synthesized version works here)
public:
X(char cc, int ii) : c(cc), i(ii) {}
// Notice no operator== is required
friend bool operator<(const X& x, const X& y) {
return x.c < y.c;
}
friend ostream& operator<<(ostream& os, X x) {
return os << x.c << ":" << x.i;
}
};
class Xgen {
static int i;
// Number of characters to select from:
static const int span = 6;
public:
Xgen() { srand(time(0)); }
X operator()() {
char c = 'A' + rand() % span;
return X(c, i++);
}
};
int Xgen::i = 0;
typedef multiset<X> Xmset;
typedef Xmset::const_iterator Xmit;
int main() {
Xmset mset;
// Fill it with X's:
generate_n(inserter(mset, mset.begin()),
25, Xgen());
// Initialize a regular set from mset:
set<X> unique(mset.begin(), mset.end());
copy(unique.begin(), unique.end(),
ostream_iterator<X>(cout, " "));
cout << "\n----\n";
// Iterate over the unique values:
for(set<X>::iterator i = unique.begin();
i != unique.end(); i++) {
pair<Xmit, Xmit> p = mset.equal_range(*i);
copy(p.first, p.second,
ostream_iterator<X>(cout, " "));
cout << endl;
}
} ///:~
In
X,
all the comparisons are made with the
char
c
.
The comparison is performed with
operator<,
which is all that is necessary for the
multiset,
since in this example the default
less
comparison object is used. The class
Xgen
is used to randomly generate
X
objects, but the comparison value is restricted to the span from
‘A’
to ‘
E’.
In
main( ),
a
multiset<X>
is created and filled with 25
X
objects using
Xgen,
guaranteeing that there will be duplicate keys. So that we know what the unique
values are, a regular
set<X>
is created from the
multiset
(using the
iterator,
iterator
constructor). These values are displayed, then each one is used to produce the
equal_range( )
in the
multiset
(
equal_range( )
has the same meaning here as it does with
multimap:
all the elements with matching keys). Each set of matching keys is then printed.
As
a second example, a (possibly) more elegant version of
WordCount.cpp
can be created using
multiset:
//: C20:MultiSetWordCount.cpp
//{L} StreamTokenizer
// Count occurrences of words using a multiset
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include "../require.h"
#include "StreamTokenizer.h"
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
ifstream in(argv[1]);
assure(in, argv[1]);
StreamTokenizer words(in);
multiset<string> wordmset;
string word;
while((word = words.next()).size() != 0)
wordmset.insert(word);
typedef multiset<string>::iterator MSit;
MSit it = wordmset.begin();
while(it != wordmset.end()) {
pair<MSit, MSit> p=wordmset.equal_range(*it);
int count = distance(p.first, p.second);
cout << *it << ": " << count << endl;
it = p.second; // Move to the next word
}
} ///:~
The
setup in
main( )
is identical to
WordCount.cpp,
but then each word is simply inserted into the
multiset<string>.
An iterator is created and initialized to the beginning of the
multiset;
dereferencing this iterator produces the current word.
equal_range( )
produces the starting and ending iterators of the word that’s currently
selected, and the STL algorithm
distance( )
(which is in
<iterator>)
is used to count the number of elements in that range. Then the iterator
it
is
moved forward to the end of the range, which puts it at the next word. Although
if you’re unfamiliar with the
multiset
this code can seem more complex, the density of it and the lack of need for
supporting classes like
Count
has a lot of appeal.
In
the end, is this really a “set,” or should it be called something
else? An alternative is the generic “bag” that has been defined in
some container libraries, since a bag holds anything at all without
discrimination – including duplicate objects. This is close, but it
doesn’t quite fit since a bag has no specification about how elements
should be ordered, while a
multiset
(which requires that all duplicate elements be adjacent to each other) is even
more restrictive than the concept of a set, which could use a hashing function
to order its elements, in which case they would not be in sorted order.
Besides, if you wanted to store a bunch of objects without any special
criterions, you’d probably just use a
vector,
deque
or
list.
Go to CodeGuru.com
Contact: webmaster@codeguru.com
© Copyright 1997-1999 CodeGuru