Overloading assignment
A common source of confusion for new C++ programmers is assignment. This is no doubt because the = sign is such a fundamental operation in programming, right down to copying a register at the machine level. In addition, the copy-constructor (from the previous chapter) can also be invoked when using the = sign:
MyType b;
MyType a = b;
a = b;
In the second line, the object a is being defined. A new object is being created where one didn’t exist before. Because you know by now how defensive the C++ compiler is about object initialization, you know that a constructor must always be called at the point where an object is defined. But which constructor? a is being created from an existing MyType object, so there’s only one choice: the copy-constructor. So even though an equal sign is involved, the copy-constructor is called.
In the third line, things are different. On the left side of the equal sign, there’s a previously initialized object. Clearly, you don’t call a constructor for an object that’s already been created. In this case MyType::operator= is called for a, taking as an argument whatever appears on the right-hand side. (You can have multiple operator= functions to take different right-hand arguments.)
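The difference between the second and third lines can be observed directly. The following is a minimal sketch, assuming a hypothetical class Tracked (not part of the chapter's code) that records which function ran; it uses in-class member initializers, so it needs a C++11 compiler.

```cpp
#include <cassert>

// Hypothetical class that records which special member function ran.
struct Tracked {
  int copies = 0;       // 1 if this object was copy-constructed
  int assignments = 0;  // number of times this object was assigned to

  Tracked() {}
  Tracked(const Tracked&) : copies(1) {}
  Tracked& operator=(const Tracked&) {
    ++assignments;
    return *this;
  }
};

void init_vs_assign() {
  Tracked b;
  Tracked a = b;  // a is being defined: copy-constructor, not operator=
  assert(a.copies == 1 && a.assignments == 0);
  a = b;          // a already exists: operator=
  assert(a.copies == 1 && a.assignments == 1);
}
```

Calling init_vs_assign( ) passes both checks: the = in the definition never reaches operator=.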
This behavior is not restricted to the copy-constructor. Any time you’re initializing an object using an = instead of the ordinary function-call form of the constructor, the compiler will look for a constructor that accepts whatever is on the right-hand side:
//: C12:FeeFi.cpp
// Copying vs. initialization

class Fi {
public:
  Fi() {}
};

class Fee {
public:
  Fee(int) {}
  Fee(const Fi&) {}
};

int main() {
  Fee f = 1; // Fee(int)
  Fi fi;
  Fee fum = fi; // Fee(const Fi&)
} ///:~
When dealing with the = sign, it’s important to keep this distinction in mind: If the object hasn’t been created yet, initialization is required; otherwise the assignment operator= is used.
It’s even better to avoid writing code that uses the = for initialization; instead, always use the explicit constructor form. The last line of main( ) in FeeFi.cpp then becomes
  Fee fum(fi);
This way, you’ll avoid confusing your readers.
Behavior of operator=
In Binary.cpp, you saw that operator= can be only a member function. It is intimately connected to the object on the left side of the =, and if you could define operator= globally, you could try to redefine the built-in = sign:
  int operator=(int, MyType); // Global = not allowed!
The compiler skirts this whole issue by forcing you to make operator= a member function.
When you create an operator=, you must copy all the necessary information from the right-hand object into the current object (the one on the left side of the =) to perform whatever you consider “assignment” for your class. For simple objects, this is obvious:
//: C12:Simpcopy.cpp
// Simple operator=()
#include <iostream>
using namespace std;

class Value {
  int a, b;
  float c;
public:
  Value(int aa = 0, int bb = 0, float cc = 0.0) {
    a = aa;
    b = bb;
    c = cc;
  }
  Value& operator=(const Value& rv) {
    a = rv.a;
    b = rv.b;
    c = rv.c;
    return *this;
  }
  friend ostream&
  operator<<(ostream& os, const Value& rv) {
    return os << "a = " << rv.a << ", b = "
      << rv.b << ", c = " << rv.c;
  }
};

int main() {
  Value A, B(1, 2, 3.3);
  cout << "A: " << A << endl;
  cout << "B: " << B << endl;
  A = B;
  cout << "A after assignment: " << A << endl;
} ///:~
Here, the object on the left side of the = copies all the elements of the object on the right, then returns a reference to itself, so a more complex expression can be created.
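To make the "more complex expression" concrete, here is a minimal sketch that chains assignments. The class Cell is hypothetical and exists only for this illustration; what matters is that its operator= returns *this by reference, like Value::operator=.

```cpp
#include <cassert>

// Hypothetical minimal class; only the return type of operator= matters.
class Cell {
  int v;
public:
  Cell(int vv = 0) : v(vv) {}
  Cell& operator=(const Cell& rv) {
    v = rv.v;
    return *this;  // reference to the left-hand object
  }
  int value() const { return v; }
};

void chain_demo() {
  Cell a, b, c(7);
  a = b = c;          // parsed as a = (b = c)
  assert(a.value() == 7 && b.value() == 7);
  (a = b) = Cell(3);  // the result of a = b is a itself
  assert(a.value() == 3);
}
```

If operator= returned void, the first chained expression would not compile; if it returned by value, the last assignment would modify a temporary instead of a.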
A common mistake was made in Simpcopy.cpp. When you’re assigning two objects of the same type, you should always check first for self-assignment: Is the object being assigned to itself? In some cases, such as this one, it’s harmless if you perform the assignment operations anyway, but if changes are made to the implementation of the class, it can make a difference, and if you don’t make the check as a matter of habit, you may forget and cause hard-to-find bugs.
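A case where the check does matter can be sketched as follows. The class Name is hypothetical (it is not from the chapter); it owns storage obtained with malloc( ) and releases the old storage before copying, so an unguarded self-assignment would read from memory it had just freed.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical class that owns a malloc'ed copy of a C string.
class Name {
  char* s;
  static char* dup(const char* src) {
    char* p = (char*)std::malloc(std::strlen(src) + 1);
    assert(p != 0);
    std::strcpy(p, src);
    return p;
  }
public:
  Name(const char* src) : s(dup(src)) {}
  Name(const Name& rv) : s(dup(rv.s)) {}
  Name& operator=(const Name& rv) {
    // Without this test, assigning an object to itself would
    // free s and then copy from the storage just released.
    if(&rv == this) return *this;
    std::free(s);
    s = dup(rv.s);
    return *this;
  }
  ~Name() { std::free(s); }
  const char* c_str() const { return s; }
};

void self_assign_demo() {
  Name n("fee");
  Name& alias = n;
  n = alias;  // self-assignment in disguise
  assert(std::strcmp(n.c_str(), "fee") == 0);
}
```

Self-assignment rarely looks like n = n in real code; it usually arrives through a reference or pointer that happens to refer to the same object, as alias does here.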
Pointers in classes
What happens if the object is not so simple? For example, what if the object contains pointers to other objects? Simply copying a pointer means you’ll end up with two objects pointing to the same storage location. In situations like these, you need to do bookkeeping of your own.
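The aliasing problem is easy to see in a small sketch. The struct Shallow below is hypothetical and deliberately has no copy-constructor or operator= of its own, so copying it copies only the pointer. The storage is a local array, which keeps ownership questions out of the picture.

```cpp
#include <cassert>

// Hypothetical struct with no user-written copy operations: the
// compiler copies the pointer member, not the storage it refers to.
struct Shallow {
  char* p;
};

void aliasing_demo() {
  char storage[4] = "abc";
  Shallow a = { storage };
  Shallow b = a;          // copies the pointer only
  assert(a.p == b.p);     // both refer to the same storage
  b.p[0] = 'X';           // a write through b ...
  assert(a.p[0] == 'X');  // ... is visible through a
}
```

If each object also freed its pointer in a destructor, the same storage would be released twice, which is why the bookkeeping is needed.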
There are two common approaches to this problem. The simplest technique is to copy whatever the pointer refers to when you do an assignment or a copy-construction. This is very straightforward:
//: C12:Copymem.cpp {O}
// Duplicate during assignment
#include <cstdlib>
#include <cstring>
#include "../require.h"
using namespace std;

class WithPointer {
  char* p;
  enum { blocksz = 100 };
public:
  WithPointer() {
    p = (char*)malloc(blocksz);
    require(p != 0);
    memset(p, 1, blocksz);
  }
  WithPointer(const WithPointer& wp) {
    p = (char*)malloc(blocksz);
    require(p != 0);
    memcpy(p, wp.p, blocksz);
  }
  WithPointer&
  operator=(const WithPointer& wp) {
    // Check for self-assignment:
    if(&wp != this)
      memcpy(p, wp.p, blocksz);
    return *this;
  }
  ~WithPointer() {
    free(p);
  }
}; ///:~
This shows the four functions you will always need to define when your class contains pointers: all necessary ordinary constructors, the copy-constructor, operator= (either define it or disallow it), and a destructor. The operator= checks for self-assignment as a matter of course, even though it’s not strictly necessary here. This virtually eliminates the possibility that you’ll forget to check for self-assignment if you do change the code so that it matters.
Here, the constructors allocate the memory and initialize it, the operator= copies it, and the destructor frees the memory. However, if you’re dealing with a lot of memory or a high overhead to initialize that memory, you may want to avoid this copying. A very common approach to this problem is called reference counting. You make the block of memory smart, so it knows how many objects are pointing to it. Then copy-construction or assignment means attaching another pointer to an existing block of memory and incrementing the reference count. Destruction means reducing the reference count and destroying the object if the reference count goes to zero.
But what if you want to write to the block of memory? More than one object may be using this block, so you’d be modifying someone else’s block as well as yours, which doesn’t seem very neighborly. To solve this problem, an additional technique called copy-on-write is often used. Before writing to a block of memory, you make sure no one else is using it. If the reference count is greater than one, you must make yourself a personal copy of that block before writing to it, so you don’t disturb someone else’s turf. Here’s a simple example of reference counting and copy-on-write:
//: C12:Refcount.cpp
// Reference count, copy-on-write
#include <cstring>
#include "../require.h"
using namespace std;

class Counted {
  class MemBlock {
    enum { size = 100 };
    char c[size];
    int refcount;
  public:
    MemBlock() {
      memset(c, 1, size);
      refcount = 1;
    }
    MemBlock(const MemBlock& rv) {
      memcpy(c, rv.c, size);
      refcount = 1;
    }
    void attach() { ++refcount; }
    void detach() {
      require(refcount != 0);
      // Destroy object if no one is using it:
      if(--refcount == 0) delete this;
    }
    int count() const { return refcount; }
    void set(char x) { memset(c, x, size); }
    // Conditionally copy this MemBlock.
    // Call before modifying the block; assign
    // resulting pointer to your block;
    MemBlock* unalias() {
      // Don't duplicate if not aliased:
      if(refcount == 1) return this;
      --refcount;
      // Use copy-constructor to duplicate:
      return new MemBlock(*this);
    }
  } * block;
public:
  Counted() {
    block = new MemBlock; // Sneak preview
  }
  Counted(const Counted& rv) {
    block = rv.block; // Pointer assignment
    block->attach();
  }
  void unalias() { block = block->unalias(); }
  Counted& operator=(const Counted& rv) {
    // Check for self-assignment:
    if(&rv == this) return *this;
    // Clean up what you're using first:
    block->detach();
    block = rv.block; // Like copy-constructor
    block->attach();
    return *this;
  }
  // Decrement refcount, conditionally destroy
  ~Counted() { block->detach(); }
  // Copy-on-write:
  void write(char value) {
    // Do this before any write operation:
    unalias();
    // It's safe to write now.
    block->set(value);
  }
};

int main() {
  Counted A, B;
  Counted C(A);
  B = A;
  C = C;
  C.write('x');
} ///:~
The nested class MemBlock is the block of memory pointed to. (Notice the pointer block defined at the end of the nested class.) It contains a reference count and functions to control and read the reference count. There’s a copy-constructor so you can make a new MemBlock from an existing one.
The attach( ) function increments the reference count of a MemBlock to indicate there’s another object using it. detach( ) decrements the reference count. If the reference count goes to zero, then no one is using it anymore, so the member function destroys its own object by saying delete this. You can modify the memory with the set( ) function, but before you make any modifications, you should ensure that you aren’t walking on a MemBlock that some other object is using. You do this by calling Counted::unalias( ), which in turn calls MemBlock::unalias( ). The latter function will return the block pointer if the reference count is one (meaning no one else is pointing to that block). If the reference count is more than one, it decrements the count of the shared block and returns a pointer to a duplicate.
This example includes a sneak preview of the next chapter. Instead of C’s malloc( ) and free( ) to create and destroy the objects, the special C++ operators new and delete are used. For this example, you can think of new and delete just like malloc( ) and free( ), except new calls the constructor after allocating memory, and delete calls the destructor before freeing the memory.
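The constructor and destructor calls made by new and delete can be checked with a small sketch. Probe is a hypothetical class, not part of Refcount.cpp, that simply counts how many of its objects are alive.

```cpp
#include <cassert>

// Hypothetical class that counts live objects.
struct Probe {
  static int live;
  Probe() { ++live; }
  ~Probe() { --live; }
};
int Probe::live = 0;

void new_delete_demo() {
  Probe* p = new Probe;  // allocates, then runs Probe()
  assert(Probe::live == 1);
  delete p;              // runs ~Probe(), then releases the memory
  assert(Probe::live == 0);
}
```

Had the object been created with malloc( ) and released with free( ), neither function would have run and the count would never have changed.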
The Counted copy-constructor, instead of creating its own memory, assigns block to the block of the source object. Then, because there’s now an additional object using that block of memory, it increments the reference count by calling MemBlock::attach( ). The operator= deals with an object that has already been created on the left side of the =, so it must first clean that up by calling detach( ) for that MemBlock, which will destroy the old MemBlock if no one else is using it. Then operator= repeats the behavior of the copy-constructor. Notice that it first checks to detect whether you’re assigning the same object to itself.
The destructor calls detach( ) to conditionally destroy the MemBlock. To implement copy-on-write, you must control all the actions that write to your block of memory. This means you can’t ever hand a raw pointer to the outside world. Instead you say, “Tell me what you want done and I’ll do it for you!” For example, the write( ) member function allows you to change the values in the block of memory. But first, it uses unalias( ) to prevent the modification of an aliased block (a block with more than one Counted object using it).
main( ) tests the various functions that must work correctly to implement reference counting: the constructor, copy-constructor, operator=, and destructor. It also tests the copy-on-write by calling the write( ) function for object C, which is aliased to A’s memory block.
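For comparison only, the same copy-on-write behavior can be sketched with std::shared_ptr from the C++11 standard library, which keeps the reference count for you. This is not the chapter's technique: SharedBlock and its members are hypothetical names, and the use_count( ) test is only dependable in single-threaded code like this.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical counterpart of Counted built on std::shared_ptr.
class SharedBlock {
  enum { size = 100 };
  struct Block { char c[size]; };
  std::shared_ptr<Block> block;
public:
  SharedBlock() : block(std::make_shared<Block>()) {
    std::memset(block->c, 1, size);
  }
  // The compiler-generated copy operations copy the shared_ptr,
  // which adjusts the reference count.
  long count() const { return block.use_count(); }
  char first() const { return block->c[0]; }
  void write(char value) {
    // Copy-on-write: duplicate the block if it is shared.
    if(block.use_count() > 1)
      block = std::make_shared<Block>(*block);
    std::memset(block->c, value, size);
  }
};

void cow_demo() {
  SharedBlock a;
  SharedBlock c(a);  // shares a's block
  assert(a.count() == 2 && c.count() == 2);
  c.write('x');      // c detaches before writing
  assert(a.count() == 1 && c.count() == 1);
  assert(a.first() == 1 && c.first() == 'x');
}
```

Here the memberwise copy generated by the compiler is exactly what is wanted, because shared_ptr's own copy operations play the role of attach( ) and detach( ).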
Tracing the output
To verify that the behavior of this scheme is correct, the best approach is to add information and functionality to the class to generate a trace output that can be analyzed. Here’s Refcount.cpp with added trace information:
//: C12:Rctrace.cpp
// Refcount.cpp with trace info
#include <cstring>
#include <fstream>
#include "../require.h"
using namespace std;

ofstream out("rctrace.out");

class Counted {
  class MemBlock {
    enum { size = 100 };
    char c[size];
    int refcount;
    static int blockcount;
    int blocknum;
  public:
    MemBlock() {
      memset(c, 1, size);
      refcount = 1;
      blocknum = blockcount++;
    }
    MemBlock(const MemBlock& rv) {
      memcpy(c, rv.c, size);
      refcount = 1;
      blocknum = blockcount++;
      print("copied block");
      out << endl;
      rv.print("from block");
    }
    ~MemBlock() {
      out << "\tdestroying block "
        << blocknum << endl;
    }
    void print(const char* msg = "") const {
      if(*msg) out << msg << ", ";
      out << "blocknum:" << blocknum;
      out << ", refcount:" << refcount;
    }
    void attach() { ++refcount; }
    void detach() {
      require(refcount != 0);
      // Destroy object if no one is using it:
      if(--refcount == 0) delete this;
    }
    int count() const { return refcount; }
    void set(char x) { memset(c, x, size); }
    // Conditionally copy this MemBlock.
    // Call before modifying the block; assign
    // resulting pointer to your block;
    MemBlock* unalias() {
      // Don't duplicate if not aliased:
      if(refcount == 1) return this;
      --refcount;
      // Use copy-constructor to duplicate:
      return new MemBlock(*this);
    }
  } * block;
  enum { sz = 30 };
  char id[sz];
public:
  Counted(const char* ID = "tmp") {
    block = new MemBlock; // Sneak preview
    // strncpy() does not terminate the string
    // if ID fills the buffer, so do it by hand:
    strncpy(id, ID, sz - 1);
    id[sz - 1] = '\0';
  }
  Counted(const Counted& rv) {
    block = rv.block; // Pointer assignment
    block->attach();
    strncpy(id, rv.id, sz - 1);
    id[sz - 1] = '\0';
    // strncat() writes a terminator after the
    // appended characters; leave room for it:
    strncat(id, " copy", sz - strlen(id) - 1);
  }
  void unalias() { block = block->unalias(); }
  void addname(const char* nm) {
    strncat(id, nm, sz - strlen(id) - 1);
  }
  Counted& operator=(const Counted& rv) {
    print("inside operator=\n\t");
    if(&rv == this) {
      out << "self-assignment" << endl;
      return *this;
    }
    // Clean up what you're using first:
    block->detach();
    block = rv.block; // Like copy-constructor
    block->attach();
    return *this;
  }
  // Decrement refcount, conditionally destroy
  ~Counted() {
    out << "preparing to destroy: " << id
      << "\n\tdecrementing refcount ";
    block->print();
    out << endl;
    block->detach();
  }
  // Copy-on-write:
  void write(char value) {
    unalias();
    block->set(value);
  }
  void print(const char* msg = "") {
    if(*msg) out << msg << " ";
    out << "object " << id << ": ";
    block->print();
    out << endl;
  }
};

int Counted::MemBlock::blockcount = 0;

int main() {
  Counted A("A"), B("B");
  Counted C(A);
  C.addname(" (C) ");
  A.print();
  B.print();
  C.print();
  B = A;
  A.print("after assignment\n\t");
  B.print();
  out << "Assigning C = C" << endl;
  C = C;
  C.print("calling C.write('x')\n\t");
  C.write('x');
  out << "\n exiting main()" << endl;
} ///:~
Now MemBlock contains a static data member blockcount to keep track of the number of blocks created, and to create a unique number (stored in blocknum) for each block so you can tell them apart. The destructor announces which block is being destroyed, and the print( ) function displays the block number and reference count.
The Counted class contains a buffer id to keep track of information about the object. The ordinary Counted constructor creates a new MemBlock object, assigns the result (a pointer to the MemBlock object on the heap) to block, and copies the identifier from its argument. In the copy-constructor, the identifier is copied from the source object and has the word “copy” appended to show where it’s copied from. Also, the addname( ) function lets you put additional information about the object in id (the actual identifier, so you can see what it is as well as where it’s copied from).
Here’s the output, which the program writes to rctrace.out:
object A: blocknum:0, refcount:2
object B: blocknum:1, refcount:1
object A copy (C) : blocknum:0, refcount:2
inside operator=
     object B: blocknum:1, refcount:1
    destroying block 1
after assignment
     object A: blocknum:0, refcount:3
object B: blocknum:0, refcount:3
Assigning C = C
inside operator=
     object A copy (C) : blocknum:0, refcount:3
self-assignment
calling C.write('x')
     object A copy (C) : blocknum:0, refcount:3
copied block, blocknum:2, refcount:1
from block, blocknum:0, refcount:2
 exiting main()
preparing to destroy: A copy (C)
    decrementing refcount blocknum:2, refcount:1
    destroying block 2
preparing to destroy: B
    decrementing refcount blocknum:0, refcount:2
preparing to destroy: A
    decrementing refcount blocknum:0, refcount:1
    destroying block 0
By studying the output, tracing through the source code, and experimenting with the program, you’ll deepen your understanding of these techniques.
Automatic operator= creation
Because assigning an object to another object of the same type is an activity most people expect to be possible, the compiler will automatically create a type::operator=(type) if you don’t make one. The behavior of this operator mimics that of the automatically created copy-constructor: If the class contains objects (or is derived from another class), the operator= for those member objects (and for the base class) is called recursively. This is called memberwise assignment. For example,
//: C12:Autoeq.cpp
// Automatic operator=()
#include <iostream>
using namespace std;

class Bar {
public:
  Bar& operator=(const Bar&) {
    cout << "inside Bar::operator=()" << endl;
    return *this;
  }
};

class MyType {
  Bar b;
};

int main() {
  MyType a, b;
  a = b; // Prints: "inside Bar::operator=()"
} ///:~
The automatically generated operator= for MyType calls Bar::operator=. Generally you don’t want to let the compiler do this for you. With classes of any sophistication (especially if they contain pointers!) you want to explicitly create an operator=. If you really don’t want people to perform assignment, declare operator= as a private function. (You don’t need to define it unless you’re using it inside the class.)
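Disallowing assignment can be sketched as follows. NoAssign follows the approach just described; NoAssign11 shows the "= delete" spelling that C++11 added for the same purpose. Both class names are hypothetical, and the check uses std::is_copy_assignable from the C++11 header <type_traits> because an actual assignment would stop the program from compiling.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class that forbids assignment as the text describes.
class NoAssign {
  NoAssign& operator=(const NoAssign&);  // private, never defined
public:
  NoAssign() {}
};

// Since C++11 the same intent can be written with "= delete".
class NoAssign11 {
public:
  NoAssign11() {}
  NoAssign11& operator=(const NoAssign11&) = delete;
};

void no_assign_demo() {
  // NoAssign a, b;
  // a = b;  // would not compile: operator= is private
  assert(!std::is_copy_assignable<NoAssign>::value);
  assert(!std::is_copy_assignable<NoAssign11>::value);
}
```

With the private declaration, an assignment attempted inside a member or friend compiles but fails at link time because the function has no definition; the deleted form is rejected by the compiler everywhere.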
© Copyright 1997-1999 CodeGuru