MFC Programmer's SourceBook : Thinking in C++
Bruce Eckel's Thinking in C++, 2nd Ed

Overloading assignment

Assignment is a common source of confusion for new C++ programmers. This is no doubt because = is such a fundamental operator in programming, right down to copying a register at the machine level. In addition, the copy-constructor (from the previous chapter) can also be invoked when you use the = sign:

MyType b;
MyType a = b;
a = b; 

In the second line, the object a is being defined. A new object is being created where one didn’t exist before. Because you know by now how defensive the C++ compiler is about object initialization, you know that a constructor must always be called at the point where an object is defined. But which constructor? a is being created from an existing MyType object, so there’s only one choice: the copy-constructor. So even though an equal sign is involved, the copy-constructor is called.

In the third line, things are different. On the left side of the equal sign, there’s a previously initialized object. Clearly, you don’t call a constructor for an object that’s already been created. In this case MyType::operator= is called for a, taking as an argument whatever appears on the right-hand side. (You can have multiple operator= functions to take different right-hand arguments.)
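The difference between the two lines can be observed directly. The following is a minimal sketch, assuming a hypothetical class Traced (not part of the book's code) that records which of its special member functions ran last:

```cpp
#include <cassert>
#include <string>

// Hypothetical class (not from the book) that records which
// special member function ran most recently.
class Traced {
  std::string last;
public:
  Traced() : last("default constructor") {}
  Traced(const Traced&) : last("copy-constructor") {}
  Traced& operator=(const Traced&) {
    last = "operator=";
    return *this;
  }
  const std::string& lastCall() const { return last; }
};

// What runs for "Traced a = b;"?
std::string initWithEquals() {
  Traced b;
  Traced a = b; // a is being defined: copy-constructor
  return a.lastCall();
}

// What runs for "a = b;" when a already exists?
std::string assignExisting() {
  Traced b;
  Traced a;
  a = b; // a was previously initialized: operator=
  return a.lastCall();
}

// Call from main( ) to check both cases.
void checkTraced() {
  assert(initWithEquals() == "copy-constructor");
  assert(assignExisting() == "operator=");
}
```

Both statements contain an equal sign, but only the second one reaches Traced::operator=.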

This behavior is not restricted to the copy-constructor. Any time you’re initializing an object using an = instead of the ordinary function-call form of the constructor, the compiler will look for a constructor that accepts whatever is on the right-hand side:

//: C12:FeeFi.cpp
// Copying vs. initialization

class Fi {
public:
  Fi() {}
};

class Fee {
public:
  Fee(int) {}
  Fee(const Fi&) {}
};

int main() {
  Fee f = 1; // Fee(int)
  Fi fi;
  Fee fum = fi; // Fee(const Fi&)
} ///:~ 

When dealing with the = sign, it’s important to keep this distinction in mind: If the object hasn’t been created yet, initialization is required; otherwise the assignment operator= is used.

It’s even better to avoid using = for initialization at all and always use the explicit constructor form instead. The definition of fum then becomes

Fee fum(fi);

This way, you’ll avoid confusing your readers.

Behavior of operator=

In Binary.cpp, you saw that operator= can be only a member function. It is intimately connected to the object on the left side of the =, and if you could define operator= globally, you could try to redefine the built-in = sign:

int operator=(int, MyType); // Global = not allowed!

The compiler skirts this whole issue by forcing you to make operator= a member function.

When you create an operator=, you must copy all the necessary information from the right-hand object into yourself to perform whatever you consider “assignment” for your class. For simple objects, this is obvious:

//: C12:Simpcopy.cpp
// Simple operator=()
#include <iostream>
using namespace std;

class Value {
  int a, b;
  float c;
public:
  Value(int aa = 0, int bb = 0, float cc = 0.0) {
    a = aa;
    b = bb;
    c = cc;
  }
  Value& operator=(const Value& rv) {
    a = rv.a;
    b = rv.b;
    c = rv.c;
    return *this;
  }
  friend ostream&
    operator<<(ostream& os, const Value& rv) {
      return os << "a = " << rv.a << ", b = "
        << rv.b << ", c = " << rv.c;
    }
};

int main() {
  Value A, B(1, 2, 3.3);
  cout << "A: " << A << endl;
  cout << "B: " << B << endl;
  A = B;
  cout << "A after assignment: " << A << endl;
} ///:~ 

Here, the object on the left side of the = copies all the elements of the object on the right, then returns a reference to itself, so a more complex expression can be created.

A common mistake was made in this example. When you’re assigning two objects of the same type, you should always check first for self-assignment: Is the object being assigned to itself? In some cases, such as this one, it’s harmless if you perform the assignment operations anyway, but if changes are made to the implementation of the class, it can make a difference, and if you don’t do it as a matter of habit, you may forget and cause hard-to-find bugs.
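As a sketch of a case where the check does make a difference, assume a hypothetical class Buffer (not part of the book's code) that owns a block of memory whose size can vary. Its operator= must release the old block before allocating a new one, so without the check a self-assignment would free the very block it is about to copy from:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical class: owns a malloc'd block of variable size.
// Sizes are assumed to be greater than zero.
class Buffer {
  char* p;
  std::size_t n;
  static char* allocate(std::size_t size) {
    char* mem = static_cast<char*>(std::malloc(size));
    if(mem == 0) std::abort(); // Out of memory
    return mem;
  }
public:
  Buffer(std::size_t size, char fill)
    : p(allocate(size)), n(size) {
    std::memset(p, fill, n);
  }
  Buffer(const Buffer& rv) : p(allocate(rv.n)), n(rv.n) {
    std::memcpy(p, rv.p, n);
  }
  Buffer& operator=(const Buffer& rv) {
    // Without this check, the free( ) below would release
    // rv.p as well when rv and *this are the same object:
    if(&rv == this) return *this;
    std::free(p);
    p = allocate(rv.n);
    n = rv.n;
    std::memcpy(p, rv.p, n);
    return *this;
  }
  ~Buffer() { std::free(p); }
  std::size_t size() const { return n; }
  char front() const { return p[0]; }
};

// Call from main( ) to check that self-assignment is harmless.
void checkSelfAssignment() {
  Buffer b(8, 'q');
  Buffer& alias = b;
  b = alias; // Self-assignment through a reference
  assert(b.size() == 8);
  assert(b.front() == 'q');
}
```

Self-assignment rarely appears as literally as b = b; it usually arrives through a reference or pointer that happens to refer to the same object, as alias does here.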

Pointers in classes

What happens if the object is not so simple? For example, what if the object contains pointers to other objects? Simply copying a pointer means you’ll end up with two objects pointing to the same storage location. In situations like these, you need to do bookkeeping of your own.
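The aliasing can be seen in a minimal sketch. The class Alias below is hypothetical, and it deliberately does not own the storage it points to, so the demonstration stays safe. If the class did own the storage, both destructors would try to release the same block.

```cpp
#include <cassert>

// Hypothetical class: holds a pointer it does not own and relies
// on the compiler-generated copy operations.
class Alias {
  int* p;
public:
  explicit Alias(int* target) : p(target) {}
  void set(int v) { *p = v; }
  int get() const { return *p; }
  const int* address() const { return p; }
};

// Call from main( ) to see that a copy shares the original's storage.
void checkSharedStorage() {
  int storage = 1;
  Alias first(&storage);
  Alias second = first; // Only the pointer is copied
  second.set(42);       // Writes through the shared pointer
  assert(first.address() == second.address());
  assert(first.get() == 42); // first sees second's write
}
```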

There are two common approaches to this problem. The simplest technique is to copy whatever the pointer refers to whenever you perform an assignment or a copy-construction. This is very straightforward:

//: C12:Copymem.cpp {O}
// Duplicate during assignment
#include <cstdlib>
#include <cstring>
#include "../require.h"
using namespace std;

class WithPointer {
  char* p;
  enum { blocksz = 100 };
public:
  WithPointer() {
    p = (char*)malloc(blocksz);
    require(p != 0);
    memset(p, 1, blocksz);
  }
  WithPointer(const WithPointer& wp) {
    p = (char*)malloc(blocksz);
    require(p != 0);
    memcpy(p, wp.p, blocksz);
  }
  WithPointer&
  operator=(const WithPointer& wp) {
    // Check for self-assignment:
    if(&wp != this)
      memcpy(p, wp.p, blocksz);
    return *this;
  }
  ~WithPointer() {
    free(p);
  }
}; ///:~ 

This shows the four functions you will always need to define when your class contains pointers: all necessary ordinary constructors, the copy-constructor, operator= (either define it or disallow it), and a destructor. The operator= checks for self-assignment as a matter of course, even though it’s not strictly necessary here. This virtually eliminates the possibility that you’ll forget to check for self-assignment if you do change the code so that it matters.

Here, the constructors allocate the memory and initialize it, the operator= copies it, and the destructor frees the memory. However, if you’re dealing with a lot of memory or a high overhead to initialize that memory, you may want to avoid this copying. A very common approach to this problem is called reference counting. You make the block of memory smart, so it knows how many objects are pointing to it. Then copy-construction or assignment means attaching another pointer to an existing block of memory and incrementing the reference count. Destruction means reducing the reference count and destroying the object if the reference count goes to zero.

But what if you want to write to the block of memory? More than one object may be using this block, so you’d be modifying someone else’s block as well as yours, which doesn’t seem very neighborly. To solve this problem, an additional technique called copy-on-write is often used. Before writing to a block of memory, you make sure no one else is using it. If the reference count is greater than one, you must make yourself a personal copy of that block before writing it, so you don’t disturb someone else’s turf. Here’s a simple example of reference counting and copy-on-write:

//: C12:Refcount.cpp
// Reference count, copy-on-write
#include <cstring>
#include "../require.h"
using namespace std;

class Counted {
  class MemBlock {
    enum { size = 100 };
    char c[size];
    int refcount;
  public:
    MemBlock() {
      memset(c, 1, size);
      refcount = 1;
    }
    MemBlock(const MemBlock& rv) {
      memcpy(c, rv.c, size);
      refcount = 1;
    }
    void attach() { ++refcount; }
    void detach() {
      require(refcount != 0);
      // Destroy object if no one is using it:
      if(--refcount == 0) delete this;
    }
    int count() const { return refcount; }
    void set(char x) { memset(c, x, size); }
    // Conditionally copy this MemBlock.
    // Call before modifying the block; assign
    // resulting pointer to your block;
    MemBlock* unalias() {
      // Don't duplicate if not aliased:
      if(refcount == 1) return this;
      --refcount;
      // Use copy-constructor to duplicate:
      return new MemBlock(*this);
    }
  } * block;
public:
  Counted() {
    block = new MemBlock; // Sneak preview
  }
  Counted(const Counted& rv) {
    block = rv.block; // Pointer assignment
    block->attach();
  }
  void unalias() { block = block->unalias(); }
  Counted& operator=(const Counted& rv) {
    // Check for self-assignment:
    if(&rv == this) return *this;
    // Clean up what you're using first:
    block->detach();
    block = rv.block; // Like copy-constructor
    block->attach();
    return *this;
  }
  // Decrement refcount, conditionally destroy
  ~Counted() { block->detach(); }
  // Copy-on-write:
  void write(char value) {
    // Do this before any write operation:
    unalias();
    // It's safe to write now.
    block->set(value);
  }
};

int main() {
  Counted A, B;
  Counted C(A);
  B = A;
  C = C;
  C.write('x');
} ///:~ 

The nested class MemBlock is the block of memory pointed to. (Notice the pointer block defined at the end of the nested class.) It contains a reference count and functions to control and read the reference count. There’s a copy-constructor so you can make a new MemBlock from an existing one.

The attach( ) function increments the reference count of a MemBlock to indicate there’s another object using it. detach( ) decrements the reference count. If the reference count goes to zero, then no one is using it anymore, so the member function destroys its own object by saying delete this.

You can modify the memory with the set( ) function, but before you make any modifications, you should ensure that you aren’t walking on a MemBlock that some other object is using. You do this by calling Counted::unalias( ), which in turn calls MemBlock::unalias( ). The latter function will return the block pointer if the reference count is one (meaning no one else is pointing to that block), but will duplicate the block if the reference count is more than one.

This example includes a sneak preview of the next chapter. Instead of C’s malloc( ) and free( ) to create and destroy the objects, the special C++ operators new and delete are used. For this example, you can think of new and delete just like malloc( ) and free( ), except new calls the constructor after allocating memory, and delete calls the destructor before freeing the memory.
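A minimal sketch of that difference, assuming a hypothetical class Tracked (not part of the book's code) that counts how many of its objects are currently alive:

```cpp
#include <cassert>

// Hypothetical class that counts its live objects.
class Tracked {
  static int live;
public:
  Tracked() { ++live; }
  ~Tracked() { --live; }
  static int count() { return live; }
};

int Tracked::live = 0;

// Call from main( ) to check that new and delete run the
// constructor and destructor.
void checkNewDelete() {
  int before = Tracked::count();
  Tracked* t = new Tracked; // Allocates, then calls the constructor
  assert(Tracked::count() == before + 1);
  delete t;                 // Calls the destructor, then frees
  assert(Tracked::count() == before);
}
```

With malloc( ) and free( ) the count would never change, because neither function knows anything about constructors or destructors.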

The copy-constructor, instead of creating its own memory, assigns block to the block of the source object. Then, because there’s now an additional object using that block of memory, it increments the reference count by calling MemBlock::attach( ).

The operator= deals with an object that has already been created on the left side of the =, so it must first clean that up by calling detach( ) for that MemBlock, which will destroy the old MemBlock if no one else is using it. Then operator= repeats the behavior of the copy-constructor. Notice that it first checks to detect whether you’re assigning the same object to itself.

The destructor calls detach( ) to conditionally destroy the MemBlock.

To implement copy-on-write, you must control all the actions that write to your block of memory. This means you can’t ever hand a raw pointer to the outside world. Instead you say, “Tell me what you want done and I’ll do it for you!” For example, the write( ) member function allows you to change the values in the block of memory. But first, it uses unalias( ) to prevent the modification of an aliased block (a block with more than one Counted object using it).

main( ) tests the various functions that must work correctly to implement reference counting: the constructor, copy-constructor, operator=, and destructor. It also tests the copy-on-write by calling the write( ) function for object C, which is aliased to A’s memory block.
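For comparison, here is a sketch of the same idea using a different technique from the one in Refcount.cpp: the standard library's std::shared_ptr (C++11 and later) keeps the reference count, so no hand-written attach( ) or detach( ) is needed. The class name SharedCounted is hypothetical, and the use_count( ) test assumes a single-threaded program.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Sketch only: std::shared_ptr maintains the count that MemBlock
// maintains by hand in Refcount.cpp.
class SharedCounted {
  std::shared_ptr<std::vector<char> > block;
public:
  SharedCounted()
    : block(std::make_shared<std::vector<char> >(100, char(1))) {}
  // The compiler-generated copy-constructor, operator= and
  // destructor share and release the block automatically.
  long users() const { return block.use_count(); }
  char first() const { return (*block)[0]; }
  // Copy-on-write: duplicate the block if it is shared.
  void write(char value) {
    if(block.use_count() > 1)
      block = std::make_shared<std::vector<char> >(*block);
    for(std::size_t i = 0; i < block->size(); ++i)
      (*block)[i] = value;
  }
};

// Call from main( ) to check sharing and copy-on-write.
void checkCopyOnWrite() {
  SharedCounted a;
  SharedCounted c(a); // Shares a's block
  assert(a.users() == 2);
  c.write('x');       // c makes a private copy first
  assert(a.first() == 1);
  assert(c.first() == 'x');
  assert(a.users() == 1 && c.users() == 1);
}
```

The rule from the hand-written version still applies: every write has to go through a member function such as write( ), so the class can make its private copy first.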

Tracing the output

To verify that the behavior of this scheme is correct, the best approach is to add information and functionality to the class to generate a trace output that can be analyzed. Here’s Refcount.cpp with added trace information:

//: C12:Rctrace.cpp
// Refcount.cpp with trace info
#include <cstring>
#include <fstream>
#include "../require.h"
using namespace std;

ofstream out("rctrace.out");

class Counted {
  class MemBlock {
    enum { size = 100 };
    char c[size];
    int refcount;
    static int blockcount;
    int blocknum;
  public:
    MemBlock() {
      memset(c, 1, size);
      refcount = 1;
      blocknum = blockcount++;
    }
    MemBlock(const MemBlock& rv) {
      memcpy(c, rv.c, size);
      refcount = 1;
      blocknum = blockcount++;
      print("copied block");
      out << endl;
      rv.print("from block");
    }
    ~MemBlock() {
      out << "\tdestroying block "
          << blocknum << endl;
    }
    void print(const char* msg = "") const {
      if(*msg) out << msg << ", ";
      out << "blocknum:" << blocknum;
      out << ", refcount:" << refcount;
    }
    void attach() { ++refcount; }
    void detach() {
      require(refcount != 0);
      // Destroy object if no one is using it:
      if(--refcount == 0) delete this;
    }
    int count() const { return refcount; }
    void set(char x) { memset(c, x, size); }
    // Conditionally copy this MemBlock.
    // Call before modifying the block; assign
    // resulting pointer to your block;
    MemBlock* unalias() {
      // Don't duplicate if not aliased:
      if(refcount == 1) return this;
      --refcount;
      // Use copy-constructor to duplicate:
      return new MemBlock(*this);
    }
  } * block;
  enum { sz = 30 };
  char id[sz];
public:
  Counted(const char* ID = "tmp") {
    block = new MemBlock; // Sneak preview
    strncpy(id, ID, sz - 1);
    id[sz - 1] = '\0'; // strncpy may not terminate
  }
  Counted(const Counted& rv) {
    block = rv.block; // Pointer assignment
    block->attach();
    strncpy(id, rv.id, sz - 1);
    id[sz - 1] = '\0';
    // Leave room for the terminating null:
    strncat(id, " copy", sz - strlen(id) - 1);
  }
  void unalias() { block = block->unalias(); }
  void addname(const char* nm) {
    strncat(id, nm, sz - strlen(id) - 1);
  }
  Counted& operator=(const Counted& rv) {
    print("inside operator=\n\t");
    if(&rv == this) {
      out << "self-assignment" << endl;
      return *this;
    }
    // Clean up what you're using first:
    block->detach();
    block = rv.block; // Like copy-constructor
    block->attach();
    return *this;
  }
  // Decrement refcount, conditionally destroy
  ~Counted() {
    out << "preparing to destroy: " << id
        << "\n\tdecrementing refcount ";
    block->print();
    out << endl;
    block->detach();
  }
  // Copy-on-write:
  void write(char value) {
    unalias();
    block->set(value);
  }
  void print(const char* msg = "") {
    if(*msg) out << msg << " ";
    out << "object " << id << ": ";
    block->print();
    out << endl;
  }
};

int Counted::MemBlock::blockcount = 0;

int main() {
  Counted A("A"), B("B");
  Counted C(A);
  C.addname(" (C) ");
  A.print();
  B.print();
  C.print();
  B = A;
  A.print("after assignment\n\t");
  B.print();
  out << "Assigning C = C" << endl;
  C = C;
  C.print("calling C.write('x')\n\t");
  C.write('x');
  out << "\n exiting main()" << endl;
} ///:~ 

Now MemBlock contains a static data member blockcount to keep track of the number of blocks created, and to create a unique number (stored in blocknum) for each block so you can tell them apart. The destructor announces which block is being destroyed, and the print( ) function displays the block number and reference count.

The Counted class contains a buffer id to keep track of information about the object. The Counted constructor creates a new MemBlock object and assigns the result (a pointer to the MemBlock object on the heap) to block, then copies the identifier from its argument. The copy-constructor copies the identifier from the source object and appends the word “copy” to show where it’s copied from. Also, the addname( ) function lets you put additional information about the object in id (the actual identifier, so you can see what it is as well as where it’s copied from).

Here’s the output:

object A: blocknum:0, refcount:2
object B: blocknum:1, refcount:1
object A copy (C) : blocknum:0, refcount:2
inside operator=
   object B: blocknum:1, refcount:1
  destroying block 1
after assignment
   object A: blocknum:0, refcount:3
object B: blocknum:0, refcount:3
Assigning C = C
inside operator=
   object A copy (C) : blocknum:0, refcount:3
self-assignment
calling C.write('x')
   object A copy (C) : blocknum:0, refcount:3
copied block, blocknum:2, refcount:1
from block, blocknum:0, refcount:2
exiting main()
preparing to destroy: A copy (C)
  decrementing refcount blocknum:2, refcount:1
  destroying block 2
preparing to destroy: B
  decrementing refcount blocknum:0, refcount:2
preparing to destroy: A
  decrementing refcount blocknum:0, refcount:1
  destroying block 0

By studying the output, tracing through the source code, and experimenting with the program, you’ll deepen your understanding of these techniques.

Automatic operator= creation

Because assigning an object to another object of the same type is an activity most people expect to be possible, the compiler will automatically create a type::operator=(type) if you don’t make one. The behavior of this operator mimics that of the automatically created copy-constructor: If the class contains member objects (or is derived from another class), the operator= for those member objects (and for the base class) is called recursively. This is called memberwise assignment. For example,

//: C12:Autoeq.cpp
// Automatic operator=()
#include <iostream>
using namespace std;

class Bar {
public:
  Bar& operator=(const Bar&) {
    cout << "inside Bar::operator=()" << endl;
    return *this;
  }
};

class MyType {
  Bar b;
};

int main() {
  MyType a, b;
  a = b; // Prints: "inside Bar::operator=()"
} ///:~ 

The automatically generated operator= for MyType calls Bar::operator=.
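The same holds for a base class. In the following sketch the classes Base, Member and Derived are hypothetical; Base and Member count the calls to their own operator=, and Derived declares no operator= at all:

```cpp
#include <cassert>

// Hypothetical classes that count calls to their operator=.
class Base {
public:
  static int calls;
  Base& operator=(const Base&) { ++calls; return *this; }
};

class Member {
public:
  static int calls;
  Member& operator=(const Member&) { ++calls; return *this; }
};

int Base::calls = 0;
int Member::calls = 0;

class Derived : public Base {
  Member m;
  // No operator= declared: the compiler generates one.
};

// Call from main( ) to check that the generated operator=
// assigns both the base-class part and the member.
void checkGeneratedAssignment() {
  Base::calls = 0;
  Member::calls = 0;
  Derived a, b;
  a = b;
  assert(Base::calls == 1);
  assert(Member::calls == 1);
}
```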

Generally you don’t want to let the compiler do this for you. With classes of any sophistication (especially if they contain pointers!) you want to explicitly create an operator=. If you really don’t want people to perform assignment, declare operator= as a private function. (You don’t need to define it unless you’re using it inside the class.)
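Disallowing assignment this way might look like the following sketch. The class NoAssign is hypothetical, and the static_assert with std::is_copy_assignable assumes a C++11 or later compiler. (From C++11 on you can also declare the operator with = delete instead of making it private.)

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class whose objects cannot be assigned.
class NoAssign {
  int id;
  // Declared private and never defined:
  NoAssign& operator=(const NoAssign&);
public:
  explicit NoAssign(int i) : id(i) {}
  int get() const { return id; }
};

// Compile-time check (C++11): outside code cannot assign.
static_assert(!std::is_copy_assignable<NoAssign>::value,
  "NoAssign must not be assignable");

// Call from main( ); the objects keep their original values.
void checkNoAssign() {
  NoAssign a(1), b(2);
  // a = b; // Error: NoAssign::operator= is private
  assert(a.get() == 1);
  assert(b.get() == 2);
}
```

Because the operator is only declared, any assignment attempted inside the class itself would compile but fail at link time.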

© Copyright 1997-1999 CodeGuru