What’s in a string
In C, a string is simply an array of characters that always includes a binary
zero (often called the null terminator) as its final array element. There are
two significant differences between C++ strings and their C progenitors. First,
C++ string objects associate the array of characters that constitutes the
string with methods useful for managing and operating on it. A string also
contains certain “housekeeping” information about the size and storage location
of its data. Specifically, a C++ string object knows its starting location in
memory, its content, its length in characters, and the length in characters to
which it can grow before the string object must resize its internal data
buffer. This gives rise to the second big difference between C char arrays and
C++ strings. C++ strings do not depend on a null terminator: the length is
stored explicitly, and the string member functions do not rely on the existence
of a terminator to perform their jobs. C++ strings greatly reduce the
likelihood of making three of the most common and destructive C programming
errors: overwriting array bounds, trying to access arrays through uninitialized
or incorrectly valued pointers, and leaving pointers “dangling” after an array
ceases to occupy the storage that was once allocated to it.
The exact implementation of memory layout for the string class is not defined
by the C++ Standard. This architecture is intended to be flexible enough to
allow differing implementations by compiler vendors, yet guarantee predictable
behavior for users. In particular, the exact conditions under which storage is
allocated to hold data for a string object are not defined. String allocation
rules were originally formulated to allow but not require a reference-counted
implementation (the rules introduced with C++11 effectively rule such
implementations out), but whether or not the implementation uses reference
counting, the semantics must be the same. To put this a bit differently, in C,
every char array occupies a unique physical region of memory. In C++,
individual string objects may or may not occupy unique physical regions of
memory, but if reference counting is used to avoid storing duplicate copies of
data, the individual objects must look and act as though they do exclusively
own unique regions of storage. For example:
//: C17:StringStorage.cpp
#include <string>
#include <iostream>
using namespace std;
int main() {
  string s1("12345");
  // Set the iterator to indicate the first element
  string::iterator it = s1.begin();
  // This may copy s1 into s2 or
  // use reference counting to simulate a copy
  string s2 = s1;
  // Either way, this statement must modify ONLY s1
  *it = '0';
  cout << "s1 = " << s1 << endl;
  cout << "s2 = " << s2 << endl;
} ///:~
Reference counting may serve to make an implementation more memory efficient,
but it is transparent to users of the string class.
Creating and initializing C++ strings
Creating and initializing strings is a straightforward proposition, and fairly
flexible as well. In the example shown below, the first string, imBlank, is
declared but contains no initial value. Unlike a C char array, which would
contain a random and meaningless bit pattern until initialization, imBlank does
contain meaningful information. This string object has been initialized to hold
“no characters,” and can properly report its 0 length and absence of data
elements through the use of class member functions.
The next string, heyMom, is initialized by the literal argument "Where are my
socks?". This form of initialization uses a quoted character array as a
parameter to the string constructor. By contrast, standardReply is initialized
with the ‘=’ syntax, which looks like an assignment but still invokes a
constructor. The last string of the group, useThisOneAgain, is initialized
using an existing C++ string object. Put another way, this example illustrates
that string objects let you:
- Create an empty string and defer initializing it with character data
- Initialize a string by passing a literal, quoted character array as an
  argument to the constructor
- Initialize a string using ‘=’
- Use one string to initialize another
//: C17:SmallString.cpp
#include <string>
using namespace std;
int main() {
  string imBlank;
  string heyMom("Where are my socks?");
  string standardReply = "Beamed into deep "
    "space on wide angle dispersion?";
  string useThisOneAgain(standardReply);
} ///:~
These are the simplest forms of string initialization, but there are other
variations which offer more flexibility and control. You can:
- Use a portion of either a C char array or a C++ string
- Combine different sources of initialization data using operator+
- Use the string object’s substr( ) member function to create a substring
//: C17:SmallString2.cpp
#include <string>
using namespace std;
int main() {
  string s1
    ("What is the sound of one clam napping?");
  string s2
    ("Anything worth doing is worth overdoing.");
  string s3("I saw Elvis in a UFO.");
  // Copy the first 8 chars
  string s4(s1, 0, 8);
  // Copy 6 chars from the middle of the source
  string s5(s2, 15, 6);
  // Copy from middle to end
  string s6(s3, 6, 15);
  // Copy all sorts of stuff
  string quoteMe = s4 + "that" +
    // substr() copies 10 chars at element 20
    s1.substr(20, 10) + s5 +
    // substr() copies up to either 100 char
    // or eos starting at element 5
    "with" + s3.substr(5, 100) +
    // OK to copy a single char this way
    s1.substr(37, 1);
} ///:~
The string member function substr( ) takes a starting position as its first
argument and the number of characters to select as the second argument. Both of
these arguments have default values and if you say substr( ) with an empty
argument list you produce a copy of the entire string, so this is a convenient
way to duplicate a string.
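Here’s what the string quoteMe contains after the initialization shown above:
"What is that one clam doing with Elvis in a UFO.?"
The period comes from s3.substr(5, 100), which runs to the end of s3, and the
question mark is the single character copied by s1.substr(37, 1).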
Notice the final statement of the example above. C++ allows string
initialization techniques to be mixed in a single statement, a flexible and
convenient feature. Also note that the last initializer copies just one
character from the source string.
Another slightly more subtle initialization technique involves the use of the
string iterators returned by begin( ) and end( ). This treats a string like a
container object (which you’ve seen primarily in the form of vector so far in
this book; you’ll see many more containers soon) which has iterators indicating
the start and end of the “container.” This way you can hand a string
constructor two iterators and it will copy from one to the other into the new
string:
//: C17:StringIterators.cpp
#include <string>
#include <iostream>
using namespace std;
int main() {
  string source("xxx");
  string s(source.begin(), source.end());
  cout << s << endl;
} ///:~
The iterators are not restricted to begin( ) and end( ), so you can choose a
subset of characters from the source string.
Initialization limitations
C++ strings may not be initialized with single characters or with ASCII or
other integer values.
//: C17:UhOh.cpp
#include <string>
using namespace std;
int main() {
  // Error: no single char inits
  //! string nothingDoing1('a');
  // Error: no integer inits
  //! string nothingDoing2(0x37);
} ///:~
This is true both for initialization with ‘=’ and for initialization with
constructor (parenthesized) syntax.