CGI programming in C++
The
World-Wide Web has become the common tongue of connectivity on planet earth. It
began as simply a way to publish primitively-formatted documents in a way that
everyone could read them regardless of the machine they were using. The
documents are created in
hypertext
markup language
(HTML) and placed on a central server machine where they are handed to anyone
who asks. The documents are requested and read using a
web
browser
that has been written or ported to each particular platform.
Very
quickly, just reading a document was not enough and people wanted to be able to
collect information from the clients, for example to take orders or allow
database lookups from the server. Many different approaches to
client-side
programming
have been tried such as Java applets, Javascript, and other scripting or
programming languages. Unfortunately, whenever you publish something on the
Internet you face the problem of a whole history of browsers, some of which may
support the particular flavor of your client-side programming tool, and some
which won’t. The only reliable and well-established solution
[65]
to this problem is to use straight HTML (which has a very limited way to
collect and submit information from the client) and
common
gateway interface
(CGI) programs that are run on the server. The Web server takes an encoded
request submitted via an HTML page and responds by invoking a CGI program and
handing it the encoded data from the request. This
request is classified as either a “GET”
or a “POST”
(the meaning of which will be explained later) and if you look at the URL
window in your Web browser when you push a “submit” button on a
page you’ll often be able to see the encoded request and information.
CGI
can seem a bit intimidating at first, but it turns out that it’s just
messy, and not all that difficult to write. (An innocent statement that’s
true of many things –
after
you understand them.) A CGI program is quite straightforward since it takes its
input from environment variables and standard input, and sends its output to
standard output. However, there is some decoding that must be done in order to
extract the data that’s been sent to you from the client’s web
page. In this section you’ll get a crash
course in CGI programming, and we’ll develop tools that will perform the
decoding for the two different types of CGI submissions (GET and POST). These
tools will allow you to easily write a CGI program to solve any problem. Since
C++ exists on virtually all machines that have Web servers (and you can get GNU
C++ free for virtually any platform), the solution presented here is quite
portable.
Encoding data for CGI
To
submit data to a CGI program, the HTML “form” tag is used. The
following very simple HTML page contains a form that has one user-input field
along with a “submit” button:
//:! C26:SimpleForm.html
<HTML><HEAD>
<TITLE>A simple HTML form</TITLE></HEAD>
Test, uses standard html GET
<Form method="GET" ACTION="/cgi-bin/CGI_GET.exe">
<P>Field1: <INPUT TYPE = "text" NAME = "Field1"
VALUE = "This is a test" size = "40"></p>
<p><input type = "submit" name = "submit" > </p>
</Form></HTML>
///:~
Everything between the <Form and the </Form> is part of this form. (You can
have multiple forms on a single page, but each one is controlled by its own
method and submit button.) The “method” can be either “get” or “post,” and
the “action” is what the server does when it receives the form data: it calls
a program. Each form has a method, an action, and a submit button, and the
rest of the form consists of input fields. The most commonly used input field
is shown here: a text field. However, you can also have things like check
boxes, drop-down selection lists and radio buttons.
CGI_GET.exe
is the name of the executable program that resides in the directory
that’s typically called “cgi-bin” on your Web server.
[66]
(If the named program is not in the cgi-bin directory, you won’t see any
results.) Many Web servers are Unix machines (mine runs Linux) that don’t
traditionally use the
.exe
extension for their executable programs, but you can call the program anything
you want under Unix. By using the
.exe
extension
the program can be tested without change under most operating systems.
If
you fill out this form and press the “submit” button, in the URL
address window of your browser you will see something like:
http://www.pooh.com/cgi-bin/CGI_GET.exe?Field1=
This+is+a+test&submit=Submit+Query
(Without the line break, of course.) Here you see a little bit of the way that
data is encoded to send to CGI. For one thing, spaces are not allowed (since
spaces typically separate command-line arguments), so each space is replaced
by a ‘+’ sign. In addition, each field consists of the field name (which is
determined by the form on the HTML page) followed by an ‘=’ and the field
data, and the fields are separated from each other by ‘&’.
At this point, you might wonder about the ‘+’, ‘=’, and ‘&’. What if those are
used in the field, as in “John & Marsha Smith”? This is encoded to:
John+%26+Marsha+Smith
That is, the special character is turned into a ‘%’ followed by its ASCII
value in hex. Fortunately, the web browser automatically performs all encoding
for you.
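Although the browser does this work for you, it can help to see the rule written out. The following is a minimal sketch, not code from this chapter: encodeURLString( ) is a hypothetical helper, and the set of characters it leaves untouched (letters, digits, and - _ . *) is an assumption about typical form encoding rather than a description of any particular browser.

```cpp
#include <string>

// Hypothetical helper (not part of the chapter's code): encodes one
// form field roughly the way a browser does before submitting it.
std::string encodeURLString(const std::string& text) {
  static const char hexDigits[] = "0123456789ABCDEF";
  std::string result;
  for(std::string::size_type i = 0; i < text.size(); i++) {
    unsigned char c = static_cast<unsigned char>(text[i]);
    // Assumption: these characters are sent unchanged.
    bool unreserved =
      (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
      (c >= '0' && c <= '9') ||
      c == '-' || c == '_' || c == '.' || c == '*';
    if(unreserved)
      result += static_cast<char>(c);
    else if(c == ' ')
      result += '+';                  // A space becomes '+'
    else {
      result += '%';                  // Anything else becomes %XX
      result += hexDigits[c >> 4];    // High four bits
      result += hexDigits[c & 0x0F];  // Low four bits
    }
  }
  return result;
}
```

Calling encodeURLString("John & Marsha Smith") produces the encoded form described in the text, because the ‘&’ character has the ASCII value 26 in hex.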
The CGI parser
There
are many examples of CGI programs written using Standard C. One argument for
doing this is that Standard C can be found virtually everywhere. However, C++
has become quite ubiquitous, especially in the form of the GNU
C++ Compiler
[67]
(
g++)
that
can be downloaded free from the Internet for virtually any platform (and often
comes pre-installed with operating systems such as Linux). As you will see,
this means that you can get the benefit of object-oriented programming in a CGI
program.
Since what we’re concerned with when parsing the CGI information is the field
name-value pairs, one class (CGIpair) will be used to represent a single
name-value pair and a second class (CGImap) will use CGIpair to parse each
name-value pair that is submitted from the HTML form into keys and values.
CGImap holds these in a vector of CGIpair that can be indexed by field name,
like a map of strings, so you can easily fetch the value for each field at
your leisure.
One of the reasons for using C++ here is the convenience of the STL, in
particular its containers and the operator[ ] syntax made familiar by the map
class, which gives you a nice way to extract the data for each field. CGImap
borrows that syntax while storing its elements in a vector (the reason is
explained after the listing), and you’ll see that it is a fairly short
definition considering how powerful it is.
The
project will start with a reusable portion, which consists of
CGIpair
and
CGImap
in a header file. Normally you should avoid cramming this much code into a
header file, but for these examples it’s convenient and it doesn’t
hurt anything:
//: C26:CGImap.h
// Tools for extracting and decoding data
// from CGI GETs and POSTs.
#include <cstdlib>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
class CGIpair : public pair<string, string> {
public:
  CGIpair() {}
  CGIpair(string name, string value) {
    first = decodeURLString(name);
    second = decodeURLString(value);
  }
  // Automatic type conversion for boolean test:
  operator bool() const {
    return (first.length() != 0);
  }
private:
  static string decodeURLString(string URLstr) {
    const string::size_type len = URLstr.length();
    string result;
    for(string::size_type i = 0; i < len; i++) {
      if(URLstr[i] == '+')
        result += ' ';
      // Only decode when both hex digits are present:
      else if(URLstr[i] == '%' && i + 2 < len) {
        result += char(
          translateHex(URLstr[i + 1]) * 16 +
          translateHex(URLstr[i + 2]));
        i += 2; // Move past hex code
      } else // An ordinary character
        result += URLstr[i];
    }
    return result;
  }
  // Translate a single hex character; used by
  // decodeURLString(). Does not validate its input:
  static char translateHex(char hex) {
    if(hex >= 'A')
      return (hex & 0xdf) - 'A' + 10;
    else
      return hex - '0';
  }
};
// Parses any CGI query and turns it into an
// STL vector of CGIpair which has an associative
// lookup operator[] like a map. A vector is used
// instead of a map because it keeps the original
// ordering of the fields in the Web page form.
class CGImap : public vector<CGIpair> {
  string gq;
  // Prevent assignment and copy-construction:
  void operator=(CGImap&);
  CGImap(CGImap&);
public:
  CGImap(string query) : gq(query) {
    CGIpair p;
    // An empty CGIpair converts to false:
    while((p = nextPair()))
      push_back(p);
  }
  // Look something up, as if it were a map:
  string operator[](const string& key) {
    iterator i = begin();
    while(i != end()) {
      if((*i).first == key)
        return (*i).second;
      i++;
    }
    return string(); // Empty string == not found
  }
  void dump(ostream& o, string nl = "<br>") {
    for(iterator i = begin(); i != end(); i++) {
      o << (*i).first << " = "
        << (*i).second << nl;
    }
  }
private:
  // Produces name-value pairs from the query
  // string. Returns an empty Pair when there's
  // no more query string left:
  CGIpair nextPair() {
    if(gq.length() == 0)
      return CGIpair(); // No more input
    string::size_type eq = gq.find('=');
    if(eq == string::npos)
      return CGIpair(); // Error, return empty
    string name = gq.substr(0, eq);
    gq = gq.substr(eq + 1);
    string::size_type amp = gq.find('&');
    string value = gq.substr(0, amp);
    if(amp == string::npos)
      gq.erase(); // That was the last pair
    else
      gq = gq.substr(amp + 1);
    return CGIpair(name, value);
  }
};
// Helper class for getting POST data:
class Post : public string {
public:
  Post() {
    // For a CGI "POST," the server puts the
    // length of the content string in the
    // environment variable CONTENT_LENGTH:
    char* clen = getenv("CONTENT_LENGTH");
    if(clen == 0) {
      cout << "No CONTENT_LENGTH, Make sure "
        "this is a POST and not a GET" << endl;
      return;
    }
    int len = atoi(clen);
    if(len <= 0)
      return; // Nothing to read
    char* s = new char[len];
    cin.read(s, len); // Get the data
    // Add what was actually read to this string:
    append(s, static_cast<size_t>(cin.gcount()));
    delete [] s; // Array form matches new[]
  }
}; ///:~
The
CGIpair
class starts out quite simply: it inherits from the standard library
pair
template to create a
pair
of
string
s,
one for the name and one for the value. The second constructor calls the member
function
decodeURLString( )
which produces a
string
after stripping away all the extra characters added by the browser as it
submitted the CGI request. There is no need to provide functions to select each
individual element – because
pair
is
inherited publicly, you can just select the
first
and
second
elements of the
CGIpair.
The operator bool provides automatic type conversion to bool. If you have a
CGIpair object called p and you use it in an expression where a Boolean result
is expected, such as
if(p) { //...
then the compiler will recognize that it has a CGIpair and it needs a Boolean,
so it will automatically call operator bool to perform the necessary
conversion.
Because
the
string
objects take care of themselves, you don’t need to explicitly define the
copy-constructor,
operator=
or destructor – the default versions synthesized by the compiler do the
right thing.
The remainder of the CGIpair class consists of two member functions:
decodeURLString( ) and a helper, translateHex( ), which is used by
decodeURLString( ). (Note that translateHex( ) does not guard against bad
input such as “%1H.”) decodeURLString( ) moves through the string and replaces
each ‘+’ with a space, and each hex code (beginning with a ‘%’) with the
appropriate character. It’s worth noting here and in CGImap the power of the
string class: you can index into a string object using operator[ ], and you
can use member functions like find( ) and substr( ).
CGImap parses and holds all the name-value pairs submitted from the form as
part of a CGI request. You might think that anything that has the word “map”
in its name should be inherited from the STL map, but map has its own way of
ordering the elements it stores, whereas here it’s useful to keep the elements
in the order that they appear on the Web page. So CGImap is inherited from
vector<CGIpair>, and operator[ ] is overloaded so you get the
associative-array lookup of a map.
You can also see that CGImap has a copy-constructor and an operator=, but
they’re both declared as private. This is to prevent the compiler from
synthesizing the two functions (which it will do if you don’t declare them
yourself), but it also prevents the client programmer from passing a CGImap by
value or from using assignment.
CGImap’s job is to take the input data and parse it into name-value pairs,
which it will do with the aid of CGIpair (effectively, CGIpair is only a
helper class, but it also seems to make it easier to understand the code).
After copying the query string (you’ll see where the query string comes from
later) into the string data member gq, the nextPair( ) member function is used
to parse the string into raw name-value pairs, delimited by ‘=’ and ‘&’ signs.
Each resulting CGIpair object is added to the vector using the standard
vector::push_back( ). When nextPair( ) runs out of input from the query
string, it returns an empty CGIpair, which converts to false through operator
bool and so ends the loop.
The
CGImap::operator[
]
takes the brute-force approach of a linear search through the elements. Since
the
CGImap
is intentionally not sorted and they tend to be small, this is not too
terrible. The
dump( )
function is used for testing, typically by sending information to the resulting
Web page, as you might guess from the default value of
nl,
which is an HTML “break line” token.
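The parsing and lookup steps described above can also be sketched without the class machinery. This is an illustrative variant, not the chapter's implementation: Field, hexValue( ), decodeField( ), parseQuery( ), and lookup( ) are hypothetical names, and two design choices differ from CGImap and are assumptions of this sketch. A malformed escape such as “%1H” is copied through unchanged instead of being decoded, and a field without an ‘=’ is kept with an empty value rather than ending the parse.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Field;

// Value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c) {
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Turns '+' into a space and %XX into the character it names.
// Escapes that are truncated or not valid hex are left as-is.
std::string decodeField(const std::string& s) {
  std::string result;
  for(std::string::size_type i = 0; i < s.size(); i++) {
    if(s[i] == '+') {
      result += ' ';
    } else if(s[i] == '%' && i + 2 < s.size()
        && hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0) {
      result += static_cast<char>(
        hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
      i += 2; // Move past the two hex digits
    } else {
      result += s[i];
    }
  }
  return result;
}

// Splits on '&', then on the first '=' of each piece,
// keeping the fields in the order they were submitted.
std::vector<Field> parseQuery(const std::string& query) {
  std::vector<Field> fields;
  std::string::size_type start = 0;
  while(start < query.size()) {
    std::string::size_type amp = query.find('&', start);
    if(amp == std::string::npos) amp = query.size();
    std::string item = query.substr(start, amp - start);
    std::string::size_type eq = item.find('=');
    if(!item.empty()) {
      if(eq == std::string::npos)
        fields.push_back(Field(decodeField(item), std::string()));
      else
        fields.push_back(Field(decodeField(item.substr(0, eq)),
                               decodeField(item.substr(eq + 1))));
    }
    start = amp + 1;
  }
  return fields;
}

// Linear search by field name; empty string means "not found".
std::string lookup(const std::vector<Field>& fields,
                   const std::string& key) {
  for(std::vector<Field>::size_type i = 0; i < fields.size(); i++)
    if(fields[i].first == key)
      return fields[i].second;
  return std::string();
}
```

With the query string from the earlier form, lookup(parseQuery("Field1=This+is+a+test&submit=Submit+Query"), "Field1") yields the text typed into the field, with the spaces restored.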
Using
GET can be fine for many applications. However, GET passes its data to the CGI
program through an environment variable (called
QUERY_STRING),
and operating systems typically run out of environment space with long GET
strings (you should start worrying at about 200 characters). CGI provides a
solution for this: POST. With POST, the data is encoded and concatenated the
same way as with GET, but POST uses standard input to pass the encoded query
string to the CGI program and has no length limitation on the input. All you
have to do in your CGI program is determine the length of the query string.
This length is stored in the environment variable
CONTENT_LENGTH.
Once you know the length, you can allocate storage and read the precise number
of bytes from standard input. Because POST is the less-fragile solution, you
should probably prefer it over GET, unless you know for sure that your input
will be short. In fact, one might surmise that the only reason for GET is that
it is slightly easier to code a CGI program in C using GET. However, the last
class in
CGImap.h
is a tool that makes handling a POST just as easy as handling a GET, which
means you can always use POST.
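If one program has to accept both kinds of submission, it needs to know which one it was handed. The sketch below assumes the server sets the REQUEST_METHOD environment variable, as the CGI specification describes; isPostRequest( ) and encodedQuery( ) are hypothetical helpers, not part of CGImap.h.

```cpp
#include <string>

// Hypothetical helper: 'method' is the value of REQUEST_METHOD,
// or zero if the variable is not set.
bool isPostRequest(const char* method) {
  return method != 0 && std::string(method) == "POST";
}

// Hypothetical helper: returns the still-encoded query for either
// method. queryString is the value of QUERY_STRING (may be zero);
// postBody is the data read from standard input, used only for POST.
std::string encodedQuery(const char* method,
                         const char* queryString,
                         const std::string& postBody) {
  if(isPostRequest(method))
    return postBody;
  return queryString != 0 ? std::string(queryString) : std::string();
}
```

In a real program the first two arguments would come from getenv("REQUEST_METHOD") and getenv("QUERY_STRING"), and the result could be handed directly to a parser such as CGImap.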
The class Post inherits from string and has only a constructor. The job of the
constructor is to get the query data from the POST into itself (a string). It
does this by reading the CONTENT_LENGTH environment variable using the
Standard C library function getenv( ). This comes back as a pointer to a C
character string. If this pointer is zero, the CONTENT_LENGTH environment
variable has not been set, so something is wrong. Otherwise, the character
string must be converted to an integer using the Standard C library function
atoi( ). The resulting length is used with new to allocate enough storage to
hold the query string (no null terminator is needed, because the byte count is
passed explicitly to append( )), and then read( ) is called for cin. The
read( ) function takes a pointer to the destination buffer and the number of
bytes to read. The resulting buffer is inserted into the current string using
string::append( ). At this point, the POST data is just a string object and
can be easily used without further concern about where it came from.
Testing the CGI parser
Now
that the basic tools are defined, they can easily be used in a CGI program like
the following which simply dumps the name-value pairs that are parsed from a
GET query. Remember that an iterator for a
CGImap
returns a
CGIpair
object when it is dereferenced, so you must select the
first
and
second
parts of that
CGIpair:
//: C26:CGI_GET.cpp
// Tests CGImap by extracting the information
// from a CGI GET submitted by an HTML Web page.
#include "CGImap.h"
#include <cstdlib>
#include <iostream>
using namespace std;
int main() {
  // You MUST print this out, otherwise the
  // server will not send the response:
  cout << "Content-type: text/plain\n" << endl;
  // For a CGI "GET," the server puts the data
  // in the environment variable QUERY_STRING.
  // getenv() returns zero if it is not set:
  const char* qs = getenv("QUERY_STRING");
  CGImap query(qs ? qs : "");
  // Test: dump all names and values
  for(CGImap::iterator it = query.begin();
      it != query.end(); it++) {
    cout << (*it).first << " = "
         << (*it).second << endl;
  }
} ///:~
When you use the GET approach (which is controlled by the HTML page with the
METHOD attribute of the FORM tag), the Web server grabs everything after the
‘?’ and puts it into the operating-system environment variable QUERY_STRING.
So to read that information all you have to do is get the QUERY_STRING. You do
this with the standard C library function getenv( ), passing it the identifier
of the environment variable you wish to fetch. In main( ), notice how simple
the act of parsing the QUERY_STRING is: you just hand it to the constructor
for the CGImap object called query and all the work is done for you. Although
an iterator is used here, you can also pull out the names and values from
query using CGImap::operator[ ].
Now it’s important to understand something about CGI. A CGI program is handed
its input in one of two ways: through QUERY_STRING during a GET (as in the
above case) or through standard input during a POST. But a CGI program only
returns its results through standard output, via cout. Where does this output
go? Back to the Web server, which decides what to do with it. The server makes
this decision based on the content-type header, which means that if the
content-type header isn’t the first thing it sees, it won’t know what to do
with the data. Thus it’s essential that you start the output of all CGI
programs with the content-type header, followed by a blank line.
In
this case, we want the server to feed all the information directly back to the
client program. The information should be unchanged, so the
content-type
is
text/plain.
Once the server sees this, it will echo all strings right back to the client as
a simple text Web page.
To
test this program, you must compile it in the cgi-bin directory of your host
Web server. Then you can perform a simple test by writing an HTML page like this:
//:! C26:GETtest.html
<HTML><HEAD>
<TITLE>A test of standard HTML GET</TITLE>
</HEAD> Test, uses standard html GET
<Form method="GET" ACTION="/cgi-bin/CGI_GET.exe">
<P>Field1: <INPUT TYPE = "text" NAME = "Field1"
VALUE = "This is a test" size = "40"></p>
<P>Field2: <INPUT TYPE = "text" NAME = "Field2"
VALUE = "of the emergency" size = "40"></p>
<P>Field3: <INPUT TYPE = "text" NAME = "Field3"
VALUE = "broadcast system" size = "40"></p>
<P>Field4: <INPUT TYPE = "text" NAME = "Field4"
VALUE = "this is only a test" size = "40"></p>
<P>Field5: <INPUT TYPE = "text" NAME = "Field5"
VALUE = "In a real emergency" size = "40"></p>
<P>Field6: <INPUT TYPE = "text" NAME = "Field6"
VALUE = "you will be instructed" size = "40"></p>
<p><input type = "submit" name = "submit" > </p>
</Form></HTML>
///:~
Of course, the CGI_GET.exe program must be compiled on some kind of Web server
and placed in the correct subdirectory (typically called “cgi-bin”) in order
for this web page to work. The dominant Web server is the freely-available
Apache (see http://www.Apache.org), which runs on virtually all platforms. Some
word-processing/spreadsheet packages even come with Web servers. It’s
also quite cheap and easy to get an old PC and install Linux along with an
inexpensive network card. Linux automatically sets up the Apache server for
you, and you can test everything on your local network as if it were live on
the Internet. One way or another it’s possible to install a Web server
for local tests, so you don’t need to have a remote Web server and
permission to install CGI programs on that server.
One
of the advantages of this design is that, now that
CGIpair
and
CGImap
are defined, most of the work is done for you so you can easily create your own
CGI program simply by modifying
main( ).
Using POST
The
CGIpair
and
CGImap
from
CGImap.h
can be used as is for a CGI program that handles POSTs. The only thing you need
to do is get the data from a
Post
object
instead of from the
QUERY_STRING
environment variable. The following listing shows how simple it is to write
such a CGI program:
//: C26:CGI_POST.cpp
// CGImap works as easily with POST as it
// does with GET.
#include <iostream>
#include "CGImap.h"
using namespace std;
int main() {
cout << "Content-type: text/plain\n" << endl;
Post p; // Get the query string
CGImap query(p);
// Test: dump all names and values
for(CGImap::iterator it = query.begin();
it != query.end(); it++) {
cout << (*it).first << " = "
<< (*it).second << endl;
}
} ///:~
After creating a Post object, the query string is no different from a GET
query string, so it is handed to the constructor for CGImap. The different
fields in the vector are then available just as in the previous example. If
you wanted to get even more terse, you could even define the Post as a
temporary directly inside the constructor for the CGImap object:
CGImap query((Post()));
The extra parentheses around the temporary keep the compiler from reading the
line as a function declaration.
To
test this program, you can use the following Web page:
//:! C26:POSTtest.html
<HTML><HEAD>
<TITLE>A test of standard HTML POST</TITLE>
</HEAD>Test, uses standard html POST
<Form method="POST" ACTION="/cgi-bin/CGI_POST.exe">
<P>Field1: <INPUT TYPE = "text" NAME = "Field1"
VALUE = "This is a test" size = "40"></p>
<P>Field2: <INPUT TYPE = "text" NAME = "Field2"
VALUE = "of the emergency" size = "40"></p>
<P>Field3: <INPUT TYPE = "text" NAME = "Field3"
VALUE = "broadcast system" size = "40"></p>
<P>Field4: <INPUT TYPE = "text" NAME = "Field4"
VALUE = "this is only a test" size = "40"></p>
<P>Field5: <INPUT TYPE = "text" NAME = "Field5"
VALUE = "In a real emergency" size = "40"></p>
<P>Field6: <INPUT TYPE = "text" NAME = "Field6"
VALUE = "you will be instructed" size = "40"></p>
<p><input type = "submit" name = "submit" > </p>
</Form></HTML>
///:~
When
you press the “submit” button, you’ll get back a simple text
page containing the parsed results, so you can see that the CGI program works
correctly. The server turns around and feeds the query string to the CGI
program via standard input.
Handling mailing lists
Managing an email list is the kind of problem many people need to solve for
their Web site. As is turning out to be the case for everything on the
Internet, the simplest approach is usually the best. I learned this the hard
way, first trying
a variety of Java applets (which some firewalls do not allow) and even
JavaScript (which isn’t supported uniformly on all browsers). The result
of each experiment was a steady stream of email from the folks who
couldn’t get it to work. When you set up a Web site, your goal should be
to never get email from anyone complaining that it doesn’t work, and the
best way to produce this result is to use plain HTML (which, with a little
work, can be made to look quite decent).
The
second problem was on the server side. Ideally, you’d like all your email
addresses to be added and removed from a single master file, but this presents
a problem. Most operating systems allow more than one program to open a file.
When a client makes a CGI request, the Web server starts up a new invocation of
the CGI program, and since a Web server can handle many requests at a time,
this means that you can have many instances of your CGI program running at
once. If the CGI program opens a specific file, then you can have many programs
running at once that open that file. This is a problem if they are each reading
and writing to that file.
There may be a function for your operating system that “locks” a file, so that
other invocations of your program do not access the file at the same time.
However, I took a different approach, which was to make a unique file for each
client. Making a file unique was quite easy, since the email name itself is a
unique character string. The filename for each request is then just the email
name, followed by the string “.add” or “.remove”. The contents of the file are
also the email address of the client. Then, to produce a list of all the names
to add, you simply say something like (in Unix):
cat *.add > addlist
(or the equivalent for your system). For removals, you say:
cat *.remove > removelist
Once the names have been combined into a list you can archive or remove the
files.
The
HTML code to place on your Web page becomes fairly straightforward. This
particular example takes an email address to be added or removed from my C++
mailing list:
<h1 align="center"><font color="#000000">
The C++ Mailing List</font></h1>
<div align="center"><center>
<table border="1" cellpadding="4"
cellspacing="1" width="550" bgcolor="#FFFFFF">
<tr>
<td width="30" bgcolor="#FF0000"> </td>
<td align="center" width="422" bgcolor="#0">
<form action="/cgi-bin/mlm.exe" method="GET">
<input type="hidden" name="subject-field"
value="cplusplus-email-list">
<input type="hidden" name="command-field"
value="add"><p>
<input type="text" size="40"
name="email-address">
<input type="submit" name="submit"
value="Add Address to C++ Mailing List">
</p></form></td>
<td width="30" bgcolor="#FF0000"> </td>
</tr>
<tr>
<td width="30" bgcolor="#000000"> </td>
<td align="center" width="422"
bgcolor="#FF0000">
<form action="/cgi-bin/mlm.exe" method="GET">
<input type="hidden" name="subject-field"
value="cplusplus-email-list">
<input type="hidden" name="command-field"
value="remove"><p>
<input type="text" size="40"
name="email-address">
<input type="submit" name="submit"
value="Remove Address From C++ Mailing List">
</p></form></td>
<td width="30" bgcolor="#000000"> </td>
</tr>
</table>
</center></div>
Each
form contains one data-entry field called
email-address,
as well as a couple of hidden fields which don’t provide for user input
but carry information back to the server nonetheless. The
subject-field
tells the CGI program the subdirectory where the resulting file should be
placed. The
command-field
tells the CGI program whether the user is requesting that they be added or
removed from the list. From the
action,
you can see that a GET is used with a program called
mlm.exe
(for “mailing list manager”). Here it is:
//: C26:mlm.cpp
// A CGI program to maintain a mailing list
#include "CGImap.h"
#include <cstdlib>
#include <fstream>
using namespace std;
const string contact("Bruce@EckelObjects.com");
// Paths in this program are for Linux/Unix. You
// must use backslashes (two for each single
// slash) on Win32 servers:
const string rootpath("/home/eckel/");
int main() {
  cout << "Content-type: text/html\n"<< endl;
  // getenv() returns zero if the variable is not set:
  const char* qs = getenv("QUERY_STRING");
  CGImap query(qs ? qs : "");
  if(query["test-field"] == "on") {
    cout << "map size: " << query.size() << "<br>";
    query.dump(cout, "<br>");
  }
  if(query["subject-field"].size() == 0) {
    cout << "<h2>Incorrect form. Contact " <<
      contact << "</h2>" << endl;
    return 0;
  }
  string email = query["email-address"];
  if(email.size() == 0) {
    cout << "<h2>Please enter your email address"
      "</h2>" << endl;
    return 0;
  }
  if(email.find_first_of(" \t") != string::npos){
    cout << "<h2>You cannot use white space "
      "in your email address</h2>" << endl;
    return 0;
  }
  if(email.find('@') == string::npos) {
    cout << "<h2>You must use a proper email"
      " address including an '@' sign</h2>" << endl;
    return 0;
  }
  if(email.find('.') == string::npos) {
    cout << "<h2>You must use a proper email"
      " address including a '.'</h2>" << endl;
    return 0;
  }
  string fname = email;
  if(query["command-field"] == "add")
    fname += ".add";
  else if(query["command-field"] == "remove")
    fname += ".remove";
  else {
    cout << "error: command-field not found. Contact "
      << contact << endl;
    return 0;
  }
  string path(rootpath + query["subject-field"]
    + "/" + fname);
  ofstream out(path.c_str());
  if(!out) {
    cout << "cannot open " << path << "; Contact "
      << contact << endl;
    return 0;
  }
  out << email << endl;
  cout << "<br><H2>" << email << " has been ";
  if(query["command-field"] == "add")
    cout << "added";
  else if(query["command-field"] == "remove")
    cout << "removed";
  cout << "<br>Thank you</H2>" << endl;
} ///:~
Again,
all the CGI work is done by the
CGImap.
From then on it’s a matter of pulling the fields out and looking at them,
then deciding what to do about it, which is easy because of the way you can
index into a
map
and also because of the tools available for standard
strings.
Here, most of the programming has to do with checking for a valid email
address. Then a file name is created with the email address as the name and
“.add” or “.remove” as the extension, and the email
address is placed in the file.
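One thing the checks above do not cover is that the file name is built from text supplied by the client, so an address containing a path separator could end up naming a file outside the intended directory. The sketch below shows one way the file-name step might be tightened. It is not part of mlm.cpp: requestFileName( ) is a hypothetical helper, and the particular set of rejected characters is an assumption.

```cpp
#include <string>

// Hypothetical helper: builds the request file name from the email
// address and the command, or returns an empty string if either one
// is unacceptable.
std::string requestFileName(const std::string& email,
                            const std::string& command) {
  if(command != "add" && command != "remove")
    return std::string();
  if(email.empty() || email[0] == '.')
    return std::string();
  // Same basic checks as mlm.cpp: an '@' and a '.' are required.
  if(email.find('@') == std::string::npos ||
     email.find('.') == std::string::npos)
    return std::string();
  // Reject white space and anything that could change directories.
  if(email.find_first_of(" \t\r\n/\\") != std::string::npos)
    return std::string();
  if(email.find("..") != std::string::npos)
    return std::string();
  return email + "." + command;
}
```

The same kind of check would apply to the subject-field value, since it is also used as part of the path.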
Maintaining your list
Once you have a list of names to add, you can just paste them to the end of
your list.
However, you might get some duplicates so you need a program to remove those.
Because your names may differ only by upper and lowercase, it’s useful to
create a tool that will read a list of names from a file and place them into a
container of strings, forcing all the names to lowercase as it does:
//: C26:readLower.h
// Read a file into a container of string,
// forcing each line to lower case.
#ifndef READLOWER_H
#define READLOWER_H
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <cctype>
#include "../require.h"
inline char downcase(char c) {
  // tolower() expects a value that fits
  // in an unsigned char:
  return static_cast<char>(
    std::tolower(static_cast<unsigned char>(c)));
}
// inline, because this header may be included
// in more than one source file:
inline std::string lcase(std::string s) {
  std::transform(s.begin(), s.end(),
    s.begin(), downcase);
  return s;
}
template<class SContainer>
void readLower(const char* filename, SContainer& c) {
  std::ifstream in(filename);
  assure(in, filename);
  const int sz = 1024;
  char buf[sz];
  while(in.getline(buf, sz))
    // Force to lowercase:
    c.push_back(lcase(buf));
}
#endif // READLOWER_H ///:~
Since it’s a template, it will work with any container of string that supports
push_back( ). Again, you may want to change the above to the form
getline(in, s), which reads each line into a string s, instead of using a
fixed-sized buffer, which is more fragile.
Once
the names are read into the list and forced to lowercase, removing duplicates
is trivial:
//: C26:RemoveDuplicates.cpp
// Remove duplicate names from a mailing list
#include <vector>
#include <algorithm>
#include <iterator>
#include "../require.h"
#include "readLower.h"
using namespace std;
int main(int argc, char* argv[]) {
  requireArgs(argc, 2);
  vector<string> names;
  readLower(argv[1], names);
  long before = names.size();
  // You must sort first for unique() to work:
  sort(names.begin(), names.end());
  // unique() moves the duplicates past the
  // iterator it returns; erase() removes them:
  names.erase(
    unique(names.begin(), names.end()),
    names.end());
  long removed = before - names.size();
  ofstream out(argv[2]);
  assure(out, argv[2]);
  copy(names.begin(), names.end(),
    ostream_iterator<string>(out,"\n"));
  cout << removed << " names removed" << endl;
} ///:~
A vector is used here instead of a list because the sort( ) algorithm requires
random-access iterators, which a vector provides and a list does not. (A list
has its own built-in sort( ) member function for exactly this reason, since
the general sort( ) algorithm shown above cannot be applied to it.)
The
sort must be performed so that all duplicates are adjacent to each other. Then
unique( )
can remove all the adjacent duplicates. The program also keeps track of how
many duplicate names were removed.
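One detail is worth spelling out: unique( ) does not shrink the container. It moves the elements to keep toward the front and returns an iterator to the new logical end, and the leftover tail goes away only when it is erased. A minimal sketch of the whole step, using a hypothetical function named removeDuplicates( ):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the names, erases the duplicates,
// and returns how many elements were removed.
std::vector<std::string>::size_type
removeDuplicates(std::vector<std::string>& names) {
  std::vector<std::string>::size_type before = names.size();
  // Sorting makes equal names adjacent:
  std::sort(names.begin(), names.end());
  // unique() returns the new logical end; erase() drops the rest:
  names.erase(std::unique(names.begin(), names.end()),
              names.end());
  return before - names.size();
}
```

Without the call to erase( ), the size of the vector would be unchanged and the count of removed names would always be zero.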
When
you have a file of names to remove from your list,
readLower( )
comes in handy again:
//: C26:RemoveGroup.cpp
// Remove a group of names from a list
#include <list>
#include <iterator>
#include "../require.h"
#include "readLower.h"
using namespace std;
typedef list<string> Container;
int main(int argc, char* argv[]) {
  requireArgs(argc, 3);
  Container names, removals;
  readLower(argv[1], names);
  readLower(argv[2], removals);
  long original = names.size();
  Container::iterator rmit = removals.begin();
  while(rmit != removals.end())
    names.remove(*rmit++); // Removes all matches
  ofstream out(argv[3]);
  assure(out, argv[3]);
  copy(names.begin(), names.end(),
    ostream_iterator<string>(out,"\n"));
  long removed = original - names.size();
  cout << "On removal list: " << removals.size()
    << "\n Removed: " << removed << endl;
} ///:~
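Here, a list is used instead of a vector (since readLower( ) is a template, it
adapts). Although there is a remove( ) algorithm that can be applied to
containers, it only moves the unwanted elements out of the way and leaves the
erasing to you, so the built-in list::remove( ), which erases the matches
itself, is more convenient.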
The
second command-line argument is the file containing the list of names to be
removed. An iterator is used to step through that list, and the
list::remove( )
function removes every instance of each name from the master list. Here, the
list doesn’t need to be sorted first.
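To make the difference between the two kinds of remove( ) concrete, here is a sketch of the same operation on a list and on a vector. The function names removeFromList( ) and removeFromVector( ) are hypothetical. For the vector, the remove( ) algorithm has to be paired with erase( ), because the algorithm only returns the new logical end of the range.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// list::remove() erases every element equal to its argument.
std::size_t removeFromList(std::list<std::string>& names,
    const std::vector<std::string>& removals) {
  std::size_t before = names.size();
  for(std::size_t i = 0; i < removals.size(); i++)
    names.remove(removals[i]);
  return before - names.size();
}

// The remove() algorithm moves the survivors forward and returns
// the new logical end; erase() then discards the leftover tail.
std::size_t removeFromVector(std::vector<std::string>& names,
    const std::vector<std::string>& removals) {
  std::size_t before = names.size();
  for(std::size_t i = 0; i < removals.size(); i++)
    names.erase(
      std::remove(names.begin(), names.end(), removals[i]),
      names.end());
  return before - names.size();
}
```

Both functions return the number of names removed, and neither requires the master list to be sorted.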
Unfortunately, that’s not all there is to it. The messiest part about maintaining a mailing list is the bounced messages. Presumably, you’ll just want to remove the addresses that produce bounces. If you can combine all the bounced messages into a single file, the following program has a pretty good chance of extracting the email addresses; then you can use RemoveGroup to delete them from your list.
//: C26:ExtractUndeliverable.cpp
// Find undeliverable names to remove from
// mailing list from within a mail file
// containing many messages
#include <cstdio>
#include <cstring> // strstr(), strtok()
#include <string>
#include <set>
#include "../require.h"
using namespace std;
// String literals are const, so the arrays hold
// pointers to const char:
const char* start_str[] = {
"following address",
"following recipient",
"following destination",
"undeliverable to the following",
"following invalid",
};
const char* continue_str[] = {
"Message-ID",
"Please reply to",
};
// The in() function allows you to check whether
// a string in this set is part of your argument.
class StringSet {
const char** ss;
int sz;
public:
StringSet(const char** sa, int sza):ss(sa),sz(sza) {}
bool in(const char* s) {
for(int i = 0; i < sz; i++)
if (strstr(s, ss[i]) != 0)
return true;
return false;
}
};
// Calculate array length:
#define ALEN(A) ((sizeof A)/(sizeof *A))
StringSet
starts(start_str, ALEN(start_str)),
continues(continue_str, ALEN(continue_str));
int main(int argc, char* argv[]) {
requireArgs(argc, 2,
"Usage:ExtractUndeliverable infile outfile");
FILE* infile = fopen(argv[1], "rb");
FILE* outfile = fopen(argv[2], "w");
require(infile != 0); require(outfile != 0);
set<string> names;
const int sz = 1024;
char buf[sz];
while(fgets(buf, sz, infile) != 0) {
if(starts.in(buf)) {
puts(buf);
while(fgets(buf, sz, infile) != 0) {
if(continues.in(buf)) continue;
if(strstr(buf, "---") != 0) break;
const char* delimiters= " \t<>():;,\n\"";
char* name = strtok(buf, delimiters);
while(name != 0) {
if(strstr(name, "@") != 0)
names.insert(string(name));
name = strtok(0, delimiters);
}
}
}
}
set<string>::iterator i = names.begin();
while(i != names.end())
fprintf(outfile, "%s\n", (*i++).c_str());
} ///:~
The first thing you’ll notice about this program is that it contains some C functions, including C I/O. This is not because of any particular design insight. It just seemed to work when I used the C elements, and it started behaving strangely with C++ I/O. So the C is just because it works, and you may be able to rewrite the program in more “pure C++” using your C++ compiler and produce correct results.
A lot of what this program does is read lines looking for string matches. To make this convenient, I created a StringSet class with a member function in( ) that tells you whether any of the strings in the set are in the argument. The StringSet is initialized with a constant array of strings and the size of that array. The StringSet makes the code easier to read, and it is also easy to add new strings to the arrays.
Both the input file and the output file in main( ) are manipulated with standard I/O, since it’s not a good idea to mix I/O types in a program. Each line is read using fgets( ), and if one of them matches with the starts StringSet, then what follows will contain email addresses, until you see some dashes (I figured this out empirically, by hunting through a file full of bounced email).
The continues StringSet contains strings whose lines should be ignored. For each of the lines that potentially contains an address, each address is extracted using the Standard C Library function strtok( ) and then it is added to the set<string> called names. Using a set eliminates duplicates (you may have duplicates based on case, but those are dealt with by RemoveGroup.cpp). The resulting set of names is then printed to the output file.
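If you do want to try the “pure C++” route for the tokenizing step, the following sketch shows one way to replace strtok( ), which modifies its buffer and keeps hidden state between calls. The function name extractAddresses is hypothetical; the delimiter set is copied from the listing above, and the rest of the program (the start and continue strings, the file handling) is not reproduced here.

```cpp
#include <set>
#include <string>

// Hypothetical helper: splits one line on the same delimiters the
// listing hands to strtok() and keeps every token that contains '@'.
void extractAddresses(const std::string& line,
                      std::set<std::string>& names) {
  const std::string delimiters(" \t<>():;,\n\"");
  std::string::size_type start =
    line.find_first_not_of(delimiters);
  while(start != std::string::npos) {
    std::string::size_type end =
      line.find_first_of(delimiters, start);
    // With end == npos, substr() takes the rest of the line:
    std::string token = line.substr(start,
      end == std::string::npos ? std::string::npos : end - start);
    if(token.find('@') != std::string::npos)
      names.insert(token);
    // Searching from npos simply yields npos and ends the loop:
    start = line.find_first_not_of(delimiters, end);
  }
}
```

Because the line is passed by const reference, the original text is left untouched, which is not true of the strtok( ) version.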
Mailing to your list
There are a number of ways to connect to your system’s mailer, but the following program just takes the simple approach of calling an external command (“fastmail,” which is part of Unix) using the Standard C library function system( ). The program spends all its time building the external command.
When people don’t want to be on a list anymore they will often ignore instructions and just reply to the message. This can be a problem if the email address they’re replying with is different from the one that’s on your list (sometimes it has been routed to a new or aliased address). To solve the problem, this program prepends the text file with a message that informs them that they can remove themselves from the list by visiting a URL. Since many email programs will present a URL in a form that allows you to just click on it, this can produce a very simple removal process. If you look at the URL, you can see it’s a call to the mlm.exe CGI program, including removal information that incorporates the same email address the message was sent to. That way, even if the user just replies to the message, all you have to do is click on the URL that comes back with their reply (assuming the message is automatically copied back to you).
//: C26:Batchmail.cpp
// Sends mail to a list using Unix fastmail
#include <iostream>
#include <fstream>
#include <string>
#include <strstream>
#include <cstdlib> // system() function
#include "../require.h"
using namespace std;
string subject("New Intensive Workshops");
string from("Bruce@EckelObjects.com");
string replyto("Bruce@EckelObjects.com");
ofstream logfile("BatchMail.log");
int main(int argc, char* argv[]) {
requireArgs(argc, 2,
"Usage: Batchmail namelist mailfile");
ifstream names(argv[1]);
assure(names, argv[1]);
string name;
while(getline(names, name)) {
ofstream msg("m.txt");
assure(msg, "m.txt");
msg << "To be removed from this list, "
"DO NOT REPLY TO THIS MESSAGE. Instead, \n"
"click on the following URL, or visit it "
"using your Web browser. This \n"
"way, the proper email address will be "
"removed. Here's the URL:\n"
<< "http://www.mindview.net/cgi-bin/"
"mlm.exe?subject-field=workshop-email-list"
"&command-field=remove&email-address="
<< name << "&submit=submit\n\n"
"------------------------------------\n\n";
ifstream text(argv[2]);
assure(text, argv[2]);
msg << text.rdbuf() << endl;
msg.close();
string command("fastmail -F " + from +
" -r " + replyto + " -s \"" + subject +
"\" m.txt " + name);
system(command.c_str());
logfile << command << endl;
static int mailcounter = 0;
// Count each message once; report every 500:
if((++mailcounter % 500) == 0) {
const int bsz = 25;
char buf[bsz];
// Convert mailcounter to a char string:
ostrstream mcounter(buf, bsz);
mcounter << mailcounter << ends;
string command2("fastmail -F " + from +
" -r " + replyto + " -s \"Sent " +
string(buf) +
" messages \" m.txt eckel@aol.com");
system(command2.c_str());
}
}
} ///:~
The first command-line argument is the list of email addresses, one per line. The names are read one at a time into the string called name using getline( ). Then a temporary file called m.txt is created to build the customized message for that individual; the customization is the note about how to remove themselves, along with the URL. Then the message body, which is in the file specified by the second command-line argument, is appended to m.txt. Finally, the command is built inside a string: the “-F” argument to fastmail is who it’s from, the “-r” argument is who to reply to, the “-s” argument is the subject line, the next argument is the file containing the mail, and the last argument is the email address to send it to.
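The command assembly described above can be pulled out into a function, which makes it easier to inspect or log the command before handing it to system( ). This is only a sketch: the name buildMailCommand is hypothetical, and the “-F”, “-r” and “-s” flags are simply the ones used in the listing, not a statement about what your mailer accepts.

```cpp
#include <string>

// Hypothetical helper: builds the same command line as the listing.
// Only the subject is quoted, exactly as in the original program.
std::string buildMailCommand(const std::string& from,
                             const std::string& replyto,
                             const std::string& subject,
                             const std::string& file,
                             const std::string& to) {
  return "fastmail -F " + from +
         " -r " + replyto +
         " -s \"" + subject + "\" " +
         file + " " + to;
}
```

Since the address is placed on a shell command line without any quoting, a list that may contain anything other than plain email addresses should be checked before it is used this way.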
You can start this program in the background and tell Unix not to stop the program when you sign off of the server. However, it takes a while to run for a long list (this isn’t because of the program itself, but the mailing process). I like to keep track of the progress of the program by sending a status message to another email account, which is accomplished in the last few lines of the program.
A general information-extraction CGI program
One of the problems with CGI is that you must write and compile a new program every time you want to add a new facility to your Web site. However, much of the time all that your CGI program does is capture information from the user and store it on the server. If you could use hidden fields to specify what to do with the information, then it would be possible to write a single CGI program that would extract the information from any CGI request. This information could be stored in a uniform format, in a subdirectory specified by a hidden field in the HTML form, and in a file whose name includes the user’s email address. Of course, in the general case the email address doesn’t guarantee uniqueness (the user may post more than one submission), so the date and time of the submission can be mangled in with the file name to make it unique. If you can do this, then you can create a new data-collection page just by defining the HTML and creating a new subdirectory on your server. For example, every time I come up with a new class or workshop, all I have to do is create the HTML form for signups – no CGI programming is required.
The following HTML page shows the format for this scheme. Since a CGI POST is more general and doesn’t have any limit on the amount of information it can send, it will always be used instead of a GET for the ExtractInfo.cpp program that will implement this system. Although this form is simple, yours can be as complicated as you need it to be.
//:! C26:INFOtest.html
<html><head><title>
Extracting information from an HTML POST</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF"
vlink="#800080"> <hr>
<p>Extracting information from an HTML POST</p>
<form action="/cgi-bin/ExtractInfo.exe"
method="POST">
<input type="hidden" name="subject-field"
value="test-extract-info">
<input type="hidden" name="reminder"
value="Remember your lunch!">
<input type="hidden" name="test-field"
value="on">
<input type="hidden" name="mail-copy"
value="Bruce@EckelObjects.com;eckel@aol.com">
<input type="hidden" name="confirmation"
value="confirmation1">
<p>Email address (Required): <input
type="text" size="45" name="email-address" >
</p>Comment:<br>
<textarea name="Comment" rows="6" cols="55">
</textarea>
<p><input type="submit" name="submit">
<input type="reset" name="reset"></p>
</form><hr></body></html>
///:~
Right after the form’s action statement, you see <input type="hidden". This means that particular field will not appear on the form that the user sees, but the information will still be submitted as part of the data for the CGI program.
The value of this field named “subject-field” is used by ExtractInfo.cpp to determine the subdirectory in which to place the resulting file (in this case, the subdirectory will be “test-extract-info”). Because of this technique and the generality of the program, the only thing you’ll usually need to do to start a new database of data is to create the subdirectory on the server and then create an HTML page like the one above. The ExtractInfo.cpp program will do the rest for you by creating a unique file for each submission. Of course, you can always change the program if you want it to do something more unusual, but the system as shown will work most of the time.
The contents of the “reminder” field will be displayed on the form that is sent back to the user when their data is accepted. The “test-field” indicates whether to dump test information to the resulting Web page. If “mail-copy” exists and contains anything other than “no” the value string will be parsed for mailing addresses separated by ‘;’ and each of these addresses will get a mail message with the data in it. The “email-address” field is required in each case and the email address will be checked to ensure that it conforms to some basic standards.
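The ‘;’ parsing of the “mail-copy” value can be sketched on its own as follows. The function name splitRecipients is hypothetical; it illustrates the splitting step only and does no validation of the addresses it returns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a "mail-copy" value such as
// "Bruce@EckelObjects.com;eckel@aol.com" at each ';'.
std::vector<std::string> splitRecipients(const std::string& field) {
  std::vector<std::string> recipients;
  std::string::size_type start = 0;
  std::string::size_type semi = field.find(';');
  while(semi != std::string::npos) {
    recipients.push_back(field.substr(start, semi - start));
    start = semi + 1;
    semi = field.find(';', start);
  }
  recipients.push_back(field.substr(start)); // Last (or only) one
  return recipients;
}
```

Note that the positions are held in string::size_type rather than int, so the comparison against string::npos does not depend on an implicit conversion.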
The “confirmation” field causes a second program to be executed when the form is posted. This program parses the information that was stored from the form into a file, turns it into human-readable form and sends an email message back to the client to confirm that their information was received (this is useful because the user may not have entered their email address correctly; if they don’t get a confirmation message they’ll know something is wrong). The design of the “confirmation” field allows the person creating the HTML page to select more than one type of confirmation. Your first solution to this may be to simply call the program directly rather than indirectly as was done here, but you don’t want to allow someone else to choose – by modifying the web page that’s downloaded to them – what programs they can run on your machine.
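One way to picture this indirection is as a lookup table that lives on the server: the form supplies only a name, and the server decides which program, if any, that name stands for. The sketch below assumes a hypothetical function confirmationCommand and a table with the single entry used in this chapter; it is an illustration of the idea rather than the code used in ExtractInfo.cpp.

```cpp
#include <map>
#include <string>

// Hypothetical lookup: translates a "confirmation" value into a
// command chosen by the server. Unknown values produce an empty
// string, so nothing supplied by the client is ever run directly.
std::string confirmationCommand(const std::string& conftype,
                                const std::string& dataFile) {
  std::map<std::string, std::string> programs;
  programs["confirmation1"] = "./ProcessApplication.exe";
  std::map<std::string, std::string>::const_iterator it =
    programs.find(conftype);
  if(it == programs.end())
    return std::string(); // Not on the list: run nothing
  // The trailing ampersand asks the shell for a separate process:
  return it->second + " " + dataFile + " &";
}
```

Adding another type of confirmation then means adding one more entry to the table, with no change to the HTML beyond the new field value.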
Here is the program that will extract the information from the CGI request:
//: C26:ExtractInfo.cpp
// Extracts all the information from a CGI POST
// submission, generates a file and stores the
// information on the server. By generating a
// unique file name, there are no clashes like
// you get when storing to a single file.
#include "CGImap.h"
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdio>
#include <cstdlib> // system()
#include <ctime>
using namespace std;
const string contact("Bruce@EckelObjects.com");
// Paths in this program are for Linux/Unix. You
// must use backslashes (two for each single
// slash) on Win32 servers:
const string rootpath("/home/eckel/");
void show(CGImap& m, ostream& o);
// The definition for the following is the only
// thing you must change to customize the program
void
store(CGImap& m, ostream& o, string nl = "\n");
int main() {
cout << "Content-type: text/html\n"<< endl;
Post p; // Collect the POST data
CGImap query(p);
// "test-field" set to "on" will dump contents
if(query["test-field"] == "on") {
cout << "map size: " << query.size() << "<br>";
query.dump(cout);
}
if(query["subject-field"].size() == 0) {
cout << "<h2>Incorrect form. Contact " <<
contact << endl;
return 0;
}
string email = query["email-address"];
if(email.size() == 0) {
cout << "<h2>Please enter your email address"
<< endl;
return 0;
}
if(email.find_first_of(" \t") != string::npos){
cout << "<h2>You cannot include white space "
"in your email address" << endl;
return 0;
}
if(email.find('@') == string::npos) {
cout << "<h2>You must include a proper email"
" address including an '@' sign" << endl;
return 0;
}
if(email.find('.') == string::npos) {
cout << "<h2>You must include a proper email"
" address including a '.'" << endl;
return 0;
}
// Create a unique file name with the user's
// email address and the current time in hex
const int bsz = 1024;
char fname[bsz];
time_t now;
time(&now); // Encoded date & time
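// The cast makes the argument match %lX, and
// snprintf() will not write past the end of fname:
snprintf(fname, bsz, "%s%lX.txt", email.c_str(),
(unsigned long)now);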
string path(rootpath + query["subject-field"] +
"/" + fname);
ofstream out(path.c_str());
if(!out) {
cout << "cannot open " << path << "; Contact "
<< contact << endl;
return 0;
}
// Store the file and path information:
out << "///{" << path << endl;
// Display optional reminder:
if(query["reminder"].size() != 0)
cout <<"<H1>" << query["reminder"] <<"</H1>";
show(query, cout); // For results page
store(query, out); // Stash data in file
cout << "<br><H2>Your submission has been "
"posted as<br>" << fname << endl
<< "<br>Thank you</H2>" << endl;
out.close();
// Optionally send generated file as email
// to recipients specified in the field:
if(query["mail-copy"].length() != 0 &&
query["mail-copy"] != "no") {
string to = query["mail-copy"];
// Parse out the recipient names, separated
// by ';', into a vector.
vector<string> recipients;
string::size_type ii = to.find(';');
while(ii != string::npos) {
recipients.push_back(to.substr(0, ii));
to = to.substr(ii + 1);
ii = to.find(';');
}
recipients.push_back(to); // Last one
// "fastmail" only available on Linux/Unix:
for(vector<string>::size_type i = 0;
i < recipients.size(); i++) {
string cmd("fastmail -s \"" +
query["subject-field"] + "\" " +
path + " " + recipients[i]);
system(cmd.c_str());
}
}
// Execute a confirmation program on the file.
// Typically, this is so you can email a
// processed data file to the client along with
// a confirmation message:
if(query["confirmation"].length() != 0) {
string conftype = query["confirmation"];
if(conftype == "confirmation1") {
string command("./ProcessApplication.exe "+
path + " &");
// The data file is the argument, and the
// ampersand runs it as a separate process:
system(command.c_str());
string logfile("Extract.log");
ofstream log(logfile.c_str());
}
}
}
// For displaying the information on the html
// results page:
void show(CGImap& m, ostream& o) {
string nl("<br>");
o << "<h2>The data you entered was:"
<< "</h2><br>"
<< "From[" << m["email-address"] << ']' <<nl;
for(CGImap::iterator it = m.begin();
it != m.end(); it++) {
string name = (*it).first,
value = (*it).second;
if(name != "email-address" &&
name != "confirmation" &&
name != "submit" &&
name != "mail-copy" &&
name != "test-field" &&
name != "reminder")
o << "<h3>" << name << ": </h3>"
<< "<pre>" << value << "</pre>";
}
}
// Change this to customize the program:
void store(CGImap& m, ostream& o, string nl) {
o << "From[" << m["email-address"] << ']' <<nl;
for(CGImap::iterator it = m.begin();
it != m.end(); it++) {
string name = (*it).first,
value = (*it).second;
if(name != "email-address" &&
name != "confirmation" &&
name != "submit" &&
name != "mail-copy" &&
name != "test-field" &&
name != "reminder")
o << nl << "[{[" << name << "]}]" << nl
<< "[([" << nl << value << nl << "])]"
<< nl;
// Delimiters were added to aid parsing of
// the resulting text file.
}
} ///:~
The program is designed to be as generic as possible, but if you want to change something it is most likely the way that the data is stored in a file (for example, you may want to store it in a comma-separated ASCII format so that you can easily read it into a spreadsheet). You can make changes to the storage format by modifying store( ), and to the way the data is displayed by modifying show( ).
main( ) begins using the same three lines you’ll start with for any POST program. The rest of the program is similar to mlm.cpp because it looks at the “test-field” and “email-address” (checking it for correctness). The file name combines the user’s email address and the current date and time in hex – notice that a function from the sprintf( ) family is used because it has a convenient way to convert a value to a hex representation. The entire file and path information is stored in the file, along with all the data from the form, which is tagged as it is stored so that it’s easy to parse (you’ll see a program to parse the files a bit later). All the information is also sent back to the user as a simply-formatted HTML page, along with the reminder, if there is one. If “mail-copy” exists and is not “no,” then the names in the “mail-copy” value are parsed and an email is sent to each one containing the tagged data. Finally, if there is a “confirmation” field, the value selects the type of confirmation (there’s only one type implemented here, but you can easily add others) and the command is built that passes the generated data file to the program (called ProcessApplication.exe). That program will be created in the next section.
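The hex conversion used for the file name is also available through iostream manipulators. The sketch below assumes a hypothetical function uniqueFileName that takes the time as a parameter, so the result can be checked against a known value; it is an alternative illustration and not the code in the listing above.

```cpp
#include <ctime>
#include <sstream>
#include <string>

// Hypothetical helper: the email address followed by the time in
// uppercase hex and a ".txt" extension, as in the scheme above.
std::string uniqueFileName(const std::string& email,
                           std::time_t now) {
  std::ostringstream name;
  name << email
       << std::hex << std::uppercase
       << static_cast<unsigned long>(now)
       << ".txt";
  return name.str();
}
```

Because the stream grows as needed, there is no fixed-size buffer to overrun when a user submits an unusually long address.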
Parsing the data files
You now have a lot of data files accumulating on your Web site, as people sign up for whatever you’re offering. Here’s what one of them might look like:
///{/home/eckel/super-cplusplus-workshop-registration/Bruce@EckelObjects.com35B589A0.txt
From[Bruce@EckelObjects.com]
[{[subject-field]}]
[([
super-cplusplus-workshop-registration
])]
[{[Date-of-event]}]
[([
Sept 2-4
])]
[{[name]}]
[([
Bruce Eckel
])]
[{[street]}]
[([
20 Sunnyside Ave, Suite A129
])]
[{[city]}]
[([
Mill Valley
])]
[{[state]}]
[([
CA
])]
[{[country]}]
[([
USA
])]
[{[zip]}]
[([
94941
])]
[{[busphone]}]
[([
415-555-1212
])]
///:~
This is a brief example, but there are as many fields as you have on your HTML form. Now, if your event is compelling you’ll have a whole lot of these files and what you’d like to do is automatically extract the information from them and put that data in any format you’d like. For example, the ProcessApplication.exe program mentioned above will use the data in an email confirmation message. You’ll also probably want to put the data in a form that can be easily brought into a spreadsheet. So it makes sense to start by creating a general-purpose tool that will automatically parse any file that is created by ExtractInfo.cpp:
//: C26:FormData.h
#include <string>
#include <iostream>
#include <fstream>
#include <utility> // pair
#include <vector>
using namespace std;
class DataPair : public pair<string, string> {
public:
DataPair() {}
DataPair(istream& in) { get(in); }
DataPair& get(istream& in);
operator bool() {
return first.length() != 0;
}
};
class FormData : public vector<DataPair> {
public:
string filePath, email;
// Parse the data from a file:
FormData(char* fileName);
void dump(ostream& os = cout);
string operator[](const string& key);
}; ///:~
The DataPair class looks a bit like the CGIpair class, but it’s simpler. When you create a DataPair from an input stream, the constructor calls get( ) to extract the next pair from that stream. The operator bool returns false for an empty DataPair, which usually signals the end of an input stream.
FormData contains the path where the original file was placed (this path information is stored within the file), the email address of the user, and a vector<DataPair> to hold the information. The operator[ ] allows you to perform a map-like lookup, just as in CGImap. Here are the definitions:
//: C26:FormData.cpp {O}
#include "FormData.h"
#include "../require.h"
#include <cstring> // strlen()
DataPair& DataPair::get(istream& in) {
first.erase(); second.erase();
string ln;
getline(in,ln);
while(ln.find("[{[") == string::npos)
if(!getline(in, ln)) return *this; // End
first = ln.substr(3, ln.find("]}]") - 3);
getline(in, ln); // Throw away [([
while(getline(in, ln))
if(ln.find("])]") == string::npos)
second += ln + string(" ");
else
return *this;
return *this; // Input ended before "])]"
}
FormData::FormData(char* fileName) {
ifstream in(fileName);
assure(in, fileName);
require(!getline(in, filePath).fail());
// Should be start of first line:
require(filePath.find("///{") == 0);
filePath = filePath.substr(strlen("///{"));
require(!getline(in, email).fail());
// Should be start of 2nd line:
require(email.find("From[") == 0);
int begin = strlen("From[");
int end = email.find("]");
int length = end - begin;
email = email.substr(begin, length);
// Get the rest of the data:
DataPair dp(in);
while(dp) {
push_back(dp);
dp.get(in);
}
}
string FormData::operator[](const string& key) {
iterator i = begin();
while(i != end()) {
if((*i).first == key)
return (*i).second;
i++;
}
return string(); // Empty string == not found
}
void FormData::dump(ostream& os) {
os << "filePath = " << filePath << endl;
os << "email = " << email << endl;
for(iterator i = begin(); i != end(); i++)
os << (*i).first << " = "
<< (*i).second << endl;
} ///:~
The DataPair::get( ) function assumes you are using the same DataPair over and over (which is the case, in FormData::FormData( )) so it first calls erase( ) for its first and second strings. Then it begins parsing the lines for the key (which is on a single line and is denoted by the “[{[” and “]}]”) and the value (which may be on multiple lines and is denoted by a begin-marker of “[([” and an end-marker of “])]”), which it places in the first and second members, respectively.
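The parsing steps just described can be tried out in isolation by reading from a string stream instead of a file. The following sketch uses a hypothetical free function readPair in place of the DataPair class; like get( ), it joins the lines of a value with a single space after each one, so a one-line value ends with a trailing space.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-alone version of the key/value parsing.
// Returns false when no further key line can be found.
bool readPair(std::istream& in, std::string& key,
              std::string& value) {
  key.erase(); value.erase();
  std::string ln;
  std::string::size_type open = std::string::npos;
  // Skip forward to the next line containing "[{[":
  while(open == std::string::npos) {
    if(!std::getline(in, ln)) return false; // No more keys
    open = ln.find("[{[");
  }
  std::string::size_type close = ln.find("]}]", open);
  if(close == std::string::npos) return false; // Malformed key
  key = ln.substr(open + 3, close - (open + 3));
  std::getline(in, ln); // Throw away the "[([" line
  while(std::getline(in, ln)) {
    if(ln.find("])]") != std::string::npos) return true;
    value += ln + " ";
  }
  return true; // Input ended before "])]"; keep what was read
}
```

Feeding it an istringstream holding a fragment of the sample file shown earlier yields the keys and values in order, and then false once the data runs out.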
The FormData constructor is given a file name to open and read. The FormData object always expects there to be a file path and an email address, so it reads those itself before getting the rest of the data as DataPairs. With these tools in hand, extracting the data becomes quite easy:
//: C26:FormDump.cpp
//{L} FormData
#include "FormData.h"
#include "../require.h"
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
FormData fd(argv[1]);
fd.dump();
} ///:~
The only reason that ProcessApplication.cpp is busier is that it is building the email reply. Other than that, it just relies on FormData:
//: C26:ProcessApplication.cpp
//{L} FormData
#include <cstdio>
#include <cstdlib> // system()
#include "FormData.h"
#include "../require.h"
using namespace std;
const string from("Bruce@EckelObjects.com");
const string replyto("Bruce@EckelObjects.com");
const string basepath("/home/eckel");
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
FormData fd(argv[1]);
char tfname[L_tmpnam];
tmpnam(tfname); // Create a temporary file name
string tempfile(basepath + tfname + fd.email);
ofstream reply(tempfile.c_str());
assure(reply, tempfile.c_str());
reply << "This message is to verify that you "
"have been added to the list for the "
<< fd["subject-field"] << ". Your signup "
"form included the following data; please "
"ensure it is correct. You will receive "
"further updates via email. Thanks for your "
"interest in the class!" << endl;
FormData::iterator i;
for(i = fd.begin(); i != fd.end(); i++)
reply << (*i).first << " = "
<< (*i).second << endl;
reply.close();
// "fastmail" only available on Linux/Unix:
string command("fastmail -F " + from +
" -r " + replyto + " -s \"" +
fd["subject-field"] + "\" " +
tempfile + " " + fd.email);
system(command.c_str()); // Wait to finish
remove(tempfile.c_str()); // Erase the file
} ///:~
This program first creates a temporary file to build the email message in. Although it uses the Standard C library function tmpnam( ) to create a temporary file name, this program takes the paranoid step of assuming that, since there can be many instances of this program running at once, it’s possible that a temporary name in one instance of the program could collide with the temporary name in another instance. So to be extra careful, the email address is appended onto the end of the temporary file name.
The message is built, the DataPairs are added to the end of the message, and once again the Linux/Unix fastmail command is built to send the information. An interesting note: if, in Linux/Unix, you add an ampersand (&) to the end of the command before giving it to system( ), then this command will be spawned as a background process and system( ) will immediately return (the same effect can be achieved in Win32 with start). Here, no ampersand is used, so system( ) does not return until the command is finished – which is a good thing, since the next operation is to delete the temporary file which is used in the command.
The final operation in this project is to extract the data into an easily-usable form. A spreadsheet is a useful way to handle this kind of information, so this program will put the data into a form that’s easily readable by a spreadsheet program:
//: C26:DataToSpreadsheet.cpp
//{L} FormData
#include "FormData.h"
#include <string>
#include <cstdio>
#include "../require.h"
using namespace std;
string delimiter("\t");
int main(int argc, char* argv[]) {
for(int i = 1; i < argc; i++) {
FormData fd(argv[i]);
cout << fd.email << delimiter;
// Named "it" so it doesn't redeclare the loop index i:
FormData::iterator it;
for(it = fd.begin(); it != fd.end(); it++)
if((*it).first != "workshop-suggestions")
cout << (*it).second << delimiter;
cout << endl;
}
} ///:~
Common data interchange formats use various delimiters to separate fields of information. Here, a tab is used but you can easily change it to something else. Also note that I have checked for the “workshop-suggestions” field and specifically excluded that, because it tends to be too long for the information I want in a spreadsheet. You can make another version of this program that only extracts the “workshop-suggestions” field.
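If you change the delimiter to a comma, keep in mind that a value may itself contain one (the street address in the sample file does). A common convention, and the one assumed in this sketch, is to wrap such a field in double quotes and to double any quote characters inside it; check what your spreadsheet’s import actually expects. The function name csvField is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the value unchanged unless it holds
// a comma, a double quote or a line break, in which case it is
// wrapped in double quotes with embedded quotes doubled.
std::string csvField(const std::string& value) {
  if(value.find_first_of(",\"\r\n") == std::string::npos)
    return value;
  std::string quoted("\"");
  for(std::string::size_type i = 0; i < value.size(); ++i) {
    if(value[i] == '"')
      quoted += '"'; // Double each embedded quote
    quoted += value[i];
  }
  quoted += '"';
  return quoted;
}
```

Each value would then be passed through csvField( ) before being written, with a comma as the delimiter.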
This program assumes that all the file names are expanded on the command line. Using it under Linux/Unix is easy since file-name global expansion (“globbing”) is handled for you. So you say:
DataToSpreadsheet *.txt >> spread.out
In Win32 (at a DOS prompt) it’s a bit more involved, since you must do the “globbing” yourself:
For %f in (*.txt) do DataToSpreadsheet %f >> spread.out
This technique is generally useful for writing Win32/DOS command lines.
[65] Actually, Java Servlets look like a much better solution than CGI, but – at least at this writing – Servlets are still an up-and-coming solution and you’re unlikely to find them provided by your typical ISP.
[66] Free Web servers are relatively common and can be found by browsing the Internet; Apache, for example, is the most popular Web server on the Internet.
[67] GNU stands for “Gnu’s Not Unix.” The project, created by the Free Software Foundation, was originally intended to replace the Unix operating system with a free version of that OS. Linux appears to have replaced this initiative, but the GNU tools have played an integral part in the development of Linux, which comes packaged with many GNU components.
© Copyright 1997-1999 CodeGuru