In this exercise you will:
Design the first version of an XML pretty-printer
Produce a set of UML class and sequence diagrams to document your design
Experience design patterns in actual C++ or Java code
Write a covering set of unit tests
The deadline for this exercise is Sunday, March 30th, 2003.
In this exercise you only need to support a small subset of the XML format standard. An XML document is a string which starts with a <Tag> and ends with the matching </Tag>. Inside the tag there can be a text value, or other tags. For example:
<Book id = "123" library = "cs">
<Name>Refactoring</Name>
<Author>
<FirstName>Martin</FirstName>
<LastName>Fowler</LastName>
<!-- Here's an example of a list
within a list -->
<OtherBooks>
<BookName>UML
Distilled</BookName>
<BookName>Analysis
Patterns</BookName>
</OtherBooks>
</Author>
<Copies>2<Status>Loaned</Status>1<Status>Available</Status></Copies>
</Book>
XML documents follow these rules:
A tag name must start with an English letter, and may continue only with letters, digits or underscore (_) characters.
Opening and closing tags must match (i.e.
There must be exactly one root tag (i.e. not starting with a tag is illegal, and so is having two or more top-level tags).
Tag names are case sensitive (i.e.
There is no limit to the depth of the tags hierarchy.
The same tag can be used more than once (as with the
Each tag can contain either a text value (like the
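The naming, matching and single-root rules above can be checked with a regular expression and a stack. The following is only a minimal sketch under stated assumptions: the class TagRules, its method names, and the pre-tokenized input (where "Name" opens a tag and "/Name" closes it) are hypothetical and are not part of the exercise. It also assumes a modern Java version with generics.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class TagRules {
    // An English letter first, then letters, digits or underscores.
    private static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    private TagRules() {}

    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    /**
     * Checks a pre-tokenized tag sequence, where "Name" opens and "/Name" closes.
     * Returns true only if every name is valid, every closing tag matches the
     * innermost open tag (case-sensitively), and there is exactly one root tag.
     */
    static boolean isWellNested(String... tags) {
        Deque<String> open = new ArrayDeque<>();
        int roots = 0;
        for (String tag : tags) {
            boolean closing = tag.startsWith("/");
            String name = closing ? tag.substring(1) : tag;
            if (!isValidName(name)) {
                return false;
            }
            if (closing) {
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else {
                if (open.isEmpty()) {
                    roots++; // opening a tag at the top level starts a new root
                }
                open.push(name);
            }
        }
        return open.isEmpty() && roots == 1;
    }
}
```

A real parser would work on characters rather than on a ready-made token list, and would report a detailed error instead of returning false; the stack discipline stays the same.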
Two special document elements in the above example are comments and attributes:
A comment looks like a tag that starts with
An attribute is written as attrname = "attrvalue" inside the tag, after the tag name. A tag can have several attributes, but it cannot have two attributes with the same name.
In this exercise, you do not have to support the self-closing tag syntax, such as <Name value = "Fowler"/>, which the XML standard supports.
Write an executable program called 'xmlp' that can read an XML file, and output it (in this version) as a text file. The program takes a filename as a command-line argument, preceded by one optional flag:
xmlp -nc filename
The program parses the XML document in the given file. If it is not a valid XML document, the program reports a detailed error and exits. Otherwise, it creates a file called filename.txt (where 'filename' is the name of the input file), which presents the contents of the XML document in a more readable way:
The first line is "Contents of <filename>:", where <filename> is the name of the input file.
Each text element is described by a 'name: value' line.
Each list element is described by a 'name:' line, followed by its contents in the next line, indented by another tab.
Each mixed element is described by a 'name:' line, followed by its contents (both text and tags) in the next line, indented by another tab.
Comments are described by a '// comment text' line.
Attributes are not printed to the output.
For example, if the XML document describing a book, in the previous section, were the contents of a file called abook.xml, then running xmlp abook.xml should produce a file called abook.xml.txt with these contents:
Contents of abook.xml:
Book:
	Name: Refactoring
	Author:
		FirstName: Martin
		LastName: Fowler
		// Here's an example of a list within a list
		OtherBooks:
			BookName: UML Distilled
			BookName: Analysis Patterns
	Copies:
		2
		Status: Loaned
		1
		Status: Available
The -nc command-line argument specifies that comments should not appear in the output.
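The output rules above map naturally onto a recursive walk over a tree of nodes. The sketch below is one possible shape, not a prescribed design: the classes XmlNode, TextNode, CommentNode and ElementNode are hypothetical, an element is assumed to count as a "text element" when its only child is a text value, and the "// " comment prefix follows the sample output. The showComments parameter stands in for the absence of -nc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node types for illustration; the exercise does not prescribe them.
abstract class XmlNode {
    abstract void print(StringBuilder out, int depth, boolean showComments);

    // Writes one tab per nesting level.
    static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
    }
}

final class TextNode extends XmlNode {
    final String value;

    TextNode(String value) { this.value = value; }

    @Override
    void print(StringBuilder out, int depth, boolean showComments) {
        indent(out, depth);
        out.append(value).append('\n');
    }
}

final class CommentNode extends XmlNode {
    final String text;

    CommentNode(String text) { this.text = text; }

    @Override
    void print(StringBuilder out, int depth, boolean showComments) {
        if (!showComments) {
            return; // corresponds to running with -nc
        }
        indent(out, depth);
        out.append("// ").append(text).append('\n');
    }
}

final class ElementNode extends XmlNode {
    final String name;
    final List<XmlNode> children = new ArrayList<>();

    ElementNode(String name) { this.name = name; }

    ElementNode add(XmlNode child) {
        children.add(child);
        return this;
    }

    @Override
    void print(StringBuilder out, int depth, boolean showComments) {
        indent(out, depth);
        // Assumption: a single text child makes this a 'name: value' text element.
        if (children.size() == 1 && children.get(0) instanceof TextNode) {
            out.append(name).append(": ")
               .append(((TextNode) children.get(0)).value).append('\n');
            return;
        }
        // List and mixed elements: 'name:' line, then contents one tab deeper.
        out.append(name).append(":\n");
        for (XmlNode child : children) {
            child.print(out, depth + 1, showComments);
        }
    }
}
```

Putting the printing logic inside the node classes keeps the sketch short, but it ties the tree to one output format; a design that anticipates other formats would probably separate the two.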
The program should not write anything to the standard output, and should write to the standard error in cases of error. Errors include inability to open the input file or create the output file, parsing errors in the input file, a missing input file argument, and so forth. If the output file already exists, it should be overwritten.
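The command-line handling described above might be sketched as follows. Everything here is illustrative: the classes XmlpOptions and XmlpCli, the exit codes 0 and 1, and the decisions to reject unknown options and to accept -nc on either side of the filename are assumptions, not requirements of the exercise.

```java
import java.io.PrintStream;

// Hypothetical holder for parsed command-line options; names are illustrative.
final class XmlpOptions {
    final String inputFile;
    final boolean showComments;

    private XmlpOptions(String inputFile, boolean showComments) {
        this.inputFile = inputFile;
        this.showComments = showComments;
    }

    // The output file is the input file name with ".txt" appended.
    String outputFile() {
        return inputFile + ".txt";
    }

    /** Parses "xmlp [-nc] filename"; throws IllegalArgumentException on bad usage. */
    static XmlpOptions parse(String[] args) {
        boolean showComments = true;
        String file = null;
        for (String arg : args) {
            if (arg.equals("-nc")) {
                showComments = false;
            } else if (arg.startsWith("-")) {
                // Assumption: any other dash-prefixed argument is an error.
                throw new IllegalArgumentException("unknown option: " + arg);
            } else if (file != null) {
                throw new IllegalArgumentException("more than one input file given");
            } else {
                file = arg;
            }
        }
        if (file == null) {
            throw new IllegalArgumentException("missing input file");
        }
        return new XmlpOptions(file, showComments);
    }
}

final class XmlpCli {
    /** Returns the process exit code; errors go to the given stream only. */
    static int run(String[] args, PrintStream err) {
        try {
            XmlpOptions options = XmlpOptions.parse(args);
            // Parsing the input and writing options.outputFile() would go here.
            return 0;
        } catch (IllegalArgumentException e) {
            err.println("xmlp: " + e.getMessage());
            return 1;
        }
    }
}
```

A main method could then consist of the single statement System.exit(XmlpCli.run(args, System.err)), which keeps the argument handling testable without starting a process.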
Although this exercise could easily be programmed within a single class, that would not be a good design: this xmlp is only a first version, so it is crucial to keep an open mind about possible future requirements. Consider the following possibilities:
It may be required to produce the output in formats other than text - HTML, PDF, Word or others. It may also be required to write output in several formats at once, for example: xmlp -txt -pdf -html abook.xml
It may be required to support other kinds of XML elements besides text, comments and attributes - the standard also defines references, meta-tags and others. It will be required to read such elements and print them to the output.
It may be required to print attributes (as defined above), and also to use attributes to decide how their tags should be printed. For example, certain attributes can decide that their tags will not be printed, will be printed in uppercase, and so forth.
It may be required to modify the input document before pretty-printing it: for example, sort elements, rename tag names, change text to uppercase, and so on.
It may be required to read different data formats of hierarchical data besides XML - for example object graphs, relational tables with foreign keys and others - and to be able to produce the same output for them.
You must design your program so that it is easy to add code that implements the above requirements. Assume that you are the one who will actually have to code it - that's how it usually is in "real life". For each of the above requirements write an explanation in your README file, not more than three sentences long, which explains how it should be coded. For example:
Requirement: It may be required to define filters on which parts of the input document get printed. For example, new command-line arguments can dictate that only the simple (non-hierarchical) elements get printed, that only elements that start with a given string get printed, and so forth.
Solution: Write an Iterator for each kind of filter, whose next() method moves to the next element for which the filter is true. Such iterators are implemented as Decorators of other iterators, which makes it easy to combine different filters dynamically and does not require changing or recompiling existing code.
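To make the example solution concrete, here is a rough sketch of a filtering iterator written as a Decorator. It assumes a modern Java version (generics and java.util.function.Predicate, neither of which the exercise requires), and the class name FilterIterator is hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical decorator: wraps any Iterator and skips elements the filter rejects.
final class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Predicate<T> filter;
    private T lookahead;          // next accepted element, if already found
    private boolean hasLookahead; // true while lookahead holds an unconsumed element

    FilterIterator(Iterator<T> inner, Predicate<T> filter) {
        this.inner = inner;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Advance the wrapped iterator until an element passes the filter.
        while (!hasLookahead && inner.hasNext()) {
            T candidate = inner.next();
            if (filter.test(candidate)) {
                lookahead = candidate;
                hasLookahead = true;
            }
        }
        return hasLookahead;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasLookahead = false;
        T result = lookahead;
        lookahead = null;
        return result;
    }
}
```

Because a FilterIterator is itself an Iterator, one filter can wrap another: an iterator that keeps only names starting with a given string can be passed to a second FilterIterator that keeps only names above a certain length, with no change to either class.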
It is important that each solution you present be at most three sentences long. The intention is to enforce the use of design pattern vocabulary rather than elaboration of specific class and object relationships.
The intent is that you divide your time equally between actual coding on the one hand and design work on the other: drawing UML diagrams and answering the above six design questions. You may choose either C++ or Java as the implementation language; whichever you choose, use its standard libraries to their full extent: the standard streams, strings and data structures. With a proper design, this exercise is quick and simple to code.
It is also required that you submit unit tests to test your work. Organize your unit tests into classes by subject, and write a method for each small test. Each test should be self-validating - that is, know by itself whether it has passed or failed. Writing unit tests should be an integral part of coding, and is essential when code must be changed in newer versions. You will have a chance to estimate the convenience of unit tests in exercise 3. Until then:
Read the following article about unit testing as part of coding and the build process.
If you write in C++, you must arrange your
If you write in Java, you must use the JUnit Unit Testing Framework for your unit tests.
The code you submit must be built with no compiler warnings, and pass all unit tests.
One metric for measuring the usefulness of a set of unit tests is called coverage, which means the percentage of your code that the unit tests actually run. Coverage of 90% or above is considered good, and you should aim for that goal.
This is an advanced course, so there is no intention to deduct points for coding style or naming conventions - the emphasis is on proper design. However, you are as always expected to write clear code with a consistent style.
First, submit a tar file electronically, with the following contents:
All program source code, and Makefile if working with C++
All unit tests source code should be in a sub-directory called test/xmlp
The README file, with the usual contents (IDs, logins and full names, descriptive list of files and features) and answers to the six possible extensions in the design section above. The README file should also describe parts of the design or design choices that are not evident from reading the UML diagrams.
In addition to submitting electronically, submit into the course's submission box at Ross -2 the following:
A printout of your README file
UML class diagrams, describing all classes in the program. There can be one or several diagrams, as long as they are readable.
UML sequence diagrams of two non-trivial object interactions of your choice.
Diagrams can be hand-written or computer-generated. There are several free UML editing software packages on the Internet, including ArgoUML, ProxyDesigner and others.
Read these general guidelines for Object-Oriented design (especially 'Design Guidelines')
Re-read about the Composite, Decorator, Iterator and Visitor design patterns
Design the program; start with the data structures, then the major operations, and finish with the "main" program. Create UML class and sequence diagrams (see also this tutorial) as you go.
Read about unit testing and the JUnit framework before starting to code.
Code the program in the same three steps, writing unit tests in parallel with code. That is, write a test class for building the data structure, then a test class for the main operations, then a test class for the main program.
If during the coding and testing process you decide to change the design, update the UML diagrams to reflect the changes.
Register for the course and then submit the exercise according to the above instructions.
For general knowledge only, you can also read the XML and XSLT standards.
Good luck!