Proposal: Summer School on Experimental Computer Science

Computer Science has an established mathematical theory of what can be computed and at what cost. It also has a well-developed engineering side, spanning hardware development, software development, and the crafting of applications ranging from massive search through computer vision to robotic control. But it has much less to offer in terms of experimentation as it is commonly done in the natural sciences.

In Computer Science, experimentation is often taken to mean implementing a prototype or running a simulation. In the natural sciences, by contrast, experimental work centers on the observation and measurement of nature. It is this connection to reality that most often seems to be missing in Computer Science research. Too much work rests on assumptions that are either mathematically convenient or seem to make sense, without verification that they actually hold in practice.

The goal of the proposed summer school is to teach students how to be good scientists in addition to being good engineers. The idea of holding such a summer school was raised at the educational roundtable held at the Workshop on Experimental Computer Science in San Diego as part of ACM FCRC 2007. The school is planned as a 5-day event, with course offerings in the following three major topic areas.

Understanding Complex Systems.
While computer-based systems are man-made, many of today's systems are so complex that even their designers cannot claim to fully understand their operational characteristics. For example, this is true of the detailed interactions among architectural features of modern microprocessors, and of the structure and workings of the Internet. Therefore such systems need to be studied much as natural systems are studied, by observation and measurement.
Possible courses in this topic area include the following.
- Basics of measurement: active and passive monitoring; unobtrusive measurement; errors and noise; confidence intervals.
- Measuring the Internet: exploiting the Internet infrastructure for measurement; probing the Internet structure; effect of different points of view; Internet traffic.
- Monitoring infrastructure: tools such as the KernInst project (option for a hands-on course).
- Workload characterization and modeling: workloads as the input to system evaluations; distribution fitting; heavy tails and their implications; correlations in workloads; self-similarity; usage examples.

Experimental Engineering.
It is often convenient to think of system construction as a linear process in which requirements lead to design and then to implementation. But it is increasingly recognized that an iterative and incremental process may work much better, with feedback from actual usage under realistic conditions guiding subsequent development. This is manifest in the Unified Software Development Process, in agile software development, and in the procedure used by companies such as Google, which test new features on real users before incorporating them into the main product line.
Possible courses in this topic area include the following.
- Experimental infrastructure: constructing and using large-scale infrastructure such as PlanetLab (option for a hands-on course).
- Microarchitecture development: benchmark design; assessing the coverage of benchmarks; assessing the overlap of benchmarks; architecture-benchmark interactions.
- Reliable evaluations: conducting tournaments; evaluation under equivalent conditions; bias; standardization vs. innovation; examples such as TREC or RoboCup.
- Experimental algorithmics: the interaction of experimentation and theory; experimental analysis of algorithms that resist theoretical analysis; examples from bin packing; examples from phase transitions in NP-complete problems.

Experimenting with Humans.
Most computer systems are built by humans for use by humans. Therefore one cannot escape the need to understand how humans interact with systems and think about them. Easily measured metrics do not necessarily correspond to what human users really care about, users exhibit a large variety of behaviors, and many of these behaviors are surprising.
Possible courses in this topic area include the following.
- Basics of human cognition: human thought processes; human memory capacity; how humans understand systems and processes; effects of previous experience.
- Experiments with humans: focus groups; experiment design and setup; articulating tasks and requirements; using rewards.
- Human-system interaction: usability testing (option for a hands-on course).

In addition, a series of open lectures is planned on common topics such as the following.
- The scientific method: history and development of scientific thought.
- Handling data: exploratory data analysis; data visualization; data archiving and sharing.
- Replication: replication vs. repetition; controlled experiments; ensuring replicability of results.

dgf / 31 Oct 2007