Topics in Performance Evaluation

Project – Extracting Feedback from Data

This project is not part of the course requirements, but if you do it well your work will be publicized in the Parallel Workloads Archive, and it can also turn into a guided project ("Avoda Mudrechet")!
(Come talk with me if you want to do this.)

There are two basic models of how work arrives at a system: the open model and the closed model. The difference is that in the closed model there exists a feedback effect from the system's performance to the submittal of additional work. The problem is that usually we don't have any data about such feedback. In this project we attempt to elicit such data.

Background

In an open system model, the work arrives from some external users or clients. The assumption is that the user population is infinite, and that users send jobs irrespective of the current state of the system. In particular, they continue to send jobs even when the system is overloaded. Our simulations of M/M/1 and related systems follow this model.

The alternative is a closed system, in which a finite population of users is assumed. These users only submit a new job after the previous one has terminated. So if the system is slow, the submittal rate drops. As a result, the system is inherently stable.

In a real system both models typically apply. Some work is submitted by new users who do not know about the system state. But each user usually submits more than one job, so successive jobs are already subject to feedback from the system performance.
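
To make the feedback concrete, here is a minimal sketch, in Python, of how arrivals could be generated under each model in a simulation. The rates and the single-user closed loop are illustrative assumptions, not part of any specific system:

    import random

    ARRIVAL_RATE = 0.8   # illustrative parameters (jobs per time unit)
    SERVICE_RATE = 1.0
    THINK_RATE = 0.5

    def open_arrivals(n):
        """Open model: a Poisson process; submittals ignore the system state."""
        t = 0.0
        for _ in range(n):
            t += random.expovariate(ARRIVAL_RATE)
            yield t

    def closed_arrivals(n):
        """Closed model, single user: each submittal waits for the previous
        job to finish and then adds a think time, so a slow system
        automatically slows down the arrival of new work."""
        t = 0.0
        for _ in range(n):
            yield t
            response = random.expovariate(SERVICE_RATE)  # job's time in system
            t += response + random.expovariate(THINK_RATE)

In the closed generator the next submittal time depends on the previous job's response time; this is exactly the feedback that the project tries to recover from the traces.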

The standard workload format used in the Parallel Workloads Archive includes two fields that describe dependencies between jobs: for each job we can specify which other job it depends on (field 17), and the think time from the termination of that job to the submittal of this one (field 18). These fields are intended to be used as a replacement for the arrival time (field 2) in simulations. However, the logs do not contain any data in these fields.

Our goal in this project is to generate such dependencies based on educated guessing. We will look at the trace, and try to guess whether submitted jobs depend on previous ones. Specifically, we will look at previous jobs submitted by the same user (as identified by field 12), and when they terminated. We will then assign dependencies, and re-write the trace with this added heuristic data.
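
For reference, in a standard workload format (SWF) file header lines start with ';' and every other line holds one job with 18 whitespace-separated fields. A minimal parsing sketch in Python (the field numbers follow the format definition; the record layout is just one convenient choice):

    # Read an SWF trace; times assumed integral seconds, as in most archive logs.
    def read_swf(path):
        jobs = []
        with open(path) as f:
            for line in f:
                if line.startswith(';') or not line.strip():
                    continue                        # skip header comments and blanks
                fields = line.split()
                submit = int(fields[1])             # field 2: submit time
                wait = int(fields[2])               # field 3: wait time
                run = int(fields[3])                # field 4: run time
                jobs.append({
                    'id': int(fields[0]),           # field 1: job number
                    'submit': submit,
                    'end': submit + wait + run,     # when the job terminated
                    'user': int(fields[11]),        # field 12: user ID
                    'fields': fields,               # kept for re-writing the trace
                })
        return jobs

Note that in real logs missing values appear as -1, so the wait and run times need sanity checks before the termination time is computed.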

Assignment

  1. Define the algorithm.

    The main part of this project is to come up with a methodology to extract potential dependencies from the trace data. For example, if a certain user submits a job, and soon after it terminates he submits another job, it is reasonable to assume that the second job depends on the first one, and to call the time from the first termination to the second submittal the think time.

    The question is what to do when things are not so clear cut. For example, if the second job is submitted 23 minutes after the first one terminates, is it still considered dependent? And what if it comes 47 minutes later? Or 1.5 hours? Or even more, e.g. when the next job comes only 3 weeks later? Obviously some threshold is needed. But maybe each user should have a different threshold?

    Also, what if several jobs are submitted without waiting for the immediately preceding jobs to terminate? Does that mean that there is no dependency? Or maybe the dependency is on some earlier job that has already terminated? If so, which one?

    In short, you need to define a heuristic (or several competing heuristics) that makes sense. Note that "makes sense" has two components. First, the heuristic needs to take the available workload data into account. For example, if your heuristic is that jobs depend on each other only if the second one is submitted in the exact second that the first one terminated, but the data indicates that this never happens, then it is not a very useful heuristic. Second, it should match your intuitive notion of how human users may be expected to behave.

  2. Implement it.

    Write a program to parse a parallel job trace, assign dependencies according to your algorithm, and re-write the trace with your dependencies incorporated in fields 17 and 18 (a minimal starting point is sketched after this list).

  3. Use and evaluate it.

    Run your program on available data files from the Parallel Workloads Archive. Then analyze the results to verify that they make sense. For example, what is the distribution of session lengths that is generated, where sessions are defined to be sequences of jobs that depend on each other? Note that this can involve two separate notions of "length": how many jobs there are in each session, and how long it lasts from the first job to the last. Can you think of any other analyses?
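
As a starting point for the three items above, here is a minimal sketch. It assumes the read_swf() helper from the Background section, a single fixed threshold for all users, and "depend on the latest job that already terminated" as the tie-breaking rule; all three are simplifications that you will probably want to revisit, and the file names are placeholders:

    from collections import defaultdict

    THRESHOLD = 20 * 60   # 20 minutes; an arbitrary illustrative cutoff

    def assign_dependencies(jobs, threshold=THRESHOLD):
        """Guess, for each job, a preceding job by the same user whose
        termination it plausibly waited for, and record the think time."""
        seen = defaultdict(list)            # previous jobs of each user
        for job in jobs:                    # jobs assumed sorted by submit time
            done = [p for p in seen[job['user']]
                    if 0 <= job['submit'] - p['end'] <= threshold]
            if done:
                pred = max(done, key=lambda p: p['end'])
                job['pred'] = pred['id']
                job['fields'][16] = str(pred['id'])                   # field 17
                job['fields'][17] = str(job['submit'] - pred['end'])  # field 18
            seen[job['user']].append(job)
        return jobs

    def session_sizes(jobs):
        """Number of jobs in each chain of dependent jobs ("session")."""
        root = {}                           # job id -> first job of its session
        sizes = defaultdict(int)
        for job in jobs:
            r = root.get(job.get('pred'), job['id'])
            root[job['id']] = r
            sizes[r] += 1
        return sorted(sizes.values())

    jobs = assign_dependencies(read_swf('trace.swf'))
    with open('trace-deps.swf', 'w') as out:
        for job in jobs:                    # note: header comments are dropped
            out.write(' '.join(job['fields']) + '\n')
    print(session_sizes(jobs))

Time-based session lengths can be derived similarly by keeping, for each session root, the first submittal and last termination times. A real implementation should also copy the SWF header comments through to the output; deciding whether the resulting distributions "make sense" is the heart of the evaluation.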

Given that this is not a graded exercise, you are welcome (and even expected) to come and discuss what you are doing or planning to do.

Submit

Write a report on your work, with a description of your algorithm for deciding on job dependencies, and your justification for this approach. Also include the results of applying your program to traces from the archive.

If you want to do this project, come to me to discuss the details.
