This is an ongoing project (see paper) with many subprojects. We developed a game-like platform to measure how hard it is for developers to understand short snippets of code, and used it to measure snippets with different properties. Future work can focus on comparing different styles of loops, on using negation in predicates, on composing various elements together, and so on.
Code reviews are commonly used to verify the quality of code before it goes to production. Does this include a review of variable names? We can conduct a survey on this to learn about the practices used by professional developers, and specifically how they decide whether names are good or not.
Finding good names is hard. And sometimes names we think are good turn out not to be (see our paper on names, including misleading ones). This project is about these problematic cases. Can we characterize them? A possible beginning is to create a naming experiment, and ask participants which variables were hard to name and why.
Complexity metrics for OOP are based on reasonable ideas, but have seldom been validated in any way on real code. The project is to study inheritance and composition as they appear in real code (using large popular open-source projects), and come up with hypotheses and experiments on how these properties affect complexity.
It is commonly accepted that a method should perform a single task. But how small should it be? Making methods very small is good for unit testing, since only a few tests are needed for each one. But this may cause the code to become fragmented to the point that it is harder to understand. The project is to try to characterize this tradeoff, and also to review the situation in real open-source code.
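As a possible starting point for reviewing the situation in real code, method sizes can be collected automatically. The following is a minimal sketch, assuming the code under study is Python and that length is measured in source lines from the `def` line to the last line of the body; the function name `function_lengths` and the sample input are hypothetical, and a real study would need to choose its own languages and size measure.

```python
import ast


def function_lengths(source: str) -> dict[str, int]:
    """Map each function or method name in `source` to its length in lines.

    Assumption: length is counted from the `def` line through the last
    line of the body. Functions that share a name overwrite each other,
    which is acceptable for a sketch but not for a real survey.
    """
    lengths = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


# Hypothetical input, used only to illustrate the measurement.
SAMPLE = (
    "def f():\n"
    "    return 1\n"
    "\n"
    "def g(x):\n"
    "    y = x + 1\n"
    "    return y\n"
)

print(function_lengths(SAMPLE))
```

Running this over the files of an open-source project would give a distribution of method lengths, which could then be compared against test counts or comprehension measurements.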
Linux (and many other large systems) follows a perpetual development lifecycle model, with a continuous backbone of development activity that is interrupted at times to produce a new stable release (see our paper characterizing this process). It is conjectured that the release activity is related to improving the code and the system's stability, but can this be quantified using code metrics? The project is to characterize what happens both in the preparation before a release and in the period following it, and how it differs from normal development on the one hand and continued maintenance on the other.
A developer who joins a long-term software project suffers an inherent disadvantage: the code they see is the result of a long evolutionary process, and is therefore harder to understand than it was for the developers who actually followed the process of its creation. This project is about creating a tool to reconstruct historical views of the code, and assessing whether it indeed helps developers understand the code better. This may be based on file history flow graphs.
System growth is typically measured in LOC or number of modules. But in an operating system like Linux one may be able to measure functional growth. We have already shown that the growth in the number of system calls is slowing down considerably, but the growth in the number of configuration options is accelerating (see paper). The project is to extend this to other metrics, such as flags to system calls and ioctl calls.
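To illustrate how such a functional metric might be extracted, the sketch below counts configuration options in Kconfig text. It assumes that an option is any symbol introduced by a `config` or `menuconfig` line, and that repeated definitions of the same symbol count once; the helper name `count_config_options` and the sample text are hypothetical, and this is not necessarily the method used in the paper.

```python
import re

# Kconfig entries are introduced by "config NAME" or "menuconfig NAME"
# at the start of a line (possibly indented).
CONFIG_RE = re.compile(r"^[ \t]*(?:menu)?config[ \t]+(\w+)", re.MULTILINE)


def count_config_options(kconfig_text: str) -> int:
    """Count the distinct configuration symbols defined in Kconfig text."""
    return len(set(CONFIG_RE.findall(kconfig_text)))


# Hypothetical fragment: INET is defined twice and is counted once.
sample = """\
menuconfig NET
	bool "Networking support"

config INET
	bool "TCP/IP networking"
	depends on NET

config INET
	default y
"""

print(count_config_options(sample))  # prints 2
```

Applying the same count to the Kconfig files of successive kernel versions would produce a growth curve; analogous extractors would be needed for system call flags and ioctl calls.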