Mr. Bits: A Quantitative Model of Program Comprehension

Michael Hansen, Rob Goldstone, Andrew Lumsdaine

PALM Lab, Fall 2014

What Makes Code Hard to Understand?

Software complexity is "a measure of resources expended by a system [human or other] while interacting with a piece of software to perform a given task."

Basili, 1980
  • Program Complexity
    • Computational complexity
    • Representational complexity
    • Psychological complexity
      • Problem/functional (necessary) complexity
      • Document/structural (accidental)
      • Programmer characteristics

Reducing the (Accidental) Complexity of a Program

  • New programming languages/paradigms
  • New methodologies/principles
  • New development tools
Many claims are made for the efficacy and utility of new approaches to software engineering - structured methodologies, new programming paradigms, new tools, and so on. The evidence to support such claims is thin, and such evidence as there is remains largely anecdotal. Of proper scientific evidence there is remarkably little.

Furthermore, such evidence as there is can be described as “black box”: that is, it demonstrates a correlation between the use of a certain technique and an improvement in some aspect of the development. It does not demonstrate how the technique achieves the observed effect.

Françoise Détienne, 2001

Modeling The Symptoms of Complexity

  • Cognitive models of programmers needed
One feature which all of these approaches have in common is that they begin with certain characteristics of the software and attempt to determine what effect they might have on the difficulty of various programmer tasks.

A more useful approach would be first to analyze the processes involved in the programmer tasks, as well as the parameters which govern the effort involved in those processes.

...one should start with the symptoms of complexity, which are all manifested in the mind, and attempt to understand the processes which produce those symptoms.

Cant, 1995

From Chess to Code

DeGroot (1965)

Soloway (1984)


Soloway (1984) - Alpha

Soloway (1984) - Beta

Of Metrics and Models

Justification: Miller's 7 ± 2

Graph theoretic
  • McCabe's Cyclomatic Number
  • Gill and Kemerer's Complexity Density
  • Woodward's Knots
  • Nejmeh's NPATH Complexity
  • Emerson's Cohesion Metric

Textual
  • Line Count
  • Character/Keyword Count
  • Douce's Spatial Complexity
  • Aggarwal's Function/Class Template Factors

Information theoretic
  • Chen's MIN (Nesting)
  • Harrison's AICC
  • Halstead's Effort
  • Ramamurthy and Melton's Weighted Pair
  • Hansen's Complexity Pair
  • Davis and LeBlanc's Entropy
  • Woodfield's Logical Module Complexity
  • Gannon's Number of Generic Packages
  • Veldhuizen's Minimum Description Length

See Cant (1995) for complete listing
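As a concrete instance of a graph-theoretic metric, McCabe's cyclomatic number can be approximated for a Python function by counting decision points in its AST. This is a simplified sketch (the full metric is E - N + 2P on the control-flow graph, and `BoolOp` should really contribute one per extra operand):

```python
import ast

# Simplified McCabe approximation: 1 + number of decision points.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.BoolOp)

def cyclomatic(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

code = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic(code))  # 3: one path + two if-tests
```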

Tracz (1979) - The Human Information Processing System

Soloway (1984) - Rules of Discourse

More complex programs violate these rules:

  1. Variable names should reflect function.
  2. Don't include code that won't be used.
  3. If there is a test for a condition, then the condition must have the potential of being true.
  4. A variable that is initialized via an assignment statement should be updated via an assignment statement.
  5. Don't do double duty with code in a non-obvious way.
  6. An if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may be repeatedly executed.
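As a hypothetical illustration of rule 1, compare two functions that differ only in whether the name matches the behavior. The second is in the spirit of Soloway's beta program, where a name suggesting a maximum actually tracks a minimum:

```python
# Conforms to rule 1: the name reflects what the function computes.
def largest(values):
    result = values[0]
    for v in values[1:]:
        if v > result:
            result = v
    return result

# Violates rule 1: despite its name, this returns the *smallest*
# value, defeating the reader's expectations.
def largest_confusing(values):
    result = values[0]
    for v in values[1:]:
        if v < result:
            result = v
    return result

print(largest([3, 1, 4]))            # 4
print(largest_confusing([3, 1, 4]))  # 1
```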

Soloway (1986) - Plan Knowledge

von Mayrhauser (1995) - Integrated Metamodel

Any of the three submodels may become active at any time during the comprehension process.
von Mayrhauser (1995)

Which Program is More Complex?

Version 1 (counting):

for i in [1, 2, 3, 4]:
    print "The count is", i
    print "Done counting"

Version 2 (counting, extra blank lines):

for i in [1, 2, 3, 4]:
    print "The count is", i


    print "Done counting"
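In both counting versions, the final print is indented into the loop body, so "Done counting" appears four times rather than once; this is the planted bug in the counting category. A Python 3 rendering that collects the output:

```python
# "Done counting" is inside the loop body, so it is emitted once
# per iteration, not once at the end.
output = []
for i in [1, 2, 3, 4]:
    output.append("The count is " + str(i))
    output.append("Done counting")
print("\n".join(output))
```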

Version 1 (scope):

def add_1(num):
    num = num + 1

def twice(num):
    num = num * 2

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print added

Version 2 (scope, parameters named like the global):

def add_1(added):
    added = added + 1

def twice(added):
    added = added * 2

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print added
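Both scope versions print 4: assignment inside a function rebinds only the local parameter name, so neither `add_1` nor `twice` affects the caller's `added`, whatever the parameters are called. A Python 3 sketch of the same behavior:

```python
def add_1(num):
    num = num + 1   # rebinds the local name only

def twice(num):
    num = num * 2   # rebinds the local name only

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print(added)  # 4: the caller's binding is untouched
```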

Cant (1995) - Cognitive Complexity Model


  • Process model of program comprehension
  • Chunking
    • Reading and understanding a coupled block of code
    • Faster when chunks are smaller, less complex
  • Tracing
    • Visual jump to required sub-chunk
    • Faster when chunk is closer, more salient


Chunking

Tracing

Cognitive Complexity Model Terms

Term Description
$R_F$ Speed of recall or review (familiarity)
$R_S$ Chunk size
$R_C$ Type of control structure in which chunk is embedded
$R_E$ Difficulty of understanding complex Boolean or other expressions
$R_R$ Recognizability of chunk (conformance to expectations)
$R_V$ Effects of visual structure (separable from other chunks)
$R_D$ Disruptions caused by dependencies (due to short-term memory)
 
$T_F$ Dependency familiarity
$T_L$ Localization of dependency (local, remote, etc.)
$T_A$ Ambiguity of dependency source (many possibilities?)
$T_S$ Spatial distance (e.g., lines of code)
$T_C$ Level of cueing (obscurity of reference)
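The model combines these terms into a comprehension cost per chunk. As a toy sketch only (the additive form and the numbers here are illustrative, not Cant's calibrated formulation): the cost of a chunk is the sum of its R terms plus, for each dependency the reader must trace, the sum of the T terms for that trace:

```python
# Illustrative CCM-style cost: R terms for chunking the block itself,
# plus T terms for every dependency traced. All weights are made up.
def chunk_cost(r_terms, dependencies):
    return sum(r_terms.values()) + sum(
        sum(t.values()) for t in dependencies)

r = {"R_F": 0.2, "R_S": 1.5, "R_C": 0.5, "R_E": 0.8,
     "R_R": 0.3, "R_V": 0.1, "R_D": 0.4}
deps = [{"T_F": 0.2, "T_L": 0.5, "T_A": 0.3, "T_S": 1.0, "T_C": 0.4}]
print(round(chunk_cost(r, deps), 1))  # 6.2
```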

Hansen (2012/2013) - Mr. Bits

  1. A cognitive architecture can operationalize CCM
  2. Task-free complexity is not meaningful
  3. ACT-R + output prediction, reading/output duration
Term Description ACT-R Module
$R_F$ Speed of recall or review (familiarity) DM
$R_S$ Chunk size Visual
$R_C$ Type of control structure in which chunk is embedded
$R_E$ Difficulty of understanding complex Boolean or other expressions
$R_R$ Recognizability of chunk (conformance to expectations)
$R_V$ Effects of visual structure (separable from other chunks) Visual
$R_D$ Disruptions caused by dependencies (due to short-term memory) DM
 
$T_F$ Dependency familiarity DM
$T_L$ Localization of dependency (local, remote, etc.) Visual
$T_A$ Ambiguity of dependency source (many possibilities?)
$T_S$ Spatial distance (e.g., lines of code) Visual
$T_C$ Level of cueing (obscurity of reference)
 
  Delay from typing predicted output Manual

Mr. Bits - Python Debugger + ACT-R


Model Workflow

  1. Python debugger runs program, records lines visited and variable values
  2. Program abstract syntax tree (AST) is analyzed, variable reads, writes, and computations noted
  3. ACT-R model is generated with a fixed goal stack for the program
  4. ACT-R trace is parsed for eye positions and latencies
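Steps 1 and 2 can be sketched with the standard library alone: `sys.settrace` records the lines a program visits as it runs, and `ast` flags where variables are read and written. This is a minimal sketch, not Mr. Bits' actual pipeline:

```python
import ast
import sys

SOURCE = """\
total = 0
for i in [1, 2, 3]:
    total = total + i
"""

# Step 1: run the program under a trace function, recording each
# line event in the executed frame.
visited = []
def tracer(frame, event, arg):
    if event == "line":
        visited.append(frame.f_lineno)
    return tracer

code = compile(SOURCE, "<program>", "exec")
sys.settrace(tracer)
exec(code, {})
sys.settrace(None)
print("visited lines:", visited)

# Step 2: walk the AST, noting variable reads (Load) and writes (Store).
reads, writes = [], []
for node in ast.walk(ast.parse(SOURCE)):
    if isinstance(node, ast.Name):
        target = reads if isinstance(node.ctx, ast.Load) else writes
        target.append((node.id, node.lineno))
print("reads:", reads)
print("writes:", writes)
```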

Use of ACT-R Modules

  • Lines are read (each token) with EMMA if not in DM
  • Variables are visually traced if not in DM
  • Computations (+, <, etc.) have high base level in DM
  • Response characters are typed with manual module


Open Questions/Issues

  • Model is on rails - retrieval failures cannot influence response
  • Cannot make errors
  • How to tokenize program code? What are the "words"?
  • Trace locations currently hard-coded

The eyeCode Experiment

  • 1 experiment = 10 Python programs
  • 2-3 versions of each program
  • Eye movements and keystrokes recorded

Programs

10 categories, 2-3 versions each (25 total)


  • between - filter two lists, intersection
  • counting - for loop with bug
  • funcall - function call with different spacing
  • initvar - summation and factorial
  • order - 3 functions called (in order/shuffled)

  • rectangle - compute area of 2 rectangles
  • overload - overloaded + operator (number/strings)
  • partition - partition list of numbers
  • scope - function calls with no effect
  • whitespace - linear equations


[ most errors ] • [ fewer errors ] • [ no errors ]

Fixations to Relative Line Times


Transition Matrices

  • First-order transition probabilities
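First-order transition probabilities can be computed directly from the sequence of fixated lines (or areas of interest): count each consecutive pair, then normalize per source line.

```python
from collections import Counter, defaultdict

def transition_matrix(seq):
    """P[a][b] = probability that a fixation on a is followed by b."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

fixated_lines = [1, 2, 3, 2, 3, 2, 5]
P = transition_matrix(fixated_lines)
print(P[2])  # from line 2: usually to 3, once to 5
```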


Rolling Metrics

  • Computed over a rolling time window
  • Window size and step can vary
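Concretely, a rolling metric slides a fixed-width time window over the fixation record, stepping by a fixed amount, and computes a summary inside each window. A minimal sketch for fixation rate, assuming fixations as (start_ms, duration_ms) pairs:

```python
def rolling_fixation_rate(fixations, window_ms, step_ms):
    """Fixations per second inside each rolling window.

    fixations: list of (start_ms, duration_ms) tuples.
    Returns a list of (window_start_ms, rate) pairs.
    """
    end = max(s + d for s, d in fixations)
    rates = []
    t = 0
    while t < end:
        n = sum(1 for s, _ in fixations if t <= s < t + window_ms)
        rates.append((t, n / (window_ms / 1000.0)))
        t += step_ms
    return rates

fixations = [(0, 200), (250, 180), (500, 220), (1400, 300)]
for t, rate in rolling_fixation_rate(fixations, 1000, 500):
    print(t, rate)
```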

Evaluating the Model

  • Qualitative

    • Relative time on each line
    • Visual appearance of transition matrix
    • Fixation timeline
    • Rolling metrics and responses (mental computations)
  • Quantitative

    • Fixation metrics (average duration, count, rate)
    • Likelihood of trial time (actual distribution)
    • Spearman correlation of relative time on each line
    • Transition matrix distance (actual, equally likely)
    • Errors made?
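The Spearman correlation of relative line times is Pearson correlation on ranks; with no ties it reduces to the classic formula. A dependency-free sketch (in practice `scipy.stats.spearmanr` handles ties; the line-time values below are made up):

```python
def spearman(xs, ys):
    """Spearman rho for tie-free data: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical relative time on each of 5 lines, model vs. human.
model_line_times = [0.30, 0.10, 0.25, 0.20, 0.15]
human_line_times = [0.28, 0.12, 0.22, 0.18, 0.20]
print(spearman(model_line_times, human_line_times))  # 0.9
```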

Advanced ACT-R Model


  • Basic model + ontology + spatial
  • Map visual locations/areas to program semantics (loops, functions)
  • Use qualitative spatial reasoning
  • Variable name similarity (prefix), semantic priming


Parameters

  • Initial DM activations
  • DM weights (similarity)
  • Standard ACT-R parameters

Limitations

  • No IDE/tool interaction
  • Only single file, simple programs
  • High level rules of discourse?

Thank you!