Mr. Bits: A Quantitative Model of Program Comprehension

Michael Hansen, Rob Goldstone, Andrew Lumsdaine

PALM Lab, Fall 2014

What Makes Code Hard to Understand?

Software complexity is "a measure of resources expended by a system [human or other] while interacting with a piece of software to perform a given task."

Basili, 1980
  • Program Complexity
    • Computational complexity
    • Representational complexity
    • Psychological complexity
      • Problem/functional (necessary) complexity
      • Document/structural (accidental)
      • Programmer characteristics

Reducing the (Accidental) Complexity of a Program

  • New programming languages/paradigms
  • New methodologies/principles
  • New development tools
Many claims are made for the efficacy and utility of new approaches to software engineering - structured methodologies, new programming paradigms, new tools, and so on. The evidence to support such claims is thin, and such evidence as there is remains largely anecdotal. Of proper scientific evidence there is remarkably little.

Furthermore, such evidence as there is can be described as “black box”: that is, it demonstrates a correlation between the use of a certain technique and an improvement in some aspect of the development. It does not demonstrate how the technique achieves the observed effect.

Françoise Détienne, 2001

Modeling The Symptoms of Complexity

  • Cognitive models of programmers needed
One feature which all of these approaches have in common is that they begin with certain characteristics of the software and attempt to determine what effect they might have on the difficulty of various programmer tasks.

A more useful approach would be first to analyze the processes involved in the programmer tasks, as well as the parameters which govern the effort involved in those processes.

...one should start with the symptoms of complexity, which are all manifested in the mind, and attempt to understand the processes which produce those symptoms.

Cant, 1995

From Chess to Code

DeGroot (1965)

Soloway (1984)


Soloway (1984) - Alpha

Soloway (1984) - Beta

Of Metrics and Models

Justification: Miller's 7 ± 2

Graph theoretic
  • McCabe's Cyclomatic Number
  • Gill and Kemerer's Complexity Density
  • Woodward's Knots
  • Nejmeh's NPATH Complexity
  • Emerson's Cohesion Metric

Textual
  • Line Count
  • Character/Keyword Count
  • Douce's Spatial Complexity
  • Aggarwal's Function/Class Template Factors

Information theoretic
  • Chen's MIN (Nesting)
  • Harrison's AICC
  • Halstead's Effort
  • Ramamurthy and Melton's Weighted Pair
  • Hansen's Complexity Pair
  • Davis and LeBlanc's Entropy
  • Woodfield's Logical Module Complexity
  • Gannon's Number of Generic Packages
  • Veldhuizen's Minimum Description Length

See Cant (1995) for complete listing
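As a concrete instance of a graph-theoretic metric, McCabe's cyclomatic number can be approximated for a Python function by counting decision points in its AST. This is a simplified sketch (the full metric is E - N + 2P on the control-flow graph, and `BoolOp` should really contribute one per extra operand):

```python
import ast

# Simplified McCabe approximation: 1 + number of decision points.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.BoolOp)

def cyclomatic(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

code = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic(code))  # 3: one path + two if-tests
```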

Tracz (1979) - The Human Information Processing System

Soloway (1984) - Rules of Discourse

More complex programs violate these rules:

  1. Variable names should reflect function.
  2. Don't include code that won't be used.
  3. If there is a test for a condition, then the condition must have the potential of being true.
  4. A variable that is initialized via an assignment statement should be updated via an assignment statement.
  5. Don't do double duty with code in a non-obvious way.
  6. An if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may be repeatedly executed.
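As a hypothetical illustration of rule 1, compare two functions that differ only in whether the name matches the behavior. The second is in the spirit of Soloway's beta program, where a name suggesting a maximum actually tracks a minimum:

```python
# Conforms to rule 1: the name reflects what the function computes.
def largest(values):
    result = values[0]
    for v in values[1:]:
        if v > result:
            result = v
    return result

# Violates rule 1: despite its name, this returns the *smallest*
# value, defeating the reader's expectations.
def largest_confusing(values):
    result = values[0]
    for v in values[1:]:
        if v < result:
            result = v
    return result

print(largest([3, 1, 4]))            # 4
print(largest_confusing([3, 1, 4]))  # 1
```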

Soloway (1986) - Plan Knowledge

von Mayrhauser (1995) - Integrated Metamodel

Any of the three submodels may become active at any time during the comprehension process.
von Mayrhauser (1995)

Which Program is More Complex?

Version 1 (counting):

for i in [1, 2, 3, 4]:
    print "The count is", i
    print "Done counting"

Version 2 (counting, extra blank lines):

for i in [1, 2, 3, 4]:
    print "The count is", i


    print "Done counting"
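In both counting versions, the final print is indented into the loop body, so "Done counting" appears four times rather than once; this is the planted bug in the counting category. A Python 3 rendering that collects the output:

```python
# "Done counting" is inside the loop body, so it is emitted once
# per iteration, not once at the end.
output = []
for i in [1, 2, 3, 4]:
    output.append("The count is " + str(i))
    output.append("Done counting")
print("\n".join(output))
```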

Version 1 (scope):

def add_1(num):
    num = num + 1

def twice(num):
    num = num * 2

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print added

Version 2 (scope, parameters named like the global):

def add_1(added):
    added = added + 1

def twice(added):
    added = added * 2

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print added
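Both scope versions print 4: assignment inside a function rebinds only the local parameter name, so neither `add_1` nor `twice` affects the caller's `added`, whatever the parameters are called. A Python 3 sketch of the same behavior:

```python
def add_1(num):
    num = num + 1   # rebinds the local name only

def twice(num):
    num = num * 2   # rebinds the local name only

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print(added)  # 4: the caller's binding is untouched
```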

Cant (1995) - Cognitive Complexity Model


  • Process model of program comprehension
  • Chunking
    • Reading and understanding a coupled block of code
    • Faster when chunks are smaller, less complex
  • Tracing
    • Visual jump to required sub-chunk
    • Faster when chunk is closer, more salient


Chunking

Tracing

Cognitive Complexity Model Terms

Term Description
$R_F$ Speed of recall or review (familiarity)
$R_S$ Chunk size
$R_C$ Type of control structure in which chunk is embedded
$R_E$ Difficulty of understanding complex Boolean or other expressions
$R_R$ Recognizability of chunk (conformance to expectations)
$R_V$ Effects of visual structure (separable from other chunks)
$R_D$ Disruptions caused by dependencies (due to short-term memory)
 
$T_F$ Dependency familiarity
$T_L$ Localization of dependency (local, remote, etc.)
$T_A$ Ambiguity of dependency source (many possibilities?)
$T_S$ Spatial distance (e.g., lines of code)
$T_C$ Level of cueing (obscurity of reference)
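The model combines these terms into a comprehension cost per chunk. As a toy sketch only (the additive form and the numbers here are illustrative, not Cant's calibrated formulation): the cost of a chunk is the sum of its R terms plus, for each dependency the reader must trace, the sum of the T terms for that trace:

```python
# Illustrative CCM-style cost: R terms for chunking the block itself,
# plus T terms for every dependency traced. All weights are made up.
def chunk_cost(r_terms, dependencies):
    return sum(r_terms.values()) + sum(
        sum(t.values()) for t in dependencies)

r = {"R_F": 0.2, "R_S": 1.5, "R_C": 0.5, "R_E": 0.8,
     "R_R": 0.3, "R_V": 0.1, "R_D": 0.4}
deps = [{"T_F": 0.2, "T_L": 0.5, "T_A": 0.3, "T_S": 1.0, "T_C": 0.4}]
print(round(chunk_cost(r, deps), 1))  # 6.2
```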

Hansen (2012/2013) - Mr. Bits

  1. A cognitive architecture can operationalize CCM
  2. Task-free complexity is not meaningful
  3. ACT-R + output prediction, reading/output duration
Term Description ACT-R Module
$R_F$ Speed of recall or review (familiarity) DM
$R_S$ Chunk size Visual
$R_C$ Type of control structure in which chunk is embedded
$R_E$ Difficulty of understanding complex Boolean or other expressions
$R_R$ Recognizability of chunk (conformance to expectations)
$R_V$ Effects of visual structure (separable from other chunks) Visual
$R_D$ Disruptions caused by dependencies (due to short-term memory) DM
 
$T_F$ Dependency familiarity DM
$T_L$ Localization of dependency (local, remote, etc.) Visual
$T_A$ Ambiguity of dependency source (many possibilities?)
$T_S$ Spatial distance (e.g., lines of code) Visual
$T_C$ Level of cueing (obscurity of reference)
 
  Delay from typing predicted output Manual

Mr. Bits - Python Debugger + ACT-R


Model Workflow

  1. Python debugger runs program, records lines visited and variable values
  2. Program abstract syntax tree (AST) is analyzed, variable reads, writes, and computations noted
  3. ACT-R model is generated with a fixed goal stack for the program
  4. ACT-R trace is parsed for eye positions and latencies
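Steps 1 and 2 can be sketched with the standard library alone: `sys.settrace` records the lines a program visits as it runs, and `ast` flags where variables are read and written. This is a minimal sketch, not Mr. Bits' actual pipeline:

```python
import ast
import sys

SOURCE = """\
total = 0
for i in [1, 2, 3]:
    total = total + i
"""

# Step 1: run the program under a trace function, recording each
# line event in the executed frame.
visited = []
def tracer(frame, event, arg):
    if event == "line":
        visited.append(frame.f_lineno)
    return tracer

code = compile(SOURCE, "<program>", "exec")
sys.settrace(tracer)
exec(code, {})
sys.settrace(None)
print("visited lines:", visited)

# Step 2: walk the AST, noting variable reads (Load) and writes (Store).
reads, writes = [], []
for node in ast.walk(ast.parse(SOURCE)):
    if isinstance(node, ast.Name):
        target = reads if isinstance(node.ctx, ast.Load) else writes
        target.append((node.id, node.lineno))
print("reads:", reads)
print("writes:", writes)
```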

Use of ACT-R Modules

  • Lines are read (each token) with EMMA if not in DM
  • Variables are visually traced if not in DM
  • Computations (+, <, etc.) have high base level in DM
  • Response characters are typed with manual module


Open Questions/Issues

  • Model is on rails - retrieval failures cannot influence response
  • Cannot make errors
  • How to tokenize program code? What are the "words"?
  • Trace locations currently hard-coded

The eyeCode Experiment

  • 1 experiment = 10 Python programs
  • 2-3 versions of each program
  • Eye movements and keystrokes recorded

Programs

10 categories, 2-3 versions each (25 total)


  • between - filter two lists, intersection
  • counting - for loop with bug
  • funcall - function call with different spacing
  • initvar - summation and factorial
  • order - 3 functions called (in order/shuffled)

  • rectangle - compute area of 2 rectangles
  • overload - overloaded + operator (number/strings)
  • partition - partition list of numbers
  • scope - function calls with no effect
  • whitespace - linear equations


[ most errors ] • [ fewer errors ] • [ no errors ]

Fixations to Relative Line Times


Transition Matrices

  • First-order transition probabilities
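First-order transition probabilities can be computed directly from the sequence of fixated lines (or areas of interest): count each consecutive pair, then normalize per source line.

```python
from collections import Counter, defaultdict

def transition_matrix(seq):
    """P[a][b] = probability that a fixation on a is followed by b."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

fixated_lines = [1, 2, 3, 2, 3, 2, 5]
P = transition_matrix(fixated_lines)
print(P[2])  # from line 2: usually to 3, once to 5
```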


Rolling Metrics

  • Computed over a rolling time window
  • Window size and step can vary
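Concretely, a rolling metric slides a fixed-width time window over the fixation record, stepping by a fixed amount, and computes a summary inside each window. A minimal sketch for fixation rate, assuming fixations as (start_ms, duration_ms) pairs:

```python
def rolling_fixation_rate(fixations, window_ms, step_ms):
    """Fixations per second inside each rolling window.

    fixations: list of (start_ms, duration_ms) tuples.
    Returns a list of (window_start_ms, rate) pairs.
    """
    end = max(s + d for s, d in fixations)
    rates = []
    t = 0
    while t < end:
        n = sum(1 for s, _ in fixations if t <= s < t + window_ms)
        rates.append((t, n / (window_ms / 1000.0)))
        t += step_ms
    return rates

fixations = [(0, 200), (250, 180), (500, 220), (1400, 300)]
for t, rate in rolling_fixation_rate(fixations, 1000, 500):
    print(t, rate)
```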

Evaluating the Model

  • Qualitative

    • Relative time on each line
    • Visual appearance of transition matrix
    • Fixation timeline
    • Rolling metrics and responses (mental computations)
  • Quantitative

    • Fixation metrics (average duration, count, rate)
    • Likelihood of trial time (actual distribution)
    • Spearman correlation of relative time on each line
    • Transition matrix distance (actual, equally likely)
    • Errors made?
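The Spearman correlation of relative line times is Pearson correlation on ranks; with no ties it reduces to the classic formula. A dependency-free sketch (in practice `scipy.stats.spearmanr` handles ties; the line-time values below are made up):

```python
def spearman(xs, ys):
    """Spearman rho for tie-free data: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical relative time on each of 5 lines, model vs. human.
model_line_times = [0.30, 0.10, 0.25, 0.20, 0.15]
human_line_times = [0.28, 0.12, 0.22, 0.18, 0.20]
print(spearman(model_line_times, human_line_times))  # 0.9
```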

Advanced ACT-R Model


  • Basic model + ontology + spatial
  • Map visual locations/areas to program semantics (loops, functions)
  • Use qualitative spatial reasoning
  • Variable name similarity (prefix), semantic priming


Parameters

  • Initial DM activations
  • DM weights (similarity)
  • Standard ACT-R parameters

Limitations

  • No IDE/tool interaction
  • Only single file, simple programs
  • High level rules of discourse?

Thank you!