Towards Quantifying Program Complexity and Comprehension

Michael Hansen, Andrew Lumsdaine, Robert Goldstone

Ph.D. Candidate Computer Science & Cognitive Science, Indiana University USA

Dagstuhl Seminar 13502, 8-11 December 2013

Cognitive Complexity in the Context of Programming

Software complexity is "a measure of resources expended by a system [human or other] while interacting with a piece of software to perform a given task."

Basili, 1980

One feature which all of these [theoretical] approaches have in common is that they begin with certain characteristics of the software and attempt to determine what effect they might have on the difficulty of the various programmer tasks.

A more useful approach would be first to analyze the processes involved in programmer tasks, as well as the parameters which govern the effort involved in those processes. From this point one can deduce, or at least make informed guesses, about which code characteristics will affect those parameters.

Cant et. al, 1995

Psychology of Programming

Knowledge
- Experienced programmers represent at multiple levels of abstraction: syntactic, semantic, and schematic
- Conventions and common programming plans allow experts to quickly infer intent and avoid unnecessary details
Strategies
- Experienced programmers use more design strategies (top-down, bottomup, breadth-first, depth-first)
- Current strategy is chosen based on factors like familiarity, problem domain, and available language features
Task
- Current task or goal will change which kinds of program knowledge and reading strategies are advantageous
- Experienced programmers read and remember code differently depending on whether they intend to edit, debug, or reuse it
Environment
- Programmers use their tools to off-load mental work and to build up representations of the current problem state
- The benefits of specific tools, such as program visualization, also depend on programming expertise

Douce's Stores Model of Code Cognition

Image from Douce, 2008

Quantitative Software Complexity

Cognitive/psychological complexity is...

Function of source code (complexity metrics, cognitive weights)
- \(\vec{c} = f({code})\)
- \(c_i > {threshold}_i\) is bad
Poor support for user activities (Cognitive Dimensions of Notation)
- Resistance to change, hidden dependencies, etc.
- Programming languages are used to try out ideas
Unfamiliar schemas/implicit rule violations (Détienne/Soloway)
- Explains novice/expert differences
- Don't \(X\), IF/THEN rules

Properties of a cognitive model trace (Cognitive Complexity Metric, Mr. Bits)
- Cognitive resource constraints + effects of notation
- Task time, eye movement metrics, contents of memory, etc.

A (Quantitative) Cognitive Model of Programming

[A cognitive model] seeks to explain basic mental processes and their interactions; processes such as perceiving, learning, remembering, problem solving, and decision making.

Busemeyer and Diederich, 2010

Cognitive Complexity Metric - Cant et. al, 1995

Mr. Chips (Legge, 2002)

Ideal observer model of text reading based on EZ-Reader
Single, long line of text
Make the saccade that minimizes uncertainty

Retina

Rectangular, discrete fovea + para-fovea
Only characters/whitespace distinguishable in para-fovea

Eye Movement Model

Gaussian variability in saccade length
Model is aware of noise

Lexicon

Words and relative frequencies

Image from Legge et. al, 2002

Mr. Bits

Computational process model with active vision
- Software complexity is resource expenditure
Reading and predicting printed output of code
- Same task as human programmers (eyeCode experiment)
Implemented on top of Python ACT-R
- CMU Cognitive architecture

Limitations

Subset of Python
- for, if, def, print
- Procedural programs
Single file, single screen
Output prediction task
Discrete retina, line-based reading
No production learning (learning happens in DM)

Mr. Bits and ACT-R

Goal
- High-level strategy: skim vs. predict
- Sub-goal: chunk or trace
Visual
- Line-aligned sensor
- Whitespace-separated tokens into Imaginal
Imaginal
- Context of current line/branch/loop
- Type of line, role of variables
Declarative
- Short and long-term memory for variables/locations
- Forgetting causes a re-trace
Procedural
- Behaviors to categorize lines, update DM, drive sensor
Manual
- Type response

(p encode-letter
   =goal>
      isa         read-letters
      state       attend
   =visual>
      isa         text
      value       =letter1
   ?imaginal>
      buffer      empty
==>
   =goal>
      state       wait
   +imaginal>
      isa         array
      letter1     =letter1
)

Douce's Stores Model of Code Cognition + Mr. Bits

IM = imaginal buffer, DM = declarative memory (ACT-R)
BMs = behavior models (productions + state)

Base image from Douce, 2008

Prototype Model: Mr. Bits 0.1a

overload - plusmixed

a = 4
b = 3
print a + b

c = 7
d = 2
print c + d

e = "5"
f = "3"
print e + f

a = 4
b = 3
print a + b

Move eye to line 1
Parse a = 4, categorize as variable assignment
Store a in DM as number with value 4
Move eye to line 2
Parse b = 3, categorize as variable assignment
Store b in DM as number with value 3
Move eye to line 3
Parse print a + b, categorize as print line
Retrieve a from DM (delay)
Retrieve b from DM (delay)
Look up 4 + 3 in DM or procedural
Type 7

...

Model Predictions

Faster

a = 4
b = 3
print a + b

c = 7
d = 2
print c + d

e = 5
f = 3
print e + f

Slower

c = 7
d = 2
b = 3
f = 3
a = 4
e = 5


print a + b
print c + d
print e + f

A Model of Human Programmers?

Not exactly, but
- Framework for testing a family of models
  - Predictions for future experiments
- Provides timing and output predictions for task
  - Possibility of errors (GOMS)
- Makes important questions/assumptions explicit
  - DM contents, cognitive processes
- First attempt at a program comprehension process model based on a cognitive architecture
  - Path to better complexity metrics

Application to Inductive Programming

Software Synthesis

Mr. Bits as a partial order
- Reduce/order program candidate set
- Less "complex" program candidates are given first
- Easier code for humans to understand and modify
Model Task?
- Current design focuses on output prediction
- Predict value of specific variable
- Time to build representation
Better than program size?
- Time/resource intensive
- Many software dependencies
- Better results than cognitive weights?

Resources

Software Design - Cognitive Aspects (Détienne, 2001)

http://eyecode.synesthesiam.com

Towards Quantifying Program Complexity and Comprehension

Michael Hansen, Andrew Lumsdaine, Robert Goldstone

Ph.D. Candidate Computer Science & Cognitive Science, Indiana University USA

Dagstuhl Seminar 13502, 8-11 December 2013

Cognitive Complexity in the Context of Programming

Psychology of Programming

Douce's Stores Model of Code Cognition

Quantitative Software Complexity

Cognitive/psychological complexity is...

A (Quantitative) Cognitive Model of Programming

Mr. Chips (Legge, 2002)

Retina

Eye Movement Model

Lexicon

Mr. Bits

Limitations

Mr. Bits and ACT-R

Douce's Stores Model of Code Cognition + Mr. Bits

Prototype Model: Mr. Bits 0.1a

overload - plusmixed

Model Predictions

Faster

Slower

A Model of Human Programmers?

Application to Inductive Programming

Software Synthesis

Resources

Thank you!

Questions?