In [1]:
%%html
<link rel="stylesheet" href="files/custom_devel.css" />

Towards Quantifying Program Complexity and Comprehension

Michael Hansen, Andrew Lumsdaine, Robert Goldstone

Ph.D. Candidate Computer Science & Cognitive Science, Indiana University USA

Dagstuhl Seminar 13502, 8-11 December 2013

Cognitive Complexity in the Context of Programming

Software complexity is "a measure of resources expended by a system [human or other] while interacting with a piece of software to perform a given task."

Basili, 1980



One feature which all of these [theoretical] approaches have in common is that they begin with certain characteristics of the software and attempt to determine what effect they might have on the difficulty of the various programmer tasks.

A more useful approach would be first to analyze the processes involved in programmer tasks, as well as the parameters which govern the effort involved in those processes. From this point one can deduce, or at least make informed guesses, about which code characteristics will affect those parameters.

Cant et. al, 1995

Psychology of Programming

  • Knowledge
    • Experienced programmers represent at multiple levels of abstraction: syntactic, semantic, and schematic
    • Conventions and common programming plans allow experts to quickly infer intent and avoid unnecessary details
  • Strategies
    • Experienced programmers use more design strategies (top-down, bottomup, breadth-first, depth-first)
    • Current strategy is chosen based on factors like familiarity, problem domain, and available language features
  • Task
    • Current task or goal will change which kinds of program knowledge and reading strategies are advantageous
    • Experienced programmers read and remember code differently depending on whether they intend to edit, debug, or reuse it
  • Environment
    • Programmers use their tools to off-load mental work and to build up representations of the current problem state
    • The benefits of specific tools, such as program visualization, also depend on programming expertise

Douce's Stores Model of Code Cognition


  • Image from Douce, 2008

Quantitative Software Complexity

Cognitive/psychological complexity is...

  • Function of source code (complexity metrics, cognitive weights)
    • \(\vec{c} = f({code})\)
    • \(c_i > {threshold}_i\) is bad
  • Poor support for user activities (Cognitive Dimensions of Notation)
    • Resistance to change, hidden dependencies, etc.
    • Programming languages are used to try out ideas
  • Unfamiliar schemas/implicit rule violations (Détienne/Soloway)
    • Explains novice/expert differences
    • Don't \(X\), IF/THEN rules
  • Properties of a cognitive model trace (Cognitive Complexity Metric, Mr. Bits)
    • Cognitive resource constraints + effects of notation
    • Task time, eye movement metrics, contents of memory, etc.

A (Quantitative) Cognitive Model of Programming

[A cognitive model] seeks to explain basic mental processes and their interactions; processes such as perceiving, learning, remembering, problem solving, and decision making.

Busemeyer and Diederich, 2010

Cognitive Complexity Metric - Cant et. al, 1995

Mr. Chips (Legge, 2002)

  • Ideal observer model of text reading based on EZ-Reader
  • Single, long line of text
  • Make the saccade that minimizes uncertainty

Retina

  • Rectangular, discrete fovea + para-fovea
  • Only characters/whitespace distinguishable in para-fovea

Eye Movement Model

  • Gaussian variability in saccade length
  • Model is aware of noise

Lexicon

  • Words and relative frequencies

  • Image from Legge  et. al, 2002


Mr. Bits

  • Computational process model with active vision
    • Software complexity is resource expenditure
  • Reading and predicting printed output of code
    • Same task as human programmers (eyeCode experiment)
  • Implemented on top of Python ACT-R
    • CMU Cognitive architecture

Limitations

  • Subset of Python
    • for, if, def, print
    • Procedural programs
  • Single file, single screen
  • Output prediction task
  • Discrete retina, line-based reading
  • No production learning (learning happens in DM)


Mr. Bits and ACT-R

  • Goal
    • High-level strategy: skim vs. predict
    • Sub-goal: chunk or trace
  • Visual
    • Line-aligned sensor
    • Whitespace-separated tokens into Imaginal
  • Imaginal
    • Context of current line/branch/loop
    • Type of line, role of variables
  • Declarative
    • Short and long-term memory for variables/locations
    • Forgetting causes a re-trace
  • Procedural
    • Behaviors to categorize lines, update DM, drive sensor
  • Manual
    • Type response
(p encode-letter
   =goal>
      isa         read-letters
      state       attend
   =visual>
      isa         text
      value       =letter1
   ?imaginal>
      buffer      empty
==>
   =goal>
      state       wait
   +imaginal>
      isa         array
      letter1     =letter1
)


Douce's Stores Model of Code Cognition + Mr. Bits

  • IM = imaginal buffer, DM = declarative memory (ACT-R)
  • BMs = behavior models (productions + state)


  • Base image from Douce, 2008

Prototype Model: Mr. Bits 0.1a

overload - plusmixed

a = 4
b = 3
print a + b

c = 7
d = 2
print c + d

e = "5"
f = "3"
print e + f
a = 4
b = 3
print a + b

  1. Move eye to line 1
  2. Parse a = 4, categorize as variable assignment
  3. Store a in DM as number with value 4
  4. Move eye to line 2
  5. Parse b = 3, categorize as variable assignment
  6. Store b in DM as number with value 3
  7. Move eye to line 3
  8. Parse print a + b, categorize as print line
  9. Retrieve a from DM (delay)
  10. Retrieve b from DM (delay)
  11. Look up 4 + 3 in DM or procedural
  12. Type 7

...


Model Predictions

Faster

a = 4
b = 3
print a + b

c = 7
d = 2
print c + d

e = 5
f = 3
print e + f

Slower

c = 7
d = 2
b = 3
f = 3
a = 4
e = 5


print a + b
print c + d
print e + f


A Model of Human Programmers?

  • Not exactly, but
    • Framework for testing a family of models
      • Predictions for future experiments
    • Provides timing and output predictions for task
      • Possibility of errors (GOMS)
    • Makes important questions/assumptions explicit
      • DM contents, cognitive processes
    • First attempt at a program comprehension process model based on a cognitive architecture
      • Path to better complexity metrics

Application to Inductive Programming

Software Synthesis

  • Mr. Bits as a partial order
    • Reduce/order program candidate set
    • Less "complex" program candidates are given first
    • Easier code for humans to understand and modify
  • Model Task?
    • Current design focuses on output prediction
    • Predict value of specific variable
    • Time to build representation
  • Better than program size?
    • Time/resource intensive
    • Many software dependencies
    • Better results than cognitive weights?

Resources

Software Design - Cognitive Aspects (Détienne, 2001)


Thank you!

Questions?