In [1]:
from IPython import display

Mr. Bits: A Quantitative Process Model of Simple Program Understanding &
Nibbles: A Constraint-Based Model of Program Reading and Inference

Michael Hansen, Rob Goldstone, Andrew Lumsdaine

Percepts and Concepts Lab, Spring 2015

Motivation and Goals

Motivation

  • Existing code complexity metrics are not "cognitive"
  • Almost all cognitive models of program comprehension are qualitative
  • Evaluate different modeling formalisms in a very complex domain

Goals

  • Produce a computational model of simple program evaluation
  • End-to-end model: text to fixations/keystrokes
  • Produce human-like errors and timings

Choose Your Own Adventure!

Mr. Bits

  • ACT-R, declarative memory calculus!
  • Fixation and keystroke timings!
  • LISP!

Nibbles

  • Mental models, Cognitive Domain Ontologies!
  • Constraint solvers!
  • LISP!

Mr. Bits

  • Transforms program text to fixation/keystroke timings
  • Uses ACT-R cognitive architecture
  • Based on Cant Cognitive Complexity Metric (Cant, 1995)

Goal Stack (Mr. Bits)

  • Generated a-prori using Python debugger
  • Realized with custom ACT-R model

Variable Context (Mr. Bits)

  • Unique identity for variables over time
  • Variable identity $\leftrightarrow$ ACT-R memory chunk
  • Does not cover OO method dispatch

ACT-R Model (Mr. Bits)

Symbolic vs. Sub-Symbolic (Mr. Bits)

  • ACT-R mode for all modules

Symbolic

  • No variable timings
  • All memory chunks retrieved immediately

Sub-symbolic

  • Variable timings
  • Memory retrievals faster/slower
  • Memory chunks may be forgotten

Sub-Symbolic + Remember Lists (Mr. Bits)

  • Relative time spent on each line
  • Humans (left), Mr. Bits (right)

Sub-Symbolic + Forget Lists (Mr. Bits)

  • Relative time spent on each line
  • Humans (left), Mr. Bits (right)

Trial Time Comparison (Mr. Bits)

  • Human trial times, sorted low to high (bars)
  • Mr. Bits trial times (lines)
Sym+Remember: - - - Sym+Forget: - - - Sub-Sym+Remember: ----  Sub-Sym+Forget: ---- 

counting - two_spaces


for i in [1, 2, 3, 4]:
    print "The count is", i 


    print "Done counting"

Limitations and Threats to Validity (Mr. Bits)

Text Reading

  • Not a true model of reading
  • No errors possible

Program Scope

  • Tiny, simple programs
  • No syntax highlighting, coding environment

ACT-R

  • Large number of productions
  • Hand-picked parameters

Nibbles

  • Model of program text to mental model transformation
  • Cognitive Domain Ontology + constraint solver
  • Attention, priming, errors

Cognitive Domain Ontologies (Nibbles)

  • Based on Set Entity Structure (SES) theory (Zeigler & Hammonds, 2007)
  • Capture spaces of behavior or situational knowledge (Douglass & Mittal, 2013)
  • Represent mental models and reasoning processes of abduction, deduction, and induction (Johnson-Laird, 1986)

Tree Structure

  • Entities and Relations (alternating)
  • Attached variables on entities
  • 3 Relations:
    • Sub-parts
    • Choice-point
    • Instances

Constraints

  • Remove non-sensical solutions from space
  • Built on 1st order logic
  • Higher-order constraints on instances

Choice Points = Source of Generativity


Ball Example (Nibbles)

  • Simple domain: two choice points
  • CDO tree defines structure
  • Constraints will add domain knowledge

Ball Example (Nibbles)

All Combinations (9 solutions)


(all-values
  (let
    ((size
       (either 'small 'medium 'large))
      (sport
        (either 'baseball 'golf 'basketball)))
     (list size sport)))

With Domain Knowledge (3 solutions)


(all-values
  (let 
     ((size
        (either 'small 'medium 'large))
      (sport
         (either 'baseball 'golf 'basketball)))
      (assert! (and
                 (<-> 'small 'golf)
                 (<-> 'medium 'baseball)
                 (<-> 'large 'basketball)))
      (list size sport)))


Solution Size Sport
1 Small Baseball
2 Small Golf
3 Small Basketball
4 Medium Baseball
5 Medium Golf
6 Medium Basketball
7 Large Baseball
8 Large Golf
9 Large Basketball

Cheryl's Birthday Example (Nibbles)

Cheryl has a birthday, and she doesn't want to share it right away. So she gives her friends Albert and Bernard a list of 10 possible birthday dates:

  • May 15, 16, 19
  • June 17, 18
  • July 14, 16
  • August 14, 15, 17

Cheryl then tells Albert and Bernard separately the month and the day of her birthday respectively.

  1. Albert: I don't know when Cheryl's birthday is, but I know that Bernard doesn't know too.
  2. Bernard: At first I didn't know when Cheryl's birthday was, but I know now.
  3. Albert: Then I also know when Cheryl's birthday is.

So when is Cheryl's birthday?

Cheryl's Birthday CDO (Nibbles)

Cheryl's Birthday Solution (Nibbles)

Cheryl's Birthday Constraints (Nibbles)


Albert Knows the Month

  • All possible Albert dates have the same month
  • Only 1 distinct month is possible

Albert Knows Bernard's Days

  • All possible A.Bernard dates have days in Albert's month

Miscellaneous

  • At least one date is possible!
  • If a date is not possible for Bernard, it's not possible for Albert

Example

  1. Albert is told May (5)
  2. Possible Albert dates are 5/15, 5/16, 5/19
  3. Possible A.Bernard dates are 5/15, 8/15, 5/16, 7/16, 5/19


Cheryl's Birthday Constraints (Nibbles)


Bernard Knows the Day

  • All possible Bernard dates have the same day
  • Only 1 distinct day is possible

Bernard Knows Albert's Months

  • All possible B.Albert dates have months with Bernard's days

Miscellaneous

  • At least one date is possible!
  • If a date is not possible for Albert, it's not possible for Bernard

Example

  1. Bernard is told day 14
  2. Possible Bernard dates are 7/14, 8/14
  3. Possible B.Albert dates are 7/14, 7/16, 8/14, 8/15, 8/17


Cheryl's Birthday Step 1 (Nibbles)

  • Albert: I don't know when Cheryl's birthday is, but I know that Bernard doesn't know too.
    • Albert has more than 1 possible date
    • A.Bernard does not have a possible, distinguishing day - 19 (May), 18 (June)

Possible Dates (A.Bernard):

  • May 15, 16, 19
  • June 17, 18
  • July 14, 16
  • August 14, 15, 17

Cheryl's Birthday Step 2 (Nibbles)

  • Bernard: At first I didn't know when Cheryl's birthday was, but I know now.
    • B.Albert does not have possible months with a distinguishing day - May (19), June (18)
    • Bernard has only 1 possible date

Possible Dates (B.Albert):

  • May 15, 16, 19
  • June 17, 18
  • July 14, 16
  • August 14, 15, 17

With Bernard's Existing Constraints...

  • All possible Bernard dates have the same day
  • If a date is not possible for Albert, it's not possible for Bernard

Possible Dates (Bernard):

  • May 15, 16, 19
  • June 17, 18
  • July 14, 16
  • August 14, 15, 17

Cheryl's Birthday Step 3 (Nibbles)

  • Albert: Then I also know when Cheryl's birthday is.
    • A.Bernard does not have possible months with a distinguishing day - May (19), June (18)
    • A.Bernard only has 1 possible date
    • Albert only has 1 possible date
    • If a date is not possible for Bernard, it's not possible for Albert

Possible Dates (Albert):

  • May 15, 16, 19
  • June 17, 18
  • July 14, 16
  • August 14, 15, 17

Only One Solution!

Abduction vs. Deduction

Abduction

  • Add months, days to Albert, A.Bernard, Bernard, B.Albert
  • Constraint solver generates Possible/Not Possible combinations
  • Solution(s) gives you Cheryl's birthday(s)

Deduction

  • Add possible date(s) to Albert, A.Bernard, Bernard, B.Albert (hypothesis)
  • Constraint solver checks constraints
  • A null set indicates an invalid hypothesis

Model Architecture (Nibbles)

Program Model CDO (Nibbles)

Program Model CDO (Nibbles)


Program

  • 1 or more line groups

Line Group

  • 1 or more related lines
  • Forced on indent (e.g., function def)

Line

  • 1 or more tokens on a single line

Token

  • 1 or more characters
  • Usually whitespace separated


GENERIC-PROGRAM-TYPE:
  LG1. GENERIC-LINE-GROUP-TYPE:

L1. ASSIGNMENT-LINE-TYPE | (indent:1):
  EXPR-LITERAL : PYTYPE-NUMBER
  [NAME] a
  [SYMBOL] =
  [NUMBER] 1

L2. ASSIGNMENT-LINE-TYPE | (indent:1):
  EXPR-LITERAL : PYTYPE-NUMBER
  [NAME] b
  [SYMBOL] =
  [NUMBER] 2

L3. PRINT-LINE-TYPE | (indent:1):
  EXPR-VV-PLUS : PYTYPE-NUMBER
  [KEYWORD] print
  [NAME] a    
  [SYMBOL] +    
  [NAME] b    


Program Model Constraints 1/2 (Nibbles)

Program

  • Fixated line is high detail (character level)
  • Line above and below fixed are low detail + indent
  • Other lines are unknown + line/token sizes

Line Group

  • line-group-indent - if lines A and B in same group, A|B or A>B
  • all-lines-active - every line belongs to 1 line group, in order
  • line-group->lines:
    • generic, for-loop-block, if-stmt-block, fun-def-block
    • for, if, def start group + indent

Program Model Constraints 2/2 (Nibbles)

Line

  • line->tokens:
    • unknown, assignment, fun-def, fun-call, print-line, for-loop, if-stmt, return, whitespace
      • Right-hand side expression
        • Type: N/A, number, string
        • Form: literal, variable, literal/literal, variable/variable
        • Operator: plus-sum, plus-append
  • line-whitespace-at-end - empty tokens on the right

Token

  • token->char:
    • unknown, keyword, symbol, name, number, text, whitespace
    • Size check (number of characters)

Reading a Program (Nibbles)

  • Fixate line-by-line with discrete sensor
  • Unobserved code is abduced

The Mighty Choice Point (Nibbles)

Priming

  • Ordered enumeration of solution space
  • Primed choice points are tried first

Classification

  • Each choice is a type or category
  • Constraints categorize entities

Group Membership

  • Active or inactive choice for each member
  • Constraints control membership

Restrict Solution Space

  • High and low detail choice before instances
  • Assertions control attention

Example 1: counting


  • Whitespace primes wrong grouping
  • Lack of indent info fails to cull wrong solution (last print is in its own line group)


Predictions

  • Moving print up 2 lines will reduce errors (yes)
  • Participants who make errors will recall last line without indent (?)
  • Adding an "end" token below final print will reduce errors (?)


Open Questions

  • Does changing "Done counting" make a difference?
  • Does adding a nested for look help?

for i in [1, 2, 3, 4]:
    print "The count is", i 


    print "Done counting"


Example 2: overload


  • Priming re-orders (number/string, plus/append) choice points
  • Weak number/string constraint fails to cull wrong solutions


Predictions

  • Priming with append + will reduce errors (maybe)
  • Not including numeric characters will reduce errors (?)
  • Adding type tokens or making strings and numbers visually distinct will reduce errors (?)


Open Questions

  • How does this help determine "cognitively optimal" operator overloading?

a = 4
b = 3
print a + b

c = 7
d = 2
print c + d

e = "5"
f = "3"
print e + f


Example 3: scope


  • Expectations re-order choice points (a function must "do something")
  • Incomplete parameter passing constraints allow wrong solution
  • Right solution is buried


Predictions

  • Experience will reduce errors (yes)
  • Parameter name will not influence errors (yes)
  • Reminding participants of Python pass-by-value will reduce errors (?)


Open Questions

  • How to formalize "do something"? (Rist, 1986)


"...the smallest piece of knowledge used in program understanding is a transaction. A transaction can be described using three categories: what operation takes place (action), where it takes place (location) and what object is acted upon (object)."

def add_1(num):
    num = num + 1
    
def twice(num):
    num = num * 2

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print added


Model Capabilities (Nibbles)

Attention

  • Full CDO space is infeasible to search entirely
  • Combinatorics are carefully controlled via detail, assertions

Errors

  • First solution assumed to be correct (other orderings?)
  • Need back-tracking on null set

Abduction of Missing Code

  • Bottom-up sensing + top-down knowledge
  • A kind of "cogent confabulation"? (Hecht-Nielson, 2007)

Limitations and Threats to Validity (Nibbles)

Incomplete

  • Need to finish Example 3
  • Doesn't generate goals for end-to-end model
  • No back-tracking when errors are noticed

Engineered Knowledge

  • Domain structure and constraints are hand-built
  • Few experiments to draw from

Cognitive?

  • First solution assumed to be correct
  • Simple priming from choice point ordering

Future Work

Accelerate CDO Solver

  • LISP -> Java -> Multi-core/GPGPU
  • Large programs, richer domain knowledge

Combine Mr. Bits and Nibbles

  • End to end model (text to fixations/keystrokes)
  • Back-tracking and error correction

New Experiment

  • More complex programs (e.g., recursion)
  • Different languages

Thank you!

References

  • Bernard P. Zeigler and Phillip E. Hammonds. Modeling & Simulation-Based Data Engineering: Introducing Pragmatics into Ontologies for Net-Centric Information Exchange. Academic Press, 2007.
  • Douglass, Scott A. and Mittal, Saurabh. A framework for modeling and simulation of the artificial. In Ontology, Epistemology, and Teleology for Modeling and Simulation, pages 271–317. Springer, 2013.
  • Hecht-Nielsen, Robert. Confabulation Theory: The Mechanism of Thought. Heidelberg: Springer, 2007.
  • Johnson-Laird, Philip N. Mental models. No. 6. Harvard University Press, 1986.
  • Rist, Robert S. "Plans in programming: definition, demonstration, and development." Empirical studies of programmers. Ablex, Norwood, NJ, 1986.