Mr. Bits: A Quantitative Process Model of Simple Program Understanding &
Nibbles: A Constraint-Based Model of Program Reading and Inference¶

Michael Hansen, Rob Goldstone, Andrew Lumsdaine¶

Percepts and Concepts Lab, Spring 2015

Motivation and Goals¶

Motivation¶

Existing code complexity metrics are not "cognitive"
Almost all cognitive models of program comprehension are qualitative
Evaluate different modeling formalisms in a very complex domain

Goals¶

Produce a computational model of simple program evaluation
End-to-end model: text to fixations/keystrokes
Produce human-like errors and timings

Choose Your Own Adventure!¶

Mr. Bits ¶

ACT-R, declarative memory calculus!
Fixation and keystroke timings!
LISP!

Nibbles ¶

Mental models, Cognitive Domain Ontologies!
Constraint solvers!
LISP!

Mr. Bits

Transforms program text to fixation/keystroke timings
Uses ACT-R cognitive architecture
Based on Cant Cognitive Complexity Metric (Cant, 1995)

Goal Stack (Mr. Bits)¶

Generated a-prori using Python debugger
Realized with custom ACT-R model

Variable Context (Mr. Bits)¶

Unique identity for variables over time
Variable identity $\leftrightarrow$ ACT-R memory chunk
Does not cover OO method dispatch

ACT-R Model (Mr. Bits)¶

Symbolic vs. Sub-Symbolic (Mr. Bits)¶

ACT-R mode for all modules

Symbolic¶

No variable timings
All memory chunks retrieved immediately

Sub-symbolic¶

Variable timings
Memory retrievals faster/slower
Memory chunks may be forgotten

Sub-Symbolic + Remember Lists (Mr. Bits)¶

Relative time spent on each line
Humans (left), Mr. Bits (right)

Sub-Symbolic + Forget Lists (Mr. Bits)¶

Relative time spent on each line
Humans (left), Mr. Bits (right)

Trial Time Comparison (Mr. Bits)¶

Human trial times, sorted low to high (bars)
Mr. Bits trial times (lines)

Sym+Remember: - - -

Sym+Forget: - - -

Sub-Sym+Remember: ----

Sub-Sym+Forget: ----

`counting - two_spaces`¶


for i in [1, 2, 3, 4]:
    print "The count is", i 


    print "Done counting"

Limitations and Threats to Validity (Mr. Bits)¶

Text Reading¶

Not a true model of reading
No errors possible

Program Scope¶

Tiny, simple programs
No syntax highlighting, coding environment

ACT-R¶

Large number of productions
Hand-picked parameters

Nibbles

Model of program text to mental model transformation
Cognitive Domain Ontology + constraint solver
Attention, priming, errors

Cognitive Domain Ontologies (Nibbles)¶

Based on Set Entity Structure (SES) theory (Zeigler & Hammonds, 2007)
Capture spaces of behavior or situational knowledge (Douglass & Mittal, 2013)
Represent mental models and reasoning processes of abduction, deduction, and induction (Johnson-Laird, 1986)

Tree Structure¶

Entities and Relations (alternating)
Attached variables on entities
3 Relations:
- Sub-parts
- Choice-point
- Instances

Constraints¶

Remove non-sensical solutions from space
Built on 1st order logic
Higher-order constraints on instances

Choice Points = Source of Generativity

Ball Example (Nibbles)¶

Simple domain: two choice points
CDO tree defines structure
Constraints will add domain knowledge

Ball Example (Nibbles)¶

All Combinations (9 solutions)


(all-values
  (let
    ((size
       (either 'small 'medium 'large))
      (sport
        (either 'baseball 'golf 'basketball)))
     (list size sport)))

With Domain Knowledge (3 solutions)


(all-values
  (let 
     ((size
        (either 'small 'medium 'large))
      (sport
         (either 'baseball 'golf 'basketball)))
      (assert! (and
                 (<-> 'small 'golf)
                 (<-> 'medium 'baseball)
                 (<-> 'large 'basketball)))
      (list size sport)))

Solution	Size	Sport
1	Small	Baseball
2	Small	Golf
3	Small	Basketball
4	Medium	Baseball
5	Medium	Golf
6	Medium	Basketball
7	Large	Baseball
8	Large	Golf
9	Large	Basketball

Cheryl's Birthday Example (Nibbles)¶

Cheryl has a birthday, and she doesn't want to share it right away. So she gives her friends Albert and Bernard a list of 10 possible birthday dates:

May 15, 16, 19
June 17, 18
July 14, 16
August 14, 15, 17

Cheryl then tells Albert and Bernard separately the month and the day of her birthday respectively.

Albert: I don't know when Cheryl's birthday is, but I know that Bernard doesn't know too.
Bernard: At first I didn't know when Cheryl's birthday was, but I know now.
Albert: Then I also know when Cheryl's birthday is.

So when is Cheryl's birthday?¶

Cheryl's Birthday CDO (Nibbles)¶

Cheryl's Birthday Solution (Nibbles)¶

Cheryl's Birthday Constraints (Nibbles)¶

Albert Knows the Month¶

All possible Albert dates have the same month
Only 1 distinct month is possible

Albert Knows Bernard's Days¶

All possible A.Bernard dates have days in Albert's month

Miscellaneous¶

At least one date is possible!
If a date is not possible for Bernard, it's not possible for Albert

Example¶

Albert is told May (5)
Possible Albert dates are 5/15, 5/16, 5/19
Possible A.Bernard dates are 5/15, 8/15, 5/16, 7/16, 5/19

Cheryl's Birthday Constraints (Nibbles)¶

Bernard Knows the Day¶

All possible Bernard dates have the same day
Only 1 distinct day is possible

Bernard Knows Albert's Months¶

All possible B.Albert dates have months with Bernard's days

Miscellaneous¶

At least one date is possible!
If a date is not possible for Albert, it's not possible for Bernard

Example¶

Bernard is told day 14
Possible Bernard dates are 7/14, 8/14
Possible B.Albert dates are 7/14, 7/16, 8/14, 8/15, 8/17

Cheryl's Birthday Step 1 (Nibbles)¶

Albert: I don't know when Cheryl's birthday is, but I know that Bernard doesn't know too.
- Albert has more than 1 possible date
- A.Bernard does not have a possible, distinguishing day - 19 (May), 18 (June)

Possible Dates (A.Bernard):¶

May 15, 16, 19
June 17, 18
July 14, 16
August 14, 15, 17

Cheryl's Birthday Step 2 (Nibbles)¶

Bernard: At first I didn't know when Cheryl's birthday was, but I know now.
- B.Albert does not have possible months with a distinguishing day - May (19), June (18)
- Bernard has only 1 possible date

Possible Dates (B.Albert):¶

~~May 15, 16, 19~~
~~June 17, 18~~
July 14, 16
August 14, 15, 17

With Bernard's Existing Constraints...¶

All possible Bernard dates have the same day
If a date is not possible for Albert, it's not possible for Bernard

Possible Dates (Bernard):¶

~~May 15, 16, 19~~
~~June 17, 18~~
July 14, 16
~~August 14, 15, 17~~

Cheryl's Birthday Step 3 (Nibbles)¶

Albert: Then I also know when Cheryl's birthday is.
- A.Bernard does not have possible months with a distinguishing day - May (19), June (18)
- A.Bernard only has 1 possible date
- Albert only has 1 possible date
- If a date is not possible for Bernard, it's not possible for Albert

Possible Dates (Albert):¶

~~May 15, 16, 19~~
~~June 17, 18~~
July 14, 16
~~August 14, 15, 17~~

Only One Solution!¶

Abduction vs. Deduction¶

Abduction¶

Add months, days to Albert, A.Bernard, Bernard, B.Albert
Constraint solver generates Possible/Not Possible combinations
Solution(s) gives you Cheryl's birthday(s)

Deduction¶

Add possible date(s) to Albert, A.Bernard, Bernard, B.Albert (hypothesis)
Constraint solver checks constraints
A null set indicates an invalid hypothesis

Model Architecture (Nibbles)¶

Program Model CDO (Nibbles)¶

Program Model CDO (Nibbles)

Program¶

1 or more line groups

Line Group¶

1 or more related lines
Forced on indent (e.g., function def)

Line¶

1 or more tokens on a single line

Token¶

1 or more characters
Usually whitespace separated


GENERIC-PROGRAM-TYPE:
  LG1. GENERIC-LINE-GROUP-TYPE:

L1. ASSIGNMENT-LINE-TYPE | (indent:1):
  EXPR-LITERAL : PYTYPE-NUMBER
  [NAME] a
  [SYMBOL] =
  [NUMBER] 1

L2. ASSIGNMENT-LINE-TYPE | (indent:1):
  EXPR-LITERAL : PYTYPE-NUMBER
  [NAME] b
  [SYMBOL] =
  [NUMBER] 2

L3. PRINT-LINE-TYPE | (indent:1):
  EXPR-VV-PLUS : PYTYPE-NUMBER
  [KEYWORD] print
  [NAME] a    
  [SYMBOL] +    
  [NAME] b

Program Model Constraints 1/2 (Nibbles)¶

Program¶

Fixated line is high detail (character level)
Line above and below fixed are low detail + indent
Other lines are unknown + line/token sizes

Line Group¶

line-group-indent - if lines A and B in same group, A|B or A>B
all-lines-active - every line belongs to 1 line group, in order
line-group->lines:
- generic, for-loop-block, if-stmt-block, fun-def-block
- for, if, def start group + indent

Program Model Constraints 2/2 (Nibbles)¶

Line¶

line->tokens:
- unknown, assignment, fun-def, fun-call, print-line, for-loop, if-stmt, return, whitespace
  - Right-hand side expression
    - Type: N/A, number, string
    - Form: literal, variable, literal/literal, variable/variable
    - Operator: plus-sum, plus-append
line-whitespace-at-end - empty tokens on the right

Token¶

token->char:
- unknown, keyword, symbol, name, number, text, whitespace
- Size check (number of characters)

Reading a Program (Nibbles)¶

Fixate line-by-line with discrete sensor
Unobserved code is abduced

The Mighty Choice Point (Nibbles)¶

Priming¶

Ordered enumeration of solution space
Primed choice points are tried first

Classification¶

Each choice is a type or category
Constraints categorize entities

Group Membership¶

Active or inactive choice for each member
Constraints control membership

Restrict Solution Space¶

High and low detail choice before instances
Assertions control attention

Example 1: `counting`

Whitespace primes wrong grouping
Lack of indent info fails to cull wrong solution (last print is in its own line group)

Predictions

Moving print up 2 lines will reduce errors (yes)
Participants who make errors will recall last line without indent (?)
Adding an "end" token below final print will reduce errors (?)

Open Questions

Does changing "Done counting" make a difference?
Does adding a nested for look help?

for i in [1, 2, 3, 4]:
    print "The count is", i 


    print "Done counting"

Example 2: `overload`

Priming re-orders (number/string, plus/append) choice points
Weak number/string constraint fails to cull wrong solutions

Predictions

Priming with append + will reduce errors (maybe)
Not including numeric characters will reduce errors (?)
Adding type tokens or making strings and numbers visually distinct will reduce errors (?)

Open Questions

How does this help determine "cognitively optimal" operator overloading?

a = 4
b = 3
print a + b

c = 7
d = 2
print c + d

e = "5"
f = "3"
print e + f

Example 3: `scope`

Expectations re-order choice points (a function must "do something")
Incomplete parameter passing constraints allow wrong solution
Right solution is buried

Predictions

Experience will reduce errors (yes)
Parameter name will not influence errors (yes)
Reminding participants of Python pass-by-value will reduce errors (?)

Open Questions

How to formalize "do something"? (Rist, 1986)

"...the smallest piece of knowledge used in program understanding is a transaction. A transaction can be described using three categories: what operation takes place (action), where it takes place (location) and what object is acted upon (object)."

def add_1(num):
    num = num + 1
    
def twice(num):
    num = num * 2

added = 4
add_1(added)
twice(added)
add_1(added)
twice(added)
print added

Model Capabilities (Nibbles)¶

Attention¶

Full CDO space is infeasible to search entirely
Combinatorics are carefully controlled via detail, assertions

Errors¶

First solution assumed to be correct (other orderings?)
Need back-tracking on null set

Abduction of Missing Code¶

Bottom-up sensing + top-down knowledge
A kind of "cogent confabulation"? (Hecht-Nielson, 2007)

Limitations and Threats to Validity (Nibbles)¶

Incomplete¶

Need to finish Example 3
Doesn't generate goals for end-to-end model
No back-tracking when errors are noticed

Engineered Knowledge¶

Domain structure and constraints are hand-built
Few experiments to draw from

Cognitive?¶

First solution assumed to be correct
Simple priming from choice point ordering

Future Work¶

Accelerate CDO Solver¶

LISP -> Java -> Multi-core/GPGPU
Large programs, richer domain knowledge

Combine Mr. Bits and Nibbles¶

End to end model (text to fixations/keystrokes)
Back-tracking and error correction

New Experiment¶

More complex programs (e.g., recursion)
Different languages

Thank you!

References¶

Bernard P. Zeigler and Phillip E. Hammonds. Modeling & Simulation-Based Data Engineering: Introducing Pragmatics into Ontologies for Net-Centric Information Exchange. Academic Press, 2007.
Douglass, Scott A. and Mittal, Saurabh. A framework for modeling and simulation of the artificial. In Ontology, Epistemology, and Teleology for Modeling and Simulation, pages 271–317. Springer, 2013.
Hecht-Nielsen, Robert. Confabulation Theory: The Mechanism of Thought. Heidelberg: Springer, 2007.
Johnson-Laird, Philip N. Mental models. No. 6. Harvard University Press, 1986.
Rist, Robert S. "Plans in programming: definition, demonstration, and development." Empirical studies of programmers. Ablex, Norwood, NJ, 1986.

Mr. Bits: A Quantitative Process Model of Simple Program Understanding & Nibbles: A Constraint-Based Model of Program Reading and Inference¶

Michael Hansen, Rob Goldstone, Andrew Lumsdaine¶

Motivation and Goals¶

Motivation¶

Goals¶

Choose Your Own Adventure!¶

Mr. Bits¶

Nibbles¶

Mr. Bits

Goal Stack (Mr. Bits)¶

Variable Context (Mr. Bits)¶

ACT-R Model (Mr. Bits)¶

Symbolic vs. Sub-Symbolic (Mr. Bits)¶

Symbolic¶

Sub-symbolic¶

Sub-Symbolic + Remember Lists (Mr. Bits)¶

Sub-Symbolic + Forget Lists (Mr. Bits)¶

Trial Time Comparison (Mr. Bits)¶

counting - two_spaces¶

Limitations and Threats to Validity (Mr. Bits)¶

Text Reading¶

Program Scope¶

ACT-R¶

Nibbles

Cognitive Domain Ontologies (Nibbles)¶

Tree Structure¶

Constraints¶

Ball Example (Nibbles)¶

Ball Example (Nibbles)¶

All Combinations (9 solutions)

With Domain Knowledge (3 solutions)

Cheryl's Birthday Example (Nibbles)¶

So when is Cheryl's birthday?¶

Cheryl's Birthday CDO (Nibbles)¶

Cheryl's Birthday Solution (Nibbles)¶

Cheryl's Birthday Constraints (Nibbles)¶

Albert Knows the Month¶

Albert Knows Bernard's Days¶

Miscellaneous¶

Example¶

Cheryl's Birthday Constraints (Nibbles)¶

Bernard Knows the Day¶

Bernard Knows Albert's Months¶

Miscellaneous¶

Example¶

Cheryl's Birthday Step 1 (Nibbles)¶

Possible Dates (A.Bernard):¶

Cheryl's Birthday Step 2 (Nibbles)¶

Possible Dates (B.Albert):¶

With Bernard's Existing Constraints...¶

Possible Dates (Bernard):¶

Cheryl's Birthday Step 3 (Nibbles)¶

Possible Dates (Albert):¶

Only One Solution!¶

Abduction vs. Deduction¶

Abduction¶

Deduction¶

Model Architecture (Nibbles)¶

Program Model CDO (Nibbles)¶

Program Model CDO (Nibbles)

Program¶

Line Group¶

Line¶

Token¶

Program Model Constraints 1/2 (Nibbles)¶

Program¶

Line Group¶

Program Model Constraints 2/2 (Nibbles)¶

Line¶

Token¶

Reading a Program (Nibbles)¶

The Mighty Choice Point (Nibbles)¶

Priming¶

Classification¶

Group Membership¶

Restrict Solution Space¶

Example 1: counting

Predictions

Open Questions

Example 2: overload

Mr. Bits: A Quantitative Process Model of Simple Program Understanding &
Nibbles: A Constraint-Based Model of Program Reading and Inference¶

Mr. Bits ¶

Nibbles ¶

`counting - two_spaces`¶

Example 1: `counting`

Example 2: `overload`

Example 3: `scope`