In [1]:

```
from IPython import display
```

*Percepts and Concepts Lab, Spring 2014*

- Program comprehension is poorly understood (Détienne, 2002)
- Language design is not human-centered (Stefik, 2013)
- Eye movements can be mapped to cognitive processes (Salvucci, 1999)

- Learn about program comprehension via eye movements
- Compare strategies/errors between versions of a program
- Develop a **quantitative** cognitive model of simple program comprehension

- eyeCode - programmers read and evaluate small Python programs
- Different versions randomly assigned
- All keystrokes recorded (eye movements locally)

- 1 experiment = 10 programs
- 2-3 versions of each program

- 1 trial = 1 program
- All answers final
- Intermediary keystrokes recorded

- `between` - filter two lists, intersection
- `counting` - `for` loop with bug
- `funcall` - function call with different spacing
- `initvar` - summation and factorial
- `order` - 3 functions called (in order/shuffled)
- `rectangle` - compute area of 2 rectangles
- `overload` - overloaded `+` operator (numbers/strings)
- `partition` - partition list of numbers
- `scope` - function calls with no effect
- `whitespace` - linear equations

[ most errors ] • [ fewer errors ] • [ no errors ]

- Participant response vs. true program output
- Normalized Levenshtein distance (0-1)
- Perfect vs. correct (formatting ignored)

[1, 2] [3, 4]

[1, 2] [3, 4]

1 2 3,4

```
[1, 3]
[3, 4]
```
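The grading metric above can be sketched in a few lines. This is a minimal illustration, not the experiment's actual grading code, assuming plain Levenshtein distance normalized by the longer string's length:

```python
def levenshtein(a, b):
    # Classic dynamic-programming string edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def grade(response, true_output):
    # Normalized distance in [0, 1]: 0.0 is a perfect match.
    longest = max(len(response), len(true_output))
    return levenshtein(response, true_output) / longest if longest else 0.0

print(grade("[1, 2]\n[3, 4]", "[1, 3]\n[3, 4]"))  # one substituted character
```

Formatting-insensitive "correct" grading would normalize whitespace in both strings before comparing.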

- Python library for analyzing eye movements
- Built on top of **pandas** (R-like data frames) and **scipy/numpy** (stats/numerics)
- Automatic **areas of interest** (text, code)
- **Hit testing** with layers
- Code-specific **plots and metrics**

In [10]:

```
import pandas, scipy.stats, numpy
# Numpy and pandas work together to create R-like data frames
data = numpy.random.uniform(size=25).reshape((5, 5))
frame = pandas.DataFrame(data, columns=["A", "B", "C", "D", "E"])
# Non-numeric columns are supported
frame["F"] = ["x", "x", "o", "o", "o"]
frame
```

Out[10]:

  | A | B | C | D | E | F
---|---|---|---|---|---|---
0 | 0.988141 | 0.601001 | 0.313569 | 0.493887 | 0.702926 | x
1 | 0.256549 | 0.802963 | 0.632752 | 0.250193 | 0.152852 | x
2 | 0.875510 | 0.354493 | 0.665779 | 0.787402 | 0.279158 | o
3 | 0.473622 | 0.000874 | 0.153416 | 0.064031 | 0.127221 | o
4 | 0.255761 | 0.794496 | 0.568220 | 0.868812 | 0.488199 | o

5 rows × 6 columns

In [29]:

```
# Statistics are available in scipy.stats
r, p = scipy.stats.spearmanr(frame.A, frame.B)
print "Spearman: r={0}, p={1}".format(r, p)

# Pandas lets you sort, group, and slice data frames
for f_val, sub_frame in frame.groupby("F"):
    # Underneath, columns are accessible as standard numpy arrays
    s_vals = sub_frame[["A", "B", "C", "D", "E"]].values.flatten()
    print "When F={0}, sum is {1}, std is {2}".format(f_val, s_vals.sum(), s_vals.std())
```

- Automatic with Pygments
- Works for any supported language!

- Fixations mapped to areas of interest
- Highest area of overlap (or point)
- Overlap area ∝ probability?
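A minimal sketch of overlap-based hit testing, assuming axis-aligned boxes for both fixations and AOIs (in practice the fixation box would come from the fixation point plus a tolerance radius; the AOI names and sizes below are made up):

```python
def overlap_area(box1, box2):
    # Boxes are (x, y, width, height); returns the intersection area.
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    dx = min(x1 + w1, x2 + w2) - max(x1, x2)
    dy = min(y1 + h1, y2 + h2) - max(y1, y2)
    return dx * dy if dx > 0 and dy > 0 else 0.0

def hit_test(fix_box, aois):
    # Assign a fixation to the AOI it overlaps most; None if it hits nothing.
    areas = {name: overlap_area(fix_box, box) for name, box in aois.items()}
    best = max(areas, key=areas.get)
    return best if areas[best] > 0 else None

# Hypothetical AOIs: two 200x20 px text lines stacked vertically.
aois = {"line 1": (0, 0, 200, 20), "line 2": (0, 20, 200, 20)}
print(hit_test((10, 12, 10, 10), aois))  # overlaps both lines; "line 1" wins
```

Treating the overlap areas as weights (rather than taking the max) would give the probabilistic assignment hinted at above.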

- Saccade = quick jump between fixations
- Plot angles and lengths

- Fixations → AOIs → Scanpath
- Repeats removed, duration lost

- First-order transition probabilities
- Steady state = AOI importance?

- Example: words and fixations in LTR reading order
- All fixations are same duration

- Time on each line is proportional to line length

- Computed over a rolling time window
- Window size and step can vary
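A minimal sketch of a rolling-window metric, here mean fixation duration; the window and step sizes, and the fixation data, are hypothetical:

```python
def rolling_mean_duration(fixations, window_ms, step_ms):
    # fixations: time-sorted list of (start_ms, duration_ms) pairs.
    # Returns (window_start, mean duration of fixations starting in the window).
    if not fixations:
        return []
    out = []
    t = fixations[0][0]
    last_start = fixations[-1][0]
    while t <= last_start:
        durs = [d for s, d in fixations if t <= s < t + window_ms]
        out.append((t, sum(durs) / float(len(durs)) if durs else 0.0))
        t += step_ms
    return out

fixes = [(0, 250), (300, 200), (1200, 400), (5200, 300)]
windows = rolling_mean_duration(fixes, window_ms=1000, step_ms=1000)
for start, mean in windows:
    print(start, mean)
```

Larger windows smooth out noise at the cost of temporal resolution; overlapping windows (step smaller than window) trade redundancy for a denser time series.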

- Reading behavior (descriptive, predicting)
- Saccade angles and scanpaths (string-edit distance)
- Important lines (fixation duration, steady state)
- Strategies and calculations (rolling metrics)

- **Fixation Duration** - 273 ms avg (expected 300-400 ms)
- **Fixation Count** - 172 avg (0.97 corr w/ duration)
- **Fixation Rate** - 2.73 per sec avg
- **Code/Output Transitions** - 6 avg per trial

- Simple OLS models, fit with the `statsmodels` package
- Keywords least fixated
- Line length, line number useful for predicting duration

Type | Dep. Var | Useful | Not Useful
---|---|---|---
Line | First fixation | Line # | Other text/content
Line | Fixation duration | Line length, whitespace prop | Experience level
Element | Fixation duration | Category, line # | |

- **Left** - eyeCode experiment (all programs)
- **Center** - eyeCode experiment (`counting` programs only)
- **Right** - Natural language experiment

- For each pair of trials (same program)
- Block or line + output box scanpath (no repeats)
- Normalized Levenshtein distance (string edit distance)
- Consider ScanMatch (Needleman-Wunsch algorithm)
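The comparison above can be sketched as follows (a toy illustration, not the study's code): collapse consecutive repeats in each scanpath, then take the normalized edit distance between the resulting AOI sequences:

```python
def collapse(scanpath):
    # A A B B A -> A B A: repeats removed, duration information lost.
    return [a for i, a in enumerate(scanpath) if i == 0 or a != scanpath[i - 1]]

def norm_edit_distance(p, q):
    # Normalized Levenshtein distance over AOI sequences (0 = identical paths).
    prev = list(range(len(q) + 1))
    for i, a in enumerate(p, 1):
        curr = [i]
        for j, b in enumerate(q, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (a != b)))
        prev = curr
    longest = max(len(p), len(q))
    return prev[-1] / longest if longest else 0.0

a = collapse(["L1", "L1", "L2", "L3", "out"])
b = collapse(["L1", "L2", "L2", "out"])
print(norm_edit_distance(a, b))  # one deletion over max length 4 -> 0.25
```

ScanMatch differs by using Needleman-Wunsch alignment with a substitution matrix, so spatially close AOIs can count as near-matches rather than full mismatches.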

- Fixation duration
- Steady state

- Spikes in average fixation duration correlated with mental calculations

```
0 1
1 6
2 11
```

```
intercept = 1
slope = 5

x_base = 0
x_other = x_base + 1
x_end = x_base + x_other + 1

y_base = slope * x_base + intercept
y_other = slope * x_other + intercept
y_end = slope * x_end + intercept

print x_base, y_base
print x_other, y_other
print x_end, y_end
```

- **Batch (top)** - Read through entire program, then respond
- **Incremental (bottom)** - Respond to each `print` statement

```
def area(xy_1, xy_2):
    width = xy_2[0] - xy_1[0]
    height = xy_2[1] - xy_1[1]
    return width * height

r1_xy_1 = (0, 0)
r1_xy_2 = (10, 10)
r1_area = area(r1_xy_1, r1_xy_2)
print r1_area

r2_xy_1 = (5, 5)
r2_xy_2 = (10, 10)
r2_area = area(r2_xy_1, r2_xy_2)
print r2_area
```

- Focus on lists and `between()` comparison
- Less time on second list
- Common error assumes `common()` for `x_between` and `y_between`

[8, 7, 9] [1, 0, 8, 1] [8, 9, 0]

```
[8, 7, 9]
[1, 0, 8, 1]
[8]
```

- `nospace` - 80% correct (top), `twospaces` - 36% correct (bottom)
- Line 2 → 3 transition twice as likely for:
  - `nospace` trials
  - Correct `twospaces` trials

- More (relative) time spent on string literals in `plusmixed`
- Priming from mathematical use of `+`?
- More likely transitions from output box to `"5" + "3"`

- Double-checking in `plusmixed`? (at 35 seconds)

- Focus on `area()` arguments
- Width/height calculation verification in `tuples`?

- Average fixation duration different than other experiments (273 ms vs. 300-400 ms)
- Experience level not correlated with avg. fix duration
- Keywords received the least amount of time

- Time on line - line length and whitespace proportion
- First fix on line - line number
- Time on code element - category and line number

- Only `counting`, `initvar`, `overload`, and `rectangle`
- Other program versions not distinguishable with eye/performance metrics
- Programs may be too simple

- Develop a quantitative cognitive model of program comprehension
- Compare model to human eye movements and errors

- Line reader model
- Basic ACT-R model
- Advanced ACT-R model

- Process model for program comprehension
- **Chunking** - Reading and understanding a coupled block of code
  - Faster when chunks are smaller, less complex
- **Tracing** - Visual jump to required sub-chunk
  - Faster when chunk is closer, more salient

- Inspiration for quantitative models

- Chunking (7)
- Tracing (5)

- Definition of a chunk
- Not a cognitive model
- All parameters necessary?
- Does not make errors

- Reads and evaluates lines like a debugger
- Chunk = 1 line
- Chunk complexity ∝ f(line length)
- Tracing complexity ∝ g(line count)

- Chunking delay
- Tracing delay
- Chunk/tracing recall factors
- Typing speed
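A toy simulation of the line reader idea. All parameter values and the example trace below are hypothetical: per-line time grows with line length (chunk complexity), and a non-sequential jump in execution order adds a fixed tracing delay:

```python
def simulate_line_reader(execution_trace, base_time=0.3, chunk_delay=0.05,
                         trace_delay=0.2):
    # Hypothetical parameters: base seconds per line, extra seconds per
    # character (chunk complexity ~ f(line length)), and a fixed tracing
    # delay whenever execution jumps to a non-adjacent line.
    total, per_line, prev = 0.0, [], None
    for lineno, text in execution_trace:
        t = base_time + chunk_delay * len(text.strip())
        if prev is not None and lineno != prev + 1:
            t += trace_delay  # non-sequential jump: tracing cost
        total += t
        per_line.append((lineno, round(t, 3)))
        prev = lineno
    return total, per_line

# Execution order of a tiny program with one function call.
trace = [(3, "x = 2"), (4, "y = double(x)"), (1, "def double(n):"),
         (2, "    return n * 2"), (5, "print(y)")]
total, per_line = simulate_line_reader(trace)
print(round(total, 2))  # total simulated reading time in seconds
```

Recall factors would discount `chunk_delay` for lines already read, and a typing-speed parameter would govern time spent producing the response.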

- Single line
- Not a cognitive model
- Does not make errors
- Unrealistic typing

- ACT-R is a cognitive architecture, programmed with Common Lisp
- Adaptive Control of Thought - Rational
- Computational theory of cognition

- Cognition is transforming and moving **chunks** between **buffers**
- Chunks are collections of key/value pairs
- Buffers hold one chunk at a time
- Productions are micro-steps in a task

- Modules for:
  - Declarative memory
  - Task state (imaginal), goals
  - Vision, hearing (aural), speaking (vocal), typing (manual)

- Bottleneck in the procedural module
- **Sub-symbolic** effects:
  - Memory retrieval delay
  - Manual jamming
  - Eye movements

```
(p encode-letter
   =goal>
      isa      read-letters
      state    attend
   =visual>
      isa      text
      value    =letter1
   ?imaginal>
      buffer   empty
==>
   =goal>
      state    wait
   +imaginal>
      isa      array
      letter1  =letter1
)
```

- Declarative memory
- Chunk/tracing recall

- Visual module
- Chunk size, tracing distance

- Manual module
- Typing response

- Initial DM activations
- Standard ACT-R parameters

- Single line
- Cannot recognize algorithms
- Beginner typing skill

- Basic model + ontology + spatial
- Map visual locations/areas to program semantics (loops, functions)
- Use qualitative spatial reasoning
- Variable name similarity (prefix), semantic priming

- Initial DM activations
- DM weights (similarity)
- Standard ACT-R parameters

- No IDE/tool interaction
- Only single file, simple programs
- High level rules of discourse?

Thank you!

- Françoise Détienne and Frank Bott. Software Design – Cognitive Aspects. Springer Verlag, 2002.
- Dario D. Salvucci. Mapping eye movements to cognitive processes. PhD thesis, Carnegie Mellon University, 1999.
- Andreas Stefik and Susanna Siebert. An empirical investigation into programming language syntax. ACM Transactions on Computing Education (TOCE), 13(4):19, 2013.