One possible explanation for the context effect in letter perception is that a word context such as R_AD enables subjects to guess what the missing letter is, much like contestants on the popular game show "Wheel of Fortune." Several words fit the context R_AD, such as ROAD and READ. Conceivably, following a single brief flash of the letter string READ, subjects might correctly guess that the target letter is E without actually perceiving it.
Reicher (1969) conducted a seminal experiment that controlled for the type of guessing described above and demonstrated conclusively that a word context improves the perception of single letters independently of word-level guessing strategies. Reicher presented a target letter in a word, in a non-pronounceable non-word, or by itself, replaced after a brief delay by a mask flanked by a pair of letters in the target-letter position. One of the letters in the pair was the target letter and the other was a letter that also fit the context. For example, the word READ, the non-word AEDR, and the single letter E (representing the three possible contexts: word, non-word, and letter) were followed by a row of dashes and the flanking letters E and O in the target position:
The subjects' task was to indicate which of the two letters actually appeared in the target position. Since both E and O form a word in the target position (READ or ROAD), a word-guessing strategy cannot identify the target letter: a subject who simply guessed a word that fits the context would be equally likely to choose E or O. That is, the word context provides no guessing advantage. However,
even when controlling for the use of a word guessing strategy, Reicher
found that subjects were more accurate at making letter judgments when
the letter was presented in the context of a word than when it was
presented in the context of a non-word, or when it was presented by
itself. This phenomenon, called the word superiority effect, suggests
that a word context improves the perception of the component letters.
The Interactive Activation Model
The Interactive Activation (IA) model addresses three specific
questions which we will focus on in the exercises that follow.
First, can a connectionist model account for the basic fact
that word context facilitates letter perception? Second,
can the IA model account for the reported advantage of a pseudo-word context, such as ZOG, over a non-word context, such as AOE? Third, can the model account for the related finding that context can alter the perception of a letter (transforming a non-word into a word)?
In developing the IA model, McClelland and Rumelhart make several important assumptions. First, they assume that perception takes place in a multilevel processing system with at least three levels of representation: the visual feature level, the letter level, and the word level. A consequence of the multilevel assumption is that more abstract levels of representation are accessed only via intermediate levels: the word level is accessed via the letter level, and the letter level via the visual feature level. A further assumption is that processing combines both bottom-up and top-down information. That is, readers can use their (top-down) knowledge of words to help identify letter sequences from (bottom-up) visual input. These assumptions are made explicit in the IA network.
The IA network is composed of three levels: the feature level, the letter level, and the word level (Figure 1). At the feature and letter levels there are separate pools of units for each of the three letter positions (all words are exactly three letters long).
Figure 1: The IA model of Letter Perception (McClelland & Rumelhart, 1981).
Each letter is composed of 14 line segment features (numbered 0-13; Figure 2) based on a simple font used by Rumelhart and Siple (1974). There are 14 binary feature units in each letter position to represent the presence (1) or absence (0) of each feature in the font. At the letter level, there are three sets of letters, one for each letter position. Here, we include only those letter units that are necessary for the model simulations. At the word level, there are word units for each word in the tested data set. The original IA model of McClelland and Rumelhart (1981) included 1179 four-letter words taken from the word list of Kucera and Francis (1967). In the BrainWave version, we included only twelve three-letter words, so that all of the word units can be placed in the workspace, making it easier to understand how the IA model can be used to explain context effects in letter perception.
Figure 2: The 14 features in the Rumelhart-Siple font. In the IA model, the letter A would be represented by the binary feature pattern: 11111010100000.
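The binary pattern in the caption can be unpacked into the set of active feature indices. A minimal sketch (the pattern for A is the only one given in the text, so no other letters' patterns are assumed here):

```python
def active_features(pattern):
    """Return the indices (0-13) of the line-segment features that are
    present in a 14-bit Rumelhart-Siple feature string."""
    assert len(pattern) == 14, "each letter uses exactly 14 features"
    return {i for i, bit in enumerate(pattern) if bit == "1"}

# The feature pattern for the letter A, from the caption of Figure 2:
A = active_features("11111010100000")
```

Clicking on a letter in the BrainWave workspace has the same effect: the feature units at the indices returned here are set to 1, and the rest to 0.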
In Figure 1, there are three pattern sets representing the letter positions (P1, P2, and P3). The letters in each set can be used to compose different letter sequences as input to the network. Each letter is represented by the feature vector for that letter in that position. Clicking on a letter in a pattern set activates the corresponding features at the feature layer of the network.
In addition to the pattern sets representing the letter positions, there are five pattern sets containing the different types of weights. Using these pattern sets you can make each type of weight visible or invisible, and you can select just that type of weight (useful for changing the weight values). The different types of weights are:
Feature-to-letter weights: Feature units have excitatory connections to all of the letter units in the same spatial position that contain those features (set E1-2) and inhibitory connections to those that do not (set I1-2). Thus, an active feature unit excites the letters that contain that feature and inhibits the letters that do not.
Letter-to-word weights: Each letter unit in a given position has excitatory connections to the word units that have that letter in that position (set E2-3), and inhibitory connections to all of the word units that do not have that letter in that position (set I2-3). For example, the letter A in the second position has positive weights to the CAT, HAT, and BAT word units, and negative weights to all of the other word units.
The feature-to-letter and letter-to-word weights embody the bottom-up knowledge of the network.
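The letter-to-word connectivity rule can be sketched as a simple sign function. The weight magnitudes below are placeholders, not the values used in BrainWave; only the sign pattern (excite a word containing the letter in that position, inhibit the rest) follows the text:

```python
def letter_to_word_weight(letter, position, word, excite=0.07, inhibit=-0.04):
    """Weight from a positioned letter unit to a word unit: positive if the
    word has that letter in that position (set E2-3), negative otherwise
    (set I2-3). The magnitudes here are illustrative assumptions."""
    return excite if word[position] == letter else inhibit

# The example from the text: A in the second position (index 1) excites
# CAT, HAT, and BAT, and inhibits words such as SUN.
```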
Word-to-letter weights: Each word unit also has excitatory connections to all of the letter units that compose that word (set E3-2). For example, the CAT word unit has positive weights to the C letter unit in the first position, the A letter unit in the second position, and the T letter unit in the third position.
The word-to-letter weights embody the top-down knowledge of the network.
Within-level weights: In addition to bottom-up and top-down (between-level) connections, the IA model also has within-level connections. All of the word units are mutually inhibitory. That is, each word unit has negative connections to every other word unit (set I3-3). McClelland and Rumelhart's original IA model also included inhibitory connections within the pool of letter units at each spatial position. However, for the simulations reported by McClelland and Rumelhart (1981) these letter-to-letter connections were set to zero, and so they are not included in the BrainWave version.
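Putting the connection types together, one cycle of interactive-activation dynamics can be sketched as follows. This is an illustrative reconstruction, not the BrainWave implementation: the parameter values (resting level, decay rate, activation bounds) are assumptions in the spirit of McClelland and Rumelhart's description, not values taken from the text.

```python
import numpy as np

def ia_cycle(a, W, ext, rest=-0.1, decay=0.07, a_min=-0.2, a_max=1.0):
    """One update cycle of interactive-activation dynamics (sketch).
    Only units with positive activation send output; excitatory net input
    drives a unit toward its maximum, inhibitory net input toward its
    minimum, while decay pulls activation back toward the resting level."""
    out = np.clip(a, 0.0, None)          # sub-zero activations send no output
    net = W @ out + ext                  # within/between-level input + external input
    effect = np.where(net > 0, net * (a_max - a), net * (a - a_min))
    a = a + effect - decay * (a - rest)
    return np.clip(a, a_min, a_max)
```

Cycling the network, as in the exercises below, just means applying this update repeatedly until the activations stop changing. With mutual inhibition among word units (set I3-3), the word unit receiving the most bottom-up support suppresses its competitors.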
Investigating the Role of Context in Letter Perception
The remainder of this chapter consists of exercises illustrating the main results obtained by McClelland and Rumelhart with the IA model.
Exercise 1: Reset the activations of the units using the Zero Units button and create a graph for the SUN word unit. Cycle the network on the letter sequence for SUN. How many steps does it take for the units of the network to converge to stable activations? What is the final activation of the SUN word unit?
Exercise 2: Freeze the graph for the SUN word unit and reset the unit activations. Now try cycling the network on the letter sequence for HAT (creating a graph for the HAT word unit as in the previous example). How many steps does it take for the units of the network to converge to stable activations? What is the final activation of the HAT word unit? Why is the network slower to respond to HAT than it is to respond to SUN?
Let's now consider how the network responds to an ambiguous letter sequence. Can the IA model use context to help it identify a noisy letter?
Exercise 3: Delete the graphs for both the SUN and HAT word units and again reset the unit activations. Create a graph for the N letter unit in the third position and reset the unit activations. Cycle the network on the ambiguous letter sequence SU[n/t] and monitor the network's response. How many cycles does it take for the units of the network to converge to stable activations? In terms of the network connectivity, explain how the model uses context to disambiguate the noisy letter features in the third position (Why does the N unit become strongly activated compared with the T unit?)
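One way to construct such an ambiguous input, sketched below, is to switch on only the line segments the two letters share. The feature sets used here are purely hypothetical, for illustration; the actual Rumelhart-Siple patterns for N and T are not given in the text, and BrainWave's SU[n/t] pattern may be built differently.

```python
def ambiguous_input(feats_a, feats_b):
    """Feature input consistent with either letter: only the segments
    shared by both letters are on; conflicting segments stay off."""
    return feats_a & feats_b

# Hypothetical active-feature index sets (illustration only, not the font):
N = {0, 1, 2, 3, 8, 9}
T = {2, 4, 8, 10}
```

The shared segments weakly excite both the N and T letter units; the top-down word-to-letter weights from SUN then tip the balance toward N.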
To examine the word superiority effect, we need to compare the perception of a single letter in a word context with its perception in a non-word context. Consider the perception of the letter O in the context of the word ZOO compared with the unpronounceable non-word context AOE.
Exercise 4: To monitor the model's response, create a graph for the letter O in the second position and reset the unit activations. Cycle the network on the AOE pattern for 15 iterations, by which point the activations should be stable. Following the presentation of the AOE pattern, freeze the graph of the O letter unit and create a second graph to monitor the network's response to the ZOO pattern (remember to reset the unit activations). Compare the activation of the O letter unit in the AOE context to that in the ZOO context. It is handy to move the graphs on top of (or next to) each other to compare the differences in activation levels. Does the model exhibit the word superiority effect?
A somewhat surprising finding reported by McClelland and Rumelhart (1981) is that the advantage for the perception of letters in words over letters in non-words also holds for letters in pseudo-words (pronounceable non-words). For example, subjects are better at perceiving the letter O in a pseudo-word context such as ZOG than in a non-word context (e.g., AOE), since ZOG partially overlaps with a number of words (e.g., FOG and DOG). In the model, the letter sequence OG is sufficient to partially activate the FOG and DOG units, providing top-down activation of the letter O in the second position. (In a non-pronounceable non-word, top-down activation would be absent or greatly attenuated.) The word superiority effect in pseudo-word contexts can sometimes be stronger than in real word contexts, depending on the number of words that provide a partial match.
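The amount of top-down support a string attracts can be gauged by counting, position by position, how many lexicon words share each of its letters. A small sketch using a hypothetical subset of the model's twelve-word lexicon (the full word list is not given in the text; only ZOO, FOG, and DOG are named):

```python
def positional_matches(s, lexicon):
    """For each letter position, count the lexicon words that contain
    the same letter in that position."""
    return [sum(w[i] == s[i] for w in lexicon) for i in range(len(s))]

# Hypothetical subset of the model's lexicon, for illustration:
LEXICON = ["ZOO", "FOG", "DOG", "HAT", "CAT", "SUN"]
```

With this lexicon, the O and G of ZOG each match several words (ZOO, FOG, DOG), whereas a non-word such as AOE overlaps with far fewer, so ZOG receives correspondingly more top-down activation.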
Exercise 5: To demonstrate the effect of context with pseudo-words, freeze the graph for the ZOO pattern and create a third graph for the pseudo-word context ZOG. Reset the unit activations and cycle the network for 15 cycles. Does the word superiority effect hold for the pseudo-word ZOG? Is the effect stronger or weaker than in the ZOO context? Explain.
A finding related to the word superiority effect in pseudo-words is that contextual effects can also produce misperceptions of letters (transforming non-words into words). For example, the O in BOT might be perceived as an A because of contextual (top-down) effects from the similar word BAT.
Exercise 6: To evaluate the role of context in letter misperceptions, delete all of the current graphs in the workspace and create new graphs for the letters O and A, each in the second position. Cycle the network on BOT and monitor the activations of the O and A units. Does the B_T context cause the network to misperceive the letter O as an A? If not, how might the IA network be altered to model this phenomenon?
Kucera, H. and Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
McClelland, J. L. & Rumelhart, D. E. (Eds.). (1988). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: MIT Press.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of the stimulus material. Journal of Experimental Psychology, 81, 274-280.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Rumelhart, D. E., & Siple, P. (1974). Process of recognizing tachistoscopically presented words. Psychological Review, 81, 99-118.