HOW VISUAL ENCODING LEARNING CONTRIBUTES TO FLUENT READING
While I was training neural nets to recognize handwritten characters, I was also teaching my 3-year-old daughter to read. She had no problem learning to identify single letters, and with a bit more effort she learned to associate each letter with its corresponding sound or sounds, but she didn't begin to learn how to read words until two years later. What caused this delay? I knew that neural net training gets significantly more difficult as image size increases, so I wondered whether her slow progress in learning to read might stem from the same computational problem my nets encountered in learning to classify larger character images: the curse of dimensionality. Perhaps one reason learning to read is difficult is that it requires learning to classify fixated text images, each of which is rather large, spanning many characters.
Fluent readers of English correctly classify an average of 7-8 letters in each fixated text image, and at least partially classify as many as 14. Letter classification within an image seems to occur in parallel, within the first 150 ms of the fixation. Furthermore, classification of any single character in the text image benefits from that character being embedded within the familiar context of a word, suggesting that children learn to classify a familiar letter sequence in parallel. However, this skill isn't acquired overnight. Under the best of circumstances, it requires extensive practice from childhood into adulthood. Increasing the letter recognition span speeds reading by enabling readers to cover a line of text using fewer fixations. I developed a computational model, called Encoder, to discover how people might minimize the curse of dimensionality through various constraints.
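The link between recognition span and fixation count can be illustrated with simple arithmetic. The sketch below is not part of the Encoder model. It assumes a hypothetical 70-character line, assumes that spans tile the line with no overlap, regressions, or skipped words, and uses a hypothetical function name, `fixations_needed`. The spans of 8 and 14 letters echo the figures quoted above, and the 4-letter span is an invented stand-in for a less practiced reader.

```python
import math


def fixations_needed(line_length: int, span: int) -> int:
    """Fixations needed to cover a line when each fixation encodes `span` letters.

    Simplifying assumption: spans tile the line end to end, with no overlap,
    regressions, or skipped words.
    """
    if span <= 0:
        raise ValueError("span must be positive")
    return math.ceil(line_length / span)


LINE_LENGTH = 70  # hypothetical line length in characters

for span in (4, 8, 14):
    print(f"span {span:2d} letters -> {fixations_needed(LINE_LENGTH, span)} fixations")
```

Under these assumptions, widening the span from 4 to 14 letters cuts the fixations for this line from 18 to 5.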
Encoder is a feedforward neural network with two hidden layers, trained by backpropagation. Its inputs consist of text images spanning about 14 letters. The text images were generated using the complete text of the book The Wizard of Oz by L. Frank Baum. Three different type fonts and both upper- and lower-case letters were used in producing the text images.
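For readers unfamiliar with the term, the forward pass of a feedforward network with two hidden layers can be sketched as follows. This is a generic illustration, not Encoder itself: the layer sizes, the sigmoid activation, the fully connected wiring, the 26-class output per letter position, and all function names are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random initial weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """One fully connected layer followed by a sigmoid."""
    weights, biases = layer
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(image, layers):
    """Input -> hidden layer 1 -> hidden layer 2 -> output scores."""
    h1 = dense(image, layers[0])
    h2 = dense(h1, layers[1])
    return dense(h2, layers[2])


# Hypothetical sizes: a tiny flattened "image" and 26 letter classes
# for each of 14 letter positions.
N_PIXELS, N_H1, N_H2, N_OUT = 14 * 8, 20, 20, 14 * 26

rng = random.Random(0)
layers = [make_layer(N_PIXELS, N_H1, rng),
          make_layer(N_H1, N_H2, rng),
          make_layer(N_H2, N_OUT, rng)]
image = [rng.random() for _ in range(N_PIXELS)]
scores = forward(image, layers)
print(len(scores))  # one score per (position, letter) pair
```

Backpropagation would adjust the weights in `layers` to reduce the classification error on the output scores. That training step is omitted here.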

As predicted by the curse of dimensionality, Encoder fails at learning to classify the 14-letter-wide images unless learning is constrained in the following ways.
For more information on the model, see:

G. Martin (2004) Encoder: A Connectionist Model of How Learning to Visually Encode Fixated Text Images Improves Reading Fluency.
G. Martin (1997) From Image to Word: A Computational Model of Word Recognition in Reading.
Shaywitz, B. A., Shaywitz, S. E., Pugh, K. R., Mencl, E., Fulbright, R. K., Skudarski, P., Constable, R. T., Marchione, K. E., Fletcher, J. M., Lyon, G. R., & Gore, J. C. (2002) Disruption of Posterior Brain Systems for Reading in Children with Developmental Dyslexia.
McCandliss, B. D., Cohen, L., & Dehaene, S. (2003) The Visual Word Form Area: Expertise for Reading in the Fusiform Gyrus.
THE ENCODER MODEL
Each hidden node has a local, shared receptive field, such that the network learns to represent images in terms of a limited number of learned, local features that can occur anywhere in the input image.

Network learning starts small, such that the net first learns to classify the leftmost character in each image, then learning is extended to the next character to the right, and so on.

The fixated text images the net learns to encode are generated using consistent fixation positions falling just to the left of the center of the fixated word.
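The first two constraints above, shared local receptive fields and starting small, can be sketched in a few lines. This is an illustration under stated assumptions rather than Encoder's implementation: it uses a one-dimensional input instead of a two-dimensional image, compares letter strings instead of network outputs, and all function names (`shared_feature_map`, `active_positions`, `masked_error`) are hypothetical.

```python
def shared_feature_map(row, kernel):
    """Apply ONE set of weights (the kernel) at every position of a 1-D input.

    Because the weights are shared, a feature learned at one location is
    detected wherever it occurs, and the number of free parameters does not
    grow with the width of the input.
    """
    k = len(kernel)
    return [sum(w * row[i + j] for j, w in enumerate(kernel))
            for i in range(len(row) - k + 1)]


def active_positions(stage, n_positions=14):
    """Start-small schedule: at stage s (0-based), train on letter positions
    0..s, so learning begins with the leftmost letter and extends rightward."""
    return list(range(min(stage + 1, n_positions)))


def masked_error(predicted, target, stage):
    """Count classification errors only at the positions currently trained."""
    return sum(1 for i in active_positions(stage, len(target))
               if predicted[i] != target[i])


# The same two-pixel feature is found at both places it occurs.
print(shared_feature_map([0, 1, 1, 0, 0, 1, 1, 0], [1, 1]))

# Early in training, only the leftmost letter contributes to the error.
print(masked_error("thx quick", "the quick", stage=0))
print(masked_error("thx quick", "the quick", stage=2))
```

In the last two calls, the wrong letter at the third position is ignored at stage 0 and counted once training has been extended to cover it.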
Once trained to encode fixated text images, Encoder spontaneously exhibits the following human-like behaviors with respect to reading words and word-like stimuli:
Word frequency effects
Word length x frequency effects
Word superiority effects
Pseudo-word superiority effects
Trigram frequency effects
Minimal effects of printing words in aLtErNaTiNg cases
Encoder differs from previous models of reading proficiency by demonstrating that reading fluency can stem, to a significant degree, from visual encoding learning that widens the span over which letters are encoded, thereby reducing the number of fixations used in reading a segment of text. Previous models of reading fluency have focused on learning that occurs at more abstract levels of processing, such as reducing activation thresholds for representations of frequently occurring words (Morton's logogen model) or establishing interactive links between letters that tend to co-occur within a word (McClelland & Rumelhart's Interactive Activation Model).

See:
Morton, J. (1969) Interaction of information in word recognition. Psychological Review, 76, 165-178.
McClelland, J. L., & Rumelhart, D. E. (1981) An interactive activation model of context effects in letter perception, Part 1: An account of basic findings. Psychological Review, 88, 375-405.

Encoder also differs from other backpropagation neural net models of reading, developed by McClelland, Seidenberg, Plaut, and others, in three ways: the inputs Encoder processes are two-dimensional images, each of which spans a sequence of 14 letters; Encoder does not involve interactive activation; and it focuses on understanding how people overcome the significant computational difficulties associated with learning to classify large images. In contrast, previous connectionist models have focused on modeling the roles played by phonological and semantic coding, as well as interactivity, in the development of reading skills.

See:
Cohen, L. & Dehaene, S. (2004) Specialization Within the Ventral Stream: the Case for the Visual Word Form Area.
ALTERNATIVE COMPUTATIONAL MODELS OF READING
Seidenberg, M. S. and McClelland, J. L. (1989) A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996) Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review.
The Encoder model claims that learning to read fluently requires extensive practice in learning to visually encode fixated text images. This claim is supported by recent brain imaging research indicating that adults with a history of developmental dyslexia show reduced activation in an area of the left posterior occipitotemporal region of the brain sometimes referred to as the visual word form area. Activation in this area during reading becomes more pronounced as the reader develops fluent reading skills. Furthermore, brain injury to this region is associated with so-called "letter-by-letter reading," in which the span over which the reader can rapidly identify the letters of a word is significantly reduced, which in turn slows reading.

See:
RELEVANT NEUROSCIENCE RESEARCH