Currently, I'm analyzing the visual coding that Encoder's neural nets developed while learning to classify text images. Examples of these images are illustrated on the left. Each image spans about 14 or more characters and depicts letters printed in one of three different fonts, in either mixed or upper case. All images were generated using text from the story The Wizard of Oz, by L. Frank Baum.

The analysis focuses on determining how the trained neural networks represent the visual structure of individual letters, the ordinal position of each letter within a word, and the letter sequence regularities that give rise to word and pseudo-word superiority effects. The data being analyzed include the learned weights associated with each feature detector in hidden layers 1 and 2, and each output node, as well as the activation patterns that arise in the hidden units in the two hidden layers.
To understand Encoder's visual coding, one must first understand its network architecture. As illustrated on the right, Encoder has two hidden layers, with each layer having multiple horizontal planes. Each node in a hidden layer has a local, shared receptive field. Local means that the node receives its input from a small, local region in the layer below. Shared means that the weight initializations and subsequent learning-induced weight changes are linked across hidden nodes within a given plane.

- Nodes in the same horizontal plane of a hidden layer apply the same feature detector to different local regions of the layer below.

- Nodes in the same column of a hidden layer apply different feature detectors to the same local region of the layer below.
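
The two properties above (local and shared receptive fields) can be illustrated with a minimal sketch. It is not Encoder's actual code: the toy image, the 2x2 kernels, the one-pixel step between receptive fields, and the function names `apply_plane` and `apply_layer` are all assumptions made for illustration, and the squashing function applied to each node's net input is omitted.

```python
def apply_plane(image, kernel, bias=0.0):
    """One horizontal plane: the same kernel (shared weights) is applied
    to every local region of the layer below."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    plane = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Net input of the node whose receptive field starts at (r, c).
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        plane.append(row)
    return plane


def apply_layer(image, kernels):
    """One hidden layer: one plane per feature detector."""
    return [apply_plane(image, kernel) for kernel in kernels]


# Toy input: 1 = background pixel, 0 = foreground pixel.
image = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

# Two hypothetical feature detectors (Encoder's first layer has fifteen).
kernels = [
    [[1, -1], [1, -1]],
    [[1, 1], [-1, -1]],
]

layer = apply_layer(image, kernels)

# A "column": different feature detectors applied to the same local region.
column = [plane[0][0] for plane in layer]
```

Here each entry of `layer` is one plane, and `column` collects the responses of all feature detectors to the top-left image region.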

Each node in the output layer is trained to detect a given character (A-Z) or space in a given ordinal letter position (1st-14th). Each output node receives its inputs from all of the nodes in the 2nd hidden layer. On average, and like most fluent human readers of English, Encoder correctly classifies the first (leftmost) 7-8 letters per text image.
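
With 27 possible characters (A-Z plus space) in each of 14 positions, this scheme implies 14 x 27 = 378 output nodes. The sketch below shows one way such an output vector could be built and read back. The position-major ordering of the nodes, the winner-take-all readout per position, and the names `encode_target` and `decode` are assumptions for illustration, not details taken from Encoder itself.

```python
import string

ALPHABET = string.ascii_uppercase + " "  # 26 letters plus space = 27 classes
NUM_POSITIONS = 14                       # ordinal letter positions 1st-14th


def encode_target(text):
    """Build a target vector with one active node per letter position."""
    padded = text.upper().ljust(NUM_POSITIONS)[:NUM_POSITIONS]
    target = [0.0] * (NUM_POSITIONS * len(ALPHABET))
    for pos, ch in enumerate(padded):
        # Assumed layout: all 27 nodes for position 1, then position 2, ...
        target[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return target


def decode(outputs):
    """Read out the most active character node at each position."""
    chars = []
    for pos in range(NUM_POSITIONS):
        start = pos * len(ALPHABET)
        scores = outputs[start:start + len(ALPHABET)]
        best = max(range(len(ALPHABET)), key=scores.__getitem__)
        chars.append(ALPHABET[best])
    return "".join(chars)


target = encode_target("the wizard of")
decoded = decode(target)
```

Note that upper- and mixed-case inputs map to the same target here, since the output nodes are described as detecting A-Z regardless of case.
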
The figure on the right depicts the fifteen different feature detectors that developed in the first hidden layer of one of the Encoder neural nets, as a result of backpropagation learning.

Black squares correspond to positively-valued weights; white squares to negatively-valued weights. Larger squares correspond to higher absolute values for weights.

So what are these feature detectors detecting? My first thought was that each one is a different type of oriented-edge detector that is activated when an input image patch correlates well with the to-be-detected edge feature. Perhaps, like Pandemonium models of visual pattern recognition, Encoder constructs a structural description of letters using spatially adjacent, activated first hidden layer nodes.... However, the figure below suggests otherwise.
The column on the far left shows some of the feature detectors from the first hidden layer. To the right of each of these are samples of image patches that activate the feature detector.

Looking horizontally across each row, note that the images activating the same feature detector can be very different from one another. Looking vertically along each column, note that the images activating different feature detectors can be quite similar to one another. Thus, a given first hidden layer node does not seem to be detecting slight variations of a simple feature, such as an oriented edge. Without access to such features to use as building blocks, it is not clear how a structural-description, "Pandemonium type" model of visual pattern recognition could be realized.

Why is a first hidden layer node activated by such a diverse set of image patches? Individual weights can take on positive and negative values, whereas a pixel can take only the value 1 (background) or 0 (foreground). When a first hidden layer node computes the weighted sum of its incoming activation values, it therefore ignores the weights associated with the zero-valued foreground pixels. This makes it difficult for an external observer to predict which input image patches will activate the node and which will not.
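
A small numeric sketch makes this concrete. The weights and four-pixel patches below are invented for illustration and are not taken from Encoder; `weighted_sum` is a hypothetical helper that computes a node's net input before any squashing function is applied.

```python
def weighted_sum(weights, patch):
    """Net input to a node: sum of weight * pixel over its receptive field."""
    return sum(w * p for w, p in zip(weights, patch))


# Hypothetical weights for a four-pixel receptive field.
weights = [2.0, -1.0, 1.0, 1.0]

# Pixel coding from the text: 1 = background, 0 = foreground.
patch_a = [1, 0, 0, 0]
patch_b = [0, 0, 1, 1]

# The two patches share no background pixels, yet drive the node equally.
net_a = weighted_sum(weights, patch_a)
net_b = weighted_sum(weights, patch_b)

# Weights under foreground (zero-valued) pixels have no effect at all:
# changing them arbitrarily leaves the response to patch_a unchanged.
altered = [2.0, 9.0, -9.0, 5.0]
net_a_altered = weighted_sum(altered, patch_a)
```

Both patches yield the same net input even though they look nothing alike, and the node's response to `patch_a` says nothing about three of its four weights.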

Nevertheless, even with such inelegant first hidden layer feature detectors, Encoder manages to achieve human-like accuracy rates at encoding fixated text images.
   
VISUAL CODING FOR READING
Encoder's Neural Network Architecture
What Are Encoder's First Hidden Layer Nodes Doing?