| Martin, Gale L. and Pittman, Jay (1989). Recognizing Hand Printed Letters and Digits. NIPS 89. |
| Martin, Gale L., Rashid, Mosfeq, Chapman, Dave, and Pittman, Jay (1992). Learning to See Where and What: Training a Net to Make Saccades and Recognize Handwritten Characters. NIPS 92. |
| NEURAL-NET-BASED OPTICAL CHARACTER RECOGNITION |
| People are much better than computers at reading handwritten characters; however, backpropagation neural networks have helped narrow this gap. My research in this area focused first on developing nets that recognize single-character images. This work required a separate, external program that accepts a character-string image as input and generates a sequence of single-character images as output. Because character segmentation algorithms have much higher error rates when characters touch or overlap, my later work focused on developing integrated segmentation and recognition algorithms. |
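The weakness of a separate segmentation step can be seen with a deliberately simple segmenter. The sketch below is a generic column-projection segmenter. It is an assumption chosen for illustration, not the external program used in this work. It cuts a binary string image wherever a column contains no ink, so two characters that touch come out as a single segment.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of 0/1 pixels; 1 = ink


def from_strings(rows: List[str]) -> Image:
    """Build a binary image from text art ('#' = ink, '.' = background)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def segment_by_blank_columns(image: Image) -> List[Tuple[int, int]]:
    """Split a string image at ink-free columns.

    Returns (start, end) column spans, end exclusive, one per presumed character.
    """
    width = len(image[0])
    inked = [any(row[c] for row in image) for c in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = c                    # entering an inked region
        elif not has_ink and start is not None:
            spans.append((start, c))     # leaving it at a blank column
            start = None
    if start is not None:
        spans.append((start, width))     # region runs to the right edge
    return spans


# Two characters with a blank gap between them.
separated = from_strings([
    "#.#..###",
    "###...#.",
    "#.#...#.",
])
# Similar characters joined by a stroke, so no blank column remains.
touching = from_strings([
    "#.#.###",
    "####.#.",
    "#.#..#.",
])

print(segment_by_blank_columns(separated))  # [(0, 3), (5, 8)]
print(segment_by_blank_columns(touching))   # [(0, 7)]
```

Practical segmenters use richer cues than blank columns. Touching and overlapping characters remain the hard case for them as well.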
| LEARNING TO CLASSIFY SINGLE-CHARACTER IMAGES |
| However, training a net to classify images isn't quite that simple. Part of the problem is that classification learning is particularly difficult for image inputs. The high dimensionality of images (i.e., the number of pixels) means that a large number of input-output training pairs is needed to eliminate enough of the candidate mapping functions. Another problem raised by critics is that the number of candidate mapping functions a net can represent is determined by its number of hidden nodes. The complaint was that significant experimentation would be required to determine the optimal number of hidden nodes for a given classification problem. Given such concerns, one theorist, when asked about the potential of backpropagation learning to dramatically improve visual pattern recognition, cautioned, "hang onto your wallet." |
| Research evaluating whether these criticisms of backpropagation learning were valid tended to focus on optical character recognition, partly because large samples of training images existed in this domain. My research at MCC used an NCR database of 40,000 images of single handwritten digits extracted from bank checks to test some of these criticisms. My colleagues and I found that larger training sets do improve generalization performance (i.e., classification accuracy on image samples the net has never been trained on). We also found that excess representational capacity (increasing the number of hidden nodes beyond that required to achieve high training accuracy) does not reduce generalization performance. This latter finding suggests that there is something particularly useful about the backpropagation learning algorithm with respect to classification learning. For more information on this work see: |
| MCC Collaborations newsletter describing how this research benefitted MCC Sponsors. |
| INTEGRATED SEGMENTATION AND RECOGNITION ALGORITHMS |
![]() |
| Eventually, after working to improve classification accuracy for pre-segmented, single-character images, I found that many of the remaining errors were caused by mistakes in the external segmentation algorithm. I therefore began developing approaches for training a network to combine segmentation and recognition decisions. One approach, developed by other researchers, simply convolves the input field of a pre-existing network (one trained to classify single-character images) across the much wider image of a character string. While somewhat effective, this approach introduces new errors. For example, a single character can be recognized multiple times as the convolution proceeds, even though only one instance of the character is displayed. Similarly, the net can insert a spurious character when its input window is centered between two characters and the resulting image is misclassified. |
| Centered-Object Integrated Segmentation and Recognition (COISR) |
| To avoid such problems, I developed the COISR approach, which trains the network to know when it is centered over a character and when it is not. An additional output node signals this centering information. The figure below illustrates this approach and the sequences of activation values in the output nodes. This approach worked reasonably well but, at the time, was rather slow because of the convolution process. |
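The difference between plain convolution and COISR can be sketched as two decoding rules applied to the outputs of a net swept across a string image. This is a minimal illustration under stated assumptions. The per-position outputs are invented numbers rather than activations of a trained net. The peak-picking rule and the 0.5 thresholds are also illustrative choices, not necessarily the decision procedure COISR used.

```python
from typing import List, Sequence, Tuple

# One entry per window position as the net is swept across a string image:
# (best_class, class_confidence, centered_activation).
# These values are invented for illustration, not outputs of a trained net.
Sweep = Sequence[Tuple[str, float, float]]


def naive_read(sweep: Sweep, conf_threshold: float = 0.5) -> str:
    """Emit a character at every position where the classifier is confident."""
    return "".join(label for label, conf, _ in sweep if conf >= conf_threshold)


def centered_read(sweep: Sweep, center_threshold: float = 0.5) -> str:
    """Emit a character only at local peaks of the 'centered' output node."""
    out: List[str] = []
    for i, (label, _, centered) in enumerate(sweep):
        left = sweep[i - 1][2] if i > 0 else float("-inf")
        right = sweep[i + 1][2] if i + 1 < len(sweep) else float("-inf")
        # ">= left" with "> right" means two tied neighbors emit only once
        if centered >= center_threshold and centered >= left and centered > right:
            out.append(label)
    return "".join(out)


# A "3" followed by a "1"; the middle window straddles both characters.
sweep = [
    ("3", 0.7, 0.4),
    ("3", 0.9, 0.9),   # centered on the "3"
    ("3", 0.8, 0.3),
    ("8", 0.6, 0.1),   # between characters: confidently misclassified
    ("1", 0.7, 0.2),
    ("1", 0.9, 0.8),   # centered on the "1"
    ("1", 0.6, 0.3),
]

print(naive_read(sweep))     # 3338111
print(centered_read(sweep))  # 31
```

The naive rule shows both failure modes: the same character read several times and an "8" inserted between characters. Gating on the centered node removes both in this toy case.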
| For more information on this work see: |
| Martin, Gale L. and Rashid, Mosfeq (1991). Recognizing Hand Printed Characters by Centered Object Integrated Segmentation and Recognition. NIPS 91. |
| Saccade-Based Integrated Segmentation and Recognition |
| This alternative to COISR used a wider input window. The net was trained to detect when its input window was and was not centered over a character and to classify that character. It was also trained to estimate the distances from the center of the window to the centermost character and to the next character to the right. Thus, the net was trained on both "what" and "where" information. Then, during the "reading" phase, the accompanying software used this information to execute saccades across the field and recognize the displayed character string. This is something like what people do when they read, except that proficient readers of English identify about seven letters per fixation. Although the approach was somewhat successful, it sometimes got caught in infinite loops, and I found myself spending too much time thinking about how to avoid these problems. The benefit of this work was that it set me on a path to learning more about how people encode fixated text images and execute eye movements in reading, which led me to model visual encoding learning in human reading. |
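The reading phase can be sketched as a loop that alternates between using the net's "what" output and its "where" outputs. This is a minimal sketch under stated assumptions. The names `look`, `make_oracle`, and `bouncing` are hypothetical, as is the tuple that `look` returns. The oracle reads ground-truth character positions in place of a trained net. The visited-position check and fixation cap are an illustrative guard against infinite loops like those described above, not a description of how the original software handled them.

```python
from typing import Callable, List, Optional, Set, Tuple

# What the net is assumed to report for a window centered at column x:
# (is_centered, label, offset_to_centermost_char, offset_to_next_char or None)
Glance = Tuple[bool, str, int, Optional[int]]


def saccade_read(look: Callable[[int], Glance], start: int,
                 max_fixations: int = 100) -> str:
    """Read a string by jumping between fixation points chosen from 'where' outputs."""
    text: List[str] = []
    visited: Set[int] = set()
    x = start
    for _ in range(max_fixations):
        if x in visited:              # returning to an earlier fixation: a loop
            break
        visited.add(x)
        centered, label, to_center, to_next = look(x)
        if centered:
            text.append(label)        # "what": record the fixated character
            if to_next is None:       # nothing further to the right
                break
            x += to_next              # "where": saccade to the next character
        else:
            x += to_center            # corrective saccade onto the nearest character
    return "".join(text)


def make_oracle(centers: List[int], labels: str, tolerance: int = 1) -> Callable[[int], Glance]:
    """Hypothetical stand-in for a trained net that reads ground-truth positions."""
    def look(x: int) -> Glance:
        i = min(range(len(centers)), key=lambda k: abs(centers[k] - x))
        to_next = centers[i + 1] - x if i + 1 < len(centers) else None
        return abs(centers[i] - x) <= tolerance, labels[i], centers[i] - x, to_next
    return look


print(saccade_read(make_oracle([5, 14, 22, 31], "2049"), start=0))  # 2049


def bouncing(x: int) -> Glance:
    """A faulty 'net' whose corrective saccades bounce between columns 0 and 3."""
    return (False, "?", 3 if x == 0 else -3, None)


print(repr(saccade_read(bouncing, start=0)))  # ''
```

Without the visited-position check, the `bouncing` case would oscillate until the fixation cap was reached. A real net's distance estimates are noisy, so exact revisits are a crude test. This sketch makes no attempt to solve that problem.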
![]() |
| For more information on this work see: |
| Neural Network Optical Character Recognition |
| Classification corresponds to using an input-output mapping function to convert each input pattern into its corresponding output. Classification learning corresponds to using samples of input-output pairs to winnow away incorrect mapping functions from the set of candidate mapping functions (the search space), eventually narrowing the search down to a single mapping function. Some initially hailed the backpropagation learning algorithm as a fundamental scientific advance because it suggested an automatic approach to developing visual pattern recognition applications: simply collect samples of input images paired with the corresponding target outputs, and then set the neural net to work. |
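The "winnowing" view of classification learning can be made concrete with a toy case. This sketch assumes binary images of only n pixels and a 0/1 label. It takes the candidate set to be every possible mapping, not just the functions a particular net can represent. The helper names `all_mappings` and `winnow` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]
Mapping = Dict[Pattern, int]


def all_mappings(n_pixels: int) -> List[Mapping]:
    """Every function from n-pixel binary images to a 0/1 label (2**(2**n) of them)."""
    inputs = list(product((0, 1), repeat=n_pixels))
    return [dict(zip(inputs, outputs))
            for outputs in product((0, 1), repeat=len(inputs))]


def winnow(candidates: List[Mapping], samples: Sequence[Tuple[Pattern, int]]) -> List[Mapping]:
    """Keep only the candidate mappings that agree with every training pair."""
    return [f for f in candidates if all(f[x] == y for x, y in samples)]


candidates = all_mappings(2)
print(len(candidates))                         # 16
two = winnow(candidates, [((0, 0), 0), ((1, 1), 1)])
print(len(two))                                # 4
four = winnow(candidates, [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)])
print(len(four))                               # 1 (the survivor is logical OR)
```

With n binary pixels there are 2^(2^n) candidate mappings. Each labeled sample with a new input pattern pins down only one of the 2^n table entries, halving the consistent set. This is one way to see why high-dimensional image inputs call for large training samples. A net restricts the candidates to the functions it can represent, which this toy version ignores.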