Optical Character Recognition
Published online by Cambridge University Press: 02 September 2013
Extract
Optical character recognition (OCR) is a process by which printed text is detected and transformed into a computer text file. OCR consists of two basic processes: scanning and recognition. Scanning, performed with a device called a scanner, digitizes the printed page, creating a coded graphics version of the text that may be stored on disk. That coded version represents the scanned page as a grid of pixels and is readable by graphics programs, but at this stage it is still a picture rather than text.
Recognition is a separate process that translates the picture of an “A” into the character “A.” The result is saved as a new file in a format chosen by the user, readable by whichever word processing, statistics, or database software the OCR program supports.
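The recognition step can be illustrated with a toy sketch. The example below is not the method used by any OCR product discussed in the article; it assumes a simple template-matching approach, and the 5x5 glyph templates, the function names, and the sample "scanned" bitmap are all hypothetical, invented for illustration. It compares a small grid of pixels against stored letter shapes and returns the letter with the fewest mismatched pixels.

```python
# Toy illustration only: real OCR engines use far more robust methods.
# The 5x5 glyph templates below are hypothetical, invented for this sketch.
TEMPLATES = {
    "A": [
        ".###.",
        "#...#",
        "#####",
        "#...#",
        "#...#",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def count_mismatches(bitmap, template):
    """Count the pixels at which the scanned bitmap differs from a template."""
    return sum(
        pixel != expected
        for row, template_row in zip(bitmap, template)
        for pixel, expected in zip(row, template_row)
    )


def recognize(bitmap):
    """Return the template letter with the fewest mismatched pixels."""
    return min(
        TEMPLATES,
        key=lambda letter: count_mismatches(bitmap, TEMPLATES[letter]),
    )


# A "scanned" letter with one noisy pixel in the bottom row.
scanned = [
    ".###.",
    "#...#",
    "#####",
    "#...#",
    "#..##",
]

print(recognize(scanned))
```

Because the match is chosen by smallest pixel difference rather than exact equality, a slightly degraded image can still map to the intended letter, which is one reason recognition accuracy is reported as a percentage rather than assumed to be perfect.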
OCR is a technique that can be useful to political scientists. For example, research notes taken from printed sources, rather than being laboriously typed, could be scanned, processed, and saved as a file readable by a word processing package. Content analysis might be almost completely mechanized. Numerical data from government reports could be scanned rather than entered by hand and then made readable by a spreadsheet, database management program, or statistics package.
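As a rough sketch of the last use case, the snippet below turns recognized text from a printed table into comma-separated values that a spreadsheet or statistics package could load. Everything in it is an assumption made for illustration: the sample rows and figures are invented, the column names are hypothetical, and it assumes the OCR output separates columns with runs of two or more spaces.

```python
import csv
import io
import re

# Hypothetical recognized text; the rows and figures are invented for illustration.
ocr_text = """\
Region A    1,204    37.5
Region B      986    41.2
"""


def parse_line(line):
    """Split one recognized line into a label and its numeric columns."""
    # Assumes columns are separated by runs of two or more spaces.
    label, *values = re.split(r"\s{2,}", line.strip())
    # Remove thousands separators so the numbers load cleanly elsewhere.
    return [label] + [value.replace(",", "") for value in values]


def to_csv(text):
    """Convert recognized table text into CSV with a hypothetical header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "count", "percent"])
    for line in text.splitlines():
        if line.strip():
            writer.writerow(parse_line(line))
    return buffer.getvalue()


print(to_csv(ocr_text))
```

In practice the recognized figures would still need to be checked against the printed source, since a single misread digit is not detectable from the file alone.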
- Type: Research Article
- Copyright: © The American Political Science Association 1992
References
Notes
1. A scanner may also be used to digitize graphic images such as photographs, which may then be included in word processing text or other computer output. That use is not discussed here.
2. Computer Friends, Inc., 14250 N.W. Science Park Drive, Portland, OR 97229. Tel: (800) 547-3303. FAX: (503) 643-5379. The DEST scanner appears to be identical to the Lightscan, but sells for $699 through Global Computer Supplies, 1050 Northbrook Pkwy., Dept. 31, Suwanee, GA 30174. Tel: (800) 227-1246. FAX: (404) 339-0033.
3. CAT Hand Scan Adapter LPT, manufactured by Computer Aided Technology, Inc., Dallas, TX. Tel: (214) 350-0888. $149. Supports the following scanner brands: The Complete PC; DFI; GeniScan; Logitech; Marstek; and Niscan.
4. OCR Systems also manufactures ReadRight Personal for hand scanners. It costs $249 and does not require Windows. A minimum of 575K of RAM must be available after DOS is loaded. The program can take advantage of expanded or extended memory but does not require it. ReadRight Personal's major disadvantage is that it cannot process a scanned image wider than 5 inches. ReadRight Personal supports most major brands of hand scanners, and it can process a TIFF file. A non-Windows version of ReadRight is also available at $495. Its hardware requirements are similar to those of ReadRight Personal. It claims a maximum accuracy of 99.5%, compared with the 99.9% claimed for the Windows version.
5. Powell, Brian and Steelman, Lala Carr, “Variations in State SAT Performance: Meaningful or Misleading?” Harvard Educational Review, Vol. 54, No. 4 (November 1984), 399.
6. June 1990, p. 175.
7. This is a measurement of pitch. OCR manufacturers more commonly express minimum and maximum recognizable print sizes in points, but pitch is easier for an individual to measure and more effectively conveys the size of the characters being processed.
8. Statistical Abstract of the United States, 1986, p. 461.
9. Statistical Abstract of the United States, 1991, p. 146.
10. It includes Scan & Print, Plus (edits scanned photographs), Vect (converts bitmapped images to vector images), and Panorama (an image database manager).
11. The permanent swap file acts as a de facto extension of RAM. As far as Windows is concerned, a computer with 4MB of RAM and a 4MB permanent swap file has 8MB of RAM.