How to OCR a PDF
Contributor(s): Scholars' Lab staff, Adriana Barcenas, Steven Weinberger, Zach Rowinski
To run OCR on a PDF so that its text is searchable, follow this process in Acrobat Professional:
- For most PDFs, you will want to run Optimize after scanning them. First rename the file, then pull down the Document menu and select Optimize.
- Then, to run OCR: open the PDF file you want to run OCR on.
- Pull down the File menu, choose "Save As," and add "-ocr" to the file name so that it ends in "-ocr.pdf" (for example, "scan.pdf" becomes "scan-ocr.pdf").
- Pull down the Document menu, point to "OCR Text Recognition," choose "Recognize Text Using OCR…," and then click "Start."
- The OCR process will start. It will take some time, depending on the number of pages in the PDF.
- When it finishes, save the file. Verify the result by searching for "the" or another word that appears in the file, and make sure the search returns results.
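The naming and checking steps above can also be sketched in a short script, which may help when many files need the same treatment. This is only an illustrative sketch, not part of the Acrobat workflow: the function names are hypothetical, and the last function assumes the third-party pypdf package is installed. It does not perform OCR itself; it only prepares the "-ocr.pdf" copy and checks whether a PDF already has a searchable text layer.

```python
from pathlib import Path
import shutil


def ocr_copy_path(original: Path) -> Path:
    """Return the sibling path whose file name ends in '-ocr.pdf'."""
    return original.with_name(original.stem + "-ocr.pdf")


def make_ocr_copy(original: Path) -> Path:
    """Copy the scan to its '-ocr.pdf' name so the original stays untouched."""
    target = ocr_copy_path(original)
    shutil.copy2(original, target)
    return target


def page_texts_contain(page_texts, word="the"):
    """True if any page's extracted text contains the word (case-insensitive)."""
    needle = word.lower()
    return any(needle in (text or "").lower() for text in page_texts)


def pdf_is_searchable(pdf_path: Path, word: str = "the") -> bool:
    """Check a PDF's text layer for a word.

    Assumption: relies on the third-party pypdf package, imported lazily
    so the helpers above work without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return page_texts_contain((page.extract_text() for page in reader.pages), word)
```

For example, `ocr_copy_path(Path("scans/report.pdf"))` gives `scans/report-ocr.pdf`, and `pdf_is_searchable` can be run on the saved file after Acrobat finishes as a stand-in for the manual search.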
To OCR roman text with diacritic characters, investigate using ABBYY's FineReader (http://www.abbyy.com/). No THL staff have used it, so we have no experience with it. For more information, see Zach Rowinski's assessment.