In a previous blog post, Tips for Eliminating Tedious, Mundane Tasks in Adobe Acrobat, I talked about how you can convert a scanned page from an image of text into usable, selectable, copy-and-pasteable text with the Adobe® Acrobat® Text Recognition feature, which uses optical character recognition (OCR).
This handy feature of Acrobat has saved me from having to waste time retyping text on more than one occasion, but I’ve also run into instances where it simply won’t work.
Just yesterday, in fact, we received a client-supplied Microsoft® Word document that contained content for a brochure they wanted us to design. The Word document contained pages and pages of large, multi-column specification tables that were images instead of selectable text. These tables appeared to have been scanned from some older document, and the client said the original source files for that document weren’t available.
Few things are more tedious or difficult to retype accurately than tables of numbers (text in a foreign language that you don’t understand might be a distant second), so the first thing we did was make a PDF from the Word document. But when we tried to run Acrobat Text Recognition, we received the error message shown in Figure 1.
The problem was that in addition to all those specification table images, the document also contained numerous paragraphs of real text, typed in Word. Acrobat can’t perform Text Recognition on a document that contains even one line of real or “renderable” text among the scanned images of text.
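If it helps to see that rule spelled out, here is a minimal sketch in Python. It only models the behavior described above. It does not call Acrobat or read a real PDF, and the `Page` class and both function names are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical summary of one PDF page (not an Acrobat data structure)."""
    has_live_text: bool      # real, selectable text, e.g. typed in Word
    has_scanned_image: bool  # a picture of text, e.g. a scanned table


def ocr_blocked(pages: List[Page]) -> bool:
    """True if any page holds live text, which (per this article) stops Text Recognition."""
    return any(page.has_live_text for page in pages)


def needs_print_as_image(pages: List[Page]) -> bool:
    """The workaround only pays off when there are scans to recognize AND live text in the way."""
    has_scans = any(page.has_scanned_image for page in pages)
    return has_scans and ocr_blocked(pages)


# A document like our client's: typed paragraphs mixed with scanned tables.
brochure = [
    Page(has_live_text=True, has_scanned_image=False),
    Page(has_live_text=True, has_scanned_image=True),
    Page(has_live_text=False, has_scanned_image=True),
]
print(ocr_blocked(brochure))
print(needs_print_as_image(brochure))
```

A document made up only of scanned pages would pass the first check and go straight to Text Recognition; only a mix like ours needs the extra step.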
In my experience, when Text Recognition doesn’t work, nine times out of ten this is why. I’m happy to tell you, though, that there’s a workaround. It adds another step to the process, but it still beats retyping by a long shot.
Take a Step Back to Move Forward: Turn the Entire Document into an Image
In order for Acrobat Text Recognition to work properly, the entire document needs to be an image, not a combination of text and images of text. It may seem counterintuitive, but to achieve the end goal of ALL live text, you need to start with NO live text.
Here’s how to accomplish that. (For your reference, I’m using Adobe Acrobat XI Pro running on the Microsoft Windows® 7 operating system to do this.)
- Open your PDF document in Acrobat as you normally would.
- Under the File menu, choose Print.
- For Printer, choose Adobe PDF, and then click the Advanced button, which is the right-most button next to the printer selection. (See Figure 2.)
- Check the box for Print as Image (see Figure 3), and then click OK at the bottom of the dialog box.
- This returns you to the Print dialog box, where you click the Print button at the bottom. When prompted, give the PDF file you’re about to create a new name (trying to overwrite the existing file won’t work) and choose where to save it.
Acrobat will then create a PDF that is all image with no live, renderable text.
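If you do this often, it helps to settle on a naming habit for the image-only copy so you never try to overwrite the original. Here is a small sketch of one such convention in Python. The `image_only_name` function and the "-image" suffix are my own assumptions for illustration, not anything Acrobat provides or requires.

```python
from pathlib import Path


def image_only_name(source: Path, suffix: str = "-image") -> Path:
    """Return a sibling path, such as 'brochure-image.pdf', that differs from the source."""
    candidate = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    counter = 2
    # If that name is already taken, add a counter instead of overwriting anything.
    while candidate.exists():
        candidate = source.with_name(f"{source.stem}{suffix}-{counter}{source.suffix}")
        counter += 1
    return candidate


# Suggest a name to type into the save prompt for the flattened copy.
print(image_only_name(Path("brochure.pdf")).name)
```

The function only suggests a file name. You still type it into the save prompt yourself when Acrobat asks.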
Now, at last, you can run Acrobat Text Recognition on the new PDF without getting frustrating error messages. You’ll find the feature under the Tools menu on the right-hand side of your screen (see Figure 4).
As marketing professionals, we all need to escape the trees of tedium to see the forest of strategic possibility, and eliminating unnecessary tasks is key to that. If you have other tips for automating mundane tasks, please share!