Scanned PDF documents
The intent of this technique is to ensure that visually rendered text is presented in such a manner that it can be perceived without its visual presentation interfering with its readability.
A document that consists of scanned images of text is inherently inaccessible because the content of the document is images, not searchable text. Assistive technologies cannot read or extract the words; users cannot select, edit, resize, or reflow the text, nor can they change text and background colors; and authors cannot manipulate the PDF for accessibility.
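You can see this difference in a file's own structure. Readable, extractable text is drawn by text objects (the `BT` ... `ET` operators) in a page's content stream, while a scanned page usually only paints an image (the `Do` operator). The Python sketch below is a crude heuristic that illustrates the idea. It is not a PDF parser, and the name `looks_image_only` is hypothetical. It scans every stream in the file, Flate-decodes each one with `zlib` where possible, and reports whether any text object appears. Streams that use other filters, encrypted files, or binary image and font data that happens to contain the bytes `BT` can all mislead it. A file that has been through OCR typically does contain text objects, even when that text is rendered invisibly behind the page image.

```python
import re
import zlib

# Body of every stream object; the lookbehind keeps "endstream" from
# being mistaken for the start of a new stream.
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# "BT" (begin text object) as a standalone content-stream operator.
_BT_RE = re.compile(rb"(?<![A-Za-z0-9])BT(?![A-Za-z0-9])")


def stream_bodies(pdf_bytes: bytes):
    """Yield each stream body, Flate-decoded when possible."""
    for match in _STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # decompressobj tolerates a trailing checksum byte lost to the regex.
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # unfiltered or another filter: scan the raw bytes


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: True when no stream contains a text object (BT ... ET)."""
    return not any(_BT_RE.search(body) for body in stream_bodies(pdf_bytes))
```

Running the function on a scanned file before OCR and again afterward is one quick way to confirm that text was actually added. Its result is only a hint, though, and does not replace the manual checks this technique describes.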
For these reasons, authors should use actual text rather than images of text. They can create the content in an authoring tool such as Microsoft Word or Oracle Open Office and convert it to PDF from there.
If authors do not have access to the source file or the authoring tool, they can use optical character recognition (OCR) to convert the scanned images of text into actual text within the PDF. Adobe Acrobat Pro, for example, can perform the OCR and then be used to tag the recognized text for accessibility.
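OCR is not error-free. For instance, the letter "m" can be misread as "rn", which is why this technique asks for each converted page to be checked. One way to measure recognition quality on a sample page is the character error rate (CER): the edit (Levenshtein) distance between a hand-checked reference transcription and the OCR output, divided by the length of the reference. The Python sketch below assumes you already have both strings, for example text copied out of the OCR'd PDF and a transcription typed from the scan. The function names are hypothetical, and the technique itself does not prescribe CER or any acceptable threshold.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace, which OCR layout often changes."""
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance over reference length; 0.0 means the OCR text matches exactly."""
    reference, ocr_text = normalize(reference), normalize(ocr_text)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


print(character_error_rate("Lemon Tart", "Lernon Tart"))  # 0.2 ("m" read as "rn")
```

A nonzero rate does not identify which characters are wrong. It only flags pages that deserve a closer manual comparison with the scan.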
This example is shown with Adobe Acrobat Pro. There are other software tools that perform similar functions.
This example uses a simple one-page scanned image of text. To ensure that actual text is stored in the document, perform the following steps:
The following image shows a scanned one-page document in Adobe Acrobat Pro.
The next image shows the converted content after tags have been added to the document. Tagging the content properly requires the Reading Order tool and the Tags panel. In this example, the Reading Order tool was used to mark the image of the hand as a decorative image (an artifact) so that it is hidden from assistive technologies (see PDF4), and the recipe title was tagged as a first-level heading (H1).
Note: Acrobat Pro may automatically add tags when the file is run through OCR.
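After tagging, whether done by hand or automatically during OCR, it can help to confirm that the saved file carries the structures described above. These are a catalog flagged as tagged (`/MarkInfo << /Marked true >>` together with a `/StructTreeRoot`), a structure element of type `/H1` for the title, and content marked as `/Artifact` for decorative images such as the hand. The Python sketch below is a rough byte-level heuristic with hypothetical function names, not a PDF parser. It searches the raw file and every Flate-decoded stream with regular expressions, so encryption, other stream filters, or unusual syntax can defeat it. A full PDF library or a dedicated accessibility checker is the reliable route.

```python
import re
import zlib

# Body of every stream object; the lookbehind keeps "endstream" from
# being mistaken for the start of a new stream.
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def searchable_chunks(pdf_bytes: bytes):
    """Yield the raw file, then every stream body that Flate-decodes.

    The catalog and structure elements may sit in compressed object
    streams (PDF 1.5+), so decoded streams are searched as well.
    """
    yield pdf_bytes
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data; the raw scan above already covers it


def tagging_report(pdf_bytes: bytes) -> dict:
    """Heuristic summary of the tagging features discussed in this technique."""
    chunks = list(searchable_chunks(pdf_bytes))

    def found(pattern: bytes) -> bool:
        return any(re.search(pattern, chunk) for chunk in chunks)

    return {
        # /MarkInfo << /Marked true >> in the catalog flags a Tagged PDF.
        "marked": found(rb"/Marked\s+true"),
        # Root of the logical structure (tag) tree.
        "has_struct_tree": found(rb"/StructTreeRoot"),
        # A structure element of the standard first-level heading type.
        "has_h1": found(rb"/S\s*/H1\b"),
        # Marked-content sequences tagged as artifacts (e.g. decorative images).
        "artifact_sequences": sum(
            len(re.findall(rb"/Artifact\s*(?:BMC|<<)", chunk)) for chunk in chunks
        ),
    }
```

A report with `marked` and `has_struct_tree` set only shows that tags exist. Whether the reading order and tag types are correct still has to be checked in the Reading Order tool and the Tags panel.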
This example is shown in operation in the working example of generating actual text and in the working example of tagging text created with OCR.
For each page converted to text using OCR, check that the text in the resulting PDF has been recognized correctly in one of the following ways: