To test formatted text, the only way is to render a PDF page and compare
the result with an image of the correctly formatted content.
Section 3.20: “Layout - Entire PDF Pages” describes layout tests using
rendered pages. The utility RenderPdfToImages renders a PDF document
page by page into PNG files.
::
:: Render PDF into image files. Each page as a file.
::

@echo off
setlocal
set CLASSPATH=./lib/bouncycastle-jdk15on-153/*;%CLASSPATH%
set CLASSPATH=./lib/commons-logging-1.2/*;%CLASSPATH%
set CLASSPATH=./lib/pdfbox-2.0.0/*;%CLASSPATH%
set CLASSPATH=./lib/pdfunit-2016.05/*;%CLASSPATH%

set TOOL=com.pdfunit.tools.RenderPdfToImages
set OUT_DIR=./tmp
set IN_FILE=documentUnderTest.pdf
set PASSWD=

java %TOOL% %IN_FILE% %OUT_DIR% %PASSWD%
endlocal
The input file documentUnderTest.pdf consists of four pages with
different images and text. The PDF reader “SumatraPDF”
(http://code.google.com/p/sumatrapdf)
shows the first page:
After running the rendering program, four files are created:
.\tmp\_rendered_documentUnderTest_page1
.\tmp\_rendered_documentUnderTest_page2
.\tmp\_rendered_documentUnderTest_page3
.\tmp\_rendered_documentUnderTest_page4
The first of these files looks the same as the page shown in the PDF reader:
Internally, PDFUnit uses the same algorithm to render the pages as the rendering program does. So, any difference found by a test is due to a change in the PDF document.
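The comparison of a rendered page with a reference image can be illustrated in plain Java. The following is only a minimal sketch and not PDFUnit's own implementation: the class name PageImageDiff and its method countDifferingPixels are hypothetical, and the sketch simply counts pixels whose color values differ between two images of equal size.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/**
 * Hypothetical helper (not part of PDFUnit): compares two rendered
 * page images pixel by pixel.
 */
final class PageImageDiff {

    /**
     * Returns the number of pixels whose ARGB values differ,
     * or -1 if the two images do not have the same dimensions.
     */
    static long countDifferingPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            return -1;
        }
        long differing = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }

    /** Usage: java PageImageDiff <expected-image> <rendered-image> */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageImageDiff <expected-image> <rendered-image>");
            return;
        }
        // ImageIO.read returns null if no registered reader understands the file.
        BufferedImage expected = ImageIO.read(new File(args[0]));
        BufferedImage actual = ImageIO.read(new File(args[1]));
        if (expected == null || actual == null) {
            System.err.println("could not decode one of the image files");
            return;
        }
        long differing = countDifferingPixels(expected, actual);
        if (differing < 0) {
            System.out.println("images have different dimensions");
        } else {
            System.out.println("differing pixels: " + differing);
        }
    }
}
```

The sketch could be run against a reference image and one of the rendered files, for example .\tmp\_rendered_documentUnderTest_page1; the reference image itself is something you have to provide. An exact pixel comparison like this is only meaningful when both images were produced by the same renderer with the same settings.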