3.16.  Images in PDF Documents

Overview

Images in a PDF document are rarely mere decoration. More often, they convey information that can have contractual significance. Typical questions when testing images include:

  • Does an image appear on the expected page?

  • Is an image missing in the document because it was not found during document creation?

  • Does a letter show the new logo and not the old one?

  • If an image contains rendered text, is it the expected text?

  • Is the content of a bar code or a QR code the expected one?

All of these cases can be checked with the following test methods:

// Testing images in PDF:
.hasImage().matching(..)
.hasImage().matchingOneOf(..)

.hasImage().withBarcode()           // see 3.4: “Bar Code”
.hasImage().withBarcodeInRegion()   // see 3.4: “Bar Code”
.hasImage().withText().xxx(..)      // see 3.31: “Text - in Images (OCR)”
.hasImage().withTextInRegion()      // see 3.31: “Text - in Images (OCR)”
.hasImage().withQRCode()            // see 3.27: “QR Code”
.hasImage().withQRCodeInRegion()    // see 3.27: “QR Code”
.hasImage().withHeightInPixel()
.hasImage().withWidthInPixel()

.hasNoImage()

.hasNumberOfDifferentImages(..)
.hasNumberOfVisibleImages(..)

Number of Different Images inside a PDF

The number of images stored inside a PDF document is typically not the same as the number of images you can see when it is printed. A logo visible on 10 pages is stored only once within the document. PDFUnit therefore provides two test methods: hasNumberOfDifferentImages(..) validates the number of images stored internally, and hasNumberOfVisibleImages(..) validates the number of visible images.
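The difference between the two counts can be illustrated with a small toy model. This is only a sketch of the idea, not PDFUnit's implementation: the class ImageCountSketch, the type Placement and both counting methods are hypothetical names introduced for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of image usage in a document; not PDFUnit's internal representation. */
final class ImageCountSketch {

    /** One appearance of a stored image on a page (hypothetical type). */
    static final class Placement {
        final int page;
        final String imageId;

        Placement(int page, String imageId) {
            this.page = page;
            this.imageId = imageId;
        }
    }

    /** Counts every stored image once, however often it is shown. */
    static int countDifferentImages(List<Placement> placements) {
        Set<String> ids = new HashSet<>();
        for (Placement placement : placements) {
            ids.add(placement.imageId);
        }
        return ids.size();
    }

    /** Counts every appearance, so a repeated logo is counted each time it is shown. */
    static int countVisibleImages(List<Placement> placements) {
        return placements.size();
    }

    public static void main(String[] args) {
        // A logo on pages 1 to 3 and a signature on page 3.
        List<Placement> placements = Arrays.asList(
            new Placement(1, "logo"),
            new Placement(2, "logo"),
            new Placement(3, "logo"),
            new Placement(3, "signature"));

        System.out.println("different images: " + countDifferentImages(placements));
        System.out.println("visible images:   " + countVisibleImages(placements));
    }
}
```

In this model the document stores 2 different images but shows 4 visible images, because the logo appears on three pages.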

The following listing shows the syntax for verifying the number of images stored internally in a PDF:

@Test
public void hasNumberOfDifferentImages() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasNumberOfDifferentImages(2) 
  ;
}

How do you know in this example that 2 is the right number? How do you know which images are stored internally for a given PDF? The answer to both questions is given by the utility program ExtractImages. You can use it to extract all images from a document into separate files. Chapter 9.7: “Extract Images from PDF” describes this topic in detail.

Number of Visible Images inside a PDF

The next example validates the number of visible images:

@Test
public void hasNumberOfVisibleImages() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasNumberOfVisibleImages(8) 
  ;
}

The sample document shows 8 images on its 6 pages: 2 images on page 3, none on page 4, 3 images on page 6, and the remaining 3 on the other pages.

The test for visible images can be limited to specified pages and page regions. In the following example, only the images in a defined region on page 6 are counted:

@Test
public void numberOfVisibleImages() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  int leftX  = 14; // all values in millimeters
  int upperY = 91;
  int width  = 96;
  int height = 43;
  PageRegion pageRegion = new PageRegion(leftX, upperY, width, height);
  PagesToUse page6 = PagesToUse.getPage(6);

  AssertThat.document(filename)
            .restrictedTo(page6)
            .restrictedTo(pageRegion)
            .hasNumberOfVisibleImages(1)
  ;
}

The same image shown twice on a page is counted twice.

The options for limiting tests to specified pages are described in chapter 13.2: “Page Selection”.

Validate the Existence of an Expected Image

After counting images you might need to test the images themselves. In the following example, PDFUnit verifies that a given image is part of a PDF document:

@Test  
public void hasImage() throws Exception {
  String filename = "documentUnderTest.pdf";
  String imageFile = "images/apache-software-foundation-logo.png";
  
  AssertThat.document(filename)
            .restrictedTo(ANY_PAGE)
            .hasImage()
            .matching(imageFile)
  ; 
}

The result of a comparison of two images depends on their file formats. PDFUnit can handle all image formats which can be converted into java.awt.image.BufferedImage: JPEG, PNG, GIF, BMP and WBMP. The images are compared byte by byte. Therefore, BMP and PNG versions of an image are not recognized as equal.
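The following sketch illustrates why a byte-by-byte comparison treats a BMP and a PNG version of the same picture as different. It uses only the standard javax.imageio API and is not PDFUnit's comparison code; the class ImageFormatComparisonSketch and its methods are hypothetical names introduced for this illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.imageio.ImageIO;

/** Illustration only; this is not how PDFUnit is implemented. */
final class ImageFormatComparisonSketch {

    /** Builds a small opaque RGB image with a different color for each pixel. */
    static BufferedImage sampleImage() {
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++) {
                image.setRGB(x, y, ((x * 60) << 16) | ((y * 60) << 8) | 0x80);
            }
        }
        return image;
    }

    /** Encodes the image in the given format ("png", "bmp") and returns the file bytes. */
    static byte[] encode(BufferedImage image, String format) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No writer for format: " + format);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes both files and compares them pixel by pixel. */
    static boolean samePixels(byte[] first, byte[] second) {
        try {
            BufferedImage a = ImageIO.read(new ByteArrayInputStream(first));
            BufferedImage b = ImageIO.read(new ByteArrayInputStream(second));
            if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
                return false;
            }
            for (int y = 0; y < a.getHeight(); y++) {
                for (int x = 0; x < a.getWidth(); x++) {
                    if (a.getRGB(x, y) != b.getRGB(x, y)) {
                        return false;
                    }
                }
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage image = sampleImage();
        byte[] png = encode(image, "png");
        byte[] bmp = encode(image, "bmp");
        System.out.println("same bytes:  " + Arrays.equals(png, bmp));
        System.out.println("same pixels: " + samePixels(png, bmp));
    }
}
```

The two files hold the same pixels, but their encoded bytes differ. For a byte-level test it is therefore safest to use a reference image in the same format as the image stored in the PDF.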

The image can be passed to the method as any of the following types:

// Types for images:

.hasImage().matching(BufferedImage image);
.hasImage().matching(String imageFileName);
.hasImage().matching(File imageFile);
.hasImage().matching(InputStream imageStream);
.hasImage().matching(URL imageURL);

A tool which generates PDF files may convert the format of an image when importing it from a file, because not all image formats are supported in PDF. This can make it impossible for PDFUnit to successfully compare an image from the PDF file with the original image file. If you encounter this problem, extract the images of the PDF under test into new image files and use them for the tests. Validate them 'by eye' first.

All images in a PDF document can be compared to the images of a reference PDF. Those tests are described in chapter 4.8: “Comparing Images”.

Use an Array of Images for Comparison

A PDF document might contain one of several possible signature images. Use the method matchingOneOf(..) to test such a situation:

@Test
public void containsOneOfManyImages() throws Exception {
  BufferedImage signatureAlex = ImageHelper.getAsImage("images/signature-alex.png");
  BufferedImage signatureBob  = ImageHelper.getAsImage("images/signature-bob.png");
  BufferedImage[] allPossibleImages = {signatureAlex, signatureBob};

  String documentSignedByAlex = "letter-signed-by-alex.pdf";
  AssertThat.document(documentSignedByAlex)
            .restrictedTo(LAST_PAGE)
            .hasImage()
            .matchingOneOf(allPossibleImages)
  ;
  
  String documentSignedByBob = "letter-signed-by-bob.pdf";
  AssertThat.document(documentSignedByBob)
            .restrictedTo(LAST_PAGE)
            .hasImage()
            .matchingOneOf(allPossibleImages)
  ;
}

This test can also be applied to several pages of a document, as the following section shows.

Validate Images on Specified Pages

The tests for images can be restricted to single pages, several individual pages, or a contiguous range of pages. All options are described in chapter 13.2: “Page Selection”.

Here is an example:

@Test 
public void containsImage_OnAllPagesAfter5() throws Exception {
  String filename = "documentUnderTest.pdf";
  String imageFileName = "images/apache-ant-logo.jpg";
  File imageFile = new File(imageFileName);
  
  AssertThat.document(filename)
            .restrictedTo(EVERY_PAGE.after(5))
            .hasImage()
            .matching(imageFile)
  ; 
}

Validate the Absence of Images

Some pages or page regions are meant to be empty. That can also be tested. The following example validates that the page body (without header and footer) contains neither images nor text:

@Test
public void lastPageBodyShouldBeEmpty() throws Exception {
  String pdfUnderTest = "documentUnderTest.pdf";
  PageRegion textBody = createBodyRegion();
  
  AssertThat.document(pdfUnderTest)
            .restrictedTo(LAST_PAGE)
            .restrictedTo(textBody)
            .hasNoImage()
            .hasNoText()
  ;
}
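
The helper createBodyRegion() is not shown in the listing above. The following is a minimal sketch of how the values for such a region could be derived, under several assumptions: an A4 portrait page (210 x 297 mm), caller-supplied margin, header and footer sizes, and an upperY value measured from the top edge of the page. The class BodyRegionSketch and the method bodyRegion(..) are hypothetical; the four resulting values would be passed to the PageRegion constructor in the same order as in the region example earlier in this chapter.

```java
import java.util.Arrays;

/** Computes the values for a page body region; a sketch, not part of PDFUnit. */
final class BodyRegionSketch {

    // Assumed page size in millimeters (A4 portrait). Adjust to the document under test.
    static final int PAGE_WIDTH = 210;
    static final int PAGE_HEIGHT = 297;

    /**
     * Returns {leftX, upperY, width, height} in millimeters for the area
     * between header and footer, inside the left and right margins.
     */
    static int[] bodyRegion(int sideMargin, int headerHeight, int footerHeight) {
        int leftX = sideMargin;
        int upperY = headerHeight; // assumption: measured from the top edge of the page
        int width = PAGE_WIDTH - 2 * sideMargin;
        int height = PAGE_HEIGHT - headerHeight - footerHeight;
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("Margins leave no room for a page body");
        }
        return new int[] {leftX, upperY, width, height};
    }

    public static void main(String[] args) {
        // Hypothetical layout: 20 mm side margins, 30 mm header, 25 mm footer.
        int[] region = bodyRegion(20, 30, 25);
        System.out.println(Arrays.toString(region)); // [20, 30, 170, 242]
    }
}
```

Keeping the layout values in one place makes it easier to reuse the same region for hasNoImage() and hasNoText() across several tests.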