PDFUnit in Perl - Typical Examples

PDFUnit can test visible and invisible parts of a PDF document. Text from a PDF page can be processed as text or as a rendered image.

Furthermore, it is possible to compare many properties of a PDF document against a reference document.

The following examples show a small part of the capabilities.

Contents on individual PDF pages

lives_ok {
  my $pdfUnderTest = "$resources_dir/document-under-test.pdf";
  my $expectedText = "Chapter 3";
  my $page2        = PagesToUse->getPage(2);
  
  AssertThat->document($pdfUnderTest)
            ->restrictedTo($page2)
            ->hasText()
            ->containing($expectedText)
  ;
} "validateTextOnPageTwo";
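The examples on this page wrap each assertion chain in `lives_ok { ... } "name"`. The source does not show its imports, so it is an assumption here that this is the `lives_ok` exported by the CPAN module Test::Exception, which passes a test when the block runs without dying. The sketch below approximates that behavior with core modules only. `FakeAssert` and `lives_ok_sketch` are hypothetical stand-ins invented for illustration and are not part of PDFUnit.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a fluent assertion API; NOT part of PDFUnit.
# Each method returns the object so calls can be chained; a failed
# check dies, which is what makes the surrounding test fail.
package FakeAssert {
    sub document {
        my ($class, $text) = @_;
        return bless { text => $text }, $class;
    }
    sub hasText { return $_[0] }
    sub containing {
        my ($self, $expected) = @_;
        die "text does not contain '$expected'\n"
            unless index($self->{text}, $expected) >= 0;
        return $self;
    }
}

# Core-only approximation of lives_ok: the test passes if the code
# reference runs to completion without dying.
sub lives_ok_sketch {
    my ($code, $name) = @_;
    my $survived = eval { $code->(); 1 };
    return ok($survived, $name);
}

lives_ok_sketch(sub {
    FakeAssert->document("Chapter 3: Results")
              ->hasText()
              ->containing("Chapter 3")
    ;
}, "chainSurvivesWhenTextMatches");

# A failing assertion dies, so the eval returns undef.
my $died = !eval {
    FakeAssert->document("Chapter 3: Results")
              ->hasText()
              ->containing("Chapter 4");
    1;
};
ok($died, "chainDiesWhenTextIsMissing");

done_testing();
```

In a real test script the stand-in class would be replaced by the PDFUnit classes, and `lives_ok_sketch` by the actual `lives_ok`.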

Compare page regions of two documents as text

The following example shows how to restrict a comparison to part of a document, here pages 1 and 2. Within that selection, the text of the PDF under test and the text of the reference document must be identical.

lives_ok {
  my $pdfUnderTest = "$resources_dir/document-under-test.pdf";
  my $pdfReference = "$resources_dir/reference.pdf";
  my $pages12 = PagesToUse->getPages( [1, 2] );
    
  AssertThat->document($pdfUnderTest)
            ->and($pdfReference)
            ->restrictedTo($pages12)
            ->haveSameText()
  ;
} "comparePageBodyWithReference";

Compare page regions of two documents as rendered images

PDFUnit can compare not only the text of a page with a reference document, but also its rendered appearance. The following example shows such a comparison, limited to a region of the first page.

lives_ok {
  my $pdfUnderTest = "$resources_dir/document-under-test.pdf";
  my $pdfReference = "$resources_dir/reference.pdf";
  
  my $leftX  =  80;  # in millimeters
  my $upperY = 175;
  my $width  =  60;
  my $height =   9;
  my $region = PageRegion->new($leftX, $upperY, $width, $height);
  
  AssertThat->document($pdfUnderTest)
            ->and($pdfReference)
            ->restrictedTo(FIRST_PAGE)
            ->restrictedTo($region)
            ->haveSameAppearance()
  ;
} "haveSameAppearanceOnFirstPageInRegion";

Content of a QR code

QR codes increasingly appear in documents. PDFUnit provides methods for validating their content.

lives_ok {
  my $pdfUnderTest = "$resources_dir/document-under-test.pdf";
  my $expectedText = "hello, world";
  my $page2        = PagesToUse->getPage(2);
  my $firstQRCodeRegion = _createQRCodeRegion();
    
  AssertThat->document($pdfUnderTest)
            ->restrictedTo($page2)
            ->restrictedTo($firstQRCodeRegion)
            ->hasImage()
            ->withQRCode()
            ->equalsTo($expectedText)
  ;
} "validateQRCode";
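The helper `_createQRCodeRegion()` is not shown in the example above. A minimal sketch follows, assuming it simply wraps `PageRegion->new` with the same four arguments (left X, upper Y, width, height, in millimeters) used by the region examples on this page. The coordinate values are hypothetical placeholders, and the small `PageRegion` package is only a stand-in so the snippet runs without PDFUnit.

```perl
use strict;
use warnings;

# Stand-in so the sketch runs without PDFUnit. With PDFUnit loaded,
# drop this package and use the real PageRegion class instead.
package PageRegion {
    sub new {
        my ($class, $leftX, $upperY, $width, $height) = @_;
        return bless {
            leftX  => $leftX,
            upperY => $upperY,
            width  => $width,
            height => $height,
        }, $class;
    }
}

# Hypothetical helper: all four values are placeholders in millimeters
# and must be adapted to where the QR code sits on the page.
sub _createQRCodeRegion {
    my $leftX  = 10;
    my $upperY = 65;
    my $width  = 50;
    my $height = 50;
    return PageRegion->new($leftX, $upperY, $width, $height);
}

my $firstQRCodeRegion = _createQRCodeRegion();
```

Keeping the coordinates in a named helper means several QR code tests can share one region definition.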

Contents in ZUGFeRD data

Often, the invisible ZUGFeRD data and the visible data of a PDF document should be the same. PDFUnit provides appropriate test methods for this requirement. The next example validates that the International Bank Account Number (IBAN) in the ZUGFeRD data is the same as the IBAN in a given region of the first page of a document:

lives_ok {
  my $pdfUnderTest = "$resources_dir/document-under-test.pdf";
  my $nodeIBAN = XMLNode->new("ram:IBANID");

  my $ibanLeftX  =  80;  # in millimeters
  my $ibanUpperY = 175;
  my $ibanWidth  =  60;
  my $ibanHeight =   9;
  my $regionIBAN = PageRegion->new($ibanLeftX, $ibanUpperY, $ibanWidth, $ibanHeight);

  AssertThat->document($pdfUnderTest)
            ->restrictedTo(FIRST_PAGE)
            ->restrictedTo($regionIBAN)
            ->hasText()
            ->containingZugferdData($nodeIBAN)
  ;
} "validateIBANInZugferdData";

More examples

All features of PDFUnit are described in the manual of PDFUnit-Java. The manual of PDFUnit-Perl covers many examples, but not all of them, to keep the documentation effort low. Because the method names are the same in Java and in Perl, the Java syntax can be transferred to Perl directly.

Both manuals can be downloaded as PDF.