Using XPath to evaluate parts of a PDF document opens a wider range of testing capabilities than an API alone can provide.
Several chapters in this manual describe XPath tests. This chapter gives an overview and refers to the chapters that cover each test in detail.
<!-- Overview of XPath-related test facilities: -->
<hasBookmarks><matchingXPath />...      3.4: “Bookmarks and Named Destinations”
<hasFields><matchingXPath />...         3.10: “Form Fields”
<hasFonts><matchingXPath />...          3.9: “Fonts”
<hasSignatures><matchingXPath />...     3.23: “Signatures and Certificates”
<hasXFAData><matchingXPath />...        3.30: “XFA Data”
<hasXMPData><matchingXPath />...        3.31: “XMP Data”

<!-- Comparing two documents using XPath: -->
<haveSameXFAData><matchingXPath />...   4.18: “Comparing XFA Data”
<haveSameXMPData><matchingXPath />...   4.19: “Comparing XMP Data”
PDFUnit uses XMLUnit internally to compare XML structures (http://xmlunit.sourceforge.net). Comparisons therefore follow the rules of XML rather than literal text equality, for example:
The order of attributes doesn't matter.
Whitespace between element nodes is ignored.
More rules for “Canonical XML” are described on Wikipedia (http://de.wikipedia.org/wiki/Canonical_XML).
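To illustrate what the two rules above mean in practice, the following sketch compares two XML strings using only the DOM API of the JDK. It is neither XMLUnit nor PDFUnit's implementation; the class XmlEquivalenceSketch and its methods are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFUnit or XMLUnit.
final class XmlEquivalenceSketch {

    // Parses the XML string and removes whitespace-only text nodes.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(true); // comments are dropped while parsing
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripBlankText(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Recursively removes text nodes that contain only whitespace.
    static void stripBlankText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripBlankText(child);
            }
        }
    }

    // DOM equality looks attributes up by name, so their order is irrelevant.
    static boolean equivalent(String left, String right) {
        return parse(left).getDocumentElement()
                .isEqualNode(parse(right).getDocumentElement());
    }

    public static void main(String[] args) {
        String compact = "<a x=\"1\" y=\"2\"><b/></a>";
        String indented = "<a y=\"2\" x=\"1\">\n  <b/>\n</a>";
        System.out.println(equivalent(compact, indented));
    }
}
```

The two documents in main differ only in attribute order and indentation, which is exactly the kind of difference the rules above declare irrelevant. A changed attribute value or element name still counts as a difference.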
The general configuration of XMLUnit is documented on the project site http://xmlunit.sourceforge.net/userguide/html/index.html#Configuring%20XMLUnit. PDFUnit uses the following settings:
XMLUnit.setXSLTVersion("2.0");
XMLUnit.setNormalizeWhitespace(true);
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
XMLUnit.setIgnoreComments(true);
PDFUnit provides utility programs for all parts of a PDF document which can be tested using XML/XPath. They extract the information into XML files:
// Utilities to extract XML from PDF:
com.pdfunit.tools.ExtractBookmarks
com.pdfunit.tools.ExtractFieldsInfo
com.pdfunit.tools.ExtractFontsInfo
com.pdfunit.tools.ExtractSignaturesInfo
com.pdfunit.tools.ExtractXFAData
com.pdfunit.tools.ExtractXMPData
The utilities are described in chapter 9.1: “Common Remarks for all Utilities”.
A namespace with an existing prefix will be detected automatically by PDFUnit. This applies to both XML files and PDF-internal XML data.
A default namespace is not detected automatically, because the XML standard allows a default namespace to be redefined on any element, so a single document may contain several of them. You have to declare the default namespace explicitly and use a prefix for it in the XPath expression:
<!-- The default namespace has to be declared, but any alias can be used for it. -->
<testcase name="hasXFAData_UsingDefaultNamespace">
  <assertThat testDocument="xfa/xfa-enabled.pdf">
    <hasXFAData>
      <withNode tag="foo:log/foo:to"
                value="memory"
                defaultNamespace="http://www.xfa.org/schema/xci/2.6/"
      />
    </hasXFAData>
  </assertThat>
</testcase>
Note that the prefix in this example is named foo. The name is arbitrary, because only the namespace URI is significant; bar would work equally well. In real projects, please use a single, meaningful prefix, not “foo” or “bar”.
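The need for a prefix comes from XPath itself: an unprefixed name in an XPath 1.0 expression matches only elements that are in no namespace. The following sketch shows this with the XPath API of the JDK. It is not PDFUnit code; the class DefaultNamespaceSketch and the sample XML are hypothetical, and only the namespace URI is taken from the example above.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFUnit.
final class DefaultNamespaceSketch {

    // Namespace URI taken from the example above.
    static final String XCI_NS = "http://www.xfa.org/schema/xci/2.6/";

    // Invented sample data; real XFA configuration data is more complex.
    static final String SAMPLE =
            "<config xmlns=\"" + XCI_NS + "\"><log><to>memory</to></log></config>";

    // Evaluates the expression as a string, binding 'prefix' to XCI_NS.
    static String evaluate(String expression, final String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-aware XPath
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? XCI_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return XCI_NS.equals(uri) ? prefix : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return XCI_NS.equals(uri)
                            ? Collections.singletonList(prefix).iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with prefix:    '" + evaluate("//foo:log/foo:to", "foo") + "'");
        System.out.println("without prefix: '" + evaluate("//log/to", "foo") + "'");
    }
}
```

The prefixed expression finds the node regardless of which prefix name is bound to the URI, while the unprefixed expression finds nothing, even though the source document itself uses no prefix at all.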
Evaluating an XPath expression can yield results of different types. When comparing XFA or XMP data from two PDF documents, the expected result type has to be declared. The available result types are defined as constants for the attribute withResultType.
<!-- Result types for XPath-processing: -->
withResultType="BOOLEAN"
withResultType="NUMBER"
withResultType="NODE"
withResultType="NODESET"
withResultType="STRING"
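The five names coincide with the constants in javax.xml.xpath.XPathConstants of the JDK. Whether PDFUnit maps its attribute values directly onto those constants is an assumption here. The following sketch, with the hypothetical class ResultTypeSketch and invented sample data, shows how one and the same expression produces a different result for each declared type.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFUnit.
final class ResultTypeSketch {

    // Invented sample data.
    static final String SAMPLE = "<doc><item>3</item><item>4</item></doc>";

    // Evaluates the expression against SAMPLE with the requested result type.
    static Object evaluate(String expression, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean asBoolean(String expression) {
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    static double asNumber(String expression) {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    // Text of the first matching node, or null if nothing matches.
    static String asNodeText(String expression) {
        Node node = (Node) evaluate(expression, XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    // Number of nodes in the matching node set.
    static int asNodeCount(String expression) {
        return ((NodeList) evaluate(expression, XPathConstants.NODESET)).getLength();
    }

    static String asString(String expression) {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        String expression = "//item";
        System.out.println("BOOLEAN: " + asBoolean(expression));
        System.out.println("NUMBER:  " + asNumber(expression));
        System.out.println("NODE:    " + asNodeText(expression));
        System.out.println("NODESET: " + asNodeCount(expression) + " nodes");
        System.out.println("STRING:  " + asString(expression));
    }
}
```

For STRING and NUMBER, XPath converts a node set by taking the first node in document order, whereas NODESET keeps all matches. This is why the declared type has to agree with what the comparison is supposed to check.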
Tests with the expected result type BOOLEAN are problematic because XPath cannot distinguish between “not found” and “false”. Try to use another XPath expression with a different result type.
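The ambiguity can be reproduced with the XPath API of the JDK. The sketch below is not PDFUnit code; the class BooleanAmbiguitySketch and the sample XML are hypothetical. It shows that a comparison against a missing node and a comparison against a node with a different value both evaluate to false, and that counting nodes tells the two cases apart.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFUnit.
final class BooleanAmbiguitySketch {

    // Invented sample data.
    static final String SAMPLE =
            "<doc><flag>false</flag><title>Report</title></doc>";

    // Evaluates the expression against SAMPLE with the requested result type.
    static Object evaluate(String expression, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean asBoolean(String expression) {
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    // Counts matching nodes using the NUMBER result type.
    static int count(String nodeExpression) {
        Double n = (Double) evaluate("count(" + nodeExpression + ")", XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) {
        // Both comparisons are false, for different reasons.
        System.out.println("missing node:    " + asBoolean("//missing = 'x'"));
        System.out.println("different value: " + asBoolean("//title = 'x'"));
        // A non-empty node set converts to true, whatever its text says.
        System.out.println("flag element:    " + asBoolean("//flag"));
        // Counting separates "not found" from "found".
        System.out.println("count(missing):  " + count("//missing"));
        System.out.println("count(title):    " + count("//title"));
    }
}
```

Note also that an existing element whose text is “false” converts to true as a BOOLEAN, because XPath treats any non-empty node set as true. Using NUMBER with count(), or STRING with the node's value, avoids both pitfalls.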