XMP stands for “Extensible Metadata Platform”, an open standard initiated by Adobe for embedding metadata in files. Images can carry XMP metadata as well as PDF documents. Typical metadata includes, for example, location and time.
The metadata in a PDF file can be important when a document is processed, so it should be correct. PDFUnit provides the same tags for XMP data as for XFA data:
<!-- Tags to test XMP data: -->
<hasXMPData />
<hasNoXMPData />

<!-- Inner tags of hasXMPData: -->
<hasXMPData>
  <matchingXPath />   (optional)
  <matchingXML />     (optional)
  <withNode />        (optional)
</hasXMPData>
The following examples show how to verify the existence and absence of XMP data:
<testcase name="hasXMPData">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData />
  </assertThat>
</testcase>

<testcase name="hasNoXMPData">
  <assertThat testDocument="xmp/bookmarkWithURLAction_noXMP.pdf">
    <hasNoXMPData />
  </assertThat>
</testcase>
With the utility ExtractXMPData you can extract the XMP data from a PDF document into an XML file, which can then be used in a test:
<testcase name="hasXMPData_MatchingXML">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXML file="xmp/metadata-added.xml" />
    </hasXMPData>
  </assertThat>
</testcase>
Tests can also check a single node of the XMP data and its value. The next examples are based on the following XML snippet:
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    ...
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>
      <xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>
      <xmp:CreatorTool>My program using iText</xmp:CreatorTool>
    </rdf:Description>
    ...
  </rdf:RDF>
</x:xmpmeta>
To check that a node exists in the structure of the XMP data, use the tag <withNode />. The next example checks the existence of two nodes:
<testcase name="hasXMPData_WithNode_ValidateExistence">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <withNode name="xmp:CreateDate" />
      <withNode name="xmp:ModifyDate" />
    </hasXMPData>
  </assertThat>
</testcase>
To verify the value of a node, also declare the expected value with the attribute value="..":
<!-- When the node name occurs multiple times in the document,
     only the first node will be returned. -->
<testcase name="hasXMPData_WithNodeAndValue">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <withNode name="xmp:CreateDate" value="2011-02-08T15:04:19+01:00" />
      <withNode name="xmp:ModifyDate" value="2011-02-08T15:04:19+01:00" />
    </hasXMPData>
  </assertThat>
</testcase>
If an expected node exists multiple times within the XMP data, the first match is used.
The node name is evaluated as an XPath expression. It must not start with the document root, because PDFUnit prepends // internally.
Of course, the node may also be an attribute node.
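The two rules above (the prepended // and the use of the first match) can be illustrated with the XPath engine that ships with the JDK. The following is a minimal sketch under stated assumptions, not PDFUnit's actual code: the class FirstMatchSketch, the method firstValue and the second xmp:CreateDate value are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstMatchSketch {

    // Hypothetical XMP data in which xmp:CreateDate occurs twice.
    static final String XMP =
        "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
      + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
      + "<rdf:Description rdf:about='' xmlns:xmp='http://ns.adobe.com/xap/1.0/'>"
      + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
      + "</rdf:Description>"
      + "<rdf:Description rdf:about='' xmlns:xmp='http://ns.adobe.com/xap/1.0/'>"
      + "<xmp:CreateDate>2012-01-01T00:00:00+01:00</xmp:CreateDate>"
      + "</rdf:Description>"
      + "</rdf:RDF></x:xmpmeta>";

    /** Prefixes the node name with "//" and returns the text of the first match, or null. */
    static String firstValue(String nodeName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for prefixed names
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(XMP)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xmp".equals(prefix)
                        ? "http://ns.adobe.com/xap/1.0/"
                        : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });

            // Requesting a single NODE yields the first match in document order.
            Node node = (Node) xpath.evaluate("//" + nodeName, doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Lookup failed for " + nodeName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstValue("xmp:CreateDate"));
    }
}
```

Because the name is appended to //, a name that itself begins with / would produce an invalid expression in this sketch, which is one plausible reason for the restriction described above.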
With the tag <matchingXPath /> you can use the full power of XPath:
<testcase name="hasXMPData_MatchingXPath_CreateDateWithValue">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXPath expr="//xmp:CreateDate[node() = '2011-02-08T15:04:19+01:00']" />
    </hasXMPData>
  </assertThat>
</testcase>

<testcase name="hasXMPData_MatchingXPath_MultipleInvocation">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXPath expr="count(//xmp:CreateDate) = 1" />
      <matchingXPath expr="count(//xmp:CreateDate[1][node()='2011-02-08T15:04:19+01:00']) = 1" />
    </hasXMPData>
  </assertThat>
</testcase>
Which XPath expressions can be evaluated depends on the XML parser or, more precisely, on the XPath engine. By default PDFUnit uses the parser in the JDK/JRE, so the capability is vendor dependent.
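To see which engine a given JDK/JRE provides and how it handles the expressions used above, they can be tried directly against the standard javax.xml.xpath API. This is a minimal sketch and does not show how PDFUnit calls the engine internally. The class JdkXPathSketch and its methods are hypothetical, and the XML is a reduced copy of the snippet shown earlier.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JdkXPathSketch {

    // Reduced copy of the XMP snippet from this section.
    static final String XMP =
        "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
      + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
      + "<rdf:Description rdf:about='' xmlns:xmp='http://ns.adobe.com/xap/1.0/'>"
      + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
      + "<xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>"
      + "</rdf:Description>"
      + "</rdf:RDF></x:xmpmeta>";

    /** Class name of the XPath factory the running JDK/JRE selects. */
    static String engineName() {
        return XPathFactory.newInstance().getClass().getName();
    }

    /** Evaluates a boolean XPath expression against the sample XMP data. */
    static boolean matches(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for names such as xmp:CreateDate
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(XMP)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xmp".equals(prefix)
                        ? "http://ns.adobe.com/xap/1.0/"
                        : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XPath engine: " + engineName());
        System.out.println(matches("count(//xmp:CreateDate) = 1"));
    }
}
```

The XPath API bundled with the JDK implements XPath 1.0, so expressions that rely on functions from later XPath versions are likely to fail unless another engine is configured.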
As already described for XFA tests, XML namespaces are detected automatically. The default namespace, however, has to be declared by the test, because namespaces can occur more than once in an XML document.
The next example declares the default namespace for the tag <matchingXPath />:
<testcase name="hasXMPData_MatchingXPath_WithDefaultNamespace">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXPath expr="//foo:format = 'application/pdf'"
                     defaultNamespace="http://purl.org/dc/elements/1.1/" />
    </hasXMPData>
  </assertThat>
</testcase>
The default namespace for the tag <withNode /> with an expected value:
<testcase name="hasXMPData_WithDefaultNamespace_XMLNode">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <withNode name="foo:ModifyDate"
                value="2011-02-08T15:04:19+01:00"
                defaultNamespace="http://ns.adobe.com/xap/1.0/" />
    </hasXMPData>
  </assertThat>
</testcase>
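The prefix foo in the two examples above is an arbitrary alias. In XPath 1.0 an element in a default namespace can only be addressed through some prefix bound to that namespace URI, while an unprefixed name selects elements in no namespace. The sketch below illustrates this with the XPath engine in the JDK. It is not PDFUnit's implementation: the class DefaultNamespaceSketch, its method and the sample XML (which declares the Dublin Core namespace as a default namespace) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultNamespaceSketch {

    static final String DC = "http://purl.org/dc/elements/1.1/";

    // Hypothetical XMP data: <format> lives in a default namespace (no prefix).
    static final String XMP =
        "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
      + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
      + "<rdf:Description rdf:about='' xmlns='" + DC + "'>"
      + "<format>application/pdf</format>"
      + "</rdf:Description>"
      + "</rdf:RDF></x:xmpmeta>";

    /** Evaluates a boolean XPath expression with one prefix bound to one namespace URI. */
    static boolean matches(String expr, String prefix, String namespaceUri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(XMP)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String p) {
                    // Only the alias chosen by the caller is known here.
                    return prefix.equals(p) ? namespaceUri : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Alias bound to the default namespace: the element is found.
        System.out.println(matches("//foo:format = 'application/pdf'", "foo", DC));
        // Unprefixed name: selects elements in no namespace, so nothing is found.
        System.out.println(matches("//format = 'application/pdf'", "foo", DC));
    }
}
```

Any other alias would work equally well, as long as the same alias is used in the expression and bound to the declared namespace.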