3.31.  XMP Data

Overview

XMP is the abbreviation for Extensible Metadata Platform, an open standard initiated by Adobe to embed metadata into files. Not only PDF documents are able to embed data, but also images. For example, metadata can be location and time.

The metadata in a PDF file can be important when processing a document, so they should be correct. PDFUnit provides the same tags for XMP data as for XFA data:

<!-- Tags to test XMP data: -->

<hasXMPData   />
<hasNoXMPData />

<!-- Inner tags of hasXMPData: -->

<hasXMPData>
  <matchingXPath />  (optional)
  <matchingXML   />  (optional)
  <withNode      />  (optional)
</hasXMPData>

Existence and Absence of XMP

The following examples show how to verify the existence and absence of XMP data:

<testcase name="hasXMPData">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData />
  </assertThat>
</testcase>
<testcase name="hasNoXMPData">
  <assertThat testDocument="xmp/bookmarkWithURLAction_noXMP.pdf">
    <hasNoXMPData />
  </assertThat>
</testcase>

Comparing XMP against an XML File

With the utility ExtractXMPData you can extract the XMP data from a PDF document into an XML file which can be used later in a test:

<testcase name="hasXMPData_MatchingXML">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXML file="xmp/metadata-added.xml" /> 1
    </hasXMPData>
  </assertThat>
</testcase>

1

Whitespaces are ignored when comparing XML data.

Validate Single XML-Tags

Tests can check a single node of the XMP data and its value. The next example is based on the following XML-snippet:

<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    ...
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>
      <xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>
      <xmp:CreatorTool>My program using iText</xmp:CreatorTool>
    </rdf:Description>
    ...
  </rdf:RDF>
</x:xmpmeta>

If you want to check that a node exists in the structure of the XMP data you can use the tag <withNode />. The next example checks the existence of two nodes:

<testcase name="hasXMPData_WithNode_ValidateExistence">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <withNode name="xmp:CreateDate" />
      <withNode name="xmp:ModifyDate" />
    </hasXMPData>
  </assertThat>
</testcase>

When you want to verify the value of a node, you also have to declare the expected value using the attribute value="..":

<!-- 
  When the node name occurs multiple times in the document, only
  the first node will be returned.
-->
<testcase name="hasXMPData_WithNodeAndValue">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <withNode name="xmp:CreateDate" value="2011-02-08T15:04:19+01:00" />
      <withNode name="xmp:ModifyDate" value="2011-02-08T15:04:19+01:00" />
    </hasXMPData>
  </assertThat>
</testcase>

If an expected node exists multiple times within the XMP data, the first match is used.

The XPath expression may not start with the document root, because PDFUnit adds // internally.

Of course, the node may also be an attribute node.

XPath based XMP Tests

With the tag <matchingXPath /> you can use the full power of XPath:

<testcase name="hasXMPData_MatchingXPath_CreateDateWithValue">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXPath expr="//xmp:CreateDate[node() = '2011-02-08T15:04:19+01:00']" />
    </hasXMPData>
  </assertThat>
</testcase>
<testcase name="hasXMPData_MatchingXPath_MultipleInvocation">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXPath expr="count(//xmp:CreateDate) = 1" />
      <matchingXPath 
         expr="count(//xmp:CreateDate[1][node()='2011-02-08T15:04:19+01:00']) = 1" />
    </hasXMPData>
  </assertThat>
</testcase>

The capability to evaluate XPath expressions depends on the XML parser or more exactly the XPath engine. By default PDFUnit uses the parser in the JDK/JRE. So the capability is vendor dependent.

Default Namespaces in XPath

As already described for XFA tests, XML namespaces are detected automatically. But the default namespaces has to be declared by the test because namespaces can occur more than once in an XML document.

The next example shows the default namespaces for the tag <matchingXPath />:

<testcase name="hasXMPData_MatchingXPath_WithDefaultNamespace">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <matchingXPath expr="//foo:format = 'application/pdf'"
                     defaultNamespace="http://purl.org/dc/elements/1.1/"
      />
    </hasXMPData>
  </assertThat>
</testcase>

Default namespace for the tag <withNode /> with an expected value:

<testcase name="hasXMPData_WithDefaultNamespace_XMLNode">
  <assertThat testDocument="xmp/metadata-added.pdf">
    <hasXMPData>
      <withNode name="foo:ModifyDate" 
                value="2011-02-08T15:04:19+01:00" 
                defaultNamespace="http://ns.adobe.com/xap/1.0/"
      />
    </hasXMPData>
  </assertThat>
</testcase>