XMP is the abbreviation for “Extensible Metadata Platform”, an open standard initiated by Adobe for embedding metadata in files. XMP metadata can be embedded not only in PDF documents but also in images. Typical examples of metadata are location and time.
Metadata in a PDF file can be important when the document is processed, so it should be correct. PDFUnit provides the same test methods for XMP data as for XFA data:
// Methods to test XMP data:
.hasXMPData()
.hasXMPData().matchingXPath(..)
.hasXMPData().withNode(..)
.hasNoXMPData()
The following examples show how to verify the existence and absence of XMP data:
@Test
public void hasXMPData() throws Exception {
  String filename = "documentUnderTest.pdf";

  AssertThat.document(filename)
            .hasXMPData()
  ;
}

@Test
public void hasNoXMPData() throws Exception {
  String filename = "documentUnderTest.pdf";

  AssertThat.document(filename)
            .hasNoXMPData()
  ;
}
Tests can check a single node of the XMP data and its value. The next example is based on the following XML snippet:
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    ...
    <rdf:Description rdf:about=""
                     xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>
      <xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>
      <xmp:CreatorTool>My program using iText</xmp:CreatorTool>
    </rdf:Description>
    ...
  </rdf:RDF>
</x:xmpmeta>
With the utility ExtractXMPData you can extract the XMP data from a PDF document into an XML file. Chapter 9.12: “Extract XMP Data to XML” describes how to use the utility.
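For readers who want to inspect such a packet outside PDFUnit, the following sketch parses the snippet above (without its elided "..." parts) with the DOM parser that ships with the JDK and lists all properties in the xmp namespace. It is plain JAXP, not part of PDFUnit; the class XmpSnippetReader and its method are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmpSnippetReader {

    // Namespace URI that the snippet binds to the "xmp" prefix
    static final String XMP_NS = "http://ns.adobe.com/xap/1.0/";

    // The snippet above without its elided "..." parts
    static final String SAMPLE_XMP =
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
            + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
            + "<rdf:Description rdf:about=\"\" xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\">"
            + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
            + "<xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>"
            + "<xmp:CreatorTool>My program using iText</xmp:CreatorTool>"
            + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // Returns every element in the xmp namespace as localName -> text,
    // in document order; for repeated names the first occurrence wins.
    static Map<String, String> readXmpProperties(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default, needed for *NS lookups
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> properties = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagNameNS(XMP_NS, "*"); // "*" = any local name
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                properties.putIfAbsent(element.getLocalName(), element.getTextContent());
            }
            return properties;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        readXmpProperties(SAMPLE_XMP)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints the three properties CreateDate, ModifyDate and CreatorTool with their values. Namespace awareness has to be switched on explicitly, because a DocumentBuilderFactory is not namespace aware by default.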
The following example verifies that certain XML nodes exist. The method withNode(..) takes an instance of com.pdfunit.XMLNode as its parameter:
@Test
public void hasXMPData_WithNode_ValidateExistence() throws Exception {
  String filename = "documentUnderTest.pdf";
  XMLNode nodeCreateDate = new XMLNode("xmp:CreateDate");
  XMLNode nodeModifyDate = new XMLNode("xmp:ModifyDate");

  AssertThat.document(filename)
            .hasXMPData()
            .withNode(nodeCreateDate)
            .withNode(nodeModifyDate)
  ;
}
When you want to verify the value of a node, you also have to pass the expected value to the constructor of XMLNode:
@Test
public void hasXMPData_WithNodeAndValue() throws Exception {
  String filename = "documentUnderTest.pdf";
  XMLNode nodeCreateDate = new XMLNode("xmp:CreateDate", "2011-02-08T15:04:19+01:00");
  XMLNode nodeModifyDate = new XMLNode("xmp:ModifyDate", "2011-02-08T15:04:19+01:00");

  AssertThat.document(filename)
            .hasXMPData()
            .withNode(nodeCreateDate)
            .withNode(nodeModifyDate)
  ;
}
The XPath expression used as the node name of an XMLNode must not start with the document root, because PDFUnit prepends // internally.
If an expected node exists multiple times within the XMP data, the first match is used.
Of course, the node may also be an attribute node.
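The rules above (PDFUnit prepends //, the first match is used, attribute nodes are allowed) can be approximated in plain JAXP. This is only a sketch under the assumption that withNode(..) behaves like an XPath lookup of "//" plus the node name; it is not PDFUnit's implementation, and the class NodeCheckSketch, the helper hasNode and the fixed prefix table are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class NodeCheckSketch {

    // Prefix bindings assumed for this sketch (taken from the XMP snippet)
    static final Map<String, String> PREFIXES = Map.of(
            "xmp", "http://ns.adobe.com/xap/1.0/",
            "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Hypothetical helper: prepends "//" to the node name, takes the first match
    // and, if expectedValue is non-null, compares its text (for attributes: the value).
    static boolean hasNode(Document doc, String nodeName, String expectedValue) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("prefix must not be null");
                }
                return PREFIXES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }
            @Override
            public String getPrefix(String uri) {
                return null; // reverse lookup is not needed in this sketch
            }
            @Override
            public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        try {
            // NODE returns only the first selected node in document order, or null
            Node first = (Node) xpath.evaluate("//" + nodeName, doc, XPathConstants.NODE);
            if (first == null) {
                return false;
            }
            return expectedValue == null || expectedValue.equals(first.getTextContent());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid node name: " + nodeName, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
                + "<rdf:Description rdf:about=\"\" xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\">"
                + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
                + "</rdf:Description></rdf:RDF>");
        System.out.println(hasNode(doc, "xmp:CreateDate", null));                        // existence
        System.out.println(hasNode(doc, "xmp:CreateDate", "2011-02-08T15:04:19+01:00")); // value
        System.out.println(hasNode(doc, "rdf:Description/@rdf:about", ""));               // attribute
    }
}
```

All three calls in main print true. XPathConstants.NODE is what makes "the first match is used" fall out naturally: when several nodes match, only the first one in document order is returned and compared.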
To take advantage of the full power of XPath, PDFUnit provides the method matchingXPath(..). The following two examples give an idea of what is possible:
@Test
public void hasXMPData_MatchingXPath() throws Exception {
  String filename = "documentUnderTest.pdf";
  String xpathString = "//xmp:CreateDate[node() = '2011-02-08T15:04:19+01:00']";
  XPathExpression expression = new XPathExpression(xpathString);

  AssertThat.document(filename)
            .hasXMPData()
            .matchingXPath(expression)
  ;
}
@Test
public void hasXMPData_MatchingXPath_MultipleInvocation() throws Exception {
  String filename = "documentUnderTest.pdf";
  String xpathDateExists = "count(//xmp:CreateDate) = 1";
  String xpathDateValue  = "//xmp:CreateDate[node()='2011-02-08T15:04:19+01:00']";
  XPathExpression exprDateExists = new XPathExpression(xpathDateExists);
  XPathExpression exprDateValue  = new XPathExpression(xpathDateValue);

  AssertThat.document(filename)
            .hasXMPData()
            .matchingXPath(exprDateValue)
            .matchingXPath(exprDateExists)
  ;

  // The same test in a different style:
  AssertThat.document(filename)
            .hasXMPData().matchingXPath(exprDateValue)
            .hasXMPData().matchingXPath(exprDateExists)
  ;
}
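Both expressions in these examples are ordinary XPath 1.0. Assuming that matchingXPath(..) passes when the expression evaluates to true (an assumption for this sketch, not a statement about PDFUnit internals), their behavior can be reproduced with the JDK's XPath engine. MatchingXPathSketch and matchesXPath are hypothetical names, and only the xmp prefix is bound.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class MatchingXPathSketch {

    static final String XMP_NS = "http://ns.adobe.com/xap/1.0/";

    // Assumption: an expression "matches" when its XPath 1.0 boolean value is true
    static boolean matchesXPath(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "xmp".equals(prefix) ? XMP_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return XMP_NS.equals(uri) ? "xmp" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XMP_NS.equals(uri)
                            ? Collections.singletonList("xmp").iterator()
                            : Collections.emptyIterator();
                }
            });
            // BOOLEAN applies XPath boolean(): a non-empty node-set is true,
            // a comparison such as count(...) = 1 keeps its own result
            return (Boolean) xpath.evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xmp = "<rdf:Description xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:xmp=\"" + XMP_NS + "\">"
                + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
                + "</rdf:Description>";
        System.out.println(matchesXPath(xmp, "count(//xmp:CreateDate) = 1"));
        System.out.println(matchesXPath(xmp, "//xmp:CreateDate[node()='2011-02-08T15:04:19+01:00']"));
    }
}
```

Both lines print true. The conversion to boolean explains why both kinds of expression can be used the same way: a path such as //xmp:CreateDate[...] is true when it selects at least one node, while count(...) = 1 is already a comparison.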
The ability to evaluate XPath expressions depends on the XML parser or, more precisely, on the XPath engine. By default, PDFUnit uses the parser that comes with the JDK/JRE.
Chapter 13.12: “JAXP-Configuration” explains how to use a different XML parser.
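To find out which XPath engine and DOM parser a JDK-based setup actually picks up, the standard JAXP factories can be asked directly. This sketch does not reproduce PDFUnit's own configuration from Chapter 13.12; it assumes only the standard JAXP lookup, which checks system properties such as javax.xml.xpath.XPathFactory:<object model URI> and javax.xml.parsers.DocumentBuilderFactory (and further sources such as the service loader) before falling back to the JDK's built-in implementation. JaxpInfo is a hypothetical class name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

public class JaxpInfo {

    // System property that selects the XPath engine for the DOM object model:
    // "javax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom"
    static final String XPATH_PROPERTY =
            XPathFactory.DEFAULT_PROPERTY_NAME + ":" + XPathFactory.DEFAULT_OBJECT_MODEL_URI;

    // Class of the XPathFactory that the JAXP lookup currently selects
    static String xpathFactoryClass() {
        return XPathFactory.newInstance().getClass().getName();
    }

    // Class of the DOM parser factory; its lookup starts with the
    // system property "javax.xml.parsers.DocumentBuilderFactory"
    static String documentBuilderFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("XPath engine:    " + xpathFactoryClass());
        System.out.println("DOM parser:      " + documentBuilderFactoryClass());
        System.out.println("Engine property: " + XPATH_PROPERTY);
        System.out.println("Currently set:   " + System.getProperty(XPATH_PROPERTY));
    }
}
```

Setting that property (for example with -D on the command line) to the factory class of another XPath implementation on the classpath switches the engine for code that obtains it through XPathFactory.newInstance().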
XML namespaces are detected automatically, but the default namespace has to be declared explicitly using an instance of DefaultNamespace. A prefix is required for the default namespace, and any value can be chosen for it (foo in the following example):
@Test
public void hasXMPData_MatchingXPath_WithDefaultNamespace() throws Exception {
  String filename = "documentUnderTest.pdf";
  String xpathAsString = "//foo:format = 'application/pdf'";
  String stringDefaultNS = "http://purl.org/dc/elements/1.1/";
  DefaultNamespace defaultNS = new DefaultNamespace(stringDefaultNS);
  XPathExpression expression = new XPathExpression(xpathAsString, defaultNS);

  AssertThat.document(filename)
            .hasXMPData()
            .matchingXPath(expression)
  ;
}
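The need for a prefix comes from XPath 1.0 itself: an unprefixed name in an XPath expression only matches elements that are in no namespace, so an element in a default namespace can only be addressed through a prefix bound to that namespace URI. The following JAXP sketch demonstrates this with a hypothetical Dublin Core fragment; it is not PDFUnit code, and DefaultNamespaceSketch and evaluate are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class DefaultNamespaceSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical fragment in which Dublin Core is the default namespace,
    // so <format> carries no prefix in the XML
    static final String SAMPLE =
            "<rdf:Description xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns=\"" + DC_NS + "\">"
            + "<format>application/pdf</format>"
            + "</rdf:Description>";

    // Evaluates an XPath 1.0 expression as a boolean. If prefix is non-null,
    // it is bound to uri; the prefix string itself is arbitrary.
    static boolean evaluate(String xml, String expression, String prefix, String uri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            if (prefix != null) {
                xpath.setNamespaceContext(new NamespaceContext() {
                    @Override
                    public String getNamespaceURI(String p) {
                        return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
                    }
                    @Override
                    public String getPrefix(String u) {
                        return uri.equals(u) ? prefix : null;
                    }
                    @Override
                    public Iterator<String> getPrefixes(String u) {
                        return uri.equals(u)
                                ? Collections.singletonList(prefix).iterator()
                                : Collections.emptyIterator();
                    }
                });
            }
            return (Boolean) xpath.evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // XPath 1.0: an unprefixed name only matches elements in no namespace
        System.out.println(evaluate(SAMPLE, "//format = 'application/pdf'", null, null));
        // Binding any prefix (here "foo") to the default namespace URI works
        System.out.println(evaluate(SAMPLE, "//foo:format = 'application/pdf'", "foo", DC_NS));
    }
}
```

The first call prints false and the second true. Any other prefix bound to the same URI, for example bar, gives the same result, which is why the choice of prefix does not matter.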
The default namespace has to be passed not only to the class XPathExpression, but also to the class XMLNode:
@Test
public void hasXMPData_WithDefaultNamespace_SpecialNode() throws Exception {
  String filename = "documentUnderTest.pdf";
  String stringDefaultNS = "http://ns.adobe.com/xap/1.0/";
  DefaultNamespace defaultNS = new DefaultNamespace(stringDefaultNS);
  String nodeName = "foo:ModifyDate";
  String nodeValue = "2011-02-08T15:04:19+01:00";
  XMLNode nodeModifyDate = new XMLNode(nodeName, nodeValue, defaultNS);

  AssertThat.document(filename)
            .hasXMPData()
            .withNode(nodeModifyDate)
  ;
}