The utility program ExtractXMPData writes the document-level XMP data of a PDF document to an XML file. This file can be used for the PDFUnit tests described in section 3.31: “XMP Data”.
XMP data can also occur in other places in a PDF, not just at the document level. Such XMP data is currently not extracted. Extraction of all XMP data is planned for the next release of PDFUnit.
::
:: Extract XMP data from a PDF document as XML
::
@echo off
setlocal
set CLASSPATH=./lib/pdfunit-2015.10/*;%CLASSPATH%
set CLASSPATH=./lib/itext-5.5.1/*;%CLASSPATH%
set CLASSPATH=./lib/bouncycastle-jdk15on-150/*;%CLASSPATH%

set TOOL=com.pdfunit.tools.ExtractXMPData
set OUT_DIR=./tmp
set IN_FILE=LXX_vocab.pdf
set PASSWD=

java %TOOL% %IN_FILE% %OUT_DIR% %PASSWD%
endlocal
Part of the output file _xmpdata_LXX_vocab.out.xml is shown here:
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/'
           x:xmptk='XMP toolkit 2.9.1-14, framework 1.6'>
  <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
           xmlns:iX='http://ns.adobe.com/iX/1.0/'
  >
    ...
    <rdf:Description rdf:about='uuid:f6a30687-f1ac-4b71-a555-34b7622eaa94'
                     xmlns:pdf='http://ns.adobe.com/pdf/1.3/'
                     pdf:Producer='Acrobat Distiller 6.0.1 (Windows)'
                     pdf:Keywords='LXX, Septuagint, vocabulary, frequency'>
    </rdf:Description>
    <rdf:Description rdf:about='uuid:f6a30687-f1ac-4b71-a555-34b7622eaa94'
                     xmlns:xap='http://ns.adobe.com/xap/1.0/'
                     xap:CreateDate='2006-05-02T11:35:38-04:00'
                     xap:CreatorTool='PScript5.dll Version 5.2.2'
                     xap:ModifyDate='2006-05-02T11:37:57-04:00'
                     xap:MetadataDate='2006-05-02T11:37:57-04:00'>
    </rdf:Description>
    ...
  </rdf:RDF>
</x:xmpmeta>
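The extracted file is ordinary XML, so it can also be inspected outside of PDFUnit. The following is a minimal sketch, not part of PDFUnit, that reads one namespaced attribute (here pdf:Keywords) from an rdf:Description element using only the JDK's DOM parser. The class name XmpAttributeReader, the method readAttribute and the shortened sample string are assumptions made for illustration. In practice the content of the extracted file would be passed in instead of the sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; it is not part of PDFUnit or iText.
class XmpAttributeReader {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";

    // Shortened sample modeled on the output shown above.
    static final String SAMPLE =
          "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
        + "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
        + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
        + "<rdf:Description rdf:about=''"
        + " xmlns:pdf='http://ns.adobe.com/pdf/1.3/'"
        + " pdf:Producer='Acrobat Distiller 6.0.1 (Windows)'"
        + " pdf:Keywords='LXX, Septuagint, vocabulary, frequency'>"
        + "</rdf:Description>"
        + "</rdf:RDF>"
        + "</x:xmpmeta>";

    // Returns the value of the first matching attribute found on any
    // rdf:Description element, or null if no element carries it.
    static String readAttribute(String xml, String namespaceUri, String localName)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            if (description.hasAttributeNS(namespaceUri, localName)) {
                return description.getAttributeNS(namespaceUri, localName);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAttribute(SAMPLE, PDF_NS, "Keywords"));
    }
}
```

The lookup uses the namespace URI rather than the prefix, because the prefix (pdf, xap, ...) is only a local alias and may differ between the tools that wrote the XMP data.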
To read the XMP data, PDFUnit uses the method PdfReader.getMetadata() from the iText project (http://www.itextpdf.com).