3.9.  Document Properties

Overview

PDF documents contain information about title, author, keywords and other properties. These standard properties can be extended with custom key-value data. Such metadata play an ever-increasing role in search engines and archive systems, so PDF document properties should be chosen carefully. PDFUnit provides several tests to verify them.

An example of very poor document properties is a PDF document entitled jfqd231.tmp (that really is its title). Nobody will ever search for that, so the document will never be found. It is a typewritten document from a U.S. government organization that was scanned in 1993. Not only is the title useless, the file name is meaningless as well. So the document is hardly more useful than if it did not exist at all.

The following methods are available to verify document properties:

// Testing document properties:

.hasAuthor()
.hasCreator()
.hasKeywords()
.hasProducer()
.hasProperty(..) 
.hasSubject()
.hasTitle()

.hasNoAuthor()
.hasNoCreator()
.hasNoKeywords()
.hasNoProducer()
.hasNoProperty(..) 
.hasNoSubject()
.hasNoTitle()

.hasCreationDate()
.hasModificationDate()
.hasNoCreationDate()
.hasNoModificationDate()

Tests for creation date and modification date are described in chapter 3.7: “Dates” because they differ from tests for the other document properties.

Document properties of a test document can also be compared with the properties of another document. Such tests are described in chapter 4.5: “Comparing Document Properties”.

Testing the Author ...

You can verify the author of a document manually with any PDF reader, but an automated test is quicker.

It is very simple to check whether a document has any value for the property author:

@Test
public void hasAuthor() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasAuthor()
  ;
}

Use the method hasNoAuthor() to verify that the document property author does not exist:

@Test
public void hasNoAuthor() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasNoAuthor()
  ;
}

The next test verifies the value of the property author:

@Test
public void hasAuthor_WithValue() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasAuthor()
            .equalsTo("PDFUnit.com")
  ;
}

There are several methods to compare an expected property value with the actual one. The names are self-explanatory:

// Comparing text for author, creator, keywords, producer, subject, title:
.containing(..)
.endingWith(..)
.equalsTo(..)
.matchingRegex(..)
.notContaining(..)
.notMatchingRegex(..)
.startingWith(..)

These methods do not normalize whitespace. Because property values are typically short, the test developer has to specify whitespace exactly as it appears in the property value.

All comparison methods are case-sensitive.

The method matchingRegex() follows the rules of java.util.regex.Pattern.
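The comparison rules above can be modeled with plain java.lang.String methods and java.util.regex.Pattern. The following class is a hypothetical sketch of those semantics, not PDFUnit's source code. One assumption deserves a note: it treats matchingRegex() as a full match (Matcher.matches()), which is consistent with the pattern ".*key.*" used in the concatenation example of this section, but the manual does not state this explicitly.

```java
import java.util.regex.Pattern;

// Hypothetical model of the text comparisons listed above; not PDFUnit's source code.
// Every comparison is case-sensitive and leaves whitespace untouched.
public final class PropertyTextComparisons {

    private PropertyTextComparisons() {}

    public static boolean containing(String actual, String expected) {
        return actual.contains(expected);
    }

    public static boolean notContaining(String actual, String expected) {
        return !actual.contains(expected);
    }

    public static boolean startingWith(String actual, String expected) {
        return actual.startsWith(expected);
    }

    public static boolean endingWith(String actual, String expected) {
        return actual.endsWith(expected);
    }

    public static boolean equalsTo(String actual, String expected) {
        return actual.equals(expected);
    }

    // Assumption: the whole value must match the pattern (full-match semantics).
    public static boolean matchingRegex(String actual, String regex) {
        return Pattern.compile(regex).matcher(actual).matches();
    }

    public static boolean notMatchingRegex(String actual, String regex) {
        return !matchingRegex(actual, regex);
    }

    public static void main(String[] args) {
        String keywords = "PDFUnit, key, value";
        System.out.println(equalsTo(keywords, "pdfunit, key, value"));  // false: letter case differs
        System.out.println(equalsTo(keywords, "PDFUnit,  key, value")); // false: extra space
        System.out.println(startingWith(keywords, "PDFUnit"));          // true
        System.out.println(matchingRegex(keywords, ".*key.*"));         // true
        System.out.println(matchingRegex(keywords, "key"));             // false under full-match semantics
    }
}
```

The first two comparisons fail only because of letter case and an extra space, which is exactly why expected values have to be written with care.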

... and Creator, Keywords, Producer, Subject and Title

Tests on the content of creator, keywords, producer, subject and title work just like those for Author above.

Concatenating Validation Methods

Of course, methods can be concatenated:

@Test
public void hasKeywords_allTextComparingMethods() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasKeywords().notContaining("--")
            .hasKeywords().matchingRegex(".*key.*")
            .hasKeywords().startingWith("PDFUnit")
  ;
}
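Concatenation works because each comparison returns the document-level assertion again, so the next hasKeywords() or hasProperty(..) call can follow directly. The sketch below shows this fluent pattern with the hypothetical classes DocumentAssert and PropertyAssert over a plain Map of properties. It illustrates the design idea only and is not PDFUnit's implementation. In this sketch, a failed check throws an AssertionError, which stops the chain at the first mismatch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a fluent assertion chain; class and method names are
// illustrative only and do not describe PDFUnit's internal implementation.
public final class FluentChainSketch {

    /** Document-level assertion: each property check returns a PropertyAssert. */
    public static final class DocumentAssert {
        private final Map<String, String> info;

        DocumentAssert(Map<String, String> info) {
            this.info = info;
        }

        public PropertyAssert hasKeywords() {
            return hasProperty("Keywords");
        }

        public PropertyAssert hasProperty(String key) {
            String value = info.get(key);
            if (value == null) {
                throw new AssertionError("Property '" + key + "' does not exist.");
            }
            return new PropertyAssert(this, key, value);
        }
    }

    /** Property-level assertion: each comparison returns the document again. */
    public static final class PropertyAssert {
        private final DocumentAssert document;
        private final String key;
        private final String actual;

        PropertyAssert(DocumentAssert document, String key, String actual) {
            this.document = document;
            this.key = key;
            this.actual = actual;
        }

        public DocumentAssert notContaining(String unexpected) {
            check(!actual.contains(unexpected), "not contain", unexpected);
            return document;
        }

        public DocumentAssert matchingRegex(String regex) {
            check(actual.matches(regex), "match", regex);
            return document;
        }

        public DocumentAssert startingWith(String expected) {
            check(actual.startsWith(expected), "start with", expected);
            return document;
        }

        private void check(boolean ok, String verb, String expected) {
            if (!ok) {
                throw new AssertionError("Expected " + key + " '" + actual
                        + "' to " + verb + " '" + expected + "'");
            }
        }
    }

    public static DocumentAssert document(Map<String, String> info) {
        return new DocumentAssert(info);
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("Keywords", "PDFUnit, keywords, demo");

        // Same shape as the concatenated test above.
        document(info)
            .hasKeywords().notContaining("--")
            .hasKeywords().matchingRegex(".*key.*")
            .hasKeywords().startingWith("PDFUnit");
        System.out.println("all checks passed");
    }
}
```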

Common Validation as a Key-Value Pair

All tests for document properties shown in the previous sections can also be implemented with the general method hasProperty(..). The method validates any document property as a key-value pair:

@Test
public void hasProperty_StandardProperties() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasProperty("Title").equalsTo("PDFUnit sample - Demo for Document Infos")
            .hasProperty("Subject").equalsTo("Demo for Document Infos")
            .hasProperty("CreationDate").equalsTo("D:20131027172417+01'00'")
            .hasProperty("ModDate").equalsTo("D:20131027172417+01'00'")
  ;
}
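As the example shows, hasProperty(..) compares date values as raw strings, so the expected value has to be written exactly as stored, in the PDF date syntax D:YYYYMMDDHHmmSSOHH'mm' defined in ISO 32000-1. To see which moment such a value denotes, a small converter to java.time can help. The class PdfDateSketch below is a standalone, hypothetical helper, not part of PDFUnit. Treating a missing time-zone offset as UTC is an assumption of this sketch.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts a raw PDF date string such as
// "D:20131027172417+01'00'" into an OffsetDateTime. Not part of PDFUnit.
public final class PdfDateSketch {

    // Year is required; month, day, hour, minute and second are optional in order.
    // The offset is "Z", "+HH'mm'" or "-HH'mm'" (apostrophes optional).
    private static final Pattern PDF_DATE = Pattern.compile(
            "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([+\\-Z])(?:(\\d{2})'?(?:(\\d{2})'?)?)?)?");

    private PdfDateSketch() {}

    public static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date: " + raw);
        }
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // Assumption: no offset (or "Z") means UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        String sign = m.group(7);
        if (sign != null && !sign.equals("Z") && m.group(8) != null) {
            int factor = sign.equals("-") ? -1 : 1;
            int hh = Integer.parseInt(m.group(8));
            int mm = intOr(m.group(9), 0);
            offset = ZoneOffset.ofHoursMinutes(factor * hh, factor * mm);
        }
        return OffsetDateTime.of(
                LocalDateTime.of(year, month, day, hour, minute, second), offset);
    }

    private static int intOr(String group, int fallback) {
        return group == null ? fallback : Integer.parseInt(group);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20131027172417+01'00'")); // 2013-10-27T17:24:17+01:00
        System.out.println(parse("D:20081204045205"));        // 2008-12-04T04:52:05Z
    }
}
```

For assertions on dates, the dedicated date tests described in chapter 3.7 remain the better choice; the helper only makes raw values easier to read.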

The PDF document in the following example has two custom properties, as can be seen in Adobe Reader®.

And this is the test for custom properties:

@Test
public void hasProperty_CustomProperties() throws Exception {
  String filename = "documentUnderTest.pdf";
  String key1 = "Company";
  String expectedValue1 = "Signature Perfect KG";
  String key2 = "SourceModified";
  String expectedValue2 = "D:20081204045205";
  
  AssertThat.document(filename)
            .hasProperty(key1).equalsTo(expectedValue1)
            .hasProperty(key2).equalsTo(expectedValue2)
  ;
}

To ensure that a property does not exist, use the following method:

@Test
public void hasNoProperty() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasNoProperty("OldProperty_ShouldNotExist")
  ;
}
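Standard and custom properties are entries of the same document information dictionary, which is why one key-value method covers both. The hypothetical class below models that dictionary as a Map<String, String> to show how hasProperty(..), hasNoProperty(..) and a named check such as hasAuthor() relate to each other; PDFUnit itself reads the dictionary from the PDF file. Treating a key with a non-null value as "existing" is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the document information dictionary as a Map.
// Not PDFUnit code: PDFUnit reads the real dictionary from the PDF file.
public final class InfoDictionarySketch {

    private InfoDictionarySketch() {}

    // Assumption: a property exists when its key is present with a non-null value.
    public static boolean hasProperty(Map<String, String> info, String key) {
        return info.get(key) != null;
    }

    public static boolean hasNoProperty(Map<String, String> info, String key) {
        return !hasProperty(info, key);
    }

    // A named check maps onto a fixed standard key, here "Author".
    public static boolean hasAuthor(Map<String, String> info) {
        return hasProperty(info, "Author");
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("Title", "PDFUnit sample - Demo for Document Infos"); // standard key
        info.put("Company", "Signature Perfect KG");                   // custom key
        info.put("SourceModified", "D:20081204045205");                // custom key

        System.out.println(hasProperty(info, "Company"));                      // true
        System.out.println(hasNoProperty(info, "OldProperty_ShouldNotExist")); // true
        System.out.println(hasAuthor(info));                                   // false
    }
}
```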

PDF documents of version PDF-1.4 or higher can also contain metadata as XML (Extensible Metadata Platform, XMP). Chapter 3.38: “XMP Data” explains this in detail.