3.9.  Fonts

Overview

Fonts are a difficult topic in PDF documents. The PDF standard defines 14 standard fonts, but does your document use any others? Fonts also matter for archiving: PDF/A, for example, requires fonts to be embedded. PDFUnit provides several tags for these different requirements.

<!-- Tags to test fonts: -->

<hasNumberOfFonts identifiedBy=".." />  <!-- Filters are explained below. -->

<!-- Either the attribute or one of the nested elements: -->
<hasFonts ofThisTypeOnly="..">
  <matchingXPath />
  <matchingXML   />
</hasFonts>

<!-- One of these two attributes is required: -->
<hasFont withNameContaining=".."
         withNameNotContaining=".."
/>

Number of Fonts

What is a font? Should a subset of a font count as a separate font? In most situations developers can ignore this question, but a testing tool has to answer it. In fact, every PDF tool has to make this decision, which is why different tools report different numbers of fonts for the same document. For unit tests, the exact definition matters less than consistency: a test must produce the same result on every run, so it is enough that fonts are always identified in the same way.

In PDFUnit, two fonts are considered equal when the compared criteria have the same values. Select the criteria for your test with the attribute identifiedBy="..":

<!-- Constants to identify fonts: -->

identifiedBy="ALLPROPERTIES"                            
identifiedBy="BASENAME"                        
identifiedBy="BASENAME_ENCODING"               
identifiedBy="BASENAME_ENCODING_ENCODINGDIFF"  
identifiedBy="CONVERTIBLE2UNICODE"             
identifiedBy="EMBEDDED"                        
identifiedBy="EMBEDDED_CONVERTIBLE2UNICODE"    
identifiedBy="NAME"                            
identifiedBy="NAME_TYPE"                       
identifiedBy="TYPE"

The following list explains the available criteria to compare fonts.

Constant                        Description
ALLPROPERTIES                   All properties of a font are used to identify it. Two fonts with the same values for all properties are considered equal.
BASENAME                        Fonts are different when they have different base fonts.
BASENAME_ENCODING               The combination of base font name and encoding distinguishes fonts.
BASENAME_ENCODING_ENCODINGDIFF  Two fonts are considered equal only if they have the same values for base name, encoding and encoding difference. The encoding difference is the value of the PDF object with the key /Differences.
CONVERTIBLE2UNICODE             Only fonts that can be converted to Unicode are counted.
EMBEDDED                        Only embedded fonts are counted.
EMBEDDED_CONVERTIBLE2UNICODE    Only fonts that are both embedded and convertible to Unicode are counted.
NAME                            Only the font names are compared.
NAME_TYPE                       Only the font names and font types are compared.
TYPE                            Only the font types are compared.

The following example shows all filters:

<testcase name="hasNumberOfFonts_Japanese">
  <assertThat testDocument="fonts/fonts_11_japanese.pdf">
    <hasNumberOfFonts identifiedBy="ALLPROPERTIES">65</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="BASENAME">9</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="BASENAME_ENCODING">16</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="BASENAME_ENCODING_ENCODINGDIFF">16</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="CONVERTIBLE2UNICODE">46</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="EMBEDDED">6</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="EMBEDDED_CONVERTIBLE2UNICODE">0</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="NAME">50</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="NAME_TYPE">55</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="TYPE">3</hasNumberOfFonts>
  </assertThat>
</testcase>
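Why do the numbers differ so much? The following Java sketch illustrates the idea behind the comparison criteria: each criterion acts as a key, and fonts with equal keys are counted once. It is not PDFUnit's implementation and covers only the comparison criteria, not the filters EMBEDDED and CONVERTIBLE2UNICODE. The Font record and the sample data (two subsets of ArialMT with different prefixes, plus Courier) are hypothetical; the code needs Java 17.

```java
import java.util.List;
import java.util.function.Function;

public class FontIdentitySketch {

    // Hypothetical font model; the fields mirror attributes written by
    // ExtractFontsInfo, but this is not a PDFUnit class.
    record Font(String name, String baseFontName, String type,
                boolean embedded, String encoding) {}

    // Hypothetical sample: two subsets of ArialMT with different subset
    // prefixes, plus the standard font Courier.
    static final List<Font> SAMPLE = List.of(
        new Font("FGNNPL+ArialMT", "ArialMT", "TrueType", true, "WinAnsiEncoding"),
        new Font("ABCDEF+ArialMT", "ArialMT", "TrueType", true, "WinAnsiEncoding"),
        new Font("Courier", "Courier", "Type1", false, "WinAnsiEncoding"));

    // Fonts whose keys are equal are counted as one font.
    static long countDistinct(List<Font> fonts, Function<Font, ?> key) {
        return fonts.stream().map(key).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println("NAME:      " + countDistinct(SAMPLE, Font::name));         // 3
        System.out.println("BASENAME:  " + countDistinct(SAMPLE, Font::baseFontName)); // 2
        System.out.println("TYPE:      " + countDistinct(SAMPLE, Font::type));         // 2
        System.out.println("NAME_TYPE: "
            + countDistinct(SAMPLE, f -> List.of(f.name(), f.type())));               // 3
    }
}
```

With this sample, the NAME key counts the two ArialMT subsets separately because their prefixes differ, while the BASENAME key merges them into one font.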

Font Names

Testing font names is easy:

<testcase name="hasFont_WithNameContaining">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFont withNameContaining="Arial" />
  </assertThat>
</testcase>

Font names in a PDF document sometimes carry a prefix, e.g. FGNNPL+ArialMT. Such a prefix marks an embedded font subset. Because the prefix is meaningless for tests, PDFUnit only checks whether the expected font name is a substring of a font name in the document.
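A short Java sketch of the idea, using only the JDK. The PDF specification defines the prefix of a subset font as exactly six uppercase letters followed by a plus sign; stripSubsetTag removes it with a regular expression, while nameContains mirrors the substring check described above. Both method names are hypothetical and not part of PDFUnit.

```java
import java.util.regex.Pattern;

public class FontNamePrefix {

    // Subset tag: six uppercase letters followed by '+', e.g. "FGNNPL+".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    // Removes the subset tag if present; other names are returned unchanged.
    static String stripSubsetTag(String fontName) {
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    // Substring check: the prefix does not get in the way.
    static boolean nameContains(String fontName, String expected) {
        return fontName.contains(expected);
    }

    public static void main(String[] args) {
        System.out.println(stripSubsetTag("FGNNPL+ArialMT"));         // ArialMT
        System.out.println(nameContains("FGNNPL+ArialMT", "Arial"));  // true
        System.out.println(nameContains("FGNNPL+ArialMT", "Tahoma")); // false
    }
}
```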

You can test multiple font names in one test:

<testcase name="hasFont_MultipleNames">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFont withNameContaining="Arial" />
    <hasFont withNameContaining="Georgia" />
    <hasFont withNameContaining="Tahoma" />
    <hasFont withNameContaining="TimesNewRoman" />
    <hasFont withNameContaining="Verdana" />
    <hasFont withNameContaining="Verdana-BoldItalic" />
  </assertThat>
</testcase>

Sometimes you need to verify that a particular font is not used in a document. Use the attribute withNameNotContaining for this:

<testcase name="hasFontWithName_NotContaining">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFont withNameNotContaining="ComicSansMS" />
  </assertThat>
</testcase>

More complex tests of font names can be implemented with XPath; they are described later in this chapter.

Font Types

You can check that all fonts used in a PDF document are of a certain type:

<testcase name="hasFonts_OfThisTypeOnly_TrueType">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFonts ofThisTypeOnly="TRUETYPE" />
  </assertThat>
</testcase>

Predefined font types are:

<!-- constants for font types -->

ofThisTypeOnly="CID" 
ofThisTypeOnly="CID_TYPE0" 
ofThisTypeOnly="CID_TYPE2" 
ofThisTypeOnly="CJK" 
ofThisTypeOnly="MMTYPE1" 
ofThisTypeOnly="OPENTYPE" 
ofThisTypeOnly="TRUETYPE" 
ofThisTypeOnly="TYPE0" 
ofThisTypeOnly="TYPE1" 
ofThisTypeOnly="TYPE3"
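Conceptually, ofThisTypeOnly checks that no font of another type exists. The following Java sketch performs that check on the XML produced by ExtractFontsInfo (described in the next section) with the JDK's DOM parser. It is an illustration, not PDFUnit's code; in particular, it assumes that the constant TRUETYPE corresponds to the attribute value type="TrueType". The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FontTypeCheck {

    // Excerpt of ExtractFontsInfo output, taken from the example in this chapter.
    static final String SAMPLE = """
        <fontlist>
          <font name="Courier" baseFontName="Courier" type="Type1"
                embedded="false" encoding="WinAnsiEncoding" convertibleToUnicode="false"/>
          <font name="FGNNPL+ArialMT" baseFontName="ArialMT" type="TrueType"
                embedded="true" encoding="WinAnsiEncoding" convertibleToUnicode="false"/>
        </fontlist>
        """;

    // Returns true if every <font> element has the expected type attribute.
    // A font list without any <font> element passes as well.
    static boolean allFontsOfType(String fontListXml, String expectedType) throws Exception {
        NodeList fonts = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(fontListXml)))
                .getElementsByTagName("font");
        for (int i = 0; i < fonts.getLength(); i++) {
            Element font = (Element) fonts.item(i);
            if (!expectedType.equals(font.getAttribute("type"))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(allFontsOfType(SAMPLE, "TrueType")); // false: Courier is Type1
    }
}
```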

XML for Font Tests

You can extract all properties of all fonts from a PDF document into an XML file using the utility ExtractFontsInfo. This XML file can be used for various tests.

The file contains the following information:

<?xml version="1.0" encoding="UTF-8" ?>
<fontlist>
  ...
  <font name="Courier"             baseFontName="Courier" 
        type="Type1"               embedded="false" 
        encoding="WinAnsiEncoding" convertibleToUnicode="false" 
  />
  <font name="FGNNPL+ArialMT"      baseFontName="ArialMT" 
        type="TrueType"            embedded="true" 
        encoding="WinAnsiEncoding" convertibleToUnicode="false" 
  />
  ...
</fontlist>

Here is a test based on that XML file:

<testcase name="hasFontsMatchingXML_ComparedAsFile">
  <assertThat testDocument="fonts/fonts_52_itext.pdf">
    <hasFonts>
      <matchingXML file="fonts/fonts_52_itext.xml" />
    </hasFonts>
  </assertThat>
</testcase>

Whitespace is ignored when comparing an XML file with the font properties of a PDF document.
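To see why whitespace does not matter, here is a Java sketch that compares two font lists by reading each font element as a map of attribute names to values. Only elements and attributes are inspected, so indentation and line breaks have no effect on the result. This is an assumption about how such a comparison can work, not PDFUnit's algorithm; unlike a set-based comparison, this sketch also requires the font elements to appear in the same order. All names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FontListCompare {

    static final String COMPACT =
        "<fontlist><font name=\"Courier\" baseFontName=\"Courier\" type=\"Type1\"/></fontlist>";

    // Same content as COMPACT, but with line breaks and indentation.
    static final String INDENTED = """
        <fontlist>
          <font name="Courier"
                baseFontName="Courier"
                type="Type1"
          />
        </fontlist>
        """;

    // Reads every <font> element as a map of attribute name to value.
    static List<Map<String, String>> readFonts(String xml) throws Exception {
        NodeList fonts = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("font");
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < fonts.getLength(); i++) {
            NamedNodeMap attributes = fonts.item(i).getAttributes();
            Map<String, String> properties = new HashMap<>();
            for (int j = 0; j < attributes.getLength(); j++) {
                properties.put(attributes.item(j).getNodeName(),
                               attributes.item(j).getNodeValue());
            }
            result.add(properties);
        }
        return result;
    }

    // Equal if both lists contain the same fonts with the same properties, in order.
    static boolean sameFonts(String expectedXml, String actualXml) throws Exception {
        return readFonts(expectedXml).equals(readFonts(actualXml));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameFonts(COMPACT, INDENTED)); // true
    }
}
```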

XPath for Font Tests

Sophisticated tests can be implemented using XPath queries:

<!-- 
  Use double quotes around the XML attribute and single quotes inside
  the XPath expression, because the generated Java code wraps the
  expression in double quotes.
-->
<testcase name="hasFontsMatchingXPath_MultipleInvocation">
  <assertThat testDocument="fonts/fonts_52_itext.pdf">
    <hasFonts>
      <matchingXPath expr="count(//font[@baseFontName='ArialMT']) = 1" />
      <matchingXPath expr="count(//font[@type='Type1']) = 5" />
    </hasFonts>
  </assertThat>
</testcase>

If you have problems with XPath, extract the font information with the utility ExtractFontsInfo and verify the XPath expression against the XML file. The XPath view in Eclipse, for example, is suitable for this.
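You can also check an expression outside an IDE with the XPath support built into the JDK. The following sketch evaluates the two expressions from the test above against the excerpt of the ExtractFontsInfo output shown earlier in this chapter. The class and method names are hypothetical. Against this two-font excerpt the second expression is false, because the excerpt contains only one Type1 font.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathCheck {

    // Excerpt of ExtractFontsInfo output, taken from the example in this chapter.
    static final String SAMPLE = """
        <fontlist>
          <font name="Courier" baseFontName="Courier" type="Type1"
                embedded="false" encoding="WinAnsiEncoding" convertibleToUnicode="false"/>
          <font name="FGNNPL+ArialMT" baseFontName="ArialMT" type="TrueType"
                embedded="true" encoding="WinAnsiEncoding" convertibleToUnicode="false"/>
        </fontlist>
        """;

    // Evaluates an XPath expression as a boolean against the given XML string.
    static boolean matches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matches(SAMPLE, "count(//font[@baseFontName='ArialMT']) = 1")); // true
        System.out.println(matches(SAMPLE, "count(//font[@type='Type1']) = 5"));          // false
    }
}
```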

Further information about XPath can be found in chapter 8: “Using XPath”.