Fonts are a difficult topic in PDF documents. The PDF standard defines 14 fonts, but does your document use any others? Fonts are also important for archiving. PDFUnit provides several tags for different requirements.
<!-- Tags to test fonts: -->

<hasNumberOfFonts identifiedBy=".."      (Filters are explained later)
/>

<hasFonts ofThisTypeOnly="..">           (Either this attribute,
  <matchingXPath />                       or one of the
  <matchingXML />                         nested elements.)
</hasFonts>

<hasFont withNameContaining=".."         (One of these two
         withNameNotContaining=".."       attributes is required)
/>
What is “a font”? Should a “subset” of a font count as a separate font? In most situations this question is irrelevant for developers, but a testing tool has to answer it. The same is true of every other PDF tool, which is why different tools report different numbers of fonts for the same document. For unit testing, the exact way fonts are identified matters less than consistency: the second run of a test must give the same result as the first.
In PDFUnit, two fonts are “equal” when the compared criteria have the same values. The criteria you want to test are set with the attribute identifiedBy="..":
<!-- Constants to identify fonts: -->
identifiedBy="ALLPROPERTIES"
identifiedBy="BASENAME"
identifiedBy="BASENAME_ENCODING"
identifiedBy="BASENAME_ENCODING_ENCODINGDIFF"
identifiedBy="CONVERTIBLE2UNICODE"
identifiedBy="EMBEDDED"
identifiedBy="EMBEDDED_CONVERTIBLE2UNICODE"
identifiedBy="NAME"
identifiedBy="NAME_TYPE"
identifiedBy="TYPE"
The following list explains the available criteria to compare fonts.
Constant | Description |
---|---|
ALLPROPERTIES | All properties of a font are used to identify it. Two fonts that have the same values for all properties are considered equal. |
BASENAME | Fonts are different when they have different base fonts. |
BASENAME_ENCODING | The combination of the base font name and the encoding is used to distinguish fonts. |
BASENAME_ENCODING_ENCODINGDIFF | Two fonts must have the same values for the properties “basename”, “encoding” and “encoding-difference” to be considered equal. The “encoding-difference” is the value of the PDF object with the key /Differences. |
CONVERTIBLE2UNICODE | Only fonts that are convertible into Unicode are considered. |
EMBEDDED | This filter counts only fonts that are embedded. |
EMBEDDED_CONVERTIBLE2UNICODE | In addition to the previous filter, the ability of a font to be converted into Unicode is the second distinguishing property. |
NAME | Only the fonts' names are relevant to the test. |
NAME_TYPE | Only the font name and font type are used to compare fonts. |
TYPE | Only the types of the fonts are considered in the comparison. |
The following example shows all filters:
<testcase name="hasNumberOfFonts_Japanese">
  <assertThat testDocument="fonts/fonts_11_japanese.pdf">
    <hasNumberOfFonts identifiedBy="ALLPROPERTIES">65</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="BASENAME">9</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="BASENAME_ENCODING">16</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="BASENAME_ENCODING_ENCODINGDIFF">16</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="CONVERTIBLE2UNICODE">46</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="EMBEDDED">6</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="EMBEDDED_CONVERTIBLE2UNICODE">0</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="NAME">50</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="NAME_TYPE">55</hasNumberOfFonts>
    <hasNumberOfFonts identifiedBy="TYPE">3</hasNumberOfFonts>
  </assertThat>
</testcase>
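One way to understand why the counts differ per criterion is to treat each identifiedBy constant as a choice of properties that make up a font's identity. The following Python sketch is not PDFUnit's implementation. The font records are invented for illustration, their keys mirror the attributes written by ExtractFontsInfo, and the mapping from constants to properties is an assumption derived from the table of criteria. The two filter constants (EMBEDDED, CONVERTIBLE2UNICODE) are left out on purpose.

```python
# Hypothetical font records; keys mirror the attributes of the ExtractFontsInfo XML.
FONTS = [
    {"name": "FGNNPL+ArialMT", "baseFontName": "ArialMT", "type": "TrueType",
     "encoding": "WinAnsiEncoding"},
    {"name": "ABCDEF+ArialMT", "baseFontName": "ArialMT", "type": "TrueType",
     "encoding": "WinAnsiEncoding"},
    {"name": "Courier", "baseFontName": "Courier", "type": "Type1",
     "encoding": "WinAnsiEncoding"},
    {"name": "Courier", "baseFontName": "Courier", "type": "Type1",
     "encoding": "MacRomanEncoding"},
]

# Assumed mapping: which properties each constant compares.
CRITERIA = {
    "NAME": ("name",),
    "BASENAME": ("baseFontName",),
    "BASENAME_ENCODING": ("baseFontName", "encoding"),
    "NAME_TYPE": ("name", "type"),
    "TYPE": ("type",),
}


def count_fonts(fonts, identified_by):
    """Count fonts, treating two fonts as equal when all compared properties match."""
    keys = CRITERIA[identified_by]
    distinct = {tuple(font[key] for key in keys) for font in fonts}
    return len(distinct)


for constant in CRITERIA:
    print(constant, count_fonts(FONTS, constant))
```

With these four records, the two ArialMT subsets count as two fonts under NAME but as one font under BASENAME, which is the subset question raised at the start of this section.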
Testing the names of fonts is easy:
<testcase name="hasFont_WithNameContaining">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFont withNameContaining="Arial" />
  </assertThat>
</testcase>
Sometimes font names in a PDF document have a prefix, e.g. FGNNPL+ArialMT. Because this prefix is irrelevant for tests, PDFUnit only checks whether the desired font name is a substring of the existing font names.
You can test multiple font names in one test:
<testcase name="hasHasFont_MultipleNames">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFont withNameContaining="Arial" />
    <hasFont withNameContaining="Georgia" />
    <hasFont withNameContaining="Tahoma" />
    <hasFont withNameContaining="TimesNewRoman" />
    <hasFont withNameContaining="Verdana" />
    <hasFont withNameContaining="Verdana-BoldItalic" />
  </assertThat>
</testcase>
Because it is sometimes interesting to know that a particular font is not included in a document, PDFUnit provides a suitable test for it:
<testcase name="hasFontWithName_NotContaining">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFont withNameNotContaining="ComicSansMS" />
  </assertThat>
</testcase>
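The substring rule behind withNameContaining and withNameNotContaining can be illustrated with a few lines of Python. This is only a sketch of the idea, not PDFUnit code: the function names are hypothetical, and treating the comparison as case-sensitive is an assumption.

```python
def has_font_with_name_containing(font_names, fragment):
    """True if at least one font name contains the fragment (assumed case-sensitive)."""
    return any(fragment in name for name in font_names)


def has_font_with_name_not_containing(font_names, fragment):
    """True if no font name contains the fragment."""
    return not has_font_with_name_containing(font_names, fragment)


# The subset prefix "FGNNPL+" does not get in the way of a substring match.
names = ["FGNNPL+ArialMT", "Courier"]
print(has_font_with_name_containing(names, "Arial"))            # True
print(has_font_with_name_not_containing(names, "ComicSansMS"))  # True
```

A side effect of substring matching is that a short fragment such as "Arial" also matches names like "Arial-BoldMT", so a test that needs one specific variant should use the longer name.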
Complex tests for font names can be implemented using XPath. They are described later in this chapter.
You can check that all fonts used in a PDF document are of a certain type:
<testcase name="hasFonts_OfThisTypeOnly_TrueType">
  <assertThat testDocument="fonts/fonts_15_openoffice.pdf">
    <hasFonts ofThisTypeOnly="TRUETYPE" />
  </assertThat>
</testcase>
Predefined font types are:
<!-- constants for font types -->
ofThisTypeOnly="CID"
ofThisTypeOnly="CID_TYPE0"
ofThisTypeOnly="CID_TYPE2"
ofThisTypeOnly="CJK"
ofThisTypeOnly="MMTYPE1"
ofThisTypeOnly="OPENTYPE"
ofThisTypeOnly="TRUETYPE"
ofThisTypeOnly="TYPE0"
ofThisTypeOnly="TYPE1"
ofThisTypeOnly="TYPE3"
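The meaning of ofThisTypeOnly can be sketched in Python as an "all fonts match" check. This is an illustration under assumptions, not PDFUnit's implementation: the function name is hypothetical, the font records are invented, and comparing the constant with the type value case-insensitively (so that TRUETYPE matches "TrueType") is a guess about how the two relate.

```python
def has_fonts_of_this_type_only(fonts, font_type):
    """True if every font record has the given type.

    Note: an empty font list passes in this sketch; how PDFUnit
    treats a document without fonts is not covered here.
    """
    wanted = font_type.lower()
    return all(font["type"].lower() == wanted for font in fonts)


# Hypothetical font records with the "type" attribute from the ExtractFontsInfo XML.
truetype_only = [{"name": "FGNNPL+ArialMT", "type": "TrueType"}]
mixed = truetype_only + [{"name": "Courier", "type": "Type1"}]

print(has_fonts_of_this_type_only(truetype_only, "TRUETYPE"))  # True
print(has_fonts_of_this_type_only(mixed, "TRUETYPE"))          # False
```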
You can extract all properties of all fonts from a PDF document into an XML file using the utility ExtractFontsInfo. This XML file can be used for various tests. The file contains the following information:
<?xml version="1.0" encoding="UTF-8" ?>
<fontlist>
  ...
  <font name="Courier" baseFontName="Courier" type="Type1" embedded="false"
        encoding="WinAnsiEncoding" convertibleToUnicode="false" />
  <font name="FGNNPL+ArialMT" baseFontName="ArialMT" type="TrueType" embedded="true"
        encoding="WinAnsiEncoding" convertibleToUnicode="false" />
  ...
</fontlist>
Here is a test based on that XML file:
<testcase name="hasFontsMatchingXML_ComparedAsFile">
  <assertThat testDocument="fonts/fonts_52_itext.pdf">
    <hasFonts>
      <matchingXML file="fonts/fonts_52_itext.xml" />
    </hasFonts>
  </assertThat>
</testcase>
Whitespace is ignored when comparing an XML file with the font properties of a PDF document.
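To illustrate what a whitespace-insensitive comparison means in practice, the following Python sketch parses two font lists and compares only the attributes of their font elements. It is an assumption-laden illustration rather than PDFUnit's comparison logic: the function name is hypothetical, and as a side effect of using dictionaries, attribute order is ignored as well.

```python
import xml.etree.ElementTree as ET


def font_records(xml_text):
    """Return the attributes of every <font> element, in document order."""
    root = ET.fromstring(xml_text)
    return [dict(font.attrib) for font in root.iter("font")]


compact = '<fontlist><font name="Courier" type="Type1" embedded="false"/></fontlist>'

pretty = """
<fontlist>
    <font name="Courier"
          type="Type1"
          embedded="false" />
</fontlist>
"""

# Indentation and line breaks between tags and attributes do not affect the result.
print(font_records(compact) == font_records(pretty))  # True
```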
Sophisticated tests can be implemented using XPath queries:
<!--
  This XML code needs double quotes outside and single quotes inside,
  because the generated Java code also needs double quotes outside.
-->
<testcase name="hasFontsMatchingXPath_MultipleInvocation">
  <assertThat testDocument="fonts/fonts_52_itext.pdf">
    <hasFonts>
      <matchingXPath expr="count(//font[@baseFontName='ArialMT']) = 1" />
      <matchingXPath expr="count(//font[@type='Type1']) = 5" />
    </hasFonts>
  </assertThat>
</testcase>
If you have problems with XPath, extract the font information with the utility ExtractFontsInfo and verify the XPath expression against the XML file. Eclipse, for example, has an “XPath” view that can be used for this.
Further information about XPath can be found in chapter 8: “Using XPath”.
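If no XPath tool is at hand, a path can also be checked with a short script. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath: functions such as count() are not available, so the matches are counted in Python instead. The embedded XML is a shortened stand-in for a file written by ExtractFontsInfo, and the helper name count_matches is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML written by ExtractFontsInfo.
FONT_XML = """
<fontlist>
  <font name="Courier" baseFontName="Courier" type="Type1" embedded="false"
        encoding="WinAnsiEncoding" convertibleToUnicode="false" />
  <font name="FGNNPL+ArialMT" baseFontName="ArialMT" type="TrueType" embedded="true"
        encoding="WinAnsiEncoding" convertibleToUnicode="false" />
</fontlist>
"""


def count_matches(xml_text, path):
    """Count the elements selected by an ElementTree path expression."""
    root = ET.fromstring(xml_text)
    return len(root.findall(path))


# Rough equivalent of count(//font[@baseFontName='ArialMT']) for this sample.
print(count_matches(FONT_XML, ".//font[@baseFontName='ArialMT']"))  # 1
print(count_matches(FONT_XML, ".//font[@type='Type1']"))            # 1
```

To check a real extraction result, the same path expressions can be run against a tree loaded with ET.parse() from the extracted file instead of the embedded string.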