9.6.  Extract Font Information to XML

As described in chapter 3.12: “Fonts”, fonts are a topic that needs to be tested. The utility ExtractFontInfo extracts information about the fonts of a PDF document into an XML file. The generated file shows how PDFUnit sees the fonts.

The algorithm that generates the XML file is the same as the one used by the PDFUnit tests.

Program Start

::
:: Extract information about fonts of a PDF document into an XML file
::

@echo off
setlocal
set CLASSPATH=./lib/aspectj-1.8.7/*;%CLASSPATH%
set CLASSPATH=./lib/bouncycastle-jdk15on-153/*;%CLASSPATH%
set CLASSPATH=./lib/commons-collections4-4.1/*;%CLASSPATH%
set CLASSPATH=./lib/commons-logging-1.2/*;%CLASSPATH%
set CLASSPATH=./lib/pdfbox-2.0.0/*;%CLASSPATH%
set CLASSPATH=./lib/pdfunit-2016.05/*;%CLASSPATH%

set TOOL=com.pdfunit.tools.ExtractFontInfo
set OUT_DIR=./tmp
set IN_FILE=fonts_11_japanese.pdf
set PASSWD=

:: Arguments: input file, output directory, password (empty here)
java  %TOOL%  %IN_FILE%  %OUT_DIR%  %PASSWD%
endlocal

Input

For the Japanese PDF document fonts_11_japanese.pdf, Adobe Reader® shows the following fonts:

Output

The output file _fontinfo_fonts_11_japanese.out.xml contains the underlined names:

<?xml version="1.0" encoding="UTF-8" ?>
<fonts>
  <font basename="Arial-BoldMT"       name="Arial-BoldMT"      
        type="TrueType"               vertical="false"          embedded="false" />
  <font basename="ArialMT"            name="ArialMT"           
        type="TrueType"               vertical="false"          embedded="false" />
  <font basename="Century"            name="MEEADE+Century"    
        type="TrueType"               vertical="false"          embedded="true"  />
  <font basename="HGPGothicE"         name="MFHLHH+HGPGothicE" 
        type="Type0"                  vertical="false"          embedded="true"  />
  <font basename="MS-Gothic"          name="MDOLLI+MS-Gothic"  
        type="Type0"                  vertical="true"           embedded="true"  />
  <font basename="MS-Gothic"          name="MDOLLI+MS-Gothic"  
        type="Type0"                  vertical="false"          embedded="true"  />
  <font basename="MS-Mincho"          name="MEOFCM+MS-Mincho"  
        type="Type0"                  vertical="false"          embedded="true"  />
  <font basename="MS-PGothic"         name="MDOMCG+MS-PGothic" 
        type="Type0"                  vertical="false"          embedded="true"  />
  <font basename="MS-PGothic"         name="MDOMCG+MS-PGothic" 
        type="Type0"                  vertical="true"           embedded="true"  />
  <font basename="MS-PMincho"         name="MEKHMP+MS-PMincho" 
        type="Type0"                  vertical="false"          embedded="true"  />
  <font basename="TimesNewRomanPSMT"  name="TimesNewRomanPSMT" 
        type="TrueType"               vertical="false"          embedded="false" />
</fonts>

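Because the XML file contains all subsets of a font, its content might differ from what Adobe Reader® shows. In the listing above, for example, MS-Gothic and MS-PGothic each appear twice, and the two entries differ only in the vertical attribute.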
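When comparing the XML file with the font list of a viewer, it can help to collapse the entries by base name and to list the fonts that are not embedded. The following is a minimal sketch and not part of PDFUnit: the class FontInfoSummary and its methods are hypothetical names. It assumes only the element and attribute names visible in the listing above (font, basename, name, embedded) and uses the XML parser of the Java standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFUnit: summarizes a font-info XML file.
class FontInfoSummary {

    // Parses the XML from a stream using the JDK's built-in DOM parser.
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Convenience variant for XML held in a string.
    static Document parseString(String xml) throws Exception {
        return parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Distinct values of the basename attribute, sorted alphabetically.
    // Multiple entries of the same font collapse into one name.
    static Set<String> distinctBaseNames(Document doc) {
        Set<String> names = new TreeSet<>();
        NodeList fonts = doc.getElementsByTagName("font");
        for (int i = 0; i < fonts.getLength(); i++) {
            names.add(((Element) fonts.item(i)).getAttribute("basename"));
        }
        return names;
    }

    // Values of the name attribute for all fonts with embedded="false",
    // in document order.
    static List<String> notEmbedded(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList fonts = doc.getElementsByTagName("font");
        for (int i = 0; i < fonts.getLength(); i++) {
            Element font = (Element) fonts.item(i);
            if ("false".equals(font.getAttribute("embedded"))) {
                names.add(font.getAttribute("name"));
            }
        }
        return names;
    }

    // Usage: java FontInfoSummary <path-to-fontinfo-xml>
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Document doc = parse(in);
            System.out.println("Base names:   " + distinctBaseNames(doc));
            System.out.println("Not embedded: " + notEmbedded(doc));
        }
    }
}
```

After compiling, the sketch could be run as `java FontInfoSummary ./tmp/_fontinfo_fonts_11_japanese.out.xml`, assuming the output file was written to the OUT_DIR used in the script above. For the listing above, the 11 entries collapse to 9 distinct base names, and three fonts are reported as not embedded: Arial-BoldMT, ArialMT, and TimesNewRomanPSMT.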