Java “understands” Unicode, as does XML, so PDFUnit “understands” Unicode as well. Section 7: “Unicode” deals with Unicode in detail.
This section describes a utility program that converts a Unicode string into its ASCII hex code. The hex code can be used in many of your tests. If you are using only a small number of Unicode characters, it is easier to use the ASCII hex code than to install a new font on your computer. And you may not have permission to install anything.
The utility ConvertUnicodeToHex converts any string into ASCII and escapes all non-ASCII characters into their corresponding Unicode hex code. For example, the Euro character is converted into \u20AC.
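The escaping rule described above can be illustrated with a minimal sketch. This is not the PDFUnit implementation; the class name UnicodeEscapeSketch and the method toAsciiHex are hypothetical names chosen for this illustration. The sketch assumes that escaping works per UTF-16 code unit, as is usual for Java-style escapes, so a character outside the Basic Multilingual Plane would produce two escape sequences.

```java
public class UnicodeEscapeSketch {

    // Hypothetical helper, not part of PDFUnit: keeps ASCII characters as they are
    // and replaces every other UTF-16 code unit with a backslash, the letter u,
    // and four uppercase hex digits.
    static String toAsciiHex(String input) {
        StringBuilder out = new StringBuilder();
        // trim() mirrors the documented behaviour that surrounding whitespace is dropped.
        for (char c : input.trim().toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes so that this source file stays pure ASCII.
        // It contains three German umlauts, a blank, the Euro sign, a blank and '@'.
        String input = "\u00E4\u00F6\u00FC \u20AC @";
        System.out.println(toAsciiHex(input));
    }
}
```

Only characters with a code of 128 or above are escaped; blanks and ASCII symbols such as @ pass through unchanged.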
The input file can be of any encoding, but you have to define the right encoding before executing the program.
Start the Java program with the JVM parameter -Dfile.encoding set to the encoding of the input file:
::
:: Converting Unicode content of the input file to hex code.
::
@echo off
setlocal
set CLASSPATH=./lib/pdfunit-2015.10/*;%CLASSPATH%
set TOOL=com.pdfunit.tools.ConvertUnicodeToHex
set OUT_DIR=./tmp
set IN_FILE=convert-unicode-to-hex.in.txt
java -Dfile.encoding=UTF-8 %TOOL% %IN_FILE% %OUT_DIR%
endlocal
The name of the output file is derived from the name of the input file. So _convert-unicode-to-hex.out.txt is generated with the following content:
#Unicode created by com.pdfunit.tools.ConvertUnicodeToHex
#Wed Jan 16 21:50:04 CET 2013
convert-unicode-to-hex.in_as-ascii=\u00E4\u00F6\u00FC \u20AC @
The output file is written in the encoding of the Java runtime, which is taken from the system property file.encoding.
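The generated content has the shape of a Java properties file (comment lines starting with #, then a key=value line). Assuming that this is the case, the escaped value can be turned back into real Unicode characters with java.util.Properties, which decodes such escape sequences while loading. The class name ReadHexOutputSketch and the method readValue are hypothetical names for this sketch; they are not part of PDFUnit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ReadHexOutputSketch {

    // Hypothetical helper: parses properties-style text and returns the decoded
    // value for the given key, or null if the key is missing.
    static String readValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            // Properties.load() converts the hex escapes back into characters.
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Same content as the output file shown in this section.
        String content = String.join("\n",
            "#Unicode created by com.pdfunit.tools.ConvertUnicodeToHex",
            "#Wed Jan 16 21:50:04 CET 2013",
            "convert-unicode-to-hex.in_as-ascii=\\u00E4\\u00F6\\u00FC \\u20AC @");

        String value = readValue(content, "convert-unicode-to-hex.in_as-ascii");
        // Three umlauts, a blank, the Euro sign, a blank and '@' give 7 characters.
        System.out.println(value.length()); // prints 7
    }
}
```

For a real file, a reader opened on the file could be passed to load() instead of the StringReader used here.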
Leading and trailing whitespace in the input string is trimmed. If you need it for your test, add it back by hand afterwards.