PDFUnit can handle Unicode. Section 11, “Unicode”, deals with this topic in detail.
The following sections describe a utility program that converts a Unicode string into its hex code. You can use the hex code in many of your tests. If you use only a small number of Unicode characters, it is easier to use hex codes than to install a new font on your computer.
The utility ConvertUnicodeToHex converts any string into ASCII and escapes all non-ASCII characters into their corresponding Unicode hex code. For example, the Euro character is converted into \u20AC.
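The escaping described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the actual implementation of ConvertUnicodeToHex; the class name UnicodeEscapeSketch and the method toAsciiWithEscapes are hypothetical.

```java
public class UnicodeEscapeSketch {

    // Hypothetical helper, not part of PDFUnit: copies 7-bit ASCII characters
    // unchanged and writes every other UTF-16 char as a four-digit hex escape.
    static String toAsciiWithEscapes(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal uses escapes so that the source file itself stays ASCII.
        String input = "\u00E4\u00F6\u00FC \u20AC @";
        System.out.println(toAsciiWithEscapes(input));
    }
}
```

Because the Java compiler resolves such escapes in string literals, the escaped form can be pasted directly into test code without depending on the encoding of the source file.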
The input file can be of any encoding, but you have to define the right encoding before executing the program.
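Why the encoding matters can be shown with a small, self-contained Java sketch: the same bytes yield different characters depending on the charset used to decode them. The class and method names are hypothetical and unrelated to PDFUnit.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchSketch {

    // Encodes the text as UTF-8 and decodes the resulting bytes
    // with the given charset.
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        String right = decodeUtf8BytesAs(euro, StandardCharsets.UTF_8);
        String wrong = decodeUtf8BytesAs(euro, StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset restores the original character.
        System.out.println(right.equals(euro));
        // Decoding with the wrong charset turns the three UTF-8 bytes
        // of the Euro sign into three separate characters.
        System.out.println(wrong.length());
    }
}
```

If the input file is read with the wrong encoding, the utility would escape these wrongly decoded characters instead of the intended ones.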
You pass the encoding to the Java program with the JVM option -Dfile.encoding when you start it:
    ::
    :: Converting Unicode content of the input file to hex code.
    ::
    @echo off
    setlocal
    set CLASSPATH=./lib/pdfunit-2016.05/*;%CLASSPATH%
    set TOOL=com.pdfunit.tools.ConvertUnicodeToHex
    set OUT_DIR=./tmp
    set IN_FILE=unicode-to-hex.in.txt
    java -Dfile.encoding=UTF-8 %TOOL% %IN_FILE% %OUT_DIR%
    endlocal
The generated file _unicode-to-hex.out.txt then contains the following data:
    #Unicode created by com.pdfunit.tools.ConvertUnicodeToHex
    #Wed Jan 16 21:50:04 CET 2013
    unicode-to-hex.in_as-ascii=\u00E4\u00F6\u00FC \u20AC @
Leading and trailing whitespace in the input string is trimmed. If you need it for your test, add it by hand afterwards.
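The generated output looks like a Java properties file. Assuming that is the case, the following sketch reads the value back with java.util.Properties, which decodes the hex escapes, and then re-adds whitespace by hand. The class ReadHexOutputSketch and the method readValue are hypothetical; the sample content is inlined so that the sketch runs without the generated file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ReadHexOutputSketch {

    // Hypothetical helper: parses text in properties syntax and returns
    // the value stored under the given key (null if the key is missing).
    static String readValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Same content as the generated file shown above, minus the date line.
        String content =
              "#Unicode created by com.pdfunit.tools.ConvertUnicodeToHex\n"
            + "unicode-to-hex.in_as-ascii=\\u00E4\\u00F6\\u00FC \\u20AC @\n";

        String value = readValue(content, "unicode-to-hex.in_as-ascii");
        // Seven characters: three umlauts, a space, the Euro sign, a space, '@'.
        System.out.println(value.length());

        // Whitespace trimmed by the utility has to be restored manually.
        String padded = " " + value + " ";
        System.out.println(padded.length());
    }
}
```

In a real test the escaped value would more likely be copied straight into a Java string literal; reading the file programmatically is only one option.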