9.2. Convert Unicode Text into Hex Code

PDFUnit can handle Unicode. See section 11: “Unicode” for a detailed discussion of this topic.

The following sections describe a utility program that converts a Unicode string into its hex code. You can use the hex code in many of your tests. If you use only a few Unicode characters, writing them as hex code is easier than installing a new font on your computer.

The utility ConvertUnicodeToHex converts any string into ASCII and escapes all non-ASCII characters into their corresponding Unicode hex code. For example, the Euro character is converted into \u20AC.
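The escaping rule described above can be illustrated with a few lines of Java. This is only a minimal sketch, not the PDFUnit implementation: the class name UnicodeEscapeSketch and the method toAsciiWithEscapes are hypothetical, and the sketch assumes that every UTF-16 code unit above 127 is written as a four-digit uppercase hex escape and that the input is trimmed first.

```java
// Hypothetical illustration only; this is not the source of ConvertUnicodeToHex.
final class UnicodeEscapeSketch {

    // Trims the input, keeps ASCII characters unchanged and writes every
    // other UTF-16 code unit as a backslash, the letter u and four hex digits.
    static String toAsciiWithEscapes(String text) {
        StringBuilder result = new StringBuilder();
        for (char c : text.trim().toCharArray()) {
            if (c < 128) {
                result.append(c);
            } else {
                result.append(String.format("\\u%04X", (int) c));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The umlauts a, o, u, a blank, the Euro sign, a blank and an at sign,
        // written as escapes so that this source file itself stays pure ASCII.
        String input = "\u00E4\u00F6\u00FC \u20AC @";
        System.out.println(toAsciiWithEscapes(input));
    }
}
```

An escaped string of this form can be pasted directly into a Java string literal, because the Java compiler translates such escapes back into the original characters.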

The input file can have any encoding, but you must declare that encoding when you start the program.

Program Start

Start the Java program and pass the encoding of the input file with the JVM option -Dfile.encoding:

::
:: Converting Unicode content of the input file to hex code.
::
  
@echo off
setlocal
set CLASSPATH=./lib/pdfunit-2016.05/*;%CLASSPATH%

set TOOL=com.pdfunit.tools.ConvertUnicodeToHex
set OUT_DIR=./tmp
set IN_FILE=unicode-to-hex.in.txt

java -Dfile.encoding=UTF-8 %TOOL%  %IN_FILE%  %OUT_DIR% 
endlocal

Input

The input file unicode-to-hex.in.txt contains this data:

äöü € @

Output

The generated file _unicode-to-hex.out.txt then contains the following data:

#Unicode created by com.pdfunit.tools.ConvertUnicodeToHex
#Wed Jan 16 21:50:04 CET 2013
unicode-to-hex.in_as-ascii=\u00E4\u00F6\u00FC \u20AC @

Leading and trailing whitespace in the input string is trimmed. If your test needs that whitespace, add it back by hand afterwards.
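The output shown above has the layout of a Java properties file: comment lines starting with #, followed by a key=value line. Assuming that this is indeed the format, the escaped value can be read back into a normal Java string with java.util.Properties, which decodes the hex escapes while loading. The following sketch uses a hypothetical class ReadHexOutputSketch and embeds the file content as a string to stay self-contained; in a real test you would load the generated file instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; assumes the generated file is a Java properties file.
final class ReadHexOutputSketch {

    // Parses properties text and returns the decoded value for the given key.
    static String decode(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Same content as the output listing above, without the date line.
        String fileContent =
              "#Unicode created by com.pdfunit.tools.ConvertUnicodeToHex\n"
            + "unicode-to-hex.in_as-ascii=\\u00E4\\u00F6\\u00FC \\u20AC @\n";
        String text = decode(fileContent, "unicode-to-hex.in_as-ascii");
        // The decoded string has seven characters: three umlauts, a blank,
        // the Euro sign, a blank and the at sign.
        System.out.println(text.length()); // prints 7
    }
}
```

Reading the value this way avoids copying escape sequences by hand, but leading whitespace that was trimmed during conversion still has to be added in the test itself.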
