Tests with right-to-left (RTL) text work exactly like tests with left-to-right (LTR) text, so all methods for comparing text can be used:
// Testing page content:
.hasText()   // pages and regions have to be specified first

// Validating expected text:
.hasText().containing(..)
.hasText().containing(.., WhitespaceProcessing)
.hasText().endingWith(..)
.hasText().endingWith(.., WhitespaceProcessing)
.hasText().equalsTo(..)
.hasText().equalsTo(.., WhitespaceProcessing)
.hasText().matchingRegex(..)
.hasText().startingWith(..)

// Prove the absence of defined text:
.hasText().notContaining(..)
.hasText().notContaining(.., WhitespaceProcessing)
.hasText().notEndingWith(..)
.hasText().notMatchingRegex(..)
.hasText().notStartingWith(..)

// Validate multiple texts in an expected order:
.hasText().inOrder(..)
.hasText().containingFirst(..).then(..)
The next examples use two PDF documents which contain the text 'hello, world' in Arabic and in Hebrew:
// Testing RTL text:
@Test
public void hasRTLText_HelloWorld_Arabic() throws Exception {
  String filename = "helloworld_ar.pdf";
  String rtlHelloWorld = "مرحبا، العالم"; // English: 'hello, world'

  // Region of the page that contains the text
  int leftX  = 97;
  int upperY = 69;
  int width  = 69;
  int height = 16;
  PageRegion pageRegion = new PageRegion(leftX, upperY, width, height);

  AssertThat.document(filename)
            .restrictedTo(FIRST_PAGE)
            .restrictedTo(pageRegion)
            .hasText()
            .startingWith(rtlHelloWorld)
  ;
}
// Testing RTL text:
@Test
public void hasRTLText_HelloWorld_Hebrew() throws Exception {
  String filename = "helloworld_iw.pdf";
  String rtlHelloWorld = "שלום, עולם"; // English: 'hello, world'

  // Region of the page that contains the text
  int leftX  = 97;
  int upperY = 69;
  int width  = 69;
  int height = 16;
  PageRegion pageRegion = new PageRegion(leftX, upperY, width, height);

  AssertThat.document(filename)
            .restrictedTo(FIRST_PAGE)
            .restrictedTo(pageRegion)
            .hasText()
            .endingWith(rtlHelloWorld)
  ;
}
It is worth noting that the Java editor in Eclipse can display both text directions within the same source file. Here is a screenshot of the Java code from the previous example:
Internally, PDFUnit uses the PDF parser PDFBox. PDFBox parses RTL text and converts it into a Java string without the need for any special method calls. Congratulations to the development team for such an achievement!
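Methods such as startingWith(..) and endingWith(..) make sense for RTL text because a Java string keeps its characters in logical (reading) order, regardless of how an editor or viewer draws them on screen. The following minimal sketch illustrates this with plain JDK calls only. It is not PDFUnit or PDFBox code; the class name RtlLogicalOrderDemo and the helper isRtl are hypothetical names chosen for this illustration.

```java
public class RtlLogicalOrderDemo {

    // 'hello, world' in Hebrew, the same text as in the Hebrew test above.
    // Unicode escapes are used so that the file compiles regardless of the
    // encoding of the source file.
    static final String HELLO = "\u05E9\u05DC\u05D5\u05DD"; // first word in reading order
    static final String WORLD = "\u05E2\u05D5\u05DC\u05DD"; // last word in reading order
    static final String HELLO_WORLD = HELLO + ", " + WORLD;

    /** Returns true if the code point has a strong right-to-left directionality. */
    static boolean isRtl(int codePoint) {
        byte directionality = Character.getDirectionality(codePoint);
        return directionality == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || directionality == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        // Index 0 is the first character a reader reads, which an RTL-aware
        // editor draws at the right-hand side of the string.
        System.out.println(HELLO_WORLD.startsWith(HELLO));      // true
        System.out.println(HELLO_WORLD.endsWith(WORLD));        // true
        System.out.println(isRtl(HELLO_WORLD.codePointAt(0)));  // true
        System.out.println(HELLO_WORLD.length());               // 10
    }
}
```

In other words, 'starting with' always refers to the first characters in reading order, so an expected RTL string can be written in a test exactly as a reader would type it.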