13.4.  Comparing Text

Expected text and actual text on a PDF page can be compared using the following methods:

// Methods with configurable whitespace processing. Default is NORMALIZE:
.containing(searchToken                                   )   1
.containing(searchToken             , WhitespaceProcessing)   2
.containing(String[] searchTokens                         )
.containing(String[] searchTokens   , WhitespaceProcessing)
.endingWith(searchToken                                   )
.endingWith(searchToken             , WhitespaceProcessing)
.equalsTo(searchToken                                     )
.equalsTo(searchToken               , WhitespaceProcessing)
.first(searchToken                                        )
.first(searchToken                  , WhitespaceProcessing)   3
.notContaining(searchToken                                )
.notContaining(searchToken          , WhitespaceProcessing)
.notContaining(String[] searchTokens                      )
.notContaining(String[] searchTokens, WhitespaceProcessing)
.startingWith(searchToken                                 )
.startingWith(searchToken           , WhitespaceProcessing)
.then(searchToken)                                            4

// Methods with whitespace processing NORMALIZE:
.notEndingWith(searchToken)
.notStartingWith(searchToken)

// Methods without whitespace processing:
.matchingRegex(regex)
.notMatchingRegex(regex)

1

Methods without the second parameter normalize whitespace: leading and trailing whitespace is removed, and every sequence of whitespace characters within the text is reduced to a single space.

2

In these methods, whitespace processing is controlled by the second parameter, which accepts the constants IGNORE, NORMALIZE, and KEEP. The constants are explained separately in section 13.5: “Whitespace Processing”. They can be used in every method that takes 'WhitespaceProcessing' as its second parameter.

3 4

The method then(..) always processes whitespaces in the same way as first(..).
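
The NORMALIZE behavior described in callout 1 can be illustrated with plain Java. This is only a sketch of the described rule, not PDFUnit's own implementation; the class name WhitespaceNormalizeSketch and the helper normalize(..) are hypothetical names chosen for this example.

```java
// Sketch only: mimics the NORMALIZE rule described above, not PDFUnit's actual code.
class WhitespaceNormalizeSketch {

    // Hypothetical helper: remove leading and trailing whitespace,
    // then reduce every run of whitespace inside the text to one space.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String actual = "  Hello \t\n  PDF   world  ";
        // Brackets make the trimmed ends visible.
        System.out.println("[" + normalize(actual) + "]"); // prints "[Hello PDF world]"
    }
}
```

Under this rule, an expected text and an actual text that differ only in line breaks or indentation compare as equal.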

Comparisons with regular expressions follow the syntax and rules of the class java.util.regex.Pattern:

// Using regular expression to compare page content
@Test
public void hasText_MatchingRegex() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .restrictedTo(FIRST_PAGE)
            .hasText()
            .matchingRegex(".*[Cc]ontent.*")  
  ;
}
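
Because the regex methods do not process whitespace, the line breaks in the page text matter. The following sketch uses only java.util.regex.Pattern and shows two of its rules: matches() must cover the whole input, and '.' does not match line terminators unless DOTALL is set. The class and method names are hypothetical, and whether PDFUnit internally applies a whole-text match is an assumption of this example, not something stated in this section.

```java
import java.util.regex.Pattern;

// Sketch only: demonstrates java.util.regex.Pattern rules, not PDFUnit internals.
class RegexMatchSketch {

    // Whole-input match with default flags ('.' stops at line terminators).
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // Whole-input match with DOTALL, so '.' also matches line terminators.
    static boolean matchesWholeDotAll(String regex, String text) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        String regex = ".*[Cc]ontent.*";
        String singleLine = "Some content on the page";
        String multiLine = "Title\nSome content\nFooter";

        System.out.println(matchesWhole(regex, singleLine));      // true
        System.out.println(matchesWhole(regex, multiLine));       // false
        System.out.println(matchesWholeDotAll(regex, multiLine)); // true
    }
}
```

If a pattern has to span several lines of page text, the embedded flag (?s) at the start of the regular expression has the same effect as DOTALL.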

The methods containing(String[]) and notContaining(String[]) can be called with multiple search terms. A test with containing(String[]) succeeds only if every expected term appears on every selected page. A test with notContaining(String[]) succeeds only if none of the terms appears on any of the selected pages:

// Checking that none of several search tokens appears on the selected page
@Test
public void hasText_NotContaining_MultipleSearchTokens() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .restrictedTo(FIRST_PAGE)
            .hasText()
            .notContaining("even pagenumber", "Page #2") 
  ;
}
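
The success conditions for multiple search terms can be written out as two small loops. This is a sketch of the rule as stated above, assuming each selected page is available as a plain String; the class MultiTokenSketch and its methods are hypothetical and do not belong to PDFUnit.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: models the stated success conditions, not PDFUnit's actual code.
class MultiTokenSketch {

    // containing(String[]): every term must appear on every selected page.
    static boolean containingAll(List<String> pages, String... tokens) {
        for (String page : pages) {
            for (String token : tokens) {
                if (!page.contains(token)) {
                    return false;
                }
            }
        }
        return true;
    }

    // notContaining(String[]): no term may appear on any selected page.
    static boolean containingNone(List<String> pages, String... tokens) {
        for (String page : pages) {
            for (String token : tokens) {
                if (page.contains(token)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical page text standing in for the first page of a document.
        List<String> firstPage = Arrays.asList("odd pagenumber, Page #1");

        System.out.println(containingNone(firstPage, "even pagenumber", "Page #2")); // true
        System.out.println(containingAll(firstPage, "odd pagenumber", "Page #1"));   // true
        System.out.println(containingAll(firstPage, "odd pagenumber", "Page #2"));   // false
    }
}
```

Note that the two conditions are not complements of each other: a page that contains only some of the terms fails both checks.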