Assertions

Assertions allow a regular expression to match only under certain controlled conditions.

An assertion does not need a character to match; rather, it inspects the surroundings of a possible match before acknowledging it. For example, the word boundary assertion does not try to find a non-word character opposite a word character at its position; instead, it makes sure that there is no word character there. This means that the assertion can match where there is no character at all, i.e. at the ends of the searched string.

Some assertions do have a pattern to match, but the part of the string that matches it is not included in the result of the match of the full expression.

Regular expressions as documented here support the following assertions:

^ (caret: beginning of string)

Matches the beginning of the searched string.

The expression ^Peter will match at Peter in the string Peter, hey! but not in Hey, Peter!

$ (end of string)

Matches the end of the searched string.

The expression you\?$ will match at the last you in the string You didn't do that, did you? but nowhere in You didn't do that, right?
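
The two anchors above can be tried out with a short sketch. It assumes Python's re module as a stand-in for the engine documented here, so details may differ (for instance, Python's $ also matches just before a trailing newline). The variable names are illustrative only.

```python
import re

# ^ only matches at the very start of the searched string.
caret = re.compile(r"^Peter")
caret_hit = caret.search("Peter, hey!")
caret_miss = caret.search("Hey, Peter!")

# $ only matches at the very end; \? is a literal question mark.
dollar = re.compile(r"you\?$")
dollar_hit = dollar.search("You didn't do that, did you?")
dollar_miss = dollar.search("You didn't do that, right?")

print(caret_hit is not None, caret_miss is None)
print(dollar_hit is not None, dollar_miss is None)
```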

\b (word boundary)

Matches if there is a word character on one side and no word character on the other.

This is useful for finding the ends of words, for example both ends in order to find a whole word. The expression \bin\b will match the separate in in the string He came in through the window, but not the in in window.

\B (non-word boundary)

Matches wherever \b does not.

That means that it matches, for example, within words: the expression \Bin\B will match the in in window but not in integer or I'm in love.
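
As a rough illustration of \b and \B side by side, the following sketch collects the positions at which each expression matches. It assumes Python's re module rather than the engine documented here, and the variable names are made up for the example.

```python
import re

whole_word = re.compile(r"\bin\b")   # "in" as a separate word
inside_word = re.compile(r"\Bin\B")  # "in" strictly inside a word

sentence = "He came in through the window"

# Start offsets of every match in the sentence.
whole_hits = [m.start() for m in whole_word.finditer(sentence)]
inner_hits = [m.start() for m in inside_word.finditer(sentence)]

print(whole_hits)  # offset of the standalone "in"
print(inner_hits)  # offset of the "in" inside "window"

# \Bin\B finds nothing when "in" touches a word edge.
print(inside_word.search("integer"))      # None
print(inside_word.search("I'm in love"))  # None
```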

(?=PATTERN) (Positive lookahead)

A lookahead assertion looks at the part of the string following a possible match. The positive lookahead prevents the string from matching if the text following the possible match does not match the PATTERN of the assertion, but the text matched by PATTERN is not included in the result.

The expression handy(?=\w) will match at handy in handyman but not in That came in handy!
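
A minimal sketch of the positive lookahead, assuming Python's re module as a stand-in engine. It shows that the character checked by the lookahead is not part of the reported match.

```python
import re

handy_prefix = re.compile(r"handy(?=\w)")

hit = handy_prefix.search("handyman")
miss = handy_prefix.search("That came in handy!")

# The lookahead inspects the "m" but does not consume it.
print(hit.group())  # handy
print(hit.end())    # 5
print(miss)         # None
```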

(?!PATTERN) (Negative lookahead)

The negative lookahead prevents a possible match from being acknowledged if the following part of the searched string matches its PATTERN.

The expression const \w+\b(?!\s*&) will match at const char in the string const char* foo, while it cannot match const QString in const QString& bar because the & matches the negative lookahead assertion pattern.
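
The negative lookahead example can be checked with the sketch below, again assuming Python's re module. The second pattern, which drops the \b, is not from the text; it is added only to suggest why the word boundary matters in this expression.

```python
import re

# Pattern copied from the text: a const-qualified type name that is
# not followed by optional whitespace and an ampersand.
not_a_reference = re.compile(r"const \w+\b(?!\s*&)")

pointer_hit = not_a_reference.search("const char* foo")
reference_miss = not_a_reference.search("const QString& bar")

# Without \b the engine can backtrack \w+ by one character to escape
# the lookahead, which yields a partial type name.
without_boundary = re.compile(r"const \w+(?!\s*&)")
partial = without_boundary.search("const QString& bar")

print(pointer_hit.group())  # const char
print(reference_miss)       # None
print(partial.group())      # const QStrin
```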

(?<=PATTERN) (Positive lookbehind)

Lookbehind has the same effect as lookahead, but works backwards. A lookbehind looks at the part of the string preceding a possible match. The positive lookbehind matches a string only if it is preceded by the PATTERN of the assertion, but the text matched by PATTERN is not included in the result.

The expression (?<=cup)cake will match at cake if it is preceded by cup (in cupcake but not in cheesecake or in cake alone).
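
A short sketch of the positive lookbehind, assuming Python's re module (which, unlike some engines, requires lookbehind patterns of fixed width; that holds for cup).

```python
import re

after_cup = re.compile(r"(?<=cup)cake")

cupcake = after_cup.search("cupcake")
cheesecake = after_cup.search("cheesecake")
plain_cake = after_cup.search("cake")

# Only "cake" is reported; the preceding "cup" is checked, not matched.
print(cupcake.group(), cupcake.start())  # cake 3
print(cheesecake)                        # None
print(plain_cake)                        # None
```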

(?<!PATTERN) (Negative lookbehind)

The negative lookbehind prevents a possible match from being acknowledged if the preceding part of the searched string matches its PATTERN.

The expression (?<![\w\.])[0-9]+ will match at 123 in the strings =123 and -123, while it cannot match 123 in .123 or word123.
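
The negative lookbehind example, sketched under the assumption that Python's re module behaves like the engine documented here for this pattern. The helper name find_number is hypothetical.

```python
import re

# Digits that are not directly preceded by a word character or a dot.
standalone_number = re.compile(r"(?<![\w\.])[0-9]+")

def find_number(text):
    """Return the matched digits, or None if nothing matches."""
    m = standalone_number.search(text)
    return m.group() if m else None

for sample in ["=123", "-123", ".123", "word123"]:
    print(sample, "->", find_number(sample))
```

In the last two samples every digit is preceded by a dot, a letter, or another digit, so no starting position satisfies the lookbehind.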

(PATTERN) (Capturing group)

The sub pattern within the parentheses is captured and remembered, so that it can be used in back references. For example, the expression ("+)[^"]*\1 matches """"text"""" and "text".

See the section Capturing matching text (back references) for more information.
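
To illustrate how the back reference \1 reuses the captured quotes, here is a minimal sketch assuming Python's re module. fullmatch is used so that the whole sample must be covered; the unbalanced third sample is an addition for contrast and does not come from the text.

```python
import re

# Group 1 captures the opening run of quotes; \1 demands the same run again.
quoted = re.compile(r'("+)[^"]*\1')

four = quoted.fullmatch('""""text""""')
one = quoted.fullmatch('"text"')
unbalanced = quoted.fullmatch('""text"')

print(four.group(1))  # the four opening quotes
print(one.group(1))   # the single opening quote
print(unbalanced)     # None: the closing run is shorter than the opening run
```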

(?:PATTERN) (Non-capturing group)

The sub pattern within the parentheses is not captured and is not remembered. It is preferable to always use non-capturing groups if the captures will not be used.
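
The difference between the two kinds of group can be observed with a small sketch, assuming Python's re module. The patterns (ab)+c and (?:ab)+c are illustrative and do not appear in the text.

```python
import re

capturing = re.compile(r"(ab)+c")
non_capturing = re.compile(r"(?:ab)+c")

cap = capturing.search("ababc")
non = non_capturing.search("ababc")

# Both expressions match the same text.
print(cap.group(), non.group())  # ababc ababc

# Only the capturing version remembers a sub match.
print(capturing.groups, cap.groups())      # 1 ('ab',)
print(non_capturing.groups, non.groups())  # 0 ()
```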