RegExPlus

Conditionals

Conditionals allow part of a match to be attempted only when a specific condition holds:

Syntax: (?(condition)yes-pattern)

They can also choose between two alternatives, depending on whether the condition is true or false:

Syntax: (?(condition)yes-pattern|no-pattern)
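Java's built-in java.util.regex engine does not accept this syntax at all, which is why RegExPlus rewrites conditionals into a form the native engine can run (see the implementation note further down). A quick check, as a minimal sketch; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class: reports whether java.util.regex accepts a pattern as-is.
final class NativeConditionalCheck {
    static boolean compilesNatively(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Conditional syntax: java.util.regex throws PatternSyntaxException.
        System.out.println(compilesNatively("(a)?(?(1)b|c)")); // false
        // Ordinary optional group plus lookahead: accepted.
        System.out.println(compilesNatively("(a)?(?=b)b"));     // true
    }
}
```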

Reference condition

Syntax: (?(n)yes-pattern|no-pattern)

When the condition is a reference to a group, the yes-pattern is attempted if that group matched; otherwise, the no-pattern (if specified) is tried.

Note: if n is negative, the reference is a relative reference.

Relative reference condition

A negative number for n in a reference condition results in a relative reference: -1 refers to the most recently opened group before the condition, -2 to the one opened before that, and so on.

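Relative references are useful in long patterns and in patterns that serve as "building blocks" for more complex patterns, because they keep pointing at the right group even when the fragment is embedded after other capturing groups.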

For example, in the regex (abc(def)ghi)(?(-1)1|2), the condition refers to the second group (def), and is equivalent to (?(2)1|2).
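Resolving a relative reference amounts to counting the capturing groups opened before the condition. The helper below is a hypothetical sketch of that counting, not RegExPlus's actual code; it assumes Java-style syntax, treats "(" not followed by "?" and named groups "(?<name>" / "(?'name'" as capturing, and does not handle nested character classes:

```java
// Hypothetical helper (not part of RegExPlus): converts a relative reference n < 0
// in a condition starting at conditionIndex into an absolute group number.
final class RelativeReference {
    static int resolve(String pattern, int conditionIndex, int n) {
        if (n >= 0) {
            throw new IllegalArgumentException("relative references are negative: " + n);
        }
        int opened = 0;          // capturing groups opened before the condition
        boolean inClass = false; // inside [...]; nested classes are not handled
        for (int i = 0; i < conditionIndex; i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i++;             // skip the escaped character, e.g. \( or \[
            } else if (inClass) {
                if (c == ']') inClass = false;
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(' && startsCapturingGroup(pattern, i)) {
                opened++;
            }
        }
        int absolute = opened + n + 1; // -1 is the most recently opened group
        if (absolute < 1) {
            throw new IllegalArgumentException("no group for relative reference " + n);
        }
        return absolute;
    }

    // True for "(" not followed by "?", and for named groups "(?<name>" or "(?'name'".
    // Lookbehinds "(?<=" / "(?<!" and other "(?" constructs do not capture.
    private static boolean startsCapturingGroup(String p, int i) {
        if (i + 1 >= p.length() || p.charAt(i + 1) != '?') return true;
        if (i + 2 >= p.length()) return false;
        char kind = p.charAt(i + 2);
        if (kind == '\'') return true;
        return kind == '<' && i + 3 < p.length() && Character.isLetter(p.charAt(i + 3));
    }

    public static void main(String[] args) {
        String regex = "(abc(def)ghi)(?(-1)1|2)";
        int at = regex.indexOf("(?(");
        System.out.println(resolve(regex, at, -1)); // 2, i.e. (?(2)1|2)
        System.out.println(resolve(regex, at, -2)); // 1
    }
}
```

Counting opening parentheses rather than closing ones is what makes -1 refer to (def) here, even though the outer group closes later.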

Reference condition (example)

An example of a reference condition is the regex (a)?(?(1)b|c), which matches only ab or c.

For ab, group 1 matches the a. Since the condition is true, the yes-pattern is attempted, and it matches the b, yielding the match ab.

For c, group 1 doesn't match. Since the condition is false, the no-pattern is attempted, and it matches the c, yielding the match c.
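The same behaviour can be emulated by hand in java.util.regex with the empty-capture "flag" trick that also appears in the refactored form at the end of this page: an empty group () is captured only when (a) matched, and a backreference to it succeeds (consuming nothing) only if it was set, because in Java a backreference to a group that did not participate fails. This is a hand-written sketch, not necessarily what RegExPlus generates for this pattern; the class name is hypothetical:

```java
import java.util.regex.Pattern;

// Hand-written emulation of (a)?(?(1)b|c) for java.util.regex.
// Group 2 is an empty capture used as a flag: it is set only when (a) matched.
final class ReferenceConditionDemo {
    static final Pattern EMULATED = Pattern.compile("(?:(a)())?(?:(?=\\2)b|(?!\\2)c)");

    static boolean matchesWhole(String s) {
        return EMULATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Expected: ab -> true, c -> true, ac -> false, b -> false
        for (String s : new String[] {"ab", "c", "ac", "b"}) {
            System.out.println(s + " -> " + matchesWhole(s));
        }
    }
}
```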

Named reference condition

Syntax: (?(name)yes-pattern|no-pattern)

The name can optionally be surrounded by single quotes ('name') or angle brackets (<name>).

When the condition is a reference to a named group, the yes-pattern is attempted if that group matched; otherwise, the no-pattern (if specified) is tried.
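Since all three spellings name the same group, a parser only has to strip one matching pair of delimiters. The sketch below is hypothetical (not RegExPlus's parser) and assumes Java's group-name rules, a letter followed by letters or digits; a real parser would check for a numeric reference first, since "1" is rejected here:

```java
// Hypothetical helper (not RegExPlus's parser): extracts the group name from the
// condition text of (?(name)...), accepting name, 'name' or <name>.
final class ConditionName {
    static String parse(String condition) {
        String name = condition;
        if (name.length() >= 2) {
            char first = name.charAt(0);
            char last = name.charAt(name.length() - 1);
            boolean quoted = first == '\'' && last == '\'';
            boolean bracketed = first == '<' && last == '>';
            if (quoted || bracketed) {
                name = name.substring(1, name.length() - 1); // strip one matching pair
            }
        }
        // Assumption: names follow java.util.regex rules (a letter, then letters or digits).
        if (!name.matches("[A-Za-z][A-Za-z0-9]*")) {
            throw new IllegalArgumentException("not a group name: " + condition);
        }
        return name;
    }

    public static void main(String[] args) {
        for (String condition : new String[] {"group", "'group'", "<group>"}) {
            System.out.println(condition + " -> " + parse(condition)); // each gives "group"
        }
    }
}
```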

Named reference condition (example)

An example of a named reference condition is the regex (?<group>a)?(?('group')b|c), which matches only ab or c.

For ab, the group named group matches the a. Since the condition is true, the yes-pattern is attempted, and it matches the b, yielding the match ab.

For c, the group named group doesn't match. Since the condition is false, the no-pattern is attempted, and it matches the c, yielding the match c.

Assert condition

Syntax: (?(assert)yes-pattern|no-pattern)

When the condition is an assertion, the yes-pattern is attempted if the assertion is true; otherwise, the no-pattern (if specified) is tried.

Assert condition (example)

Suppose you want a pattern to match one thing if the previous character is a digit, and something else otherwise (that is, if the previous character is a non-digit or there is no previous character).

The regex (?(?<=\d)a|b) meets this need: the yes-pattern a is attempted if the previous character is a digit; otherwise, the no-pattern b is tried.

It will match, for example, the a in 1a, but it won't match 1b, since the b has a digit before it.
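For comparison, a condition whose test is a lookaround can also be written for java.util.regex as a plain alternation, with the positive assertion guarding the yes-pattern and the matching negative assertion guarding the no-pattern: (?:(?<=\d)a|(?<!\d)b). This is a hand-written equivalent, not the form RegExPlus generates (that form uses a capture as a flag); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Equivalent of (?(?<=\d)a|b) built from plain lookarounds:
// "a" only when a digit precedes, "b" only when no digit precedes.
final class AssertConditionDemo {
    static final Pattern EMULATED = Pattern.compile("(?:(?<=\\d)a|(?<!\\d)b)");

    // Returns the first match in s, or null if there is none.
    static String firstMatch(String s) {
        Matcher m = EMULATED.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Expected: 1a -> a, 1b -> null, xb -> b, b -> b
        for (String s : new String[] {"1a", "1b", "xb", "b"}) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

If the yes-pattern fails after the assertion succeeds, the second branch cannot rescue the match, because its negative assertion is false; that is exactly the "if/else" behaviour of the conditional.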

Assert condition repetition (limited support)

Applying a repetition to a conditional group with an assert condition is only partially supported.

Once the assertion has been true in one repetition, the pattern fails to match if the assertion is false in a later repetition.

For example, (?(?=\d)\d|\D){3} should match any three characters, but it (incorrectly) doesn't match 1aa.
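Written with lookaround alternation instead, the assertion is evaluated afresh in every repetition, so java.util.regex matches 1aa as expected. This is a hand-written sketch that sidesteps the limitation for this particular pattern, not a change to RegExPlus; the class name is hypothetical:

```java
import java.util.regex.Pattern;

// (?(?=\d)\d|\D){3} rewritten as lookaround alternation: each of the three
// iterations tests (?=\d) / (?!\d) again, so no state carries over between them.
final class RepeatedAssertConditionDemo {
    static final Pattern EMULATED = Pattern.compile("(?:(?=\\d)\\d|(?!\\d)\\D){3}");

    static boolean matchesWhole(String s) {
        return EMULATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Expected: 1aa, a1a, 123 -> true; 1a -> false (only two characters)
        for (String s : new String[] {"1aa", "a1a", "123", "1a"}) {
            System.out.println(s + " -> " + matchesWhole(s));
        }
    }
}
```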

Note: I'm working on a way to remove this limitation. Single-iteration use (which should be the most common case) works, except for the Java bug described below.

Assert condition single iteration Java bug

This bug occurs when the assertion succeeds at the current start position but the rest of the pattern fails there, and the pattern should instead match once the start position is advanced.

For example, the regex (?(?=[^a-z]*[a-z])aa|1) should match the 1 in a1.

In the above pattern, the assertion succeeds at the a, but the yes-pattern aa doesn't match the string there. However, when the regex engine advances the "cursor" and starts at the 1, the assertion fails, so the no-pattern should match the 1, but (in Java) it doesn't.


Implementation note: RegExPlus refactors the input pattern into an equivalent form that the native Java regex engine can run. The refactoring for the above pattern works in PCRE, .NET and Ruby, but not in Java, so this is a Java bug, not a RegExPlus bug. I'm currently looking for a workaround.

The refactored form

For those interested, the regex

(?(?=[^a-z]*[a-z])aa|1)

refactors as

(?:(?>(?=[^a-z]*[a-z])())?(?:(?=\1)aa|(?!\1)1))


This refactored form is functionally equivalent to the original regular expression and correctly matches the 1 in a1 in PCRE, .NET and Ruby, but sadly not in Java.
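To check this on your own JDK, the sketch below runs the refactored pattern next to a hand-written alternative that repeats the assertion, negated, in front of the no-pattern: (?:(?=[^a-z]*[a-z])aa|(?![^a-z]*[a-z])1). The alternative does not rely on the state of a capturing group, at the cost of evaluating the assertion twice and duplicating its text. It is offered as a possible direction under that assumption, not something RegExPlus currently does; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavaBugComparison {
    // The refactored form quoted above for (?(?=[^a-z]*[a-z])aa|1).
    static final String REFACTORED = "(?:(?>(?=[^a-z]*[a-z])())?(?:(?=\\1)aa|(?!\\1)1))";
    // Hand-written alternative: the assertion guards the yes-pattern, its negation the no-pattern.
    static final String LOOKAROUND_ALTERNATION = "(?:(?=[^a-z]*[a-z])aa|(?![^a-z]*[a-z])1)";

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The text above reports that Java finds no match here; the result may depend on the JDK.
        System.out.println("refactored:  " + firstMatch(REFACTORED, "a1"));
        // Expected: 1
        System.out.println("alternation: " + firstMatch(LOOKAROUND_ALTERNATION, "a1"));
    }
}
```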
