Conditionals
Conditionals allow part of a pattern to be attempted only when a specific condition holds:
Syntax: (?(condition)yes-pattern)
They can also choose between two alternatives, depending on whether the condition is true or false:
Syntax: (?(condition)yes-pattern|no-pattern)
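As a rough illustration of the yes-pattern-only form, the sketch below uses Python's built-in re module, which happens to accept the same (?(n)yes-pattern) syntax for numbered groups. The pattern and the sample strings are made up for this example and are not taken from RegExPlus.

```python
import re

# Group 1 optionally captures an opening parenthesis.
# The conditional requires a closing parenthesis only if group 1 matched;
# with no no-pattern, nothing extra is required when group 1 did not match.
pattern = re.compile(r'(\()?\d+(?(1)\))')

samples = ['12', '(12)', '(12', '12)']
results = {text: bool(pattern.fullmatch(text)) for text in samples}

for text, matched in results.items():
    print(f'{text!r}: {matched}')
```

Only the balanced inputs are accepted in full: a number with both parentheses or with neither.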
Reference condition
Syntax: (?(n)yes-pattern|no-pattern)
When using a reference to a group as the condition, if the group matched, then the yes-pattern is attempted; otherwise, (if specified) the no-pattern is tried.
Note: if n is negative, the reference is a relative reference.
Relative reference condition
A negative number for n in a reference condition results in a relative reference, with -1 referring to the most recent group, -2 to the second most recent, etc.
Relative references are useful in long patterns and patterns that are "building blocks" for more complex patterns.
For example, in the regex (abc(def)ghi)(?(-1)1|2), the condition refers to the second group (def), and is equivalent to (?(2)1|2).
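The arithmetic behind a relative reference can be sketched in a few lines of Python. The helper name resolve_relative_conditions is hypothetical, and this is not how RegExPlus resolves references internally. It is deliberately simplified: it counts only plain, unnamed capturing groups and does not handle character classes, escaped backslashes, or named groups.

```python
import re

# Matches an unescaped "(" that opens a plain capturing group:
# not preceded by "\" or "(?" (a condition), and not followed by "?".
PLAIN_GROUP = re.compile(r'(?<!\\)(?<!\(\?)\((?!\?)')

# Matches the start of a conditional with a negative (relative) reference.
RELATIVE_CONDITION = re.compile(r'\(\?\(-(\d+)\)')


def resolve_relative_conditions(pattern: str) -> str:
    """Rewrite (?(-n)...) as (?(m)...), where m is the absolute group number."""
    def replace(match: re.Match) -> str:
        # Groups opened before the conditional; -1 is the most recent of them.
        groups_so_far = len(PLAIN_GROUP.findall(pattern[:match.start()]))
        absolute = groups_so_far + 1 - int(match.group(1))
        return f'(?({absolute})'

    return RELATIVE_CONDITION.sub(replace, pattern)


resolved = resolve_relative_conditions('(abc(def)ghi)(?(-1)1|2)')
print(resolved)
```

Two groups are opened before the conditional, so -1 resolves to group 2, which agrees with the equivalence stated above.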
Reference condition (example)
An example of a reference condition is the regex
(a)?(?(1)b|c)
which matches only ab or c.
For ab, group 1 matches the a. Since the condition is true, the yes-pattern is attempted, and it matches the b, yielding the match ab.
For c, group 1 doesn't match. Since the condition is false, the no-pattern is attempted, and it matches the c, yielding the match c.
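The walkthrough above can be checked with Python's built-in re module, which accepts this exact pattern. This is only a sketch in a different engine; the sample strings other than ab and c are added here for contrast.

```python
import re

pattern = re.compile(r'(a)?(?(1)b|c)')

# Whole-string matches: only "ab" and "c" should be accepted.
samples = ['ab', 'c', 'ac', 'b']
results = {text: bool(pattern.fullmatch(text)) for text in samples}

# Group 1 participates in the match for "ab" but not for "c".
group_for_ab = pattern.fullmatch('ab').group(1)
group_for_c = pattern.fullmatch('c').group(1)

print(results)
print(group_for_ab, group_for_c)
```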
Named reference condition
Syntax: (?(name)yes-pattern|no-pattern)
Name can be optionally surrounded by quotes ('name') or angle brackets (<name>).
When using a reference to a group as the condition, if the group matched, then the yes-pattern is attempted; otherwise, (if specified) the no-pattern is tried.
Named reference condition (example)
An example of a named reference condition is the regex
(?<group>a)?(?('group')b|c)
which matches only ab or c.
For ab, the group named group matches the a. Since the condition is true, the yes-pattern is attempted, and it matches the b, yielding the match ab.
For c, the group named group doesn't match. Since the condition is false, the no-pattern is attempted, and it matches the c, yielding the match c.
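For comparison, here is a sketch of the same idea in Python's built-in re module. Python spells the pieces differently from the syntax documented above: the group is declared with (?P<group>...) and the condition is written with the bare name, (?(group)...). The quoted and angle-bracket forms are not used here.

```python
import re

# Python spelling of the named reference condition:
# (?P<group>a) declares the group, (?(group)b|c) tests whether it matched.
pattern = re.compile(r'(?P<group>a)?(?(group)b|c)')

samples = ['ab', 'c', 'b']
results = {text: bool(pattern.fullmatch(text)) for text in samples}

# The named group is set for "ab" and unset for "c".
named_for_ab = pattern.fullmatch('ab').group('group')
named_for_c = pattern.fullmatch('c').group('group')

print(results)
print(named_for_ab, named_for_c)
```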
Assert condition
Syntax: (?(assert)yes-pattern|no-pattern)
When using an assertion as the condition, if the assertion is true, then the yes-pattern is attempted; otherwise, (if specified) the no-pattern is tried.
Assert condition (example)
An example of an assert condition would be if you wanted a pattern to perform one match if the previous character is a digit, and another match otherwise (that is, if the previous character is a non-digit or if there is no previous character).
The regex (?(?<=\d)a|b) meets this need. In this case the first branch is attempted if the previous character is a digit; otherwise, it will try to match the second branch.
It will match, for example, the a in 1a, but it won't match 1b: since the b has a digit before it, the condition is true there and only the first branch (a) is attempted.
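Python's re module does not accept an assertion as a condition, so the sketch below emulates the behavior with a common equivalence: "assertion, then yes-pattern" or "negated assertion, then no-pattern". This is an illustration of the semantics only and is not the form RegExPlus produces. The helper name first_match is made up for the example.

```python
import re

# Emulation of (?(?<=\d)a|b):
#   digit behind the cursor     -> try "a"
#   no digit behind the cursor  -> try "b"
emulated = re.compile(r'(?:(?<=\d)a|(?<!\d)b)')


def first_match(text):
    """Return (start index, matched text) of the first match, or None."""
    m = emulated.search(text)
    return None if m is None else (m.start(), m.group())


for text in ['1a', '1b', 'b', 'xb', 'xa']:
    print(f'{text!r}: {first_match(text)}')
```

The a is found only after a digit, and the b only when no digit precedes it (including at the very start of the string).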
Assert condition repetition (limited support)
Applying a repetition to a conditional that uses an assert condition is only partially supported.
Once the assertion has been true on one iteration, a later iteration on which the assertion is false won't match.
For example, (?(?=\d)\d|\D){3} should match any three characters, but it (incorrectly) doesn't match 1aa.
Note: I'm working on a way to remove this limitation. Single iteration use (which should be the most common) is functional (except for the Java bug mentioned below).
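To make the intended result of the repeated pattern concrete, the following sketch writes it in Python as an alternation of lookaheads, since Python's re module has no assert conditions. It shows the expected semantics only; it says nothing about how RegExPlus might lift the limitation.

```python
import re

# Intended meaning of (?(?=\d)\d|\D){3}:
# on each of the three iterations, consume a digit if one is next,
# otherwise consume a non-digit.
emulated = re.compile(r'(?:(?=\d)\d|(?!\d)\D){3}')

samples = ['1aa', 'a1a', '123', 'abc', 'ab']
results = {text: bool(emulated.fullmatch(text)) for text in samples}
print(results)
```

Every three-character string is accepted, including 1aa, where the assertion is true on the first iteration and false on the next two.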
Assert condition single iteration Java bug
This bug occurs when the assertion succeeds at a given start position but the rest of the pattern fails to match there, while a match does exist once the start position is advanced.
For example, the regex (?(?=[^a-z]*[a-z])aa|1) should match the 1 in a1.
In the above pattern, the assertion matches, but the string doesn't match if you start with the a. However, when the RegEx engine advances the "cursor" and starts with the 1, the assertion fails, and the 1 should match, but (in Java) it doesn't.
Implementation note: RegExPlus refactors the input pattern to an equivalent form, usable by the native Java RegEx engine. The refactoring for the above pattern works in PCRE, .NET and Ruby, but not in Java, so this is a Java bug, not a RegExPlus bug. I'm currently looking for a workaround.
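The expected outcome for a1 can be sketched in Python using an alternation of lookaheads in place of the conditional. This is a different rewrite from the one RegExPlus generates and it does not reproduce the Java bug; it only shows which match the pattern is supposed to find. The helper name first_match is made up for the example.

```python
import re

# Emulation of (?(?=[^a-z]*[a-z])aa|1):
#   a lowercase letter lies ahead     -> try "aa"
#   no lowercase letter lies ahead    -> try "1"
emulated = re.compile(r'(?:(?=[^a-z]*[a-z])aa|(?![^a-z]*[a-z])1)')


def first_match(text):
    """Return (start index, matched text) of the first match, or None."""
    m = emulated.search(text)
    return None if m is None else (m.start(), m.group())


for text in ['a1', 'aa', '1a']:
    print(f'{text!r}: {first_match(text)}')
```

For a1, the attempt starting at the a fails, and the attempt starting at the 1 succeeds because no lowercase letter remains ahead of it.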
The refactored form
For those that are interested, the regex
(?(?=[^a-z]*[a-z])aa|1)
refactors as
(?:(?>(?=[^a-z]*[a-z])())?(?:(?=\1)aa|(?!\1)1))
This refactored form is functionally equivalent to the original regular expression, and correctly matches the 1 in a1 in PCRE, .NET, and Ruby, but sadly, not in Java.