RegExPlus

Back References

Back references provide a way to match previously captured text.

You can specify a back reference by number, by name, or by group.

By number

\n - reference by number ("reluctant")

Quoted from the Java API for the Pattern class

In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.

For example, if \123 is the back reference, but only 11 (or fewer) groups exist at the current point in the regular expression, then the group (n) would be 1 (since group 12 hasn't occurred yet), and the 23 is treated as literal digits.
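RegExPlus is a Java library, so the digit-dropping rule can be seen directly in plain java.util.regex. The sketch below reproduces the \123 example, first with 11 groups in front of the reference and then with 12. The class `DigitDroppingDemo` and its `groups` helper are illustrative names only.

```java
import java.util.regex.Pattern;

final class DigitDroppingDemo {
    // Builds "(a)" followed by (count - 1) empty groups, so `count` groups exist in total.
    static String groups(int count) {
        StringBuilder sb = new StringBuilder("(a)");
        for (int i = 1; i < count; i++) {
            sb.append("()");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 11 groups: \123 is read as \1 (group 1 = "a") followed by the literal "23".
        System.out.println(Pattern.matches(groups(11) + "\\123", "aa23")); // true

        // 12 groups: \123 is read as \12 (an empty group) followed by the literal "3".
        System.out.println(Pattern.matches(groups(12) + "\\123", "a3"));   // true
    }
}
```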

If the PERL_OCTAL flag is set, the above "back reference" is instead treated as the octal character \123.

\gn - reference by number ("greedy")

When using this syntax, the group number (n) is the entire number specified as the back reference.

For example, if \g123 is the back reference, the group number is 123, regardless of how many groups exist at the current point in the regular expression or in the pattern as a whole.

Note: in some cases, the group may refer to a non-existent group or be a forward reference (a reference to a group which hasn't yet occurred).


If n is negative, the back reference is a relative reference.
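The difference between the "reluctant" \n rule and the "greedy" \gn rule can be sketched as two small parsing functions. These are hypothetical helpers written for illustration, not the RegExPlus parser. The reluctant one follows the java.util.regex rule quoted earlier, and both assume the digit string starts with 1 through 9.

```java
final class ReferenceDigits {
    // Hypothetical result type: the referenced group plus any digits left over as literal text.
    static final class Parsed {
        final int group;
        final String literal;

        Parsed(int group, String literal) {
            this.group = group;
            this.literal = literal;
        }
    }

    // "Reluctant" \n rule: keep appending digits only while the number does not
    // exceed the count of groups that exist at this point in the pattern.
    static Parsed reluctant(String digits, int groupsSoFar) {
        int group = digits.charAt(0) - '0';
        int i = 1;
        while (i < digits.length()) {
            int next = group * 10 + (digits.charAt(i) - '0');
            if (next > groupsSoFar) {
                break;
            }
            group = next;
            i++;
        }
        return new Parsed(group, digits.substring(i));
    }

    // "Greedy" \gn rule: the whole number is the group, whether or not that group exists.
    // A leading '-' (relative reference) is simply parsed as part of the number.
    static Parsed greedy(String digits) {
        return new Parsed(Integer.parseInt(digits), "");
    }

    public static void main(String[] args) {
        Parsed r = reluctant("123", 11);
        System.out.println(r.group + " + literal " + r.literal); // 1 + literal 23
        System.out.println(greedy("123").group);                 // 123
    }
}
```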

\g{n} - reference by number ("exact")

When using this syntax, the group number (n) is the number specified between the curly brackets.

For example, if \g{12}3 is the back reference, the group would be 12, and the 3 is treated as a literal digit.

Note: in some cases, the group may refer to a non-existent group or be a forward reference (a reference to a group which hasn't yet occurred).


If n is negative, the back reference is a relative reference.
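Plain java.util.regex has no delimited numeric back reference, so it cannot spell \g{1}0 directly. One workaround is to end the digit run with an empty non-capturing group, (?:), placed before the literal digit. It is shown here for illustration; whether RegExPlus translates exact references this way is an assumption. The pattern below has 10 groups, so \10 on its own would be read as group 10. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class ExactReferenceDemo {
    // "(a)" followed by nine empty groups: ten groups in total.
    static final String TEN_GROUPS = "(a)" + "()()()" + "()()()" + "()()()";

    public static void main(String[] args) {
        // Java reads \10 as group 10 (empty) here, so only "a" is consumed.
        System.out.println(Pattern.matches(TEN_GROUPS + "\\10", "a"));       // true
        // \1(?:)0 stops the digit run: group 1 ("a") followed by the literal "0".
        System.out.println(Pattern.matches(TEN_GROUPS + "\\1(?:)0", "aa0")); // true
    }
}
```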

Relative reference

A negative number for n in the \gn and \g{n} syntax results in a relative reference, with -1 referring to the most recent group, -2 to the second most recent, etc.

Relative references are useful in long patterns and patterns that are "building blocks" for more complex patterns.

For example, in the regex (abc(def)ghi)\g{-1}, the back reference \g{-1} is to the second group (def), and is equivalent to \2.
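A relative reference can be turned into an absolute group number by counting the capturing groups opened before the reference. The helper `toAbsolute` below is a hypothetical sketch of that arithmetic, not the RegExPlus implementation. The resolved number is then used in a plain java.util.regex pattern equivalent to the example above.

```java
import java.util.regex.Pattern;

final class RelativeReferenceDemo {
    // Hypothetical helper: -1 is the most recently opened group, -2 the one before, etc.
    static int toAbsolute(int relative, int groupsOpenedSoFar) {
        if (relative >= 0) {
            throw new IllegalArgumentException("relative references are negative");
        }
        return groupsOpenedSoFar + relative + 1;
    }

    public static void main(String[] args) {
        // In (abc(def)ghi)\g{-1}, two groups have been opened before the reference.
        int group = toAbsolute(-1, 2); // 2
        // Plain Java equivalent using the absolute number: (abc(def)ghi)\2
        System.out.println(Pattern.matches("(abc(def)ghi)\\" + group, "abcdefghidef")); // true
    }
}
```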

Octal characters (ambiguity)

Quoted from the Java API for the Pattern class

In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero.

Since RegExPlus is a Java library, it follows this Java behavior by default: octal escapes must begin with a zero.

To instead use Perl's octal syntax in a pattern (as described above), specify the PERL_OCTAL flag.
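The two octal conventions can be compared using plain java.util.regex, which implements the Java behavior quoted above. The class name is illustrative. Plain Java has no PERL_OCTAL flag, so the Perl-style result is described only in a comment.

```java
import java.util.regex.Pattern;

final class OctalEscapeDemo {
    public static void main(String[] args) {
        // Java: an octal escape starts with a zero; \0123 is octal 123 = 83 = 'S'.
        System.out.println(Pattern.matches("\\0123", "S"));      // true

        // Java: with one group, \123 is \1 followed by the literal "23".
        System.out.println(Pattern.matches("(x)\\123", "xx23")); // true

        // Under Perl's rules (selected by PERL_OCTAL, per the text above) the same
        // \123 would be the octal character 'S'; plain Java does not match "xS".
        System.out.println(Pattern.matches("(x)\\123", "xS"));   // false
    }
}
```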

By name

With the addition of named groups comes the ability to reference a group by name. A reference by name refers to the first matched group with the specified group name.


For example, the regex below matches a two-digit number followed by its first digit:

(?<digit>[12])[34]\k<digit>

and matches 131, 141, 232, and 242.
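java.util.regex (Java 7 and later) supports named groups and the \k<name> syntax natively, so this particular example can be tried without RegExPlus. The class name below is illustrative.

```java
import java.util.regex.Pattern;

final class NamedReferenceDemo {
    // The same pattern as above; plain Java accepts (?<name>...) and \k<name>.
    static final Pattern FIRST_DIGIT_AGAIN = Pattern.compile("(?<digit>[12])[34]\\k<digit>");

    public static void main(String[] args) {
        for (String s : new String[] {"131", "141", "232", "242", "132"}) {
            System.out.println(s + " -> " + FIRST_DIGIT_AGAIN.matcher(s).matches());
        }
    }
}
```

The first four inputs match. 132 does not, because its last digit differs from its first.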

Similarly, the regex below matches the name of a day followed by its abbreviation:

(?J:(?<Day>Sat)urday|(?<Day>Sun)day)\k<Day>

and matches both SaturdaySat and SundaySun.


Note: the J embedded flag (written here as (?J:...)) allows duplicate group names.
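Plain java.util.regex rejects duplicate group names outright and has no J flag. The sketch below shows the PatternSyntaxException it raises, followed by a plain-Java workaround that gives each branch its own name. The class name is illustrative.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class DuplicateNameDemo {
    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Plain Java refuses a second group named "Day".
        System.out.println(compiles("(?<Day>Sat)urday|(?<Day>Sun)day")); // false

        // Without duplicate names, each branch needs its own name and reference.
        Pattern p = Pattern.compile("(?<Sat>Sat)urday\\k<Sat>|(?<Sun>Sun)day\\k<Sun>");
        System.out.println(p.matcher("SaturdaySat").matches()); // true
        System.out.println(p.matcher("SundaySun").matches());   // true
    }
}
```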

By name syntaxes

For a by name reference, the supported syntaxes are the same ones supported by PCRE.

  • \k<name>
  • \k'name'
  • \g{name}
  • \k{name}
  • (?P=name)

Note: the different syntaxes function in exactly the same way.
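java.util.regex natively understands only \k<name>, so one way to picture the other syntaxes is as aliases that rewrite to \k<name>. The `normalize` method below is a hypothetical, simplified sketch. It ignores escaped backslashes, character classes, and numeric \g{n} references, and RegExPlus is not necessarily implemented this way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameReferenceSyntaxes {
    // Matches \k'name', \k{name} or \g{name}, or (?P=name); one capture group per alternative.
    // Names must start with a letter, so numeric \g{n} references are left untouched.
    private static final Pattern ALT_SYNTAX = Pattern.compile(
            "\\\\k'([A-Za-z][A-Za-z0-9]*)'"
            + "|\\\\[kg]\\{([A-Za-z][A-Za-z0-9]*)\\}"
            + "|\\(\\?P=([A-Za-z][A-Za-z0-9]*)\\)");

    // Hypothetical helper: rewrites every alternative by-name syntax to Java's \k<name>.
    static String normalize(String regex) {
        Matcher m = ALT_SYNTAX.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1)
                    : m.group(2) != null ? m.group(2)
                    : m.group(3);
            m.appendReplacement(out, Matcher.quoteReplacement("\\k<" + name + ">"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] forms = {"\\k'd'", "\\k{d}", "\\g{d}", "(?P=d)"};
        for (String form : forms) {
            String converted = normalize("(?<d>[12])" + form);
            System.out.println(converted + " matches 22: " + Pattern.matches(converted, "22"));
        }
    }
}
```

All four forms produce the same Java pattern, consistent with the note that the syntaxes behave identically.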

By group

Referencing by group extends a named reference by adding an occurrence.

Syntax: use one of the by name syntaxes with the name being groupName[occurrence].


The occurrence value starts at 1 and is similar to an array index. It selects a specific group, which is useful when multiple groups share the same name.

For example, group[1] is the first occurrence of group, and group[2] is the second.

Note: if occurrence is greater than the group count for the group name, the reference is to a non-existent group.
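Resolving a positive occurrence amounts to indexing, starting at 1, into the list of group numbers that share a name. The `resolve` helper below is a hypothetical sketch. It assumes the caller already knows which group numbers carry the name, and it returns -1 for an occurrence that does not exist.

```java
final class OccurrenceResolver {
    // Hypothetical helper: groupsWithName holds the numbers of the groups sharing one
    // name, in pattern order. Occurrence 1 is the first of them, 2 the second, etc.
    // Returns -1 when the occurrence exceeds the group count for the name.
    static int resolve(int[] groupsWithName, int occurrence) {
        if (occurrence < 1) {
            // Zero and negative occurrences have different meanings; not covered here.
            throw new IllegalArgumentException("occurrence must be >= 1");
        }
        return occurrence <= groupsWithName.length ? groupsWithName[occurrence - 1] : -1;
    }

    public static void main(String[] args) {
        // For example, in (?J)(?<group>a)(b)(?<group>c), "group" names groups 1 and 3.
        int[] group = {1, 3};
        System.out.println(resolve(group, 1)); // 1
        System.out.println(resolve(group, 2)); // 3
        System.out.println(resolve(group, 3)); // -1: no third occurrence
    }
}
```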

Relative Occurrence

If occurrence is less than 0, it is a relative reference, with -1 being the most recent occurrence, -2 being the second most recent, etc.

For example, in the regex (?J)(?<group>1)(?<group>2)\g{group[-1]}, group[-1] refers to the second occurrence of group, so the regex matches 122.
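Relative occurrences count back from the end of the same list of group numbers. In the hypothetical helper below, the list is assumed to hold only the same-named groups that appear before the reference. The resolved number is then plugged into a plain Java pattern equivalent to the example above.

```java
import java.util.regex.Pattern;

final class RelativeOccurrenceDemo {
    // Hypothetical helper: groupsWithName holds the numbers of the same-named groups
    // appearing before the reference (an assumption), in pattern order.
    // Occurrence -1 is the most recent, -2 the one before it, and so on.
    // Returns -1 if there is no such occurrence.
    static int resolveRelative(int[] groupsWithName, int occurrence) {
        if (occurrence >= 0) {
            throw new IllegalArgumentException("relative occurrences are negative");
        }
        int index = groupsWithName.length + occurrence;
        return index >= 0 ? groupsWithName[index] : -1;
    }

    public static void main(String[] args) {
        // (?J)(?<group>1)(?<group>2)\g{group[-1]}: "group" names groups 1 and 2.
        int target = resolveRelative(new int[] {1, 2}, -1); // 2
        // Plain java.util.regex equivalent once the number is known: (1)(2)\2
        System.out.println(Pattern.matches("(1)(2)\\" + target, "122")); // true
    }
}
```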

Omitting the occurrence

If the occurrence is zero or omitted, the reference behaves exactly like a by name reference: it refers to the first matched group with the given group name.


For example, (1)\g{[1][0]} is the same as (1)\g{[1]} (which is the same as (1)\g{1}).

And for named groups, (?'name'1)\g{name[0]} is the same as (?'name'1)\g{name}.
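"The first matched group with the given name" is decided at match time, not when the pattern is compiled. The sketch below imitates that rule with plain java.util.regex: numbered groups stand in for the duplicated name, and the hypothetical helper `firstMatched` returns the text of the first one that participated in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstMatchedGroup {
    // Hypothetical helper: among the group numbers sharing one name, return the text
    // of the first group that actually participated in the match (null if none did).
    static String firstMatched(Matcher m, int... groupsWithName) {
        for (int g : groupsWithName) {
            String text = m.group(g);
            if (text != null) {
                return text;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Plain-Java stand-in for (?J:(?<Day>Sat)urday|(?<Day>Sun)day):
        // numbered groups 1 and 2 both play the role of "Day".
        Matcher m = Pattern.compile("(Sat)urday|(Sun)day").matcher("Sunday");
        if (m.matches()) {
            System.out.println(firstMatched(m, 1, 2)); // Sun
        }
    }
}
```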

By group - by number

The by group syntax allows referring to a group by number.

[groupNumber][occurrence]

In this case, the groupName is the group number surrounded by square brackets. The reference looks like an array index, with the name omitted.


This syntax is useful if you want to refer to a specific branch in a branch reset pattern.

For example, the regex

(?|(0x[[:xdigit:]]++)|(\d++)) (?([1][1])hex|dec)

will match a hexadecimal number followed by a space and the word hex, or a decimal number followed by a space and the word dec; matches include 0x123 hex and 456 dec.

Note: you can specify a relative reference by specifying a negative group number.
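Plain java.util.regex supports neither branch reset groups nor conditionals, and it writes the POSIX class [[:xdigit:]] as \p{XDigit}. The sketch below keeps the two branches as separate groups and moves the conditional into code, where "did group 1 participate?" plays the role of the [1][1] test. The names are illustrative, and this is not the translation RegExPlus performs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BranchCheckDemo {
    // Without branch reset, the hex branch is group 1 and the decimal branch is group 2.
    private static final Pattern NUMBER_THEN_WORD =
            Pattern.compile("(?:(0x\\p{XDigit}++)|(\\d++)) (hex|dec)");

    // True when the word matches the branch that matched the number.
    static boolean check(String input) {
        Matcher m = NUMBER_THEN_WORD.matcher(input);
        if (!m.matches()) {
            return false;
        }
        boolean firstBranch = m.group(1) != null; // plays the role of [1][1]
        return firstBranch == m.group(3).equals("hex");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0x123 hex", "456 dec", "456 hex", "0x123 dec"}) {
            System.out.println(s + " -> " + check(s));
        }
    }
}
```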

Forward references

A forward reference is a reference to a group that hasn't yet occurred (but does occur later in the pattern).

In Java 1.5, only forward references to the next group are valid. For example, the regex \1(1) is valid, but \2(1)(2) will throw a PatternSyntaxException.


Java 1.6+ supports forward references to groups 1 through 9; however, there is currently no way to make a forward reference to a group above 9.

If there is a forward reference to the 10th capturing group (or higher), a PatternSyntaxException is thrown to indicate the error.

For example, the pattern ()()()()()()()()()(?:\g10a|(a)) has a forward reference \g10, and will throw a PatternSyntaxException.
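Both parts of this rule can be seen with plain java.util.regex. The first pattern below uses a forward reference to group 1. The second shows that, with nine groups in front of it, Java itself parses \10 as group 1 followed by a literal 0, so plain Java cannot express a forward reference to group 10. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class ForwardReferenceDemo {
    public static void main(String[] args) {
        // Forward reference to group 1: the first iteration fails \1 (not yet set)
        // and captures "a"; the second iteration matches \1 ("a") followed by "b".
        System.out.println(Pattern.matches("(?:\\1b|(a))+", "aab")); // true

        // With nine groups before it, \10 is parsed as \1 (empty) plus the literal "0".
        String nineEmpty = "()()()" + "()()()" + "()()()";
        System.out.println(Pattern.matches(nineEmpty + "(?:\\10a|(a))", "0a")); // true
    }
}
```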

Nested references

A nested reference is a back reference inside the capturing group that it references, for example:

(\1two|(one))+


Nested references are fully supported by Java, even for groups 10 and up, for example:

()()()()()()()()()(\g10two|(one))+


Both of these RegExes match, for example, oneonetwo.
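The same two patterns can be checked with plain java.util.regex. The second is written with \10 instead of \g10, and Java reads it as group 10 here because that group is already open when the reference is parsed. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class NestedReferenceDemo {
    public static void main(String[] args) {
        // First iteration: \1 is unset, so (one) matches and group 1 becomes "one".
        // Second iteration: \1 matches the previous "one", followed by "two".
        System.out.println(Pattern.matches("(\\1two|(one))+", "oneonetwo")); // true

        // Group 10 is open when \10 is parsed, so Java reads it as group 10.
        String nineEmpty = "()()()" + "()()()" + "()()()";
        System.out.println(Pattern.matches(nineEmpty + "(\\10two|(one))+", "oneonetwo")); // true
    }
}
```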

Non-existent groups

By default, a back reference to a non-existent group does not throw an exception; if encountered, it always fails (resulting in backtracking).
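Plain java.util.regex behaves the same way for single-digit references: the pattern compiles, the reference always fails, and the matcher backtracks into another alternative. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class MissingGroupDemo {
    public static void main(String[] args) {
        // Only one group exists, yet the pattern compiles; \2 can never match.
        Pattern p = Pattern.compile("(?:(a)\\2|a)");
        // The first branch fails at \2, so the matcher backtracks into the second branch.
        System.out.println(p.matcher("a").matches()); // true
    }
}
```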

Verify groups flag

If the VERIFY_GROUPS flag is set when compiling a Pattern, a PatternSyntaxException will be thrown if a back reference refers to a non-existent group, whereas, by default, no exception is thrown.

By setting this flag, you can verify, at compile time, that all back references refer to existing groups.
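The kind of check that VERIFY_GROUPS performs can be sketched on top of plain java.util.regex by comparing each back reference against Matcher.groupCount(). The `compileVerified` method is hypothetical and deliberately simplified: it inspects only \1 through \9 and ignores escaped backslashes. It is not the RegExPlus implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class VerifyGroupsSketch {
    // A backslash followed by one digit 1-9 (multi-digit references are not handled).
    private static final Pattern SINGLE_DIGIT_REF = Pattern.compile("\\\\([1-9])");

    // Hypothetical helper: compiles the pattern, then rejects any \1..\9 reference
    // to a group number larger than the pattern's group count.
    static Pattern compileVerified(String regex) {
        Pattern compiled = Pattern.compile(regex);
        int groupCount = compiled.matcher("").groupCount();
        Matcher refs = SINGLE_DIGIT_REF.matcher(regex);
        while (refs.find()) {
            int group = Integer.parseInt(refs.group(1));
            if (group > groupCount) {
                throw new PatternSyntaxException(
                        "Back reference to non-existent group " + group, regex, refs.start());
            }
        }
        return compiled;
    }

    public static void main(String[] args) {
        System.out.println(compileVerified("(a)\\1").pattern());
        try {
            compileVerified("(a)\\2");
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
        }
    }
}
```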