Unicode
RegExPlus supports the same Unicode syntax as Java regular expressions, and adds some syntax found in other regex engines.
\xhh
Matches the character with hexadecimal value 0xhh
Note: if the first hexadecimal digit is zero, it can be omitted. For example, \x5 and \x05 are equivalent.
\x{hhh...}
Matches the character with hexadecimal value 0xhhh...
With this syntax, you can match characters specified with either the \xFF or \uFFFF syntax, as well as supplementary characters, e.g. \x{10000}.
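As a point of comparison, the brace form can be tried out with the standard java.util.regex package, which accepts \x{h...h} from Java 7 onward. This is a minimal sketch using plain Java rather than RegExPlus itself (whose class names are not shown in this section). The class HexEscapeSketch and the helper matchesWhole are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class HexEscapeSketch {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+10000 is a supplementary character: one code point, two Java chars.
        String supplementary = new String(Character.toChars(0x10000));

        // The brace form reaches beyond the range of \xFF and \uFFFF.
        System.out.println(matchesWhole("\\x{10000}", supplementary)); // true

        // The same character (U+00E0) written three ways.
        System.out.println(matchesWhole("\\xE0", "\u00E0"));    // true
        System.out.println(matchesWhole("\\u00E0", "\u00E0"));  // true
        System.out.println(matchesWhole("\\x{E0}", "\u00E0"));  // true
    }
}
```

The shortened forms such as \x5 and \uE0 are not exercised here, since they are described as RegExPlus additions and standard Java may reject them.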
\uhhhh
Matches the character with hexadecimal value 0xhhhh
Note: you can omit any leading zeros, e.g. \uE0 and \u00E0 are both valid; they both match à.
\X
\X matches a single grapheme. It is similar to . (dot), except that \X always matches a newline character.
It is equivalent to the regex (?>\P{M}\p{M}*), that is, one character that is not a combining mark, followed by zero or more combining marks.
In Unicode, some characters can be encoded in multiple ways. For example, à and à look the same, but the first is a single character (U+00E0), while the second is the letter a followed by ̀ (a combining grave accent, U+0300).
The regex ^\X$ would match both of the above representations, whereas ^.$ would only match the first.
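The comparison above can be checked with the standard java.util.regex package by using the expression given as equivalent to \X. This is a minimal sketch, assuming plain Java rather than the RegExPlus classes. The class GraphemeSketch and its helpers are hypothetical names, and matches() is used in place of the ^ and $ anchors to require a whole-input match.

```java
import java.util.regex.Pattern;

public class GraphemeSketch {
    // The expression the text gives as equivalent to \X.
    static final Pattern GRAPHEME = Pattern.compile("(?>\\P{M}\\p{M}*)");
    // A single dot, with default flags (no DOTALL).
    static final Pattern DOT = Pattern.compile(".");

    static boolean isOneGrapheme(String s) {
        return GRAPHEME.matcher(s).matches();
    }

    static boolean isOneDot(String s) {
        return DOT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String precomposed = "\u00E0";  // single character U+00E0
        String decomposed = "a\u0300";  // letter a + combining grave accent

        System.out.println(isOneGrapheme(precomposed)); // true
        System.out.println(isOneGrapheme(decomposed));  // true
        System.out.println(isOneDot(precomposed));      // true
        System.out.println(isOneDot(decomposed));       // false

        // A newline is not a combining mark, so the grapheme
        // expression matches it, while the default dot does not.
        System.out.println(isOneGrapheme("\n"));        // true
        System.out.println(isOneDot("\n"));             // false
    }
}
```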