Unicode

RegExPlus includes the same Unicode syntax available in Java regular expressions, and adds some syntax found in other regex engines.

Syntax

\xhh

\x{hhh..}

\uhhhh

\X - match a single grapheme

`\x`hh

Matches the character with hexadecimal value 0xhh

Note: if the first hexadecimal digit is zero, it can be omitted. For example, \x5 and \x05 are equivalent.

`\x{`hhh..`}`

Matches the character with hexadecimal value 0xhhh..

With this syntax, you can match characters specified with either the \xFF or \uFFFF syntax, as well as supplementary characters, e.g. \x{10000}.
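As a rough sketch of the two `\x` forms, the snippet below uses the standard `java.util.regex` classes rather than RegExPlus itself. That is an assumption made so the example runs on a plain JDK (7 or later for the braced form). The class name `HexEscapeDemo` and its `fullMatch` helper are hypothetical. Because `java.util.regex` requires exactly two digits after `\x`, the shortened `\x5` form described above is not shown.

```java
import java.util.regex.Pattern;

public class HexEscapeDemo {
    // Returns true when the whole input matches the given regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // \x41 is the character with hexadecimal value 0x41, the letter A.
        System.out.println(fullMatch("\\x41", "A"));          // true
        // The braced form accepts any number of hex digits.
        System.out.println(fullMatch("\\x{41}", "A"));        // true
        System.out.println(fullMatch("\\x{0041}", "A"));      // true
        // A supplementary character occupies two UTF-16 code units in Java.
        String supplementary = new String(Character.toChars(0x10000));
        System.out.println(fullMatch("\\x{10000}", supplementary)); // true
    }
}
```

The regex is written with a doubled backslash so that the escape reaches the regex engine instead of being interpreted by the Java compiler.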

`\u`hhhh

Matches the character with hexadecimal value 0xhhhh

Note: you can omit any leading zeros, e.g. \uE0 and \u00E0 are both valid - they match à.
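The following sketch shows the four-digit form matching à, again using plain `java.util.regex` as a stand-in for RegExPlus (an assumption; the class and method names are hypothetical). The standard engine insists on exactly four hex digits, so the shortened form with leading zeros omitted is specific to RegExPlus and is not exercised here.

```java
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {
    // Returns true when the whole input matches the given regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The letter a with grave accent, built from its code point 0xE0.
        String aGrave = new String(Character.toChars(0xE0));

        // Four-digit escape, handled by the regex engine.
        System.out.println(fullMatch("\\u00E0", aGrave));  // true
        // The same character written with the two \x forms.
        System.out.println(fullMatch("\\xE0", aGrave));    // true
        System.out.println(fullMatch("\\x{E0}", aGrave));  // true
    }
}
```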

`\X`

\X matches a single grapheme. It is similar to the dot (.), except that \X always matches a newline character.

It's equivalent to the regex (?>\P{M}\p{M}*), that is, one character that is not a combining mark, followed by zero or more combining marks.

In Unicode, some characters can be encoded in multiple ways. For example, à and à look the same, but the first is a single character (U+00E0), while the second is the letter a followed by ̀ (a combining grave accent, U+0300).

The regex ^\X$ would match both of the above representations, whereas ^.$ would only match the first.
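The difference can be checked with a small sketch. It uses the expanded pattern `(?>\P{M}\p{M}*)` with plain `java.util.regex`, which is an assumption made so the code runs on any JDK; the standard engine only gained `\X` itself in Java 9. The class name `GraphemeDemo` and the `fullMatch` helper are hypothetical.

```java
import java.util.regex.Pattern;

public class GraphemeDemo {
    // Expanded equivalent of \X: one non-mark character plus any combining marks.
    public static final String GRAPHEME = "^(?>\\P{M}\\p{M}*)$";

    // Returns true when the whole input matches the given regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Single precomposed character, code point 0xE0.
        String precomposed = new String(Character.toChars(0xE0));
        // Letter a followed by the combining grave accent, code point 0x300.
        String decomposed = "a" + new String(Character.toChars(0x300));

        System.out.println(fullMatch(GRAPHEME, precomposed)); // true
        System.out.println(fullMatch(GRAPHEME, decomposed));  // true
        System.out.println(fullMatch("^.$", precomposed));    // true
        System.out.println(fullMatch("^.$", decomposed));     // false
    }
}
```

The last call fails because the dot consumes only the letter a, leaving the combining accent unmatched before the end anchor.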
