regexMisleadingUnicodeCharacters
Reports characters in regex character classes that appear as single visual characters but are made of multiple code points.
✅ This rule is included in the ts logical and logicalStrict presets.
Some characters that appear as a single visual unit are actually composed of multiple Unicode code points, or of multiple UTF-16 code units. When these appear in regex character classes, each code point (or, without the u or v flag, each code unit) is matched separately, which is typically not the intended behavior.
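As a rough illustration of that separate matching, the sketch below compares the same character class with and without the u flag. The variable names are made up for this example and are not part of the rule.

```javascript
// Without the u flag, the class holds the two surrogate halves of 👍 as separate members.
const withoutFlag = /^[👍]$/;
// With the u flag, the class holds a single member: the code point U+1F44D.
const withFlag = /^[👍]$/u;

const thumbsUp = "\u{1F44D}"; // 👍, stored as the two code units "\uD83D\uDC4D"
const loneHighSurrogate = "\uD83D"; // only the first half of the pair

console.log(withoutFlag.test(thumbsUp)); // false: the class matches one code unit, the string has two
console.log(withoutFlag.test(loneHighSurrogate)); // true: a lone half is accepted
console.log(withFlag.test(thumbsUp)); // true
console.log(withFlag.test(loneHighSurrogate)); // false
```

The pattern without the flag accepts half a character and rejects the whole one, which is the kind of surprise this rule is meant to catch.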
This rule detects several types of multi-code-point characters in character classes:
- Surrogate pairs: Characters like 👍 that require two UTF-16 code units
- Combined characters: Base characters with combining marks like Á (A + combining accent)
- Emoji with modifiers: Emoji with skin tone modifiers like 👶🏻
- Regional indicator symbols: Flag emoji like 🇯🇵 (two regional indicators)
- ZWJ sequences: Characters joined with a zero-width joiner like 👨‍👩‍👦
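To see how many pieces each category above really contains, the following sketch prints the code unit count and the code points of one sample per category. The codePoints helper and the sample strings are assumptions chosen for illustration (the combining mark used here is U+0301, the acute accent); they are not taken from the rule's implementation.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// One sample per category, written with escapes so the parts are visible.
const samples = {
  surrogatePair: "\u{1F44D}", // thumbs up
  combined: "A\u0301", // A + combining acute accent
  skinTone: "\u{1F476}\u{1F3FB}", // baby + light skin tone modifier
  flag: "\u{1F1EF}\u{1F1F5}", // regional indicators J + P
  zwj: "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}", // man ZWJ woman ZWJ boy
};

for (const [name, text] of Object.entries(samples)) {
  // text.length counts UTF-16 code units; codePoints(text) counts code points.
  console.log(name, text.length, codePoints(text).join(" "));
}
```

Only the surrogate pair is a single code point, so the u flag alone is enough for it. The other four remain multiple code points even with the u flag.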
Examples
Incorrect:

// Surrogate pair without unicode flag
const pattern = /[👍]/;

// Combined character (A + combining accent)
const pattern = /[Á]/;

// Emoji with skin tone modifier
const pattern = /[👶🏻]/u;

// Regional indicator symbols (flag)
const pattern = /[🇯🇵]/u;

// ZWJ sequence (family emoji)
const pattern = /[👨‍👩‍👦]/u;

Correct:

// Unicode flag handles surrogate pairs correctly
const pattern = /[👍]/u;

// Match outside character class
const pattern = /👍/;

// Use precomposed character
const pattern = /[Á]/;

// Match emoji sequence outside character class
const pattern = /👶🏻/;

// Use \q{} syntax with v flag for grapheme clusters
const pattern = /[\q{👶🏻}]/v;

// Solo regional indicator is fine
const pattern = /[🇯]/u;

Options
This rule is not configurable.
When Not To Use It
If you intentionally want to match individual code points rather than visual characters, or if your regex pattern specifically needs to match partial Unicode sequences, you might prefer to disable this rule. Some specialized text processing may require matching individual surrogate halves or combining marks.
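For a sense of what such specialized processing can look like, here is a minimal sketch of two patterns that deliberately target partial sequences. Both are illustrative assumptions: the names are invented for this example, and whether this rule reports escape-based classes like these is not stated here.

```javascript
// Finds a surrogate half that is not part of a valid pair:
// a high surrogate with no low surrogate after it, or a low one with no high one before it.
const loneSurrogate =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

// Matches each mark in the Combining Diacritical Marks block (U+0300 to U+036F).
const combiningMarks = /[\u0300-\u036f]/g;

// Hypothetical helper: decompose accented letters, then drop the combining marks.
function stripAccents(text) {
  return text.normalize("NFD").replace(combiningMarks, "");
}

console.log(loneSurrogate.test("\uD83D")); // true: high surrogate on its own
console.log(loneSurrogate.test("\u{1F44D}")); // false: a complete pair
console.log(stripAccents("\u00C1rbol")); // "Arbol"
```

In both cases the pattern is meant to see the individual pieces, so matching them separately is the desired behavior rather than a mistake.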
Further Reading
- MDN: Regular expressions - Unicode character class escape
- Unicode Standard Annex #29: Unicode Text Segmentation
- MDN: RegExp - Unicode flag