
[^...] is the complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character. Care should be taken with a complement list: regular expressions are always multi-line, and hence a complement list also matches line-ending characters unless they are listed in the set. You can also use a collating sequence in character ranges, like in [[.ch.]-[.ll.]] (these are collating sequences in Spanish).
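The caution about complement lists can be illustrated with a minimal sketch. It uses Python's `re` module purely as a stand-in (an assumption; the engine this section describes may differ in details, and Python does not support collating sequences). The sample text is made up for illustration.

```python
import re

# Hypothetical sample text with two line breaks.
text = "abc\n123\ndef"

# The complement class also matches "\n", so a single match
# runs across both line breaks.
non_alpha = re.findall(r"[^A-Za-z]+", text)
print(non_alpha)

# Listing the line-ending characters in the set keeps the match
# inside one line.
same_line = re.findall(r"[^A-Za-z\r\n]+", text)
print(same_line)
```

Adding `\r\n` to the complement list is the usual way to stop such a match at the end of the line.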

[...] indicates a set of characters; for example, [abc] means any one of the characters a, b or c.

Control characters:
\C followed by a character gives the control character obtained from that character by stripping all but its 5 lowest order bits. For instance, \Ca stands for the SOH control character 0x01.
\t is the TAB control character 0x09 (tab, or hard tab, horizontal tab).
\r is the CR control character 0x0D (carriage return). This is part of the DOS/Windows end of line sequence CR-LF, and was the EOL character on Mac 9 and earlier.
\n is the LF control character 0x0A (line feed). This is the regular end of line under Unix systems.
\f is the FF control character 0x0C (form feed).
\b is the BS control character 0x08 (backspace). This is only allowed inside a character class definition; elsewhere, \b means a word boundary.

[[.collating sequence.]] is the character the collating sequence stands for. For instance, in Spanish, "ch" is a single letter, though it is written using two characters. This trick also works with symbolic names of control characters. See also the discussion on character ranges.

\xnn specifies a single character with hexadecimal code nn. For instance, \xE9 may match an é or a θ depending on the code page in an ANSI encoded document. \x{nnnn} is like the above, but matches a full 16-bit Unicode character. If the document is ANSI encoded, this construct is invalid. An octal escape gives a single byte character whose code in octal is nnn.
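The control-character and character-code rules can be sketched in Python. This is an illustration under stated assumptions, not the engine's own implementation: Python's `re` has no \C escape, so `control_char` is a hypothetical helper that assumes the conventional Ctrl-key mapping (keep the five lowest-order bits), and `normalize_eol` and `decode_byte` are likewise names made up for this sketch.

```python
import re


def control_char(ch: str) -> str:
    """Hypothetical helper: keep only the 5 lowest-order bits of the
    character code (assumed conventional Ctrl-key mapping)."""
    return chr(ord(ch) & 0x1F)


def normalize_eol(text: str) -> str:
    """Turn CR-LF (DOS/Windows) and lone CR (classic Mac) into LF (Unix)."""
    return re.sub(r"\r\n?", "\n", text)


def decode_byte(code: int, code_page: str) -> str:
    """Show which character a single byte stands for in a given code page."""
    return bytes([code]).decode(code_page)


# 'a' is 0x61; masking with 0x1F leaves 0x01 (SOH).
print(repr(control_char("a")))

print(repr(normalize_eol("one\r\ntwo\rthree\n")))

# Inside a character class, \b is the backspace character 0x08.
controls = re.findall(r"[\b\t\f]", "a\tb\x08c\x0cd")
print(controls)

# The same byte 0xE9 decodes to different characters in different code pages.
print(decode_byte(0xE9, "latin-1"), decode_byte(0xE9, "cp437"))
```

The last line shows why a \xnn escape is ambiguous in an ANSI document: the byte value is fixed, but the character it denotes is decided by the code page.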

What a given character code stands for depends on the text encoding.

A backslash in front of a character Г that would otherwise have a special meaning makes that character literal. For example, \[ is interpreted as [ and not as the start of a character set. Adding the backslash (this is called escaping) also works the other way round, as it makes special a character that otherwise isn't. For instance, \d stands for "a digit", while "d" is just an ordinary letter.

\X matches a single non-combining character followed by any number of combining characters. This is useful if you have a Unicode encoded text with accents as separate, combining characters.

. matches any character. By default, the dot will only match characters within a line, and not the line ending characters (\r and \n). If the option ". matches newline" is enabled, the dot will indeed do that, enabling the "any" character to run over multiple lines.
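A minimal sketch of these single-character matches, again using Python's `re` as a stand-in (an assumption; option names and details differ between engines). Python's `re.DOTALL` flag plays the role of a "dot matches newline" option. Python's `re` has no \X, so `clusters` is a hypothetical helper that approximates the description above with `unicodedata.combining`.

```python
import re
import unicodedata

text = "first line\nsecond line"

# By default the dot stops at line endings.
per_line = re.findall(r".+", text)

# With DOTALL the dot also matches "\n", so one match spans both lines.
across = re.findall(r".+", text, re.DOTALL)

# Escaping removes a special meaning: \[ is a literal bracket.
brackets = re.findall(r"\[", "a[0] and b[1]")

# Escaping can also add one: \d is a digit, while d is just a letter.
digits = re.findall(r"\d", "d1d2")
letters = re.findall(r"d", "d1d2")


def clusters(s: str) -> list[str]:
    """Hypothetical stand-in for \\X: group each non-combining character
    with the combining characters that follow it."""
    out: list[str] = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the accent to its base character
        else:
            out.append(ch)
    return out


print(per_line, across, brackets, digits, letters)
# "e" followed by U+0301 (combining acute accent) stays together.
print(clusters("e\u0301a"))
```

Real \X implementations typically handle more cases than this helper; the sketch covers only the base-plus-combining-marks rule stated in the text.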

In a regular expression (shortened to regex throughout), the following special characters are interpreted:

Single-character matches
