outwit.com
Regular Expressions Quick Start

Marking a Regular Expression: /myRegExp/

Most of the time, simple strings will be enough as markers or selection criteria. Such a literal string is searched for in the data exactly as typed. If you want to use a regular expression instead, you must mark it so that the program can identify it as such. This is done by adding a / before and after the regular expression pattern.
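
The distinction between a literal string and a marked regular expression can be sketched as follows. This is only an illustration in Python: compile_marker is a hypothetical helper, not the program's actual code, and Python's re module stands in for the program's own engine.

```python
import re

def compile_marker(marker: str):
    """Hypothetical helper: treat /.../ as a regex, anything else as a literal string."""
    if len(marker) >= 2 and marker.startswith("/") and marker.endswith("/"):
        return re.compile(marker[1:-1])      # strip the two / delimiters
    return re.compile(re.escape(marker))     # literal: special characters are neutralized

samples = ["a.c", "abc"]

literal = compile_marker("a.c")      # searched as is
pattern = compile_marker("/a.c/")    # the dot now means "any character"

literal_hits = [s for s in samples if literal.search(s)]
pattern_hits = [s for s in samples if pattern.search(s)]
print(literal_hits, pattern_hits)
```

The unmarked string only finds the exact text a.c, while the marked pattern also finds abc.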

Escaping Special Characters

Characters that are part of the regular expression syntax, like .$*+?-^\(){}[]|/ should be 'escaped' when used literally in a regular expression (i.e. when used as the character itself, not as part of the syntax). Escaping means placing a backslash character \ before the special character so that it is treated literally. To search for a backslash character, for instance, double it \\ so that the first backslash escapes the second.
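
A short sketch of escaping, written in Python as a stand-in for the program's own engine (the sample text is made up for the illustration). Note that Python patterns are not wrapped in / delimiters, so the slash itself needs no escaping there.

```python
import re

text = r"Price: $5.00 (approx.) in C:\temp"

# Unescaped: "$" means end of string and "." means any character,
# so this pattern cannot find the literal text "$5.00".
unescaped = re.search(r"$5.00", text)

# Escaped: each special character is preceded by a backslash.
escaped = re.search(r"\$5\.00", text)

# A literal backslash is written as a doubled backslash.
backslash = re.search(r"C:\\temp", text)

# re.escape adds the backslashes automatically.
auto = re.search(re.escape("$5.00"), text)

print(unescaped, escaped.group(), backslash.group(), auto.group())
```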

Most common "special" characters in Reg Exp:

Wildcard

. (dot): any single character except a line break (line feed or carriage return)
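
As an illustration, here is the wildcard in Python (used as a stand-in engine; the sample text is invented). One difference to keep in mind: Python's dot excludes only the line feed \n, so behavior around carriage returns may not be identical to the program's.

```python
import re

text = "cat cot c\nt cut"

# "c.t" needs exactly one character between c and t;
# the line feed in "c\nt" is not accepted by the dot.
matches = re.findall(r"c.t", text)
print(matches)
```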

Character Classes (Ranges of Characters)

In a character class, a caret character ^ placed immediately after the opening bracket [^... ] negates the class: it matches any character except those listed.

[abc] list: any one of the characters a, b, or c

[^abc] exclusion list: any character except a, b, c

[a-z] range: any character from a to z

[^aeiou] any character that is not a lowercase vowel

[a-zA-Z0-9] any character from a-z, A-Z, or 0-9

[^0-9aeiou] any character that is neither a digit nor a lowercase vowel
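
The classes listed above can be tried out with a few lines of Python (a stand-in for the program's engine; the sample strings are invented). Classes are case-sensitive, so [aeiou] does not match the capital E below.

```python
import re

sample = "Route 66, Exit 4b!"

# List: every lowercase vowel, one character at a time.
vowels = re.findall(r"[aeiou]", sample)

# Ranges combined in one class, repeated with + to get whole runs.
alnum_runs = re.findall(r"[a-zA-Z0-9]+", sample)

# Exclusion list: drop digits and lowercase vowels, keep the rest.
leftovers = "".join(re.findall(r"[^0-9aeiou]", "b4e7zu"))

print(vowels, alnum_runs, leftovers)
```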

Escaped matching characters

\r line break (carriage return)
\n Unix line break (line feed)
\t tab
\f page break (form feed)
\\ backslash

\s any whitespace character (space, tab, carriage return, line feed, form feed)
\S any non-space character (any character not matched by \s)
\w any word character (a-z, A-Z, 0-9, _, and certain 8-bit characters)
\W any non-word character (all characters not included by \w, incl. returns)
\d any digit (0-9)
\D any non-digit character (including carriage return)
\b any word boundary (position between a \w character and a \W character, or between a \w character and the start or end of the string)
\B any position that is not a word boundary
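
A brief Python sketch of the shorthand classes and boundaries above (the sample line is invented). One assumption to flag: on text strings, Python's \w, \d and \s also accept non-ASCII letters, digits and spaces, which may be broader than the program's own definitions.

```python
import re

line = "Order 42:\tcat concat cat_7"

digits = re.findall(r"\d+", line)       # runs of digits
words = re.findall(r"\w+", line)        # runs of word characters (note the _)
chunks = re.findall(r"\S+", line)       # runs of non-space characters

# \b requires a boundary on both sides: "concat" and "cat_7" are rejected.
whole_cat = re.findall(r"\bcat\b", line)

# \B requires that there is NO boundary before "cat": only "concat" qualifies.
inner_cat = re.findall(r"\Bcat", line)

print(digits, words, chunks, whole_cat, inner_cat)
```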

Alternation

| (pipe): Separates two expressions and matches either
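
For instance, in Python (a stand-in engine, with invented sample text):

```python
import re

# Either alternative may match at any position in the text.
pets = re.findall(r"cat|dog", "I have a cat, a dog and a bird.")

# Two spellings of the same word.
spellings = re.findall(r"gray|grey", "gray skies, grey clouds, groy typo")

print(pets, spellings)
```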

Position

^: (when not in a character class) beginning of string
$: end of string
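
The two anchors can be sketched in Python as follows (a stand-in engine). One caveat: Python's $ also matches just before a trailing line feed at the end of the string, which other engines may not do.

```python
import re

greeting = "Hello world"

starts_with_hello = re.search(r"^Hello", greeting) is not None
starts_with_world = re.search(r"^world", greeting) is not None   # "world" is not at the start
ends_with_world = re.search(r"world$", greeting) is not None

# Using both anchors forces the whole string to fit the pattern.
all_digits = re.search(r"^\d+$", "12345") is not None
not_all_digits = re.search(r"^\d+$", "123a5") is not None

print(starts_with_hello, starts_with_world, ends_with_world, all_digits, not_all_digits)
```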

Quantifiers

x*: zero or more x
x+: one or more x
x?: zero or one x
x{COUNT}: exactly COUNT x, where COUNT is an integer
x{MIN,}: at least MIN x, where MIN is an integer
x{MIN,MAX}: at least MIN x, but no more than MAX (with no space after the comma)
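
To see what each quantifier accepts, the sketch below tests a few strings of x against every form, using Python as a stand-in engine. full_matches is a hypothetical helper written for this illustration. The last line shows why the braces must not contain a space: Python then reads the braces as ordinary characters rather than as a quantifier.

```python
import re

candidates = ["", "x", "xx", "xxx", "xxxx"]

def full_matches(pattern):
    """Hypothetical helper: candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = full_matches(r"x*")          # zero or more
plus = full_matches(r"x+")          # one or more
optional = full_matches(r"x?")      # zero or one
exactly = full_matches(r"x{2}")     # exactly 2
at_least = full_matches(r"x{3,}")   # 3 or more
between = full_matches(r"x{1,3}")   # 1 to 3

# With a space after the comma, this is no longer a quantifier.
with_space = full_matches(r"x{1, 3}")

print(star, plus, optional, exactly, at_least, between, with_space)
```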

Note:
Quantifiers such as + and * are 'greedy': they match the longest string possible. If you do not want this "longest match" behavior, you can make a quantifier non-greedy by adding a ? immediately after it.

*?: zero or more (non-greedy)
+?: one or more (non-greedy)
??: zero or one (non-greedy)

For example, instead of:
/<SCRIPT>.*<\/SCRIPT>/
use the non-greedy quantifier, which stops at the first </SCRIPT> instead of running on to the last one:
/<SCRIPT>.*?<\/SCRIPT>/
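
The difference shows up as soon as the data contains more than one script block. A minimal sketch in Python (a stand-in engine; the HTML snippet is invented, and the / delimiters and the escaped \/ are not needed in Python patterns):

```python
import re

html = "<SCRIPT>a()</SCRIPT><p>keep me</p><SCRIPT>b()</SCRIPT>"

# Greedy: .* runs to the LAST </SCRIPT>, swallowing the paragraph in between.
greedy = re.findall(r"<SCRIPT>.*</SCRIPT>", html)

# Non-greedy: .*? stops at the FIRST </SCRIPT>, giving one match per block.
lazy = re.findall(r"<SCRIPT>.*?</SCRIPT>", html)

# Removing the scripts with the non-greedy pattern keeps the paragraph.
cleaned = re.sub(r"<SCRIPT>.*?</SCRIPT>", "", html)

print(greedy, lazy, cleaned)
```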