The table below describes the structure of regular expressions and shows which characters may be used:
Form | Character | Description |
---|---|---|
[1] | char | Matches itself, unless it is a special character (metachar): . \ [ ] * + ^ $ |
[2] | . | Matches any character |
[3] | \ | Matches the following character literally, unless that character is a left or right round bracket, a digit 1 to 9, or a left or right angle bracket (see [7], [8] and [9]). It is the escape character for all other metacharacters and for itself. When used in a set (see [4]), it is treated as an ordinary character. |
[4] | [set] | Matches one of the characters in the set. If the first character in the set is "^", it matches a character NOT in the set. The shorthand "S-E" specifies the range of characters from S to E, inclusive. The special characters "]" and "-" have no special meaning if they appear as the first characters in the set. Examples: [a-z] matches any lowercase letter; [^]-] matches any character except ] and -; [^A-Z] matches any character except uppercase letters; [a-zA-Z] matches any letter. |
[5] | * | Any regular expression of [1] to [4] followed by the closure character (*) matches a string of zero or more characters that have that form. |
[6] | + | Same as [5], except it matches one or more. |
[7] | \(form\) | A regular expression of form [1] to [10] enclosed as \(form\) matches whatever the specified form matches. The enclosure creates a set of tags, used for [8] and for pattern substitution. The tagged forms are numbered starting from 1. |
[8] | \1 to \9 | A \ character followed by a digit 1 to 9 matches whatever a previously tagged regular expression ([7]) matched. For example, "\5" matches the text matched by the fifth pattern written in the \(form\) format. |
[9] | \< \> | A regular expression starting with \< and/or ending with \> restricts the match to the beginning of a word and/or the end of a word. A word is a character string beginning and/or ending with the characters A-Z, a-z, 0-9 and _. It must also be preceded and/or followed by a character other than those. |
[10] | xy | A composite regular expression xy, where x and y are of form [1] to [10], matches the longest match of x followed by a match for y. |
[11] | ^ $ | A regular expression starting with a ^ character and/or ending with a $ character restricts the match to the beginning of the line and/or the end of the line. Elsewhere in the expression, ^ and $ are treated as ordinary characters. |
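
The rules in the table can be made concrete by translating a pattern into a more widely available syntax. The following is a minimal sketch, assuming Python's `re` module as the target; the helper names `to_python_regex` and `_translate_set` are hypothetical and not part of any documented API. It covers forms [1] to [11] only approximately: for instance, it assumes that `\<` and `\>` can be emulated with `\b` plus a lookaround under `re.ASCII`, and it does not reject a `*` or `+` that follows something other than forms [1] to [4].

```python
import re


def _translate_set(body: str) -> str:
    """Translate the inside of a [set] (form [4]) into a Python character class."""
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    parts = []
    for k, ch in enumerate(body):
        if ch == "-" and 0 < k < len(body) - 1:
            parts.append("-")            # "S-E" range operator
        else:
            parts.append(re.escape(ch))  # "\", "]" and a leading "-" are ordinary here
    return "[" + ("^" if negate else "") + "".join(parts) + "]"


def to_python_regex(pattern: str) -> str:
    """Translate a pattern written in the table's dialect into Python `re` syntax."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in "()":               # form [7]: \( and \) tag a group
                out.append(nxt)
            elif nxt in "123456789":      # form [8]: back-reference to a tag
                out.append("(?:\\" + nxt + ")")
            elif nxt == "<":              # form [9]: beginning of a word (assumed emulation)
                out.append(r"\b(?=\w)")
            elif nxt == ">":              # form [9]: end of a word (assumed emulation)
                out.append(r"\b(?<=\w)")
            else:                         # form [3]: escaped literal character
                out.append(re.escape(nxt))
            i += 2
        elif c == "[":                    # form [4]: find the closing bracket of the set
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":   # a leading "]" is an ordinary character
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            if j >= n:
                raise ValueError("unterminated set in pattern")
            out.append(_translate_set(pattern[i + 1:j]))
            i = j + 1
        elif c in ".*+" or (c == "^" and i == 0) or (c == "$" and i == n - 1):
            out.append(c)                 # forms [2], [5], [6] and [11] carry over unchanged
            i += 1
        else:
            out.append(re.escape(c))      # form [1]; also "^" and "$" in other positions
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for source in (r"\(foo\)[1-3]\1", r"[^]-]", r"\<foo\>", "a^b(c)"):
        print(source, "->", to_python_regex(source))
```

When compiling the result, passing `re.ASCII` keeps `\w` limited to A-Z, a-z, 0-9 and _, which is the word definition given for form [9].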

The following examples show patterns and some of the strings they match:

Pattern | Matches |
---|---|
foo*.* | fo foo fooo foobar fobar foxx ... |
fo[ob]a[rz] | fobar fooar fobaz fooaz |
foo\\+ | foo\ foo\\ foo\\\ ... |
\(foo\)[1-3]\1 (same as foo[1-3]foo) | foo1foo foo2foo foo3foo |
\(fo.*\)-\1 | foo-foo fo-fo fob-fob foobar-foobar ... |
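
The example rows can be checked mechanically. The sketch below assumes Python's `re` module, whose syntax differs from the dialect described here: tagged groups are written `( )` rather than `\( \)`, so the second column of `EXAMPLES` holds a hand-translated equivalent of each pattern. The names `EXAMPLES` and `check_examples` are illustrative only.

```python
import re

# (pattern as written in the table, assumed Python equivalent, sample strings from the table)
EXAMPLES = [
    (r"foo*.*",         r"foo*.*",       ["fo", "foo", "fooo", "foobar", "fobar", "foxx"]),
    (r"fo[ob]a[rz]",    r"fo[ob]a[rz]",  ["fobar", "fooar", "fobaz", "fooaz"]),
    (r"foo\\+",         r"foo\\+",       ["foo\\", "foo\\\\", "foo\\\\\\"]),
    (r"\(foo\)[1-3]\1", r"(foo)[1-3]\1", ["foo1foo", "foo2foo", "foo3foo"]),
    (r"\(fo.*\)-\1",    r"(fo.*)-\1",    ["foo-foo", "fo-fo", "fob-fob", "foobar-foobar"]),
]


def check_examples():
    """Return {table pattern: True if every listed sample is matched in full}."""
    results = {}
    for table_pattern, python_pattern, samples in EXAMPLES:
        compiled = re.compile(python_pattern)
        results[table_pattern] = all(compiled.fullmatch(s) is not None for s in samples)
    return results


if __name__ == "__main__":
    for table_pattern, ok in check_examples().items():
        print(table_pattern, "ok" if ok else "MISMATCH")
```

The last two rows show why the back-reference matters: `\1` must repeat exactly what the tagged group matched, so a string such as foo-fob is rejected by the final pattern.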