Appendix B: Regular Expressions

The UNIX Option uses regular expressions to perform the search/replace operations that are available during both the import and publish operations.

During search/replace operations, each text file is read as a series of lines, and all of the applicable search/replace patterns are applied to each line. The patterns are applied in the order in which they were specified in the UNIX Option Setup.

Note: Because each line is processed individually, it is not possible to write a pattern that can search across multiple lines.
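The processing model described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the UNIX Option's implementation: the patterns are assumed to be written in Python's re syntax (which differs from this appendix in places, for example in how a replacement refers to a substring), and the function name process_text is hypothetical. Each line's CR or LF terminator is removed before matching and restored afterwards, mirroring the UNIX Option's behavior of not passing line termination characters to the search pattern.

```python
import re


def process_text(text, rules):
    """Apply every (search, replace) rule, in order, to each line of text.

    Hypothetical sketch: rules use Python re syntax as a stand-in for the
    UNIX Option's own dialect.
    """
    if not text:
        return text  # an empty file has no lines to process
    pieces = text.split("\n")
    # A final newline leaves an empty last piece that is not a real line.
    has_final_newline = len(pieces) > 1 and pieces[-1] == ""
    if has_final_newline:
        pieces.pop()
    out = []
    for piece in pieces:
        # Strip a CR (from CRLF) so patterns such as End$ see only the text.
        if piece.endswith("\r"):
            body, cr = piece[:-1], "\r"
        else:
            body, cr = piece, ""
        # Patterns are applied one after another, in the order given.
        for search, replace in rules:
            body = re.sub(search, replace, body)
        out.append(body + cr)
    return "\n".join(out) + ("\n" if has_final_newline else "")
```

Because the rules run in sequence, their order matters: with the rules [("a", "b"), ("b", "c")] the line "a" becomes "c", while the reverse order leaves "b". And since each line is handled separately, no rule can match text that spans two lines.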

The syntax for the regular expressions is very similar to the syntax used by the UNIX "grep" command. Regular expressions consist of normal characters and metacharacters. Metacharacters have a special meaning of their own, or change the meaning of other, normal characters. For example, if you have used the DOS prompt on a PC, you will be familiar with the dir *.* command; there, each asterisk is a wildcard metacharacter that matches zero or more normal characters.

Search Patterns

The following metacharacters are supported for defining search patterns:

Meta-character Meaning
^ Matches the start of the line. As the first character inside a character class, it negates the class.
$ Matches the end of the line.
. Matches any single character.
[ Starts a character class.
] Ends a character class.
* Matches 0 or more occurrences of the preceding regular expression.
+ Matches 1 or more occurrences of the preceding regular expression.
? Matches exactly 0 or 1 occurrence of the preceding regular expression.
| Matches either the expression on its left or the expression on its right.
( Starts a substring.
) Ends a substring.
" Delimits a literal string, inside which metacharacters other than the escape character lose their special meaning.
\ Escape character.
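Python's re module has no double-quote literal syntax, so one way to approximate the " metacharacter is to escape everything between the quotes with re.escape. The helper below, translate_quoted, is hypothetical and handles only quoting. Backslash escapes are copied through unchanged for Python to interpret; that agrees with this appendix for escaped metacharacters such as \$, but not for every escape sequence (Python, for instance, reads \s as any whitespace character rather than just a space).

```python
import re


def translate_quoted(pattern):
    """Rewrite "..." literal strings in a pattern into Python re syntax.

    Hypothetical helper: characters between double quotes are passed
    through re.escape so that their metacharacter meaning is lost.
    """
    out = []
    in_quote = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # The escape character stays active inside and outside quotes;
            # keep the two-character sequence as-is for Python to interpret.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quote = not in_quote  # the quotes delimit; they are not matched
        elif in_quote:
            out.append(re.escape(ch))  # literal character inside quotes
        else:
            out.append(ch)  # ordinary pattern text outside quotes
        i += 1
    if in_quote:
        raise ValueError("unterminated literal string")
    return "".join(out)
```

For example, translate_quoted('"file.dat"') returns file\.dat, which matches only the exact text file.dat.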

Replace Patterns

The following metacharacters are supported for defining replace patterns:

Meta-character Meaning
& The string that the search pattern matched. If it is followed by a number n between 1 and 9, it is the string that matched substring number n.
\ Escape character
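The & and &n rules can be made concrete with a small Python function that expands a replace pattern against a match object. The name expand_replacement is hypothetical, and the sketch treats \ only as "take the next character literally" (so \& inserts an ampersand); it does not decode escape sequences such as \t. The function can be passed to re.sub through a lambda, as the last line shows.

```python
import re


def expand_replacement(template, match):
    """Expand a replace pattern using & and &1..&9 against a re.Match.

    Hypothetical helper: & inserts the whole match, &n inserts substring n,
    and a backslash makes the following character literal.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            out.append(template[i + 1])  # escaped character, taken literally
            i += 2
        elif ch == "&":
            nxt = template[i + 1] if i + 1 < len(template) else ""
            if nxt and nxt in "123456789":
                # &n: text matched by substring n (empty if it did not take part)
                out.append(match.group(int(nxt)) or "")
                i += 2
            else:
                out.append(match.group(0))  # &: the whole match
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# &1 reuses substring 1 from the search pattern; prints index.html
print(re.sub(r"(.*)\.htm", lambda m: expand_replacement("&1.html", m), "index.htm"))
```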

Escape Characters

The escape character is used to escape the special meaning of metacharacters.

For example, a search pattern of "$HOME" would not match the text $HOME, because the $ character is the end-of-line metacharacter. To make this work correctly, you would specify the pattern as "\$HOME"; the backslash indicates that the special meaning of the character that follows it should be ignored.
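Python's re module follows the same rule, so it offers a quick way to see the effect of an unescaped $. This is an illustration in Python, not the UNIX Option itself; re.escape is a standard-library function that inserts the needed backslashes for you.

```python
import re

line = "cd $HOME/bin"

# Unescaped, $ anchors to the end of the line, so "HOME" can never follow it.
print(re.search(r"$HOME", line))  # None

# Escaped, $ is an ordinary character and the text is found.
print(re.search(r"\$HOME", line).group())  # $HOME

# re.escape adds the backslash automatically.
print(re.escape("$HOME"))  # \$HOME
```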

In addition, the escape character is used to define some special characters that are difficult or impossible to represent otherwise. These are termed escape sequences and the following are recognized:

Escape Sequence Meaning
\b Backspace
\e ASCII escape character
\f Form feed
\n New line
\r Carriage return
\s Space
\t Tab
\\ Backslash character
\ddd Character specified by 1 - 3 octal digits (d)
\xdd Character specified by 1 - 2 hexadecimal digits (d)
\x^c Control character specified by letter (c)
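As an illustration of the table above, the following Python function converts escape sequences into the characters they stand for. It is a hypothetical sketch (decode_escapes is not part of the UNIX Option), and three readings are assumptions: \x^c is interpreted as caret notation, so \x^A is the control character with code 1; \x with no hexadecimal digits yields a plain x; and a backslash before any other character yields that character literally, as with \$.

```python
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"
SIMPLE = {"b": "\b", "e": "\x1b", "f": "\f", "n": "\n",
          "r": "\r", "s": " ", "t": "\t", "\\": "\\"}


def decode_escapes(text):
    """Replace escape sequences in text with the characters they denote."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "\\" or i + 1 == n:
            out.append(text[i])  # ordinary character (or a trailing backslash)
            i += 1
            continue
        c = text[i + 1]
        if c in SIMPLE:
            out.append(SIMPLE[c])  # \b \e \f \n \r \s \t \\
            i += 2
        elif c in OCTAL:
            j = i + 1
            while j < n and j < i + 4 and text[j] in OCTAL:  # 1 to 3 octal digits
                j += 1
            out.append(chr(int(text[i + 1:j], 8)))
            i = j
        elif c == "x" and i + 3 < n and text[i + 2] == "^":
            out.append(chr(ord(text[i + 3].upper()) ^ 0x40))  # caret notation: ^A -> 1
            i += 4
        elif c == "x":
            j = i + 2
            while j < n and j < i + 4 and text[j] in HEX:  # 1 to 2 hex digits
                j += 1
            if j == i + 2:
                out.append("x")  # assumption: no digits means a literal x
            else:
                out.append(chr(int(text[i + 2:j], 16)))
            i = j
        else:
            out.append(c)  # escaped metacharacter such as \$
            i += 2
    return "".join(out)
```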

Filename Patterns

The filename patterns on the search/replace dialogs use a standard UNIX-style wildcard matching syntax instead of full regular expressions. The following metacharacters are recognized:

Meta-character Meaning
* Any string of 0 or more characters
? Any single character
[] Define a character class for a single character
\ Escape any of the previous special characters. Use "\\" to match a backslash
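Python's standard fnmatch module implements the *, ? and [] wildcards but does not treat a backslash as an escape, so the sketch below translates a filename pattern into a regular expression by hand. The function names wildcard_to_regex and filename_matches are hypothetical, and treating a [ with no closing ] (or an empty []) as an ordinary character is an assumption, since the appendix does not specify that case.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a filename pattern using * ? [...] and \\ into a Python regex."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            out.append(re.escape(pattern[i + 1]))  # escaped: match the next character literally
            i += 2
        elif c == "*":
            out.append(".*")  # any string of 0 or more characters
            i += 1
        elif c == "?":
            out.append(".")  # any single character
            i += 1
        elif c == "[":
            j = pattern.find("]", i + 1)
            if j <= i + 1:
                out.append(re.escape(c))  # no closing ] or empty []: literal [ (assumption)
                i += 1
            else:
                body = pattern[i + 1:j].replace("\\", "\\\\")
                out.append("[" + body + "]")  # character class for a single character
                i = j + 1
        else:
            out.append(re.escape(c))  # ordinary character, including "."
            i += 1
    return "".join(out)


def filename_matches(pattern, name):
    """True if the whole of name matches the wildcard pattern (hypothetical helper)."""
    return re.fullmatch(wildcard_to_regex(pattern), name) is not None
```

Note that, unlike in a search pattern, a period in a filename pattern is an ordinary character, so *.dat matches file.dat but not fileXdat.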

Search Examples

The following examples introduce the various metacharacters.

Search Pattern Meaning
^Start Matches the word "Start" if it is the first thing on the line of text.
End$ Matches the word "End" if it is the last thing on a line of text.

Note: The UNIX Option does not pass the line termination characters (for example, CRLF or LF) to the search pattern.

file\.dat Matches the exact word "file.dat" anywhere on the line of text.

Note: The escape character is used before the . since the period is a metacharacter.

file.\.dat This is an example of a metacharacter. The . matches any one valid character. This pattern matches strings such as "filea.dat", "fileX.dat", "file9.dat", and so on.
file..\.dat Metacharacters can be used multiple times. This example matches any strings that contain "file", followed by exactly two characters, followed by ".dat".
file..?\.dat This is an example of a repeating metacharacter. The "?" character matches exactly 0 or 1 occurrences of the previous regular expression, which in this case is a . metacharacter. This example therefore matches any string that contains "file", followed by 1 or 2 other characters, followed by ".dat".
file.*\.dat This example contains another repeating metacharacter. The "*" matches 0 or more of the preceding regular expression, which again is a . metacharacter. This example matches "file", followed by any number of valid characters, followed by ".dat".
file[ABC]\.dat This is an example of a character class. A character class contains a list of valid characters, in this case the letters A, B and C. This pattern matches "fileA.dat", "fileB.dat" or "fileC.dat".
file[0-9]\.dat A character class can contain a range of characters; this is specified using a hyphen. This example defines a character class that matches any digit from 0 to 9. This pattern matches "file", followed by a numeric digit, followed by ".dat".
file[0-9A-F]+\.dat This example is the most complex so far. The character class contains two ranges, 0 through 9 and A through F; that is, a hexadecimal digit. The "+" metacharacter matches 1 or more of the preceding regular expression, which is the character class. This pattern therefore matches "file", followed by 1 or more hexadecimal digits, followed by ".dat".
file(\.dat)? Substrings can be used to group multiple characters together into one logical regular expression. In this example, the "\.dat" pattern is within a substring followed by a "?" metacharacter. The "?" matches exactly 0 or 1 occurrences of the preceding regular expression, which in this case is the entire substring. This pattern therefore matches "file" or "file.dat".

Note: Without the substring, the pattern "file\.dat?" would match "file.da" or "file.dat".

file\.(dat)|(idx) This example contains substrings and the option metacharacter "|". The option metacharacter matches either the regular expression on the left or the regular expression on the right. This pattern matches "file." followed by "dat" or "idx".
"file.dat" When a search string is enclosed in double quotes, all metacharacters within the quotes lose their special meaning, except the escape character. This example matches the literal text "file.dat".

Replace Examples

The true power of regular expressions becomes apparent when you reuse what the search matched as part of the replacement. The substring operator is essential here: it marks the part of the match that you want to carry over into the replacement.

Search Pattern Replace Pattern Comment
"file.dat" newfile Searches for a literal string and replaces it with a different literal string.
(.*)\.htm &1.html Searches for any text followed by ".htm" and replaces it with the string that matched substring 1 (everything before ".htm"), followed by ".html".
\"file([0-9A-F]+)\.dat\" "newname&1.data" This search pattern extends one of the previous examples. It searches for a hexadecimal-based filename within double quotes. The quotes are escaped, and the substring delimiters around the hexadecimal digits set the focus on the part we want to keep. The replacement string is "newname" followed by the hexadecimal digits from the search string, then the new extension. So, "file9F.dat" would become "newname9F.data".

Copyright © 2007 Micro Focus (IP) Ltd. All rights reserved.