

C$REGEXP

This routine allows you to search strings using regular expressions.

Note: This ACUCOBOL-GT library routine is available in this COBOL version. Any compatibility issues are described in the Compatibility Issues section at the end of this topic.

Usage

CALL "C$REGEXP" 
    USING OP-CODE, parameters
    GIVING return-value

Parameters

Op-codes specify the operation to perform. Each operation is defined in acucobol.def and is described in detail below. Op-codes include:
Code Operation
1 AREGEXP-GET-LEVEL
2 AREGEXP-COMPILE
3 AREGEXP-MATCH
4 AREGEXP-RELEASE-MATCH
5 AREGEXP-RELEASE
6 AREGEXP-NUMGROUPS
7 AREGEXP-GETMATCH
20 AREGEXP-LAST-ERROR

Parameters vary depending on the operation selected. They provide information and hold results.

return-value: Numeric data item.

Unless otherwise noted, each operation returns a value or a status in return-value. Its contents vary by operation and the result of the operation.

Description

This routine allows you to use a regular expression to search a text string.

A regular expression is a formula for matching strings that follow a certain pattern. For a complete description of regular expressions, see the POSIX 1003.2 standard and the regular expression documentation for your platform. The regular expression support comes from the host: Windows platforms use the CAtlRegExp library, and UNIX platforms use the POSIX C routines native to the platform.

A simple use of C$REGEXP is outlined in the following steps.

  1. Use the AREGEXP-GET-LEVEL op-code to validate that the host platform provides support for regular expressions.
  2. Validate and compile your regular expression with op-code AREGEXP-COMPILE. Your program should include an error-handling routine in case AREGEXP-COMPILE finds an error in the expression.
  3. Use op-code AREGEXP-MATCH to apply a compiled regular expression to a string to search for a match. You may want to do this iteratively to find all matches in the string.
  4. Use op-codes AREGEXP-NUMGROUPS and AREGEXP-GETMATCH to work with subexpression matches.
  5. Manage the memory used by this routine with op-codes AREGEXP-RELEASE-MATCH and AREGEXP-RELEASE.
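The following program sketches steps 1, 2, 3, and 5 for a single search. Step 4 is not needed because the pattern has no subgroups. It is a minimal, hypothetical example rather than a product sample. The program and data names are invented, and the example makes several assumptions: handles can be held in USAGE HANDLE items and tested against ZERO, COMP-5 items are acceptable for the numeric parameters, and indexes are 1-based (match-start is 0 when nothing matches). Verify these declarations against acucobol.def and your runtime.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REGEXP-DEMO.
      * Hypothetical sample; not from the product documentation.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Op-codes and level values are defined in acucobol.def.
       COPY "acucobol.def".
      * Assumption: USAGE HANDLE for handles, COMP-5 for numbers.
       77  REGEXP-LEVEL     PIC 9(4) COMP-5.
       77  REGEXP-HANDLE    USAGE HANDLE.
       77  MATCH-HANDLE     USAGE HANDLE.
       77  ERROR-CODE       PIC 9(4) COMP-5.
       77  REG-EXPR         PIC X(64).
       77  SEARCH-TEXT      PIC X(40)
                            VALUE "Order 12 shipped 345 items".
      * A length of 0 tells AREGEXP-MATCH to use the item's size.
       77  TEXT-LENGTH      PIC 9(9) COMP-5 VALUE 0.
       77  MATCH-START      PIC 9(9) COMP-5.
       77  MATCH-END        PIC 9(9) COMP-5.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Step 1: stop if the host has no regular expression support.
           CALL "C$REGEXP" USING AREGEXP-GET-LEVEL
                GIVING REGEXP-LEVEL
           IF REGEXP-LEVEL = AREGEXP-NONE
              DISPLAY "Regular expressions are not available"
              STOP RUN
           END-IF
      * Step 2: compile "one or more digits", NULL-terminated.
           STRING "[0-9]+" LOW-VALUE DELIMITED BY SIZE
              INTO REG-EXPR
           CALL "C$REGEXP" USING AREGEXP-COMPILE, REG-EXPR
                GIVING REGEXP-HANDLE
           IF REGEXP-HANDLE = ZERO
              CALL "C$REGEXP" USING AREGEXP-LAST-ERROR
                   GIVING ERROR-CODE
              DISPLAY "Compile failed, error " ERROR-CODE
              STOP RUN
           END-IF
      * Step 3: search from the first byte of SEARCH-TEXT.
           MOVE 1 TO MATCH-START
           CALL "C$REGEXP" USING AREGEXP-MATCH, REGEXP-HANDLE,
                SEARCH-TEXT, TEXT-LENGTH, MATCH-START, MATCH-END
                GIVING MATCH-HANDLE
           IF MATCH-START = 0
              DISPLAY "No match"
           ELSE
      * MATCH-END is one byte past the match, so this is its length.
              DISPLAY "First match: "
                 SEARCH-TEXT(MATCH-START : MATCH-END - MATCH-START)
           END-IF
      * Step 5: free the match (if any), then the compiled expression.
           IF MATCH-HANDLE NOT = ZERO
              CALL "C$REGEXP" USING AREGEXP-RELEASE-MATCH,
                   MATCH-HANDLE
           END-IF
           CALL "C$REGEXP" USING AREGEXP-RELEASE, REGEXP-HANDLE
           STOP RUN.
```

SEARCH-TEXT is passed with a length of 0, so the whole 40-byte item is searched, including its trailing spaces. That is harmless for this digits-only pattern.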

Op-codes and Parameters

AREGEXP-GET-LEVEL (op-code 1)

This operation indicates whether regular expression support is available on the host. Its usage is:

CALL "C$REGEXP" USING AREGEXP-GET-LEVEL GIVING return-value

The value of return-value can be one of the following (defined in acucobol.def):

AREGEXP-NONE     0  Regular expression processing is not available
AREGEXP-WINDOWS  1  Windows regular expressions supported
AREGEXP-POSIX    2  POSIX regular expressions supported
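A program that must run on several hosts can branch on the level returned by AREGEXP-GET-LEVEL. For example, it can use the level to decide whether UNIX-only compile flags may be used. The following is a minimal sketch. The program and data names are hypothetical, and the COMP-5 declaration for the result is an assumption.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REGEXP-LEVEL-CHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY "acucobol.def".
      * Assumption: a COMP-5 item is acceptable for return-value.
       77  REGEXP-LEVEL     PIC 9(4) COMP-5.
       PROCEDURE DIVISION.
       MAIN-PARA.
           CALL "C$REGEXP" USING AREGEXP-GET-LEVEL
                GIVING REGEXP-LEVEL
      * Branch on the level constants defined in acucobol.def.
           EVALUATE REGEXP-LEVEL
              WHEN AREGEXP-NONE
                 DISPLAY "No regular expression support"
              WHEN AREGEXP-WINDOWS
      * UNIX-only compile flags do not apply on this host.
                 DISPLAY "Windows regular expressions"
              WHEN AREGEXP-POSIX
                 DISPLAY "POSIX regular expressions"
              WHEN OTHER
                 DISPLAY "Unexpected level " REGEXP-LEVEL
           END-EVALUATE
           STOP RUN.
```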

AREGEXP-COMPILE (op-code 2)

This operation compiles a regular expression to ensure that it has a valid form, returning a handle to the compiled regular expression or NULL if there is an error. Its usage is:

CALL "C$REGEXP" USING AREGEXP-COMPILE, reg-expr, flags
                GIVING return-value

reg-expr

Must be a NULL-terminated regular expression (the pattern text followed by an X"00" byte). The terminator is required because trailing spaces are valid in regular expressions and so cannot be treated as padding.

flags

(Optional) The sum of one or more of the following values (defined in acucobol.def):
AREGEXP_COMPILE_IGNORECASE  1   Ignore case when matching patterns. (Windows or UNIX)
AREGEXP_COMPILE_BASIC       2   Change the type of regular expression from extended to basic. (UNIX only) (For an explanation of extended and basic, see the POSIX 1003.2 standard.)
AREGEXP_COMPILE_NO_SPECIAL  4   Treat all characters as ordinary characters with no special meaning. (UNIX only)
AREGEXP_COMPILE_NO_SUB      8   When matching, determine only whether there is a match, without returning the offsets of the match. (UNIX only)
AREGEXP_COMPILE_NEWLINE     16  Treat newlines as special (end-of-line marker) characters. (UNIX only)
AREGEXP_COMPILE_NO_REGEXP   32  Treat reg-expr as text, not as a regular expression.

return-value contains a handle to the compiled expression, or NULL if an error occurred.
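The following sketch builds a NULL-terminated pattern with STRING, combines flag values by addition, and calls AREGEXP-LAST-ERROR when compilation fails. It uses the numeric flag values from the table above rather than the symbolic names: 1 for ignore case, plus 16 for newline handling on POSIX hosts only. The sketch also assumes USAGE HANDLE and COMP-5 declarations. The program and data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REGEXP-COMPILE-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY "acucobol.def".
      * Assumption: USAGE HANDLE for the handle, COMP-5 for numbers.
       77  REGEXP-LEVEL     PIC 9(4) COMP-5.
       77  REGEXP-HANDLE    USAGE HANDLE.
       77  COMPILE-FLAGS    PIC 9(4) COMP-5.
       77  ERROR-CODE       PIC 9(4) COMP-5.
       77  REG-EXPR         PIC X(64).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Append the NULL terminator (LOW-VALUE) after the pattern.
           STRING "^total: [0-9]+" LOW-VALUE DELIMITED BY SIZE
              INTO REG-EXPR
      * 1 = AREGEXP_COMPILE_IGNORECASE (Windows or UNIX).
           MOVE 1 TO COMPILE-FLAGS
           CALL "C$REGEXP" USING AREGEXP-GET-LEVEL
                GIVING REGEXP-LEVEL
           IF REGEXP-LEVEL = AREGEXP-POSIX
      * Add 16 = AREGEXP_COMPILE_NEWLINE, which is UNIX only.
              ADD 16 TO COMPILE-FLAGS
           END-IF
           CALL "C$REGEXP" USING AREGEXP-COMPILE, REG-EXPR,
                COMPILE-FLAGS
                GIVING REGEXP-HANDLE
           IF REGEXP-HANDLE = ZERO
      * Compilation failed: ask C$REGEXP for the reason.
              CALL "C$REGEXP" USING AREGEXP-LAST-ERROR
                   GIVING ERROR-CODE
              DISPLAY "Invalid expression, error " ERROR-CODE
           ELSE
              DISPLAY "Expression compiled"
              CALL "C$REGEXP" USING AREGEXP-RELEASE, REGEXP-HANDLE
           END-IF
           STOP RUN.
```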

AREGEXP-MATCH (op-code 3)

This operation applies a compiled regular expression to a string and returns a handle to the match. Before calling AREGEXP-MATCH, initialize match-start to the position at which matching should begin. To determine whether a match was found, check match-start after the call: if it is 0, there is no match. Its usage is:

CALL "C$REGEXP" USING AREGEXP-MATCH,
   reg-expr-handle, string, length, match-start, match-end
   GIVING return-value
reg-expr-handle is a handle to a regular expression returned by AREGEXP-COMPILE.
string is the string to test for a match.
length is the length of string. If length is zero, the size of string is used.
match-start, as an input parameter, indicates where matching should begin. As an output parameter, it returns the index of the start of the pattern that matched, or 0 if there is no match.
match-end returns the index one byte beyond the end of the pattern that matched. To test the string for additional matches, set match-start to match-end and call AREGEXP-MATCH again.
return-value contains a handle to the match or zero if no match is found or an error occurred.
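To find every match, the following sketch repeats AREGEXP-MATCH. After each hit it moves match-start to match-end, and it releases each match handle before the next call. The names are hypothetical. The sketch assumes USAGE HANDLE and COMP-5 declarations and 1-based indexes. The guard for zero-length matches and the end-of-text check are additions of this sketch, not documented behavior of C$REGEXP.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REGEXP-ALL-MATCHES.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY "acucobol.def".
      * Assumption: USAGE HANDLE for handles, COMP-5 for numbers.
       77  REGEXP-HANDLE    USAGE HANDLE.
       77  MATCH-HANDLE     USAGE HANDLE.
       77  REG-EXPR         PIC X(64).
       77  SEARCH-TEXT      PIC X(40)
                            VALUE "Order 12 shipped 345 items".
       77  TEXT-LENGTH      PIC 9(9) COMP-5 VALUE 0.
       77  MATCH-START      PIC 9(9) COMP-5.
       77  MATCH-END        PIC 9(9) COMP-5.
       77  DONE-FLAG        PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
           STRING "[0-9]+" LOW-VALUE DELIMITED BY SIZE
              INTO REG-EXPR
           CALL "C$REGEXP" USING AREGEXP-COMPILE, REG-EXPR
                GIVING REGEXP-HANDLE
           IF REGEXP-HANDLE = ZERO
              DISPLAY "Compile failed"
              STOP RUN
           END-IF
           MOVE 1 TO MATCH-START
           PERFORM UNTIL DONE-FLAG = "Y"
              CALL "C$REGEXP" USING AREGEXP-MATCH, REGEXP-HANDLE,
                   SEARCH-TEXT, TEXT-LENGTH, MATCH-START, MATCH-END
                   GIVING MATCH-HANDLE
              IF MATCH-START = 0
                 MOVE "Y" TO DONE-FLAG
              ELSE
                 IF MATCH-END > MATCH-START
                    DISPLAY "Match at " MATCH-START ": "
                       SEARCH-TEXT(MATCH-START :
                                   MATCH-END - MATCH-START)
      * Resume the search one byte past this match.
                    MOVE MATCH-END TO MATCH-START
                 ELSE
      * Sketch-only guard: step past a zero-length match.
                    ADD 1 TO MATCH-START
                 END-IF
      * Sketch-only guard: stop once the start is past the text.
                 IF MATCH-START > FUNCTION LENGTH(SEARCH-TEXT)
                    MOVE "Y" TO DONE-FLAG
                 END-IF
              END-IF
      * Free each match before asking for the next one.
              IF MATCH-HANDLE NOT = ZERO
                 CALL "C$REGEXP" USING AREGEXP-RELEASE-MATCH,
                      MATCH-HANDLE
              END-IF
           END-PERFORM
           CALL "C$REGEXP" USING AREGEXP-RELEASE, REGEXP-HANDLE
           STOP RUN.
```

Releasing each match inside the loop keeps at most one match handle allocated at a time, however many matches the string contains.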

AREGEXP-RELEASE-MATCH (op-code 4)

This operation frees memory that is allocated when AREGEXP-MATCH is called. Return-value is not used. Its usage is:

CALL "C$REGEXP" USING AREGEXP-RELEASE-MATCH match-handle
match-handle is a handle to a match returned by AREGEXP-MATCH.

AREGEXP-RELEASE (op-code 5)

This operation frees the memory allocated when AREGEXP-COMPILE is called. Return-value is not used. Its usage is:

CALL "C$REGEXP" USING AREGEXP-RELEASE reg-expr-handle
reg-expr-handle is a handle to a regular expression returned by AREGEXP-COMPILE.

AREGEXP-NUMGROUPS (op-code 6)

This operation returns the number of substrings that matched any subgroups in the regular expression. Its usage is:

CALL "C$REGEXP" USING AREGEXP-NUMGROUPS match-handle
                GIVING return-value
match-handle is a handle returned by AREGEXP-MATCH.
return-value contains the number of substrings that matched subgroups.

Depending on the construction of a regular expression, it is possible for a subgroup of the regular expression to match multiple substrings. This operation reports the number of instances found in the last AREGEXP-MATCH operation. For more information, rules, and examples, see the POSIX 1003.2 documentation or one of the many books available on regular expressions.

AREGEXP-GETMATCH (op-code 7)

This operation returns the start and end indices, within the string passed to AREGEXP-MATCH, of the substring that matched a given subexpression (group) of the regular expression. Its usage is:

CALL "C$REGEXP" 
    USING AREGEXP-GETMATCH, match-handle, group, 
    idx-start, idx-end
    GIVING return-value

The parameters are defined as follows:

match-handle is a handle returned by AREGEXP-MATCH.
group is a number between 1 and the value returned by AREGEXP-NUMGROUPS.
idx-start returns the index of the beginning of the substring that matched the subexpression.
idx-end returns the index of the end of the substring that matched the subexpression.
return-value returns 1 if the operation succeeds, and zero if there is an error.
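The following sketch combines AREGEXP-NUMGROUPS and AREGEXP-GETMATCH to list the positions of each subexpression match. It prints idx-start and idx-end as returned, because this topic does not say whether idx-end is inclusive or, like match-end, one byte past the end. The pattern uses POSIX extended syntax with parenthesized groups. The Windows CAtlRegExp library may use a different group syntax, so check the host library's documentation. The names and the USAGE HANDLE and COMP-5 declarations are hypothetical assumptions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REGEXP-GROUPS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY "acucobol.def".
      * Assumption: USAGE HANDLE for handles, COMP-5 for numbers.
       77  REGEXP-HANDLE    USAGE HANDLE.
       77  MATCH-HANDLE     USAGE HANDLE.
       77  REG-EXPR         PIC X(64).
       77  SEARCH-TEXT      PIC X(40)
                            VALUE "Part ABC-123 in stock".
       77  TEXT-LENGTH      PIC 9(9) COMP-5 VALUE 0.
       77  MATCH-START      PIC 9(9) COMP-5.
       77  MATCH-END        PIC 9(9) COMP-5.
       77  GROUP-COUNT      PIC 9(4) COMP-5.
       77  GROUP-NUM        PIC 9(4) COMP-5.
       77  IDX-START        PIC 9(9) COMP-5.
       77  IDX-END          PIC 9(9) COMP-5.
       77  CALL-STATUS      PIC 9(4) COMP-5.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * POSIX extended syntax: two parenthesized subexpressions.
           STRING "([A-Z]+)-([0-9]+)" LOW-VALUE DELIMITED BY SIZE
              INTO REG-EXPR
           CALL "C$REGEXP" USING AREGEXP-COMPILE, REG-EXPR
                GIVING REGEXP-HANDLE
           IF REGEXP-HANDLE = ZERO
              DISPLAY "Compile failed"
              STOP RUN
           END-IF
           MOVE 1 TO MATCH-START
           CALL "C$REGEXP" USING AREGEXP-MATCH, REGEXP-HANDLE,
                SEARCH-TEXT, TEXT-LENGTH, MATCH-START, MATCH-END
                GIVING MATCH-HANDLE
           IF MATCH-START NOT = 0
              CALL "C$REGEXP" USING AREGEXP-NUMGROUPS, MATCH-HANDLE
                   GIVING GROUP-COUNT
      * Groups are numbered from 1 to the AREGEXP-NUMGROUPS result.
              PERFORM VARYING GROUP-NUM FROM 1 BY 1
                      UNTIL GROUP-NUM > GROUP-COUNT
                 CALL "C$REGEXP" USING AREGEXP-GETMATCH,
                      MATCH-HANDLE, GROUP-NUM, IDX-START, IDX-END
                      GIVING CALL-STATUS
      * A status of 1 means the indices were returned.
                 IF CALL-STATUS = 1
                    DISPLAY "Group " GROUP-NUM ": " IDX-START
                            " to " IDX-END
                 END-IF
              END-PERFORM
           END-IF
           IF MATCH-HANDLE NOT = ZERO
              CALL "C$REGEXP" USING AREGEXP-RELEASE-MATCH,
                   MATCH-HANDLE
           END-IF
           CALL "C$REGEXP" USING AREGEXP-RELEASE, REGEXP-HANDLE
           STOP RUN.
```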

AREGEXP-LAST-ERROR (op-code 20)

This operation returns the error code from the most recent call to C$REGEXP. Its usage is:

CALL "C$REGEXP" USING AREGEXP-LAST-ERROR GIVING return-value

The error value is returned in return-value. The possible error values (defined in acucobol.def) have the following meanings:

AREGEXP-ERROR-NO-ERROR          0   No error
AREGEXP-ERROR-NO-MEMORY         1   Insufficient memory to handle the request
AREGEXP-ERROR-BRACE-EXPECTED    2   A closing brace is missing
AREGEXP-ERROR-PAREN-EXPECTED    3   A closing parenthesis is missing
AREGEXP-ERROR-BRACKET-EXPECTED  4   A closing bracket is missing
AREGEXP-ERROR-UNEXPECTED        5   An unknown error occurred
AREGEXP-ERROR-EMPTY-RANGE       6   An empty range was given
AREGEXP-ERROR-INVALID-GROUP     7   The group provided was invalid
AREGEXP-ERROR-INVALID-RANGE     8   An invalid range was given
AREGEXP-ERROR-EMPTY-REPEATOP    9   A repeat operator was given on an empty subexpression
AREGEXP-ERROR-INVALID-INPUT     10  The input was invalid
AREGEXP-ERROR-INVALID-HANDLE    11  The handle is not a regular expression handle or a match handle
AREGEXP-ERROR-INVALID-ARGUMENT  12  One of the arguments given is invalid
AREGEXP-ERROR-INVALID-CALL-SEQ  13  C$REGEXP operations were called in an invalid sequence
AREGEXP-ERROR-NO-MATCH          14  The regular expression did not find a match in the given string
Note: If the returned error code is not in this list, it may come from the host's regular expression library. See the documentation for that library.
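A program can map the codes returned by AREGEXP-LAST-ERROR to messages with EVALUATE. Any code it does not handle, including codes from the host library, goes to a generic branch. The following is a minimal sketch. The pattern is deliberately invalid, but which code is reported depends on the host library, so no particular output is implied. The names and the USAGE HANDLE and COMP-5 declarations are hypothetical assumptions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REGEXP-ERROR-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY "acucobol.def".
      * Assumption: USAGE HANDLE for the handle, COMP-5 for numbers.
       77  REGEXP-HANDLE    USAGE HANDLE.
       77  ERROR-CODE       PIC 9(4) COMP-5.
       77  REG-EXPR         PIC X(64).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Deliberately invalid: the bracket expression is never closed.
           STRING "[0-9" LOW-VALUE DELIMITED BY SIZE INTO REG-EXPR
           CALL "C$REGEXP" USING AREGEXP-COMPILE, REG-EXPR
                GIVING REGEXP-HANDLE
           IF REGEXP-HANDLE NOT = ZERO
              CALL "C$REGEXP" USING AREGEXP-RELEASE, REGEXP-HANDLE
              STOP RUN
           END-IF
           CALL "C$REGEXP" USING AREGEXP-LAST-ERROR
                GIVING ERROR-CODE
           EVALUATE ERROR-CODE
              WHEN AREGEXP-ERROR-NO-MEMORY
                 DISPLAY "Out of memory"
              WHEN AREGEXP-ERROR-BRACE-EXPECTED
              WHEN AREGEXP-ERROR-PAREN-EXPECTED
              WHEN AREGEXP-ERROR-BRACKET-EXPECTED
                 DISPLAY "Unbalanced brace, parenthesis or bracket"
              WHEN AREGEXP-ERROR-EMPTY-RANGE
              WHEN AREGEXP-ERROR-INVALID-RANGE
                 DISPLAY "Bad range in the expression"
              WHEN AREGEXP-ERROR-NO-ERROR
                 DISPLAY "No error recorded"
              WHEN OTHER
      * Remaining codes, including any from the host library.
                 DISPLAY "Error code " ERROR-CODE
           END-EVALUATE
           STOP RUN.
```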

Compatibility Issues

None.