An HTML rewriter profile allows you to customize the rewriting process and specify the profile that is selected to rewrite content on a page.
Access Gateway has the following types of profiles: the default Word profile, custom Word profiles, and custom Character profiles.
The default Word profile, named default, is not specific to a reverse proxy or its proxy services.
If you enable HTML rewriting and do not define a custom Word profile for the proxy service, the default Word profile is used. This profile is preconfigured to rewrite the Web Server Host Name and any other names listed in the Additional DNS Name List. The preconfigured profile matches all URLs with the following content-types:
text/html
text/javascript
text/xml
application/javascript
text/css
application/x-javascript
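When you modify the behavior of the default profile, remember its scope: it is not specific to one reverse proxy or proxy service, so your changes affect every proxy service that uses it. If the default profile does not match your requirements, create your own custom Word profile or custom Character profile.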
A Word profile searches for matches on words. For example, “get” matches the word “get” and words that begin with “get,” such as “getaway.” It does not match the “get” in “together” or “beget.”
For information about how strings are replaced in a Word profile, see the following:
You must create a custom Word profile when an application requires rewrites of paths in JavaScript. If the application needs strings replaced or new content-types, these can also be added to the custom profile.
In a custom Word profile, you can also configure the match criteria so that the profile matches specific URLs. For more information, see Page Matching Criteria for Rewriter Profiles.
When you create a custom Word profile, you need to position it before the default profile in the list of profiles. Only one Word profile is applied per page. The first Word profile that matches the page is applied. Profiles lower in the list are ignored.
A custom Character profile searches for matches on a specified set of characters. For example, “top” matches the word “top” and the “top” in “tabletop,” “stopwatch,” and “topic.” If you need to replace strings that require this type of search, you must create a custom Character profile.
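The difference between the two search boundaries can be sketched in Python. This is a hypothetical illustration, not the Access Gateway implementation. In particular, it assumes that a Word match cannot begin directly after a letter or digit, which is one reading of the “get” and “top” examples.

```python
import re


def word_match_positions(search: str, text: str) -> list[int]:
    """Start offsets of Word-profile matches (assumed boundary rule)."""
    # Assumption: if the search string starts with a letter or digit, the match
    # must not be preceded by a letter or digit. Nothing is required after the
    # match, so "get" still matches the start of "getaway".
    boundary = r"(?<![A-Za-z0-9])" if search[:1].isalnum() else ""
    return [m.start() for m in re.finditer(boundary + re.escape(search), text)]


def character_match_positions(search: str, text: str) -> list[int]:
    """Start offsets of every non-overlapping occurrence (Character profile)."""
    return [m.start() for m in re.finditer(re.escape(search), text)]


text = "top tabletop stopwatch topic"
print(word_match_positions("top", text))       # [0, 23]
print(character_match_positions("top", text))  # [0, 9, 14, 23]
```

Under these assumptions the Word search finds only “top” and the start of “topic,” while the Character search also finds the “top” inside “tabletop” and “stopwatch.”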
For information about how strings are replaced in a Character profile, see String Replacement Rules for Character Profiles.
In a custom Character profile, you can also configure the match criteria so that the profile matches specific URLs. For more information, see Page Matching Criteria for Rewriter Profiles.
After the rewriter finds and applies the Word profile that matches the page, it finds and applies one Character profile. The first Character profile that matches the page is applied. Character profiles lower in the list are ignored.
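The first-match rule for both profile kinds can be sketched as follows. The Profile class, its fields, and the URL-only predicate are hypothetical simplifications. A real profile also matches on content-type.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Profile:
    name: str
    kind: str  # "word" or "character"
    matches: Callable[[str], bool]  # hypothetical page-matching predicate


def select_profiles(
    profiles: list[Profile], url: str
) -> tuple[Optional[Profile], Optional[Profile]]:
    """Return the first matching Word profile and the first matching Character profile."""
    # Profiles are checked in list order; later matches of the same kind are ignored.
    word = next((p for p in profiles if p.kind == "word" and p.matches(url)), None)
    char = next((p for p in profiles if p.kind == "character" and p.matches(url)), None)
    return word, char


profiles = [
    Profile("app-word", "word", lambda u: u.startswith("http://www.a.com/app/")),
    Profile("default", "word", lambda u: True),  # stands in for the default profile
    Profile("app-chars", "character", lambda u: True),
]
word, char = select_profiles(profiles, "http://www.a.com/app/index.html")
print(word.name, char.name)  # app-word app-chars
```

Moving the default entry above app-word would make select_profiles return the default profile for every page, which is why a custom Word profile must be positioned before the default profile.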
You specify the following matching criteria for selecting the profile:
The URLs to match
The URLs that cannot match
The content types to match
Use the Requested URLs to Search section of the profile to set up the matching policy. The first Word profile and the first Character profile that match the page are applied. Profiles lower in the list are ignored.
URLs: The URLs specified in the policy must use the following formats:
Sample URL | Description |
---|---|
http://www.a.com/content | Matches pages only if the requested URL does not contain a trailing slash. |
http://www.a.com/content/ | Matches pages only if the requested URL contains a trailing slash. |
http://www.a.com/content/index.html | Matches only this specific file. |
http://www.a.com/content/* | Matches the requested URL whether or not it has a trailing slash, and matches all files in the directory. |
http://www.a.com/* | Matches the proxy service and everything it is protecting. |
You can specify two types of URLs. In the If Requested URL Is list, you specify the URLs of the pages you want this profile to match. In the And Requested URL Is Not list, you specify the URLs you do not want this profile to match. A URL with an asterisk wildcard in the If Requested URL Is list can match pages that you do not want this profile to match. Use a URL in the And Requested URL Is Not list to exclude those pages from matching.
If a page matches both a URL in the If Requested URL Is list and in the And Requested URL Is Not list, the profile does not match the page.
For example, you could specify the following URL in the If Requested URL Is list:
http://www.a.com/*
You could then specify the following URL in the And Requested URL Is Not list:
http://www.a.com/content/*
These two entries cause the profile to match all pages on the www.a.com web server except for the pages in the /content directory and its subdirectories.
IMPORTANT: If both lists are empty, the profile skips the URL matching requirements and uses the content-type to determine whether a page matches.
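The URL formats and the two lists can be modeled roughly as follows. This is a hypothetical sketch. It assumes that URLs are compared as plain strings without query strings, that a pattern ending in /* also covers subdirectories (as the /content example suggests), and that a profile with exclusions but no inclusions matches no URL.

```python
def url_pattern_matches(pattern: str, url: str) -> bool:
    """Compare one requested URL against one URL from a profile list."""
    if pattern.endswith("/*"):
        base = pattern[:-2]
        # ".../content/*" matches ".../content", ".../content/", and everything below it.
        return url == base or url.startswith(base + "/")
    # Without a wildcard the comparison is exact, so a trailing slash matters.
    return url == pattern


def profile_url_matches(include: list[str], exclude: list[str], url: str) -> bool:
    """Apply the If Requested URL Is and And Requested URL Is Not lists."""
    if not include and not exclude:
        return True  # No URL criteria: the content-type rules decide.
    included = any(url_pattern_matches(p, url) for p in include)
    excluded = any(url_pattern_matches(p, url) for p in exclude)
    return included and not excluded


include = ["http://www.a.com/*"]
exclude = ["http://www.a.com/content/*"]
print(profile_url_matches(include, exclude, "http://www.a.com/index.html"))         # True
print(profile_url_matches(include, exclude, "http://www.a.com/content/page.html"))  # False
```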
Content-Type: In the And Document Content-Type Is section, you specify the content-types you want this profile to match. To add a new content-type, click New and specify the name, such as text/dns. Search your web pages for content-types to determine if you need to add new types. To add multiple values, enter each value on a separate line.
Regardless of content-types you specify, the page matches the profile if the file extension is html, htm, shtml, jhtml, asp, or jsp and you have not specified any URL matching criteria.
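The content-type rule and its file-extension exception might be sketched like this. The function name and parameters are hypothetical, and ignoring media-type parameters such as charset is an assumption that the source does not state.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions that match regardless of content-type when no URL criteria are set.
HTML_EXTENSIONS = {"html", "htm", "shtml", "jhtml", "asp", "jsp"}


def content_type_matches(
    profile_types: list[str], content_type: str, url: str, has_url_criteria: bool
) -> bool:
    if not has_url_criteria:
        extension = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
        if extension in HTML_EXTENSIONS:
            return True
    # Assumption: parameters such as "; charset=utf-8" are ignored when comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in {t.lower() for t in profile_types}


print(content_type_matches(["text/html"], "application/json", "http://www.a.com/a.jsp", False))  # True
print(content_type_matches(["text/html"], "application/json", "http://www.a.com/a.jsp", True))   # False
```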
The rewriter action section of the profile determines the actions the rewriter performs when a page matches the profile. Select from the following:
Inbound Actions: A profile might require these options if the proxy service has the following characteristics:
URLs appear in query strings, Post Data, or headers.
The web server uses WebDAV methods.
If your profile needs to match pages from this type of proxy service, you might need to enable the options listed below. They control the rewriting of query strings, Post Data, and headers from Access Gateway to the web server.
Rewrite Inbound Query String Data: Select this option to rewrite the domain and URL in the query string to match the web server configuration or to remove the path from the query string on a path-based multi-homing proxy with the Remove Path on Fill option enabled.
Rewrite Inbound Post Data: Select this option to rewrite the domain and URL in the Post Data to match the web server configuration or to remove the path from the Post Data on a path-based multi-homing proxy with the Remove Path on Fill option enabled.
Rewrite Inbound Headers: Select this option to rewrite the following headers: Call-Back, Destination, If, Notification-Type, and Referer.
The inbound options are not available for a Character profile.
Enabling or Disabling Rewriting: The Enable Rewriter Actions option determines whether the rewriter performs any actions:
Select the option to have the rewriter rewrite the references and data on the page.
Leave the option deselected to disable rewriting. This allows you to create a profile for the pages you do not want rewritten.
Additional Names to Search for URL Strings to Rewrite with Host Name: Use this section to specify the name of the variable, attribute, or method in which the hostname might appear. These options are not available for a Character profile.
Variable and Attribute Name to Search for Is: Use this section to specify the HTML attributes or JavaScript variables that you want searched for DNS names that might need to be rewritten. For the list of HTML attribute names that are automatically searched, see HTML Tags. You might want to add the following attributes:
value: This attribute enables the rewriter to search the <param> elements on the HTML page for value attributes and rewrite the value attributes that are URL strings.
If you need more granular control (some value attributes need to be rewritten but others do not) and you can modify the page, see Disabling with Page Modifications.
formvalue: This attribute enables the rewriter to search the <form> element on the HTML page for <input>, <button>, and <option> elements and rewrite the value attributes that are URL strings. For example, if your multi-homing path is /test and the form line is <input name="navUrl" type="hidden" value="/IDM/portal/cn/GuestContainerPage/656gwmail">, this line would be rewritten to the following value before sending the response to the client:
<input name="navUrl" type="hidden" value="/test/IDM/portal/cn/GuestContainerPage/656gwmail">
The formvalue attribute enables rewriting of all URLs in the <input>, <button>, and <option> elements in the form. If you need more granular control (some URLs need to be rewritten but others do not) and you can modify the form page, see Disabling with Page Modifications.
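The formvalue example can be reproduced with a regular-expression sketch. This is a hypothetical simplification: it treats a value attribute as a URL string only if it starts with /, and it does not model the real rewriter's URL detection or its restriction to elements inside a <form>.

```python
import re

# Hypothetical simplification: a value attribute on <input>, <button>, or <option>
# counts as a URL string if it starts with "/". <form> scoping is not modeled.
VALUE_ATTR = re.compile(
    r'(<(?:input|button|option)\b[^>]*?\bvalue=")(/[^"]*)(")', re.IGNORECASE
)


def rewrite_form_values(html: str, multihoming_path: str) -> str:
    """Prepend the multi-homing path to path-like value attributes."""
    return VALUE_ATTR.sub(
        lambda m: m.group(1) + multihoming_path + m.group(2) + m.group(3), html
    )


line = '<input name="navUrl" type="hidden" value="/IDM/portal/cn/GuestContainerPage/656gwmail">'
print(rewrite_form_values(line, "/test"))
```

A value such as value="Submit" is left unchanged because it does not start with /.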
Replacing URLs in JavaScript Methods: The JavaScript Method to Search for Is list allows you to specify the JavaScript methods whose parameters are searched for URL strings.
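As a rough illustration of what searching method parameters involves, the sketch below prepends a multi-homing path to a quoted path literal passed as the first argument of a listed method. The function name, the openPage method, the assumption that only string literals starting with / are URL strings, and the restriction to the first argument are all hypothetical. The Access Gateway rewriter may detect URL parameters differently.

```python
import re


def rewrite_method_args(script: str, methods: list[str], multihoming_path: str) -> str:
    """Prefix path literals passed as the first argument of the listed methods."""
    if not methods:
        return script
    names = "|".join(re.escape(m) for m in methods)
    # Method name, "(", optional white space, then a quoted literal starting with "/".
    pattern = re.compile(r"\b(" + names + r")(\(\s*)([\"'])(/[^\"']*)\3")
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3)
        + multihoming_path + m.group(4) + m.group(3),
        script,
    )


print(rewrite_method_args('openPage("/app/main.html");', ["openPage"], "/test"))
```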
String Replacement: The Additional Strings to Replace list allows you to search for a string and replace it. The search boundary (word or character) that you specified when creating the profile is used when searching for the string.
Word profile search and replace actions take precedence over Character profile actions.
For the rules and tokens that can be used in the search strings, see the following:
For information about how the Additional Strings to Replace list can be used to reduce the number of JavaScript methods you need to list, see Using $path to Rewrite Paths in JavaScript Methods or Variables.
In a Word profile, a string matches all paths that start with the characters in the specified string. For example:
Search String | Matches This String | Does Not Match This String |
---|---|---|
/path | /path, /pathother, /path/other, /path.html | /mypath |
On Access Gateway Service, you can use the following special tokens to modify the default matching rules. Access Gateway Appliance does not support these tokens.
[w] to match one white space character
[ow] to match 0 or more white space characters
[ep] to match a path element in a URL path, excluding words that end in a period
[ew] to match a word element in a URL path, including words that end in a period
[oa] to match one or more alphanumeric characters
White Space Tokens: You use the [w] and the [ow] tokens to specify where white space might occur in the string. For example:
[ow]my[w]string[w]to[w]replace[ow]
If you do not know, or do not care, whether the string has white space at the beginning and at the end, use [ow], which matches zero or more white space characters. The [w] token matches exactly one white space character.
Path Tokens: You use the [ep] and [ew] tokens to match path strings. The [ep] token can be used to match the following types of paths:
Search String | Matches This String | Does Not Match This String |
---|---|---|
/path[ep] | /path, /home/path/other | /path.html, /home/pathother |
The [ew] token can be used to match the following types of paths:
Search String | Matches This String | Does Not Match This String |
---|---|---|
/path[ew] | /path.html, /home/path | /paths |
Name Tokens: You use the [oa] token to match function or parameter names that begin and end with a fixed string but contain a computer-generated alphanumeric string in the middle. For example, the [oa] token can be used to match the following types of names:
Search String | Matches This String | Does Not Match This String |
---|---|---|
javaFunction-[oa]( | javaFunction-1234a56(), javaFunction-a() | javaFunction() |
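The tokens can be approximated with Python regular expressions. This sketch is hypothetical and is not how Access Gateway Service implements them. The regex equivalents, especially which characters end a path or word element for [ep] and [ew], are inferred from the example tables, and the sketch performs a plain substring search without the Word-profile boundary.

```python
import re

# Assumed regex equivalents of the Access Gateway Service tokens.
TOKEN_REGEX = {
    "[w]": r"\s",                 # exactly one white space character
    "[ow]": r"\s*",               # zero or more white space characters
    "[ep]": r"(?![A-Za-z0-9.])",  # path element ends: no letter, digit, or period next
    "[ew]": r"(?![A-Za-z0-9])",   # word element ends: a period may follow
    "[oa]": r"[A-Za-z0-9]+",      # one or more alphanumeric characters
}
TOKEN_SPLIT = re.compile(r"(\[(?:w|ow|ep|ew|oa)\])")


def compile_search_string(search: str) -> re.Pattern:
    """Translate a search string with tokens into a compiled regex."""
    parts = TOKEN_SPLIT.split(search)
    return re.compile("".join(TOKEN_REGEX.get(p, re.escape(p)) for p in parts))


pattern = compile_search_string("/path[ep]")
print(bool(pattern.search("/home/path/other")), bool(pattern.search("/path.html")))  # True False
```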
When you configure multiple strings for replacement, the rewriter uses the following rules for determining how characters are replaced in strings:
String replacement is done as a single pass.
String replacement is not performed recursively. Suppose you have listed the following search and replacement strings:
DOG to be replaced with CAT
A to be replaced with O
All occurrences of the string DOG are replaced with CAT, regardless of whether it is the word DOG or the word DOGMA. Only one replacement pass occurs. The rewritten CAT is not replaced with COT.
Because string replacement is done in one pass, the string that matches first takes precedence. Suppose you have listed the following search and replacement strings:
ABC to be replaced with XYZ
BCDEF to be replaced with PQRSTUVWXYZ
If the original string is ABCDEFGH, the replaced string is XYZDEFGH.
If two search strings match at the same position in the data, the longer search string is used for the replacement. This does not override the previous rule: a search string that matches earlier in the data still takes precedence. Suppose you have listed the following search and replacement strings:
ABC to be replaced with XYZ
ABCDEF to be replaced with PQRSTUVWXYZ
If the original string is ABCDEFGH, the replaced string is PQRSTUVWXYZGH.
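The three rules can be sketched as a single left-to-right scan. This is a hypothetical model that reproduces the examples above. It uses plain character matching and ignores Word-profile boundaries and tokens.

```python
def replace_single_pass(text: str, rules: dict[str, str]) -> str:
    """Replace in one left-to-right pass; the longest rule wins at a given position."""
    ordered = sorted(rules.items(), key=lambda kv: len(kv[0]), reverse=True)
    out: list[str] = []
    i = 0
    while i < len(text):
        for search, replacement in ordered:
            if search and text.startswith(search, i):
                out.append(replacement)  # replacement text is never re-scanned
                i += len(search)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(replace_single_pass("ABCDEFGH", {"ABC": "XYZ", "BCDEF": "PQRSTUVWXYZ"}))   # XYZDEFGH
print(replace_single_pass("ABCDEFGH", {"ABC": "XYZ", "ABCDEF": "PQRSTUVWXYZ"}))  # PQRSTUVWXYZGH
print(replace_single_pass("DOG", {"DOG": "CAT", "A": "O"}))                      # CAT
```

Because the scan advances past each replacement, the CAT produced from DOG is never turned into COT.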
You can use the $path token to rewrite paths on a path-based multi-homing service that has the Remove Path on Fill option enabled. This token is useful for web applications that require a dedicated web server and are therefore installed in the root directory of the web server. If you protect this type of application with Access Manager using a path-based multi-homing service, your clients access the application with a URL that contains a /path value. The proxy service uses the path to determine which web server a request is sent to, and the path must be removed from the URL before sending the request to the web server.
The application responds to the requests. If it uses JavaScript methods or variables to generate paths to resources, these paths are sent to the client without the multi-homing path of the proxy service prepended. When the client tries to access the resource specified by the web server path, the proxy service cannot locate the resource because the multi-homing path is missing. The figure below illustrates this flow with the rewriter adding the multi-homing path in the reply.
Figure 2-14 Rewriting with a Multi-homing Path
To ensure that all paths generated by JavaScript are rewritten, you must search the web pages of the application. You can then either list all the JavaScript methods and variables in the Additional Names to Search for URL Strings to Rewrite with Host Name section of the rewriter profile, or you can use the $path token in the Additional Strings to Replace section. The $path token reduces the number of JavaScript methods and variables that you otherwise need to list individually.
To use the $path token, you add a search string and a replace string that uses the token. For example, if the /prices/pricelist.html page is generated by JavaScript and the multi-homing path for the proxy service is /inner, you would specify the following strings:
Search String | Replacement String |
---|---|
/prices | $path/prices |
This configuration causes the following paths to be rewritten before Access Gateway sends the response from the web server to the browser:
Web Server String | Rewritten String for the Browser |
---|---|
/prices/pricelist.html | /inner/prices/pricelist.html |
/prices | /inner/prices |
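The outbound effect of the $path token can be sketched as a substitution in which $path expands to the multi-homing path of the proxy service. The helper name is hypothetical, and the sketch ignores Word-profile boundaries and tokens.

```python
import re


def rewrite_outbound(body: str, rules: dict[str, str], multihoming_path: str) -> str:
    """Apply search/replacement pairs after expanding $path (hypothetical helper)."""
    expanded = {s: r.replace("$path", multihoming_path) for s, r in rules.items()}
    if not expanded:
        return body
    # Longest search strings first, so the longest match at a position wins.
    alternatives = sorted(expanded, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in alternatives))
    return pattern.sub(lambda m: expanded[m.group(0)], body)


rules = {"/prices": "$path/prices"}
print(rewrite_outbound('<a href="/prices/pricelist.html">', rules, "/inner"))
```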
This token can also cause strings that should not be changed to be rewritten. If you enable Rewrite Inbound Query String Data, Rewrite Inbound Post Data, and Rewrite Inbound Headers, the rewriter checks these strings and ensures that they contain the information the web server expects. For example, when these options are enabled, the following paths and domain names are rewritten when found in query strings, Post Data, or the Call-Back, Destination, If, Notification-Type, or Referer headers:
Browser String | Rewritten String for the Web Server |
---|---|
/inner/prices/pricelist.html | /prices/pricelist.html |
/inner/prices | /prices |
example.com/inner/prices | inner.com/prices |
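The inbound mappings in the table can be approximated as follows. This is a hypothetical sketch: the function and its parameters are illustrative, and it assumes that example.com is the public host of the proxy service and inner.com is the web server host, as the last row of the table suggests.

```python
import re


def rewrite_inbound(value: str, public_host: str, multihoming_path: str, web_server_host: str) -> str:
    """Undo the multi-homing rewrite in a query string, Post Data, or header value."""
    # Map the public host plus path to the web server host first, so the path
    # inside "example.com/inner/..." is not stripped on its own.
    value = value.replace(public_host + multihoming_path, web_server_host)
    # Remove the multi-homing path where it forms a whole path element:
    # "/inner/prices" becomes "/prices" and a bare "/inner" becomes "/".
    pattern = re.escape(multihoming_path) + r"(/|(?![A-Za-z0-9]))"
    return re.sub(pattern, "/", value)


for s in ["/inner/prices/pricelist.html", "/inner/prices", "example.com/inner/prices"]:
    print(rewrite_inbound(s, "example.com", "/inner", "inner.com"))
```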