Configuring Redundant Page Detection

Highly dynamic sites can generate an effectively unlimited number of resources (pages) that are virtually identical. If the sensor pursued every such resource, the scan would never finish. The Perform redundant page detection option compares page structure to determine the level of similarity, which allows the sensor to identify redundant resources and exclude them from processing.

Important! Redundant page detection operates during the crawl portion of the scan. If the audit introduces a session that would be redundant, that session is not excluded from the scan.

To configure redundant page detection:

  1. Select the Perform redundant page detection check box.

  2. Configure settings as described in the following table.

    Setting: Page Similarity Threshold (%)
    Description: Indicates how similar two pages must be to be considered redundant. Enter a percentage from 1 to 100, where 100 is an exact match. The default setting is 95 percent.

    Setting: Tag attributes to include
    Description: Identifies the tag attributes to include in the page structure. Typically, tag attributes and their values are dropped when determining structure. Identifying tag attributes in this list adds those attributes and their values to the page structure. By default, the id and class tag attributes are included.

    To add tag attributes:

    1. Type the attribute name in the Tag item box. Do not include tag brackets (< and >).

    2. Click ADD.

      The tag attribute is added to the Tag attributes to include list.

    Tip: Some sites are composed primarily of one type of tag, such as <div>. For such sites, including the tag attributes creates a stricter page match, and excluding them creates a looser match.
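
The interaction between the similarity threshold and the included tag attributes can be illustrated with a small Python sketch. This is not the sensor's actual algorithm, which this page does not describe. It assumes that a page's structure is the ordered sequence of its opening tags and substitutes the standard library's difflib.SequenceMatcher for the real comparison. The names StructureParser, similarity_percent, and is_redundant are hypothetical and exist only for this illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects a page's opening tags in order, ignoring text content."""

    def __init__(self, include_attrs=()):
        super().__init__()
        self.include_attrs = set(include_attrs)
        self.signature = []

    def handle_starttag(self, tag, attrs):
        # Attributes and their values are dropped unless explicitly listed.
        kept = sorted((k, v or "") for k, v in attrs if k in self.include_attrs)
        self.signature.append((tag, tuple(kept)))


def structure(html, include_attrs=()):
    """Return the structural signature of an HTML string."""
    parser = StructureParser(include_attrs)
    parser.feed(html)
    parser.close()
    return parser.signature


def similarity_percent(page_a, page_b, include_attrs=()):
    """Assumed similarity measure: SequenceMatcher ratio scaled to 0-100."""
    a = structure(page_a, include_attrs)
    b = structure(page_b, include_attrs)
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_redundant(page_a, page_b, threshold=95, include_attrs=("id", "class")):
    """Defaults mirror the documented ones: 95 percent, id and class."""
    return similarity_percent(page_a, page_b, include_attrs) >= threshold


page_1 = '<div class="item"><h1>Product 1</h1><p>Red</p></div>'
page_2 = '<div class="item"><h1>Product 2</h1><p>Blue</p></div>'
page_3 = '<div class="banner"><h1>Sale</h1><p>Today</p></div>'

# Same tags and same class value; only the text differs.
print(is_redundant(page_1, page_2))                    # True
# Same tags, but the class value differs, so the match is stricter.
print(is_redundant(page_1, page_3))                    # False
# With no attributes included, only the tag sequence is compared.
print(is_redundant(page_1, page_3, include_attrs=()))  # True
```

In this sketch, including id and class separates pages that share the same <div> skeleton, while an empty attribute list treats them as the same page. That mirrors the stricter and looser matching described in the tip above.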