Stream Processor Groups

Transformation Hub implements three types of stream processors to process events: routing stream processors, transforming stream processors, and enrichment stream processors.

Routing Stream Processors

Routing stream processors process event data and send it to destinations based on the Transformation Hub routing rules specified in ArcSight Management Center (ArcMC). There are two types of routing stream processors:

Routing stream processor configurations and routes are refreshed every 60 seconds. Allow for this delay when adding, deleting, or editing routing rules in ArcMC.

Transforming Stream Processors

As of ArcSight SmartConnector 8.1, the SmartConnector can send events to Transformation Hub in the Avro event format, from which they can be consumed by Avro event consumers such as ESM and Database. Earlier versions of the SmartConnector could send only CEF events to Transformation Hub, and those events had to be transformed to Avro format before Avro event consumers could use them. The following default CEF-to-Avro (C2AV) transforming stream processors transform CEF data in the CEF source topic and route it to the dedicated Avro destination topic for use by Avro consumers.

  1. The CEF-to-Avro stream processor transforms events from the th-cef topic to the th-arcsight-avro topic.
  2. The CEF-to-Avro ESM filtered stream processor transforms events from the mf-event-cef-esmfiltered topic to the mf-event-avro-esmfiltered topic. For more information about filtering events for ESM, see Filtering Events for ESM.

Enrichment Stream Processors

Introduced in Transformation Hub 3.5.0, an enrichment stream processor processes events from the selected source topic (by default, th-arcsight-avro) by executing enrichment tasks, which include generating a Global ID. Events are then routed to the topic mf-event-avro-enriched.

If you are enabling enrichment stream processors, ensure that the Generator ID is enabled.

Use the CDF Management Portal to configure the following aspects of the enrichment stream processor.

Number of enrichment stream processor groups: By default, Transformation Hub has 1 enrichment stream processor group with 2 instances enabled.

Source topic: Choose one of the following source topics according to your deployment needs.

For more information on local and global ESM event enrichment, see below.

Global Event ID Enrichment: Transformation Hub ensures that every event that passes through the enrichment stream processor has a global ID. If an event's global ID value is missing, a new global ID is assigned to it.

Global Event ID generation enrichment is always enabled. You can also enable Event Integrity enrichment.
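The global ID rule described above can be pictured with a minimal sketch. This is not the Transformation Hub implementation: the field name globalEventId, the dictionary event shape, and the counter used as an ID source are all assumptions made for illustration only.

```python
import itertools


def ensure_global_id(event, id_source):
    """Return the event with a global ID, assigning one only if it is missing.

    'globalEventId' is a hypothetical field name used for illustration.
    """
    if event.get("globalEventId") is None:
        # Copy the event and attach the next ID from the (hypothetical) source.
        return dict(event, globalEventId=next(id_source))
    return event


# Hypothetical ID source: a simple counter standing in for the real generator.
ids = itertools.count(1000)

with_id = ensure_global_id({"name": "login", "globalEventId": 42}, ids)
without_id = ensure_global_id({"name": "logout"}, ids)
```

An event that already carries an ID passes through unchanged, so the counter advances only for events that arrive without one.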

Event Integrity Enrichment (ArcSight Recon)

ArcSight Recon can check the integrity of event data, to provide assurance that event data sent by Connectors and other producers through the ingestion pipeline is not modified, and that events are not subsequently lost or deleted.

To achieve this objective, Transformation Hub provides event integrity enrichment, which publishes summary events (such as M1 or agent:040 Connectors events) about messages that pass through the enrichment source topic. Each summary event contains a calculated hash of data, a list of the fields used to generate the hash, and a list of the global event IDs of the messages that are summarized. These three pieces of information enable downstream consumers to verify that message data was not lost or modified.
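To show how those three pieces of information allow verification, here is a minimal sketch. It is an assumption-laden illustration, not the product's algorithm: the use of SHA-256, the field separator, the key names (hash, fields, globalEventIds), and the event field names are all hypothetical.

```python
import hashlib


def build_summary(events, fields):
    """Build a summary: hash of selected fields, the field list, and the IDs."""
    digest = hashlib.sha256()  # hash algorithm is an assumption
    for event in events:
        for field in fields:
            digest.update(str(event.get(field, "")).encode("utf-8"))
            digest.update(b"\x1f")  # separator so adjacent values cannot merge
    return {
        "hash": digest.hexdigest(),
        "fields": list(fields),
        "globalEventIds": [event["globalEventId"] for event in events],
    }


def verify(received_events, summary):
    """Return True only if no summarized event is missing or modified."""
    by_id = {event["globalEventId"]: event for event in received_events}
    if any(gid not in by_id for gid in summary["globalEventIds"]):
        return False  # at least one summarized event was lost
    ordered = [by_id[gid] for gid in summary["globalEventIds"]]
    return build_summary(ordered, summary["fields"])["hash"] == summary["hash"]


events = [
    {"globalEventId": 1, "name": "login", "sourceAddress": "10.0.0.1"},
    {"globalEventId": 2, "name": "logout", "sourceAddress": "10.0.0.2"},
]
summary = build_summary(events, ["name", "sourceAddress"])
```

A consumer recomputes the hash over the listed fields of the events it received, in the order of the listed IDs. A missing ID indicates loss, and a hash mismatch indicates modification.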

Tune the number of partitions of the enrichment stream processor source topic before enabling Event Integrity enrichment. If you change the number of partitions of the source topic after enabling it, go to the Topics section of Kafka Manager and do the following:
1. Set the number of partitions of the Event Integrity enrichment changelog topic to match the number of partitions of the source topic. The internal topic is named using the following pattern: com.arcsight.th.AVRO_ENRICHMENT_1-integrityMessageStore-changelog.
2. Restart the Transformation Hub web service pod by running the following command:
kubectl delete pod th-web-service-xxxxxxxxx-yyyyy -n arcsight-installer-yyyyy

Configuring Event Integrity Enrichment: You can configure event integrity while doing a fresh installation or during an upgrade. Set values of the following parameters accordingly:

Event integrity enrichment generates an internal topic named using the following pattern: com.arcsight.th.AVRO_ENRICHMENT_1-integrityMessageStore-changelog. The “# of replicas assigned to each Kafka Topic” setting also applies to this topic.
If events arrive irregularly, with long intervals between them, the feature checks every hour (60 minutes) for a summary event that has not reached the verification event batch size. If that summary event has been pending for more than 4 hours (240 minutes), it is sent with the aggregated information for the events received so far, regardless of whether it reached the verification event batch size.
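The timing rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, and the exact boundary behavior at 240 minutes is assumed to follow a strict reading of "more than 4 hours".

```python
CHECK_INTERVAL_MINUTES = 60  # the feature checks once per hour (from the text)
MAX_PENDING_MINUTES = 240    # a partial summary is sent after more than 4 hours


def should_send_summary(pending_events, batch_size, pending_minutes):
    """Decide whether a pending summary event should be sent now.

    pending_events:  events aggregated so far into the unsent summary
    batch_size:      the verification event batch size
    pending_minutes: how long the summary has been waiting (hypothetical input)
    """
    if pending_events == 0:
        return False  # nothing to summarize
    if pending_events >= batch_size:
        return True   # normal case: the batch is full
    # Partial batch: send only once it has waited longer than the limit.
    return pending_minutes > MAX_PENDING_MINUTES
```

For example, a summary holding 10 of 100 expected events stays pending at a check made after 180 minutes, but is sent at a check made after 300 minutes.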

For more information about verifying event data, see "Checking the Integrity of Event Data" in the User's Guide to ArcSight Recon.

Local and Global ESM Event Enrichment

ESM event enrichment can be configured locally or globally.

Local ESM Event Enrichment: With local ESM event enrichment (the default setting), ArcSight capabilities such as Recon and Intelligence can benefit from ESM Correlation. When local ESM event enrichment is configured:

Global ESM Event Enrichment: With global event enrichment, events enriched by ESM are shared with all other ArcSight capabilities, including Recon and Intelligence. When global ESM event enrichment is configured:

Configuring ESM Event Enrichment: To configure ESM event enrichment:

Describing Routing

Each stream processor includes six processing threads. All routes with the same source topic are processed by one routing stream processor group. You can scale a processor group independently as load increases by adding more routing processor instances to the group.
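The grouping and scaling rules above can be pictured with a short sketch. The route names, destinations, and dictionary shape below are hypothetical examples, not ArcMC output; only the six-threads-per-processor figure and the grouping by source topic come from the text.

```python
from collections import defaultdict

THREADS_PER_PROCESSOR = 6  # each stream processor includes six processing threads


def group_routes_by_source(routes):
    """Group route names by source topic; each key maps to one processor group."""
    groups = defaultdict(list)
    for route in routes:
        groups[route["source"]].append(route["name"])
    return dict(groups)


def total_threads(instances):
    """Processing threads available to a group with the given instance count."""
    return instances * THREADS_PER_PROCESSOR


# Hypothetical routes for illustration only.
routes = [
    {"name": "route-a", "source": "th-cef", "destination": "dest-1"},
    {"name": "route-b", "source": "th-cef", "destination": "dest-2"},
    {"name": "route-c", "source": "th-arcsight-avro", "destination": "dest-3"},
]
groups = group_routes_by_source(routes)
```

Here the two routes that read from th-cef share one routing stream processor group, and scaling that group to three instances would give it 18 processing threads.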

  • You configure routing in ArcMC.

Tuning Stream Processor Groups

The performance of stream processors is critical to Transformation Hub performance. In general, you can follow these guidelines to tune stream processors and improve performance.

Best Practices for Routing Stream Processors

The following best practices apply to the management of routing stream processors.