Stream Processor Groups
Transformation Hub implements three types of stream processors to process events: routing stream processors, transforming stream processors, and enrichment stream processors.
- Routing Stream Processors
- Transforming Stream Processors
- Enrichment Stream Processors
- Event Integrity Enrichment (ArcSight Recon)
- Local and Global ESM Event Enrichment
- Describing Routing
- Tuning Stream Processor Groups
- Best Practices for Routing Stream Processors
Routing Stream Processors
Routing stream processors process event data and send it to destinations, based on Transformation Hub routing rules specified in ArcSight Management Center. There are two types of routing stream processors:
- CEF-to-CEF routing stream processing is supported in Transformation Hub 3.4.0 and all previous versions.
- In Transformation Hub 3.4.0 and later versions, Avro-to-Avro routing stream processing occurs between two event-avro topics. To use an Avro topic, the topic must be of the type event-avro. You can configure a topic with this type in two ways:
  - Create the topic with type event-avro using ArcMC 2.9.6 or later and Transformation Hub 3.4, or
  - Change the type of an existing topic to event-avro using ArcMC 2.9.6 or later.
Transforming Stream Processors
As of ArcSight SmartConnector 8.1, the SmartConnector can send events to Transformation Hub in the Avro event format, from which they can be consumed by Avro event consumers such as ESM and Database. Earlier versions of the SmartConnector could not do this and sent CEF events to Transformation Hub, which then had to be transformed to Avro format before Avro event consumers could use them. The following default CEF-to-Avro (C2AV) transforming stream processors transform CEF data in the CEF source topic and route it to the dedicated Avro destination topic for use by Avro consumers.
- The CEF-to-Avro stream processor transforms events from the th-cef topic to the th-arcsight-avro topic.
- The CEF-to-Avro ESM Filtered Stream Processor transforms events from the mf-event-cef-esmfiltered topic to the mf-event-avro-esmfiltered topic. For more information about filtering events for ESM, see Filtering Events for ESM.
Enrichment Stream Processors
Introduced in Transformation Hub 3.5.0, an enrichment stream processor processes events coming from the selected source topic (by default, th-arcsight-avro) by executing enrichment tasks, which include generating a Global ID. Events are then routed to the topic mf-event-avro-enriched.
Use the CDF Management Portal to configure the following aspects of the enrichment stream processor.
Number of enrichment stream processor groups: By default, Transformation Hub has 1 enrichment stream processor group with 2 instances enabled.
Source topic: Choose one of the following source topics according to your deployment needs.
- th-arcsight-avro: (default source topic) Use this topic for local ESM event enrichment when ESM is deployed.
- mf-event-avro-esmfiltered: Use this topic for global ESM event enrichment when ESM is deployed.
For more information on local and global ESM event enrichment, see below.
Global Event ID Enrichment: Transformation Hub ensures that every event that passes through the enrichment stream processor has a global ID. If an event's global ID value is missing, a new global ID is assigned to it.
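The "assign only if missing" behavior can be pictured with a small Python sketch. It is an illustration only: the field name globalEventId, the counter used as an ID source, and the function name are assumptions, not the actual Transformation Hub implementation or ID format.

```python
import itertools

# Assumed field name and ID source, for illustration only.
GLOBAL_ID_FIELD = "globalEventId"
_id_source = itertools.count(1)


def ensure_global_id(event: dict) -> dict:
    """Return a copy of the event that is guaranteed to carry a global ID."""
    enriched = dict(event)
    if enriched.get(GLOBAL_ID_FIELD) is None:
        # Only events without an ID receive a new one; existing IDs are kept.
        enriched[GLOBAL_ID_FIELD] = next(_id_source)
    return enriched


already_tagged = ensure_global_id({"name": "login", GLOBAL_ID_FIELD: 42})
newly_tagged = ensure_global_id({"name": "logout"})
```

An event that already carries an ID passes through unchanged, so running the same event through the processor again does not alter its ID.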
Event Integrity Enrichment (ArcSight Recon)
ArcSight Recon can check the integrity of event data, to provide assurance that event data sent by Connectors and other producers through the ingestion pipeline is not modified, and that events are not subsequently lost or deleted.
To achieve this objective, Transformation Hub provides event integrity enrichment, which publishes summary events (such as M1 or agent:040 Connector events) about the messages that pass through the enrichment source topic. Each summary event contains a calculated hash of the data, a list of the fields used to generate the hash, and a list of the global event IDs of the messages that it summarizes. Together, these three pieces of information enable downstream consumers to verify that message data was not lost or modified.
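The following Python sketch shows how a consumer could use those three pieces of information to detect lost or modified messages. It is a minimal illustration under stated assumptions: the SHA-256 hash, the JSON serialization, and all field and function names are hypothetical, and the real summary event format used by Transformation Hub and Recon may differ.

```python
import hashlib
import json


def build_summary(events, fields):
    """Summarize a batch: hash of selected fields, field list, and event IDs."""
    digest = hashlib.sha256()  # assumed algorithm, for illustration only
    for event in events:
        selected = {name: event.get(name) for name in fields}
        digest.update(json.dumps(selected, sort_keys=True).encode("utf-8"))
    return {
        "hash": digest.hexdigest(),
        "fields": list(fields),
        "globalEventIds": [event["globalEventId"] for event in events],
    }


def verify(summary, received_events):
    """Return True only if no summarized event is missing or modified."""
    by_id = {event["globalEventId"]: event for event in received_events}
    if any(gid not in by_id for gid in summary["globalEventIds"]):
        return False  # at least one summarized event was lost
    ordered = [by_id[gid] for gid in summary["globalEventIds"]]
    recomputed = build_summary(ordered, summary["fields"])
    return recomputed["hash"] == summary["hash"]


batch = [
    {"globalEventId": 1, "name": "login", "sourceAddress": "10.0.0.1"},
    {"globalEventId": 2, "name": "logout", "sourceAddress": "10.0.0.2"},
]
summary = build_summary(batch, ["name", "sourceAddress"])
tampered = [dict(batch[0], name="changed"), batch[1]]
```

The list of IDs catches missing events, and the recomputed hash over the listed fields catches modified ones.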
1. Adjust the number of partitions of the event integrity enrichment changelog topic so that it matches the number of partitions of the source topic. The name of this internal topic follows this pattern:
com.arcsight.th.AVRO_ENRICHMENT_1-integrityMessageStore-changelog
2. Restart the TH Web services pod by running the following command:
kubectl delete pod th-web-service-xxxxxxxxx-yyyyy -n arcsight-installer-yyyyy
Configuring Event Integrity Enrichment: You can configure event integrity while doing a fresh installation or during an upgrade. Set values of the following parameters accordingly:
- Generate verification events for parsed field integrity checks: (Default value: false) If true, a verification event is generated to accompany each batch of events, so that the integrity of the parsed fields in each event can be checked. Recon uses this verification event to check event integrity. If true, also specify a value for Verification event batch size, as described below.
- Verification event batch size (4-375): (Default value: 256) Specifies the number of events associated with each verification event. A lower value means that each verification event covers fewer events; however, it also generates more verification events and therefore consumes more resources.
For more information about verifying event data, see "Checking the Integrity of Event Data" in the User's Guide to ArcSight Recon.
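The batch size trade-off can be estimated with simple arithmetic. The Python sketch below assumes one verification event per full or partial batch, which is a simplification for illustration; the function name is hypothetical.

```python
import math

# Range and default taken from the parameter description above.
MIN_BATCH, MAX_BATCH, DEFAULT_BATCH = 4, 375, 256


def verification_events_needed(event_count: int, batch_size: int = DEFAULT_BATCH) -> int:
    """Estimate how many verification events accompany event_count events."""
    if not MIN_BATCH <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch size must be between {MIN_BATCH} and {MAX_BATCH}")
    # Assumption: every full or partial batch gets exactly one verification event.
    return math.ceil(event_count / batch_size)


at_default = verification_events_needed(10_000)
at_minimum = verification_events_needed(10_000, batch_size=4)
at_maximum = verification_events_needed(10_000, batch_size=375)
```

Under this assumption, the smallest batch size produces far more verification events for the same event volume than the default does, which is the source of the higher resource consumption.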
Local and Global ESM Event Enrichment
ESM event enrichment can be configured locally or globally.
Local ESM Event Enrichment: With local ESM event enrichment (the default setting), ArcSight capabilities such as Recon and Intelligence can benefit from ESM Correlation. When local ESM event enrichment is configured:
- ESM reads the topic mf-event-avro-esmfiltered, enriches events found there, and stores them in ESM.
- ESM can be configured to send Correlation events to the th-arcsight-avro topic.
- Transformation Hub's Event Enrichment Stream Processor reads events from the th-arcsight-avro topic, enriches them, and sends them to mf-event-avro-enriched for Recon and Intelligence to read.
Global ESM Event Enrichment: With global event enrichment, events enriched by ESM are shared with all other ArcSight capabilities, including Recon and Intelligence. When global ESM event enrichment is configured:
- ESM reads the topic th-arcsight-avro, enriches events found there, and stores them in ESM.
- You must configure ESM to send all enriched events and Correlation events to the mf-event-avro-esmfiltered topic.
- Transformation Hub's Event Enrichment Stream Processor reads events from mf-event-avro-esmfiltered, enriches them, and sends them to mf-event-avro-enriched for Recon and Intelligence to read.
Configuring ESM Event Enrichment: To configure ESM event enrichment:
- For local ESM event enrichment, no configuration is needed by default.
- For global ESM event enrichment, in the CDF Management Portal, set the source topic for Enrichment Stream Processors to the mf-event-avro-esmfiltered topic.
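The two modes differ only in which topics ESM and the enrichment stream processor read and write. The Python lookup table below restates the flows described in this section; the dictionary keys and the function name are illustrative and do not correspond to any product setting names.

```python
ENRICHED_TOPIC = "mf-event-avro-enriched"

# Topic flows per mode, as described in this section.
FLOWS = {
    "local": {
        "esm_reads": "mf-event-avro-esmfiltered",
        "esm_writes": "th-arcsight-avro",  # Correlation events, if configured
        "enrichment_source": "th-arcsight-avro",
    },
    "global": {
        "esm_reads": "th-arcsight-avro",
        "esm_writes": "mf-event-avro-esmfiltered",  # enriched and Correlation events
        "enrichment_source": "mf-event-avro-esmfiltered",
    },
}


def enrichment_route(mode: str) -> tuple:
    """Return (source topic, destination topic) of the enrichment stream processor."""
    return FLOWS[mode]["enrichment_source"], ENRICHED_TOPIC


local_route = enrichment_route("local")
global_route = enrichment_route("global")
```

In both modes Recon and Intelligence read from the same enriched topic; only the enrichment source topic changes.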
Describing Routing
Each stream processor includes six processing threads. All routes with the same source topic are processed by one routing stream processor group. You can scale a processor group independently as load increases by adding more routing processor instances to the group.
- The number of routing stream processor groups should match the number of source topics they are processing.
- Each routing stream processor group can contain multiple routing stream processors.
- You can configure up to 10 routing stream processor groups on Transformation Hub in the CDF Management Portal, allowing Transformation Hub to support up to 10 source topics.
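As a planning aid, the relationship between routes, source topics, and groups can be sketched in Python. Routes are modeled here as hypothetical (source topic, destination topic) pairs, and the destination topic names are made up for illustration; the limit of 10 groups comes from the list above.

```python
MAX_ROUTING_GROUPS = 10  # limit stated above


def plan_routing_groups(routes):
    """Map each distinct source topic to one routing stream processor group."""
    sources = sorted({source for source, _destination in routes})
    if len(sources) > MAX_ROUTING_GROUPS:
        raise ValueError(
            f"{len(sources)} source topics exceed the {MAX_ROUTING_GROUPS} supported groups"
        )
    # Routes that share a source topic are handled by the same group.
    return {source: group for group, source in enumerate(sources, start=1)}


routes = [
    ("th-cef", "example-firewall-cef"),   # hypothetical destination topics
    ("th-cef", "example-proxy-cef"),
    ("th-arcsight-avro", "example-dns-avro"),
]
plan = plan_routing_groups(routes)
```

Three routes over two source topics need two groups, because both th-cef routes share one group.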
Tuning Stream Processor Groups
The performance of stream processors is critical to Transformation Hub performance. In general, you can follow these guidelines to tune stream processors for better performance.
- Because all routes that use the same source topic share the same routing stream processor group, distributing routes across more source topics (each served by its own group) can speed up processing.
- Increase the number of source topic partitions to handle high EPS throughput, depending on the CPU and memory resources of each worker node. Each stream processor uses 6 threads by default, so, for example, when the partition number is increased to 60, up to 10 routing (or C2AV) processor instances can be used.
- Where possible, limit the number of routing rules per route.
- If stream processors display a TimeoutException in logs, consider overriding the application properties by slightly increasing the following settings, until the exceptions are no longer returned in logs:
  - max.block.ms (default is 60000 milliseconds)
  - delivery.timeout.ms (default is 120000 milliseconds)
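The partition guideline above follows from simple arithmetic: each stream processing thread needs at least one partition to work on, so the partition count divided by the threads per instance gives the number of instances that can be kept fully busy. The Python sketch below reproduces the 60-partition example; the function name is hypothetical.

```python
THREADS_PER_INSTANCE = 6  # default stated above


def max_useful_instances(partitions: int, threads_per_instance: int = THREADS_PER_INSTANCE) -> int:
    """Number of instances whose threads can all be assigned a partition."""
    if partitions < 1 or threads_per_instance < 1:
        raise ValueError("partitions and threads_per_instance must be positive")
    # At least one instance is always needed, even if some of its threads stay idle.
    return max(1, partitions // threads_per_instance)


for_sixty = max_useful_instances(60)
for_six = max_useful_instances(6)
for_three = max_useful_instances(3)
```

Adding instances beyond this number leaves threads without partitions to process, so increase the partition count before scaling further.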
Best Practices for Routing Stream Processors
The following best practices apply to management of routing stream processors.
- By default, Transformation Hub has 1 routing stream processor group. Accordingly, if you create 2 or more routes with different source topics, make sure to enable as many stream processor groups as there are source topics used in those routes (this applies to both types of routing: CEF-to-CEF and Avro-to-Avro).
- To enable a routing stream processor group or increase its number of instances, in the CDF Management Portal, browse to the Reconfigure page. Identify the desired group number and, to enable the group, increase its number of instances from 0 to the desired value.
- To support high availability, routing stream processor groups can be scaled out and partially scaled down. Once a group is enabled, you can increase or decrease the number of instances. However, do not reduce the number to 0; if you do, the source topic mapped to that group will no longer be routed until you increase the number of instances above 0.
- Always consider the available resources when enabling more routing stream processor groups.
- C2AV and routing stream processing in Transformation Hub are Kafka Streams applications. By default, Kafka Streams uses an at-least-once processing guarantee in the presence of failure. This means that if the stream processing application fails, no data records are lost or left unprocessed, but some data records may be re-read and therefore reprocessed. As a result, when C2AV or routing pods are killed abnormally and restarted, you might see duplicated events.
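The following Python simulation illustrates why at-least-once processing can produce duplicates after an abnormal restart. It is a simplified model, not Kafka Streams code: offsets are committed every few records, and a crash makes the processor resume from the last committed offset. The deduplication helper shows one way a downstream consumer could tolerate duplicates; it is an assumption for illustration, not a documented Transformation Hub feature.

```python
def run_at_least_once(records, commit_every, crash_at):
    """Process records, crash once at index crash_at, then resume from the last commit."""
    output = []
    committed = 0   # last committed offset
    position = 0    # next record to process
    crashed = False
    while position < len(records):
        if position == crash_at and not crashed:
            crashed = True
            position = committed  # restart: uncommitted records are re-read
            continue
        output.append(records[position])
        position += 1
        if position % commit_every == 0:
            committed = position
    return output


def deduplicate(events):
    """Keep the first occurrence of each event, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event not in seen:
            seen.add(event)
            unique.append(event)
    return unique


delivered = run_at_least_once(["e1", "e2", "e3", "e4", "e5"], commit_every=2, crash_at=3)
unique = deduplicate(delivered)
```

In this run, e3 was processed but not yet committed when the crash occurred, so it is delivered twice; no record is lost.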