Monitoring the Database

You can monitor the Database by using commands or the out-of-the-box Health and Performance Monitoring dashboard included in the component.

Understanding Database Watchdog
Monitoring Database Status
Monitoring Scheduler Status, Events, and Messages
Using the Health and Performance Monitoring Dashboard
Removing Rejected Events

Understanding Database Watchdog

The Database includes a watchdog, configured as a cron job that runs automatically once an hour. It monitors the database and performs the following operations:

Attempts to restart any database cluster node that it detects in a down state.
Creates the database event ingestion process (Kafka Scheduler) if it is missing.
Starts the database event ingestion process (Kafka Scheduler) if it is stopped.
Do not use the watchdog to delete rejected events unless a policy is in place.

Monitoring Database Status

Monitor the database status by using the following command:

/opt/arcsight-db-tools/db_installer status

Monitoring Scheduler Status, Events, and Messages

Monitor the scheduler's status by using the following command:

/opt/arcsight-db-tools/kafka_scheduler status

Monitor scheduler events by using the following command:

/opt/arcsight-db-tools/kafka_scheduler events

Monitor scheduler messages by using the following command:

/opt/arcsight-db-tools/kafka_scheduler messages
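
To run the database and scheduler checks above in one pass, a small wrapper can help. The following is a minimal sketch, not part of the product: the function names run_check and run_all_checks are hypothetical, and only the four command paths come from this page. It skips any tool that is not installed on the host instead of trying to run it.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Prints a labeled header, then runs COMMAND with ARGS.
# If COMMAND is not an executable file, prints a "skipped" note and returns 1.
run_check() {
    label=$1
    shift
    echo "== $label =="
    if [ ! -x "$1" ]; then
        echo "skipped: $1 not found or not executable"
        return 1
    fi
    "$@"
}

# Runs the four monitoring commands documented on this page, in order.
# A skipped or failing check does not stop the remaining checks.
run_all_checks() {
    run_check "Database status"    /opt/arcsight-db-tools/db_installer status
    run_check "Scheduler status"   /opt/arcsight-db-tools/kafka_scheduler status
    run_check "Scheduler events"   /opt/arcsight-db-tools/kafka_scheduler events
    run_check "Scheduler messages" /opt/arcsight-db-tools/kafka_scheduler messages
}
```

Call run_all_checks on a database node to print all four reports, each under its own header.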

Using the Health and Performance Monitoring Dashboard

You can also monitor the status of the database by using the out-of-the-box Health and Performance Monitoring dashboard included in the component. The dashboard includes the following widgets.

Database Event Ingestion Timeline

The Database Event Ingestion Timeline widget represents the rate of event ingestion into the database. This widget measures when the database receives the event data.

As a SOC Manager or an IT Administrator, you might want to monitor the rate of event ingestion into the database. Because events from different sources arrive at the database for storage at different speeds, the moment when the database stores an event differs from the moment when the event occurred. This widget shows when the database receives the event data.

In the Database Event Ingestion Timeline widget, you can set the Upper and Medial Threshold values. Green represents EPS (events per second) values below the Medial Threshold, yellow represents values between the Medial and Upper Thresholds, and red represents values above the Upper Threshold.
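
The color rule above can be expressed as a short function. This is an illustrative sketch only, not the widget's implementation: the name classify_eps is hypothetical, values are whole numbers, and a value exactly equal to either threshold is assumed to count as yellow, because this page does not define the boundary cases.

```shell
# classify_eps EPS MEDIAL UPPER
# Prints "green", "yellow", or "red" for an integer EPS value.
# Assumption: a value equal to MEDIAL or UPPER is treated as yellow.
classify_eps() {
    eps=$1
    medial=$2
    upper=$3
    if [ "$eps" -gt "$upper" ]; then
        echo red        # above the Upper Threshold
    elif [ "$eps" -ge "$medial" ]; then
        echo yellow     # between the Medial and Upper Thresholds
    else
        echo green      # below the Medial Threshold
    fi
}
```

For example, with a Medial Threshold of 1000 and an Upper Threshold of 5000, classify_eps 3000 1000 5000 prints yellow.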

Removing Rejected Events

For default tenants, use this procedure to ensure there are no rejected events. If /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData is not empty, rejected events exist and you need to take action immediately.

A high volume of rejected events degrades query performance and occupies disk space that the retention policy cannot reclaim. Rejected events continue to occur until the root cause is resolved.
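
The emptiness check described above can be scripted. The following is a minimal sketch under stated assumptions: the function name has_rejected_events is hypothetical, and because this page does not say whether RejectionTableData is a directory or a file, the function accepts either.

```shell
# has_rejected_events PATH...
# Returns 0 (true) if any PATH is a directory with at least one entry
# or a regular file with a non-zero size. Returns 1 otherwise.
has_rejected_events() {
    for path in "$@"; do
        if [ -d "$path" ] && [ -n "$(ls -A "$path" 2>/dev/null)" ]; then
            return 0
        fi
        if [ -f "$path" ] && [ -s "$path" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (the shell expands the glob to one path per matching node data directory):
# if has_rejected_events /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData; then
#     echo "Rejected events exist. Investigate the root cause."
# fi
```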

If you use the watchdog to delete rejected events, make sure a policy is in place. For example: delete the rejected events if they use more than 1% of the storage.
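
A policy such as the 1% example can be evaluated before anything is deleted. The sketch below is illustrative and is not how the watchdog itself is configured: the function names are hypothetical, and "storage" is assumed to mean the total size of the filesystem that holds the database data. It only reports whether the threshold is exceeded and deletes nothing.

```shell
# exceeds_policy USED_KB TOTAL_KB PERCENT
# Returns 0 (true) when USED_KB is strictly more than PERCENT percent of TOTAL_KB.
# Uses integer arithmetic only.
exceeds_policy() {
    used_kb=$1
    total_kb=$2
    percent=$3
    [ $((used_kb * 100)) -gt $((total_kb * percent)) ]
}

# rejected_usage_kb PATH...
# Prints the combined disk usage of the given paths in KB (0 if none exist).
rejected_usage_kb() {
    du -sk "$@" 2>/dev/null | awk '{ sum += $1 } END { print sum + 0 }'
}

# filesystem_size_kb PATH
# Prints the total size in KB of the filesystem that holds PATH.
filesystem_size_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $2 }'
}

# Usage:
# used=$(rejected_usage_kb /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData*)
# total=$(filesystem_size_kb /opt/vertica/data)
# if exceeds_policy "$used" "$total" 1; then
#     echo "Rejected events use more than 1% of the storage."
# fi
```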

Analyze the content of reject_event_file to determine the root cause, or save the reject_event_file for further analysis. Until the root cause is resolved, rejected events will continue to be created.
After you complete the analysis, delete the rejected events by using the following command:

rm -rf /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData*
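
If you want to keep the rejected data for later analysis, as the first step suggests, one option is to archive it before deleting it. The following is a sketch under assumptions, not a documented procedure: the function name archive_then_delete and the archive location are hypothetical. The files are removed only if tar finishes successfully.

```shell
# archive_then_delete ARCHIVE PATH...
# Writes a gzip-compressed tar archive of the given paths to ARCHIVE,
# then removes the paths only if the archive was created without error.
archive_then_delete() {
    archive=$1
    shift
    tar -czf "$archive" "$@" && rm -rf "$@"
}

# Usage (hypothetical archive location):
# archive_then_delete /var/tmp/rejected_events.tar.gz \
#     /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData*
```

Make sure the archive destination has enough free space and is not on the same volume you are trying to free.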