Monitoring the Database
You can monitor the Database by using commands or by using the out-of-the-box Health and Performance Monitoring dashboard included in the component.
- Understanding Database Watchdog
- Monitoring Database Status
- Monitoring Scheduler Status, Events, and Messages
- Using the Health and Performance Monitoring Dashboard
- Removing Rejected Events
Understanding Database Watchdog
The Database includes a watchdog, which is configured as a cron job that runs automatically once an hour. The watchdog monitors the database and performs the following operations:
- Try to restart a database cluster node when it detects that the node is down.
- Create the database event ingestion process (Kafka Scheduler) if it is missing.
- Start the database event ingestion process (Kafka Scheduler) if it is stopped.
- Unless there is a policy in place, do not use the watchdog to delete rejected events.
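To confirm that the hourly schedule is in place, you can inspect the crontab. The following is a minimal sketch, not a documented procedure. It assumes that the watchdog's cron entry contains the word "watchdog" and that you list the crontab of the user who owns that entry. Neither detail is stated in this section, and filter_watchdog_entries is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print active (non-comment) crontab lines that mention "watchdog".
# Assumption: the watchdog's cron entry contains the word "watchdog".
# An hourly entry has a fixed minute followed by four asterisks, for example "0 * * * *".
filter_watchdog_entries() {
    grep -v '^[[:space:]]*#' | grep -i 'watchdog'
}
```

For example, run crontab -l | filter_watchdog_entries as the user who owns the cron job. No output means no matching entry was found for that user.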
Monitoring Database Status
Monitor the database status by using the following command:
/opt/arcsight-db-tools/db_installer status
Monitoring Scheduler Status, Events, and Messages
Monitor the scheduler's status by using the following command:
/opt/arcsight-db-tools/kafka_scheduler status
Monitor scheduler events by using the following command:
/opt/arcsight-db-tools/kafka_scheduler events
Monitor scheduler messages by using the following command:
/opt/arcsight-db-tools/kafka_scheduler messages
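When you troubleshoot ingestion, it can help to capture all three scheduler views in one file. The following is a minimal sketch under stated assumptions: scheduler_snapshot is a hypothetical helper name, the report path is your choice, and the script uses only the status, events, and messages subcommands shown above. It makes no assumption about what the commands print or which exit codes they return.

```shell
#!/bin/sh
# Sketch: write the three kafka_scheduler views into one timestamped report.
# TOOLS_DIR defaults to the tools path used in this section.
TOOLS_DIR="${TOOLS_DIR:-/opt/arcsight-db-tools}"

scheduler_snapshot() {
    # $1: path of the report file to create (overwritten if it exists)
    out="$1"
    : > "$out" || return 1
    for view in status events messages; do
        printf '===== kafka_scheduler %s (%s) =====\n' \
            "$view" "$(date '+%Y-%m-%d %H:%M:%S')" >> "$out"
        # Note a failure but keep going, so one failing view does not hide the others.
        if ! "$TOOLS_DIR/kafka_scheduler" "$view" >> "$out" 2>&1; then
            printf '(kafka_scheduler %s exited with a non-zero status)\n' "$view" >> "$out"
        fi
    done
}
```

For example, scheduler_snapshot /tmp/scheduler-report.txt writes the three sections, each under its own header line, to /tmp/scheduler-report.txt.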
Using the Health and Performance Monitoring Dashboard
You can also monitor the status of the database by using the out-of-the-box Health and Performance Monitoring dashboard included in the component. The dashboard includes the following widgets.
Database Event Ingestion Timeline
The Database Event Ingestion Timeline widget represents the rate of event ingestion into the database. This widget measures when the database receives the event data.
As a SOC Manager or an IT Administrator, you want to monitor the event ingestion rate into the database. Because events from different sources arrive at the database at different speeds, the moment when the database stores an event differs from the moment when the event occurred. This widget shows when the database receives the event data.
In the Database Event Ingestion Timeline widget, you can set the Upper and Medial Threshold values. Green represents events per second (EPS) values below the Medial Threshold, yellow represents EPS values between the Medial and Upper Thresholds, and red represents EPS values above the Upper Threshold.
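The color rule can be written as a small function. This is only an illustration of the rule as described here, not the widget's implementation. classify_eps is a hypothetical name, the values are whole numbers, and an EPS value exactly equal to either threshold is treated as yellow, which is an assumption because this section does not say how boundary values are colored.

```shell
#!/bin/sh
# Sketch: map an EPS value to the widget color described above.
# Usage: classify_eps EPS MEDIAL_THRESHOLD UPPER_THRESHOLD  (integers only)
classify_eps() {
    eps="$1"; medial="$2"; upper="$3"
    if [ "$eps" -gt "$upper" ]; then
        echo red        # above the Upper Threshold
    elif [ "$eps" -ge "$medial" ]; then
        echo yellow     # between the thresholds (boundaries included by assumption)
    else
        echo green      # below the Medial Threshold
    fi
}
```

With a Medial Threshold of 1000 and an Upper Threshold of 5000, classify_eps 500 1000 5000 prints green, classify_eps 3000 1000 5000 prints yellow, and classify_eps 5001 1000 5000 prints red.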
Removing Rejected Events
For default tenants, use this procedure to ensure there are no rejected events. If /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData is not empty, rejected events exist and you must take action immediately.
A high volume of rejected events degrades query performance and occupies disk space that the retention policy cannot reclaim. Rejected events continue to occur until the root cause is resolved.
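The emptiness check can be scripted so that it runs the same way on every node. The following is a minimal sketch: find_rejected_events and REJECT_GLOB are hypothetical names, and because this section does not say whether RejectionTableData is a directory or a file, the sketch accepts either. It assumes the paths contain no spaces.

```shell
#!/bin/sh
# Sketch: report every RejectionTableData location that is not empty.
# REJECT_GLOB defaults to the path pattern given in this section.
REJECT_GLOB="${REJECT_GLOB:-/opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData}"

# Prints the size and path of each non-empty match.
# Returns 0 if rejected events were found, 1 if every match is empty or missing.
find_rejected_events() {
    found=1
    # Unquoted on purpose so that the shell expands the wildcard.
    for path in $REJECT_GLOB; do
        if [ -d "$path" ]; then
            # A directory is non-empty if it holds at least one entry.
            [ -n "$(ls -A "$path" 2>/dev/null)" ] || continue
        elif [ ! -s "$path" ]; then
            # Not a directory: skip missing paths and zero-length files.
            continue
        fi
        du -sh "$path"
        found=0
    done
    return "$found"
}
```

For example, if find_rejected_events; then echo "Rejected events exist"; fi prints the message only when at least one location holds data.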
- Analyze the content of reject_event_file to determine the root cause, or save reject_event_file for further analysis. Until the root cause is resolved, rejected events continue to be created.
- After completing the analysis, delete the rejected events by using the following command:
rm -rf /opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData*
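Because the rm command is irreversible, you might want to save a copy of the rejected events for later analysis before you delete them. The following is a minimal sketch of one way to do that with tar. archive_rejected_events is a hypothetical helper name and the destination directory is your choice. Pick a destination on a volume with enough free space, because the rejected events themselves can be large.

```shell
#!/bin/sh
# Sketch: archive the rejected event data before deleting it.
# Usage: archive_rejected_events 'SOURCE_GLOB' DEST_DIR
# Quote the glob when calling so that it is expanded inside the function.
archive_rejected_events() {
    src_glob="$1"; dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/rejected-events-$(date '+%Y%m%d-%H%M%S').tar.gz"
    # Unquoted on purpose so that the shell expands the wildcard.
    # tar fails, and nothing is reported as archived, if no path matches.
    tar -czf "$archive" $src_glob || return 1
    # Print the archive path so the caller can record or verify it.
    echo "$archive"
}
```

For example, archive_rejected_events '/opt/vertica/data/fusiondb/v_fusiondb_node000*_data/RejectionTableData*' /var/tmp/reject-archive prints the path of the new archive, where /var/tmp/reject-archive is only an illustrative destination. You can list its contents with tar -tzf before you run the rm command above.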