Recovering and Restoring Elasticsearch Data

If you have deployed the Intelligence capability, follow the instructions provided in this section.

Elasticsearch Monitoring Action Fails During EKS Upgrade

If the Elasticsearch monitoring action fails for the Intelligence pods (this can happen in step 9 of the Performing the EKS Upgrade procedure or in step 5 of the Performing the Nodes Update/Replacement procedure), you must delete the unassigned shards that might impede the recovery process. An optional read-only check of the unassigned shards is sketched after the following procedure:

Deleting Unassigned Shards

  1. Run the following command, replacing the <password> value with the password of the elastic user:

    kubectl exec -it -n $(kubectl get ns | awk '/arcsight/ {print $1}') elasticsearch-master-0 -c elasticsearch -- curl -k -XGET 'https://elastic:<password>@localhost:9200/_cat/shards' | grep UNASSIGNED | awk '{print $1}' | xargs -i kubectl exec -n $(kubectl get ns | awk '/arcsight/ {print $1}') elasticsearch-master-0 -c elasticsearch -- curl -k -XDELETE 'https://elastic:<password>@localhost:9200/{}'
  2. To monitor the deletion progress, run the following command:

    kubectl exec -n $(kubectl get ns | awk '/arcsight/ {print $1}') elasticsearch-master-0 -c elasticsearch -it -- curl -k -XGET 'https://elastic:<password>@localhost:9200/_cat/health?v=true'

    Example output:

    epoch      timestamp cluster  status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
    1671118161 15:29:21  interset green           6         3   1128 583    0    0        0             0                  -                100.0%	
    

    In the example above, the active_shards_percent value has reached 100% and the status is green.
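
If you want to review the unassigned shards and the reason they are unassigned (for example, before running the delete command in step 1), the same _cat/shards endpoint can be queried read-only. The following is a minimal sketch that assumes the same namespace detection, pod name, and <password> placeholder as the commands above; unassigned.reason is a standard Elasticsearch _cat/shards column.

    kubectl exec -it -n $(kubectl get ns | awk '/arcsight/ {print $1}') elasticsearch-master-0 -c elasticsearch -- curl -k -XGET 'https://elastic:<password>@localhost:9200/_cat/shards?v=true&h=index,shard,prirep,state,unassigned.reason'

Rows whose state column is UNASSIGNED list the index names that the delete command in step 1 passes to the DELETE call.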

Restoring Elasticsearch Data

  1. If the command returns a yellow status and an active_shards_percent value under 100%, complete the following steps:

    1. Log in to the Interset UI at https://<CLUSTER FQDN>/interset with a user that has the system-admin role.

    2. Click the gear icon in the top right corner and select Search Manager.

    3. Click the Job History list box.

    4. Select Submit a Job.

    5. Click the Job type list box and select Restore.

    6. Enter 0 for the Customer to apply Snapshot to field.

    7. Click the SUBMIT JOB button.

  2. To verify the job status, complete the following steps:

    1. On the Job history page, check the Snapshot job ID status.

    2. Click the REFRESH button until the status becomes either COMPLETED_SUCCESS or COMPLETED_FAILED.

    3. If the final status is COMPLETED_FAILED, execute the following command to monitor the cluster health:

      kubectl exec -n $(kubectl get ns | awk '/arcsight/ {print $1}') elasticsearch-master-0 -c elasticsearch -it -- curl -k -XGET 'https://elastic:<password>@localhost:9200/_cat/health?v=true'

      Example command and output:

      curl -k -XGET 'https://elastic:changeme@localhost:9200/_cat/health?v=true'
      epoch      timestamp cluster  status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
      1671118161 15:29:21  interset green           6         3   1128 583    0    0        0             0                  -                100.0%	
      

      In the example above, the active_shards_percent value has reached 100% and the status is green.

    4. If the command returns a yellow status and a value under 100%, wait 5 minutes and repeat the command until the status becomes green.

  3. Scale up Logstash using the following command, replacing <replica count> with the number of Logstash replicas configured for your deployment (a verification sketch follows this procedure):

    kubectl -n $(kubectl get ns |awk '/arcsight/ {print $1}') scale statefulset interset-logstash --replicas=<replica count>
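
To verify the scale-up, you can check the StatefulSet and its pods. The following is a minimal sketch that assumes the same namespace detection and the interset-logstash StatefulSet name used in step 3.

    # Compare the desired and ready replica counts of the Logstash StatefulSet
    kubectl -n $(kubectl get ns | awk '/arcsight/ {print $1}') get statefulset interset-logstash

    # Confirm the Logstash pods reach the Running state
    kubectl -n $(kubectl get ns | awk '/arcsight/ {print $1}') get pods | grep interset-logstash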

Now, run Analytics on demand.

Elasticsearch Monitoring Succeeds During EKS Upgrade

If the Elasticsearch monitoring action in step 9 of the Performing the EKS Upgrade procedure, or in step 5 of the Performing the Nodes Update/Replacement procedure, succeeds for the Intelligence pods, run Analytics on demand.

If Analytics Run Fails

Follow the corresponding workaround if you encounter any of the following issues:

Issue: Analytics fails after the EKS upgrade; there are two things you can check to remedy it.

Workaround: Perform the following steps:

Issue: Analytics fails after the EKS upgrade because the HDFS NameNode has entered safe mode.

Workaround: Perform the following steps:

  1. Execute the following command to restart HDFS pods:

    kubectl delete pods -n $(kubectl get ns |awk '/arcsight/ {print $1}')  $(kubectl get pods -n $(kubectl get ns |awk '/arcsight/ {print $1}') -o wide |  grep "hdfs-" | cut -d ' ' -f1)
  2. Execute the following commands to make the NameNode leave safe mode (a verification sketch follows these steps):

    kubectl exec -n $(kubectl get ns | awk '/arcsight/ {print $1}') hdfs-namenode-0 -c hdfs-namenode -it -- bash
    hdfs dfsadmin -safemode leave
  3. Run Analytics on demand.
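
To confirm that the HDFS pods are back up and that the NameNode has left safe mode before rerunning Analytics, you can use the following minimal sketch. It assumes the same namespace detection and the hdfs-namenode-0 pod used in the steps above; hdfs dfsadmin -safemode get is a standard HDFS administration command.

    # Confirm the restarted HDFS pods are Running
    kubectl get pods -n $(kubectl get ns | awk '/arcsight/ {print $1}') | grep "hdfs-"

    # Report the current safe mode state; the expected output is "Safe mode is OFF"
    kubectl exec -n $(kubectl get ns | awk '/arcsight/ {print $1}') hdfs-namenode-0 -c hdfs-namenode -it -- hdfs dfsadmin -safemode get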