The success of your high-availability cluster solution depends on its stability and robustness. Use the guidelines in this section to design your OES Cluster Services cluster and cluster environment.
IMPORTANT: For information about the system requirements for installing and using OES Cluster Services, see Section 8.0, Clustering and High Availability.
The purpose of designing a resilient cluster is to ensure that your essential data and services are highly available. Setting up data and services as cluster resources allows them to be moved between nodes in the same cluster. This helps eliminate or minimize the downtime caused by a server failure or maintenance.
You can determine what data and services to set up as cluster resources by asking and answering the following questions:
What are the key services that drive your business?
What services are essential for business continuance?
What is the cost of downtime for the essential services?
Based on their mission-critical nature and cost of downtime, what services are the highest priority for business continuance?
What data is necessary to support the highest-priority services?
How much data is involved, and how important is it?
Using the following cluster best practices can help you avoid potential problems with your cluster:
Ensure that eDirectory is stable before implementing a cluster.
Ensure that you have full Read/Write replicas of the entire eDirectory tree co-located in the data center where you are setting up the cluster.
Ensure that IP addresses are unique.
Consistently apply IP address assignments for each cluster and its cluster resources.
Make IP address changes for the cluster and cluster resources only by using the procedure described in Moving a Cluster or Changing IP Addresses of Cluster Nodes and Resources in the OES 23.4: OES Cluster Services for Linux Administration Guide.
IP address changes for cluster resources should always be made on the Protocols page of the iManager Clusters plug-in, not directly in load, unload, and monitor scripts. This is the only way to change the IP address on the virtual NCS:NCP Server object in eDirectory.
Ensure that the volume IDs used for cluster resources are unique across all nodes.
Each cluster node automatically assigns volume ID 0 to volume SYS and volume ID 1 to volume _ADMIN. Cluster-enabled volumes use high volume IDs, starting from 254 in descending order. Volume IDs can be assigned in the cluster load script. You can view the volume IDs assigned on a node by using the ncpcon volumes command.
The Client for Open Enterprise Server uses the volume ID to access a volume.
Consider each node’s configuration requirements for each of the services it is intended to host.
Create failover matrixes for each cluster resource so that you know what service is supported and which nodes are the preferred nodes for failover.
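The volume ID and failover matrix practices above can be checked on paper before any resource is created. The following is a minimal sketch, not an OES tool: the resource names, node names, and the `resource_plan` structure are hypothetical, and the only facts taken from this section are that IDs 0 and 1 are reserved for SYS and _ADMIN and that cluster-enabled volumes count down from 254.

```python
from collections import Counter

# Volume IDs that every node assigns automatically (from this section).
RESERVED_VOLUME_IDS = {0: "SYS", 1: "_ADMIN"}

# Hypothetical plan: resource names, IDs, and node names are illustrative only.
resource_plan = {
    "POOL1_SERVER": {"volume_id": 254, "preferred_nodes": ["node1", "node2", "node3"]},
    "POOL2_SERVER": {"volume_id": 253, "preferred_nodes": ["node2", "node3", "node1"]},
    "POOL3_SERVER": {"volume_id": 252, "preferred_nodes": ["node3", "node1"]},
}


def find_volume_id_problems(plan):
    """Return human-readable problems with the planned volume IDs."""
    problems = []
    counts = Counter(entry["volume_id"] for entry in plan.values())
    for name, entry in sorted(plan.items()):
        vid = entry["volume_id"]
        if vid in RESERVED_VOLUME_IDS:
            problems.append(f"{name}: volume ID {vid} is reserved for {RESERVED_VOLUME_IDS[vid]}")
        if counts[vid] > 1:
            problems.append(f"{name}: volume ID {vid} is used by more than one resource")
    return problems


def failover_matrix(plan, nodes):
    """Map each resource to its failover rank per node (None = node not assigned)."""
    matrix = {}
    for name, entry in plan.items():
        ranks = {node: None for node in nodes}
        for rank, node in enumerate(entry["preferred_nodes"], start=1):
            ranks[node] = rank
        matrix[name] = ranks
    return matrix


if __name__ == "__main__":
    print(find_volume_id_problems(resource_plan) or "volume IDs are unique")
    for resource, ranks in failover_matrix(resource_plan, ["node1", "node2", "node3"]).items():
        print(resource, ranks)
```

A rank of 1 marks the most preferred node for a resource. The IDs actually in use on a node should still be confirmed with the ncpcon volumes command.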
The primary objective of LAN connectivity in a cluster is to provide uninterrupted heartbeat communications. Use the guidelines in this section to design the LAN connectivity for the cluster:
Use a dedicated VLAN (virtual local area network) for each cluster.
The cluster protocol is non-routable, so you cannot direct communications to specific IP addresses. Using a VLAN for the cluster nodes provides a protected environment for the heartbeat process and ensures that heartbeat packets are exchanged only between the nodes of a given cluster.
When you use a VLAN, no foreign host can interfere with the heartbeat. For example, using a VLAN avoids broadcast storms that slow traffic and can result in false split-brain situations.
Servers should be redundantly cabled to the network in order to provide LAN fault tolerance, preferably at both the adapter level and the link level. Consider connecting cluster nodes to redundant access switches for fault tolerance.
Use channel bonding for the server adapters. Channel bonding combines Ethernet interfaces on a host computer for redundancy or increased throughput. Higher-level software uses a single virtual-network interface, and the channel bonding driver handles the complex choice of which physical-network interface to use. Channel bonding helps increase the availability of an individual cluster node, which helps avoid or reduce the occurrences of failover caused by slow LAN traffic. See the /usr/src/linux/Documentation/bonding.txt document.
Use the Spanning Tree Protocol (STP) to eliminate network topology loops. When you configure STP, ensure that the Portfast Bridge Protocol Data Unit (BPDU) guard feature is enabled, or consider using Rapid Spanning Tree Protocol (RSTP, IEEE 802.1w).
The default settings for STP inhibit the heartbeat for over 30 seconds whenever there is a change in link status. Test your STP configuration with Cluster Services running to ensure that a node is not cast out of the cluster when a broken link is restored.
Plan your IP address assignment so that it is consistently applied across each cluster. For each cluster, provide a dedicated IP address range with sufficient addresses for the cluster. The addresses do not need to be contiguous.
You need a unique static IP address for each of the following components of a cluster:
Cluster (master IP address)
Cluster nodes
Cluster resources (file system resources and service resources such as DHCP, DNS, SLP, FTP, and so on)
Ensure that SLP is properly configured for name resolution. See Section 10.5, SLP.
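The IP addressing guidance above (one dedicated range per cluster, a unique static address for the master IP, each node, and each resource) lends itself to a quick offline check. This is a sketch under assumptions: the range `10.10.10.0/27`, the owner names, and the `ip_plan` mapping are invented for illustration, and only Python's standard `ipaddress` module is used.

```python
import ipaddress

# Hypothetical plan: the range and every address below are made-up examples.
cluster_range = ipaddress.ip_network("10.10.10.0/27")
ip_plan = {
    "cluster1 (master IP)": "10.10.10.1",
    "node1": "10.10.10.2",
    "node2": "10.10.10.3",
    "POOL1_SERVER": "10.10.10.10",
    "DNS_SERVER": "10.10.10.11",
}


def check_ip_plan(plan, network):
    """Return problems: addresses outside the cluster range or assigned twice."""
    problems = []
    seen = {}  # address -> first owner that claimed it
    for owner, text in plan.items():
        address = ipaddress.ip_address(text)
        if address not in network:
            problems.append(f"{owner}: {address} is outside {network}")
        if address in seen:
            problems.append(f"{owner}: {address} is already assigned to {seen[address]}")
        else:
            seen[address] = owner
    return problems


if __name__ == "__main__":
    print(check_ip_plan(ip_plan, cluster_range) or "IP plan is consistent")
```

The check only validates the plan itself. It does not detect addresses that are already in use elsewhere on the network.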
The primary objective of the shared storage connectivity in a cluster is to provide solid and stable connectivity between cluster nodes and the storage system. Before installing OES Cluster Services and setting up a cluster, ensure that the storage configuration is established and verified.
Use the guidelines in this section to design the storage connectivity for a cluster:
Use host-based multipath I/O management. See the following resources:
Connect each node via two fabrics to the storage area network (SAN).
Use redundant SAN connections to provide fault-tolerant connectivity between the cluster nodes and the shared storage devices.
Use LUN masking to exclusively assign each LUN to one or more host connections. See Section 8.2.9, SAN Rules for LUN Masking.
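The two-fabric guideline above can be verified against an inventory of storage paths. The sketch below assumes you can export such an inventory as (node, fabric, LUN) tuples; the names and the `paths` list are hypothetical and do not come from any OES or multipath tool.

```python
from collections import defaultdict

# Hypothetical path inventory: (node, fabric, LUN) tuples with made-up names.
paths = [
    ("node1", "fabricA", "lun01"), ("node1", "fabricB", "lun01"),
    ("node2", "fabricA", "lun01"), ("node2", "fabricB", "lun01"),
]


def single_fabric_paths(path_list, required_fabrics=2):
    """Return (node, lun) pairs reachable over fewer than required_fabrics fabrics."""
    fabrics = defaultdict(set)
    for node, fabric, lun in path_list:
        fabrics[(node, lun)].add(fabric)
    return sorted(pair for pair, seen in fabrics.items() if len(seen) < required_fabrics)


if __name__ == "__main__":
    print(single_fabric_paths(paths) or "every node reaches every listed LUN over two fabrics")
```

A node that has no path at all to a LUN does not appear in the inventory and is therefore not reported, so compare the result against the expected list of nodes and LUNs as well.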
Use the guidelines in this section to design the shared storage solution for a cluster:
For maximum flexibility, you should create only one cluster resource per LUN.
A LUN cannot be concurrently accessed by servers belonging to different clusters. This means that all resources on a given LUN can be active in only one cluster at any given time.
You should use only one LUN per pool, and only one volume per pool. If you use multiple LUNs for a given shared NSS pool, all LUNs must fail over together.
It is possible to create multiple pools per LUN or to use multiple LUNs per pool, but these alternatives are not recommended.
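The layout recommendations above (one LUN per pool, one volume per pool, one pool per LUN, and no LUN shared between clusters) can be expressed as a simple plan check. This is an illustrative sketch only: the `pools` mapping and all pool, LUN, volume, and cluster names are hypothetical.

```python
# Hypothetical layout: pool, LUN, volume, and cluster names are illustrative only.
pools = {
    "POOL1": {"luns": ["lun01"], "volumes": ["VOL1"], "cluster": "cluster1"},
    "POOL2": {"luns": ["lun02"], "volumes": ["VOL2"], "cluster": "cluster1"},
}


def check_storage_layout(pool_map):
    """Return deviations from the one-LUN, one-pool, one-volume recommendation."""
    problems = []
    lun_pools = {}     # LUN -> pools created on it
    lun_clusters = {}  # LUN -> clusters whose pools use it
    for pool, info in sorted(pool_map.items()):
        if len(info["luns"]) != 1:
            problems.append(f"{pool}: uses {len(info['luns'])} LUNs (recommended: 1)")
        if len(info["volumes"]) != 1:
            problems.append(f"{pool}: has {len(info['volumes'])} volumes (recommended: 1)")
        for lun in info["luns"]:
            lun_pools.setdefault(lun, []).append(pool)
            lun_clusters.setdefault(lun, set()).add(info["cluster"])
    for lun in sorted(lun_pools):
        if len(lun_pools[lun]) > 1:
            problems.append(f"{lun}: hosts {len(lun_pools[lun])} pools (recommended: 1)")
        if len(lun_clusters[lun]) > 1:
            problems.append(f"{lun}: assigned to more than one cluster")
    return problems


if __name__ == "__main__":
    print(check_storage_layout(pools) or "layout follows the recommendations")
```

Multiple pools per LUN or multiple LUNs per pool are reported as deviations rather than errors, because this section describes them as possible but not recommended. A LUN used by more than one cluster is the case that must be corrected.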
Your NetIQ eDirectory solution for each cluster must consider the following configuration elements. Your approach should be consistent across all clusters.
Cluster nodes and Cluster objects can exist in any container in the eDirectory tree. The Virtual Server object for the cluster and the objects for cluster resources are automatically created in the eDirectory context of the server where the cluster resource is created and cluster-enabled.
IMPORTANT: You should create cluster resources on the master node of the cluster.
Before you create a new cluster, use iManager to create an OU container for the cluster, and use the OU container for the Cluster objects and Server objects.
Figure 8-1 Example: Cluster1 Container and Its Objects
Partition the Cluster OU, replicate it to dedicated eDirectory servers that are holding a replica of the parent partition, and replicate it to all cluster nodes. This helps prevent resources from being stuck in an NDS Sync state when a cluster resource’s configuration is modified.
If you do not want to put a replica of eDirectory on the node, you must configure one or more LDAP servers for the node to use. The LDAP servers must have a master replica or a Read/Write replica of eDirectory. For information about how to modify the LDAP server list that is used by a cluster, see Changing the Administrator Credentials or LDAP Server IP Addresses for a Cluster in the OES 23.4: OES Cluster Services for Linux Administration Guide.
OES Cluster Services supports using cluster resources for the following file systems and storage solutions:
| File System or Storage Solution | See |
|---|---|
| Storage Services (NSS) pools | Configuring and Managing Cluster Resources for Shared NSS Pools and Volumes |
| Linux LVM volume groups and logical volumes | Configuring and Managing Cluster Resources for Shared LVM Volume Groups |
| Linux POSIX volumes | |
| NCP (NetWare Core Protocol) volumes (NCP shares on cluster-enabled Linux POSIX volumes) | |
| Dynamic Storage Technology (DST) volumes (NSS volumes configured in a shadow volume pair) | |
OES Cluster Services supports using cluster resources for the following OES services: