Author: Madhan Gopalakrishnan | Published on: 11-02-2025

Corosync is an open-source cluster engine that provides the communication layer for high availability (HA), fault-tolerant clusters. It is used primarily for Linux clustering to keep systems reliable and minimize downtime. Corosync handles node-to-node messaging, cluster membership, and quorum.
🔎 Prerequisites
Before setting up Corosync, ensure the following:
✅ Minimum of two nodes with Linux installed (Ubuntu, CentOS, or Debian recommended)
✅ Root or sudo access to all cluster nodes
✅ Static IP addresses assigned to each node
✅ Consistent hostname resolution on every node (for example, via /etc/hosts)
✅ Time synchronization using NTP or Chrony
✅ Firewall rules that allow cluster traffic (Corosync uses UDP ports 5404-5405 by default; pcsd uses TCP port 2224)
✅ SELinux or AppArmor either disabled or properly configured
✅ Pacemaker and the pcs command-line tool for resource management
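To double-check the /etc/hosts prerequisite, a small shell sketch like the following can confirm that each node name maps to an address. The helper name hosts_lookup and the names node1/node2 are hypothetical; substitute your own hostnames and file path.

```bash
#!/usr/bin/env bash
# Sketch: confirm that cluster node names resolve in an /etc/hosts-style file.
# hosts_lookup and the node names below are hypothetical examples.

# hosts_lookup FILE NAME -> prints the first IP mapped to NAME, fails if none.
hosts_lookup() {
  local file=$1 name=$2
  # Skip comment lines; compare NAME with every hostname/alias column.
  awk -v n="$name" '
    $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; found = 1; exit } }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Example hosts content for a two-node cluster (hypothetical names).
sample=$(mktemp)
cat > "$sample" <<'EOF'
127.0.0.1    localhost
192.168.1.1  node1.example.com node1
192.168.1.2  node2.example.com node2
EOF

for node in node1 node2; do
  if ip=$(hosts_lookup "$sample" "$node"); then
    echo "$node -> $ip"
  else
    echo "$node is missing from $sample" >&2
  fi
done
rm -f "$sample"
```

On a real node, pass /etc/hosts instead of the temporary sample file and run the check on every cluster member.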
🌐 Overview of Corosync & Pacemaker
Corosync handles cluster communication, membership, and quorum, but it does not manage resources. That is Pacemaker's job: Pacemaker is a cluster resource manager that runs on top of Corosync and starts, stops, monitors, and moves cluster services (resources) between nodes.
✨ Key Features of Corosync & Pacemaker
✅ Corosync provides cluster messaging and membership
✅ Pacemaker manages and monitors cluster resources
✅ Automatic failover: resources on a failed node are recovered on a surviving node
✅ Constraints (location, ordering, colocation) for expressing complex placement policies
✅ Ensures service continuity with minimal downtime
📊 Corosync & Pacemaker Architecture
Corosync and Pacemaker follow a modular architecture comprising several key components:
1. Corosync Totem Protocol
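🔹 A reliable group messaging protocol that delivers the same messages to all nodes in the same order.
🔹 Supports redundant network paths (redundant rings in Corosync 2.x, multiple knet links in Corosync 3.x) to improve cluster resilience.
🔹 Corosync 2.x uses UDP multicast, UDP unicast (udpu), or InfiniBand/RDMA transports; Corosync 3.x defaults to the kronosnet (knet) transport.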
2. Quorum System
🔹 Decides whether a group of nodes may keep running services, based on how many votes (by default, one per node) it holds.
🔹 Avoids split-brain scenarios, where nodes make conflicting decisions.
🔹 Configured via corosync.conf. Example:
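quorum {
    provider: corosync_votequorum
    # Two-node clusters only: a single surviving node keeps quorum.
    # This also enables wait_for_all by default.
    two_node: 1
}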
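The rule behind this setting is simple majority arithmetic. The sketch below assumes the default of one vote per node; quorum_needed is a hypothetical helper, not a Corosync tool. It shows why a two-node cluster needs two_node: 1: without it, both votes are required, so losing either node loses quorum.

```bash
#!/usr/bin/env bash
# Sketch of votequorum's majority rule (assumes one vote per node):
# a partition is quorate when it holds expected_votes / 2 + 1 votes
# (integer division). quorum_needed is a hypothetical helper.

# quorum_needed EXPECTED_VOTES [TWO_NODE]
quorum_needed() {
  local expected=$1 two_node=${2:-0}
  if [ "$two_node" -eq 1 ] && [ "$expected" -eq 2 ]; then
    # two_node: 1 lets one surviving node stay quorate;
    # fencing must then decide which node survives.
    echo 1
  else
    echo $(( expected / 2 + 1 ))
  fi
}

for n in 2 3 4 5; do
  echo "$n nodes: $(quorum_needed "$n") votes needed"
done
echo "2 nodes with two_node: 1 -> $(quorum_needed 2 1) vote needed"
```

The output also shows why odd node counts are common: four nodes need three votes, so they tolerate only one failure, the same as three nodes.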
3. Cluster Membership Service
🔹 Detects node failures and updates cluster status dynamically.
🔹 Sends notifications when nodes join or leave the cluster.
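🔹 View cluster status (nodes, resources, and failures) using:
pcs status
🔹 View Corosync's own membership and quorum information using:
corosync-quorumtool -s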
4. Pacemaker Resource Management
🔹 Manages services running within the cluster.
🔹 Automatically fails over services in case of node failure.
🔹 Provides fencing (STONITH) to power off or isolate unresponsive nodes, preventing split-brain scenarios.
🔹 Add a resource to Pacemaker:
pcs resource create WebService ocf:heartbeat:apache
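In this command, ocf:heartbeat:apache follows Pacemaker's class:provider:type naming (the OCF class, the heartbeat provider, and the apache agent); classes such as systemd omit the provider, as in systemd:httpd. The hypothetical helper below only splits a spec into those parts, assuming it has two or three colon-separated fields.

```bash
#!/usr/bin/env bash
# Sketch: split a Pacemaker resource agent spec into class, provider, type.
# split_agent is a hypothetical helper; it handles 2- or 3-part specs only.
split_agent() {
  local spec=$1 class provider type
  IFS=: read -r class provider type <<< "$spec"
  if [ -z "$type" ]; then
    # Two-part specs such as systemd:httpd have no provider field.
    type=$provider
    provider=""
  fi
  # Print "-" when there is no provider.
  printf 'class=%s provider=%s type=%s\n' "$class" "${provider:--}" "$type"
}

split_agent ocf:heartbeat:apache
split_agent systemd:httpd
```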
📁 Cluster Information Base (CIB) in PCS Cluster
The Cluster Information Base (CIB) is the central configuration and state repository in a Pacemaker-managed cluster. It stores all cluster-related information, including nodes, resources, constraints, and policies.
🔹 Key Features of CIB:
✅ Maintains the entire cluster configuration in XML format
✅ Holds the current status of cluster resources
✅ Used by Pacemaker to make cluster decisions
✅ Can be manually edited or updated using pcs or crm commands
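To make the XML layout concrete, here is a heavily trimmed CIB with one primitive resource, written to a temporary file and queried with grep and sed. It is a simplified illustration, not real pcs cluster cib output (which carries many more attributes), and the hypothetical list_primitives helper assumes the attribute order id, class, provider, type.

```bash
#!/usr/bin/env bash
# Sketch: a simplified CIB showing the main sections Pacemaker stores in XML.
cib=$(mktemp)
trap 'rm -f "$cib"' EXIT

cat > "$cib" <<'EOF'
<cib admin_epoch="0" epoch="1" num_updates="0">
  <configuration>
    <crm_config/>
    <nodes/>
    <resources>
      <primitive id="WebService" class="ocf" provider="heartbeat" type="apache">
        <operations>
          <op id="WebService-monitor-30s" name="monitor" interval="30s"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>
EOF

# list_primitives FILE -> "id class:provider:type" per primitive.
# Uses grep/sed only and assumes attributes appear in the order shown above.
list_primitives() {
  grep -o '<primitive [^>]*>' "$1" |
    sed -E 's/.*id="([^"]*)".*class="([^"]*)".*provider="([^"]*)".*type="([^"]*)".*/\1 \2:\3:\4/'
}

list_primitives "$cib"
```

For real CIBs, an XML-aware tool such as xmllint is more robust than regular expressions.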
🔍 Viewing CIB Configuration
To check the CIB configuration, use:
pcs cluster cib
For indented, easier-to-read XML (xmllint is part of libxml2):
pcs cluster cib | xmllint --format -
✏️ Modifying CIB Configuration
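To change CIB settings, save the CIB to a file, edit the file, and push it back:
pcs cluster cib modified_cib.xml
pcs cluster cib-push modified_cib.xml
Alternatively, pcs cluster edit opens the CIB in $EDITOR and pushes the changes when you save.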
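A more cautious way to push changes is to keep the original snapshot and push only the difference with diff-against=, so changes the cluster made after the snapshot are not overwritten wholesale. The sketch below prints the commands by default (DRY_RUN=1) so they can be reviewed first; run and edit_cib are hypothetical helper names, and the resource change is only an example.

```bash
#!/usr/bin/env bash
# Sketch: snapshot the CIB, change a copy offline with `pcs -f`, then push
# only the diff. DRY_RUN=1 (default) prints commands instead of running them.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

edit_cib() {
  local orig=${1:-original_cib.xml} new=${2:-modified_cib.xml}
  run pcs cluster cib "$orig"        # 1. snapshot the live CIB
  run cp "$orig" "$new"              # 2. work on a copy
  run pcs -f "$new" resource create WebService ocf:heartbeat:apache \
    op monitor interval=30s          # 3. example change, applied to the file
  run pcs cluster cib-push "$new" diff-against="$orig"   # 4. push the diff
}

edit_cib
```

Setting DRY_RUN=0 executes the same sequence on a cluster node.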
💾 Backup & Restore CIB
Backup:
pcs cluster cib > backup_cib.xml
Restore:
pcs cluster cib-push backup_cib.xml
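For routine backups, the same pcs cluster cib dump can be wrapped in a function that timestamps each file and keeps only the newest few. This is a sketch: backup_cib and the CIB_DUMP override are hypothetical, and the default retention of 7 files is arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: timestamped CIB backups, keeping the newest KEEP files.
# CIB_DUMP (hypothetical) overrides the dump command; default: pcs cluster cib.
backup_cib() {
  [ -n "${1:-}" ] || { echo "usage: backup_cib DIR [KEEP]" >&2; return 1; }
  local dir=$1 keep=${2:-7} stamp file
  stamp=$(date +%Y%m%d-%H%M%S)
  file="$dir/cib-$stamp.xml"
  mkdir -p "$dir" || return 1
  # Unquoted on purpose so "pcs cluster cib" splits into words.
  ${CIB_DUMP:-pcs cluster cib} > "$file" || { rm -f "$file"; return 1; }

  # File names embed the timestamp, so a reverse sort lists newest first;
  # everything after the first $keep entries is deleted.
  printf '%s\n' "$dir"/cib-*.xml | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do rm -f -- "$old"; done
  echo "$file"
}
```

For example, backup_cib /root/cib-backups 14 could run from cron on one node.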
🛠️ Installing Corosync & Pacemaker on Linux
Step 1: Update System
sudo apt update && sudo apt upgrade -y # For Debian/Ubuntu
sudo yum update -y # For RHEL/CentOS
Step 2: Install Corosync & Pacemaker
sudo apt install corosync pacemaker pcs -y # Debian/Ubuntu
sudo yum install corosync pacemaker pcs -y # RHEL/CentOS (the High Availability repository may need to be enabled first)
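When the same steps run on different distributions, the install command can be chosen from /etc/os-release, which provides ID and ID_LIKE fields. The sketch below only prints the command so it can be reviewed first; install_cmd is a hypothetical name, and it assumes the package names corosync, pacemaker, and pcs.

```bash
#!/usr/bin/env bash
# Sketch: pick apt or yum from /etc/os-release and print the install command.
# install_cmd is hypothetical; it does not run the installation.
install_cmd() {
  local release=${1:-/etc/os-release} id like
  # Read ID and ID_LIKE without sourcing the file; strip quotes.
  id=$(sed -n 's/^ID=//p' "$release" | tr -d '"')
  like=$(sed -n 's/^ID_LIKE=//p' "$release" | tr -d '"')
  case " $id $like " in
    *" debian "*|*" ubuntu "*)
      echo "sudo apt install -y corosync pacemaker pcs" ;;
    *" rhel "*|*" centos "*|*" fedora "*)
      echo "sudo yum install -y corosync pacemaker pcs" ;;
    *)
      echo "unsupported distribution: $id" >&2
      return 1 ;;
  esac
}
```

Running install_cmd with no argument on an Ubuntu host prints the apt line; on RHEL-family hosts it prints the yum line.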
Step 3: Configure Corosync
Corosync reads its configuration from /etc/corosync/corosync.conf, and the file must be identical on every node. Here is a basic two-node configuration:
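# Example Corosync configuration (two nodes, unicast UDP)
totem {
    version: 2
    # No encryption in this minimal example. For production, prefer the
    # knet transport with crypto_cipher/crypto_hash and an authkey
    # generated by corosync-keygen.
    secauth: off
    cluster_name: my_cluster
    transport: udpu
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.2
        nodeid: 2
    }
}

logging {
    # Writes the log file used in the troubleshooting section (the directory
    # must exist; Debian/Ubuntu packages default to /var/log/corosync/).
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
}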
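For clusters with more than two nodes, writing the nodelist by hand gets repetitive. This sketch generates it from a list of addresses and assigns nodeid values from 1 upward; gen_nodelist is a hypothetical helper, and its output should be reviewed before it goes into corosync.conf. Keep in mind that two_node: 1 is intended only for two-node clusters.

```bash
#!/usr/bin/env bash
# Sketch: print a corosync.conf nodelist section for the given addresses.
# gen_nodelist is hypothetical; review the output before using it.
gen_nodelist() {
  local id=1 addr
  echo "nodelist {"
  for addr in "$@"; do
    printf '    node {\n        ring0_addr: %s\n        nodeid: %d\n    }\n' "$addr" "$id"
    id=$((id + 1))
  done
  echo "}"
}

gen_nodelist 192.168.1.1 192.168.1.2 192.168.1.3
```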
Step 4: Start and Enable Corosync & Pacemaker
sudo systemctl start corosync pacemaker
sudo systemctl enable corosync pacemaker
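Pacemaker can take a few seconds to become active after Corosync. Instead of guessing with a fixed sleep, a small polling helper can wait until a check command succeeds. This is a sketch assuming a systemd host; wait_until is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: retry a check command every INTERVAL seconds until it succeeds
# or TIMEOUT seconds have passed. wait_until is a hypothetical helper.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage on a cluster node (not executed here); `systemctl is-active --quiet`
# exits 0 only when the unit is active:
#   wait_until 60 2 systemctl is-active --quiet corosync &&
#   wait_until 60 2 systemctl is-active --quiet pacemaker
```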
🔧 Troubleshooting & Log Analysis
Common Issues & Fixes
🔹 Corosync service not starting:
journalctl -xe -u corosync
sudo systemctl restart corosync
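🔹 Node not joining the cluster (check the firewall and that corosync.conf matches on all nodes, then start cluster services on that node):
pcs cluster start <node_name>
If the node is new rather than offline, authenticate it and add it from an existing cluster node (pcs 0.10 and later; older releases use pcs cluster auth):
pcs host auth <node_name>
pcs cluster node add <node_name>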
🔹 Resource not starting:
pcs resource debug-start <resource_name>
🔹 Check cluster health:
pcs status
Important Log Files
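📌 Corosync logs: /var/log/cluster/corosync.log (only when file logging is enabled in corosync.conf; Debian/Ubuntu packages default to /var/log/corosync/corosync.log)
📌 Pacemaker logs: /var/log/pacemaker/pacemaker.log (Pacemaker 2.x; older releases commonly used /var/log/pacemaker.log)
📌 System logs: /var/log/messages (RHEL/CentOS) or /var/log/syslog (Debian/Ubuntu)
📌 Journal logs: journalctl -xe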
Checking Logs for Errors
tail -f /var/log/cluster/corosync.log
journalctl -u corosync --no-pager --lines=50
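When a log is long, a quick per-subsystem count shows where messages come from (for example TOTEM for membership and token traffic, QUORUM for quorum changes). This sketch assumes Corosync's usual bracketed subsystem tags such as [TOTEM ]; summarize_log is a hypothetical helper that reads a file or standard input, and any other bracketed uppercase word in a message would also be counted.

```bash
#!/usr/bin/env bash
# Sketch: count log lines per Corosync subsystem tag, busiest first.
# summarize_log is hypothetical; with no argument it reads standard input.
summarize_log() {
  # Match tags like "[TOTEM ]" or "[QUORUM]", strip brackets and padding.
  grep -oE '\[[A-Z]+ *\]' "${1:--}" |
    sed -E 's/^\[//; s/ *\]$//' |
    sort | uniq -c | sort -rn
}
```

For example: journalctl -u corosync --no-pager | summarize_log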
🏆 Conclusion
Corosync and Pacemaker together form a powerful high-availability cluster solution for Linux. This guide covered installation, setup, maintenance, troubleshooting, and log analysis to ensure your HA cluster operates smoothly. 🚀