Blog: How Tos

Keepalived for high availability and load balancing

Pedro Venda 13 Nov 2018

In a nutshell

Keepalived implements VRRP (Virtual Router Redundancy Protocol) on a Linux system as well as managing Linux Virtual Server configuration. Keepalived can implement High Availability (active/passive) and load balancing (active/active) setups that can be made responsive to several customisable factors.

This page serves as a memory aid describing two relatively basic setups for keepalived.

Dedicated load balancer vs on-server setup

Amongst the myriad of options offered by keepalived is the choice between implementing dedicated load balancing hosts and having the servers that provide the service act as load balancers themselves.

Simple VRRP High Availability setup

In this setup, a VRRP-managed Virtual IP is set up, pointing at one of the servers in the respective VRRP instance. Keepalived is configured in VRRP high availability mode (1 active / n-1 passive), running on the same servers that provide the service. All servers in the VRRP instance run the listening service and keepalived; in this case requests are not forwarded over the network towards remote servers. All requests are served only by the active server. This is an IP-based implementation of high availability (layer 2/layer 3), therefore aspects of synchronising or porting sessions across the pool of servers, if applicable, need to be handled separately.

This setup is suitable for stateless services that respond to a single transaction before closing the connection, such as DNS or SMTP, in a high availability (active/passive) setup. The diagram below shows the two nodes in the VRRP instance (SRV_A and SRV_B) and the inbound Virtual IP assigned to SRV_A.

                           /---> SRV_A [active]
                          /      |
                         /       |
Service request ----> V_IP     (VRRP)
                         .       |
                          .      |
                           . . > SRV_B [passive]

If a fail event is signalled by one of the servers, it leaves the VRRP instance; if the failed host was the active server, this causes the Virtual IP to point at a different host.

Once set up, keepalived will initiate the VRRP protocol between all nodes in the same instance, assign the Virtual IP to the active node and monitor the instance using the check script provided. Requests made to the Virtual IP end up on the active node.

Sample configuration – Server 1

Normally, four aspects need to be configured on a VRRP HA server: keepalived, iptables, sysctl and the service itself (rsyslog in this case).

Keepalived

Having keepalived installed, the following configuration should be applied to server #1:

vrrp_instance VI1SL {
        state MASTER   # (optional) initial state for this server
        interface eth1 # interface where VRRP traffic will exist
        advert_int 5   # interval between VRRP advertisements, in seconds
        virtual_router_id 71 # unique identifier per VRRP instance (same across all servers on the instance)
        priority 100   # server priority - higher number == higher priority

        # authentication for VRRP messages
        authentication {
                #auth_type PASS # simple authentication (plain)
                auth_type AH    # good authentication
                auth_pass marvin # password
        }
        virtual_ipaddress {
                192.168.111.35/24 dev eth0 # Virtual IP address and interface assignment
        }
        track_script {
                check_rsyslog # tracking script
        }
}
vrrp_script check_rsyslog {
        script "/usr/local/sbin/checkrsyslog.sh"
        interval 5 # 5s per check
        fall 2 # 2 fails - 10s
        rise 2 # 2 OKs - 10s
        #timeout 15 # wait up to 15 seconds for script before assuming fail
        #weight 50 # add 50 to priority while the script succeeds
}

The configuration contains several important aspects:

  • The node is setup with an initial state of MASTER, meaning it will be the active server, unless the remaining nodes in the VRRP instance agree otherwise;
  • VRRP traffic will occur on interface eth1, whilst the Virtual IP will exist on interface eth0. eth1 could be a dedicated network interface, an existing network, or even the same network as that of the Virtual IP;
  • virtual_router_id and authentication must match across all nodes in the VRRP instance;
  • track_script, defined under ‘vrrp_script check_rsyslog’ points to a custom script that checks whether the HA service is live. If the script returns any value other than 0, the HA service is deemed to have failed and the node will voluntarily remove itself from the pool of eligible hosts that can be active. If it was the active server, then another node will become active instead;
  • It is keepalived itself that configures the host’s Virtual IP interface (eth0) with the IP address on the active node;
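
The check script itself (/usr/local/sbin/checkrsyslog.sh) is site-specific and not shown here. A minimal sketch of what it might look like follows; the use of pgrep is an assumption, and any command whose exit status reflects service health will do:

```shell
#!/bin/sh
# Hypothetical sketch of /usr/local/sbin/checkrsyslog.sh -- the real script
# is site-specific. keepalived runs it every 'interval' seconds and treats
# exit status 0 as healthy; 'fall' consecutive non-zero results mark the
# node as failed.

check_service() {
    # exit status 0 if at least one process has exactly this name
    pgrep -x "$1" >/dev/null 2>&1
}

# the real script would end with:
#   check_service rsyslogd
# so that its exit status is what keepalived sees
```

Any check works as long as its exit status reflects service health – probing a TCP port or inspecting a status file are equally valid approaches.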

iptables

iptables needs to be configured so that the server accepts incoming traffic on the VRRP interface (eth1 in the sample configuration) from the other VRRP instance nodes, under protocol VRRP or AH, respectively for auth_type PASS or AH:

-A <CHAIN> -s <OTHER_VRRP_INSTANCE_NODES> -p vrrp -j ACCEPT
-A <CHAIN> -s <OTHER_VRRP_INSTANCE_NODES> -p ah -j ACCEPT
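
As a concrete example, assuming the chain is INPUT and the other node reaches this host from 192.168.111.32 on eth1 (a hypothetical address, for illustration only):

```
-A INPUT -i eth1 -s 192.168.111.32/32 -p vrrp -j ACCEPT
-A INPUT -i eth1 -s 192.168.111.32/32 -p ah -j ACCEPT
```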

sysctl

The host’s kernel needs to be configured to allow a process to bind to a non-local IP address, seeing as non-active VRRP nodes will not have the Virtual IP configured on any interface. This is normally mediated by sysctl and takes action at boot time:

# in /etc/sysctl.conf or similar
net.ipv4.ip_nonlocal_bind=1

Finally, the service needs to be setup to listen on the Virtual IP interface, in this case rsyslog was configured to bind to all interfaces.
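
For reference, one way to have rsyslog listen on all interfaces for UDP syslog is via its imudp module (port 514 and UDP are assumptions; the setup above only requires that rsyslog binds to all interfaces):

```
# /etc/rsyslog.conf (or a file under /etc/rsyslog.d/)
module(load="imudp")            # load the UDP input module
input(type="imudp" port="514")  # listen on 0.0.0.0:514
```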

Naturally, the keepalived service needs to be started before any of this can work. It is normally configured to produce log entries on syslog, e.g.:

(...)
Starting Keepalived v1.2.13 (05/28,2014)
Starting Healthcheck child process, pid=17313
Starting VRRP child process, pid=17314
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Truncating auth_pass to 8 characters
Truncating auth_pass to 8 characters
Configuration is using : 76937 Bytes
Using LinkWatch kernel netlink reflector...
VRRP_Instance(VI1SL) Entering BACKUP STATE
(...)

Sample configuration – Server 2

Configuring another server in the VRRP instance requires almost exactly the same settings as the initial server, with a few slight (but important) differences in the keepalived configuration.

vrrp_instance VI1SL {
        state BACKUP
        interface eth1
        advert_int 5
        virtual_router_id 71
        priority 90   # server priority - higher number == higher priority

        # authentication for VRRP messages
        authentication {
                auth_type AH
                auth_pass marvin
        }
        virtual_ipaddress {
                192.168.111.35/24 dev eth0 # Virtual IP address and interface assignment
        }
        track_script {
                check_rsyslog # tracking script
        }
}
vrrp_script check_rsyslog {
        script "/usr/local/sbin/checkrsyslog.sh"
        interval 5 # 5s per check
        fall 2 # 2 fails - 10s
        rise 2 # 2 OKs - 10s
        #timeout 15 # wait up to 15 seconds for script before assuming fail
        #weight 50 # add 50 to priority while the script succeeds
}

Notice how in the above configuration, only the settings ‘state’ and ‘priority’ were changed from server #1’s keepalived configuration. It’s important that the Virtual IP, password and virtual_router_id are kept the same to ensure that the two servers participate in the same VRRP instance.

All other configurations (iptables, sysctl and service) apply in the same way as in server #1.
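
With both servers configured, a quick way to check which node currently holds the Virtual IP is to inspect eth0 on each node and to follow keepalived’s state transitions in syslog (commands shown for illustration on a live system):

```
# the Virtual IP only appears on the active node
ip -4 addr show dev eth0 | grep 192.168.111.35

# watch for MASTER/BACKUP transitions of the instance
grep 'VRRP_Instance(VI1SL)' /var/log/syslog | tail
```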

Dedicated load balancing setup

This is a more complex but more interesting setup. Keepalived is used to implement a dedicated active/passive load balancer across two servers, which forward traffic to a pool of two real servers.

In this setup, a VRRP instance is set up across a pair of dedicated load balancers in an active/passive configuration. Each load balancer is configured to forward traffic to one or more real servers using a load balancing algorithm (such as Round Robin). The active load balancer accepts incoming connections on the Virtual IP and forwards traffic to the real servers. All real servers respond to requests in this scenario. Only one of the load balancers handles client traffic.

Load balancers LB_1 and LB_2; Real servers RS_A and RS_B; Virtual IP V_IP;

     /---> LB_1 ------> RS_A
    /      |   \   ______/
   /       |    \ / 
V_IP    (VRRP)   X
   .       |    / \---> RS_B
    .      |   /        /
     . . > LB_2 -------/
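
Conceptually, round robin just cycles each new connection through the pool. A toy sketch of the idea in shell (illustration only – the real scheduling is performed in-kernel by IPVS):

```shell
#!/bin/sh
# Toy sketch of round-robin scheduling across the two real servers from the
# diagram above. Illustration only: IPVS does this in-kernel for each new
# connection.
SERVERS="RS_A RS_B"
NSERVERS=$(echo "$SERVERS" | wc -w)

# pick_server N -> real server chosen for the Nth new connection (0-based)
pick_server() {
    field=$(( $1 % NSERVERS + 1 ))
    echo "$SERVERS" | cut -d' ' -f"$field"
}
```

Connections 0, 1, 2 map to RS_A, RS_B, RS_A and so on; a weighted algorithm (lb_algo wrr) would additionally skew the cycle by the configured weights.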

Traffic is routed from clients to the active load balancer and then to the real servers. Responses go directly from the real servers to the clients. Keepalived can be configured to ensure session persistence between a client and a real server for a set time period while the real server is available; however, this information is not shared across load balancers.

Traffic flow of a sample connection load balanced into Real Server A in Direct Routing mode.

                       /--> LB_1 [Active] -->\
                      /                      RS_A
Client ---> (V_IP) --/                         \
    \                                          /
     \<---------------------------------------/
                 (Source: V_IP)

Under normal conditions, access is load balanced across the pool of real servers. If any of the real servers fails, it is removed from the pool on all load balancers. If one of the load balancers fails, VRRP swaps the Virtual IP onto the other load balancer, ensuring continued access to the real servers.

This model is suitable for simple or more complex transactions, conducted over UDP or TCP, and enables a high level of redundancy and availability. Any combination of one load balancer and one real server can fail simultaneously whilst keeping the service live.

Sample configuration – Load Balancer #1

Seeing as this server will act as a member node of a VRRP instance, the keepalived and sysctl configuration will be very similar to the sample above. Then the iptables rule set needs to be modified accordingly.

Keepalived

The VRRP instance is configured on keepalived as follows:

# VRRP instance
vrrp_instance VI4SM {
        state MASTER   # (optional) initial state for this server
        interface eth1 # interface where VRRP traffic will exist
        advert_int 5   # interval between sync
        virtual_router_id 74 # unique identifier per VRRP instance (same across all servers on the instance)
        priority 100   # server priority - higher number == higher priority

        # authentication for VRRP messages
        authentication {
                #auth_type PASS # simple authentication (plain)
                auth_type AH    # good authentication
                auth_pass zaphod # password
        }
        virtual_ipaddress {
                192.168.133.38/24 dev eth1 # Virtual IP address and interface assignment
        }
    # notice that track_script does not apply to this setup, at least not in the same way
}

In addition to the VRRP instance, keepalived will be configured with references to two real servers against the set Virtual IP. This is the part of the configuration that enables load balancing.

# Virtual server configured to listen on the VRRP Virtual IP
# This section instructs keepalive to configure the kernel's Virtual Server subsystem
virtual_server 192.168.133.38 25 {
        delay_loop 60 # time between checks, in seconds
        lb_algo rr    # load balancing algorithm
        lb_kind DR    # load balancing type: Direct Routing in this case
        protocol TCP
        persistence_timeout 60 # client to real-server mappings will be maintained for at least 60 seconds after a connection is terminated

        # Real Server A
        real_server 192.168.133.8 25 {
                weight 100 # used to influence the load balancing algorithm
                # real server check mechanism (this can be changed to other types of checking including custom scripts)
                # in this case, if a TCP connection cannot be established to the real server ip:port, then it is
                # removed from the pool
                TCP_CHECK {
                        connect_timeout 6
                }
        }

        # Real Server B
        real_server 192.168.133.12 25 {
                weight 100
                TCP_CHECK {
                        connect_timeout 6
                }
        }
}

Once this is done, the tool ‘ipvsadm’ can be used to inspect (and manage) the virtual server table of the kernel. ‘ipvsadm’ is incredibly powerful and can be used independently. It is in this case configured by keepalived, saving us the hassle.

# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn  
TCP  192.168.133.38:smtp rr persistent 60
  -> 192.168.133.8:smtp   Route   100    0          0

Notice that the 2nd real server (192.168.133.12) is not listed because at the time of this configuration it did not have an SMTP service listening, therefore it was not in the pool of real servers used for load balancing. Otherwise, the IP Virtual Server subsystem is configured to load balance traffic arriving at 192.168.133.38:25 in round-robin fashion towards 192.168.133.8:25 with a weight of 100 – exactly as configured on keepalived.
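
For reference, the table above corresponds roughly to what one would create by hand with ipvsadm (illustrative only – keepalived maintains these entries itself, so there is no need to run these commands):

```
# add the virtual service: TCP 192.168.133.38:25, round robin, 60s persistence
ipvsadm -A -t 192.168.133.38:25 -s rr -p 60
# add both real servers in Direct Routing mode (-g) with weight 100
ipvsadm -a -t 192.168.133.38:25 -r 192.168.133.8:25 -g -w 100
ipvsadm -a -t 192.168.133.38:25 -r 192.168.133.12:25 -g -w 100
```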

sysctl

The host’s kernel needs to be configured to allow a process to bind to a non-local IP address, seeing as non-active VRRP nodes will not have the Virtual IP configured in any interfaces.

This is normally mediated by sysctl and takes action at boot time:

# in /etc/sysctl.conf or similar
net.ipv4.ip_nonlocal_bind=1

iptables

Keepalived will now use VRRP to elect an active node, which in turn will listen on the virtual_server IP and port, in this case the VRRP virtual IP and TCP port 25. Therefore, iptables needs to be configured to enable incoming traffic onto Virtual IP on port 25.

# in this example <VIRTUAL_SERVER_PORT> is 25
-A <CHAIN> -p tcp --dport <VIRTUAL_SERVER_PORT> -m state --state NEW -j ACCEPT

There is a nuance to the iptables configuration, which applies when the load balancers themselves need to access the service being load balanced. The kernel’s netfilter conntrack module will flag this traffic as invalid, since the host connects to the Virtual IP (configured on a local interface) but receives a response from the network. Therefore it may be necessary to add a rule to accept traffic coming from the Virtual IP with the virtual_server’s port as source port. This is required if there is a rule blocking ‘INVALID’ traffic on iptables.

# in this example <VIRTUAL_SERVER_PORT> is 25 and <VIRTUAL_IP> is 192.168.133.38
-A <CHAIN> -p tcp --sport <VIRTUAL_SERVER_PORT> -s <VIRTUAL_IP> -j ACCEPT

Sample configuration – Real Server A

When using Direct Routing as the load balancing kind, the real server should receive incoming requests aimed at the virtual IP and respond directly to the clients. This has some implications on the host’s configuration, however using this method allows the real server to be used individually as before, as if the load balancing setup didn’t exist. Notice that Real Server A and B do not run an instance of Keepalived.

Dummy interface

One of the difficulties with this kind of setup is that it breaks several basic TCP/IP assumptions at multiple layers, in particular because the VRRP setup relies on ARP to direct traffic. This works fine in a basic active/passive scenario, but when both the load balancers and the real servers need to respond to the Virtual IP, it is required that the non-VRRP nodes do not respond to ARP for it. There are several ways to do this, but (in my view) the most practical is to use the kernel’s ‘dummy’ module to raise fake network interfaces to which the Virtual IP is assigned. ‘dummy’ is exactly suited for this purpose.

On a Debian system, this is achieved by loading the dummy module, setting the module options (optional if there’s only one Virtual IP involved on the real servers) and setting the interface configuration with the Virtual IP.

# /etc/modules
dummy


# /etc/modprobe.d/local.conf (or similar)
options dummy numdummies=2

Or on the command line:

modprobe dummy numdummies=2

Virtual IP configuration

Once the host has the module loaded, the interfaces can be configured manually or using the system’s network management system. On Debian 8 (Jessie) this is done on /etc/network/interfaces:

auto dummy0
iface dummy0 inet static
    address 192.168.133.38
    netmask 255.255.255.255

Notice how the netmask is set to /32 to avoid routing traffic onto the dummy interface. After a reboot or on the command line, the interface can be brought up:

# ifup dummy0
# ip addr show dev dummy0
6: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:b7:03:16:e7:16 brd ff:ff:ff:ff:ff:ff
    inet 192.168.133.38/32 brd 192.168.133.38 scope global dummy0
       valid_lft forever preferred_lft forever
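
The same can be achieved transiently (lost on reboot) with iproute2, which is convenient for testing before editing /etc/network/interfaces:

```
modprobe dummy numdummies=2               # load the dummy module
ip addr add 192.168.133.38/32 dev dummy0  # assign the Virtual IP as a /32
ip link set dev dummy0 up
```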

sysctl

The host’s kernel may need to be configured to allow a process to bind to a non-local IP address, seeing as the real servers do not have the Virtual IP configured on a regular interface. Also, in order to solve the ‘ARP flux’ problem, the server’s interfaces need to be set up to ignore ARP requests and avoid announcing ARP for the virtual IPs assigned to dummy interfaces (http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.arp_problem.html; http://kb.linuxvirtualserver.org/wiki/Using_arp_announce/arp_ignore_to_disable_ARP).

This is normally mediated by sysctl and takes action at boot time:

# in /etc/sysctl.conf or similar
net.ipv4.ip_nonlocal_bind=1

# solution to ARP flux problem on both interfaces
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.eth1.arp_ignore = 1
net.ipv4.conf.eth1.arp_announce = 2

Sample configuration – Real Server B

The second real server is configured identically to the first. The only difference is where <REAL_SERVER_IP> is referenced in the iptables rule set, which in this case should be set to .12 instead of .8.

Sample configuration – Load Balancer #2

The second load balancer is set as a member node of the same VRRP instance and with the same virtual_server configuration. The sysctl and iptables configurations are identical.

Keepalived

# VRRP instance
vrrp_instance VI4SM {
        state BACKUP
        interface eth1
        advert_int 5
        virtual_router_id 74
        priority 90

        # authentication for VRRP messages
        authentication {
                #auth_type PASS
                auth_type AH
                auth_pass zaphod
        }
        virtual_ipaddress {
                192.168.133.38/24 dev eth1
        }
}

# Virtual Server setup is identical to Load Balancer #1, although there is no requirement for consistency
# both load balancers could be pointing at different pools of real servers.
virtual_server 192.168.133.38 25 {
        delay_loop 60 
        lb_algo rr   
        lb_kind DR    
        protocol TCP
        persistence_timeout 60 # client to real-server mappings will be maintained for at least 60 seconds after a connection is terminated

        # Real Server A
        real_server 192.168.133.8 25 {
                weight 100 # used to influence the load balancing algorithm
                TCP_CHECK {
                        connect_timeout 6
                }
        }

        # Real Server B
        real_server 192.168.133.12 25 {
                weight 100
                TCP_CHECK {
                        connect_timeout 6
                }
        }
}

Conclusion

Keepalived is a ridiculously powerful platform for load balancing and high availability of networked services, which is also straightforward to set up. Most configurations are simple and obvious, but there are many pitfalls related to bending TCP/IP the way load balancers and VRRP do.