Failover

When a Traffic Manager in a cluster determines that one of its peers has failed, it may take over some or all of the traffic shares for which the failed system was responsible. The traffic distribution method determines how this is done.

Traffic IP Address Transfer (Single-Hosted Mode)

Each Traffic Manager in a cluster uses its knowledge of which machines are active to determine which Traffic IP addresses it should be running. The cluster uses a fully deterministic algorithm to distribute IP addresses across the machines.

Because the algorithm is deterministic, the Traffic Managers do not need to negotiate between themselves when one of their peers fails or recovers.

The algorithm is optimized to spread the distribution of Traffic IP addresses across the active Traffic Managers in a cluster, and to minimize the number of IP address transfers if a Traffic Manager fails or recovers.
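The two properties described above (no negotiation, and minimal address movement on failure or recovery) can be illustrated with rendezvous hashing. This is only a sketch of one well-known technique that has these properties; the algorithm the Traffic Manager actually uses is not described here, and the names assign_ips and _score are hypothetical.

```python
import hashlib


def _score(ip: str, manager: str) -> int:
    """Stable pseudo-random weight for an (IP, manager) pair (illustrative only)."""
    digest = hashlib.sha256(f"{ip}|{manager}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_ips(traffic_ips, active_managers):
    """Map each Traffic IP to the active manager with the highest score.

    Every machine can compute this independently from the same list of
    active peers, so no negotiation is required.
    """
    if not active_managers:
        return {}
    return {
        ip: max(active_managers, key=lambda m: _score(ip, m))
        for ip in traffic_ips
    }
```

In this sketch, removing a manager from the active list moves only the addresses that manager was hosting, and adding it back restores the original assignment. The spread across managers is statistically even rather than exactly even.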

When a Traffic Manager raises a Traffic IP address, it sends several ARP messages to inform adjacent network devices that the MAC address corresponding to the IP address may have changed. The Traffic Manager will send up to 10 ARP messages (tunable using the flipper!arp_count setting); the frequency of these messages is controlled by the flipper!monitor_interval setting (by default, the messages are sent at 0.5-second intervals).
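As a rough worked example of the timing above, the following sketch lists when each ARP message would be sent after an address is raised. It assumes the first message is sent immediately and that the full count is sent; both are assumptions, and the function name is hypothetical.

```python
def arp_announcement_offsets(arp_count: int = 10, interval_s: float = 0.5):
    """Return the offsets, in seconds after the address is raised, of each ARP message.

    Assumes the first message goes out at offset 0 (an assumption, not
    documented behavior).
    """
    return [i * interval_s for i in range(arp_count)]
```

With the default values, the announcements in this sketch span 4.5 seconds from first to last.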

Note that if a Traffic Manager detects that its own network connectivity has failed, it will immediately drop its Traffic IP addresses and broadcast "I have failed" health messages to its peers. This is in anticipation of other Traffic Managers in the cluster raising the interfaces when they realize that the first Traffic Manager has failed.

Traffic IP Address Transfer (Multi-Hosted Mode)

Each Traffic Manager in the Traffic IP Group deterministically chooses whether or not it should handle each packet, based on the source IP of that packet (and optionally the source port; see Traffic Distribution).

If a Traffic Manager fails, its share of the load is spread evenly between the remaining Traffic Managers. When it recovers, it takes equal shares of the load from its peers, thus ensuring that the traffic is always evenly distributed across the working machines in the Traffic IP Group.
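The per-packet decision can be sketched as follows. Every Traffic Manager in the group sees the packet and evaluates the same function, and exactly one of them accepts it. The hashing scheme shown (rendezvous hashing over the source address) is an assumption chosen for illustration, not the method used by the kernel module, and should_handle is a hypothetical name.

```python
import hashlib
from typing import Optional, Sequence


def _weight(key: str, manager: str) -> int:
    """Stable pseudo-random weight for a (packet key, manager) pair."""
    digest = hashlib.sha256(f"{key}|{manager}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def should_handle(me: str, active: Sequence[str], src_ip: str,
                  src_port: Optional[int] = None) -> bool:
    """Decide locally whether this Traffic Manager handles the packet.

    The key is the source IP, optionally combined with the source port.
    """
    key = src_ip if src_port is None else f"{src_ip}:{src_port}"
    return me in active and max(active, key=lambda m: _weight(key, m)) == me
```

When a peer is removed from the active list, only the sources it was handling are reassigned, and they are spread across the remaining machines. Sources already handled by a surviving machine stay where they are.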

Multi-hosted IP functionality is not included with the Traffic Manager software by default. You can download and install it as an additional kernel module; it is supported on Linux kernel version 2.6.18 and later. See the Traffic Manager documentation on the Ivanti Web site (www.ivanti.com) for more information on supported versions.

Traffic IP Address Transfer (RHI Mode)

If a Traffic Manager's fault tolerance checks fail, it lowers the addresses used in RHI traffic IP groups and withdraws route advertisements from the network.

If the network detects that it cannot reach the designated active Traffic Manager in an RHI traffic IP group, routing decisions for the traffic IP address(es) instead use the next-best available route, that is, the remaining route with the lowest metric. This might be a route to the designated passive Traffic Manager, or to a Traffic Manager hosting the same traffic IP address in another datacenter.
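The fallback behavior can be pictured with a small model of lowest-metric route selection. This is a simplification: real routing protocols apply richer selection rules, and the Route fields and metric values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    next_hop: str     # hypothetical label for the advertising Traffic Manager
    metric: int       # lower values are preferred
    advertised: bool  # False once the Traffic Manager withdraws the route


def best_route(routes: Iterable[Route]) -> Optional[Route]:
    """Return the advertised route with the lowest metric, or None if none remain."""
    candidates = [r for r in routes if r.advertised]
    return min(candidates, key=lambda r: r.metric) if candidates else None
```

For example, if the active Traffic Manager advertises a metric of 10 and the passive one a metric of 20, traffic follows the active route until it is withdrawn, and then moves to the passive route.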

Recovering from Failure

When a failed Traffic Manager recovers, its share of traffic is transferred back to it.

Each time traffic shares are transferred from one Traffic Manager to another, any connections currently in that share are dropped. This is inevitable when a transfer occurs because a Traffic Manager fails, but may not be desirable when a Traffic Manager recovers.

In this case, you can disable the flipper!autofailback setting on the System > Fault Tolerance page of the Admin UI. When this is disabled, a Traffic Manager does not take any traffic when it recovers. Instead, the user interface displays a message indicating that the Traffic Manager has recovered and can take back its IP addresses.

When you want to reactivate the Traffic Manager, go to the Diagnose page and select the Reactivate this Traffic Manager link.

Alternatively, you can edit each of your traffic IP groups and set the recovered Traffic Manager to passive. Once you set it to passive in all of the groups, it will not need to take any shares of traffic; it will then reactivate automatically, clearing the error state. In addition, no traffic will be lost because no traffic shares will have been transferred.