Health Monitoring

This chapter describes the Traffic Manager’s health monitoring capabilities, which are used to monitor server nodes for correct operation, to raise alerts, and to route around failed nodes when an error is detected.

Which Nodes Are Monitored?

The Traffic Manager uses two methods to monitor the correct operation of nodes: passive monitoring (checking the status of a node when it is used) and health monitoring (additional tests that are run against a node on a periodic basis).

Passive monitoring tests are only performed against nodes that are actively in use.

Health monitors are executed against all of the nodes in a pool, but only for pools that are in use in one of the following ways:

Pools that are configured as the default pool for a virtual server.

Pools that are explicitly referenced by name in a TrafficScript or RuleBuilder rule, using either the pool.select() or pool.use() function.

Pools that are configured as the “failpool” for a pool that is in use.

If a pool is not referenced by the Traffic Manager configuration in one of these ways, it is considered to be “not in use” and is not monitored using health monitors. Node failures in such a pool are not detected.

Pools must be explicitly referenced by name when they are used in a TrafficScript rule. By default, the Traffic Manager does not permit a pool to be referenced through a variable (for example, pool.use( $name );) or by any other indirect means.

If you enable the setting trafficscript!variable_pool_use (in System > Global Settings), you can use variables for pool names. If this setting is enabled, the Traffic Manager will execute health monitors against all of the pools you have configured, not just the ones that are clearly in use.
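The rules above can be sketched as a small Python model. This is an illustration only, not the Traffic Manager's implementation: the function name, its parameters, and the pool names are hypothetical, and following failpool links through a chain of pools is one reading of the third rule.

```python
def monitored_pools(virtual_servers, rule_pool_refs, failpools, all_pools,
                    variable_pool_use=False):
    """Return the set of pool names that health monitors would run against.

    virtual_servers: dict of virtual server name -> default pool name
    rule_pool_refs:  pool names referenced literally in rules
    failpools:       dict of pool name -> name of its failpool
    all_pools:       every configured pool name
    (All of these structures are hypothetical, for illustration.)
    """
    if variable_pool_use:
        # With variable pool names permitted, every configured pool is monitored.
        return set(all_pools)

    # Default pools and pools named in rules are in use.
    in_use = set(virtual_servers.values()) | set(rule_pool_refs)

    # A failpool of an in-use pool is itself in use, so follow the links.
    pending = list(in_use)
    while pending:
        pool = pending.pop()
        fail = failpools.get(pool)
        if fail is not None and fail not in in_use:
            in_use.add(fail)
            pending.append(fail)
    return in_use


all_pools = ["web", "api", "backup", "spare"]
virtual_servers = {"vs-main": "web"}
rule_pool_refs = ["api"]
failpools = {"web": "backup"}

default_result = monitored_pools(virtual_servers, rule_pool_refs,
                                 failpools, all_pools)
variable_result = monitored_pools(virtual_servers, rule_pool_refs,
                                  failpools, all_pools,
                                  variable_pool_use=True)
print(sorted(default_result))
print(sorted(variable_result))
```

In this model the "spare" pool is not monitored by default because nothing references it. It is monitored only once variable pool names are enabled.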

Using Nodes in Multiple Pools

The same node (IP address and port) may be referenced in several different pools. If a node has failed in one pool, it is excluded from the load balancing and session persistence decisions for that pool. However, that node may still be used in other pools until the passive monitors and health monitors assigned to those pools report a failure.
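One way to picture this behavior is to record failures per pool rather than per node. The following Python sketch is a simplified model under that assumption. The class, its methods, and the addresses are hypothetical and do not correspond to any Traffic Manager API.

```python
class PoolHealth:
    """Tracks failed nodes separately for each pool (illustrative model)."""

    def __init__(self, pools):
        # pools: dict of pool name -> list of "ip:port" node strings
        self.pools = {name: list(nodes) for name, nodes in pools.items()}
        self.failed = {name: set() for name in pools}

    def mark_failed(self, pool, node):
        # A failure reported by one pool's monitors affects that pool only.
        self.failed[pool].add(node)

    def usable_nodes(self, pool):
        # Nodes still eligible for load balancing in this pool.
        return [n for n in self.pools[pool] if n not in self.failed[pool]]


health = PoolHealth({
    "pool-a": ["10.0.0.1:80", "10.0.0.2:80"],
    "pool-b": ["10.0.0.1:80", "10.0.0.2:80"],
})
health.mark_failed("pool-a", "10.0.0.1:80")

print(health.usable_nodes("pool-a"))
print(health.usable_nodes("pool-b"))
```

After the failure is recorded for "pool-a", the same node remains available to "pool-b" until that pool's own monitors report a problem.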

Example

Your Web application has four back-end servers (nodes). Each node hosts the same dynamic content and static content. If the dynamic content on a node fails (for example, the Java servlet crashes), you might still want to use that node for static content:

Create two pools named "Dynamic" and "Static", each containing all 4 nodes.

Create a rule that uses pool "Dynamic" for dynamic content (Java Servlets, PHP and ASP files, and so on) and uses pool "Static" for all other content.
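The decision made by such a rule can be modeled as follows. This is a Python sketch of the selection logic only, not TrafficScript. On the Traffic Manager, the rule would name each pool explicitly in a pool.use() or pool.select() call. The function name and the list of extensions treated as dynamic are assumptions made for illustration.

```python
import posixpath

# Assumed set of file extensions that identify dynamic content.
DYNAMIC_EXTENSIONS = {".php", ".asp", ".jsp"}


def choose_pool(path):
    """Return the name of the pool that should handle a request path."""
    # Ignore any query string before inspecting the extension.
    path = path.split("?", 1)[0]
    ext = posixpath.splitext(path)[1].lower()
    return "Dynamic" if ext in DYNAMIC_EXTENSIONS else "Static"


for request_path in ["/app/index.php?id=1", "/images/logo.png", "/"]:
    print(request_path, "->", choose_pool(request_path))
```

Because both pool names appear as literal strings, a rule written this way satisfies the requirement that pools be referenced explicitly by name, so both pools are treated as in use and are health monitored.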

If a node in the "Dynamic" pool fails to send a valid HTTP response, the passive monitoring used by that pool determines that the node has failed. No more traffic from the "Dynamic" pool is sent to that node. However, the "Static" pool continues to send traffic to that node, as long as it returns valid HTTP responses for requests from the "Static" pool.

For more fine-grained detection of errors, you can assign different health monitors to each pool. For example, the health monitors for the "Dynamic" pool can send synthetic PHP or ASP requests. If these monitors fail, the node is considered to have failed in the "Dynamic" pool, but not in other pools.