Passive Health Monitoring

A pool performs a set of checks every time it attempts to send a request to a node; this process is referred to as Passive Monitoring:

The Traffic Manager attempts to connect to a node; if the connection is refused, or is not established within the max_connect_time setting (default 4 seconds), the request is considered to have failed.

The Traffic Manager writes the request data down the connection; if the connection is closed prematurely, or if the beginning of a response is not received within max_reply_time seconds (default 30 seconds), the request is considered to have failed.

For SSL only: if the SSL handshake to the node fails, the request is considered to have failed.

The max_connect_time and max_reply_time settings are configured on the Protocol Settings page of a pool configuration. For more information, see Protocol Settings.
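The three checks above can be pictured as a single classification step applied to each request attempt. The following is a minimal sketch in Python, not the Traffic Manager's implementation: the Attempt record, its field names, and the passive_check function are hypothetical and exist only for illustration. The two timeouts use the defaults quoted above.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CONNECT_TIME = 4.0   # seconds; default quoted in the text
MAX_REPLY_TIME = 30.0    # seconds; default quoted in the text


@dataclass
class Attempt:
    """Hypothetical record of one request attempt against a node."""
    connect_refused: bool = False
    connect_seconds: Optional[float] = None     # None: connection never established
    ssl_handshake_ok: Optional[bool] = None     # None: not an SSL connection
    closed_prematurely: bool = False
    first_byte_seconds: Optional[float] = None  # None: no response ever started


def passive_check(a: Attempt) -> Optional[str]:
    """Return None if the attempt passes, or a short failure reason."""
    # Check 1: the connection must be accepted within max_connect_time.
    if a.connect_refused:
        return "connection refused"
    if a.connect_seconds is None or a.connect_seconds > MAX_CONNECT_TIME:
        return "connect timeout"
    # SSL only: the handshake to the node must succeed.
    if a.ssl_handshake_ok is False:
        return "ssl handshake failed"
    # Check 2: the response must begin within max_reply_time.
    if a.closed_prematurely:
        return "connection closed prematurely"
    if a.first_byte_seconds is None or a.first_byte_seconds > MAX_REPLY_TIME:
        return "reply timeout"
    return None
```

For example, an attempt that connects in 0.1 seconds but receives no response data within 30 seconds is classified as a reply timeout, while a refused connection fails before any request data is written.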

Retrying Failed Requests

If these checks fail, the pool might try the request against a different working node, and can try every node in the pool before abandoning the request. The actual behavior is determined by the “idempotent” status of the request.

By default, requests are assumed to be idempotent; in other words, they can be safely retried multiple times without undesired side effects. An exception to this is any request received through a virtual server using one of the generic-type protocols (“Generic Client First”, “Generic Server First”, “Generic Streaming”). For such a request to be treated as idempotent by default, an end-point to the request must first be defined so that failure can be measured and retries can be triggered.

RFC 2616 defines some HTTP requests as non-idempotent (for example, they can cause a transaction to take place or change state on the server). The Traffic Manager follows these recommendations and treats HTTP GET, HEAD, PUT, DELETE, OPTIONS and TRACE methods as idempotent; all other requests are considered non-idempotent.
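The HTTP classification described above amounts to a membership test on the request method. A minimal sketch in Python follows; the function name is hypothetical, and the set contains exactly the methods listed in the text.

```python
# Methods the text lists as idempotent; everything else is non-idempotent.
IDEMPOTENT_METHODS = frozenset(
    {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}
)


def is_idempotent_http(method: str) -> bool:
    """Return True if the HTTP method is treated as safe to retry.

    HTTP method names are case-sensitive, so the comparison is exact.
    """
    return method in IDEMPOTENT_METHODS
```

Under this rule a POST request, for example, is non-idempotent, so it is subject to the stricter retry behavior summarized below.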

To summarize:

Idempotent (no side effects): The Traffic Manager retries these requests against other believed-to-be working nodes, and can try every node in the pool before abandoning the request.

Non-idempotent (side effects): The Traffic Manager only retries a non-idempotent request if it failed to open a TCP connection to the failed node.

When the Traffic Manager establishes a TCP connection, it immediately writes the request data down that connection. The Traffic Manager is not able to determine whether or not the node has received the request data and begun processing it. Therefore, non-idempotent requests are only retried if the connection could not be established in the first place.
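The retry rule can be sketched as a loop over candidate nodes. This is an illustrative model in Python only: send_with_retries, the try_node callback, and the three outcome constants are hypothetical names, and the sketch assumes the caller supplies nodes in the order they should be tried.

```python
from typing import Callable, Iterable, Optional

OK = "ok"
CONNECT_FAILED = "connect_failed"              # TCP connection never established
FAILED_AFTER_CONNECT = "failed_after_connect"  # request data may have reached the node


def send_with_retries(
    nodes: Iterable[str],
    idempotent: bool,
    try_node: Callable[[str], str],
) -> Optional[str]:
    """Return the node that served the request, or None if it was abandoned."""
    for node in nodes:
        outcome = try_node(node)
        if outcome == OK:
            return node
        # Once a connection was established, the node may already be
        # processing the request, so a non-idempotent request stops here.
        if outcome == FAILED_AFTER_CONNECT and not idempotent:
            return None
        # Otherwise (idempotent request, or the connection could not be
        # opened at all), move on to the next node.
    return None
```

With nodes a, b and c, where a refuses the connection, b fails after connecting and c succeeds, an idempotent request ends up on c, whereas a non-idempotent request is abandoned at b.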

You can override the idempotent/non-idempotent decision made by the Traffic Manager by using the request.setIdempotent() and http.setIdempotent() TrafficScript functions to indicate to the Traffic Manager that a particular request should be considered safe to retry.

Node Failures

The Traffic Manager infers that a node has failed if connections to that node fail consistently, with node_connection_attempts (default 3) failures in a row with no intermediate successful transactions.

If a node has been deemed to have failed, it is not used for at least node_fail_time seconds (default 60 seconds). After that, the Traffic Manager tentatively tests whether the node has recovered by periodically sending it idempotent requests from the live service traffic. If such a request fails, it can be retried against another, working node. Where all nodes in the pool have failed, the Traffic Manager immediately sends traffic to a recovered node regardless of the node_fail_time setting.
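The failure and recovery rules can be modeled as a small per-node state record. The sketch below is in Python and makes several assumptions that the text does not confirm: the NodeState class and its method names are hypothetical, a failed probe is assumed to restart the node_fail_time wait, and the all-nodes-failed case is modeled simply as bypassing that wait.

```python
from dataclasses import dataclass
from typing import Optional

NODE_CONNECTION_ATTEMPTS = 3   # default quoted in the text
NODE_FAIL_TIME = 60.0          # seconds; default quoted in the text


@dataclass
class NodeState:
    """Hypothetical passive-monitoring state for a single node."""
    consecutive_failures: int = 0
    failed_at: Optional[float] = None   # time the node was last marked failed

    def record(self, success: bool, now: float) -> None:
        """Update the state after one transaction with the node."""
        if success:
            # Any successful transaction clears the failure run.
            self.consecutive_failures = 0
            self.failed_at = None
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= NODE_CONNECTION_ATTEMPTS:
            # Assumption: a failed probe restarts the waiting period.
            self.failed_at = now

    def is_failed(self) -> bool:
        return self.failed_at is not None

    def may_probe(self, now: float, all_nodes_failed: bool = False) -> bool:
        """Whether an idempotent live request may be sent to this node."""
        if self.failed_at is None:
            return True
        # Assumption: when every node has failed, the wait is skipped.
        return all_nodes_failed or (now - self.failed_at) >= NODE_FAIL_TIME
```

In this model, two failures followed by a success do not mark the node as failed, because the run of consecutive failures is reset by the intermediate successful transaction.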

To configure node_connection_attempts and node_fail_time, use the Pools > Edit > Protocol Settings page.

To learn more about the cause of node failures, enable log!server_connection_failures in Virtual Servers > Edit > Error Logging. This setting provides more useful log information about the actual reasons for your node failures. For more details, see Handling Errors.

A node that has been marked as having failed by passive monitoring can recover only if it responds to live service traffic. It cannot be recovered by an active Health Monitor.

Enabling and Disabling Passive Monitoring

Passive Monitoring is used by default, but can be disabled on a per-pool basis using the passive_monitoring setting in the Monitors section of a pool configuration. If this setting is disabled, you should ensure there are suitable Health Monitors configured; otherwise, failed requests are not detected and subsequently retried.