CherwellMQS Metrics

Identify the metrics on the RabbitMQ management interface that help monitor queues and connections.

Use the following RabbitMQ metrics, which can be found in the management interface, to monitor queue health:

Metrics Descriptions
Disk space used Ideal is less than 50 MB per node.

On the machine where RabbitMQ is installed, an important metric to track is the amount of disk space used. This metric reflects how much disk space each RabbitMQ node is using to store messages that are queued or in process. See https://www.rabbitmq.com/disk-alarms.html.

Memory used Ideal is less than 20 percent of the entire machine.

Once RabbitMQ's memory use exceeds the default threshold of 40% of the machine's available memory, RabbitMQ starts to block all connections that are publishing messages. See https://www.rabbitmq.com/memory-use.html.

Connection performance All traffic flows through a TCP connection. Monitoring the connection will help you understand the traffic for the application, see how the network is used, and determine if there are connectivity issues. See https://www.rabbitmq.com/reliability.html.
Data rates This includes throughput and performance.
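
The memory and disk guidance above can be checked against the node data that the RabbitMQ management HTTP API returns from /api/nodes. The following is a minimal sketch, not a Cherwell tool: the function name node_warnings is hypothetical, the field names (mem_used, mem_limit, mem_alarm, disk_free_alarm) are assumed to match your RabbitMQ version, and the calculation assumes the memory threshold is configured as a fraction (40% by default) rather than as an absolute size.

```python
def node_warnings(node, watermark=0.4, ideal_fraction=0.2):
    """Return warnings for one node entry from GET /api/nodes.

    `watermark` and `ideal_fraction` are assumptions taken from the text
    above: the 40% default threshold and the 20% ideal.
    """
    warnings = []
    mem_used = node.get("mem_used", 0)
    mem_limit = node.get("mem_limit", 0)

    if node.get("mem_alarm"):
        warnings.append("memory alarm active: publishing connections are blocked")
    elif mem_limit > 0:
        # mem_limit is the threshold in bytes, so the machine's memory is
        # roughly mem_limit / watermark when the threshold is a fraction.
        fraction = mem_used * watermark / mem_limit
        if fraction > ideal_fraction:
            warnings.append(
                f"memory at {fraction:.0%} of the machine "
                f"(ideal is under {ideal_fraction:.0%})"
            )

    if node.get("disk_free_alarm"):
        warnings.append("disk alarm active: free disk space is below the configured limit")
    return warnings


GIB = 2**30
# Hypothetical node: 3 GiB used, threshold at 4 GiB (40% of a 10 GiB machine).
example = {"mem_used": 3 * GIB, "mem_limit": 4 * GIB,
           "mem_alarm": False, "disk_free_alarm": False}
print(node_warnings(example))
```

An empty list means the node is within the ideals described above. Any entry is a prompt to look at the node in the management interface.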

Queues receive, push, and store messages. After a message is routed through an exchange, it is placed in a queue. A queue is the final destination within RabbitMQ before the message is passed to an application.

Queue depth Number of messages in the queue.
Messages unacknowledged Number of messages a queue has delivered without receiving an acknowledgement from a consumer.
Messages ready Number of messages available to be consumed.
Message rates Number of messages that move in and out of a queue per second. This is highly variable from tenant to tenant, depending on the type of work being done and how long each piece of work takes. A complicated Automated Process will result in a lower message rate. Monitor this section to see messages pass from the Ready state to the Unacked state. If there are messages in the Ready state, the number of messages in the Unacked column should equal the number of Consumers for that queue. With enough time on a given tenant, a pattern starts to emerge.
Number of consumers Number of workers that are registered to do work from that queue. Each service is designed to scale within the bounds of its settings. If the number of consumers is lower or higher than expected, it could mean there is a failure of some type in Cherwell Service Host.
Consumer utilization Fraction of time that a queue’s consumers could take on new messages. It is a number between 0 and 1, or NA if a queue does not have any consumers. If the utilization is less than 1 (100%), the consumers were not always able to take a message. This can be an indication of network congestion.
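
The queue checks described above can also be applied to the queue entries that the management HTTP API returns from /api/queues. The following is a minimal sketch under stated assumptions: the function name queue_warnings and the expected_consumers parameter are hypothetical, and the field names (messages_ready, messages_unacknowledged, consumers, consumer_utilisation) are assumed to match your RabbitMQ version. Newer versions may report the last value under a different name.

```python
def queue_warnings(queue, expected_consumers=None):
    """Return warnings for one queue entry from GET /api/queues."""
    warnings = []
    ready = queue.get("messages_ready", 0)
    unacked = queue.get("messages_unacknowledged", 0)
    consumers = queue.get("consumers", 0)
    # Absent or None when the queue has no consumers (shown as NA in the UI).
    utilisation = queue.get("consumer_utilisation")

    if expected_consumers is not None and consumers != expected_consumers:
        warnings.append(
            f"{consumers} consumers registered, expected {expected_consumers}"
        )
    # Rule from the text above: with messages waiting in Ready, the Unacked
    # count should match the number of consumers.
    if ready > 0 and unacked != consumers:
        warnings.append(
            f"{ready} messages ready but {unacked} unacked for {consumers} consumers"
        )
    if utilisation is not None and utilisation < 1:
        warnings.append(f"consumer utilization is {utilisation:.2f}, below 1")
    return warnings


# Hypothetical queue entry: work is waiting, but only one of three
# consumers holds a message.
example = {"messages_ready": 5, "messages_unacknowledged": 1,
           "consumers": 3, "consumer_utilisation": 0.5}
for warning in queue_warnings(example, expected_consumers=3):
    print(warning)
```

Because message rates vary from tenant to tenant, treat these warnings as starting points for investigation rather than as fixed alert rules.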