Troubleshooting Queues in CherwellMQS

It is important to know how to interpret CherwellMQS through the RabbitMQ management interface.

When a Queue Appears Stuck

Sometimes, a worker is unable to process a request, so an item will sit as unacknowledged (unacked) and appear to be stuck. For a scheduled job, deleting the queue may mean that job and subsequent jobs won’t run until the next scheduled time. Similarly, deleting Automation Process (AP) events that are queued and unacked could keep them from being processed, and removing mail delivery queues could keep items that are in an unacked state from being sent. All queues with a Cherwell Service Host identifier can be cleanly deleted if needed.

Use the Get Messages section to inspect messages in the queue and remove a troublesome message without deleting the whole queue. In most stuck cases, you will see unacked messages in the queue that the system is not processing: the Unacked count stays the same while the Ready count increases. Two options in the ACK Mode drop-down list of the Get Messages section let you perform these actions:

  • Nack message requeue true: A non-destructive action that reads the message and adds it back into the queue in the same position.
  • Automatic ack: A destructive action that acknowledges the message, which removes it from the queue, and does not requeue it.
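
For scripted checks, the same two modes can also be requested through the RabbitMQ management HTTP API. The following is a minimal sketch, assuming the management plugin's `POST /api/queues/{vhost}/{queue}/get` endpoint and assuming that the two UI options above correspond to the `ackmode` values `ack_requeue_true` and `ack_requeue_false`. The helper name `build_get_request` and the queue name `example-queue` are hypothetical, and nothing here is specific to CherwellMQS. The sketch only builds the request; it does not send it.

```python
import json
from urllib.parse import quote

# Assumed mapping from the UI's ACK Mode labels to the HTTP API "ackmode" values.
ACK_MODES = {
    "Nack message requeue true": "ack_requeue_true",  # non-destructive read
    "Automatic ack": "ack_requeue_false",             # destructive: message is removed
}


def build_get_request(base_url, vhost, queue, ui_label, count=1):
    """Return (url, json_body) for a Get Messages call without sending it."""
    url = "{}/api/queues/{}/{}/get".format(
        base_url.rstrip("/"),
        quote(vhost, safe=""),  # the default vhost "/" must be sent as %2F
        quote(queue, safe=""),
    )
    body = {
        "count": count,
        "ackmode": ACK_MODES[ui_label],
        "encoding": "auto",
    }
    return url, json.dumps(body)


# Hypothetical queue name, for illustration only.
url, body = build_get_request(
    "http://localhost:15672", "/", "example-queue", "Nack message requeue true"
)
print(url)
print(body)
```

Keeping the label-to-value mapping in one place makes it harder to send the destructive mode by accident.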

To identify a stuck item in a queue:

  1. In a browser, enter the RabbitMQ Management Interface URL:

    http://localhost:15672

  2. Log in with the RabbitMQ credentials.
  3. On the Queues tab, select the queue with the stuck message.
  4. Expand the Get Messages section.

    Figure: Get Messages section of the RabbitMQ Queues tab

  5. Select Nack message requeue true in the ACK Mode list.
  6. Select Get Message(s).
  7. Review the message and note the ID.
  8. Restart the Cherwell Service Host.
  9. Refresh the queue.
  10. In the Get Messages section, select Nack message requeue true in the ACK Mode list again.
  11. Select Get Message(s).
  12. Review the message. If the ID matches the one you noted in step 7, that message is stuck.
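
The comparison at the end of this procedure can be sketched in code. The example below is an illustration only: it assumes each message is a dictionary shaped like a management API Get Messages result, with `payload` and `properties` keys, and it assumes the identifier is `properties.message_id` when the publisher set one, falling back to the payload otherwise. The function names and the sample data are hypothetical; where CherwellMQS stores the ID you see in the interface may differ.

```python
def message_identity(message):
    """Return a value that identifies a fetched message.

    Assumption: use properties.message_id when present, else the raw payload.
    """
    props = message.get("properties") or {}  # empty properties may arrive as []
    return props.get("message_id") or message.get("payload")


def head_is_stuck(before, after):
    """Return True if the first message is the same before and after the restart."""
    if not before or not after:
        return False  # an empty result means there is no head message to compare
    return message_identity(before[0]) == message_identity(after[0])


# Hypothetical results of two non-destructive reads, one before and one
# after restarting the Cherwell Service Host.
first_read = [{"payload": "{\"Id\": \"abc\"}", "properties": {"message_id": "m-1"}}]
second_read = [{"payload": "{\"Id\": \"abc\"}", "properties": {"message_id": "m-1"}}]

print(head_is_stuck(first_read, second_read))
```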

To remove a single item in a queue:

  1. Pause the E-mail and Event Monitor Service or, if there are Automation Process tasks associated with the queue, pause the Automation Process server.
  2. In a browser, enter the RabbitMQ Management Interface URL:

    http://localhost:15672

  3. Log in with the RabbitMQ credentials.
  4. On the Queues tab, select the queue with the stuck message.
  5. Expand the Get Messages section.
  6. Select Automatic ack in the ACK Mode list.

    Leave the Messages field number at 1. If multiple messages need to be removed, you can increase this number, but proceed with caution. This is a destructive action and cannot be undone.

  7. Select Get Message(s) to remove the item.
  8. Resume the E-mail and Event Monitor Service and the Automation Process server if they were paused in the first step.
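
If the removal has to be scripted, the destructive read in this procedure could look roughly like the sketch below. It assumes the RabbitMQ management HTTP API endpoint `POST /api/queues/{vhost}/{queue}/get` with HTTP basic authentication, and assumes `ack_requeue_false` is the API equivalent of Automatic ack. The function name `remove_messages` and its `send` parameter are hypothetical. It does not pause or resume any Cherwell services, so those steps still have to be done separately, and like the UI action it cannot be undone.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def remove_messages(base_url, vhost, queue, user, password, count=1,
                    send=urllib.request.urlopen):
    """Destructively fetch `count` messages and return them as parsed JSON.

    `send` is injectable so the function can be exercised without a broker.
    """
    url = "{}/api/queues/{}/{}/get".format(
        base_url.rstrip("/"), quote(vhost, safe=""), quote(queue, safe="")
    )
    body = json.dumps(
        {"count": count, "ackmode": "ack_requeue_false", "encoding": "auto"}
    ).encode("utf-8")
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")
    ).decode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
    )
    with send(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (not executed here; the queue name and credentials are placeholders):
# removed = remove_messages("http://localhost:15672", "/", "example-queue",
#                           "<user>", "<password>", count=1)
```

Returning the removed messages lets the caller log exactly what was discarded, which matters because the action is irreversible.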

Increase the Speed of Automation Processing Events

The number of AP events and their processing time correlate directly with how much work the events generate in the system. Disabling events does not prevent the leader from having to filter out each of those items. To maximize performance, keep only the events you need. Broad triggers such as Change on Any Field can also hurt performance, because they require more processing than may be necessary.

Monitor AP Events Processing

For AP events, the following SQL queries can provide some context:

    -- The longest-running AP by BusObType for today
    DECLARE @Now DATETIME;
    DECLARE @Earlier DATETIME;

    -- Select up to now.
    SELECT @Now = GETDATE();

    -- Get from midnight today.
    SELECT @Earlier = DATEFROMPARTS(DATEPART(yyyy, @Now), DATEPART(M, @Now), DATEPART(D, @Now));

    SELECT td.DefName,
           CAST(FORMAT(DATEPART(HOUR, ttp.Duration), 'D2') AS VARCHAR(20)) + ':' +
           CAST(FORMAT(DATEPART(MINUTE, ttp.Duration), 'D2') AS VARCHAR(20)) + ':' +
           CAST(FORMAT(DATEPART(SECOND, ttp.Duration), 'D2') AS VARCHAR(20)) AS ElapsedTime,
           ttp.RecCount
    FROM TrebuchetDefs td WITH (NOLOCK)
    INNER JOIN (
        SELECT BusObTypeID,
               Duration = MAX([CompletedDateTime] - [StartedDateTime]),
               COUNT(RecId) AS RecCount
        FROM [dbo].[TrebuchetProcesses] WITH (NOLOCK)
        WHERE StartedDateTime >= @Earlier AND CompletedDateTime <= @Now
        GROUP BY [BusObTypeID]
    ) AS ttp ON td.DefId = ttp.BusObTypeID;

    -- Count of AP events pending processing by the leader; high numbers are problematic.
    SELECT COUNT(RecId) FROM TrebuchetEvents WITH (NOLOCK) WHERE EventStatus = 'Logged';