If your CI alarms have been triggered but you see no Incident,
follow these steps to troubleshoot.
Before you begin these steps, you should be able to successfully
complete the form requesting the AWS product, as described in
Cannot Complete AWS Provision Form, including getting specific CFT details for the
AWS products in your Service Catalog portfolio.
After you can see a successfully provisioned product in your AWS
account and have received the CI in your CMDB, you may wish to automatically create
Incidents for your CI when it has an event. If you followed the configuration steps
for this case but cannot see your Incident, follow these troubleshooting steps.
- Problem: Event alarm not in alarm state.
Solution: Before you begin troubleshooting the Incident,
confirm that your alarm is in alarm state. If the alarm entered alarm state
before your system was configured, you may need to move it out of alarm state
and back in again to send the notification that creates the Incident.
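One way to move an alarm out of alarm state and back in is to set its state through the CloudWatch API. The following is a minimal sketch, assuming the alarm is a CloudWatch metric alarm and that `cloudwatch` is a client that behaves like `boto3.client("cloudwatch")`. The function name `retrigger_alarm` and the reason text are hypothetical. A manually set state is temporary; CloudWatch returns the alarm to its evaluated state at the next evaluation.

```python
from typing import Any


def retrigger_alarm(cloudwatch: Any, alarm_name: str) -> str:
    """Set an alarm to OK and then to ALARM so its actions fire again.

    `cloudwatch` is assumed to behave like boto3.client("cloudwatch").
    Returns the state the alarm was in before the reset.
    """
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    if not alarms:
        raise LookupError(f"No metric alarm named {alarm_name!r}")
    previous_state = alarms[0]["StateValue"]

    # Hypothetical reason text; it appears in the alarm history.
    reason = "Manual reset while troubleshooting Incident creation"
    for state in ("OK", "ALARM"):
        cloudwatch.set_alarm_state(
            AlarmName=alarm_name, StateValue=state, StateReason=reason
        )
    return previous_state
```

With boto3 installed and credentials configured, this could be called as `retrigger_alarm(boto3.client("cloudwatch"), "my-alarm-name")`, where the alarm name is a placeholder.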
- Problem: SNS subscription cannot confirm.
Solution: Your SNS subscription should be able to confirm
automatically. If it doesn't, check to make sure your password is correctly
entered in the webhook and in SNS.
If you have complex characters in your password, make sure
they are properly encoded for URL transmission. For more information, see the
HTML URL Encoding Reference.
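As an illustration of the encoding step, the sketch below percent-encodes credentials before placing them in an endpoint URL. It assumes the webhook credentials are carried in the URL as `user:password@host`; the function name, host, user, and password are all hypothetical placeholders.

```python
from urllib.parse import quote


def build_webhook_endpoint(host: str, path: str, user: str, password: str) -> str:
    """Return an HTTPS endpoint URL with percent-encoded credentials."""
    # safe="" forces reserved characters such as "/", "@", ":" and "#"
    # to be encoded instead of being left as URL delimiters.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"https://{encoded_user}:{encoded_password}@{host}{path}"


if __name__ == "__main__":
    print(
        build_webhook_endpoint(
            "example.com",  # placeholder host
            "/CherwellAPI/api/Webhooks/createawsevent",
            "webhookuser",  # placeholder user
            "p@ss:w/rd#1",  # placeholder password with reserved characters
        )
    )
```

Comparing the encoded value with what is stored in the SNS subscription endpoint can show whether a reserved character was left unencoded.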
- Problem: SNS message not sent.
Solution: Once you are confident that your SNS subscription
for the CSM webhook is correct, ensure the message has been sent by SNS.
You can set up a Hookbin subscription for your SNS topic (see
https://hookbin.com/), or another service to receive the JSON sent by SNS. This
serves the dual purpose of showing you precisely what SNS sent to your webhook
and confirming that the message was sent.
Make sure you leave the tab with your Hookbin URL open or you
won't be able to get back to the results. If you accidentally close the tab,
you will need to recreate the Hookbin endpoint and recreate your SNS
subscription.
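Once you have captured the JSON body that SNS delivered, a short script can help you read it. The sketch below assumes the standard SNS HTTP/HTTPS envelope (`Type`, `TopicArn`, `Message`, `SubscribeURL`) and, for alarm notifications, a `Message` field holding a JSON string with `AlarmName` and `NewStateValue`. The function name `summarize_sns_body` is hypothetical.

```python
import json


def summarize_sns_body(raw_body: str) -> dict:
    """Extract the fields most useful for troubleshooting from an SNS POST body."""
    envelope = json.loads(raw_body)
    summary = {"type": envelope.get("Type"), "topic": envelope.get("TopicArn")}

    # A pending subscription delivers a confirmation request, not an event.
    if envelope.get("Type") == "SubscriptionConfirmation":
        summary["subscribe_url"] = envelope.get("SubscribeURL")
        return summary

    # For notifications, "Message" is a string; alarm messages are JSON text.
    message = envelope.get("Message", "")
    try:
        inner = json.loads(message)
    except (TypeError, ValueError):
        inner = None

    if isinstance(inner, dict):
        summary["alarm_name"] = inner.get("AlarmName")
        summary["new_state"] = inner.get("NewStateValue")
    else:
        summary["message"] = message
    return summary
```

If the summary shows a `SubscriptionConfirmation` type, the subscription has not been confirmed yet and no alarm notification has reached the endpoint.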
- Problem: Message stuck in RabbitMQ.
When SNS hits the webhook successfully, you will see a log
message in the CSM Web API log as follows:
{"Level":"STATS","Message":"Execution of Post took 477 Milliseconds.","TimeStamp":"2020-12-04T21:27:47.5775259+00:00","ThreadName":"Thread_14","Domain":null,"pid":"11908","DebugCategory":"InfoLogMessage","Host":"EC2AMAZ-KBCFCVH","Object[]":"[\"ActionName\",\"Post\",\"RequestUri\",\"https://awsmapp.cherwelltest.com/CherwellAPI/api/Webhooks/createawsevent\",\"ElapsedTime\",477,\"RequestContentSize\",2108,\"ResponseContentSize\",null,\"ResponseStatusCode\",200]"}
Solution: If you see this message in the logs, but you do
not see your object in the staging table, check RabbitMQ to see whether the
message is stuck in a queue. If it is, try restarting the Cherwell
Service Host and testing your provision again.
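To find webhook entries in a long log, you could filter the log with a small script. The sketch below assumes the CSM Web API log holds one JSON object per line, shaped like the sample message above, with the `Object[]` field carrying a JSON-encoded flat list of alternating names and values. The function name `parse_webhook_log_line` is hypothetical.

```python
import json
from typing import Optional


def parse_webhook_log_line(line: str) -> Optional[dict]:
    """Return webhook request details from one log line, or None if not a match."""
    try:
        entry = json.loads(line)
    except ValueError:
        return None
    if not isinstance(entry, dict):
        return None

    raw_pairs = entry.get("Object[]")
    if not isinstance(raw_pairs, str):
        return None
    try:
        flat = json.loads(raw_pairs)
    except ValueError:
        return None
    if not isinstance(flat, list):
        return None

    # "Object[]" is a flat list: name, value, name, value, ...
    details = dict(zip(flat[0::2], flat[1::2]))
    uri = str(details.get("RequestUri", ""))
    if "/Webhooks/" not in uri:
        return None
    return {
        "timestamp": entry.get("TimeStamp"),
        "uri": uri,
        "status": details.get("ResponseStatusCode"),
        "elapsed_ms": details.get("ElapsedTime"),
    }
```

A status of 200 for the `createawsevent` URI suggests that SNS reached the webhook, so the next place to look is the queue and the staging table.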
- Problem: Staging table object is created, but corresponding
CI or Incident is not created.
Solution: If the object created by your webhook (either
AWS Event or AWS Config Staging) has been created but you
do not see a corresponding Incident or Configuration Item,
you can check these things:
- Ensure your Cherwell Service Host is turned on and pointed to
the correct database.
- To navigate to the staging object, select
Searching > Quick Search Builder. Select the staging
object (either AWS Event for Incidents or AWS Config Staging for Configuration
Items).
- From your current
record, select
.
- If any of your automation processes have failed, double-click
that item to get additional details on what caused the failure.
- Confirm that the JSON parsing in any associated One-Step™ Actions
is configured for the specific response you are receiving. You may find these
tools useful in assessing what kind of parsing is required: