Alerting basics

To generate alerts for a managed device, the Endpoint Manager alerting agent must be deployed to that device. A default alerting agent and alert ruleset can be deployed to every managed device when you install the standard agent on the device. The alerting agent then follows the rules defined in the alert rulesets deployed to that device.

After you define a custom ruleset, you can deploy it to devices to monitor items specific to that type of device. You can deploy multiple rulesets to the same device, but be aware that similar rules in different rulesets can conflict.
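As a rough illustration of what a conflict between rulesets could look like, the sketch below compares rulesets held as plain Python dictionaries and reports any event and severity pair that is mapped to different actions. The data layout, the ruleset names, and the find_conflicts helper are all hypothetical; they are not part of the product, and the product's own conflict handling is not described here.

```python
from collections import defaultdict

def find_conflicts(rulesets):
    """Return {(event, severity): {ruleset_name: action}} for every
    (event, severity) pair whose action differs between rulesets."""
    seen = defaultdict(dict)
    for name, rules in rulesets.items():
        for key, action in rules.items():
            seen[key][name] = action
    # Keep only the keys that map to more than one distinct action.
    return {
        key: by_ruleset
        for key, by_ruleset in seen.items()
        if len(set(by_ruleset.values())) > 1
    }

# Hypothetical rulesets: both define a rule for the same event and severity.
rulesets = {
    "default": {("Disk space", "Warning"): "log"},
    "servers": {
        ("Disk space", "Warning"): "email",
        ("Service event", "Critical"): "email",
    },
}

conflicts = find_conflicts(rulesets)
for (event, severity), by_ruleset in conflicts.items():
    print(f"{event} / {severity}: {by_ruleset}")
```

A check like this could be run against exported rule definitions before deploying a second ruleset to a device, assuming such an export is available in your environment.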

Events that can generate alerts

This product has an extensive list of events that can generate alerts. Some events are problems that need immediate attention, such as component failure or system shutdown. Other events are configuration changes that provide useful information to a system administrator, such as changes that affect a device's performance and stability or cause problems with a standard installation.

Examples of the types of events you can monitor include the following:

  • Hardware changes: A component such as a processor, memory, a disk drive, or a network card has been added or removed.
  • Application added or removed: A user has installed or uninstalled an application on a device. This can be useful in tracking licenses or employee productivity. Applications registered in Windows Add or Remove Programs are monitored, and the application name used in Add or Remove Programs is the name that appears in the alert notification.
  • Service event: A service has started or stopped on the device.
  • Performance: A performance threshold has been crossed, such as a threshold for drive capacity or available memory.
  • IPMI event: An event detectable on IPMI devices has occurred, including changes to controllers, sensors, logs, and so on.
  • Modem usage: The system modem has been used, or a modem has been added or removed.
  • Physical security: Chassis intrusion detection, power cycling, or another physical change has occurred.
  • Package installation: A package has been installed on the target computer.
  • Remote control activity: Remote control session activity has occurred, including starting, stopping, or failures.

To view a record of alerts for configuration changes, review the alert log on the device's Real-time inventory and monitoring console.

Alerts can only be generated when devices are equipped with the appropriate hardware. For example, alerts generated from sensor readings only apply to devices equipped with the correct sensors.

Hardware monitoring is also dependent on the correct configuration of the hardware. For example, if a hard drive with S.M.A.R.T. monitoring capabilities is installed on a device but S.M.A.R.T. detection is not enabled in the device's BIOS settings, or if the device's BIOS does not support S.M.A.R.T. drives, alerts will not be generated from S.M.A.R.T. drive monitoring.

Severity levels for events

Device problems or events can be associated with some or all of the severity levels shown below. In some parts of the product interface, these states are noted with a numeric value as well as an associated icon. Numeric values are in parentheses.

  • Informational (1): Supports configuration changes or events that manufacturers may include with their systems. This severity level does not affect device health.
  • OK (2): Indicates that the status is at an acceptable level.
  • Warning (3): Provides some advance warning of a problem before it reaches a critical point.
  • Critical (4): Indicates that the problem needs immediate attention.
  • Unknown: The alert status can't be determined or the monitoring agent has not been installed on the device.

Depending on the nature of the event, some severity levels don't apply and aren't available. For example, with the Intrusion detection event, the device's chassis is either open or closed. If it is open, an alert action can be triggered, but only with a severity of Warning. Other events, such as Disk space and Virtual memory, include three severity levels (OK, Warning, and Critical) because different states can indicate different levels of concern to the administrator.

For some alerts, you can choose the severity level or threshold that triggers the alert. For example, you can select one action for a Warning status and a different action for a Critical status of the same alert. The Unknown status can't be selected as an alert trigger; it simply indicates that the status can't be determined.
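The severity levels and the idea of choosing a different action per severity can be sketched in a few lines of Python. This is only a model for reasoning about the behavior described above: the numeric values come from the list in this section, while the Severity class, the disk_space_rule mapping, the action names, and the action_for function are assumptions made for the example and are not product identifiers.

```python
from enum import IntEnum
from typing import Dict, Optional

class Severity(IntEnum):
    """Severity levels with the numeric values listed in this section."""
    INFORMATIONAL = 1
    OK = 2
    WARNING = 3
    CRITICAL = 4

# Unknown has no numeric value in the list, so this sketch models it as None.

# Hypothetical rule: one action for Warning, a different one for Critical.
disk_space_rule: Dict[Severity, str] = {
    Severity.WARNING: "send_email",
    Severity.CRITICAL: "run_program",
}

def action_for(rule: Dict[Severity, str],
               severity: Optional[Severity]) -> Optional[str]:
    """Return the action configured for this severity, or None if there is none."""
    if severity is None:
        # Unknown status can't be selected as an alert trigger.
        return None
    return rule.get(severity)

print(action_for(disk_space_rule, Severity.WARNING))
print(action_for(disk_space_rule, Severity.CRITICAL))
print(action_for(disk_space_rule, None))
```

Using an integer enum keeps the levels ordered, so a comparison such as Severity.CRITICAL > Severity.WARNING works if a rule needs to trigger at a given level or above.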

Alert actions for notifications

This product can notify you when monitored events occur by doing any of the following:

  • Adding information to the log
  • E-mailing a notice or sending a message to a pager
  • Running a program on the core or an individual device
  • Sending an SNMP trap to an SNMP management console on the network
  • Rebooting or shutting down a device

Alert actions are configured when you define alert rulesets.

Alert storm control

Some alert rules assigned to groups of devices can generate a large number of responses at the same time. For example, you can include an alert rule for computer configuration changes and associate it with an e-mail action. If a software distribution patch is applied to many devices that have this alert rule, the core server generates one e-mail for each patched device, potentially flooding your e-mail server with a "storm" of alert notifications.

This product's alert storm control feature automatically limits the number of times an alert action occurs for an alert. If an alert triggers an action 5 times in 5 minutes, the alert action is discontinued, but alerts are still written to the core log file. The administrator is notified of the alert storm with an automated e-mail. When the alert stops occurring and does not occur again for one hour, alert storm control is reset for that alert. After the reset, the alert triggers its actions again the next time it occurs.
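The storm control behavior described above can be approximated with a small sliding-window limiter. The sketch below is not the product's implementation; it only mirrors the stated numbers (5 actions in 5 minutes, reset after one quiet hour). The StormControl class, its should_run_action method, and the choice to pass timestamps in seconds are assumptions made for the example. It also assumes that the fifth action still runs and that suppression starts with the sixth occurrence. Logging and the administrator e-mail are left out.

```python
from collections import deque

class StormControl:
    """Sliding-window limiter for alert actions (illustrative sketch only)."""

    def __init__(self, limit=5, window=300.0, quiet_period=3600.0):
        self.limit = limit                # actions allowed within the window
        self.window = window              # window length in seconds (5 minutes)
        self.quiet_period = quiet_period  # quiet time before reset (1 hour)
        self._recent = {}      # alert key -> deque of recent action timestamps
        self._suppressed = {}  # alert key -> True while actions are discontinued
        self._last_seen = {}   # alert key -> timestamp of the last occurrence

    def should_run_action(self, key, now):
        """Record one occurrence of alert `key` at time `now` (seconds).

        Returns True if the alert action should run, False if it is
        suppressed. The caller still writes the alert to the log either way.
        """
        last = self._last_seen.get(key)
        self._last_seen[key] = now

        if self._suppressed.get(key, False):
            if last is not None and now - last >= self.quiet_period:
                # The alert was quiet for the full period: reset storm control.
                self._suppressed[key] = False
                self._recent[key] = deque()
            else:
                return False

        recent = self._recent.setdefault(key, deque())
        # Drop action timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)

        if len(recent) >= self.limit:
            # Limit reached: this action runs, later ones are discontinued.
            self._suppressed[key] = True
        return True

control = StormControl()
# Seven occurrences of the same alert, 10 seconds apart.
results = [control.should_run_action("config-change", t) for t in range(0, 70, 10)]
print(results)
```

Because the last occurrence time is updated even while the alert is suppressed, the one-hour quiet period is measured from the most recent occurrence, which matches the description that the alert must not occur again for one hour before the reset.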