How to reduce alert fatigue with filters
What are Sensu filters?
Sensu filters allow you to filter events destined for one or more event handlers. A Sensu filter evaluates its expressions against the event data to determine whether the event should be passed to an event handler.
Why use a filter?
Filters are commonly used to filter recurring events (i.e. to eliminate notification noise) and to filter events from systems in pre-production environments.
Using filters to reduce alert fatigue
The purpose of this guide is to help you reduce alert fatigue by configuring a filter named `hourly` for a handler named `slack`, in order to prevent alerts from being sent to Slack every minute. If you don’t already have a handler in place, learn how to send alerts with handlers.
Creating the filter
We’ll show you two approaches to creating a filter that handles occurrences. The first is to create our own filter and add it to Sensu with `sensuctl`. The second is to implement the filter as an asset.
Using sensuctl to create a filter
The first step is to create a filter that we will call `hourly`, which matches new events (where the event’s `occurrences` is equal to `1`) or hourly events (every hour after the first occurrence, calculated with the check’s `interval` and the event’s `occurrences`).

Events in Sensu Go are handled regardless of check execution status; even successful check events are passed through the pipeline. Therefore, it’s necessary to add a clause for non-zero status.
```shell
sensuctl filter create hourly \
--action allow \
--expressions "event.check.occurrences == 1 || event.check.occurrences % (3600 / event.check.interval) == 0"
```
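Sensu filter expressions are evaluated as ECMAScript against the event data, so the expression above can be exercised directly in plain JavaScript to see which occurrences it lets through. This is a minimal sketch run outside of Sensu; the event object here is illustrative, not a full Sensu event.

```javascript
// Sketch of the hourly filter expression, evaluated in plain Node.js
// to show which occurrences pass. Only the fields the expression
// reads (event.check.occurrences, event.check.interval) are included.
function hourlyAllows(event) {
  return event.check.occurrences === 1 ||
         event.check.occurrences % (3600 / event.check.interval) === 0;
}

// With a 60-second check interval, 3600 / 60 = 60: occurrence 1 passes
// (a new event), then every 60th occurrence, i.e. once per hour.
const results = [1, 2, 59, 60, 61, 120].map(n =>
  hourlyAllows({ check: { occurrences: n, interval: 60 } })
);
console.log(results); // [ true, false, false, true, false, true ]
```

Note that the modulo arithmetic assumes the check’s `interval` divides evenly into 3600; for intervals that don’t, the filter would fire at the nearest multiple of the interval rather than exactly on the hour.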
Assigning the filter to a handler
Now that the `hourly` filter has been created, it can be assigned to a handler. Since we want to reduce the number of Slack messages sent by Sensu, we will apply our filter to an existing handler named `slack`, in addition to the `is_incident` filter, so that only failing events are handled.
```shell
sensuctl handler update slack
```
Follow the prompts to add the `hourly` and `is_incident` filters to the `slack` handler.
Creating a fatigue check filter
While we can use `sensuctl` to interactively create a filter, we can create more reusable filters through the use of assets. Read on to see how to implement a filter using this approach.
Using a filter asset
If you’re not already familiar with assets, take a minute or two and read over our guide to installing plugins with assets. This will help you understand what assets are and how they are used in Sensu.
The first step is to obtain a filter asset that replicates the behavior of the `hourly` filter we created via `sensuctl`. Let’s use the fatigue check asset from the Bonsai Asset Index. You can download and register the asset directly by running the following:
```shell
curl -s https://bonsai.sensu.io/release_assets/nixwiz/sensu-go-fatigue-check-filter/0.1.3/any/noarch/download | sensuctl create
```
Excellent! You’ve registered the asset, but we still need to create the filter itself. We’ll use the following configuration, naming the filter `fatigue_check` and saving the definition in a file named `sensu-fatigue-check-filter.yml`:
```yaml
---
type: EventFilter
api_version: core/v2
metadata:
  name: fatigue_check
  namespace: default
spec:
  action: allow
  expressions:
  - fatigue_check(event)
  runtime_assets:
  - fatigue-check-filter
```
And we’ll go ahead and create it:
```shell
sensuctl create -f sensu-fatigue-check-filter.yml
```
Now that we’ve created the filter asset and the filter, let’s move on to the check annotations needed for the asset to work properly.
Annotating a check for filter asset use
Now that we’ve created the filter, we’ll need to make some additions to any checks we want to use the filter with. Let’s look at an example CPU check:
```yaml
---
type: CheckConfig
api_version: core/v2
metadata:
  name: linux-cpu-check
  namespace: default
  annotations:
    fatigue_check/occurrences: '1'
    fatigue_check/interval: '3600'
    fatigue_check/allow_resolution: 'false'
spec:
  command: check-cpu -w 90 -c 95
  env_vars:
  handlers:
  - email
  high_flap_threshold: 0
  interval: 60
  low_flap_threshold: 0
  output_metric_format: ''
  output_metric_handlers:
  proxy_entity_name: ''
  publish: true
  round_robin: false
  runtime_assets:
  stdin: false
  subdue:
  subscriptions:
  - linux
  timeout: 0
  ttl: 0
```
You’ll notice that under the `metadata` scope we’ve added some annotations. These annotations are necessary for our filter asset to work the way that our interactively created filter does. Let’s discuss them briefly.
The annotations in our check definition are doing several things:
- `fatigue_check/occurrences`: tells the filter on which occurrence to send the event through for further processing
- `fatigue_check/interval`: tells the filter the interval, in seconds, at which to allow additional events to be processed
- `fatigue_check/allow_resolution`: determines whether a `resolve` event will be passed through to the filter
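To make the role of the first two annotations concrete, here is a hypothetical sketch of how an annotation-driven fatigue filter can decide whether to pass a failing event. The annotation names and defaults mirror the example above, but this is not the real `sensu-go-fatigue-check-filter` code, whose behavior may differ; consult the asset’s README for the actual rules.

```javascript
// Hypothetical annotation-driven fatigue filter (illustrative only).
// Reads the same annotations shown in the check definition above.
function fatigueCheckAllows(event) {
  const ann = event.check.annotations || {};
  const firstAt = Number(ann['fatigue_check/occurrences'] || '1');
  const intervalSecs = Number(ann['fatigue_check/interval'] || '1800');

  // Pass the event on its configured first occurrence, then again each
  // time roughly intervalSecs of repeated failures have elapsed.
  return event.check.occurrences === firstAt ||
         event.check.occurrences %
           Math.floor(intervalSecs / event.check.interval) === 0;
}

// A failing check that runs every 60 seconds, annotated as above.
const failing = n => ({ check: { occurrences: n, interval: 60, annotations: {
  'fatigue_check/occurrences': '1', 'fatigue_check/interval': '3600' } } });

console.log([1, 2, 60].map(n => fatigueCheckAllows(failing(n))));
// [ true, false, true ]
```

The key difference from the `hourly` filter is that the occurrence threshold and interval come from per-check annotations rather than being hard-coded in the filter expression, which is what makes the asset reusable across checks.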
For more information on configuring these values, see the filter asset README. Now let’s assign our newly minted filter to a handler.
Assigning the filter to a handler
Just like we did with our interactively created filter, we’re going to assign our filter to a handler. We can use the following handler example:
```yaml
---
api_version: core/v2
type: Handler
metadata:
  namespace: default
  name: slack
spec:
  type: pipe
  command: 'sensu-slack-handler --channel ''#general'' --timeout 20 --username ''sensu'''
  env_vars:
  - SLACK_WEBHOOK_URL=https://www.webhook-url-for-slack.com
  timeout: 30
  filters:
  - is_incident
  - fatigue_check
```
Let’s move on to validating our filter.
Validating the filter
You can verify the proper behavior of these filters by examining the sensu-backend logs. The default location of these logs varies based on the platform used; the troubleshooting guide provides this information.
Whenever an event is being handled, a log entry is added with the message `"handler":"slack","level":"debug","msg":"sending event to handler"`, followed by a second one with the message `"msg":"pipelined executed event pipe handler","output":"","status":0`. However, if the event is discarded by our filter, a log entry with the message `event filtered` will appear instead.
You now know how to apply a filter to a handler and how to use a filter asset, which will hopefully reduce your alert fatigue. From this point, here are some recommended resources:
- Read the filters reference for in-depth documentation on filters.