Scale Sensu Go with Enterprise datastore

COMMERCIAL FEATURE: Access the datastore feature in the packaged Sensu Go distribution. For more information, read Get started with commercial features.

Sensu Go’s datastore feature enables scaling your monitoring to many thousands of events per second.

For each unique entity/check pair, Sensu records the latest event object in its datastore. By default, Sensu uses the embedded etcd datastore for event storage. The embedded etcd datastore helps you get started, but as the number of entities and checks in your Sensu implementation grows, so does the rate of events being written to the datastore. In a clustered deployment of etcd, whether embedded or external to Sensu, each event received by a member of the cluster must be replicated to other members, increasing network and disk IO utilization.

Our team documented configuration and testing of Sensu running on bare metal infrastructure in the sensu/sensu-perf project. This configuration comfortably handled 12,000 Sensu agent connections (and their keepalives) and processed more than 8,500 events per second.

This rate of events should be sufficient for many installations but assumes an ideal scenario where Sensu backend nodes use direct-attached, dedicated non-volatile memory express (NVMe) storage and are connected to a dedicated LAN. Deployments on public cloud providers are not likely to achieve similar results due to sharing both disk and network bandwidth with other tenants. Adhering to the cloud provider’s recommended practices may also become a factor because many operators are inclined to deploy a cluster across multiple availability zones. In such a deployment cluster, communication happens over shared WAN links, which are subject to uncontrolled variability in throughput and latency.

The Enterprise datastore can help operators achieve much higher rates of event processing and minimize the replication communication between etcd peers. The sensu-perf test environment comfortably handles 40,000 Sensu agent connections (and their keepalives) and processes more than 36,000 events per second under ideal conditions.

IMPORTANT: PostgreSQL configuration file locations differ depending on platform. The steps in this guide use common paths for RHEL-family distributions, but the files may be stored elsewhere on your system. Learn more about PostgreSQL configuration file locations.

Prerequisites

  • Database server running Postgres 9.5 or later
  • Postgres database (or administrative access to create one)
  • Postgres user with permissions to the database (or administrative access to create such a user)
  • Licensed Sensu Go backend

For optimal performance, we recommend the following PostgreSQL configuration parameters and settings as a starting point for your postgresql.conf file:

max_connections = 200

shared_buffers = 10GB

maintenance_work_mem = 1GB

vacuum_cost_delay = 10ms
vacuum_cost_limit = 10000

bgwriter_delay = 50ms
bgwriter_lru_maxpages = 1000

max_worker_processes = 8
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 2
max_parallel_workers = 8

synchronous_commit = off

wal_sync_method = fdatasync
wal_writer_delay = 5000ms
max_wal_size = 5GB
min_wal_size = 1GB

checkpoint_completion_target = 0.9

autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.025

Adjust the parameters and settings as needed based on your hardware and the performance you observe. Read the PostgreSQL parameters documentation for information about setting parameters.

Configure Postgres

Before Sensu can start writing events to Postgres, you need a database and an account with permissions to write to that database. To provide consistent event throughput, we recommend exclusively dedicating your Postgres instance to storage of Sensu events.

If you have administrative access to Postgres, you can create the database and user.

  1. Change to the postgres user and open the Postgres prompt (postgres=#):

    sudo -u postgres psql
  2. Create the sensu_events database:

    CREATE DATABASE sensu_events;

    PostgreSQL will return a confirmation message: CREATE DATABASE.

  3. Create the sensu role with a password:

    CREATE USER sensu WITH ENCRYPTED PASSWORD 'mypass';

    PostgreSQL will return a confirmation message: CREATE ROLE.

  4. Grant the sensu role all privileges for the sensu_events database:

    GRANT ALL PRIVILEGES ON DATABASE sensu_events TO sensu;

    PostgreSQL will return a confirmation message: GRANT.

  5. Grant the sensu role all privileges to the public schema (This step is mandatory for Postgres 15):

    GRANT ALL ON SCHEMA public TO sensu;

    PostgreSQL will return a confirmation message: GRANT.

  6. Type \q to exit the PostgreSQL prompt.

With this configuration complete, PostgreSQL will have a sensu_events database for storing Sensu events and a sensu user with permissions to that database.

By default, the Postgres user you’ve just added will not be able to authenticate via password, so you’ll also need to make a change to the pg_hba.conf file. The required change will depend on how Sensu will connect to Postgres. In this case, you’ll configure Postgres to allow the sensu user to connect to the sensu_events database from any host using an md5-encrypted password:

  1. Make a copy of the current pg_hba.conf file:

    sudo cp /var/lib/pgsql/data/pg_hba.conf /var/tmp/pg_hba.conf.bak
  2. Give the Sensu user permissions to connect to the sensu_events database from any IP address:

    echo 'host sensu_events sensu 0.0.0.0/0 md5' | sudo tee -a /var/lib/pgsql/data/pg_hba.conf
  3. Restart the postgresql service to activate the pg_hba.conf changes:

    sudo systemctl restart postgresql

With this configuration complete, you can configure Sensu to store events in your Postgres database.

To secure communication between Sensu and the PostgreSQL event store using certificate authentication, read Secure PostgreSQL.

Configure Sensu

If your Sensu backend is already licensed, the configuration for routing events to Postgres is relatively straightforward. Create a PostgresConfig resource that describes the database connection as a data source name (DSN):

---
type: PostgresConfig
api_version: store/v1
metadata:
  name: postgres01
spec:
  dsn: "postgresql://sensu:mypass@10.0.2.15:5432/sensu_events?sslmode=disable"
  pool_size: 20
{
  "type": "PostgresConfig",
  "api_version": "store/v1",
  "metadata": {
    "name": "my-postgres"
  },
  "spec": {
    "dsn": "postgresql://sensu:mypass@10.0.2.15:5432/sensu_events",
    "pool_size": 20
  }
}

Save this configuration as my-postgres.yml or my-postgres.json and install it with sensuctl:

sensuctl create --file my-postgres.yml
sensuctl create --file my-postgres.json

The Sensu backend is now configured to use Postgres for event storage!

In the web UI and in sensuctl, event history will appear incomplete. When Postgres configuration is provided and the backend successfully connects to the database, etcd event history is not migrated. New events will be written to Postgres as they are processed, with the Postgres datastore ultimately being brought up to date with the current state of your monitored infrastructure.

Aside from event history, which is not migrated from etcd, there’s no observable difference when using Postgres as the event store, and neither interface supports displaying the PostgresConfig type.

To verify that the change was effective and your connection to Postgres was successful, look at the sensu-backend log:

{"component":"store","level":"warning","msg":"trying to enable external event store","time":"2019-10-02T23:31:38Z"}
{"component":"store","level":"warning","msg":"switched event store to postgres","time":"2019-10-02T23:31:38Z"}

You can also use psql to verify that events are being written to the sensu_events database.

  1. Change to the postgres user and open the Postgres prompt (postgres=#):

    sudo -u postgres psql
  2. Connect to the sensu_events database:

    \c sensu_events

    PostgreSQL will return a confirmation message:

    You are now connected to database "sensu_events" as user "postgres".

    The prompt will change to sensu_events=#.

  3. List the tables in the sensu_events database:

    \dt

    PostgreSQL will list the tables:

                 List of relations
     Schema |       Name        | Type  | Owner 
    --------+-------------------+-------+-------
     public | events            | table | sensu
     public | migration_version | table | sensu
    (2 rows)
  4. Request a list of all entities reporting keepalives:

    select sensu_entity from events where sensu_check = 'keepalive';

    PostgreSQL will return a list of the entities:

     sensu_entity 
    --------------
     i-414141
     i-424242
     i-434343
    (3 rows)

Revert to the built-in datastore

If you want to revert to the default etcd event store, delete the PostgresConfig resource. In this example, my-postgres.yml or my-postgres.json contain the same configuration you used to configure the Enterprise event store earlier in this guide:

sensuctl delete --file my-postgres.yml
sensuctl delete --file my-postgres.json

To verify that the change was effective, look for messages similar to these in the sensu-backend log:

{"component":"store","level":"warning","msg":"store configuration deleted","store":"/sensu.io/api/enterprise/store/v1/provider/postgres01","time":"2019-10-02T23:29:06Z"}
{"component":"store","level":"warning","msg":"switched event store to etcd","time":"2019-10-02T23:29:06Z"}

Similar to enabling PostgreSQL, switching back to the etcd datastore does not migrate current observability event data from one store to another. The web UI or sensuctl output may list outdated events until the etcd datastore catches up with the current state of your monitored infrastructure.

Configure Postgres streaming replication

Postgres supports an active standby with streaming replication. Configure streaming replication to replicate all Sensu events written to the primary Postgres server to the standby server.

Follow the steps in this section to create and add the replication role, set streaming replication configuration parameters, bootstrap the standby host, and confirm successful Postgres streaming replication.

Create and add the replication role

If you have administrative access to Postgres, you can create the replication role. Follow these steps to create and add the replication role on the primary Postgres host:

  1. Change to the postgres user and open the Postgres prompt (postgres=#):

    sudo -u postgres psql

  2. Create the repl role:

    CREATE ROLE repl PASSWORD '<your-password>' LOGIN REPLICATION;

    PostgreSQL will return a confirmation message: CREATE ROLE.

  3. Type \q to exit the PostgreSQL prompt.

  4. Add the replication role to pg_hba.conf using an md5-encrypted password. Make a copy of the current pg_hba.conf:

    sudo cp /var/lib/pgsql/data/pg_hba.conf /var/tmp/pg_hba.conf.bak

  5. In the following command, replace <standby_ip> with the IP address of your standby Postgres host and run the command:

    export STANDBY_IP=<standby-ip>

  6. Give the repl user permissions to replicate from the standby host:

    echo "host replication repl ${STANDBY_IP}/32 md5" | sudo tee -a /var/lib/pgsql/data/pg_hba.conf

  7. Restart the PostgreSQL service to activate the pg_hba.conf changes:

    sudo systemctl restart postgresql

Set streaming replication configuration parameters

Follow these steps to set streaming replication configuration parameters on the primary Postgres host:

  1. Make a copy of the postgresql.conf:

    sudo cp -a /var/lib/pgsql/data/postgresql.conf /var/lib/pgsql/data/postgresql.conf.bak

  2. Append the necessary configuration options.

    echo 'wal_level = replica' | sudo tee -a /var/lib/pgsql/data/postgresql.conf

  3. Set the maximum number of concurrent connections from the standby servers:

    echo 'max_wal_senders = 5' | sudo tee -a /var/lib/pgsql/data/postgresql.conf

  4. To prevent the primary server from removing the WAL segments required for the standby server before shipping them, set the minimum number of segments retained in the pg_xlog directory:

    echo 'wal_keep_segments = 32' | sudo tee -a /var/lib/pgsql/data/postgresql.conf

    At minimum, the number of wal_keep_segments should be larger than the number of segments generated between the beginning of online backup and the startup of streaming replication.

    NOTE: If you enable WAL archiving to an archive directory accessible from the standby, this step may not be necessary.

  5. Restart the PostgreSQL service to activate the postgresql.conf changes:

    sudo systemctl restart postgresql

Bootstrap the standby host

Follow these steps to bootstrap the standby host on the standby Postgres host:

  1. If the standby host has ever run Postgres, stop Postgres and empty the data directory:

    sudo systemctl stop postgresql

    sudo mv /var/lib/pgsql/data /var/lib/pgsql/data.bak
  2. Make the standby data directory:

    sudo install -d -o postgres -g postgres -m 0700 /var/lib/pgsql/data

  3. In the following command, replace <primary_ip> with the IP address of your primary Postgres host and run the command:

    export PRIMARY_IP=<primary_ip>

  4. Bootstrap the standby data directory:

    sudo -u postgres pg_basebackup -h $PRIMARY_IP -D /var/lib/pgsql/data -P -U repl -R --wal-method=stream

  5. Enter your password at the Postgres prompt:

    Password: 

After you enter your password, PostgreSQL will list database copy progress:

30318/30318 kB (100%), 1/1 tablespace

Confirm replication

Follow these steps to confirm replication:

  1. On the primary Postgres host, remove primary-only configurations:

    sudo sed -r -i.bak '/^(wal_level|max_wal_senders|wal_keep_segments).*/d' /var/lib/pgsql/data/postgresql.conf

  2. Start the PostgreSQL service:

    sudo systemctl start postgresql

  3. Check the primary host commit log location:

    sudo -u postgres psql -c "select pg_current_wal_lsn()"

    PostgreSQL will list the primary host commit log location:

     pg_current_wal_lsn 
    --------------------------
     0/3000568
    (1 row)

  4. On the standby Postgres host, check the commit log location:

    sudo -u postgres psql -c "select pg_last_wal_receive_lsn()"
    sudo -u postgres psql -c "select pg_last_wal_replay_lsn()"

    PostgreSQL will list the standby host commit log location:

     pg_last_wal_receive_lsn 
    -------------------------------
     0/3000568
    (1 row)
     pg_last_wal_replay_lsn 
    ------------------------------
     0/3000568
    (1 row)

With this configuration complete, your Sensu events will be replicated to the standby host.