Installing the PostgreSQL DB App#

Note

This setup follows the same procedure as creating a data connection in the PostgreSQL app. If you have already created a data connection in the app, you do not need to perform these steps again.

Install the PostgreSQL DB for Observe using the App section under Workspace Setting.

PostgreSQL app

Figure 1 - PostgreSQL DB app card

The default configuration settings of the PostgreSQL DB app may be sufficient for your needs, but you should consider overwriting the defaults for the following options:

  • Datastream - select the datastream that receives your PostgreSQL DB data from Telegraf and FluentBit. This should be the same datastream as the one used in your Host App.

  • PG Settings Resource Attributes - A list of pg_settings values to add as resource attributes to your PostgreSQL DB instances. The list includes reasonable default values, but you can add extra values of particular interest.

  • Name Format - The default value of PostgreSQL/%s places your PostgreSQL DB resources within the PostgreSQL folder, but you may want to put them in a different folder.

  • ____ App - Several other apps may contain logs or metadata relevant to your PostgreSQL DB instances, such as apps for Host Monitoring or various cloud providers. Use these App variables to link your logs and metadata from other apps to your PostgreSQL DB app data.

Linking your PostgreSQL DB data to other apps#

Select the option for Manual install (Advanced) as you continue with the PostgreSQL app installation. If you have already installed apps for Host Monitoring or any cloud providers, you can select the corresponding app to link their data with the data in your PostgreSQL app. If you haven’t yet installed any of those apps, you can always return to this configuration and update it later.

The Observe PostgreSQL app uses Telegraf to capture PostgreSQL DB health metrics and FluentBit to capture PostgreSQL DB logs. Use the following instructions to configure your PostgreSQL DB instances to allow Telegraf and FluentBit to collect your data and how to install Telegraf and FluentBit for data collection.

Data collection#

  1. Log Collection - Configure your PostgreSQL instance to write logs to a file in a place where FluentBit can capture them. For the most complete depth of observability, Observe recommends enabling all your PostgreSQL DB logs by adding the following configurations (or similar) to your postgresql.conf:

        logging_collector = on

        log_directory = 'pg_log'  # directory where log files are written,
                                  # can be absolute or relative to PGDATA

        log_filename = 'pg.log'   # log file name, can include pattern

        log_file_mode = 0644

        log_duration = 'on'
        log_min_duration_statement = 0    # -1 is disabled, 0 logs all statements
                                          # and their durations, > 0 logs only
                                          # statements running at least this number
                                          # of milliseconds

While Observe recommends setting log_min_duration_statement = 0 for the most comprehensive visibility into your PostgreSQL database query performance, for some databases with very high query throughput, this can affect the performance. In those cases, it may be more optimal to set log_min_duration_statement to a desired number of milliseconds. If log_duration is on and log_min_duration_statement has a positive value, all durations are logged but the query text is included only for statements exceeding the threshold.

Be sure to restart your PostgreSQL DB service if necessary for any of these changes to take effect.

2. Prepare your PostgreSQL DB instance to receive connections from Telegraf by running the following commands. You can also save and execute them as a file. Be sure to replace the ${PSQL_PASSWORD_FOR_TELEGRAF} value with a secure string, and keep it handy, because you use it again soon:

        CREATE DATABASE telegraf;
        create user telegraf with password '${PSQL_PASSWORD_FOR_TELEGRAF}';
        grant pg_monitor to telegraf;
        grant SELECT ON pg_stat_database to telegraf;

3. Install Telegraf and FluentBit. Observe recommends using the 1-line installation instructions for the Observe Host Monitoring app.

4. Add a Telegraf config file at /etc/telegraf/telegraf.d/inputs.postgresql.conf with the following content. Be sure to replace the ${PSQL_PASSWORD_FOR_TELEGRAF} value with your secure string from before.

        [[inputs.postgresql_extensible]]
          address = "host=localhost user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
          prepared_statements = true

          [[inputs.postgresql_extensible.query]]
            sqlquery="SELECT * FROM pg_stat_database"
            version=901
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="SELECT * FROM pg_stat_bgwriter"
            version=901
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select 
                application_name, 
                backend_type, 
                client_addr, 
                client_port, 
                datname, 
                pid, 
                query, 
                state, 
                usename, 
                wait_event, 
                wait_event_type, 
                extract(epoch from (clock_timestamp() - query_start)) as query_duration 
                from pg_stat_activity
            """
            version=901
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            # feel free to add more pg_setting columns for fields that are important to you
            sqlquery="""
              select * from pg_settings 
                where name in (
                  'autovacuum',
                  'server_version',
                  'max_connections',
                  'shared_buffers',
                  'effective_cache_size',
                  'work_mem',
                  'maintenance_work_mem',
                  'min_wal_size',
                  'max_wal_size',
                  'checkpoint_timeout',
                  'checkpoint_completion_target',
                  'wal_buffers',
                  'wal_level',
                  'default_statistics_target',
                  'listen_addresses'
                )
            """
            version=801
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select 
                pg_is_in_recovery() as is_replica, 
                count(client_addr) as replicas_count
              from pg_stat_replication
            """
            version=801
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select CASE 
                WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
                THEN 0
                ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
                END AS replication_delay
            """
            version=1001
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select 
                sender_host as primary_host, 
                status as replica_state 
              from pg_stat_wal_receiver
            """
            version=801
            withdbname=false
            tagvalue=""

Note that in the final query of that configuration, if any extra pg_setting fields are especially important to you, you can add them to this query. When you install the PostgreSQL DB app, under the Advanced Settings, you can specify those fields and attach them to your PostgreSQL DB instance resource.

5. Add a FluentBit config file at /etc/td-agent-bit/observe-postgresql.conf with this content. If you used custom log file path values, be sure to use those instead:

        [INPUT]
            name tail
            tag  tail_postgres
            Path_Key path
            path /path/to/your/pg.log   # e.g: /var/lib/postgresql/12/main/pg_log/pg.log
            Read_from_Head False
            db      postgres-snapshots.db

6. Restart Telegraf: service telegraf restart.

7. Restart Fluent Bit: service td-agent-bit restart.

  1. Enable Log Collection for your PostgreSQL DB instance, following the instructions for your cloud provider. You can then collect those logs by installing the Observe App for that cloud provider.

Observe recommends enabling all your PostgreSQL DB logs by adding the following configurations, or similar configurations, to your PostgreSQL DB instances:

        logging_collector = on

        log_directory = 'pg_log'  # directory where log files are written,
                                  # can be absolute or relative to PGDATA

        log_filename = 'pg.log'   # log file name, can include pattern

        log_file_mode = 0644

        log_min_duration_statement = 0    # -1 is disabled, 0 logs all statements
                                          # and their durations, > 0 logs only
                                          # statements running at least this number
                                          # of milliseconds

While Observe recommends setting the log_min_duration_statement = 0 for the most comprehensive visibility into your Postgres DB instances query performance, for some databases with very high query throughput this can affect the performance. In those cases, you should set it to a higher number of milliseconds to capture query statement logs only on those queries that exceed a longer duration.

Be sure to restart your PostgreSQL service if necessary for any of these changes to take effect.

2. Prepare your PostgreSQL instance to receive remote connections by setting listen_addresses = '*'. Then, prepare your PostgreSQL instance to receive connections specifically from Telegraf by running the following commands. You can also save and execute them as a file. Be sure to replace the ${PSQL_PASSWORD_FOR_TELEGRAF} value with a secure string:

        CREATE DATABASE telegraf;
        create user telegraf with password '${PSQL_PASSWORD_FOR_TELEGRAF}';
        grant pg_monitor to telegraf;
        grant SELECT ON pg_stat_database to telegraf;

3. On a server that you manage, install Telegraf. Observe recommends using the 1-line installation instructions for the Observe Host Monitoring app.

4. Add a Telegraf config file at /etc/telegraf/telegraf.d/inputs.postgresql.conf with the following content. Be sure to replace the ${HOST} value with your remote PostgreSQL DB IP address and the ${PSQL_PASSWORD_FOR_TELEGRAF} value with your secure string used earlier.

        [[inputs.postgresql_extensible]]
          address = "host=${HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
          prepared_statements = true

          [[inputs.postgresql_extensible.query]]
            sqlquery="SELECT * FROM pg_stat_database"
            version=901
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="SELECT * FROM pg_stat_bgwriter"
            version=901
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select 
                application_name, 
                backend_type, 
                client_addr, 
                client_port, 
                datname, 
                pid, 
                query, 
                state, 
                usename, 
                wait_event, 
                wait_event_type, 
                extract(epoch from (clock_timestamp() - query_start)) as query_duration 
                from pg_stat_activity
            """
            version=901
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            # feel free to add more pg_setting columns for fields that are important to you
            sqlquery="""
              select * from pg_settings 
                where name in (
                  'autovacuum',
                  'server_version',
                  'max_connections',
                  'shared_buffers',
                  'effective_cache_size',
                  'work_mem',
                  'maintenance_work_mem',
                  'min_wal_size',
                  'max_wal_size',
                  'checkpoint_timeout',
                  'checkpoint_completion_target',
                  'wal_buffers',
                  'wal_level',
                  'default_statistics_target',
                  'listen_addresses'
                )
            """
            version=801
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select 
                pg_is_in_recovery() as is_replica, 
                count(client_addr) as replicas_count
              from pg_stat_replication
            """
            version=801
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select CASE 
                WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
                THEN 0
                ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
                END AS replication_delay
            """
            version=1001
            withdbname=false
            tagvalue=""
          [[inputs.postgresql_extensible.query]]
            sqlquery="""
              select 
                sender_host as primary_host, 
                status as replica_state 
              from pg_stat_wal_receiver
            """
            version=801
            withdbname=false
            tagvalue=""

Note in the final query of that configuration, if there are any extra pg_setting fields that are especially important to you, you can add them to this query. When you install the PostgreSQL DB app, under Advanced Settings, you can specify those fields and attach them to your PostgreSQL DB instance resource.

You can collect data from multiple remote PostgreSQL DB instances with the same Telegraf instance by repeating this configuration with different ${HOST} values.

        [[inputs.postgresql_extensible]]
          address = "host=${HOST_1} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
          prepared_statements = true

          [[inputs.postgresql_extensible.query]]
            sqlquery="SELECT * FROM pg_stat_database"
            ...

        [[inputs.postgresql_extensible]]
          address = "host=${HOST_2} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
          prepared_statements = true

          [[inputs.postgresql_extensible.query]]
            sqlquery="SELECT * FROM pg_stat_database"
            ...

5. Restart Telegraf: service telegraf restart.

  1. Install the Observe Kubernetes App. This will capture your container logs automatically.

  2. Deploy telegraf in your cluster with the following configurations to capture Postgres metrics and expose them to your Observe grafana agent, which will forward them to your Observe account.

Telegraf ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf-configmap
data:
  # set the postgres endpoint you want to monitor. can be a service, a cloud provider endpoint, etc.
  POSTGRESQL_HOST: "postgres.default.svc"
  # if you are monitoring multiple postgres instances from this telegraf deployment, 
  # add multiple POSTGRESQL_HOST vars like so
  # POSTGRESQL_HOST_2: "postgres-2.default.svc"
  telegraf.conf: |+
    [agent]
      interval = "10s"
      # debug = true
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "0s"
      precision = ""
      omit_hostname = false
    [[outputs.health]]
      service_address = "http://:8888"
    [[outputs.prometheus_client]]
      listen = ":9273"
      path = "/metrics"
      # metric_version = 1
      metric_version = 2
    [[inputs.postgresql_extensible]]
      # if the postgres instance you are monitoring runs as a kubernetes service,
      # kubernetes will automatically add to your telegraf container an env var for
      # the postgres service IP. Add that as a tag as follows.
      # if the postgres service is named "my-postgres" the env var would be MY_POSTGRES_SERVICE_HOST
      # tags = {postgres_service_host = "${<SERVICE_NAME>_SERVICE_HOST}"}  
      address = "host=${POSTGRESQL_HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
      prepared_statements = true
      [[inputs.postgresql_extensible.query]]
        sqlquery="SELECT * FROM pg_stat_database"
        version=901
        withdbname=false
        tagvalue=""
      [[inputs.postgresql_extensible.query]]
        sqlquery="SELECT * FROM pg_stat_bgwriter"
        version=901
        withdbname=false
        tagvalue=""
      [[inputs.postgresql_extensible.query]]
        sqlquery="""
          select 
            application_name, 
            backend_type, 
            client_addr, 
            client_port, 
            datname, 
            pid, 
            query, 
            state, 
            usename, 
            wait_event, 
            wait_event_type, 
            extract(epoch from (clock_timestamp() - query_start)) as query_duration 
            from pg_stat_activity
        """
        version=901
        withdbname=false
        tagvalue=""
      [[inputs.postgresql_extensible.query]]
        # feel free to add more pg_setting columns for fields that are important to you
        sqlquery="""
          select * from pg_settings 
            where name in (
              'autovacuum',
              'server_version',
              'max_connections',
              'shared_buffers',
              'effective_cache_size',
              'work_mem',
              'maintenance_work_mem',
              'min_wal_size',
              'max_wal_size',
              'checkpoint_timeout',
              'checkpoint_completion_target',
              'wal_buffers',
              'wal_level',
              'default_statistics_target',
              'listen_addresses'
            )
        """
        version=801
        withdbname=false
        tagvalue=""
      [[inputs.postgresql_extensible.query]]
        sqlquery="""
          select 
            pg_is_in_recovery() as is_replica, 
            count(client_addr) as replicas_count
          from pg_stat_replication
        """
        version=801
        withdbname=false
        tagvalue=""
      [[inputs.postgresql_extensible.query]]
        sqlquery="""
          select CASE 
            WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
            THEN 0
            ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
            END AS replication_delay
        """
        version=1001
        withdbname=false
        tagvalue=""
      [[inputs.postgresql_extensible.query]]
        sqlquery="""
          select 
            sender_host as primary_host, 
            status as replica_state 
          from pg_stat_wal_receiver
        """
        version=801
        withdbname=false
        tagvalue=""
    # if you are monitoring multiple postgres instances from this telegraf deployment, 
    # add multiple telegraf inputs, using the POSTGRESQL_HOST_N vars you defined above
    # [[inputs.postgresql_extensible]]
    #   # if the postgres instance you are monitoring runs as a kubernetes service,
    #   # kubernetes will automatically add to your telegraf container an env var for
    #   # the postgres service IP. Add that as a tag as follows.
    #   # if the postgres service is named "my-postgres" the env var would be MY_POSTGRES_SERVICE_HOST
    #   # tags = {postgres_service_host = "${<SERVICE_2_NAME>_SERVICE_HOST}"}  
    #   address = "host=${POSTGRESQL_HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
    #   prepared_statements = true
    #   [[inputs.postgresql_extensible.query]]
    #     sqlquery="SELECT * FROM pg_stat_database"
    #     version=901
    #     withdbname=false
    #     tagvalue=""
    #   [[inputs.postgresql_extensible.query]]
    #     sqlquery="SELECT * FROM pg_stat_bgwriter"
    #     version=901
    #     withdbname=false
    #     tagvalue=""
    #   [[inputs.postgresql_extensible.query]]
    #     sqlquery="""
    #       select 
    #         application_name, 
    #         backend_type, 
    #         client_addr, 
    #         client_port, 
    #         datname, 
    #         pid, 
    #         query, 
    #         state, 
    #         usename, 
    #         wait_event, 
    #         wait_event_type, 
    #         extract(epoch from (clock_timestamp() - query_start)) as query_duration 
    #         from pg_stat_activity
    #     """
    #     version=901
    #     withdbname=false
    #     tagvalue=""
    #   [[inputs.postgresql_extensible.query]]
    #     # feel free to add more pg_setting columns for fields that are important to you
    #     sqlquery="""
    #       select * from pg_settings 
    #         where name in (
    #           'autovacuum',
    #           'server_version',
    #           'max_connections',
    #           'shared_buffers',
    #           'effective_cache_size',
    #           'work_mem',
    #           'maintenance_work_mem',
    #           'min_wal_size',
    #           'max_wal_size',
    #           'checkpoint_timeout',
    #           'checkpoint_completion_target',
    #           'wal_buffers',
    #           'wal_level',
    #           'default_statistics_target',
    #           'listen_addresses'
    #         )
    #     """
    #     version=801
    #     withdbname=false
    #     tagvalue=""
    #   [[inputs.postgresql_extensible.query]]
    #     sqlquery="""
    #       select 
    #         pg_is_in_recovery() as is_replica, 
    #         count(client_addr) as replicas_count
    #       from pg_stat_replication
    #     """
    #     version=801
    #     withdbname=false
    #     tagvalue=""
    #   [[inputs.postgresql_extensible.query]]
    #     sqlquery="""
    #       select CASE 
    #         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
    #         THEN 0
    #         ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
    #         END AS replication_delay
    #     """
    #     version=1001
    #     withdbname=false
    #     tagvalue=""
    #   [[inputs.postgresql_extensible.query]]
    #     sqlquery="""
    #       select 
    #         sender_host as primary_host, 
    #         status as replica_state 
    #       from pg_stat_wal_receiver
    #     """
    #     version=801
    #     withdbname=false
    #     tagvalue=""

Telegraf Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: telegraf
spec:
  selector:
    matchLabels:
      app: telegraf
  replicas: 1
  template:
    metadata:
      labels:
        app: telegraf
    spec:
      containers:
        - name: telegraf
          image: telegraf:latest
          volumeMounts:
            - name: config-volume
              mountPath: /etc/telegraf/telegraf.conf
              subPath: telegraf.conf
          ports:
            - containerPort: 9273
              name: metrics
          args:
            - -config
            - /etc/telegraf/telegraf.conf
          envFrom:
            - configMapRef:
                name: telegraf-configmap
      volumes:
        - name: config-volume
          configMap:
            name: telegraf-configmap
            items:
              - key: telegraf.conf
                path: telegraf.conf
  1. In your Observe PostgreSQL App configurations Advanced options, use the Kuberentes App input to link your PostgreSQL App and Kubernetes App data.

Using Custom Log Shippers#

Observe recommends using FluentBit to collect your PostgreSQL DB logs for the PostgreSQL DB instances you manage, but you may use another log shipper if you prefer. To do so, configure your log shipper to forward your PostgreSQL DB logs to your Observe account, and create a dataset that contains your PostgreSQL DB logs. For best results, this dataset should have as many as possible of the following fields (case sensitive):

  • timestamp (of type timestamp, required)

  • inputType (string, to indicate where the logs originated)

  • message (string)

  • user (string, the user of the PostgreSQL process that generated the log event)

  • host (string, the host or instance that generated the log event)

  • datacenter (string, the datacenter for the host or instance that generated the log event)

  • logType (string, usually one of “LOG”, “ERROR”, “FATAL”, “WARNING”, “STATEMENT”, or “HINT”)

  • tags (map of strings, e.g, {"owner": "Terra Branford", "team": "Returners"})

Once your dataset exists, you can input the complete name in the PostgreSQL app’s Advanced options under the custom_logs_dataset_name input.