Installing the PostgreSQL DB App¶
Note
This setup follows the same procedure as creating a data connection in the PostgreSQL app. If you have already created a data connection in the app, you do not need to perform these steps again.
Install the PostgreSQL DB app for Observe using the App section under Workspace Settings.
Figure 1 - PostgreSQL DB app card
The default configuration settings of the PostgreSQL DB app may be sufficient for your needs, but you should consider overriding the defaults for the following options:
Datastream - select the datastream that receives your PostgreSQL DB data from Telegraf and FluentBit. This should be the same datastream as the one used in your Host App.
PG Settings Resource Attributes - A list of pg_settings values to add as resource attributes to your PostgreSQL DB instances. The list includes reasonable default values, but you can add extra values of particular interest.
Name Format - The default value of PostgreSQL/%s places your PostgreSQL DB resources within the PostgreSQL folder, but you may want to put them in a different folder.
____ App - Several other apps may contain logs or metadata relevant to your PostgreSQL DB instances, such as apps for Host Monitoring or various cloud providers. Use these App variables to link your logs and metadata from other apps to your PostgreSQL DB app data.
Linking your PostgreSQL DB data to other apps¶
Select the option for Manual install (Advanced) as you continue with the PostgreSQL app installation. If you have already installed apps for Host Monitoring or any cloud providers, you can select the corresponding app to link their data with the data in your PostgreSQL app. If you haven’t yet installed any of those apps, you can always return to this configuration and update it later.
The Observe PostgreSQL app uses Telegraf to capture PostgreSQL DB health metrics and FluentBit to capture PostgreSQL DB logs. The following instructions describe how to configure your PostgreSQL DB instances so that Telegraf and FluentBit can collect your data, and how to install Telegraf and FluentBit for data collection.
Data collection¶
1. Log Collection - Configure your PostgreSQL instance to write logs to a file in a place where FluentBit can capture them. For the most complete observability, Observe recommends enabling all your PostgreSQL DB logs by adding the following configurations (or similar) to your postgresql.conf:
logging_collector = on
log_directory = 'pg_log' # directory where log files are written,
# can be absolute or relative to PGDATA
log_filename = 'pg.log' # log file name, can include pattern
log_file_mode = 0644
log_duration = 'on'
log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this number
# of milliseconds
While Observe recommends setting log_min_duration_statement = 0 for the most comprehensive visibility into your PostgreSQL database query performance, this can affect performance for databases with very high query throughput. In those cases, it may be better to set log_min_duration_statement to a desired number of milliseconds. If log_duration is on and log_min_duration_statement has a positive value, all durations are logged but the query text is included only for statements exceeding the threshold.
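For example, a threshold of 250 milliseconds (an illustrative value, not an Observe recommendation) logs durations for every statement while including query text only for the slower ones:
log_duration = on
log_min_duration_statement = 250  # query text logged only for statements taking 250 ms or longer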
Be sure to restart your PostgreSQL DB service if necessary for any of these changes to take effect.
2. Prepare your PostgreSQL DB instance to receive connections from Telegraf by running the following commands. You can also save and execute them as a file. Be sure to replace the ${PSQL_PASSWORD_FOR_TELEGRAF} value with a secure string, and keep it handy, because you will use it again shortly:
CREATE DATABASE telegraf;
CREATE USER telegraf WITH PASSWORD '${PSQL_PASSWORD_FOR_TELEGRAF}';
GRANT pg_monitor TO telegraf;
GRANT SELECT ON pg_stat_database TO telegraf;
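To verify the new role before configuring Telegraf, you can connect as the telegraf user and query one of the monitored views (a quick smoke test; adjust the host and password to your environment):
psql "host=localhost user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres" \
  -c "SELECT datname, numbackends FROM pg_stat_database LIMIT 5;"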
3. Install Telegraf and FluentBit. Observe recommends using the 1-line installation instructions for the Observe Host Monitoring app.
4. Add a Telegraf config file at /etc/telegraf/telegraf.d/inputs.postgresql.conf with the following content. Be sure to replace the ${PSQL_PASSWORD_FOR_TELEGRAF} value with your secure string from before.
[[inputs.postgresql_extensible]]
address = "host=localhost user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
prepared_statements = true
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_database"
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_bgwriter"
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
application_name,
backend_type,
client_addr,
client_port,
datname,
pid,
query,
state,
usename,
wait_event,
wait_event_type,
extract(epoch from (clock_timestamp() - query_start)) as query_duration
from pg_stat_activity
"""
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
# feel free to add more pg_settings values for fields that are important to you
sqlquery="""
select * from pg_settings
where name in (
'autovacuum',
'server_version',
'max_connections',
'shared_buffers',
'effective_cache_size',
'work_mem',
'maintenance_work_mem',
'min_wal_size',
'max_wal_size',
'checkpoint_timeout',
'checkpoint_completion_target',
'wal_buffers',
'wal_level',
'default_statistics_target',
'listen_addresses'
)
"""
version=801
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
pg_is_in_recovery() as is_replica,
count(client_addr) as replicas_count
from pg_stat_replication
"""
version=801
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select CASE
WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS replication_delay
"""
version=1001
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
sender_host as primary_host,
status as replica_state
from pg_stat_wal_receiver
"""
version=801
withdbname=false
tagvalue=""
Note that if any extra pg_settings fields are especially important to you, you can add them to the pg_settings query in that configuration. When you install the PostgreSQL DB app, under the Advanced Settings, you can specify those fields and attach them to your PostgreSQL DB instance resource.
5. Add a FluentBit config file at /etc/td-agent-bit/observe-postgresql.conf with this content. If you used custom log file path values, be sure to use those instead:
[INPUT]
    name           tail
    tag            tail_postgres
    Path_Key       path
    # e.g. /var/lib/postgresql/12/main/pg_log/pg.log
    path           /path/to/your/pg.log
    Read_from_Head False
    db             postgres-snapshots.db
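Before restarting the services, you can sanity-check that the path you configured points at a real, growing log file (the path below is an example; use your own):
ls -l /var/lib/postgresql/12/main/pg_log/pg.log
tail -n 3 /var/lib/postgresql/12/main/pg_log/pg.log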
6. Restart Telegraf: service telegraf restart.
7. Restart FluentBit: service td-agent-bit restart.
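To confirm the PostgreSQL input is collecting metrics, you can also run Telegraf once in test mode (standard Telegraf flags; paths assume the default package layout):
telegraf --config /etc/telegraf/telegraf.conf \
         --config-directory /etc/telegraf/telegraf.d \
         --input-filter postgresql_extensible --test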
If your PostgreSQL DB instance is managed by a cloud provider, use the following steps instead.
1. Enable Log Collection for your PostgreSQL DB instance, following the instructions for your cloud provider. You can then collect those logs by installing the Observe App for that cloud provider.
Observe recommends enabling all your PostgreSQL DB logs by adding the following configurations, or similar configurations, to your PostgreSQL DB instances:
logging_collector = on
log_directory = 'pg_log' # directory where log files are written,
# can be absolute or relative to PGDATA
log_filename = 'pg.log' # log file name, can include pattern
log_file_mode = 0644
log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this number
# of milliseconds
While Observe recommends setting log_min_duration_statement = 0 for the most comprehensive visibility into your PostgreSQL DB instances' query performance, this can affect performance for databases with very high query throughput. In those cases, you should set it to a higher number of milliseconds so that query statement logs are captured only for queries that exceed that duration.
Be sure to restart your PostgreSQL service if necessary for any of these changes to take effect.
2. Prepare your PostgreSQL instance to receive remote connections by setting listen_addresses = '*'. Then, prepare your PostgreSQL instance to receive connections specifically from Telegraf by running the following commands. You can also save and execute them as a file. Be sure to replace the ${PSQL_PASSWORD_FOR_TELEGRAF} value with a secure string:
CREATE DATABASE telegraf;
CREATE USER telegraf WITH PASSWORD '${PSQL_PASSWORD_FOR_TELEGRAF}';
GRANT pg_monitor TO telegraf;
GRANT SELECT ON pg_stat_database TO telegraf;
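Depending on your setup, you may also need a pg_hba.conf entry (or your cloud provider's firewall equivalent) that permits the telegraf user to connect from the Telegraf server. A hypothetical example, with the CIDR adjusted to your network:
# pg_hba.conf: allow the telegraf user to connect remotely to the postgres database
host    postgres    telegraf    10.0.0.0/24    scram-sha-256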
3. On a server that you manage, install Telegraf. Observe recommends using the 1-line installation instructions for the Observe Host Monitoring app.
4. Add a Telegraf config file at /etc/telegraf/telegraf.d/inputs.postgresql.conf with the following content. Be sure to replace the ${HOST} value with your remote PostgreSQL DB IP address and the ${PSQL_PASSWORD_FOR_TELEGRAF} value with your secure string used earlier.
[[inputs.postgresql_extensible]]
address = "host=${HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
prepared_statements = true
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_database"
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_bgwriter"
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
application_name,
backend_type,
client_addr,
client_port,
datname,
pid,
query,
state,
usename,
wait_event,
wait_event_type,
extract(epoch from (clock_timestamp() - query_start)) as query_duration
from pg_stat_activity
"""
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
# feel free to add more pg_settings values for fields that are important to you
sqlquery="""
select * from pg_settings
where name in (
'autovacuum',
'server_version',
'max_connections',
'shared_buffers',
'effective_cache_size',
'work_mem',
'maintenance_work_mem',
'min_wal_size',
'max_wal_size',
'checkpoint_timeout',
'checkpoint_completion_target',
'wal_buffers',
'wal_level',
'default_statistics_target',
'listen_addresses'
)
"""
version=801
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
pg_is_in_recovery() as is_replica,
count(client_addr) as replicas_count
from pg_stat_replication
"""
version=801
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select CASE
WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS replication_delay
"""
version=1001
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
sender_host as primary_host,
status as replica_state
from pg_stat_wal_receiver
"""
version=801
withdbname=false
tagvalue=""
Note that if any extra pg_settings fields are especially important to you, you can add them to the pg_settings query in that configuration. When you install the PostgreSQL DB app, under Advanced Settings, you can specify those fields and attach them to your PostgreSQL DB instance resource.
You can collect data from multiple remote PostgreSQL DB instances with the same Telegraf instance by repeating this configuration with different ${HOST} values.
[[inputs.postgresql_extensible]]
address = "host=${HOST_1} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
prepared_statements = true
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_database"
...
[[inputs.postgresql_extensible]]
address = "host=${HOST_2} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
prepared_statements = true
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_database"
...
5. Restart Telegraf: service telegraf restart.
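If metrics do not appear, you can verify connectivity from the Telegraf server with psql (assuming the client is installed; substitute your remote host and password):
psql "host=${HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres" -c "SELECT 1;"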
If your PostgreSQL DB instances run in Kubernetes, install the Observe Kubernetes App. This captures your container logs automatically.
Deploy Telegraf in your cluster with the following configurations to capture PostgreSQL DB metrics and expose them to your Observe Grafana Agent, which forwards them to your Observe account.
Telegraf ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: telegraf-configmap
data:
# set the postgres endpoint you want to monitor. can be a service, a cloud provider endpoint, etc.
POSTGRESQL_HOST: "postgres.default.svc"
# if you are monitoring multiple postgres instances from this telegraf deployment,
# add multiple POSTGRESQL_HOST vars like so
# POSTGRESQL_HOST_2: "postgres-2.default.svc"
telegraf.conf: |+
[agent]
interval = "10s"
# debug = true
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
omit_hostname = false
[[outputs.health]]
service_address = "http://:8888"
[[outputs.prometheus_client]]
listen = ":9273"
path = "/metrics"
# metric_version = 1
metric_version = 2
[[inputs.postgresql_extensible]]
# if the postgres instance you are monitoring runs as a kubernetes service,
# kubernetes will automatically add to your telegraf container an env var for
# the postgres service IP. Add that as a tag as follows.
# if the postgres service is named "my-postgres" the env var would be MY_POSTGRES_SERVICE_HOST
# tags = {postgres_service_host = "${<SERVICE_NAME>_SERVICE_HOST}"}
address = "host=${POSTGRESQL_HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
prepared_statements = true
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_database"
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="SELECT * FROM pg_stat_bgwriter"
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
application_name,
backend_type,
client_addr,
client_port,
datname,
pid,
query,
state,
usename,
wait_event,
wait_event_type,
extract(epoch from (clock_timestamp() - query_start)) as query_duration
from pg_stat_activity
"""
version=901
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
# feel free to add more pg_settings values for fields that are important to you
sqlquery="""
select * from pg_settings
where name in (
'autovacuum',
'server_version',
'max_connections',
'shared_buffers',
'effective_cache_size',
'work_mem',
'maintenance_work_mem',
'min_wal_size',
'max_wal_size',
'checkpoint_timeout',
'checkpoint_completion_target',
'wal_buffers',
'wal_level',
'default_statistics_target',
'listen_addresses'
)
"""
version=801
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
pg_is_in_recovery() as is_replica,
count(client_addr) as replicas_count
from pg_stat_replication
"""
version=801
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select CASE
WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS replication_delay
"""
version=1001
withdbname=false
tagvalue=""
[[inputs.postgresql_extensible.query]]
sqlquery="""
select
sender_host as primary_host,
status as replica_state
from pg_stat_wal_receiver
"""
version=801
withdbname=false
tagvalue=""
# if you are monitoring multiple postgres instances from this telegraf deployment,
# add multiple telegraf inputs, using the POSTGRESQL_HOST_N vars you defined above
# [[inputs.postgresql_extensible]]
# # if the postgres instance you are monitoring runs as a kubernetes service,
# # kubernetes will automatically add to your telegraf container an env var for
# # the postgres service IP. Add that as a tag as follows.
# # if the postgres service is named "my-postgres" the env var would be MY_POSTGRES_SERVICE_HOST
# # tags = {postgres_service_host = "${<SERVICE_2_NAME>_SERVICE_HOST}"}
# address = "host=${POSTGRESQL_HOST} user=telegraf password=${PSQL_PASSWORD_FOR_TELEGRAF} dbname=postgres"
# prepared_statements = true
# [[inputs.postgresql_extensible.query]]
# sqlquery="SELECT * FROM pg_stat_database"
# version=901
# withdbname=false
# tagvalue=""
# [[inputs.postgresql_extensible.query]]
# sqlquery="SELECT * FROM pg_stat_bgwriter"
# version=901
# withdbname=false
# tagvalue=""
# [[inputs.postgresql_extensible.query]]
# sqlquery="""
# select
# application_name,
# backend_type,
# client_addr,
# client_port,
# datname,
# pid,
# query,
# state,
# usename,
# wait_event,
# wait_event_type,
# extract(epoch from (clock_timestamp() - query_start)) as query_duration
# from pg_stat_activity
# """
# version=901
# withdbname=false
# tagvalue=""
# [[inputs.postgresql_extensible.query]]
# # feel free to add more pg_settings values for fields that are important to you
# sqlquery="""
# select * from pg_settings
# where name in (
# 'autovacuum',
# 'server_version',
# 'max_connections',
# 'shared_buffers',
# 'effective_cache_size',
# 'work_mem',
# 'maintenance_work_mem',
# 'min_wal_size',
# 'max_wal_size',
# 'checkpoint_timeout',
# 'checkpoint_completion_target',
# 'wal_buffers',
# 'wal_level',
# 'default_statistics_target',
# 'listen_addresses'
# )
# """
# version=801
# withdbname=false
# tagvalue=""
# [[inputs.postgresql_extensible.query]]
# sqlquery="""
# select
# pg_is_in_recovery() as is_replica,
# count(client_addr) as replicas_count
# from pg_stat_replication
# """
# version=801
# withdbname=false
# tagvalue=""
# [[inputs.postgresql_extensible.query]]
# sqlquery="""
# select CASE
# WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
# THEN 0
# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
# END AS replication_delay
# """
# version=1001
# withdbname=false
# tagvalue=""
# [[inputs.postgresql_extensible.query]]
# sqlquery="""
# select
# sender_host as primary_host,
# status as replica_state
# from pg_stat_wal_receiver
# """
# version=801
# withdbname=false
# tagvalue=""
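Note that telegraf.conf above references ${PSQL_PASSWORD_FOR_TELEGRAF}, which the ConfigMap does not define. One way to supply it, sketched here with example names, is a Kubernetes Secret exposed to the container as an environment variable:
apiVersion: v1
kind: Secret
metadata:
  name: telegraf-postgres-secret   # example name
type: Opaque
stringData:
  PSQL_PASSWORD_FOR_TELEGRAF: "replace-with-your-secure-string"
Then add a secretRef entry alongside the configMapRef in the Deployment's envFrom list so Telegraf can substitute the variable at startup.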
Telegraf Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: telegraf
spec:
selector:
matchLabels:
app: telegraf
replicas: 1
template:
metadata:
labels:
app: telegraf
spec:
containers:
- name: telegraf
image: telegraf:latest
volumeMounts:
- name: config-volume
mountPath: /etc/telegraf/telegraf.conf
subPath: telegraf.conf
ports:
- containerPort: 9273
name: metrics
args:
- -config
- /etc/telegraf/telegraf.conf
envFrom:
- configMapRef:
name: telegraf-configmap
volumes:
- name: config-volume
configMap:
name: telegraf-configmap
items:
- key: telegraf.conf
path: telegraf.conf
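A typical way to apply these manifests and verify that metrics are being served (file names are examples):
kubectl apply -f telegraf-configmap.yaml
kubectl apply -f telegraf-deployment.yaml
# check that the Prometheus endpoint responds
kubectl port-forward deploy/telegraf 9273:9273 &
curl -s localhost:9273/metrics | head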
In your Observe PostgreSQL App configuration's Advanced options, use the Kubernetes App input to link your PostgreSQL App and Kubernetes App data.
Using Custom Log Shippers¶
Observe recommends using FluentBit to collect your PostgreSQL DB logs for the PostgreSQL DB instances you manage, but you may use another log shipper if you prefer. To do so, configure your log shipper to forward your PostgreSQL DB logs to your Observe account, and create a dataset that contains your PostgreSQL DB logs. For best results, this dataset should have as many as possible of the following fields (case sensitive; see the example record after this list):
timestamp (of type timestamp, required)
inputType (string, to indicate where the logs originated)
message (string)
user (string, the user of the PostgreSQL process that generated the log event)
host (string, the host or instance that generated the log event)
datacenter (string, the datacenter for the host or instance that generated the log event)
logType (string, usually one of "LOG", "ERROR", "FATAL", "WARNING", "STATEMENT", or "HINT")
tags (map of strings, e.g., {"owner": "Terra Branford", "team": "Returners"})
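For reference, a single log record carrying all of these fields might look like the following (values are illustrative only):
{
  "timestamp": "2023-04-01T12:00:00Z",
  "inputType": "custom_shipper",
  "message": "duration: 1.23 ms  statement: SELECT 1",
  "user": "postgres",
  "host": "db-host-1",
  "datacenter": "us-west-2",
  "logType": "LOG",
  "tags": {"owner": "Terra Branford", "team": "Returners"}
}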
Once your dataset exists, you can enter its complete name in the PostgreSQL app's Advanced options under the custom_logs_dataset_name input.