Collecting Pod Metrics and cAdvisor Metrics

Since Observe Helm charts v0.4.21 and Observe kustomize manifests v1.0.0, the Observe stack collects fewer metrics by default.

Pod metrics, including custom metrics, are not collected by default.

Helm values:

pod_action: drop  # set to "drop" to drop all pod metrics

Kustomize literals (grafana-agent-env):

- PROM_SCRAPE_POD_ACTION=drop

Some cAdvisor metrics are not collected by default. For example, the container_fs_xxx metrics are not collected.

Helm values:

cadvisor_metric_drop_regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
cadvisor_metric_keep_regex: container_(cpu_.*|spec_.*|memory_.*|network_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)

Kustomize literals (grafana-agent-env):

- PROM_SCRAPE_CADVISOR_METRIC_DROP_REGEX=container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
- PROM_SCRAPE_CADVISOR_METRIC_KEEP_REGEX=container_(cpu_.*|spec_.*|memory_.*|network_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)
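
As a rough local check of which metric names a keep/drop pair lets through, the following shell sketch approximates the filtering with grep -E. It assumes that a metric is collected when it matches the keep regex and does not match the drop regex. Each pattern is anchored explicitly because Prometheus relabel regexes are fully anchored. The is_collected helper and the sample metric names are illustrative and not part of the Observe stack. grep -E is also not the RE2 engine Prometheus uses, so treat the result as an approximation.

```shell
#!/usr/bin/env bash
# Default regexes as listed above.
DEFAULT_KEEP='container_(cpu_.*|spec_.*|memory_.*|network_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)'
DEFAULT_DROP='container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)'
# Keep regex extended with fs_.* so that container_fs_* metrics pass.
CUSTOM_KEEP='container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)'

# is_collected METRIC KEEP_REGEX DROP_REGEX  (hypothetical helper)
# Prints "keep" if METRIC fully matches KEEP_REGEX and does not fully match
# DROP_REGEX; prints "drop" otherwise.
# Assumption: the agent applies the keep filter and the drop filter together.
is_collected() {
  local metric=$1 keep=$2 drop=$3
  if printf '%s\n' "$metric" | grep -Eq "^(${keep})\$" \
     && ! printf '%s\n' "$metric" | grep -Eq "^(${drop})\$"; then
    echo keep
  else
    echo drop
  fi
}

# With the defaults, the filesystem metric is filtered out.
is_collected container_fs_reads_bytes_total "$DEFAULT_KEEP" "$DEFAULT_DROP"
# With fs_.* added to the keep regex, it passes.
is_collected container_fs_reads_bytes_total "$CUSTOM_KEEP" "$DEFAULT_DROP"
# A metric that matches both regexes is still dropped.
is_collected container_network_tcp_usage_total "$CUSTOM_KEEP" "$DEFAULT_DROP"
```

Running the same helper against any other metric name you care about is a quick way to catch a typo in a regex before you roll it out.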
    

Suppose you need container_fs_reads_bytes_total (a cAdvisor metric) and Pod metrics, including your custom metrics, and you want to collect these metrics more often. The Helm steps come first, followed by the equivalent kustomize steps.

  1. Check the version of Observe Helm Charts.

$ helm search repo observe/stack --versions | head -2

NAME            CHART VERSION   APP VERSION   DESCRIPTION
observe/stack   0.4.20                        Observe Kubernetes agent stack
  2. Update your Helm chart repositories. Skip this step if you already have observe/stack 0.4.20 or a newer version.

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "fluent" chart repository
...Successfully got an update from the "observe" chart repository
...Successfully got an update from the "open-telemetry" chart repository
...Successfully got an update from the "otel" chart repository
...Successfully got an update from the "grafana" chart repository
Update Complete. ⎈Happy Helming!⎈

3.a If the Observe Kubernetes agent stack is already installed via Helm, create the observe-stack-values.yaml file.

global:
  observe:
    collectionEndpoint: https://${OBSERVE_CUSTOMER}.collect.observeinc.com/
metrics:
  grafana-agent:
    prom_config:
      scrape_interval: 30s
      scrape_configs:
        pod_action: keep
        cadvisor_metric_keep_regex: container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)
observe:
  token:
    value: ${OBSERVE_TOKEN}

3.b Upgrade the existing Observe Kubernetes agent stack via Helm.

helm upgrade --reset-values -f observe-stack-values.yaml --namespace=observe observe-stack observe/stack
  4. Follow this step instead if you are installing the Observe Kubernetes agent stack via Helm for the first time.

helm install --namespace=observe observe-stack observe/stack \
--set global.observe.collectionEndpoint="https://${OBSERVE_CUSTOMER}.collect.observeinc.com/" \
--set observe.token.value="${OBSERVE_TOKEN}" \
--set metrics.grafana-agent.prom_config.scrape_interval="30s" \
--set metrics.grafana-agent.prom_config.scrape_configs.pod_action="keep" \
--set metrics.grafana-agent.prom_config.scrape_configs.cadvisor_metric_keep_regex="container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)"

If you want to change other config parameters, see https://github.com/observeinc/helm-charts/blob/main/charts/metrics/values.yaml.

  1. Check whether you have the latest version of the Observe kustomize manifests. If the command returns nothing, you have the latest version.

kubectl diff -k https://github.com/observeinc/manifests/stack
  2. Apply the latest manifests. Skip this step if you already have the latest version.

kubectl apply --prune -l observeinc.com/component -k https://github.com/observeinc/manifests/stack --prune-allowlist=/v1/ConfigMap

To find other resources you want to target, use the following kubectl command.

kubectl api-resources --verbs=delete
  3. Create a new kustomized manifest with our kustomized directory as a base.

You can override any individual configuration element by creating a new kustomized manifest with our kustomized directory as a base.

The following example creates a new directory with a kustomization.yaml that overrides the PROM_SCRAPE_INTERVAL, PROM_SCRAPE_POD_ACTION, and PROM_SCRAPE_CADVISOR_METRIC_KEEP_REGEX variables in the grafana-agent environment variable configmap:

EXAMPLE_DIR=$(mktemp -d)

cat <<EOF >$EXAMPLE_DIR/kustomization.yaml
resources:
- github.com/observeinc/manifests//stack?ref=main

configMapGenerator:
- name: grafana-agent-env
  behavior: merge
  literals:
    - PROM_SCRAPE_INTERVAL=30s
    - PROM_SCRAPE_POD_ACTION=keep
    - PROM_SCRAPE_CADVISOR_METRIC_KEEP_REGEX="container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)"
EOF

If you want to change other config parameters, review the configuration template.

  4. Update the observe-metrics deployment.

kubectl apply -k $EXAMPLE_DIR

namespace/observe unchanged
serviceaccount/observe-events unchanged
serviceaccount/observe-logs unchanged
serviceaccount/observe-metrics unchanged
role.rbac.authorization.k8s.io/observe-events unchanged
clusterrole.rbac.authorization.k8s.io/observe-events unchanged
clusterrole.rbac.authorization.k8s.io/observe-metrics unchanged
rolebinding.rbac.authorization.k8s.io/observe-events unchanged
clusterrolebinding.rbac.authorization.k8s.io/observe-events unchanged
clusterrolebinding.rbac.authorization.k8s.io/observe-metrics unchanged
configmap/observe-fluent-bit-config-fg7t8944mb unchanged
configmap/observe-fluent-bit-env-chkt297kf9 unchanged
configmap/observe-grafana-agent-env-9hdcmcf8kk created
configmap/observe-grafana-agent-gbh462k4cm unchanged
configmap/observe-kube-state-events-env-d5h55bfb5h unchanged
deployment.apps/observe-events unchanged
deployment.apps/observe-metrics configured
daemonset.apps/observe-logs configured