Collecting Pod Metrics and cAdvisor Metrics
Since Observe Helm charts v0.4.21 and Observe kustomize manifests v1.0.0, Observe has reduced the number of metrics its stack collects by default.
Pod metrics, including custom metrics, are not collected by default.
pod_action: drop # set to "drop" to drop all pod metrics
- PROM_SCRAPE_POD_ACTION=drop
Some cAdvisor metrics are not collected by default. For example, container_fs_xxx metrics are not collected.
cadvisor_metric_drop_regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
cadvisor_metric_keep_regex: container_(cpu_.*|spec_.*|memory_.*|network_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)
- PROM_SCRAPE_CADVISOR_METRIC_DROP_REGEX=container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
- PROM_SCRAPE_CADVISOR_METRIC_KEEP_REGEX=container_(cpu_.*|spec_.*|memory_.*|network_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)
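To see what these two patterns mean for a given metric name, the following sketch emulates them locally with grep. It assumes that a series is collected only when its name fully matches the keep regex and does not match the drop regex, which is how Prometheus-style keep and drop relabeling rules behave. The helper name is_collected is hypothetical and is not part of the Observe stack.

```shell
#!/bin/sh
# Default patterns copied from the configuration above.
KEEP_REGEX='container_(cpu_.*|spec_.*|memory_.*|network_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)'
DROP_REGEX='container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)'

# is_collected NAME: hypothetical helper. Prints "collected" when NAME fully
# matches KEEP_REGEX and does not match DROP_REGEX, otherwise "dropped".
# grep -x anchors the pattern to the whole line (assumed to mirror relabeling).
is_collected() {
  if printf '%s\n' "$1" | grep -Eqx "$KEEP_REGEX" &&
     ! printf '%s\n' "$1" | grep -Eqx "$DROP_REGEX"; then
    echo collected
  else
    echo dropped
  fi
}

for name in container_cpu_usage_seconds_total \
            container_network_tcp_usage_total \
            container_fs_reads_bytes_total; do
  printf '%s: %s\n' "$name" "$(is_collected "$name")"
done
```

With the default patterns, only the first of the three names is reported as collected: the second matches the drop regex, and the third does not match the keep regex at all.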
Suppose you need container_fs_reads_bytes_total (a cAdvisor metric) as well as Pod metrics, including your custom metrics, and you want to collect these metrics more often, for example every 30 seconds.
Check the version of the Observe Helm charts.
$ helm search repo observe/stack --versions | head -2
NAME            CHART VERSION   APP VERSION   DESCRIPTION
observe/stack   0.4.20                        Observe Kubernetes agent stack
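If you prefer to script this check, a minimal sketch follows. It compares a chart version against 0.4.20, the threshold named in the next step, and assumes a sort that supports -V (version sort, as in GNU coreutils). The helper name version_ge and the variable names are hypothetical.

```shell
#!/bin/sh
# version_ge A B: hypothetical helper, succeeds when version A >= version B.
# Relies on `sort -V` (version sort); this is an assumption about your sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version taken from the example output above; substitute your own value.
chart_version="0.4.20"
minimum_version="0.4.20"

if version_ge "$chart_version" "$minimum_version"; then
  echo "chart is recent enough"
else
  echo "run 'helm repo update' first"
fi
```

Version sort is used instead of a plain string comparison so that, for example, 0.4.9 is treated as older than 0.4.20.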
Skip this step if you have observe/stack 0.4.20 or a newer version.
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "fluent" chart repository
...Successfully got an update from the "observe" chart repository
...Successfully got an update from the "open-telemetry" chart repository
...Successfully got an update from the "otel" chart repository
...Successfully got an update from the "grafana" chart repository
Update Complete. ⎈Happy Helming!⎈
3.a If the Observe Kubernetes agent stack is already running via Helm, create the observe-stack-values.yaml file.
global:
  observe:
    collectionEndpoint: https://${OBSERVE_CUSTOMER}.collect.observeinc.com/
metrics:
  grafana-agent:
    prom_config:
      scrape_interval: 30s
      scrape_configs:
        pod_action: keep
        cadvisor_metric_keep_regex: container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)
observe:
  token:
    value: ${OBSERVE_TOKEN}
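Helm does not expand shell variables inside a values file, so ${OBSERVE_CUSTOMER} and ${OBSERVE_TOKEN} must be replaced with real values before you upgrade. One way to do that, sketched below, is to write the file from a shell heredoc so that the shell substitutes them. The default values assigned here are placeholders for illustration, not real credentials.

```shell
#!/bin/sh
# Placeholder defaults for illustration only; export your real values instead.
OBSERVE_CUSTOMER="${OBSERVE_CUSTOMER:-123456789012}"
OBSERVE_TOKEN="${OBSERVE_TOKEN:-example-token}"

# Unquoted EOF lets the shell expand ${OBSERVE_CUSTOMER} and ${OBSERVE_TOKEN}.
cat <<EOF > observe-stack-values.yaml
global:
  observe:
    collectionEndpoint: https://${OBSERVE_CUSTOMER}.collect.observeinc.com/
metrics:
  grafana-agent:
    prom_config:
      scrape_interval: 30s
      scrape_configs:
        pod_action: keep
        cadvisor_metric_keep_regex: container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)
observe:
  token:
    value: ${OBSERVE_TOKEN}
EOF

# Fail loudly if any unexpanded placeholder is left in the file.
if grep -qF '${' observe-stack-values.yaml; then
  echo "unexpanded placeholder found in observe-stack-values.yaml" >&2
  exit 1
fi
```

Because the file then contains the token in plain text, you may want to restrict its permissions or delete it after the upgrade.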
3.b Upgrade the existing Observe Kubernetes agent stack via Helm.
helm upgrade --reset-values -f observe-stack-values.yaml --namespace=observe observe-stack observe/stack
Follow this step if you are installing the Observe Kubernetes agent stack via Helm for the first time.
helm install --namespace=observe observe-stack observe/stack \
  --set global.observe.collectionEndpoint="https://${OBSERVE_CUSTOMER}.collect.observeinc.com/" \
  --set observe.token.value="${OBSERVE_TOKEN}" \
  --set metrics.grafana-agent.prom_config.scrape_interval="30s" \
  --set metrics.grafana-agent.prom_config.scrape_configs.pod_action="keep" \
  --set metrics.grafana-agent.prom_config.scrape_configs.cadvisor_metric_keep_regex="container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)"
If you want to change other config parameters, see https://github.com/observeinc/helm-charts/blob/main/charts/metrics/values.yaml.
Check whether you have the latest version of the Observe kustomize manifests. If the following command returns nothing, you already have the latest version.
kubectl diff -k https://github.com/observeinc/manifests/stack
Skip this step if you have the latest version.
kubectl apply --prune -l observeinc.com/component -k https://github.com/observeinc/manifests/stack --prune-allowlist=/v1/ConfigMap
To find other resources you want to target, use the following kubectl command.
kubectl api-resources --verbs=delete
Create a new kustomized manifest with our kustomized directory as a base. This lets you override any individual configuration element.
The following example creates a new directory with a kustomization.yaml that overrides the PROM_SCRAPE_INTERVAL, PROM_SCRAPE_POD_ACTION, and PROM_SCRAPE_CADVISOR_METRIC_KEEP_REGEX variables in the grafana-agent environment variable ConfigMap:
EXAMPLE_DIR=$(mktemp -d)
cat <<EOF >"$EXAMPLE_DIR/kustomization.yaml"
resources:
  - github.com/observeinc/manifests//stack?ref=main
configMapGenerator:
  - name: grafana-agent-env
    behavior: merge
    literals:
      - PROM_SCRAPE_INTERVAL=30s
      - PROM_SCRAPE_POD_ACTION=keep
      - PROM_SCRAPE_CADVISOR_METRIC_KEEP_REGEX="container_(cpu_.*|spec_.*|memory_.*|network_.*|fs_.*|file_descriptors)|machine_(cpu_cores|memory_bytes)"
EOF
If you want to change other config parameters, review the configuration template.
Update the observe-metrics deployment.
kubectl apply -k $EXAMPLE_DIR
namespace/observe unchanged
serviceaccount/observe-events unchanged
serviceaccount/observe-logs unchanged
serviceaccount/observe-metrics unchanged
role.rbac.authorization.k8s.io/observe-events unchanged
clusterrole.rbac.authorization.k8s.io/observe-events unchanged
clusterrole.rbac.authorization.k8s.io/observe-metrics unchanged
rolebinding.rbac.authorization.k8s.io/observe-events unchanged
clusterrolebinding.rbac.authorization.k8s.io/observe-events unchanged
clusterrolebinding.rbac.authorization.k8s.io/observe-metrics unchanged
configmap/observe-fluent-bit-config-fg7t8944mb unchanged
configmap/observe-fluent-bit-env-chkt297kf9 unchanged
configmap/observe-grafana-agent-env-9hdcmcf8kk created
configmap/observe-grafana-agent-gbh462k4cm unchanged
configmap/observe-kube-state-events-env-d5h55bfb5h unchanged
deployment.apps/observe-events unchanged
deployment.apps/observe-metrics configured
daemonset.apps/observe-logs configured
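Most lines in this output report unchanged resources. To spot what an apply actually modified, you can filter the output. The sketch below is a convenience under stated assumptions: it relies only on the trailing status word that kubectl prints, as seen above, and the helper name changed_only is hypothetical.

```shell
#!/bin/sh
# changed_only: hypothetical helper that reads `kubectl apply` output on stdin
# and prints only the lines whose status is not "unchanged".
changed_only() {
  grep -v ' unchanged$' || true   # `|| true`: no changed lines is not an error
}

# Sample lines copied from the output above; in practice you would pipe
# `kubectl apply -k "$EXAMPLE_DIR"` into changed_only instead.
printf '%s\n' \
  'namespace/observe unchanged' \
  'configmap/observe-grafana-agent-env-9hdcmcf8kk created' \
  'deployment.apps/observe-metrics configured' |
  changed_only
```

For the sample input, this prints only the configmap and deployment lines, which is where the new scrape settings take effect.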