Kubernetes Integration

Observe provides a manifest that handles collecting telemetry within a Kubernetes cluster. By default, we gather all events, logs, and metrics within a cluster using open source collectors.

Installation

Important

To proceed with this step, you will need an Observe customer ID and token.
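
The examples below assume these values are exported as environment variables, for example:

$ export OBSERVE_CUSTOMER="<your customer id>"
$ export OBSERVE_TOKEN="<your token>"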

Observe provides a manifest which installs all the necessary components for collecting telemetry data from Kubernetes. This manifest can be retrieved directly from https://github.com/observeinc/manifests.

At its simplest, the install process can be reduced to two steps:

$ kubectl apply -k https://github.com/observeinc/manifests/stack && \
	kubectl -n observe create secret generic credentials \
	--from-literal=OBSERVE_CUSTOMER=${OBSERVE_CUSTOMER?} \
	--from-literal=OBSERVE_TOKEN=${OBSERVE_TOKEN?}

This example is for illustrative purposes only. In production environments, we recommend downloading the manifest separately and tracking changes over time using your configuration management tool of choice. You can use kubectl kustomize <URL> to generate a static version of our manifest.
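
For example, you could render the manifest to a static file and commit it to version control (the output filename here is only illustrative):

$ kubectl kustomize https://github.com/observeinc/manifests/stack > observe-stack.yaml
$ kubectl apply -f observe-stack.yaml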

By default, our manifest creates an observe namespace which contains all of our collection infrastructure. Only once this namespace exists can we create the secret containing the appropriate credentials for sending data to Observe.

Once your manifest is applied, you can wait for all pods within the namespace to be ready:

$ kubectl wait --timeout=60s pods -n observe --for=condition=Ready --all
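
You can also inspect pod status directly; all pods in the namespace should eventually report a Running status:

$ kubectl get pods -n observe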

If you are monitoring multiple clusters, it is useful to give each one a human-readable name. You may attach an identifier by providing an observeinc.com/cluster-name annotation:

$ kubectl annotate namespace observe observeinc.com/cluster-name="My Cluster"
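
To confirm the annotation was applied, you can print the namespace annotations:

$ kubectl get namespace observe -o jsonpath='{.metadata.annotations}'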

Updating and Uninstalling

The cleanest way to remove our integration is to delete all resources included in the original installation manifest:

$ kubectl delete -k https://github.com/observeinc/manifests/stack

Manifest Versioning

Our manifests are versioned. To install or uninstall a specific version, add the version parameter to the URL:

$ kubectl apply -k 'https://github.com/observeinc/manifests/stack?ref=v0.5.0'
$ kubectl delete -k 'https://github.com/observeinc/manifests/stack?ref=v0.5.0'

The list of published versions is available on the releases page.
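
If you track the manifest with your own kustomization, you can pin the version there instead. A minimal sketch (recent kustomize versions accept remote URLs under resources):

$ cat > kustomization.yaml <<'EOF'
resources:
- https://github.com/observeinc/manifests/stack?ref=v0.5.0
EOF
$ kubectl apply -k .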

Metrics discovery

If you wish to expose Prometheus metrics from your pod, you have two options:

  • Set the port in the annotations. This method has the disadvantage that it can only surface one HTTP endpoint for scraping.

     annotations:
       prometheus.io/port: "9999"
    
  • Expose the metrics endpoint through a port with a name ending with metrics. For example:

    ports:
    - containerPort: 9999
      name: metrics
    - containerPort: 12345
      name: sidecar-metrics
    

You can use the following annotations to further influence the discovery process:

  • prometheus.io/scrape: if set to false, the pod will be ignored

  • prometheus.io/scheme: set to http or https

  • prometheus.io/path: defaults to /metrics
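
Putting these options together, a hypothetical pod exposing Prometheus metrics might look like the following; the pod name, image, and port values are placeholders, and in practice either the annotations or the named port alone is sufficient:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: metrics-example
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9999"
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    ports:
    - containerPort: 9999
      name: metrics
EOF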

AWS Fargate support

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service which can be configured to run on AWS Fargate. Daemonsets are not supported when running on a serverless compute engine such as AWS Fargate. Since our standard log collection method relies on daemonsets, additional configuration is needed to collect logs for pods scheduled on Fargate.

AWS provides a method of streaming logs from pods running on Fargate. This requires setting up a Kinesis Firehose delivery stream that sends data to Observe. We provide a Terraform module that automates this setup. If you do not use Terraform and would like to set up EKS log forwarding for Fargate, please contact support.

Migrating from legacy manifests

Our legacy manifests are hosted under https://api.observeinc.com/v1/kubernetes/manifest. As of March 2022, they will only receive critical security fixes. To migrate:

  • Delete all existing Observe resources:

$ kubectl delete -f https://api.observeinc.com/v1/kubernetes/manifest

  • Install the latest manifest:

$ kubectl apply -k https://github.com/observeinc/manifests/stack && \
	kubectl -n observe create secret generic credentials \
	--from-literal=OBSERVE_CUSTOMER=${OBSERVE_CUSTOMER?} \
	--from-literal=OBSERVE_TOKEN=${OBSERVE_TOKEN?}

There are a few structural changes between the legacy and current manifests you will need to account for:

  • The proxy deployment was removed. If you were proxying custom telemetry in your cluster, please reach out to support.

  • Metrics are now collected through Grafana Agent rather than Telegraf. The kubernetes/Metrics dataset, which contains Telegraf data, will be sunset in favor of kubernetes/Pod Metrics and kubernetes/Cadvisor Metrics. If you have custom datasets or monitors built on the former, please reach out to support for assistance.

FAQ

What Kubernetes versions do you support?

Kubernetes targets a minor release every three months, and maintains a "support window" which spans one year from each version's original release date. This typically results in four minor versions being supported at any given time.

Our support policy is as follows: as of February 2022, the oldest release we officially support is 1.18.

What container runtimes do you support?

The container runtime only affects log collection. Our current Fluent Bit configuration has been validated on both Docker and containerd. Other runtimes or older versions may require minor configuration adjustments - please reach out to support for assistance.

Can collection be restricted to a specific namespace?

We do not currently support this. Collecting Kubernetes state in particular requires accessing resources that are not namespaced, such as node and persistentvolume. Log collection may be restricted to specific namespaces, however - see "Can I filter what logs are sent to Observe?" below for details.
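
For reference, you can list the cluster-scoped resource types involved with:

$ kubectl api-resources --namespaced=false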

How can I update the credentials used?

You should first regenerate the credentials secret. The simplest way of achieving this is to delete the existing secret and create a new one:

$ kubectl delete -n observe secret credentials --ignore-not-found && \
	kubectl -n observe create secret generic credentials \
	--from-literal=OBSERVE_CUSTOMER=${OBSERVE_CUSTOMER?} \
	--from-literal=OBSERVE_TOKEN=${OBSERVE_TOKEN?}

Now that your new credentials are in place, you will need to restart pods in the observe namespace to pick up the new values:

$ kubectl rollout restart -n observe daemonset && \
	kubectl rollout restart -n observe deployment

Can I filter what logs are sent to Observe?

We currently support filtering the log files collected by Fluent Bit by including a fluent-bit-extra.conf file alongside the fluent-bit.conf file deployed via our manifest. Here is an example that filters out logs from any namespace matching "namespacePattern" (the ConfigMap name and namespace must match the Fluent Bit ConfigMap deployed by our manifest):

apiVersion: v1
kind: ConfigMap
metadata:
  # The name and namespace below are placeholders: use the name and namespace
  # of the Fluent Bit ConfigMap deployed by our manifest (you can list them
  # with `kubectl get configmaps -n observe`).
  name: observe-logs-config
  namespace: observe
data:
  fluent-bit-extra.conf: |-
    [FILTER]
        Name         grep
        Match        k8slogs
        Exclude      namespace /.*namespacePattern.*/

Depending on the shape of your data, you may be able to filter on additional keys for greater specificity. See the Fluent Bit grep filter documentation for more information.
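
Depending on how the ConfigMap is mounted, you may also need to restart the log collection daemonset for the new filter to take effect. For example, assuming you saved the ConfigMap above as fluent-bit-extra.yaml (the filename is illustrative):

$ kubectl apply -f fluent-bit-extra.yaml && \
	kubectl rollout restart -n observe daemonset/observe-logs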

How can I disable scheduling a daemonset pod on a specific node?

In order to provide full coverage of your cluster, the observe-logs daemonset is by design scheduled onto all nodes. If you wish to remove it from a subset of nodes, you can add a taint:

$ kubectl taint nodes ${NODENAME?} observeinc.com/unschedulable:NoSchedule

This taint is only verified during scheduling. If an observe-logs pod is already running on the node, you must delete it manually:

$ kubectl delete pods -n observe -l name=observe-logs --field-selector=spec.nodeName=${NODENAME?}
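
If you later want the daemonset scheduled on the node again, remove the taint by appending a trailing hyphen to the key:

$ kubectl taint nodes ${NODENAME?} observeinc.com/unschedulable-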

What does a “mem buf overlimit” warning mean?

When reviewing fluent-bit logs, you may encounter a warning similar to the following example:

[2022/03/14 16:00:18] [ warn] [input] tail.0 paused (mem buf overlimit)

This indicates that fluent-bit has temporarily stopped reading logs from the files on disk, and is waiting for the data it has already buffered to be successfully uploaded. Typically no action needs to be taken; note that no further message is emitted once fluent-bit resumes tailing logs.

If you see this message very frequently, you may have a very bursty log source or limited upload bandwidth. In either case, please reach out to support for assistance.
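
To gauge how frequently this occurs, you can search the collector logs across the daemonset, using the same label selector as in the examples above (for larger clusters you may need to raise --max-log-requests):

$ kubectl logs -n observe -l name=observe-logs --tail=1000 | grep 'mem buf overlimit'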