Node affinity, taints, and tolerations
Node affinity is a property of Kubernetes pods that attracts them to a set of nodes, either as a preference or a hard requirement. Taints are the opposite: they allow a node to repel a set of pods.
Tolerations are applied to pods. A toleration allows the scheduler to place a pod on a node with a matching taint. Tolerations permit scheduling but do not guarantee it: the scheduler still weighs other factors, such as resource availability, when choosing a node.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
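As a minimal sketch of how the two mechanisms look side by side in a pod spec (this fragment is illustrative only and not part of the Observe Agent chart; the dedicated key and observe value are hypothetical), the affinity term restricts the pod to nodes carrying a matching label, while the toleration lets it run on nodes tainted dedicated=observe:NoSchedule:
# Hypothetical pod spec fragment illustrating both mechanisms.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: dedicated          # assumed label key on the target nodes
                operator: In
                values: [observe]       # assumed label value
  tolerations:
    - key: "dedicated"                  # matches a taint dedicated=observe:NoSchedule
      operator: "Equal"
      value: "observe"
      effect: "NoSchedule"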
Apply node affinity configurations
Let's suppose you have a Kubernetes cluster with the following nine nodes:
gke-cluster-1-default-pool-020b00c1-9s2c
gke-cluster-1-default-pool-020b00c1-czlf
gke-cluster-1-default-pool-020b00c1-xhq2
gke-cluster-1-default-pool-7a5f9d4f-tjs6
gke-cluster-1-default-pool-7a5f9d4f-z4hq
gke-cluster-1-default-pool-7a5f9d4f-zxbt
gke-cluster-1-default-pool-ad91df86-1r4c
gke-cluster-1-default-pool-ad91df86-b0hv
gke-cluster-1-default-pool-ad91df86-rw8f

You'd like to assign pods to a specific node, gke-cluster-1-default-pool-020b00c1-9s2c. Perform the following steps:
- Label your nodes.
kubectl label nodes gke-cluster-1-default-pool-020b00c1-9s2c node-type=useme

kubectl label nodes gke-cluster-1-default-pool-020b00c1-czlf \
  gke-cluster-1-default-pool-020b00c1-xhq2 \
  gke-cluster-1-default-pool-7a5f9d4f-tjs6 \
  gke-cluster-1-default-pool-7a5f9d4f-z4hq \
  gke-cluster-1-default-pool-7a5f9d4f-zxbt \
  gke-cluster-1-default-pool-ad91df86-1r4c \
  gke-cluster-1-default-pool-ad91df86-b0hv \
  gke-cluster-1-default-pool-ad91df86-rw8f \
  node-type=nouseme
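To confirm the labels were applied as intended, you can list the nodes with the label shown as a column (the -L flag of kubectl get adds a column for the given label key):
# Shows each node with its node-type label value.
kubectl get nodes -L node-type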
- Create affinity-values.yaml with the following configuration. The &affinityBase anchor defines the affinity block once, and each *affinityBase alias reuses it for one of the agent's components. Because the rule uses requiredDuringSchedulingIgnoredDuringExecution, it is a hard requirement; Kubernetes also offers preferredDuringSchedulingIgnoredDuringExecution for soft preferences.

# 1) Define an anchor for the repeated affinity block.
affinityBase: &affinityBase
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values: [useme]
            - key: observeinc.com/unschedulable
              operator: DoesNotExist
            - key: kubernetes.io/os
              operator: NotIn
              values: [windows]

cluster-events:
  affinity: *affinityBase
cluster-metrics:
  affinity: *affinityBase
node-logs-metrics:
  affinity: *affinityBase
monitor:
  affinity: *affinityBase

# # Uncomment these if you are using Agent Chart v0.41+
# # observe-forward
# forwarder:
#   affinity: *affinityBase

- Run the following command to redeploy the Observe Agent in the observe namespace with the affinity configuration.
helm upgrade --reuse-values observe-agent observe/agent -n observe --values affinity-values.yaml

- Run the following command to make sure the Observe Agent has been redeployed successfully.
kubectl get pods -o wide -n observe
All pods are now assigned to the gke-cluster-1-default-pool-020b00c1-9s2c node as expected.
$ kubectl get pods -o wide -n observe
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
observe-agent-cluster-events-7bd9ccddcf-m6svd 1/1 Running 0 54s 10.248.8.6 gke-cluster-1-default-pool-020b00c1-9s2c <none> <none>
observe-agent-cluster-metrics-7fc5987bcb-v95rl 1/1 Running 0 54s 10.248.8.5 gke-cluster-1-default-pool-020b00c1-9s2c <none> <none>
observe-agent-monitor-b9f8c59c-9gpx2 1/1 Running 0 53s 10.248.8.8 gke-cluster-1-default-pool-020b00c1-9s2c <none> <none>
observe-agent-node-logs-metrics-agent-j59kh 1/1 Running 0 54s 10.248.8.7 gke-cluster-1-default-pool-020b00c1-9s2c <none> <none>
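If the pods do not land where you expect, a quick sanity check (assuming the release name observe-agent used above) is to confirm that Helm actually recorded the override:
# Prints the user-supplied values for the release, which should include the affinity stanza.
helm get values observe-agent -n observe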
Apply taints and tolerations configurations
Let's suppose you have a Kubernetes cluster with the following nine nodes:
gke-cluster-1-default-pool-7fbc1d97-pt9k
gke-cluster-1-default-pool-7fbc1d97-w262
gke-cluster-1-default-pool-7fbc1d97-wshj
gke-cluster-1-default-pool-8dab22e2-2mhx
gke-cluster-1-default-pool-8dab22e2-lrj8
gke-cluster-1-default-pool-8dab22e2-rkd7
gke-cluster-1-default-pool-c192ec15-1wnx
gke-cluster-1-default-pool-c192ec15-3717
gke-cluster-1-default-pool-c192ec15-gt34

You'd like to assign pods to a specific node, gke-cluster-1-default-pool-7fbc1d97-pt9k, using taints and tolerations. Perform the following steps:
- Taint the node:

kubectl taint nodes gke-cluster-1-default-pool-7fbc1d97-pt9k deployObserve=notAllowed:NoSchedule

- Check the taints:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

As you can see below, gke-cluster-1-default-pool-7fbc1d97-pt9k is tainted.
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
NAME                                       TAINTS
gke-cluster-1-default-pool-7fbc1d97-pt9k   [map[effect:NoSchedule key:deployObserve value:notAllowed]]
gke-cluster-1-default-pool-7fbc1d97-w262   <none>
gke-cluster-1-default-pool-7fbc1d97-wshj   <none>
gke-cluster-1-default-pool-8dab22e2-2mhx   <none>
gke-cluster-1-default-pool-8dab22e2-lrj8   <none>
gke-cluster-1-default-pool-8dab22e2-rkd7   <none>
gke-cluster-1-default-pool-c192ec15-1wnx   <none>
gke-cluster-1-default-pool-c192ec15-3717   <none>
gke-cluster-1-default-pool-c192ec15-gt34   <none>
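Alternatively, to inspect just the tainted node, you can print its taints with a JSONPath query (standard kubectl syntax; the node name matches the one tainted above):
# Prints the taints array of the single node as JSON.
kubectl get node gke-cluster-1-default-pool-7fbc1d97-pt9k -o jsonpath='{.spec.taints}'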
- Create taints-tolerations-values.yaml with the following configuration. As before, the &tolerationsBase anchor defines the tolerations array once and each *tolerationsBase alias reuses it for one of the agent's components.

# 1) Define the anchor for the repeated tolerations array:
tolerationsBase: &tolerationsBase
  - key: "deployObserve"
    operator: "Equal"
    value: "notAllowed"
    effect: "NoSchedule"

cluster-events:
  tolerations: *tolerationsBase
cluster-metrics:
  tolerations: *tolerationsBase
node-logs-metrics:
  tolerations: *tolerationsBase
monitor:
  tolerations: *tolerationsBase

# # Uncomment these if you are using Agent Chart v0.41+
# # observe-forward
# forwarder:
#   tolerations: *tolerationsBase
- Run the following command to redeploy the Observe Agent in the observe namespace with the tolerations configuration.

helm upgrade --reuse-values observe-agent observe/agent -n observe --values taints-tolerations-values.yaml
- Run the following command to make sure the Observe Agent has been redeployed successfully.
kubectl get pods -o wide -n observe
As you can see below, pods can now be scheduled onto the tainted gke-cluster-1-default-pool-7fbc1d97-pt9k node as expected: the cluster-events pod landed there, and the per-node node-logs-metrics agent (one pod per node) now covers it as well. Note that tolerations permit scheduling on the tainted node but do not force it, so the remaining pods are placed on other nodes.
$ kubectl get pods -o wide -n observe
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
observe-agent-cluster-events-7c95b84b6-xgx9v 1/1 Running 0 2m3s 10.232.0.6 gke-cluster-1-default-pool-7fbc1d97-pt9k <none> <none>
observe-agent-cluster-metrics-c84d7c769-v8gmr 1/1 Running 0 2m3s 10.232.4.5 gke-cluster-1-default-pool-c192ec15-gt34 <none> <none>
observe-agent-monitor-855569455b-5dqgf 1/1 Running 0 2m2s 10.232.5.5 gke-cluster-1-default-pool-c192ec15-3717 <none> <none>
observe-agent-node-logs-metrics-agent-6lnpg 1/1 Running 0 5m11s 10.232.1.5 gke-cluster-1-default-pool-7fbc1d97-w262 <none> <none>
observe-agent-node-logs-metrics-agent-ch2np 1/1 Running 0 87s 10.232.8.6 gke-cluster-1-default-pool-8dab22e2-lrj8 <none> <none>
observe-agent-node-logs-metrics-agent-g6lw6 1/1 Running 0 5m11s 10.232.7.5 gke-cluster-1-default-pool-8dab22e2-rkd7 <none> <none>
observe-agent-node-logs-metrics-agent-jxp94 1/1 Running 0 5m12s 10.232.6.12 gke-cluster-1-default-pool-8dab22e2-2mhx <none> <none>
observe-agent-node-logs-metrics-agent-kfgjc 1/1 Running 0 50s 10.232.5.6 gke-cluster-1-default-pool-c192ec15-3717 <none> <none>
observe-agent-node-logs-metrics-agent-lfwcj 1/1 Running 0 5m12s 10.232.2.5 gke-cluster-1-default-pool-7fbc1d97-wshj <none> <none>
observe-agent-node-logs-metrics-agent-lqdx7 1/1 Running 0 2m3s 10.232.0.5 gke-cluster-1-default-pool-7fbc1d97-pt9k <none> <none>
observe-agent-node-logs-metrics-agent-n8vh4 0/1 Running 0 14s 10.232.4.6 gke-cluster-1-default-pool-c192ec15-gt34 <none> <none>
observe-agent-node-logs-metrics-agent-rx4jb 1/1 Running 0 5m12s 10.232.3.4 gke-cluster-1-default-pool-c192ec15-1wnx <none> <none>

For more examples, see Example deployment scenarios in the Observe Helm chart documentation.
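One related tip: if you later want the node to accept untolerated pods again, you can remove the taint by appending a trailing hyphen to the same taint specification (standard kubectl syntax):
# Removes the deployObserve taint from the node.
kubectl taint nodes gke-cluster-1-default-pool-7fbc1d97-pt9k deployObserve=notAllowed:NoSchedule-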