
Security

1: Apply Pod Security Standards at the Cluster Level
2: Apply Pod Security Standards at the Namespace Level
3: Restrict a Container's Access to Resources with AppArmor
4: Restrict a Container's Syscalls with seccomp

Security is an important concern for most organizations and people who run Kubernetes clusters. You can find a basic security checklist elsewhere in the Kubernetes documentation.

To learn how to deploy and manage security aspects of Kubernetes, you can follow the tutorials in this section.

1 - Apply Pod Security Standards at the Cluster Level

Note

This tutorial applies only to new clusters.

Pod Security Admission is an admission controller that checks new Pods against the Kubernetes Pod Security Standards when they are created. The feature became generally available (GA) in Kubernetes v1.25. This tutorial shows you how to enforce the baseline Pod Security Standard at the cluster level, which applies a standard configuration to all namespaces in a cluster.

To apply Pod Security Standards to specific namespaces, refer to Apply Pod Security Standards at the namespace level.

If you are running a version of Kubernetes other than v1.33, check the documentation for that version.

Before you begin

Install the following on your workstation:

kind
kubectl

This tutorial demonstrates what you can configure for a Kubernetes cluster that you fully control. If you are learning how to configure Pod Security Admission for a managed cluster where you are not able to configure the control plane, read Apply Pod Security Standards at the namespace level.

Choose the right Pod Security Standard to apply

Pod Security Admission lets you apply built-in Pod Security Standards with the following modes: enforce, audit, and warn.

To gather information that helps you to choose the Pod Security Standards that are most appropriate for your configuration, do the following:

Create a cluster with no Pod Security Standards applied:

kind create cluster --name psa-wo-cluster-pss

The output is similar to:

Creating cluster "psa-wo-cluster-pss" ...
✓ Ensuring node image (kindest/node:v1.33.0) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-psa-wo-cluster-pss"
You can now use your cluster with:

kubectl cluster-info --context kind-psa-wo-cluster-pss

Thanks for using kind! 😊

Check that kubectl can reach the new cluster (kind has already set the kubectl context to it):

kubectl cluster-info --context kind-psa-wo-cluster-pss

The output is similar to this:

Kubernetes control plane is running at https://127.0.0.1:61350

CoreDNS is running at https://127.0.0.1:61350/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Get a list of namespaces in the cluster:

kubectl get ns

The output is similar to this:

NAME                 STATUS   AGE
default              Active   9m30s
kube-node-lease      Active   9m32s
kube-public          Active   9m32s
kube-system          Active   9m32s
local-path-storage   Active   9m26s

Use --dry-run=server to understand what happens when different Pod Security Standards are applied:

Privileged

kubectl label --dry-run=server --overwrite ns --all \
pod-security.kubernetes.io/enforce=privileged

The output is similar to:

namespace/default labeled
namespace/kube-node-lease labeled
namespace/kube-public labeled
namespace/kube-system labeled
namespace/local-path-storage labeled

Baseline

kubectl label --dry-run=server --overwrite ns --all \
pod-security.kubernetes.io/enforce=baseline

The output is similar to:

namespace/default labeled
namespace/kube-node-lease labeled
namespace/kube-public labeled
Warning: existing pods in namespace "kube-system" violate the new PodSecurity enforce level "baseline:latest"
Warning: etcd-psa-wo-cluster-pss-control-plane (and 3 other pods): host namespaces, hostPath volumes
Warning: kindnet-vzj42: non-default capabilities, host namespaces, hostPath volumes
Warning: kube-proxy-m6hwf: host namespaces, hostPath volumes, privileged
namespace/kube-system labeled
namespace/local-path-storage labeled

Restricted

kubectl label --dry-run=server --overwrite ns --all \
pod-security.kubernetes.io/enforce=restricted

The output is similar to:

namespace/default labeled
namespace/kube-node-lease labeled
namespace/kube-public labeled
Warning: existing pods in namespace "kube-system" violate the new PodSecurity enforce level "restricted:latest"
Warning: coredns-7bb9c7b568-hsptc (and 1 other pod): unrestricted capabilities, runAsNonRoot != true, seccompProfile
Warning: etcd-psa-wo-cluster-pss-control-plane (and 3 other pods): host namespaces, hostPath volumes, allowPrivilegeEscalation != false, unrestricted capabilities, restricted volume types, runAsNonRoot != true
Warning: kindnet-vzj42: non-default capabilities, host namespaces, hostPath volumes, allowPrivilegeEscalation != false, unrestricted capabilities, restricted volume types, runAsNonRoot != true, seccompProfile
Warning: kube-proxy-m6hwf: host namespaces, hostPath volumes, privileged, allowPrivilegeEscalation != false, unrestricted capabilities, restricted volume types, runAsNonRoot != true, seccompProfile
namespace/kube-system labeled
Warning: existing pods in namespace "local-path-storage" violate the new PodSecurity enforce level "restricted:latest"
Warning: local-path-provisioner-d6d9f7ffc-lw9lh: allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile
namespace/local-path-storage labeled

From the previous output, you'll notice that applying the privileged Pod Security Standard shows no warnings for any namespace. However, the baseline standard produces warnings for the kube-system namespace, and the restricted standard produces warnings for both the kube-system and local-path-storage namespaces.
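If you want to compare the three levels in one pass, the dry-run commands above could be wrapped in a loop. This is a sketch: pss_dry_run_all is a hypothetical helper name, and because it uses --dry-run=server, no namespace labels are actually changed.

```shell
# Hypothetical helper: preview each built-in Pod Security Standard level
# against every namespace without persisting any label changes.
pss_dry_run_all() {
  for level in privileged baseline restricted; do
    echo "== enforce=${level} =="
    # --dry-run=server lets the API server evaluate the change,
    # so Pod Security Admission still reports its warnings.
    kubectl label --dry-run=server --overwrite ns --all \
      "pod-security.kubernetes.io/enforce=${level}"
  done
}
```

For example, run pss_dry_run_all 2>&1 | less. kubectl prints the Warning lines on standard error, which is why 2>&1 is needed when you pipe the output.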

Set modes, versions and standards

In this section, you apply the following Pod Security Standards to the latest version:

baseline standard in enforce mode.
restricted standard in warn and audit mode.

The baseline Pod Security Standard provides a convenient middle ground that allows keeping the exemption list short and prevents known privilege escalations.

Additionally, to prevent pods from failing in kube-system, you'll exempt the namespace from having Pod Security Standards applied.

When you implement Pod Security Admission in your own environment, consider the following:

Based on the risk posture applied to a cluster, a stricter Pod Security Standard like restricted might be a better choice.
Exempting the kube-system namespace allows pods to run as privileged in this namespace. For real-world use, the Kubernetes project strongly recommends that you apply strict RBAC policies that limit access to kube-system, following the principle of least privilege.

To implement the preceding standards, do the following:

Create a configuration file that can be consumed by the Pod Security Admission Controller to implement these Pod Security Standards:

mkdir -p /tmp/pss
cat <<EOF > /tmp/pss/cluster-level-pss.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: [kube-system]
EOF

Note:

pod-security.admission.config.k8s.io/v1 configuration requires v1.25+. For v1.23 and v1.24, use v1beta1. For v1.22, use v1alpha1.
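If you generate this file for clusters of different versions, the rule from the preceding note could be encoded as a small function. This is a sketch that assumes you pass only the 1.x minor version number; pss_config_api_version is a hypothetical name, and versions before v1.22 are treated as unsupported.

```shell
# Hypothetical helper: print the PodSecurityConfiguration apiVersion for a
# given Kubernetes 1.x minor version, following the note above.
pss_config_api_version() {
  minor="$1"
  if [ "$minor" -ge 25 ]; then
    echo "pod-security.admission.config.k8s.io/v1"
  elif [ "$minor" -ge 23 ]; then
    echo "pod-security.admission.config.k8s.io/v1beta1"
  elif [ "$minor" -eq 22 ]; then
    echo "pod-security.admission.config.k8s.io/v1alpha1"
  else
    echo "unsupported Kubernetes minor version: 1.${minor}" >&2
    return 1
  fi
}
```

For example, pss_config_api_version 33 prints pod-security.admission.config.k8s.io/v1.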

Configure the API server to consume this file during cluster creation:

cat <<EOF > /tmp/pss/cluster-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
        extraArgs:
          admission-control-config-file: /etc/config/cluster-level-pss.yaml
        extraVolumes:
          - name: accf
            hostPath: /etc/config
            mountPath: /etc/config
            readOnly: false
            pathType: "DirectoryOrCreate"
  extraMounts:
  - hostPath: /tmp/pss
    containerPath: /etc/config
    # optional: if set, the mount is read-only.
    # default false
    readOnly: false
    # optional: if set, the mount needs SELinux relabeling.
    # default false
    selinuxRelabel: false
    # optional: set propagation mode (None, HostToContainer or Bidirectional)
    # see https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation
    # default None
    propagation: None
EOF

Note:

If you use Docker Desktop with kind on macOS, you can add /tmp as a Shared Directory under the menu item Preferences > Resources > File Sharing.

Create a cluster that uses Pod Security Admission to apply these Pod Security Standards:

kind create cluster --name psa-with-cluster-pss --config /tmp/pss/cluster-config.yaml

The output is similar to this:

Creating cluster "psa-with-cluster-pss" ...
 ✓ Ensuring node image (kindest/node:v1.33.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-psa-with-cluster-pss"
You can now use your cluster with:

kubectl cluster-info --context kind-psa-with-cluster-pss

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂

Point kubectl to the cluster:

kubectl cluster-info --context kind-psa-with-cluster-pss

The output is similar to this:

Kubernetes control plane is running at https://127.0.0.1:63855
CoreDNS is running at https://127.0.0.1:63855/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
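Optionally, you could confirm that the admission configuration reached the control plane. This sketch assumes kind's default naming, where the node container is named <cluster name>-control-plane and the API server static Pod is kube-apiserver-<node name>; check_pss_config is a hypothetical helper.

```shell
# Hypothetical helper: confirm that the kind control-plane node of cluster "$1"
# has the admission configuration mounted and that kube-apiserver was started
# with --admission-control-config-file.
# Assumes kind's default names: node container "<cluster>-control-plane",
# static Pod "kube-apiserver-<cluster>-control-plane".
check_pss_config() {
  cluster="$1"
  node="${cluster}-control-plane"
  # The file should be visible at the mountPath from cluster-config.yaml.
  docker exec "$node" cat /etc/config/cluster-level-pss.yaml || return 1
  # kubeadm renders extraArgs as command-line flags of the static Pod.
  kubectl --context "kind-${cluster}" -n kube-system \
    get pod "kube-apiserver-${node}" -o yaml \
    | grep -- '--admission-control-config-file' || return 1
}
```

For example: check_pss_config psa-with-cluster-pss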

Create a Pod in the default namespace:

security/example-baseline-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - image: nginx
      name: nginx
      ports:
        - containerPort: 80

kubectl apply -f https://k8s.io/examples/security/example-baseline-pod.yaml

The pod is started normally, but the output includes a warning:

Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/nginx created
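Each item in the warning names a field that the restricted standard expects. As a sketch, a Pod that sets all of those fields could look like the following. It assumes busybox:1.28 running sleep as UID 1000 instead of nginx, because the stock nginx image runs as root and runAsNonRoot would prevent it from starting; the Pod name and the file path /tmp/pss/example-restricted-pod.yaml are hypothetical.

```shell
mkdir -p /tmp/pss
cat <<EOF > /tmp/pss/example-restricted-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted
spec:
  securityContext:
    runAsNonRoot: true          # addresses "runAsNonRoot != true"
    runAsUser: 1000             # assumed non-root UID
    seccompProfile:
      type: RuntimeDefault      # addresses "seccompProfile"
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      allowPrivilegeEscalation: false   # addresses "allowPrivilegeEscalation != false"
      capabilities:
        drop: ["ALL"]                   # addresses "unrestricted capabilities"
EOF
```

You could then create it with kubectl apply -f /tmp/pss/example-restricted-pod.yaml; with these fields set, the restricted warning should no longer appear.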

Clean up

Now delete the clusters that you created above by running the following commands:

kind delete cluster --name psa-with-cluster-pss

kind delete cluster --name psa-wo-cluster-pss

What's next

Run a shell script to perform all the preceding steps at once:
1. Create a cluster-level configuration based on Pod Security Standards
2. Create a file that lets the API server consume this configuration
3. Create a cluster whose API server uses this configuration
4. Set the kubectl context to this new cluster
5. Create a minimal Pod YAML file
6. Apply this file to create a Pod in the new cluster
Pod Security Admission
Pod Security Standards
Apply Pod Security Standards at the namespace level

2 - Apply Pod Security Standards at the Namespace Level

Note

This tutorial applies only to new clusters.

Pod Security Admission is an admission controller that applies Pod Security Standards when pods are created. The feature became generally available (GA) in Kubernetes v1.25. In this tutorial, you will enforce the baseline Pod Security Standard, one namespace at a time.

You can also apply Pod Security Standards to multiple namespaces at once at the cluster level. For instructions, refer to Apply Pod Security Standards at the cluster level.

Before you begin

Install the following on your workstation:

kind
kubectl

Create cluster

Create a kind cluster as follows:

kind create cluster --name psa-ns-level

The output is similar to this:

Creating cluster "psa-ns-level" ...
 ✓ Ensuring node image (kindest/node:v1.33.0) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-psa-ns-level"
You can now use your cluster with:

kubectl cluster-info --context kind-psa-ns-level

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/

Check that kubectl can reach the new cluster (kind has already set the kubectl context to it):

kubectl cluster-info --context kind-psa-ns-level

The output is similar to this:

Kubernetes control plane is running at https://127.0.0.1:50996
CoreDNS is running at https://127.0.0.1:50996/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Create a namespace

Create a new namespace called example:

kubectl create ns example

The output is similar to this:

namespace/example created

Enable Pod Security Standards checking for that namespace

Enable Pod Security Standards on this namespace using labels supported by built-in Pod Security Admission. In this step you will configure a check to warn on Pods that don't meet the latest version of the baseline pod security standard.
```
kubectl label --overwrite ns example \
   pod-security.kubernetes.io/warn=baseline \
   pod-security.kubernetes.io/warn-version=latest
```

You can configure multiple Pod Security Standard checks on any namespace by using labels. The following command enforces the baseline Pod Security Standard, and warns and audits against the restricted Pod Security Standard, all at the latest version (the default value).

kubectl label --overwrite ns example \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/warn-version=latest \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=latest
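Since every mode takes a level label and a matching version label, you could generate the pairs instead of typing them. This is a sketch only: pss_labels is a hypothetical helper that prints both labels for one mode and defaults the version to latest, as in the commands above.

```shell
# Hypothetical helper: print the two Pod Security Admission labels for one mode.
# Usage: pss_labels <enforce|audit|warn> <privileged|baseline|restricted> [version]
pss_labels() {
  mode="$1"; level="$2"; version="${3:-latest}"
  case "$mode" in
    enforce|audit|warn) ;;
    *) echo "invalid mode: $mode" >&2; return 1 ;;
  esac
  case "$level" in
    privileged|baseline|restricted) ;;
    *) echo "invalid level: $level" >&2; return 1 ;;
  esac
  echo "pod-security.kubernetes.io/${mode}=${level} pod-security.kubernetes.io/${mode}-version=${version}"
}
```

For example, the previous command could be written as:

kubectl label --overwrite ns example \
  $(pss_labels enforce baseline) \
  $(pss_labels warn restricted) \
  $(pss_labels audit restricted)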

Verify the Pod Security Standard enforcement

Create a baseline Pod in the example namespace:

kubectl apply -n example -f https://k8s.io/examples/security/example-baseline-pod.yaml

The Pod starts, but the output includes a warning. For example:

Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/nginx created

Create a baseline Pod in the default namespace:

kubectl apply -n default -f https://k8s.io/examples/security/example-baseline-pod.yaml

Output is similar to this:

pod/nginx created

The Pod Security Standards enforcement and warning settings were applied only to the example namespace. You could create the same Pod in the default namespace with no warnings.
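To see at a glance which namespaces carry Pod Security labels, you could show those labels as extra columns with the -L flag of kubectl get. show_pss_labels is a hypothetical wrapper name.

```shell
# Hypothetical helper: list all namespaces with their Pod Security Admission
# labels shown as extra columns (empty when a label is not set).
show_pss_labels() {
  kubectl get ns \
    -L pod-security.kubernetes.io/enforce \
    -L pod-security.kubernetes.io/warn \
    -L pod-security.kubernetes.io/audit
}
```

After the steps above, running show_pss_labels should show baseline and restricted values only for the example namespace.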

Clean up

Now delete the cluster which you created above by running the following command:

kind delete cluster --name psa-ns-level

What's next

Run a shell script to perform all the preceding steps at once:
1. Create a kind cluster
2. Create a new namespace
3. Apply the baseline Pod Security Standard in enforce mode, and the restricted Pod Security Standard in warn and audit modes
4. Create a new Pod with these Pod Security Standards applied
Pod Security Admission
Pod Security Standards
Apply Pod Security Standards at the cluster level

3 - Restrict a Container's Access to Resources with AppArmor

FEATURE STATE: Kubernetes v1.31 [stable] (enabled by default: true)

This page shows you how to load AppArmor profiles on your nodes and enforce those profiles in Pods. To learn more about how Kubernetes can confine Pods using AppArmor, see Linux kernel security constraints for Pods and containers.

Objectives

See an example of how to load a profile on a Node
Learn how to enforce the profile on a Pod
Learn how to check that the profile is loaded
See what happens when a profile is violated
See what happens when a profile cannot be loaded

Before you begin

AppArmor is an optional kernel module and Kubernetes feature, so verify it is supported on your Nodes before proceeding:

AppArmor kernel module is enabled -- For the Linux kernel to enforce an AppArmor profile, the AppArmor kernel module must be installed and enabled. Several distributions enable the module by default, such as Ubuntu and SUSE, and many others provide optional support. To check whether the module is enabled, check the /sys/module/apparmor/parameters/enabled file:
```
cat /sys/module/apparmor/parameters/enabled
Y
```
The kubelet verifies that AppArmor is enabled on the host before admitting a pod with AppArmor explicitly configured.
Container runtime supports AppArmor -- All common Kubernetes-supported container runtimes should support AppArmor, including containerd and CRI-O. Please refer to the corresponding runtime documentation and verify that the cluster fulfills the requirements to use AppArmor.
Profile is loaded -- AppArmor is applied to a Pod by specifying an AppArmor profile that each container should be run with. If any of the specified profiles are not loaded in the kernel, the kubelet will reject the Pod. You can view which profiles are loaded on a node by checking the /sys/kernel/security/apparmor/profiles file. For example:
```
ssh gke-test-default-pool-239f5d02-gyn2 "sudo cat /sys/kernel/security/apparmor/profiles | sort"
```
```
apparmor-test-deny-write (enforce)
apparmor-test-audit-write (enforce)
docker-default (enforce)
k8s-nginx (enforce)
```
For more details on loading profiles on nodes, see Setting up nodes with profiles.
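Because the kubelet rejects a Pod whose profile is not loaded, you might check for a specific profile before creating the Pod. This is a sketch: apparmor_profile_loaded is a hypothetical helper that reads the profile list (by default /sys/kernel/security/apparmor/profiles, which usually requires root) and looks for an exact profile name.

```shell
# Hypothetical helper: succeed if profile "$1" appears in the AppArmor profile
# list "$2" (default: the kernel's list of loaded profiles).
apparmor_profile_loaded() {
  name="$1"
  profiles_file="${2:-/sys/kernel/security/apparmor/profiles}"
  # Each line looks like "<profile name> (<mode>)": strip the mode suffix,
  # then compare the remaining profile name exactly.
  awk -v want="$name" '
    { sub(/ \([^)]*\)$/, ""); if ($0 == want) found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$profiles_file"
}
```

For example, on a node as root: apparmor_profile_loaded k8s-nginx && echo loaded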

Securing a Pod

Note:

Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the documentation version selector to view the documentation with this deprecated API.

AppArmor profiles can be specified at the pod level or container level. The container AppArmor profile takes precedence over the pod profile.

securityContext:
  appArmorProfile:
    type: <profile_type>

Where <profile_type> is one of:

RuntimeDefault to use the runtime's default profile
Localhost to use a profile loaded on the host (see below)
Unconfined to run without AppArmor

See Specifying AppArmor Confinement for full details on the AppArmor profile API.
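To illustrate the precedence rule, the following sketch sets RuntimeDefault for the whole Pod and overrides it for one container with the Localhost profile used in the example later on this page (k8s-apparmor-example-deny-write, which must already be loaded on the node). The Pod name, container names, and the file path /tmp/apparmor-precedence.yaml are hypothetical.

```shell
cat <<EOF > /tmp/apparmor-precedence.yaml
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-precedence
spec:
  securityContext:
    appArmorProfile:
      type: RuntimeDefault        # default for every container in the Pod
  containers:
  - name: default-profile         # inherits RuntimeDefault from the Pod
    image: busybox:1.28
    command: ["sh", "-c", "sleep 1h"]
  - name: deny-write              # the container-level setting takes precedence
    image: busybox:1.28
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: k8s-apparmor-example-deny-write
EOF
```

After creating the Pod, kubectl exec apparmor-precedence -c deny-write -- cat /proc/1/attr/current should report k8s-apparmor-example-deny-write (enforce), while the default-profile container should report the runtime's default profile.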

To verify that the profile was applied, you can check that the container's root process is running with the correct profile by examining its proc attr:

kubectl exec <pod_name> -- cat /proc/1/attr/current

The output should look something like this:

cri-containerd.apparmor.d (enforce)
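For a Pod with several containers, you could repeat that check for each container. This is a sketch: show_apparmor_labels is a hypothetical helper, and it assumes every container image provides cat.

```shell
# Hypothetical helper: print the AppArmor label of PID 1 in every container
# of Pod "$1" in the current namespace.
show_apparmor_labels() {
  pod="$1"
  # jsonpath returns the container names separated by spaces.
  for c in $(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}'); do
    printf '%s: ' "$c"
    kubectl exec "$pod" -c "$c" -- cat /proc/1/attr/current
  done
}
```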

Example

This example assumes you have already set up a cluster with AppArmor support.

First, load the profile you want to use onto your Nodes. This profile blocks all file write operations:

#include <tunables/global>

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  file,

  # Deny all file writes.
  deny /** w,
}

The profile needs to be loaded onto all nodes, since you don't know where the pod will be scheduled. For this example you can use SSH to install the profiles, but other approaches are discussed in Setting up nodes with profiles.

# This example assumes that node names match host names, and are reachable via SSH.
NODES=($( kubectl get node -o jsonpath='{.items[*].status.addresses[?(.type == "Hostname")].address}' ))

for NODE in ${NODES[*]}; do ssh $NODE 'sudo apparmor_parser -q <<EOF
#include <tunables/global>

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  file,

  # Deny all file writes.
  deny /** w,
}
EOF'
done

Next, run a simple "Hello AppArmor" Pod with the deny-write profile:

pods/security/hello-apparmor.yaml

apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: k8s-apparmor-example-deny-write
  containers:
  - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]

Save the manifest above as hello-apparmor.yaml, then create the Pod:

kubectl create -f hello-apparmor.yaml

You can verify that the container is actually running with that profile by checking /proc/1/attr/current:

kubectl exec hello-apparmor -- cat /proc/1/attr/current

The output should be:

k8s-apparmor-example-deny-write (enforce)

Finally, you can see what happens if you violate the profile by writing to a file:

kubectl exec hello-apparmor -- touch /tmp/test

touch: /tmp/test: Permission denied
error: error executing remote command: command terminated with non-zero exit code: Error executing in Docker Container: 1

To wrap up, see what happens if you try to specify a profile that hasn't been loaded:

kubectl create -f /dev/stdin <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor-2
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: k8s-apparmor-example-allow-write
  containers:
  - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]
EOF

pod/hello-apparmor-2 created

Although the Pod was created successfully, further examination shows that it is stuck in the Pending state:

kubectl describe pod hello-apparmor-2

Name:          hello-apparmor-2
Namespace:     default
Node:          gke-test-default-pool-239f5d02-x1kf/10.128.0.27
Start Time:    Tue, 30 Aug 2016 17:58:56 -0700
Labels:        <none>
Annotations:   container.apparmor.security.beta.kubernetes.io/hello=localhost/k8s-apparmor-example-allow-write
Status:        Pending
... 
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  10s              default-scheduler  Successfully assigned default/hello-apparmor-2 to gke-test-default-pool-239f5d02-x1kf
  Normal   Pulled     8s               kubelet            Successfully pulled image "busybox:1.28" in 370.157088ms (370.172701ms including waiting)
  Normal   Pulling    7s (x2 over 9s)  kubelet            Pulling image "busybox:1.28"
  Warning  Failed     7s (x2 over 8s)  kubelet            Error: failed to get container spec opts: failed to generate apparmor spec opts: apparmor profile not found k8s-apparmor-example-allow-write
  Normal   Pulled     7s               kubelet            Successfully pulled image "busybox:1.28" in 90.980331ms (91.005869ms including waiting)

An Event provides the error message and the reason; the specific wording depends on the container runtime:

  Warning  Failed     7s (x2 over 8s)  kubelet            Error: failed to get container spec opts: failed to generate apparmor spec opts: apparmor profile not found

Administration

Setting up Nodes with profiles

Kubernetes 1.33 does not provide any built-in mechanisms for loading AppArmor profiles onto Nodes. Profiles can be loaded through custom infrastructure or tools like the Kubernetes Security Profiles Operator.

The scheduler is not aware of which profiles are loaded onto which Node, so the full set of profiles must be loaded onto every Node. An alternative approach is to add a Node label for each profile (or class of profiles) on the Node, and use a node selector to ensure the Pod is run on a Node with the required profile.
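A sketch of the labelling approach: the label key apparmor.example.com/k8s-apparmor-example-deny-write is a hypothetical naming convention (Kubernetes does not define it), and the helper name label_node_with_profile and the file path are hypothetical too. You would label each Node after loading the profile there, and give Pods that need the profile a matching nodeSelector.

```shell
# Hypothetical label convention: one label per loaded profile.
PROFILE=k8s-apparmor-example-deny-write
LABEL_KEY="apparmor.example.com/${PROFILE}"

# Mark Node "$1" as having the profile loaded (run after loading it there).
label_node_with_profile() {
  kubectl label node "$1" "${LABEL_KEY}=loaded" --overwrite
}

# Write a Pod manifest that only schedules onto labelled Nodes.
cat <<EOF > /tmp/hello-apparmor-selector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor-selector
spec:
  nodeSelector:
    ${LABEL_KEY}: loaded
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: ${PROFILE}
  containers:
  - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]
EOF
```

For example, label_node_with_profile <node name>, followed by kubectl create -f /tmp/hello-apparmor-selector.yaml.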

Authoring Profiles

Getting AppArmor profiles specified correctly can be a tricky business. Fortunately there are some tools to help with that:

aa-genprof and aa-logprof generate profile rules by monitoring an application's activity and logs, and admitting the actions it takes. Further instructions are provided by the AppArmor documentation.
bane is an AppArmor profile generator for Docker that uses a simplified profile language.

To debug problems with AppArmor, you can check the system logs to see what, specifically, was denied. AppArmor logs verbose messages to dmesg, and errors can usually be found in the system logs or through journalctl. More information is provided in AppArmor failures.
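Denials appear in the kernel log as audit lines that contain apparmor="DENIED" together with the profile name. A sketch of a filter for that output; apparmor_denials is a hypothetical helper, and reading the kernel log usually requires root.

```shell
# Hypothetical helper: keep only AppArmor denial lines from standard input,
# optionally restricted to one profile name.
apparmor_denials() {
  if [ -n "$1" ]; then
    grep 'apparmor="DENIED"' | grep -F "profile=\"$1\""
  else
    grep 'apparmor="DENIED"'
  fi
}
```

For example: sudo journalctl -k | apparmor_denials k8s-apparmor-example-deny-write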

Specifying AppArmor confinement

Caution:

Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the documentation version selector to view the documentation with this deprecated API.

AppArmor profile within security context

You can specify the appArmorProfile on either a container's securityContext or on a Pod's securityContext. If the profile is set at the pod level, it will be used as the default profile for all containers in the pod (including init, sidecar, and ephemeral containers). If both pod-level and container-level AppArmor profiles are set, the container's profile is used.

An AppArmor profile has 2 fields:

type (required) - indicates which kind of AppArmor profile will be applied. Valid options are:

Localhost: a profile pre-loaded on the node (specified by localhostProfile).
RuntimeDefault: the container runtime's default profile.
Unconfined: no AppArmor enforcement.

localhostProfile - The name of a profile loaded on the node that should be used. The profile must be preconfigured on the node to work. This option must be provided if and only if the type is Localhost.
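The rules above can be checked before a value goes into a manifest. This is a sketch: validate_apparmor_profile is a hypothetical helper that only checks the type/localhostProfile combination; it cannot tell whether the named profile is actually loaded on a node.

```shell
# Hypothetical helper: check a (type, localhostProfile) pair against the rules above.
# Usage: validate_apparmor_profile <type> [localhostProfile]
validate_apparmor_profile() {
  ptype="$1"; localhost_profile="$2"
  case "$ptype" in
    Localhost)
      [ -n "$localhost_profile" ] || {
        echo "localhostProfile is required when type is Localhost" >&2; return 1; }
      ;;
    RuntimeDefault|Unconfined)
      [ -z "$localhost_profile" ] || {
        echo "localhostProfile must not be set when type is $ptype" >&2; return 1; }
      ;;
    *)
      echo "unknown AppArmor profile type: $ptype" >&2; return 1 ;;
  esac
}
```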

What's next

Additional resources:

4 - Restrict a Container's Syscalls with seccomp

FEATURE STATE: Kubernetes v1.19 [stable]

Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12. It can be used to sandbox the privileges of a process, restricting the calls it is able to make from userspace into the kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a node to your Pods and containers.

Identifying the privileges required for your workloads can be difficult. In this tutorial, you will go through how to load seccomp profiles into a local Kubernetes cluster, how to apply them to a Pod, and how you can begin to craft profiles that give only the necessary privileges to your container processes.

Objectives

Learn how to load seccomp profiles on a node
Learn how to apply a seccomp profile to a container
Observe auditing of syscalls made by a container process
Observe behavior when a missing profile is specified
Observe a violation of a seccomp profile
Learn how to create fine-grained seccomp profiles
Learn how to apply a container runtime default seccomp profile

Before you begin

In order to complete all steps in this tutorial, you must install kind and kubectl.

The commands used in the tutorial assume that you are using Docker as your container runtime. (The cluster that kind creates may use a different container runtime internally). You could also use Podman but in that case, you would have to follow specific instructions in order to complete the tasks successfully.

This tutorial shows some examples that are still beta (since v1.25) and others that use only generally available seccomp functionality. You should make sure that your cluster is configured correctly for the version you are using.

The tutorial also uses the curl tool for downloading examples to your computer. You can adapt the steps to use a different tool if you prefer.

Note:

It is not possible to apply a seccomp profile to a container running with privileged: true set in the container's securityContext. Privileged containers always run as Unconfined.

Download example seccomp profiles

The contents of these profiles will be explored later on, but for now go ahead and download them into a directory named profiles/ so that they can be loaded into the cluster.

pods/security/seccomp/profiles/audit.json

{ "defaultAction": "SCMP_ACT_LOG" }

pods/security/seccomp/profiles/violation.json

{ "defaultAction": "SCMP_ACT_ERRNO" }

pods/security/seccomp/profiles/fine-grained.json

{ "defaultAction": "SCMP_ACT_ERRNO", "architectures": [ "SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32" ], "syscalls": [ { "names": [ "accept4", "epoll_wait", "pselect6", "futex", "madvise", "epoll_ctl", "getsockname", "setsockopt", "vfork", "mmap", "read", "write", "close", "arch_prctl", "sched_getaffinity", "munmap", "brk", "rt_sigaction", "rt_sigprocmask", "sigaltstack", "gettid", "clone", "bind", "socket", "openat", "readlinkat", "exit_group", "epoll_create1", "listen", "rt_sigreturn", "sched_yield", "clock_gettime", "connect", "dup2", "epoll_pwait", "execve", "exit", "fcntl", "getpid", "getuid", "ioctl", "mprotect", "nanosleep", "open", "poll", "recvfrom", "sendto", "set_tid_address", "setitimer", "writev", "fstatfs", "getdents64", "pipe2", "getrlimit" ], "action": "SCMP_ACT_ALLOW" } ] }
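The fine-grained profile has a simple shape: a default action plus one rule that allows a list of syscalls. As a sketch, a hypothetical make_seccomp_profile helper could print a profile of that shape; unlike fine-grained.json it leaves out the architectures list, and a profile with only a handful of syscalls would not let a real container start.

```shell
# Hypothetical helper: print a seccomp profile with a default action and a
# single rule that allows the listed syscalls.
# Usage: make_seccomp_profile <defaultAction> <syscall>...
make_seccomp_profile() {
  default_action="$1"; shift
  names=""
  for s in "$@"; do
    # Join the quoted syscall names with ", ".
    names="${names:+$names, }\"$s\""
  done
  printf '{ "defaultAction": "%s", "syscalls": [ { "names": [ %s ], "action": "SCMP_ACT_ALLOW" } ] }\n' \
    "$default_action" "$names"
}
```

For example, make_seccomp_profile SCMP_ACT_ERRNO read write prints { "defaultAction": "SCMP_ACT_ERRNO", "syscalls": [ { "names": [ "read", "write" ], "action": "SCMP_ACT_ALLOW" } ] }.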

Run these commands:

mkdir ./profiles
curl -L -o profiles/audit.json https://k8s.io/examples/pods/security/seccomp/profiles/audit.json
curl -L -o profiles/violation.json https://k8s.io/examples/pods/security/seccomp/profiles/violation.json
curl -L -o profiles/fine-grained.json https://k8s.io/examples/pods/security/seccomp/profiles/fine-grained.json
ls profiles

You should see three profiles listed at the end of the final step:

audit.json  fine-grained.json  violation.json

Create a local Kubernetes cluster with kind

For simplicity, kind can be used to create a single node cluster with the seccomp profiles loaded. Kind runs Kubernetes in Docker, so each node of the cluster is a container. This allows for files to be mounted in the filesystem of each container similar to loading files onto a node.

pods/security/seccomp/kind.yaml

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraMounts:
  - hostPath: "./profiles"
    containerPath: "/var/lib/kubelet/seccomp/profiles"

Download that example kind configuration, and save it to a file named kind.yaml:

curl -L -O https://k8s.io/examples/pods/security/seccomp/kind.yaml
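If you prefer not to download it, you could write the same configuration with a heredoc, in the style used elsewhere on this page. The content mirrors the kind.yaml shown above; because hostPath is relative, running this and kind create cluster from the directory that contains profiles/ avoids ambiguity about where it resolves.

```shell
# Write the kind configuration shown above to ./kind.yaml.
# hostPath is relative: run from the directory that contains profiles/.
cat <<EOF > kind.yaml
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraMounts:
  - hostPath: "./profiles"
    containerPath: "/var/lib/kubelet/seccomp/profiles"
EOF
```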

You can set a specific Kubernetes version by setting the node's container image. See Nodes within the kind documentation about configuration for more details on this. This tutorial assumes you are using Kubernetes v1.33.

As a beta feature, you can configure Kubernetes to use the profile that the container runtime prefers by default, rather than falling back to Unconfined. If you want to try that, see enable the use of RuntimeDefault as the default seccomp profile for all workloads before you continue.

Once you have a kind configuration in place, create the kind cluster with that configuration:

kind create cluster --config=kind.yaml

After the new Kubernetes cluster is ready, identify the Docker container running as the single node cluster:

docker ps

You should see output indicating that a container is running with name kind-control-plane. The output is similar to:

CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                       NAMES
6a96207fed4b        kindest/node:v1.18.2   "/usr/local/bin/entr…"   27 seconds ago      Up 24 seconds       127.0.0.1:42223->6443/tcp   kind-control-plane

If you look at the filesystem of that container, you should see that the profiles/ directory has been successfully loaded into the default seccomp path of the kubelet. Use docker exec to run a command in the container:

# Change 6a96207fed4b to the container ID you saw from "docker ps"
docker exec -it 6a96207fed4b ls /var/lib/kubelet/seccomp/profiles

audit.json  fine-grained.json  violation.json

You have verified that these seccomp profiles are available to the kubelet running within kind.

Create a Pod that uses the container runtime default seccomp profile

Most container runtimes provide a reasonable default set of syscalls that are allowed or blocked. You can adopt these defaults for your workload by setting the seccomp type in the security context of a Pod or container to RuntimeDefault.

Note:

If you have the seccompDefault configuration enabled, then Pods use the RuntimeDefault seccomp profile whenever no other seccomp profile is specified. Otherwise, the default is Unconfined.
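The precedence described above can be sketched in Python. This is a simplified model under stated assumptions, and effective_seccomp_type is a hypothetical name rather than a Kubernetes API: a container-level seccompProfile overrides the Pod-level one, and only when neither is set does the kubelet's seccompDefault setting choose between RuntimeDefault and Unconfined. It does not model every case the kubelet and container runtime handle.

```python
from typing import Optional


def effective_seccomp_type(pod_profile: Optional[str],
                           container_profile: Optional[str],
                           seccomp_default: bool) -> str:
    """Return the seccomp profile type a container ends up with (simplified)."""
    # A container-level securityContext.seccompProfile takes precedence.
    if container_profile is not None:
        return container_profile
    # Otherwise the Pod-level securityContext.seccompProfile applies.
    if pod_profile is not None:
        return pod_profile
    # Nothing set: the kubelet's seccompDefault setting decides.
    return "RuntimeDefault" if seccomp_default else "Unconfined"


# default-pod below sets RuntimeDefault at the Pod level only.
print(effective_seccomp_type("RuntimeDefault", None, seccomp_default=False))  # RuntimeDefault
```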

Here's a manifest for a Pod that requests the RuntimeDefault seccomp profile for all its containers:

pods/security/seccomp/ga/default-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: default-pod
  labels:
    app: default-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some more syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Create that Pod:

kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/default-pod.yaml

kubectl get pod default-pod

The Pod should be showing as having started successfully:

NAME        READY   STATUS    RESTARTS   AGE
default-pod 1/1     Running   0          20s

Delete the Pod before moving to the next section:

kubectl delete pod default-pod --wait --now

Create a Pod with a seccomp profile for syscall auditing

To start, apply the audit.json profile, which logs all syscalls made by the process, to a new Pod.

Here's a manifest for that Pod:

pods/security/seccomp/ga/audit-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Note:

Older versions of Kubernetes allowed you to configure seccomp behavior using annotations. Kubernetes 1.33 only supports using fields within .spec.securityContext to configure seccomp, and this tutorial explains that approach.
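The localhostProfile value is not an absolute path on the node. Kubernetes requires it to be a descending path relative to the kubelet's configured seccomp profile location, which in this kind setup is where ./profiles is mounted. The following Python sketch shows that mapping, assuming the kubelet's default root directory (/var/lib/kubelet); resolve_localhost_profile is hypothetical and only approximates the real validation.

```python
import posixpath

# Assumption: the kubelet runs with its default root directory, so it
# resolves Localhost profiles under /var/lib/kubelet/seccomp.
SECCOMP_ROOT = "/var/lib/kubelet/seccomp"


def resolve_localhost_profile(localhost_profile, root=SECCOMP_ROOT):
    """Return the node path that a localhostProfile value refers to.

    The value must be a descending path relative to the seccomp root, so
    absolute paths and paths that escape the root via '..' are rejected.
    """
    if posixpath.isabs(localhost_profile):
        raise ValueError("localhostProfile must be a relative path")
    normalized = posixpath.normpath(localhost_profile)
    if normalized in (".", "..") or normalized.startswith("../"):
        raise ValueError("localhostProfile must point to a file below the root")
    return posixpath.join(root, normalized)


print(resolve_localhost_profile("profiles/audit.json"))
# /var/lib/kubelet/seccomp/profiles/audit.json
```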

Create the Pod in the cluster:

kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/audit-pod.yaml

This profile does not restrict any syscalls, so the Pod should start successfully.

kubectl get pod audit-pod

NAME        READY   STATUS    RESTARTS   AGE
audit-pod   1/1     Running   0          30s

To interact with the endpoint exposed by this container, create a NodePort Service that allows access to the endpoint from inside the kind control plane container.

kubectl expose pod audit-pod --type NodePort --port 5678

Check what port the Service has been assigned on the node.

kubectl get service audit-pod

The output is similar to:

NAME        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
audit-pod   NodePort   10.111.36.142   <none>        5678:32373/TCP   72s
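If you script these steps, you need the node port, the number after the colon in the PORT(S) column. kubectl can print it directly with kubectl get service audit-pod -o jsonpath='{.spec.ports[0].nodePort}'. If you only have the table output, a small Python sketch like the following can parse it; node_port is a hypothetical helper, and it assumes kubectl separates multiple ports with commas.

```python
import re


def node_port(ports_column, service_port):
    """Return the nodePort mapped to service_port in a kubectl PORT(S) value.

    Each entry looks like <port>:<nodePort>/<protocol>; entries for multiple
    ports are assumed to be comma-separated.
    """
    for entry in ports_column.split(","):
        match = re.fullmatch(r"(\d+):(\d+)/[A-Z]+", entry.strip())
        if match and int(match.group(1)) == service_port:
            return int(match.group(2))
    raise ValueError(f"no node port found for port {service_port}")


print(node_port("5678:32373/TCP", 5678))  # 32373
```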

Now you can use curl to access that endpoint from inside the kind control plane container, at the port exposed by this Service. Use docker exec to run the curl command inside the control plane container:

# Change 6a96207fed4b to the control plane container ID you saw from "docker ps", and 32373 to the node port you saw from "kubectl get service"
docker exec -it 6a96207fed4b curl localhost:32373

just made some syscalls!

You can see that the process is running, but what syscalls did it actually make? Because this Pod is running in a local cluster, you should be able to see those in /var/log/syslog on your local system. Open up a new terminal window and tail the output for calls from http-echo:

# The log path on your computer might be different from "/var/log/syslog"
tail -f /var/log/syslog | grep 'http-echo'

You should already see some logs of syscalls made by http-echo, and if you run curl again inside the control plane container you will see more output written to the log.

For example:

Jul  6 15:37:40 my-machine kernel: [369128.669452] audit: type=1326 audit(1594067860.484:14536): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=51 compat=0 ip=0x46fe1f code=0x7ffc0000
Jul  6 15:37:40 my-machine kernel: [369128.669453] audit: type=1326 audit(1594067860.484:14537): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=54 compat=0 ip=0x46fdba code=0x7ffc0000
Jul  6 15:37:40 my-machine kernel: [369128.669455] audit: type=1326 audit(1594067860.484:14538): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=202 compat=0 ip=0x455e53 code=0x7ffc0000
Jul  6 15:37:40 my-machine kernel: [369128.669456] audit: type=1326 audit(1594067860.484:14539): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=288 compat=0 ip=0x46fdba code=0x7ffc0000
Jul  6 15:37:40 my-machine kernel: [369128.669517] audit: type=1326 audit(1594067860.484:14540): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=0 compat=0 ip=0x46fd44 code=0x7ffc0000
Jul  6 15:37:40 my-machine kernel: [369128.669519] audit: type=1326 audit(1594067860.484:14541): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=270 compat=0 ip=0x4559b1 code=0x7ffc0000
Jul  6 15:38:40 my-machine kernel: [369188.671648] audit: type=1326 audit(1594067920.488:14559): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=270 compat=0 ip=0x4559b1 code=0x7ffc0000
Jul  6 15:38:40 my-machine kernel: [369188.671726] audit: type=1326 audit(1594067920.488:14560): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=202 compat=0 ip=0x455e53 code=0x7ffc0000

You can begin to understand the syscalls required by the http-echo process by looking at the syscall= entry on each line. While these entries are unlikely to cover every syscall the process uses, they can serve as a basis for a seccomp profile for this container.
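The syscall= values are numbers, so mapping them to names makes the log easier to read. The Python sketch below filters seccomp audit records (type=1326) for one process name and counts the syscalls; syscalls_for is hypothetical, and its lookup table is a deliberately partial x86_64 table (matching arch=c000003e in the log) that covers only the numbers in the sample output. Anything else is reported as syscall_<number>.

```python
import re
from collections import Counter

# Partial x86_64 syscall table (the log's arch=c000003e), limited to the
# numbers in the sample output above. Extend it for your own logs.
X86_64_SYSCALLS = {
    0: "read",
    51: "getsockname",
    54: "setsockopt",
    202: "futex",
    270: "pselect6",
    288: "accept4",
}

AUDIT_RE = re.compile(r'comm="(?P<comm>[^"]+)".*?\bsyscall=(?P<nr>\d+)')


def syscalls_for(log_lines, comm="http-echo"):
    """Count the syscalls that seccomp logged for processes named comm."""
    counts = Counter()
    for line in log_lines:
        # type=1326 marks a seccomp audit record in the kernel log.
        if "type=1326" not in line:
            continue
        match = AUDIT_RE.search(line)
        if match and match.group("comm") == comm:
            number = int(match.group("nr"))
            counts[X86_64_SYSCALLS.get(number, f"syscall_{number}")] += 1
    return counts
```

To use it on your machine, pass an open log file, for example syscalls_for(open("/var/log/syslog", errors="replace")), adjusting the path as in the tail command above.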

Delete the Service and the Pod before moving to the next section:

kubectl delete service audit-pod --wait
kubectl delete pod audit-pod --wait --now

Create a Pod with a seccomp profile that causes violation

For demonstration, apply a profile to the Pod that does not allow any syscalls.

The manifest for this demonstration is:

pods/security/seccomp/ga/violation-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: violation-pod
  labels:
    app: violation-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/violation.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Attempt to create the Pod in the cluster:

kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/violation-pod.yaml

The Pod is created, but there is an issue. If you check the status of the Pod, you should see that it failed to start.

kubectl get pod violation-pod

NAME            READY   STATUS             RESTARTS   AGE
violation-pod   0/1     CrashLoopBackOff   1          6s

As seen in the previous example, the http-echo process requires quite a few syscalls. Here seccomp has been instructed to error on any syscall by setting "defaultAction": "SCMP_ACT_ERRNO". This is extremely secure, but removes the ability to do anything meaningful. What you really want is to give workloads only the privileges they need.
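To see why every syscall fails, it helps to think of a profile as a lookup: a matching rule decides the action, and otherwise defaultAction applies. The Python sketch below models only that name-based lookup; action_for is hypothetical, it uses the first matching rule, and it ignores argument filters, architectures and how libseccomp actually orders rules. The violation profile here is assumed to have the shape the text describes: no rules and a defaultAction of SCMP_ACT_ERRNO.

```python
def action_for(profile, syscall):
    """Return the action a simplified, name-only seccomp profile applies.

    The first rule whose "names" list contains the syscall decides; if no
    rule matches, the profile's defaultAction applies.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


# A profile shaped like the one described above: no rules, deny by default.
violation = {"defaultAction": "SCMP_ACT_ERRNO"}
for name in ("read", "futex", "accept4"):
    print(name, action_for(violation, name))  # every line ends in SCMP_ACT_ERRNO
```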

Delete the Pod before moving to the next section:

kubectl delete pod violation-pod --wait --now

Create a Pod with a seccomp profile that only allows necessary syscalls

If you look at the fine-grained.json profile, you will notice some of the syscalls that appeared in syslog in the first example, where the profile set "defaultAction": "SCMP_ACT_LOG". This profile instead sets "defaultAction": "SCMP_ACT_ERRNO", but explicitly allows a set of syscalls in the "action": "SCMP_ACT_ALLOW" block. Ideally, the container will run successfully and you will see no messages sent to syslog.
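Because the audit run already surfaced syscall names, you could draft a profile of the same deny-by-default shape from them. The following Python sketch, using a hypothetical build_allowlist_profile helper, emits JSON with "defaultAction": "SCMP_ACT_ERRNO" and a single "action": "SCMP_ACT_ALLOW" rule. The x86_64-only architectures value is an assumption, and a list gathered from a short audit run will probably miss syscalls, so treat the result as a starting point rather than a finished profile.

```python
import json


def build_allowlist_profile(syscall_names, architectures=("SCMP_ARCH_X86_64",)):
    """Build a deny-by-default seccomp profile that allows only syscall_names."""
    return {
        # Any syscall not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(architectures),
        "syscalls": [
            {
                # Deduplicate and sort so the output is stable between runs.
                "names": sorted(set(syscall_names)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


observed = ["futex", "read", "accept4", "futex", "pselect6"]
print(json.dumps(build_allowlist_profile(observed), indent=2))
```

To try such a profile, you would save the output under ./profiles so that the kind mount makes it visible to the kubelet, and then reference it with localhostProfile as with the profiles above.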

The manifest for this example is:

pods/security/seccomp/ga/fine-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: fine-pod
  labels:
    app: fine-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/fine-grained.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Create the Pod in your cluster:

kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/fine-pod.yaml

kubectl get pod fine-pod

The Pod should be showing as having started successfully:

NAME        READY   STATUS    RESTARTS   AGE
fine-pod    1/1     Running   0          30s

Open up a new terminal window and use tail to monitor for log entries that mention calls from http-echo:

# The log path on your computer might be different from "/var/log/syslog"
tail -f /var/log/syslog | grep 'http-echo'

Next, expose the Pod with a NodePort Service:

kubectl expose pod fine-pod --type NodePort --port 5678

Check what port the Service has been assigned on the node:

kubectl get service fine-pod

The output is similar to:

NAME        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
fine-pod    NodePort   10.111.36.142   <none>        5678:32373/TCP   72s

Use curl to access that endpoint from inside the kind control plane container:

# Change 6a96207fed4b to the control plane container ID you saw from "docker ps", and 32373 to the node port you saw from "kubectl get service"
docker exec -it 6a96207fed4b curl localhost:32373

just made some syscalls!

You should see no output in the syslog. This is because the profile allowed all necessary syscalls and specified that an error should occur if one outside of the list is invoked. This is an ideal situation from a security perspective, but it required some effort to analyze the program. It would be nice if there were a simpler way to get closer to this level of security with less effort.

Delete the Service and the Pod before moving to the next section:

kubectl delete service fine-pod --wait
kubectl delete pod fine-pod --wait --now

Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads

FEATURE STATE: Kubernetes v1.27 [stable]

To use seccomp profile defaulting, you must run the kubelet with the --seccomp-default command line flag enabled for each node where you want to use it.

If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined by the container runtime, instead of using the Unconfined (seccomp disabled) mode. The default profiles aim to provide a strong set of security defaults while preserving the functionality of the workload. It is possible that the default profiles differ between container runtimes and their release versions, for example when comparing those from CRI-O and containerd.

Note:

Enabling the feature will neither change the Kubernetes securityContext.seccompProfile API field nor add the deprecated annotations to the workload. This lets users roll back at any time without changing the workload configuration. Tools like crictl inspect can be used to verify which seccomp profile a container is using.

Some workloads may require fewer syscall restrictions than others. This means that they can fail at runtime even with the RuntimeDefault profile. To mitigate such a failure, you can:

Run the workload explicitly as Unconfined.
Disable seccomp defaulting on the affected nodes, and make sure that the workloads are scheduled on nodes where it is disabled.
Create a custom seccomp profile for the workload.

If you are introducing this feature into a production-like cluster, the Kubernetes project recommends that you enable it on a subset of your nodes and then test workload execution before rolling the change out cluster-wide.

You can find more detailed information about a possible upgrade and downgrade strategy in the related Kubernetes Enhancement Proposal (KEP): Enable seccomp by default.

Kubernetes 1.33 lets you configure the seccomp profile that applies when the spec for a Pod doesn't define a specific seccomp profile. However, you still need to enable this defaulting for each node where you would like to use it.

If you are running a Kubernetes 1.33 cluster and want to enable the feature, either run the kubelet with the --seccomp-default command line flag, or set seccompDefault: true in the kubelet configuration file. To enable it in kind, make sure that the kind node image provides a recent enough Kubernetes version, and pass the seccomp-default flag to each node's kubelet in the kind configuration:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.28.0@sha256:9f3ff58f19dcf1a0611d11e8ac989fdb30a28f40f236f59f0bea31fb956ccf5c
    kubeadmConfigPatches:
      # kind bootstraps the first control-plane node with "kubeadm init",
      # so the kubelet flag for this node goes into InitConfiguration
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            seccomp-default: "true"
  - role: worker
    image: kindest/node:v1.28.0@sha256:9f3ff58f19dcf1a0611d11e8ac989fdb30a28f40f236f59f0bea31fb956ccf5c
    kubeadmConfigPatches:
      # worker nodes run "kubeadm join", so they read JoinConfiguration
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            seccomp-default: "true"

Once the cluster is ready, run a Pod:

kubectl run --rm -it --restart=Never --image=alpine alpine -- sh

The Pod should now have the default seccomp profile attached. You can verify this by using docker exec to run crictl inspect for the container on the kind worker:

docker exec -it kind-worker bash -c \
    'crictl inspect $(crictl ps --name=alpine -q) | jq .info.runtimeSpec.linux.seccomp'

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": ["..."]
    }
  ]
}
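If you prefer not to rely on jq, or want to check the result in a script, the same field can be read from the full crictl inspect output. The Python sketch below assumes the JSON layout that the jq filter above uses (.info.runtimeSpec.linux.seccomp); seccomp_summary is a hypothetical helper, and treating a missing seccomp section as Unconfined is an interpretation rather than something crictl reports.

```python
import json


def seccomp_summary(inspect_json):
    """Summarize the seccomp section of `crictl inspect` output.

    Assumes the layout used by the jq filter above:
    .info.runtimeSpec.linux.seccomp
    """
    data = json.loads(inspect_json)
    seccomp = (data.get("info", {})
                   .get("runtimeSpec", {})
                   .get("linux", {})
                   .get("seccomp"))
    if not seccomp:
        # No filter in the OCI runtime spec means seccomp is not applied.
        return "no seccomp filter (Unconfined)"
    # Count how many syscall names the rules mention in total.
    names = sum(len(rule.get("names", [])) for rule in seccomp.get("syscalls", []))
    return f"defaultAction={seccomp.get('defaultAction')}, {names} syscall names in rules"
```

For example, you could save the output of docker exec kind-worker bash -c 'crictl inspect $(crictl ps --name=alpine -q)' to a file named inspect.json and pass its contents to seccomp_summary.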

What's next

You can learn more about Linux seccomp:
