kubeadm - CLI tool to easily provision a secure Kubernetes cluster.
Components
kubelet - The
primary agent that runs on each node. The kubelet takes a set of PodSpecs
and ensures that the described containers are running and healthy.
kube-apiserver -
REST API that validates and configures data for API objects such as pods,
services, replication controllers.
kube-controller-manager -
Daemon that embeds the core control loops shipped with Kubernetes.
kube-proxy - Can
do simple TCP/UDP stream forwarding or round-robin TCP/UDP forwarding across
a set of back-ends.
kube-scheduler -
Scheduler that manages availability, performance, and capacity.
List of ports and protocols that
should be open on control plane and worker nodes
Config APIs
This section hosts the documentation for "unpublished" APIs which are used to
configure kubernetes components or tools. Most of these APIs are not exposed
by the API server in a RESTful way though they are essential for a user or an
operator to use or manage a cluster.
This section provides reference information for the Kubernetes API.
The REST API is the fundamental fabric of Kubernetes. All operations and
communications between components, and external user commands are REST API
calls that the API Server handles. Consequently, everything in the Kubernetes
platform is treated as an API object and has a corresponding entry in the
API.
The JSON and Protobuf serialization schemas follow the same guidelines for
schema changes. The following descriptions cover both formats.
The API versioning and software versioning are indirectly related.
The API and release versioning proposal
describes the relationship between API versioning and software versioning.
Different API versions indicate different levels of stability and support. You
can find more information about the criteria for each level in the
API Changes documentation.
Here's a summary of each level:
Alpha:
The version names contain alpha (for example, v1alpha1).
Built-in alpha API versions are disabled by default and must be explicitly enabled in the kube-apiserver configuration to be used.
The software may contain bugs. Enabling a feature may expose bugs.
Support for an alpha API may be dropped at any time without notice.
The API may change in incompatible ways in a later software release without notice.
The software is recommended for use only in short-lived testing clusters,
due to increased risk of bugs and lack of long-term support.
Beta:
The version names contain beta (for example, v2beta3).
Built-in beta API versions are disabled by default and must be explicitly enabled in the kube-apiserver configuration to be used
(except for beta versions of APIs introduced prior to Kubernetes 1.22, which were enabled by default).
Built-in beta API versions have a maximum lifetime of 9 months or 3 minor releases (whichever is longer) from introduction
to deprecation, and 9 months or 3 minor releases (whichever is longer) from deprecation to removal.
The software is well tested. Enabling a feature is considered safe.
The support for a feature will not be dropped, though the details may change.
The schema and/or semantics of objects may change in incompatible ways in
a subsequent beta or stable API version. When this happens, migration
instructions are provided. Adapting to a subsequent beta or stable API version
may require editing or re-creating API objects, and may not be straightforward.
The migration may require downtime for applications that rely on the feature.
The software is not recommended for production uses. Subsequent releases
may introduce incompatible changes. Use of beta API versions is
required to transition to subsequent beta or stable API versions
once the beta API version is deprecated and no longer served.
Note:
Please try beta features and provide feedback. After the features exit beta, it
may not be practical to make more changes.
Stable:
The version name is vX where X is an integer.
Stable API versions remain available for all future releases within a Kubernetes major version,
and there are no current plans for a major version revision of Kubernetes that removes stable APIs.
API groups
API groups
make it easier to extend the Kubernetes API.
The API group is specified in a REST path and in the apiVersion field of a
serialized object.
There are several API groups in Kubernetes:
The core (also called legacy) group is found at REST path /api/v1.
The core group is not specified as part of the apiVersion field, for
example, apiVersion: v1.
The named groups are at REST path /apis/$GROUP_NAME/$VERSION and use
apiVersion: $GROUP_NAME/$VERSION (for example, apiVersion: batch/v1).
You can find the full list of supported API groups in
Kubernetes API reference.
Enabling or disabling API groups
Certain resources and API groups are enabled by default. You can enable or
disable them by setting --runtime-config on the API server. The
--runtime-config flag accepts comma separated <key>[=<value>] pairs
describing the runtime configuration of the API server. If the =<value>
part is omitted, it is treated as if =true is specified. For example:
to disable batch/v1, set --runtime-config=batch/v1=false
to enable batch/v2alpha1, set --runtime-config=batch/v2alpha1
to enable a specific version of an API, such as storage.k8s.io/v1beta1/csistoragecapacities, set --runtime-config=storage.k8s.io/v1beta1/csistoragecapacities
Note:
When you enable or disable groups or resources, you need to restart the API
server and controller manager to pick up the --runtime-config changes.
Persistence
Kubernetes stores its serialized state in terms of the API resources by writing them into
etcd.
The Kubernetes API is a resource-based (RESTful) programmatic interface
provided via HTTP. It supports retrieving, creating, updating, and deleting
primary resources via the standard HTTP verbs (POST, PUT, PATCH, DELETE,
GET).
For some resources, the API includes additional subresources that allow
fine grained authorization (such as separate views for Pod details and
log retrievals), and can accept and serve those resources in different
representations for convenience or efficiency.
Kubernetes supports efficient change notifications on resources via
watches:
in the Kubernetes API, watch is a verb that is used to track changes to an object in Kubernetes as a stream.
It is used for the efficient detection of changes.
Kubernetes also provides consistent list operations so that API clients can
effectively cache, track, and synchronize the state of resources.
You can view the API reference online,
or read on to learn about the API in general.
Kubernetes API terminology
Kubernetes generally leverages common RESTful terminology to describe the
API concepts:
A resource type is the name used in the URL (pods, namespaces, services)
All resource types have a concrete representation (their object schema) which is called a kind
A list of instances of a resource type is known as a collection
A single instance of a resource type is called a resource, and also usually represents an object
For some resource types, the API includes one or more sub-resources, which are represented as URI paths below the resource
Most Kubernetes API resource types are
objects –
they represent a concrete instance of a concept on the cluster, like a
pod or namespace. A smaller number of API resource types are virtual in
that they often represent operations on objects, rather than objects, such
as a permission check
(use a POST with a JSON-encoded body of SubjectAccessReview to the
subjectaccessreviews resource), or the eviction sub-resource of a Pod
(used to trigger
API-initiated eviction).
Object names
All objects you can create via the API have a unique object
name to allow idempotent creation and
retrieval, except that virtual resource types may not have unique names if they are
not retrievable, or do not rely on idempotency.
Within a namespace, only one object
of a given kind can have a given name at a time. However, if you delete the object,
you can make a new object with the same name. Some objects are not namespaced (for
example: Nodes), and so their names must be unique across the whole cluster.
API verbs
Almost all object resource types support the standard HTTP verbs - GET, POST, PUT, PATCH,
and DELETE. Kubernetes also uses its own verbs, which are often written lowercase to distinguish
them from HTTP verbs.
Kubernetes uses the term list to describe returning a collection of
resources to distinguish from retrieving a single resource which is usually called
a get. If you sent an HTTP GET request with the ?watch query parameter,
Kubernetes calls this a watch and not a get (see
Efficient detection of changes for more details).
For PUT requests, Kubernetes internally classifies these as either create or update
based on the state of the existing object. An update is different from a patch; the
HTTP verb for a patch is PATCH.
Resource URIs
All resource types are either scoped by the cluster (/apis/GROUP/VERSION/*) or to a
namespace (/apis/GROUP/VERSION/namespaces/NAMESPACE/*). A namespace-scoped resource
type will be deleted when its namespace is deleted and access to that resource type
is controlled by authorization checks on the namespace scope.
Note: core resources use /api instead of /apis and omit the GROUP path segment.
You can also access collections of resources (for example: listing all Nodes).
The following paths are used to retrieve collections and resources:
Cluster-scoped resources:
GET /apis/GROUP/VERSION/RESOURCETYPE - return the collection of resources of the resource type
GET /apis/GROUP/VERSION/RESOURCETYPE/NAME - return the resource with NAME under the resource type
Namespace-scoped resources:
GET /apis/GROUP/VERSION/RESOURCETYPE - return the collection of all instances of the resource type across all namespaces
GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE - return collection of all instances of the resource type in NAMESPACE
GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME - return the instance of the resource type with NAME in NAMESPACE
Since a namespace is a cluster-scoped resource type, you can retrieve the list
(“collection”) of all namespaces with GET /api/v1/namespaces and details about
a particular namespace with GET /api/v1/namespaces/NAME.
Cluster-scoped subresource: GET /apis/GROUP/VERSION/RESOURCETYPE/NAME/SUBRESOURCE
Namespace-scoped subresource: GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME/SUBRESOURCE
The verbs supported for each subresource will differ depending on the object -
see the API reference for more information. It
is not possible to access sub-resources across multiple resources - generally a new
virtual resource type would be used if that becomes necessary.
Efficient detection of changes
The Kubernetes API allows clients to make an initial request for an object or a
collection, and then to track changes since that initial request: a watch. Clients
can send a list or a get and then make a follow-up watch request.
To make this change tracking possible, every Kubernetes object has a resourceVersion
field representing the version of that resource as stored in the underlying persistence
layer. When retrieving a collection of resources (either namespace or cluster scoped),
the response from the API server contains a resourceVersion value. The client can
use that resourceVersion to initiate a watch against the API server.
When you send a watch request, the API server responds with a stream of
changes. These changes itemize the outcome of operations (such as create, delete,
and update) that occurred after the resourceVersion you specified as a parameter
to the watch request. The overall watch mechanism allows a client to fetch
the current state and then subscribe to subsequent changes, without missing any events.
If a client watch is disconnected then that client can start a new watch from
the last returned resourceVersion; the client could also perform a fresh get /
list request and begin again. See Resource Version Semantics
for more detail.
For example:
List all of the pods in a given namespace.
GET /api/v1/namespaces/test/pods
---
200 OK
Content-Type: application/json
{
"kind": "PodList",
"apiVersion": "v1",
"metadata": {"resourceVersion":"10245"},
"items": [...]
}
Starting from resource version 10245, receive notifications of any API operations
(such as create, delete, patch or update) that affect Pods in the
test namespace. Each change notification is a JSON document. The HTTP response body
(served as application/json) consists a series of JSON documents.
A given Kubernetes server will only preserve a historical record of changes for a
limited time. Clusters using etcd 3 preserve changes in the last 5 minutes by default.
When the requested watch operations fail because the historical version of that
resource is not available, clients must handle the case by recognizing the status code
410 Gone, clearing their local cache, performing a new get or list operation,
and starting the watch from the resourceVersion that was returned.
For subscribing to collections, Kubernetes client libraries typically offer some form
of standard tool for this list-then-watch logic. (In the Go client library,
this is called a Reflector and is located in the k8s.io/client-go/tools/cache package.)
Watch bookmarks
To mitigate the impact of short history window, the Kubernetes API provides a watch
event named BOOKMARK. It is a special kind of event to mark that all changes up
to a given resourceVersion the client is requesting have already been sent. The
document representing the BOOKMARK event is of the type requested by the request,
but only includes a .metadata.resourceVersion field. For example:
As a client, you can request BOOKMARK events by setting the
allowWatchBookmarks=true query parameter to a watch request, but you shouldn't
assume bookmarks are returned at any specific interval, nor can clients assume that
the API server will send any BOOKMARK event even when requested.
Streaming lists
FEATURE STATE:Kubernetes v1.27 [alpha]
On large clusters, retrieving the collection of some resource types may result in
a significant increase of resource usage (primarily RAM) on the control plane.
In order to alleviate its impact and simplify the user experience of the list + watch
pattern, Kubernetes v1.27 introduces as an alpha feature the support
for requesting the initial state (previously requested via the list request) as part of
the watch request.
Provided that the WatchListfeature gate
is enabled, this can be achieved by specifying sendInitialEvents=true as query string parameter
in a watch request. If set, the API server starts the watch stream with synthetic init
events (of type ADDED) to build the whole state of all existing objects followed by a
BOOKMARK event
(if requested via allowWatchBookmarks=true option). The bookmark event includes the resource version
to which is synced. After sending the bookmark event, the API server continues as for any other watch
request.
When you set sendInitialEvents=true in the query string, Kubernetes also requires that you set
resourceVersionMatch to NotOlderThan value.
If you provided resourceVersion in the query string without providing a value or don't provide
it at all, this is interpreted as a request for consistent read;
the bookmark event is sent when the state is synced at least to the moment of a consistent read
from when the request started to be processed. If you specify resourceVersion (in the query string),
the bookmark event is sent when the state is synced at least to the provided resource version.
Example
An example: you want to watch a collection of Pods. For that collection, the current resource version
is 10245 and there are two pods: foo and bar. Then sending the following request (explicitly requesting
consistent read by setting empty resource version using resourceVersion=) could result
in the following sequence of events:
APIResponseCompression is an option that allows the API server to compress the responses for get
and list requests, reducing the network bandwidth and improving the performance of large-scale clusters.
It is enabled by default since Kubernetes 1.16 and it can be disabled by including
APIResponseCompression=false in the --feature-gates flag on the API server.
API response compression can significantly reduce the size of the response, especially for large resources or
collections.
For example, a list request for pods can return hundreds of kilobytes or even megabytes of data,
depending on the number of pods and their attributes. By compressing the response, the network bandwidth
can be saved and the latency can be reduced.
To verify if APIResponseCompression is working, you can send a get or list request to the
API server with an Accept-Encoding header, and check the response size and headers. For example:
GET /api/v1/pods
Accept-Encoding: gzip
---
200 OK
Content-Type: application/json
content-encoding: gzip
...
The content-encoding header indicates that the response is compressed with gzip.
Retrieving large results sets in chunks
FEATURE STATE:Kubernetes v1.29 [stable]
On large clusters, retrieving the collection of some resource types may result in
very large responses that can impact the server and client. For instance, a cluster
may have tens of thousands of Pods, each of which is equivalent to roughly 2 KiB of
encoded JSON. Retrieving all pods across all namespaces may result in a very large
response (10-20MB) and consume a large amount of server resources.
The Kubernetes API server supports the ability to break a single large collection request
into many smaller chunks while preserving the consistency of the total request. Each
chunk can be returned sequentially which reduces both the total size of the request and
allows user-oriented clients to display results incrementally to improve responsiveness.
You can request that the API server handles a list by serving single collection
using pages (which Kubernetes calls chunks). To retrieve a single collection in
chunks, two query parameters limit and continue are supported on requests against
collections, and a response field continue is returned from all list operations
in the collection's metadata field. A client should specify the maximum results they
wish to receive in each chunk with limit and the server will return up to limit
resources in the result and include a continue value if there are more resources
in the collection.
As an API client, you can then pass this continue value to the API server on the
next request, to instruct the server to return the next page (chunk) of results. By
continuing until the server returns an empty continue value, you can retrieve the
entire collection.
Like a watch operation, a continue token will expire after a short amount
of time (by default 5 minutes) and return a 410 Gone if more results cannot be
returned. In this case, the client will need to start from the beginning or omit the
limit parameter.
For example, if there are 1,253 pods on the cluster and you want to receive chunks
of 500 pods at a time, request those chunks as follows:
List all of the pods on a cluster, retrieving up to 500 pods each time.
Continue the previous call, retrieving the last 253 pods.
GET /api/v1/pods?limit=500&continue=ENCODED_CONTINUE_TOKEN_2
---
200 OK
Content-Type: application/json
{
"kind": "PodList",
"apiVersion": "v1",
"metadata": {
"resourceVersion":"10245",
"continue": "", // continue token is empty because we have reached the end of the list
...
},
"items": [...] // returns pods 1001-1253
}
Notice that the resourceVersion of the collection remains constant across each request,
indicating the server is showing you a consistent snapshot of the pods. Pods that
are created, updated, or deleted after version 10245 would not be shown unless
you make a separate list request without the continue token. This allows you
to break large requests into smaller chunks and then perform a watch operation
on the full set without missing any updates.
remainingItemCount is the number of subsequent items in the collection that are not
included in this response. If the list request contained label or field
selectors then the number of
remaining items is unknown and the API server does not include a remainingItemCount
field in its response.
If the list is complete (either because it is not chunking, or because this is the
last chunk), then there are no more remaining items and the API server does not include a
remainingItemCount field in its response. The intended use of the remainingItemCount
is estimating the size of a collection.
Collections
In Kubernetes terminology, the response you get from a list is
a collection. However, Kubernetes defines concrete kinds for
collections of different types of resource. Collections have a kind
named for the resource kind, with List appended.
When you query the API for a particular type, all items returned by that query are
of that type.
For example, when you list Services, the collection response
has kind set to
ServiceList; each item in that collection represents a single Service. For example:
There are dozens of collection types (such as PodList, ServiceList,
and NodeList) defined in the Kubernetes API.
You can get more information about each collection type from the
Kubernetes API documentation.
Some tools, such as kubectl, represent the Kubernetes collection
mechanism slightly differently from the Kubernetes API itself.
Because the output of kubectl might include the response from
multiple list operations at the API level, kubectl represents
a list of items using kind: List. For example:
Keep in mind that the Kubernetes API does not have a kind named List.
kind: List is a client-side, internal implementation detail for processing
collections that might be of different kinds of object. Avoid depending on
kind: List in automation or other code.
Receiving resources as Tables
When you run kubectl get, the default output format is a simple tabular
representation of one or more instances of a particular resource type. In the past,
clients were required to reproduce the tabular and describe output implemented in
kubectl to perform simple lists of objects.
A few limitations of that approach include non-trivial logic when dealing with
certain objects. Additionally, types provided by API aggregation or third party
resources are not known at compile time. This means that generic implementations
had to be in place for types unrecognized by a client.
In order to avoid potential limitations as described above, clients may request
the Table representation of objects, delegating specific details of printing to the
server. The Kubernetes API implements standard HTTP content type negotiation: passing
an Accept header containing a value of application/json;as=Table;g=meta.k8s.io;v=v1
with a GET call will request that the server return objects in the Table content
type.
For example, list all of the pods on a cluster in the Table format.
GET /api/v1/pods
Accept: application/json;as=Table;g=meta.k8s.io;v=v1
---
200 OK
Content-Type: application/json
{
"kind": "Table",
"apiVersion": "meta.k8s.io/v1",
...
"columnDefinitions": [
...
]
}
For API resource types that do not have a custom Table definition known to the control
plane, the API server returns a default Table response that consists of the resource's
name and creationTimestamp fields.
Not all API resource types support a Table response; for example, a
CustomResourceDefinitions
might not define field-to-table mappings, and an APIService that
extends the core Kubernetes API
might not serve Table responses at all. If you are implementing a client that
uses the Table information and must work against all resource types, including
extensions, you should make requests that specify multiple content types in the
Accept header. For example:
By default, Kubernetes returns objects serialized to JSON with content type
application/json. This is the default serialization format for the API. However,
clients may request the more efficient
Protobuf representation of these objects for better performance at scale.
The Kubernetes API implements standard HTTP content type negotiation: passing an
Accept header with a GET call will request that the server tries to return
a response in your preferred media type, while sending an object in Protobuf to
the server for a PUT or POST call means that you must set the Content-Type
header appropriately.
The server will return a response with a Content-Type header if the requested
format is supported, or the 406 Not acceptable error if none of the media types you
requested are supported. All built-in resource types support the application/json
media type.
See the Kubernetes API reference for a list of
supported content types for each API.
For example:
List all of the pods on a cluster in Protobuf format.
GET /api/v1/pods
Accept: application/vnd.kubernetes.protobuf
---
200 OK
Content-Type: application/vnd.kubernetes.protobuf
... binary encoded PodList object
Create a pod by sending Protobuf encoded data to the server, but request a response
in JSON.
POST /api/v1/namespaces/test/pods
Content-Type: application/vnd.kubernetes.protobuf
Accept: application/json
... binary encoded Pod object
---
200 OK
Content-Type: application/json
{
"kind": "Pod",
"apiVersion": "v1",
...
}
Not all API resource types support Protobuf; specifically, Protobuf isn't available for
resources that are defined as
CustomResourceDefinitions
or are served via the
aggregation layer.
As a client, if you might need to work with extension types you should specify multiple
content types in the request Accept header to support fallback to JSON.
For example:
Kubernetes uses an envelope wrapper to encode Protobuf responses. That wrapper starts
with a 4 byte magic number to help identify content in disk or in etcd as Protobuf
(as opposed to JSON), and then is followed by a Protobuf encoded wrapper message, which
describes the encoding and type of the underlying object and then contains the object.
The wrapper format is:
A four byte magic number prefix:
Bytes 0-3: "k8s\x00" [0x6b, 0x38, 0x73, 0x00]
An encoded Protobuf message with the following IDL:
message Unknown {
// typeMeta should have the string values for "kind" and "apiVersion" as set on the JSON object
optional TypeMeta typeMeta = 1;
// raw will hold the complete serialized object in protobuf. See the protobuf definitions in the client libraries for a given kind.
optional bytes raw = 2;
// contentEncoding is encoding used for the raw data. Unspecified means no encoding.
optional string contentEncoding = 3;
// contentType is the serialization method used to serialize 'raw'. Unspecified means application/vnd.kubernetes.protobuf and is usually
// omitted.
optional string contentType = 4;
}
message TypeMeta {
// apiVersion is the group/version for this type
optional string apiVersion = 1;
// kind is the name of the object schema. A protobuf definition should exist for this object.
optional string kind = 2;
}
Note:
Clients that receive a response in application/vnd.kubernetes.protobuf that does
not match the expected prefix should reject the response, as future versions may need
to alter the serialization format in an incompatible way and will do so by changing
the prefix.
Resource deletion
When you delete a resource this takes place in two phases.
When a client first sends a delete to request the removal of a resource, the .metadata.deletionTimestamp is set to the current time.
Once the .metadata.deletionTimestamp is set, external controllers that act on finalizers
may start performing their cleanup work at any time, in any order.
Order is not enforced between finalizers because it would introduce significant
risk of stuck .metadata.finalizers.
The .metadata.finalizers field is shared: any actor with permission can reorder it.
If the finalizer list were processed in order, then this might lead to a situation
in which the component responsible for the first finalizer in the list is
waiting for some signal (field value, external system, or other) produced by a
component responsible for a finalizer later in the list, resulting in a deadlock.
Without enforced ordering, finalizers are free to order amongst themselves and are
not vulnerable to ordering changes in the list.
Once the last finalizer is removed, the resource is actually removed from etcd.
Single resource API
The Kubernetes API verbs get, create, update, patch,
delete and proxy support single resources only.
These verbs with single resource support have no support for submitting multiple
resources together in an ordered or unordered list or transaction.
When clients (including kubectl) act on a set of resources, the client makes a series
of single-resource API requests, then aggregates the responses if needed.
By contrast, the Kubernetes API verbs list and watch allow getting multiple
resources, and deletecollection allows deleting multiple resources.
Field validation
Kubernetes always validates the type of fields. For example, if a field in the
API is defined as a number, you cannot set the field to a text value. If a field
is defined as an array of strings, you can only provide an array. Some fields
allow you to omit them, other fields are required. Omitting a required field
from an API request is an error.
If you make a request with an extra field, one that the cluster's control plane
does not recognize, then the behavior of the API server is more complicated.
By default, the API server drops fields that it does not recognize
from an input that it receives (for example, the JSON body of a PUT request).
There are two situations where the API server drops fields that you supplied in
an HTTP request.
These situations are:
The field is unrecognized because it is not in the resource's OpenAPI schema. (One
exception to this is for CRDs that explicitly choose not to prune unknown
fields via x-kubernetes-preserve-unknown-fields).
The field is duplicated in the object.
Validation for unrecognized or duplicate fields
FEATURE STATE:Kubernetes v1.27 [stable]
From 1.25 onward, unrecognized or duplicate fields in an object are detected via
validation on the server when you use HTTP verbs that can submit data (POST, PUT, and PATCH). Possible levels of
validation are Ignore, Warn (default), and Strict.
Ignore
The API server succeeds in handling the request as it would without the erroneous fields
being set, dropping all unknown and duplicate fields and giving no indication it
has done so.
Warn
(Default) The API server succeeds in handling the request, and reports a
warning to the client. The warning is sent using the Warning: response header,
adding one warning item for each unknown or duplicate field. For more
information about warnings and the Kubernetes API, see the blog article
Warning: Helpful Warnings Ahead.
Strict
The API server rejects the request with a 400 Bad Request error when it
detects any unknown or duplicate fields. The response message from the API
server specifies all the unknown or duplicate fields that the API server has
detected.
The field validation level is set by the fieldValidation query parameter.
Note:
If you submit a request that specifies an unrecognized field, and that is also invalid for
a different reason (for example, the request provides a string value where the API expects
an integer for a known field), then the API server responds with a 400 Bad Request error, but will
not provide any information on unknown or duplicate fields (only which fatal
error it encountered first).
You always receive an error response in this case, no matter what field validation level you requested.
Tools that submit requests to the server (such as kubectl), might set their own
defaults that are different from the Warn validation level that the API server uses
by default.
The kubectl tool uses the --validate flag to set the level of field
validation. It accepts the values ignore, warn, and strict while
also accepting the values true (equivalent to strict) and false
(equivalent to ignore). The default validation setting for kubectl is
--validate=true, which means strict server-side field validation.
When kubectl cannot connect to an API server with field validation (API servers
prior to Kubernetes 1.27), it will fall back to using client-side validation.
Client-side validation will be removed entirely in a future version of kubectl.
Note:
Prior to Kubernetes 1.25 kubectl --validate was used to toggle client-side validation on or off as
a boolean flag.
Dry-run
FEATURE STATE:Kubernetes v1.19 [stable]
When you use HTTP verbs that can modify resources (POST, PUT, PATCH, and
DELETE), you can submit your request in a dry run mode. Dry run mode helps to
evaluate a request through the typical request stages (admission chain, validation,
merge conflicts) up until persisting objects to storage. The response body for the
request is as close as possible to a non-dry-run response. Kubernetes guarantees that
dry-run requests will not be persisted in storage or have any other side effects.
Make a dry-run request
Dry-run is triggered by setting the dryRun query parameter. This parameter is a
string, working as an enum, and the only accepted values are:
[no value set]
Allow side effects. You request this with a query string such as ?dryRun
or ?dryRun&pretty=true. The response is the final object that would have been
persisted, or an error if the request could not be fulfilled.
All
Every stage runs as normal, except for the final storage stage where side effects
are prevented.
When you set ?dryRun=All, any relevant
admission controllers
are run, validating admission controllers check the request post-mutation, merge is
performed on PATCH, fields are defaulted, and schema validation occurs. The changes
are not persisted to the underlying storage, but the final object which would have
been persisted is still returned to the user, along with the normal status code.
If the non-dry-run version of a request would trigger an admission controller that has
side effects, the request will be failed rather than risk an unwanted side effect. All
built in admission control plugins support dry-run. Additionally, admission webhooks can
declare in their
configuration object
that they do not have side effects, by setting their sideEffects field to None.
Note:
If a webhook actually does have side effects, then the sideEffects field should be
set to "NoneOnDryRun". That change is appropriate provided that the webhook is also
be modified to understand the DryRun field in AdmissionReview, and to prevent side
effects on any request marked as dry runs.
Here is an example dry-run request that uses ?dryRun=All:
POST /api/v1/namespaces/test/pods?dryRun=All
Content-Type: application/json
Accept: application/json
The response would look the same as for non-dry-run request, but the values of some
generated fields may differ.
Generated values
Some values of an object are typically generated before the object is persisted. It
is important not to rely upon the values of these fields set by a dry-run request,
since these values will likely be different in dry-run mode from when the real
request is made. Some of these fields are:
name: if generateName is set, name will have a unique random name
creationTimestamp / deletionTimestamp: records the time of creation/deletion
UID: uniquely identifies the object and is randomly generated (non-deterministic)
resourceVersion: tracks the persisted version of the object
Any field set by a mutating admission controller
For the Service resource: Ports or IP addresses that the kube-apiserver assigns to Service objects
Dry-run authorization
Authorization for dry-run and non-dry-run requests is identical. Thus, to make
a dry-run request, you must be authorized to make the non-dry-run request.
For example, to run a dry-run patch for a Deployment, you must be authorized
to perform that patch. Here is an example of a rule for Kubernetes
RBAC that allows patching
Deployments:
Kubernetes provides several ways to update existing objects.
You can read choosing an update mechanism to
learn about which approach might be best for your use case.
You can overwrite (update) an existing resource - for example, a ConfigMap -
using an HTTP PUT. For a PUT request, it is the client's responsibility to specify
the resourceVersion (taking this from the object being updated). Kubernetes uses
that resourceVersion information so that the API server can detect lost updates
and reject requests made by a client that is out of date with the cluster.
In the event that the resource has changed (the resourceVersion the client
provided is stale), the API server returns a 409 Conflict error response.
Instead of sending a PUT request, the client can send an instruction to the API
server to patch an existing resource. A patch is typically appropriate
if the change that the client wants to make isn't conditional on the existing data. Clients that need effective detection of lost updates should consider
making their request conditional on the existing resourceVersion (either HTTP PUT or HTTP PATCH),
and then handle any retries that are needed in case there is a conflict.
The Kubernetes API supports four different PATCH operations, determined by their
corresponding HTTP Content-Type header:
application/apply-patch+yaml
Server Side Apply YAML (a Kubernetes-specific extension, based on YAML).
All JSON documents are valid YAML, so you can also submit JSON using this
media type. See Server Side Apply serialization
for more details. To Kubernetes, this is a create operation if the object does not exist,
or a patch operation if the object already exists.
application/json-patch+json
JSON Patch, as defined in RFC6902.
A JSON patch is a sequence of operations that are executed on the resource;
for example {"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}. To Kubernetes, this is a patch operation.
A patch using application/json-patch+json can include conditions to
validate consistency, allowing the operation to fail if those conditions
are not met (for example, to avoid a lost update).
application/merge-patch+json
JSON Merge Patch, as defined in RFC7386.
A JSON Merge Patch is essentially a partial representation of the resource.
The submitted JSON is combined with the current resource to create a new one,
then the new one is saved. To Kubernetes, this is a patch operation.
application/strategic-merge-patch+json
Strategic Merge Patch (a Kubernetes-specific extension based on JSON).
Strategic Merge Patch is a custom implementation of JSON Merge Patch.
You can only use Strategic Merge Patch with built-in APIs, or with aggregated
API servers that have special support for it. You cannot use
application/strategic-merge-patch+json with any API
defined using a CustomResourceDefinition.
Note:
The Kubernetes server side apply mechanism has superseded Strategic Merge
Patch.
Kubernetes' Server Side Apply
feature allows the control plane to track managed fields for newly created objects.
Server Side Apply provides a clear pattern for managing field conflicts,
offers server-side apply and update operations, and replaces the
client-side functionality of kubectl apply.
For Server-Side Apply, Kubernetes treats the request as a create if the object
does not yet exist, and a patch otherwise. For other requests that use PATCH
at the HTTP level, the logical Kubernetes operation is always patch.
The update (HTTP PUT) operation is simple to implement and flexible,
but has drawbacks:
You need to handle conflicts where the resourceVersion of the object changes
between your client reading it and trying to write it back. Kubernetes always
detects the conflict, but you as the client author need to implement retries.
You might accidentally drop fields if you decode an object locally (for example,
using client-go, you could receive fields that your client does not know how to
handle - and then drop them as part of your update.
If there's a lot of contention on the object (even on a field, or set of fields,
that you're not trying to edit), you might have trouble sending the update.
The problem is worse for larger objects and for objects with many fields.
HTTP PATCH using JSON Patch
A patch update is helpful, because:
As you're only sending differences, you have less data to send in the PATCH
request.
You can make changes that rely on existing values, such as copying the
value of a particular field into an annotation.
Unlike with an update (HTTP PUT), making your change can happen right away
even if there are frequent changes to unrelated fields): you usually would
not need to retry.
You might still need to specify the resourceVersion (to match an existing object)
if you want to be extra careful to avoid lost updates
It's still good practice to write in some retry logic in case of errors.
You can use test conditions to careful craft specific update conditions.
For example, you can increment a counter without reading it if the existing
value matches what you expect. You can do this with no lost update risk,
even if the object has changed in other ways since you last wrote to it.
(If the test condition fails, you can fall back to reading the current value
and then write back the changed number).
However:
you need more local (client) logic to build the patch; it helps a lot if you have
a library implementation of JSON Patch, or even for making a JSON Patch specifically against Kubernetes
as the author of client software, you need to be careful when building the patch
(the HTTP request body) not to drop fields (the order of operations matters)
HTTP PATCH using Server-Side Apply
Server-Side Apply has some clear benefits:
A single round trip: it rarely requires making a GET request first.
and you can still detect conflicts for unexpected changes
you have the option to force override a conflict, if appropriate
Client implementations are easy to make
You get an atomic create-or-update operation without extra effort
(similar to UPSERT in some SQL dialects)
However:
Server-Side Apply does not work at all for field changes that depend on a current value of the object
You can only apply updates to objects. Some resources in the Kubernetes HTTP API are
not objects (they do not have a .metadata field), and Server-Side Apply
is only relevant for Kubernetes objects.
Resource versions
Resource versions are strings that identify the server's internal version of an
object. Resource versions can be used by clients to determine when objects have
changed, or to express data consistency requirements when getting, listing and
watching resources. Resource versions must be treated as opaque by clients and passed
unmodified back to the server.
You must not assume resource versions are numeric or collatable. API clients may
only compare two resource versions for equality (this means that you must not compare
resource versions for greater-than or less-than relationships).
resourceVersion fields in metadata
Clients find resource versions in resources, including the resources from the response
stream for a watch, or when using list to enumerate resources.
v1.meta/ObjectMeta - The metadata.resourceVersion of a resource instance identifies the resource version the instance was last modified at.
v1.meta/ListMeta - The metadata.resourceVersion of a resource collection (the response to a list) identifies the resource version at which the collection was constructed.
resourceVersion parameters in query strings
The get, list, and watch operations support the resourceVersion parameter.
From version v1.19, Kubernetes API servers also support the resourceVersionMatch
parameter on list requests.
The API server interprets the resourceVersion parameter differently depending
on the operation you request, and on the value of resourceVersion. If you set
resourceVersionMatch then this also affects the way matching happens.
Semantics for get and list
For get and list, the semantics of resourceVersion are:
get:
resourceVersion unset
resourceVersion="0"
resourceVersion="{value other than 0}"
Most Recent
Any
Not older than
list:
From version v1.19, Kubernetes API servers support the resourceVersionMatch parameter
on list requests. If you set both resourceVersion and resourceVersionMatch, the
resourceVersionMatch parameter determines how the API server interprets
resourceVersion.
You should always set the resourceVersionMatch parameter when setting
resourceVersion on a list request. However, be prepared to handle the case
where the API server that responds is unaware of resourceVersionMatch
and ignores it.
Unless you have strong consistency requirements, using resourceVersionMatch=NotOlderThan and
a known resourceVersion is preferable since it can achieve better performance and scalability
of your cluster than leaving resourceVersion and resourceVersionMatch unset, which requires
quorum read to be served.
Setting the resourceVersionMatch parameter without setting resourceVersion is not valid.
This table explains the behavior of list requests with various combinations of
resourceVersion and resourceVersionMatch:
resourceVersionMatch and paging parameters for list
resourceVersionMatch param
paging params
resourceVersion not set
resourceVersion="0"
resourceVersion="{value other than 0}"
unset
limit unset
Most Recent
Any
Not older than
unset
limit=<n>, continue unset
Most Recent
Any
Exact
unset
limit=<n>, continue=<token>
Continue Token, Exact
Invalid, treated as Continue Token, Exact
Invalid, HTTP 400 Bad Request
resourceVersionMatch=Exact
limit unset
Invalid
Invalid
Exact
resourceVersionMatch=Exact
limit=<n>, continue unset
Invalid
Invalid
Exact
resourceVersionMatch=NotOlderThan
limit unset
Invalid
Any
Not older than
resourceVersionMatch=NotOlderThan
limit=<n>, continue unset
Invalid
Any
Not older than
Note:
If your cluster's API server does not honor the resourceVersionMatch parameter,
the behavior is the same as if you did not set it.
The meaning of the get and list semantics are:
Any
Return data at any resource version. The newest available resource version is preferred,
but strong consistency is not required; data at any resource version may be served. It is possible
for the request to return data at a much older resource version that the client has previously
observed, particularly in high availability configurations, due to partitions or stale
caches. Clients that cannot tolerate this should not use this semantic.
Most recent
Return data at the most recent resource version. The returned data must be
consistent (in detail: served from etcd via a quorum read).
For etcd v3.4.31+ and v3.5.13+ Kubernetes 1.31 serves “most recent” reads from the watch cache:
an internal, in-memory store within the API server that caches and mirrors the state of data
persisted into etcd. Kubernetes requests progress notification to maintain cache consistency against
the etcd persistence layer. Kubernetes versions v1.28 through to v1.30 also supported this
feature, although as Alpha it was not recommended for production nor enabled by default until the v1.31 release.
Not older than
Return data at least as new as the provided resourceVersion. The newest
available data is preferred, but any data not older than the provided resourceVersion may be
served. For list requests to servers that honor the resourceVersionMatch parameter, this
guarantees that the collection's .metadata.resourceVersion is not older than the requested
resourceVersion, but does not make any guarantee about the .metadata.resourceVersion of any
of the items in that collection.
Exact
Return data at the exact resource version provided. If the provided resourceVersion is
unavailable, the server responds with HTTP 410 "Gone". For list requests to servers that honor the
resourceVersionMatch parameter, this guarantees that the collection's .metadata.resourceVersion
is the same as the resourceVersion you requested in the query string. That guarantee does
not apply to the .metadata.resourceVersion of any items within that collection.
Continue Token, Exact
Return data at the resource version of the initial paginated list call. The returned continue
tokens are responsible for keeping track of the initially provided resource version for all paginated
list calls after the initial paginated list.
Note:
When you list resources and receive a collection response, the response includes the
list metadata
of the collection as well as
object metadata
for each item in that collection. For individual objects found within a collection response,
.metadata.resourceVersion tracks when that object was last updated, and not how up-to-date
the object is when served.
When using resourceVersionMatch=NotOlderThan and limit is set, clients must
handle HTTP 410 "Gone" responses. For example, the client might retry with a
newer resourceVersion or fall back to resourceVersion="".
When using resourceVersionMatch=Exact and limit is unset, clients must
verify that the collection's .metadata.resourceVersion matches
the requested resourceVersion, and handle the case where it does not. For
example, the client might fall back to a request with limit set.
Semantics for watch
For watch, the semantics of resource version are:
watch:
resourceVersion for watch
resourceVersion unset
resourceVersion="0"
resourceVersion="{value other than 0}"
Get State and Start at Most Recent
Get State and Start at Any
Start at Exact
The meaning of those watch semantics are:
Get State and Start at Any
Caution:
Watches initialized this way may return arbitrarily stale
data. Please review this semantic before using it, and favor the other semantics
where possible.
Start a watch at any resource version; the most recent resource version
available is preferred, but not required. Any starting resource version is
allowed. It is possible for the watch to start at a much older resource
version that the client has previously observed, particularly in high availability
configurations, due to partitions or stale caches. Clients that cannot tolerate
this apparent rewinding should not start a watch with this semantic. To
establish initial state, the watch begins with synthetic "Added" events for
all resource instances that exist at the starting resource version. All following
watch events are for all changes that occurred after the resource version the
watch started at.
Get State and Start at Most Recent
Start a watch at the most recent resource version, which must be consistent
(in detail: served from etcd via a quorum read). To establish initial state,
the watch begins with synthetic "Added" events of all resources instances
that exist at the starting resource version. All following watch events are for
all changes that occurred after the resource version the watch started at.
Start at Exact
Start a watch at an exact resource version. The watch events are for all changes
after the provided resource version. Unlike "Get State and Start at Most Recent"
and "Get State and Start at Any", the watch is not started with synthetic
"Added" events for the provided resource version. The client is assumed to already
have the initial state at the starting resource version since the client provided
the resource version.
"410 Gone" responses
Servers are not required to serve all older resource versions and may return a HTTP
410 (Gone) status code if a client requests a resourceVersion older than the
server has retained. Clients must be able to tolerate 410 (Gone) responses. See
Efficient detection of changes for details on
how to handle 410 (Gone) responses when watching resources.
If you request a resourceVersion outside the applicable limit then, depending
on whether a request is served from cache or not, the API server may reply with a
410 Gone HTTP response.
Unavailable resource versions
Servers are not required to serve unrecognized resource versions. If you request
list or get for a resource version that the API server does not recognize,
then the API server may either:
wait briefly for the resource version to become available, then timeout with a
504 (Gateway Timeout) if the provided resource versions does not become available
in a reasonable amount of time;
respond with a Retry-After response header indicating how many seconds a client
should wait before retrying the request.
If you request a resource version that an API server does not recognize, the
kube-apiserver additionally identifies its error responses with a "Too large resource
version" message.
If you make a watch request for an unrecognized resource version, the API server
may wait indefinitely (until the request timeout) for the resource version to become
available.
2.2 - Server-Side Apply
FEATURE STATE:Kubernetes v1.22 [stable]
Kubernetes supports multiple appliers collaborating to manage the fields
of a single object.
Server-Side Apply provides an optional mechanism for your cluster's control plane to track
changes to an object's fields. At the level of a specific resource, Server-Side
Apply records and tracks information about control over the fields of that object.
Server-Side Apply helps users and controllers
manage their resources through declarative configuration. Clients can create and modify
objects
declaratively by submitting their fully specified intent.
A fully specified intent is a partial object that only includes the fields and
values for which the user has an opinion. That intent either creates a new
object (using default values for unspecified fields), or is
combined, by the API server, with the existing object.
Comparison with Client-Side Apply explains
how Server-Side Apply differs from the original, client-side kubectl apply
implementation.
Field management
The Kubernetes API server tracks managed fields for all newly created objects.
When trying to apply an object, fields that have a different value and are owned by
another manager will result in a conflict. This is done
in order to signal that the operation might undo another collaborator's changes.
Writes to objects with managed fields can be forced, in which case the value of any
conflicted field will be overridden, and the ownership will be transferred.
Whenever a field's value does change, ownership moves from its current manager to the
manager making the change.
Apply checks if there are any other field managers that also own the
field. If the field is not owned by any other field managers, that field is
set to its default value (if there is one), or otherwise is deleted from the
object.
The same rule applies to fields that are lists, associative lists, or maps.
For a user to manage a field, in the Server-Side Apply sense, means that the
user relies on and expects the value of the field not to change. The user who
last made an assertion about the value of a field will be recorded as the
current field manager. This can be done by changing the field manager
details explicitly using HTTP POST (create), PUT (update), or non-apply
PATCH (patch). You can also declare and record a field manager
by including a value for that field in a Server-Side Apply operation.
A Server-Side Apply patch request requires the client to provide its identity
as a field manager. When using Server-Side Apply, trying to change a
field that is controlled by a different manager results in a rejected
request unless the client forces an override.
For details of overrides, see Conflicts.
When two or more appliers set a field to the same value, they share ownership of
that field. Any subsequent attempt to change the value of the shared field, by any of
the appliers, results in a conflict. Shared field owners may give up ownership
of a field by making a Server-Side Apply patch request that doesn't include
that field.
Field management details are stored in a managedFields field that is part of an
object's metadata.
If you remove a field from a manifest and apply that manifest, Server-Side
Apply checks if there are any other field managers that also own the field.
If the field is not owned by any other field managers, it is either deleted
from the live object or reset to its default value, if it has one.
The same rule applies to associative list or map items.
Compared to the (legacy)
kubectl.kubernetes.io/last-applied-configuration
annotation managed by kubectl, Server-Side Apply uses a more declarative
approach, that tracks a user's (or client's) field management, rather than
a user's last applied state. As a side effect of using Server-Side Apply,
information about which field manager manages each field in an object also
becomes available.
Example
A simple example of an object created using Server-Side Apply could look like this:
---apiVersion:v1kind:ConfigMapmetadata:name:test-cmnamespace:defaultlabels:test-label:testmanagedFields:- manager:kubectloperation: Apply # note capitalization:"Apply"(or "Update")apiVersion:v1time:"2010-10-10T0:00:00Z"fieldsType:FieldsV1fieldsV1:f:metadata:f:labels:f:test-label:{}f:data:f:key:{}data:key:some value
That example ConfigMap object contains a single field management record in
.metadata.managedFields. The field management record consists of basic information
about the managing entity itself, plus details about the fields being managed and
the relevant operation (Apply or Update). If the request that last changed that
field was a Server-Side Apply patch then the value of operation is Apply;
otherwise, it is Update.
There is another possible outcome. A client could submit an invalid request
body. If the fully specified intent does not produce a valid object, the
request fails.
It is however possible to change .metadata.managedFields through an
update, or through a patch operation that does not use Server-Side Apply.
Doing so is highly discouraged, but might be a reasonable option to try if,
for example, the .metadata.managedFields get into an inconsistent state
(which should not happen in normal operations).
The format of managedFields is described
in the Kubernetes API reference.
Caution:
The .metadata.managedFields field is managed by the API server.
You should avoid updating it manually.
Conflicts
A conflict is a special status error that occurs when an Apply operation tries
to change a field that another manager also claims to manage. This prevents an
applier from unintentionally overwriting the value set by another user. When
this occurs, the applier has 3 options to resolve the conflicts:
Overwrite value, become sole manager: If overwriting the value was
intentional (or if the applier is an automated process like a controller) the
applier should set the force query parameter to true (for kubectl apply,
you use the --force-conflicts command line parameter), and make the request
again. This forces the operation to succeed, changes the value of the field,
and removes the field from all other managers' entries in managedFields.
Don't overwrite value, give up management claim: If the applier doesn't
care about the value of the field any more, the applier can remove it from their
local model of the resource, and make a new request with that particular field
omitted. This leaves the value unchanged, and causes the field to be removed
from the applier's entry in managedFields.
Don't overwrite value, become shared manager: If the applier still cares
about the value of a field, but doesn't want to overwrite it, they can
change the value of that field in their local model of the resource so as to
match the value of the object on the server, and then make a new request that
takes into account that local update. Doing so leaves the value unchanged,
and causes that field's management to be shared by the applier along with all
other field managers that already claimed to manage it.
Field managers
Managers identify distinct workflows that are modifying the object (especially
useful on conflicts!), and can be specified through the
fieldManager
query parameter as part of a modifying request. When you Apply to a resource,
the fieldManager parameter is required.
For other updates, the API server infers a field manager identity from the
"User-Agent:" HTTP header (if present).
When you use the kubectl tool to perform a Server-Side Apply operation, kubectl
sets the manager identity to "kubectl" by default.
Serialization
At the protocol level, Kubernetes represents Server-Side Apply message bodies
as YAML, with the media type application/apply-patch+yaml.
Note:
Whether you are submitting JSON data or YAML data, use
application/apply-patch+yaml as the Content-Type header value.
All JSON documents are valid YAML. However, Kubernetes has a bug where it uses a YAML
parser that does not fully implement the YAML specification. Some JSON escapes may
not be recognized.
The serialization is the same as for Kubernetes objects, with the exception that
clients are not required to send a complete object.
Here's an example of a Server-Side Apply message body (fully specified intent):
{"apiVersion": "v1","kind": "ConfigMap"}
(this would make a no-change update, provided that it was sent as the body
of a patch request to a valid v1/configmaps resource, and with the
appropriate request Content-Type).
Operations in scope for field management
The Kubernetes API operations where field management is considered are:
Server-Side Apply (HTTP PATCH, with content type application/apply-patch+yaml)
Replacing an existing object (update to Kubernetes; PUT at the HTTP level)
Both operations update .metadata.managedFields, but behave a little differently.
Unless you specify a forced override, an apply operation that encounters field-level
conflicts always fails; by contrast, if you make a change using update that would
affect a managed field, a conflict never provokes failure of the operation.
All Server-Side Apply patch requests are required to identify themselves by providing a
fieldManager query parameter, while the query parameter is optional for update
operations. Finally, when using the Apply operation you cannot define managedFields in
the body of the request that you submit.
An example object with multiple managers could look like this:
---apiVersion:v1kind:ConfigMapmetadata:name:test-cmnamespace:defaultlabels:test-label:testmanagedFields:- manager:kubectloperation:ApplyapiVersion:v1fields:f:metadata:f:labels:f:test-label:{}- manager:kube-controller-manageroperation:UpdateapiVersion:v1time:'2019-03-30T16:00:00.000Z'fields:f:data:f:key:{}data:key:new value
In this example, a second operation was run as an update by the manager called
kube-controller-manager. The update request succeeded and changed a value in the data
field, which caused that field's management to change to the kube-controller-manager.
If this update has instead been attempted using Server-Side Apply, the request
would have failed due to conflicting ownership.
Merge strategy
The merging strategy, implemented with Server-Side Apply, provides a generally
more stable object lifecycle. Server-Side Apply tries to merge fields based on
the actor who manages them instead of overruling based on values. This way
multiple actors can update the same object without causing unexpected interference.
When a user sends a fully-specified intent object to the Server-Side Apply
endpoint, the server merges it with the live object favoring the value from the
request body if it is specified in both places. If the set of items present in
the applied config is not a superset of the items applied by the same user last
time, each missing item not managed by any other appliers is removed. For
more information about how an object's schema is used to make decisions when
merging, see
sigs.k8s.io/structured-merge-diff.
The Kubernetes API (and the Go code that implements that API for Kubernetes) allows
defining merge strategy markers. These markers describe the merge strategy supported
for fields within Kubernetes objects.
For a CustomResourceDefinition,
you can set these markers when you define the custom resource.
Golang marker
OpenAPI extension
Possible values
Description
//+listType
x-kubernetes-list-type
atomic/set/map
Applicable to lists. set applies to lists that include only scalar elements. These elements must be unique. map applies to lists of nested types only. The key values (see listMapKey) must be unique in the list. atomic can apply to any list. If configured as atomic, the entire list is replaced during merge. At any point in time, a single manager owns the list. If set or map, different managers can manage entries separately.
//+listMapKey
x-kubernetes-list-map-keys
List of field names, e.g. ["port", "protocol"]
Only applicable when +listType=map. A list of field names whose values uniquely identify entries in the list. While there can be multiple keys, listMapKey is singular because keys need to be specified individually in the Go type. The key fields must be scalars.
//+mapType
x-kubernetes-map-type
atomic/granular
Applicable to maps. atomic means that the map can only be entirely replaced by a single manager. granular means that the map supports separate managers updating individual fields.
//+structType
x-kubernetes-map-type
atomic/granular
Applicable to structs; otherwise same usage and OpenAPI annotation as //+mapType.
If listType is missing, the API server interprets a
patchStrategy=merge marker as a listType=map and the
corresponding patchMergeKey marker as a listMapKey.
The atomic list type is recursive.
(In the Go code for Kubernetes, these markers are specified as
comments and code authors need not repeat them as field tags).
Custom resources and Server-Side Apply
By default, Server-Side Apply treats custom resources as unstructured data. All
keys are treated the same as struct fields, and all lists are considered atomic.
If the CustomResourceDefinition defines a
schema
that contains annotations as defined in the previous Merge Strategy
section, these annotations will be used when merging objects of this
type.
Compatibility across topology changes
On rare occurrences, the author for a CustomResourceDefinition (CRD) or built-in
may want to change the specific topology of a field in their resource,
without incrementing its API version. Changing the topology of types,
by upgrading the cluster or updating the CRD, has different consequences when
updating existing objects. There are two categories of changes: when a field goes from
map/set/granular to atomic, and the other way around.
When the listType, mapType, or structType changes from
map/set/granular to atomic, the whole list, map, or struct of
existing objects will end-up being owned by actors who owned an element
of these types. This means that any further change to these objects
would cause a conflict.
When a listType, mapType, or structType changes from atomic to
map/set/granular, the API server is unable to infer the new
ownership of these fields. Because of that, no conflict will be produced
when objects have these fields updated. For that reason, it is not
recommended to change a type from atomic to map/set/granular.
Before spec.data gets changed from atomic to granular,
manager-one owns the field spec.data, and all the fields within it
(key1 and key2). When the CRD gets changed to make spec.datagranular, manager-one continues to own the top-level field
spec.data (meaning no other managers can delete the map called data
without a conflict), but it no longer owns key1 and key2, so another
manager can then modify or delete those fields without conflict.
Using Server-Side Apply in a controller
As a developer of a controller, you can use Server-Side Apply as a way to
simplify the update logic of your controller. The main differences with a
read-modify-write and/or patch are the following:
the applied object must contain all the fields that the controller cares about.
there is no way to remove fields that haven't been applied by the controller
before (controller can still send a patch or update for these use-cases).
the object doesn't have to be read beforehand; resourceVersion doesn't have
to be specified.
It is strongly recommended for controllers to always force conflicts on objects that
they own and manage, since they might not be able to resolve or act on these conflicts.
Transferring ownership
In addition to the concurrency controls provided by conflict resolution,
Server-Side Apply provides ways to perform coordinated
field ownership transfers from users to controllers.
This is best explained by example. Let's look at how to safely transfer
ownership of the replicas field from a user to a controller while enabling
automatic horizontal scaling for a Deployment, using the HorizontalPodAutoscaler
resource and its accompanying controller.
Say a user has defined Deployment with replicas set to the desired value:
Now, the user would like to remove replicas from their configuration, so they
don't accidentally fight with the HorizontalPodAutoscaler (HPA) and its controller.
However, there is a race: it might take some time before the HPA feels the need
to adjust .spec.replicas; if the user removes .spec.replicas before the HPA writes
to the field and becomes its owner, then the API server would set .spec.replicas to
1 (the default replica count for Deployment).
This is not what the user wants to happen, even temporarily - it might well degrade
a running workload.
There are two solutions:
(basic) Leave replicas in the configuration; when the HPA eventually writes to that
field, the system gives the user a conflict over it. At that point, it is safe
to remove from the configuration.
(more advanced) If, however, the user doesn't want to wait, for example
because they want to keep the cluster legible to their colleagues, then they
can take the following steps to make it safe to remove replicas from their
configuration:
First, the user defines a new manifest containing only the replicas field:
# Save this file as 'nginx-deployment-replicas-only.yaml'.apiVersion:apps/v1kind:Deploymentmetadata:name:nginx-deploymentspec:replicas:3
Note:
The YAML file for SSA in this case only contains the fields you want to change.
You are not supposed to provide a fully compliant Deployment manifest if you only
want to modify the spec.replicas field using SSA.
The user applies that manifest using a private field manager name. In this example,
the user picked handover-to-hpa:
If the apply results in a conflict with the HPA controller, then do nothing. The
conflict indicates the controller has claimed the field earlier in the
process than it sometimes does.
At this point the user may remove the replicas field from their manifest:
Note that whenever the HPA controller sets the replicas field to a new value,
the temporary field manager will no longer own any fields and will be
automatically deleted. No further clean up is required.
Transferring ownership between managers
Field managers can transfer ownership of a field between each other by setting the field
to the same value in both of their applied configurations, causing them to share
ownership of the field. Once the managers share ownership of the field, one of them
can remove the field from their applied configuration to give up ownership and
complete the transfer to the other field manager.
Comparison with Client-Side Apply
Server-Side Apply is meant both as a replacement for the original client-side
implementation of the kubectl apply subcommand, and as simple and effective
mechanism for controllers
to enact their changes.
Compared to the last-applied annotation managed by kubectl, Server-Side
Apply uses a more declarative approach, which tracks an object's field management,
rather than a user's last applied state. This means that as a side effect of
using Server-Side Apply, information about which field manager manages each
field in an object also becomes available.
A consequence of the conflict detection and resolution implemented by Server-Side
Apply is that an applier always has up to date field values in their local
state. If they don't, they get a conflict the next time they apply. Any of the
three options to resolve conflicts results in the applied configuration being an
up to date subset of the object on the server's fields.
This is different from Client-Side Apply, where outdated values which have been
overwritten by other users are left in an applier's local config. These values
only become accurate when the user updates that specific field, if ever, and an
applier has no way of knowing whether their next apply will overwrite other
users' changes.
Another difference is that an applier using Client-Side Apply is unable to
change the API version they are using, but Server-Side Apply supports this use
case.
Migration between client-side and server-side apply
Upgrading from client-side apply to server-side apply
Client-side apply users who manage a resource with kubectl apply can start
using server-side apply with the following flag.
kubectl apply --server-side [--dry-run=server]
By default, field management of the object transfers from client-side apply to
kubectl server-side apply, without encountering conflicts.
Caution:
Keep the last-applied-configuration annotation up to date.
The annotation infers client-side applies managed fields.
Any fields not managed by client-side apply raise conflicts.
For example, if you used kubectl scale to update the replicas field after
client-side apply, then this field is not owned by client-side apply and
creates conflicts on kubectl apply --server-side.
This behavior applies to server-side apply with the kubectl field manager.
As an exception, you can opt-out of this behavior by specifying a different,
non-default field manager, as seen in the following example. The default field
manager for kubectl server-side apply is kubectl.
Downgrading from server-side apply to client-side apply
If you manage a resource with kubectl apply --server-side,
you can downgrade to client-side apply directly with kubectl apply.
Downgrading works because kubectl Server-Side Apply keeps the
last-applied-configuration annotation up-to-date if you use
kubectl apply.
This behavior applies to Server-Side Apply with the kubectl field manager.
As an exception, you can opt-out of this behavior by specifying a different,
non-default field manager, as seen in the following example. The default field
manager for kubectl server-side apply is kubectl.
The PATCH verb for a resource that supports Server-Side Apply can accepts the
unofficial application/apply-patch+yaml content type. Users of Server-Side
Apply can send partially specified objects as YAML as the body of a PATCH request
to the URI of a resource. When applying a configuration, you should always include all the
fields that are important to the outcome (such as a desired state) that you want to define.
All JSON messages are valid YAML. Some clients specify Server-Side Apply requests using YAML
request bodies that are also valid JSON.
Access control and permissions
Since Server-Side Apply is a type of PATCH, a principal (such as a Role for Kubernetes
RBAC) requires the patch permission to
edit existing resources, and also needs the create verb permission in order to create
new resources with Server-Side Apply.
Clearing managedFields
It is possible to strip all managedFields from an object by overwriting them
using a patch (JSON Merge Patch, Strategic Merge Patch, JSON Patch), or
through an update (HTTP PUT); in other words, through every write operation
other than apply. This can be done by overwriting the managedFields field
with an empty entry. Two examples are:
This will overwrite the managedFields with a list containing a single empty
entry that then results in the managedFields being stripped entirely from the
object. Note that setting the managedFields to an empty list will not
reset the field. This is on purpose, so managedFields never get stripped by
clients not aware of the field.
In cases where the reset operation is combined with changes to other fields
than the managedFields, this will result in the managedFields being reset
first and the other changes being processed afterwards. As a result the
applier takes ownership of any fields updated in the same request.
Note:
Server-Side Apply does not correctly track ownership on
sub-resources that don't receive the resource object type. If you are
using Server-Side Apply with such a sub-resource, the changed fields
may not be tracked.
What's next
You can read about managedFields within the Kubernetes API reference for the
metadata
top level field.
2.3 - Client Libraries
This page contains an overview of the client libraries for using the Kubernetes
API from various programming languages.
To write applications using the Kubernetes REST API,
you do not need to implement the API calls and request/response types yourself.
You can use a client library for the programming language you are using.
Client libraries often handle common tasks such as authentication for you.
Most client libraries can discover and use the Kubernetes Service Account to
authenticate if the API client is running inside the Kubernetes cluster, or can
understand the kubeconfig file
format to read the credentials and the API Server address.
Note: This section links to third party projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for these projects, which are listed alphabetically. To add a project to this list, read the content guide before submitting a change. More information.
The following Kubernetes API client libraries are provided and maintained by
their authors, not the Kubernetes team.
The Common Expression Language (CEL) is used
in the Kubernetes API to declare validation rules, policy rules, and other
constraints or conditions.
CEL expressions are evaluated directly in the
API server, making CEL a
convenient alternative to out-of-process mechanisms, such as webhooks, for many
extensibility use cases. Your CEL expressions continue to execute so long as the
control plane's API server component remains available.
Language overview
The CEL
language has a
straightforward syntax that is similar to the expressions in C, C++, Java,
JavaScript and Go.
CEL was designed to be embedded into applications. Each CEL "program" is a
single expression that evaluates to a single value. CEL expressions are
typically short "one-liners" that inline well into the string fields of Kubernetes
API resources.
Inputs to a CEL program are "variables". Each Kubernetes API field that contains
CEL declares in the API documentation which variables are available to use for
that field. For example, in the x-kubernetes-validations[i].rules field of
CustomResourceDefinitions, the self and oldSelf variables are available and
refer to the previous and current state of the custom resource data to be
validated by the CEL expression. Other Kubernetes API fields may declare
different variables. See the API documentation of the API fields to learn which
variables are available for that field.
Example CEL expressions:
Examples of CEL expressions and the purpose of each
CEL functions, features and language settings support Kubernetes control plane
rollbacks. For example, CEL Optional Values was introduced at Kubernetes 1.29
and so only API servers at that version or newer will accept write requests to
CEL expressions that use CEL Optional Values. However, when a cluster is
rolled back to Kubernetes 1.28 CEL expressions using "CEL Optional Values" that
are already stored in API resources will continue to evaluate correctly.
Kubernetes CEL libraries
In additional to the CEL community libraries, Kubernetes includes CEL libraries
that are available everywhere CEL is used in Kubernetes.
Kubernetes list library
The list library includes indexOf and lastIndexOf, which work similar to the
strings functions of the same names. These functions either the first or last
positional index of the provided element in the list.
The list library also includes min, max and sum. Sum is supported on all
number types as well as the duration type. Min and max are supported on all
comparable types.
isSorted is also provided as a convenience function and is supported on all
comparable types.
Examples:
Examples of CEL expressions using list library functions
CEL Expression
Purpose
names.isSorted()
Verify that a list of names is kept in alphabetical order
items.map(x, x.weight).sum() == 1.0
Verify that the "weights" of a list of objects sum to 1.0
In addition to the matches function provided by the CEL standard library, the
regex library provides find and findAll, enabling a much wider range of
regex operations.
Examples:
Examples of CEL expressions using regex library functions
For CEL expressions in the API where a variable of type Authorizer is available,
the authorizer may be used to perform authorization checks for the principal
(authenticated user) of the request.
API resource checks are performed as follows:
Specify the group and resource to check: Authorizer.group(string).resource(string) ResourceCheck
Optionally call any combination of the following builder functions to further narrow the authorization check.
Note that these functions return the receiver type and can be chained:
ResourceCheck.subresource(string) ResourceCheck
ResourceCheck.namespace(string) ResourceCheck
ResourceCheck.name(string) ResourceCheck
Call ResourceCheck.check(verb string) Decision to perform the authorization check.
Call allowed() bool or reason() string to inspect the result of the authorization check.
Non-resource authorization performed are used as follows:
specify only a path: Authorizer.path(string) PathCheck
Call PathCheck.check(httpVerb string) Decision to perform the authorization check.
Call allowed() bool or reason() string to inspect the result of the authorization check.
To perform an authorization check for a service account:
Authorizer.serviceAccount(namespace string, name string) Authorizer
Examples of CEL expressions using URL library functions
quantity(string) Quantity converts a string to a Quantity or results in an error if the
string is not a valid quantity.
Once parsed via the quantity function, the resulting Quantity object has the
following library of member functions:
Available member functions of a Quantity
Member Function
CEL Return Value
Description
isInteger()
bool
returns true if and only if asInteger is safe to call without an error
asInteger()
int
returns a representation of the current value as an int64 if possible or results in an error if conversion would result in overflow or loss of precision.
asApproximateFloat()
float
returns a float64 representation of the quantity which may lose precision. If the value of the quantity is outside the range of a float64 +Inf/-Inf will be returned.
sign()
int
Returns 1 if the quantity is positive, -1 if it is negative. 0 if it is zero
add(<Quantity>)
Quantity
Returns sum of two quantities
add(<int>)
Quantity
Returns sum of quantity and an integer
sub(<Quantity>)
Quantity
Returns difference between two quantities
sub(<int>)
Quantity
Returns difference between a quantity and an integer
isLessThan(<Quantity>)
bool
Returns true if and only if the receiver is less than the operand
isGreaterThan(<Quantity>)
bool
Returns true if and only if the receiver is greater than the operand
compareTo(<Quantity>)
int
Compares receiver to operand and returns 0 if they are equal, 1 if the receiver is greater, or -1 if the receiver is less than the operand
Examples:
Examples of CEL expressions using URL library functions
CEL Expression
Purpose
quantity("500000G").isInteger()
Test if conversion to integer would throw an error
Some Kubernetes API fields contain partially type checked CEL expressions. A
partially type checked expression is an expressions where some of the variables
are statically typed but others are dynamically typed. For example, in the CEL
expressions of
ValidatingAdmissionPolicies
the request variable is typed, but the object variable is dynamically typed.
As a result, an expression containing request.namex would fail type checking
because the namex field is not defined. However, object.namex would pass
type checking even when the namex field is not defined for the resource kinds
that object refers to, because object is dynamically typed.
The has() macro in CEL may be used in CEL expressions to check if a field of a
dynamically typed variable is accessible before attempting to access the field's
value. For example:
Equality comparison for arrays with x-kubernetes-list-type of set or map ignores element
order. For example [1, 2] == [2, 1] if the arrays represent Kubernetes set values.
Concatenation on arrays with x-kubernetes-list-type use the semantics of the
list type:
set: X + Y performs a union where the array positions of all elements in
X are preserved and non-intersecting elements in Y are appended, retaining
their partial order.
map: X + Y performs a merge where the array positions of all keys in X
are preserved but the values are overwritten by values in Y when the key
sets of X and Y intersect. Elements in Y with non-intersecting keys are
appended, retaining their partial order.
Escaping
Only Kubernetes resource property names of the form
[a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible from CEL. Accessible property
names are escaped according to the following rules when accessed in the
expression:
When you escape any of CEL's RESERVED keywords you need to match the exact property name
use the underscore escaping
(for example, int in the word sprint would not be escaped and nor would it need to be).
Examples on escaping:
Examples escaped CEL identifiers
property name
rule with escaped property name
namespace
self.__namespace__ > 0
x-prop
self.x__dash__prop > 0
redact__d
self.redact__underscores__d > 0
string
self.startsWith('kube')
Resource constraints
CEL is non-Turing complete and offers a variety of production safety controls to
limit execution time. CEL's resource constraint features provide feedback to
developers about expression complexity and help protect the API server from
excessive resource consumption during evaluation. CEL's resource constraint
features are used to prevent CEL evaluation from consuming excessive API server
resources.
A key element of the resource constraint features is a cost unit that CEL
defines as a way of tracking CPU utilization. Cost units are independent of
system load and hardware. Cost units are also deterministic; for any given CEL
expression and input data, evaluation of the expression by the CEL interpreter
will always result in the same cost.
Many of CEL's core operations have fixed costs. The simplest operations, such as
comparisons (e.g. <) have a cost of 1. Some have a higher fixed cost, for
example list literal declarations have a fixed base cost of 40 cost units.
Calls to functions implemented in native code approximate cost based on the time
complexity of the operation. For example: operations that use regular
expressions, such as match and find, are estimated using an approximated
cost of length(regexString)*length(inputString). The approximated cost
reflects the worst case time complexity of Go's RE2 implementation.
Runtime cost budget
All CEL expressions evaluated by Kubernetes are constrained by a runtime cost
budget. The runtime cost budget is an estimate of actual CPU utilization
computed by incrementing a cost unit counter while interpreting a CEL
expression. If the CEL interpreter executes too many instructions, the runtime
cost budget will be exceeded, execution of the expressions will be halted, and
an error will result.
Some Kubernetes resources define an additional runtime cost budget that bounds
the execution of multiple expressions. If the sum total of the cost of
expressions exceed the budget, execution of the expressions will be halted, and
an error will result. For example the validation of a custom resource has a
per-validation runtime cost budget for all
Validation Rules
evaluated to validate the custom resource.
Estimated cost limits
For some Kubernetes resources, the API server may also check if worst case
estimated running time of CEL expressions would be prohibitively expensive to
execute. If so, the API server prevent the CEL expression from being written to
API resources by rejecting create or update operations containing the CEL
expression to the API resources. This feature offers a stronger assurance that
CEL expressions written to the API resource will be evaluate at runtime without
exceeding the runtime cost budget.
2.5 - Kubernetes Deprecation Policy
This document details the deprecation policy for various facets of the system.
Kubernetes is a large system with many components and many contributors. As
with any such software, the feature set naturally evolves over time, and
sometimes a feature may need to be removed. This could include an API, a flag,
or even an entire feature. To avoid breaking existing users, Kubernetes follows
a deprecation policy for aspects of the system that are slated to be removed.
Deprecating parts of the API
Since Kubernetes is an API-driven system, the API has evolved over time to
reflect the evolving understanding of the problem space. The Kubernetes API is
actually a set of APIs, called "API groups", and each API group is
independently versioned. API versions fall
into 3 main tracks, each of which has different policies for deprecation:
Example
Track
v1
GA (generally available, stable)
v1beta1
Beta (pre-release)
v1alpha1
Alpha (experimental)
A given release of Kubernetes can support any number of API groups and any
number of versions of each.
The following rules govern the deprecation of elements of the API. This
includes:
REST resources (aka API objects)
Fields of REST resources
Annotations on REST resources, including "beta" annotations but not
including "alpha" annotations.
Enumerated or constant values
Component config structures
These rules are enforced between official releases, not between
arbitrary commits to master or release branches.
Rule #1: API elements may only be removed by incrementing the version of the
API group.
Once an API element has been added to an API group at a particular version, it
can not be removed from that version or have its behavior significantly
changed, regardless of track.
Note:
For historical reasons, there are 2 "monolithic" API groups - "core" (no
group name) and "extensions". Resources will incrementally be moved from these
legacy API groups into more domain-specific API groups.
Rule #2: API objects must be able to round-trip between API versions in a given
release without information loss, with the exception of whole REST resources
that do not exist in some versions.
For example, an object can be written as v1 and then read back as v2 and
converted to v1, and the resulting v1 resource will be identical to the
original. The representation in v2 might be different from v1, but the system
knows how to convert between them in both directions. Additionally, any new
field added in v2 must be able to round-trip to v1 and back, which means v1
might have to add an equivalent field or represent it as an annotation.
Rule #3: An API version in a given track may not be deprecated in favor of a less stable API version.
GA API versions can replace beta and alpha API versions.
Beta API versions can replace earlier beta and alpha API versions, but may not replace GA API versions.
Alpha API versions can replace earlier alpha API versions, but may not replace GA or beta API versions.
Rule #4a: API lifetime is determined by the API stability level
GA API versions may be marked as deprecated, but must not be removed within a major version of Kubernetes
Beta API versions are deprecated no more than 9 months or 3 minor releases after introduction (whichever is longer),
and are no longer served 9 months or 3 minor releases after deprecation (whichever is longer)
Alpha API versions may be removed in any release without prior deprecation notice
This ensures beta API support covers the maximum supported version skew of 2 releases,
and that APIs don't stagnate on unstable beta versions, accumulating production usage that will be disrupted when support for the beta API ends.
Note:
There are no current plans for a major version revision of Kubernetes that removes GA APIs.
Note:
Until #52185 is
resolved, no API versions that have been persisted to storage may be removed.
Serving REST endpoints for those versions may be disabled (subject to the
deprecation timelines in this document), but the API server must remain capable
of decoding/converting previously persisted data from storage.
Rule #4b: The "preferred" API version and the "storage version" for a given
group may not advance until after a release has been made that supports both the
new version and the previous version
Users must be able to upgrade to a new release of Kubernetes and then roll back
to a previous release, without converting anything to the new API version or
suffering breakages (unless they explicitly used features only available in the
newer version). This is particularly evident in the stored representation of
objects.
All of this is best illustrated by examples. Imagine a Kubernetes release,
version X, which introduces a new API group. A new Kubernetes release is made
every approximately 4 months (3 per year). The following table describes which
API versions are supported in a series of subsequent releases.
Consider a hypothetical REST resource named Widget, which was present in API v1
in the above timeline, and which needs to be deprecated. We document and
announce the
deprecation in sync with release X+1. The Widget resource still exists in API
version v1 (deprecated) but not in v2alpha1. The Widget resource continues to
exist and function in releases up to and including X+8. Only in release X+9,
when API v1 has aged out, does the Widget resource cease to exist, and the
behavior get removed.
Starting in Kubernetes v1.19, making an API request to a deprecated REST API endpoint:
Returns a Warning header (as defined in RFC7234, Section 5.5) in the API response.
Adds a "k8s.io/deprecated":"true" annotation to the audit event recorded for the request.
Sets an apiserver_requested_deprecated_apis gauge metric to 1 in the kube-apiserver
process. The metric has labels for group, version, resource, subresource that can be joined
to the apiserver_request_total metric, and a removed_release label that indicates the
Kubernetes release in which the API will no longer be served. The following Prometheus query
returns information about requests made to deprecated APIs which will be removed in v1.22:
As with whole REST resources, an individual field which was present in API v1
must exist and function until API v1 is removed. Unlike whole resources, the
v2 APIs may choose a different representation for the field, as long as it can
be round-tripped. For example a v1 field named "magnitude" which was
deprecated might be named "deprecatedMagnitude" in API v2. When v1 is
eventually removed, the deprecated field can be removed from v2.
Enumerated or constant values
As with whole REST resources and fields thereof, a constant value which was
supported in API v1 must exist and function until API v1 is removed.
Component config structures
Component configs are versioned and managed similar to REST resources.
Future work
Over time, Kubernetes will introduce more fine-grained API versions, at which
point these rules will be adjusted as needed.
Deprecating a flag or CLI
The Kubernetes system is comprised of several different programs cooperating.
Sometimes, a Kubernetes release might remove flags or CLI commands
(collectively "CLI elements") in these programs. The individual programs
naturally sort into two main groups - user-facing and admin-facing programs,
which vary slightly in their deprecation policies. Unless a flag is explicitly
prefixed or documented as "alpha" or "beta", it is considered GA.
CLI elements are effectively part of the API to the system, but since they are
not versioned in the same way as the REST API, the rules for deprecation are as
follows:
Rule #5a: CLI elements of user-facing components (e.g. kubectl) must function
after their announced deprecation for no less than:
GA: 12 months or 2 releases (whichever is longer)
Beta: 3 months or 1 release (whichever is longer)
Alpha: 0 releases
Rule #5b: CLI elements of admin-facing components (e.g. kubelet) must function
after their announced deprecation for no less than:
GA: 6 months or 1 release (whichever is longer)
Beta: 3 months or 1 release (whichever is longer)
Alpha: 0 releases
Rule #5c: Command line interface (CLI) elements cannot be deprecated in favor of
less stable CLI elements
Similar to the Rule #3 for APIs, if an element of a command line interface is being replaced with an
alternative implementation, such as by renaming an existing element, or by switching to
use configuration sourced from a file
instead of a command line argument, that recommended alternative must be of
the same or higher stability level.
Rule #6: Deprecated CLI elements must emit warnings (optionally disable)
when used.
Deprecating a feature or behavior
Occasionally a Kubernetes release needs to deprecate some feature or behavior
of the system that is not controlled by the API or CLI. In this case, the
rules for deprecation are as follows:
Rule #7: Deprecated behaviors must function for no less than 1 year after their
announced deprecation.
If the feature or behavior is being replaced with an alternative implementation
that requires work to adopt the change, there should be an effort to simplify
the transition whenever possible. If an alternative implementation is under
Kubernetes organization control, the following rules apply:
Rule #8: The feature of behavior must not be deprecated in favor of an alternative
implementation that is less stable
For example, a generally available feature cannot be deprecated in favor of a Beta
replacement.
The Kubernetes project does, however, encourage users to adopt and transitions to alternative
implementations even before they reach the same maturity level. This is particularly important
for exploring new use cases of a feature or getting an early feedback on the replacement.
Alternative implementations may sometimes be external tools or products,
for example a feature may move from the kubelet to container runtime
that is not under Kubernetes project control. In such cases, the rule cannot be
applied, but there must be an effort to ensure that there is a transition path
that does not compromise on components' maturity levels. In the example with
container runtimes, the effort may involve trying to ensure that popular container runtimes
have versions that offer the same level of stability while implementing that replacement behavior.
Deprecation rules for features and behaviors do not imply that all changes
to the system are governed by this policy.
These rules applies only to significant, user-visible behaviors which impact the
correctness of applications running on Kubernetes or that impact the
administration of Kubernetes clusters, and which are being removed entirely.
An exception to the above rule is feature gates. Feature gates are key=value
pairs that allow for users to enable/disable experimental features.
Feature gates are intended to cover the development life cycle of a feature - they
are not intended to be long-term APIs. As such, they are expected to be deprecated
and removed after a feature becomes GA or is dropped.
As a feature moves through the stages, the associated feature gate evolves.
The feature life cycle matched to its corresponding feature gate is:
Alpha: the feature gate is disabled by default and can be enabled by the user.
Beta: the feature gate is enabled by default and can be disabled by the user.
GA: the feature gate is deprecated (see "Deprecation") and becomes
non-operational.
GA, deprecation window complete: the feature gate is removed and calls to it are
no longer accepted.
Deprecation
Features can be removed at any point in the life cycle prior to GA. When features are
removed prior to GA, their associated feature gates are also deprecated.
When an invocation tries to disable a non-operational feature gate, the call fails in order
to avoid unsupported scenarios that might otherwise run silently.
In some cases, removing pre-GA features requires considerable time. Feature gates can remain
operational until their associated feature is fully removed, at which point the feature gate
itself can be deprecated.
When removing a feature gate for a GA feature also requires considerable time, calls to
feature gates may remain operational if the feature gate has no effect on the feature,
and if the feature gate causes no errors.
Features intended to be disabled by users should include a mechanism for disabling the
feature in the associated feature gate.
Versioning for feature gates is different from the previously discussed components,
therefore the rules for deprecation are as follows:
Rule #9: Feature gates must be deprecated when the corresponding feature they control
transitions a lifecycle stage as follows. Feature gates must function for no less than:
Beta feature to GA: 6 months or 2 releases (whichever is longer)
Beta feature to EOL: 3 months or 1 release (whichever is longer)
Alpha feature to EOL: 0 releases
Rule #10: Deprecated feature gates must respond with a warning when used. When a feature gate
is deprecated it must be documented in both in the release notes and the corresponding CLI help.
Both warnings and documentation must indicate whether a feature gate is non-operational.
Deprecating a metric
Each component of the Kubernetes control-plane exposes metrics (usually the
/metrics endpoint), which are typically ingested by cluster administrators.
Not all metrics are the same: some metrics are commonly used as SLIs or used
to determine SLOs, these tend to have greater import. Other metrics are more
experimental in nature or are used primarily in the Kubernetes development
process.
Accordingly, metrics fall under three stability classes (ALPHA, BETASTABLE);
this impacts removal of a metric during a Kubernetes release. These classes
are determined by the perceived importance of the metric. The rules for
deprecating and removing a metric are as follows:
Rule #11a: Metrics, for the corresponding stability class, must function for no less than:
STABLE: 4 releases or 12 months (whichever is longer)
BETA: 2 releases or 8 months (whichever is longer)
ALPHA: 0 releases
Rule #11b: Metrics, after their announced deprecation, must function for no less than:
STABLE: 3 releases or 9 months (whichever is longer)
BETA: 1 releases or 4 months (whichever is longer)
ALPHA: 0 releases
Deprecated metrics will have their description text prefixed with a deprecation notice
string '(Deprecated from x.y)' and a warning log will be emitted during metric
registration. Like their stable undeprecated counterparts, deprecated metrics will
be automatically registered to the metrics endpoint and therefore visible.
On a subsequent release (when the metric's deprecatedVersion is equal to
current_kubernetes_version - 3), a deprecated metric will become a hidden metric.
Unlike their deprecated counterparts, hidden metrics will no longer be
automatically registered to the metrics endpoint (hence hidden). However, they
can be explicitly enabled through a command line flag on the binary
(--show-hidden-metrics-for-version=). This provides cluster admins an
escape hatch to properly migrate off of a deprecated metric, if they were not
able to react to the earlier deprecation warnings. Hidden metrics should be
deleted after one release.
Exceptions
No policy can cover every possible situation. This policy is a living
document, and will evolve over time. In practice, there will be situations
that do not fit neatly into this policy, or for which this policy becomes a
serious impediment. Such situations should be discussed with SIGs and project
leaders to find the best solutions for those specific cases, always bearing in
mind that Kubernetes is committed to being a stable system that, as much as
possible, never breaks users. Exceptions will always be announced in all
relevant release notes.
2.6 - Deprecated API Migration Guide
As the Kubernetes API evolves, APIs are periodically reorganized or upgraded.
When APIs evolve, the old API is deprecated and eventually removed.
This page contains information you need to know when migrating from
deprecated API versions to newer and more stable API versions.
Removed APIs by release
v1.32
The v1.32 release will stop serving the following deprecated API versions:
Flow control resources
The flowcontrol.apiserver.k8s.io/v1beta3 API version of FlowSchema and PriorityLevelConfiguration will no longer be served in v1.32.
Migrate manifests and API clients to use the flowcontrol.apiserver.k8s.io/v1 API version, available since v1.29.
All existing persisted objects are accessible via the new API
Notable changes in flowcontrol.apiserver.k8s.io/v1:
The PriorityLevelConfiguration spec.limited.nominalConcurrencyShares field only defaults to 30 when unspecified, and an explicit value of 0 is not changed to 30.
v1.29
The v1.29 release stopped serving the following deprecated API versions:
Flow control resources
The flowcontrol.apiserver.k8s.io/v1beta2 API version of FlowSchema and PriorityLevelConfiguration is no longer served as of v1.29.
Migrate manifests and API clients to use the flowcontrol.apiserver.k8s.io/v1 API version, available since v1.29, or the flowcontrol.apiserver.k8s.io/v1beta3 API version, available since v1.26.
All existing persisted objects are accessible via the new API
Notable changes in flowcontrol.apiserver.k8s.io/v1:
The PriorityLevelConfiguration spec.limited.assuredConcurrencyShares field is renamed to spec.limited.nominalConcurrencyShares and only defaults to 30 when unspecified, and an explicit value of 0 is not changed to 30.
Notable changes in flowcontrol.apiserver.k8s.io/v1beta3:
The PriorityLevelConfiguration spec.limited.assuredConcurrencyShares field is renamed to spec.limited.nominalConcurrencyShares
v1.27
The v1.27 release stopped serving the following deprecated API versions:
CSIStorageCapacity
The storage.k8s.io/v1beta1 API version of CSIStorageCapacity is no longer served as of v1.27.
Migrate manifests and API clients to use the storage.k8s.io/v1 API version, available since v1.24.
All existing persisted objects are accessible via the new API
No notable changes
v1.26
The v1.26 release stopped serving the following deprecated API versions:
Flow control resources
The flowcontrol.apiserver.k8s.io/v1beta1 API version of FlowSchema and PriorityLevelConfiguration is no longer served as of v1.26.
Migrate manifests and API clients to use the flowcontrol.apiserver.k8s.io/v1beta2 API version.
All existing persisted objects are accessible via the new API
No notable changes
HorizontalPodAutoscaler
The autoscaling/v2beta2 API version of HorizontalPodAutoscaler is no longer served as of v1.26.
Migrate manifests and API clients to use the autoscaling/v2 API version, available since v1.23.
All existing persisted objects are accessible via the new API
The v1.25 release stopped serving the following deprecated API versions:
CronJob
The batch/v1beta1 API version of CronJob is no longer served as of v1.25.
Migrate manifests and API clients to use the batch/v1 API version, available since v1.21.
All existing persisted objects are accessible via the new API
No notable changes
EndpointSlice
The discovery.k8s.io/v1beta1 API version of EndpointSlice is no longer served as of v1.25.
Migrate manifests and API clients to use the discovery.k8s.io/v1 API version, available since v1.21.
All existing persisted objects are accessible via the new API
Notable changes in discovery.k8s.io/v1:
use per Endpoint nodeName field instead of deprecated topology["kubernetes.io/hostname"] field
use per Endpoint zone field instead of deprecated topology["topology.kubernetes.io/zone"] field
topology is replaced with the deprecatedTopology field which is not writable in v1
Event
The events.k8s.io/v1beta1 API version of Event is no longer served as of v1.25.
Migrate manifests and API clients to use the events.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
Notable changes in events.k8s.io/v1:
type is limited to Normal and Warning
involvedObject is renamed to regarding
action, reason, reportingController, and reportingInstance are required
when creating new events.k8s.io/v1 Events
use eventTime instead of the deprecated firstTimestamp field (which is renamed
to deprecatedFirstTimestamp and not permitted in new events.k8s.io/v1 Events)
use series.lastObservedTime instead of the deprecated lastTimestamp field
(which is renamed to deprecatedLastTimestamp and not permitted in new events.k8s.io/v1 Events)
use series.count instead of the deprecated count field
(which is renamed to deprecatedCount and not permitted in new events.k8s.io/v1 Events)
use reportingController instead of the deprecated source.component field
(which is renamed to deprecatedSource.component and not permitted in new events.k8s.io/v1 Events)
use reportingInstance instead of the deprecated source.host field
(which is renamed to deprecatedSource.host and not permitted in new events.k8s.io/v1 Events)
HorizontalPodAutoscaler
The autoscaling/v2beta1 API version of HorizontalPodAutoscaler is no longer served as of v1.25.
Migrate manifests and API clients to use the autoscaling/v2 API version, available since v1.23.
All existing persisted objects are accessible via the new API
The policy/v1beta1 API version of PodDisruptionBudget is no longer served as of v1.25.
Migrate manifests and API clients to use the policy/v1 API version, available since v1.21.
All existing persisted objects are accessible via the new API
Notable changes in policy/v1:
an empty spec.selector ({}) written to a policy/v1 PodDisruptionBudget selects all
pods in the namespace (in policy/v1beta1 an empty spec.selector selected no pods).
An unset spec.selector selects no pods in either API version.
PodSecurityPolicy
PodSecurityPolicy in the policy/v1beta1 API version is no longer served as of v1.25,
and the PodSecurityPolicy admission controller will be removed.
RuntimeClass in the node.k8s.io/v1beta1 API version is no longer served as of v1.25.
Migrate manifests and API clients to use the node.k8s.io/v1 API version, available since v1.20.
All existing persisted objects are accessible via the new API
No notable changes
v1.22
The v1.22 release stopped serving the following deprecated API versions:
Webhook resources
The admissionregistration.k8s.io/v1beta1 API version of MutatingWebhookConfiguration
and ValidatingWebhookConfiguration is no longer served as of v1.22.
Migrate manifests and API clients to use the admissionregistration.k8s.io/v1 API version, available since v1.16.
All existing persisted objects are accessible via the new APIs
Notable changes:
webhooks[*].failurePolicy default changed from Ignore to Fail for v1
webhooks[*].matchPolicy default changed from Exact to Equivalent for v1
webhooks[*].timeoutSeconds default changed from 30s to 10s for v1
webhooks[*].sideEffects default value is removed, and the field made required,
and only None and NoneOnDryRun are permitted for v1
webhooks[*].admissionReviewVersions default value is removed and the field made
required for v1 (supported versions for AdmissionReview are v1 and v1beta1)
webhooks[*].name must be unique in the list for objects created via admissionregistration.k8s.io/v1
CustomResourceDefinition
The apiextensions.k8s.io/v1beta1 API version of CustomResourceDefinition is no longer served as of v1.22.
Migrate manifests and API clients to use the apiextensions.k8s.io/v1 API version, available since v1.16.
All existing persisted objects are accessible via the new API
Notable changes:
spec.scope is no longer defaulted to Namespaced and must be explicitly specified
spec.version is removed in v1; use spec.versions instead
spec.validation is removed in v1; use spec.versions[*].schema instead
spec.subresources is removed in v1; use spec.versions[*].subresources instead
spec.additionalPrinterColumns is removed in v1; use spec.versions[*].additionalPrinterColumns instead
spec.conversion.webhookClientConfig is moved to spec.conversion.webhook.clientConfig in v1
spec.conversion.conversionReviewVersions is moved to spec.conversion.webhook.conversionReviewVersions in v1
spec.versions[*].schema.openAPIV3Schema is now required when creating v1 CustomResourceDefinition objects,
and must be a structural schema
spec.preserveUnknownFields: true is disallowed when creating v1 CustomResourceDefinition objects;
it must be specified within schema definitions as x-kubernetes-preserve-unknown-fields: true
In additionalPrinterColumns items, the JSONPath field was renamed to jsonPath in v1
(fixes #66531)
APIService
The apiregistration.k8s.io/v1beta1 API version of APIService is no longer served as of v1.22.
Migrate manifests and API clients to use the apiregistration.k8s.io/v1 API version, available since v1.10.
All existing persisted objects are accessible via the new API
No notable changes
TokenReview
The authentication.k8s.io/v1beta1 API version of TokenReview is no longer served as of v1.22.
Migrate manifests and API clients to use the authentication.k8s.io/v1 API version, available since v1.6.
No notable changes
SubjectAccessReview resources
The authorization.k8s.io/v1beta1 API version of LocalSubjectAccessReview,
SelfSubjectAccessReview, SubjectAccessReview, and SelfSubjectRulesReview is no longer served as of v1.22.
Migrate manifests and API clients to use the authorization.k8s.io/v1 API version, available since v1.6.
Notable changes:
spec.group was renamed to spec.groups in v1 (fixes #32709)
CertificateSigningRequest
The certificates.k8s.io/v1beta1 API version of CertificateSigningRequest is no longer served as of v1.22.
Migrate manifests and API clients to use the certificates.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
Notable changes in certificates.k8s.io/v1:
For API clients requesting certificates:
spec.signerName is now required
(see known Kubernetes signers),
and requests for kubernetes.io/legacy-unknown are not allowed to be created via the certificates.k8s.io/v1 API
spec.usages is now required, may not contain duplicate values, and must only contain known usages
For API clients approving or signing certificates:
status.conditions may not contain duplicate types
status.conditions[*].status is now required
status.certificate must be PEM-encoded, and contain only CERTIFICATE blocks
Lease
The coordination.k8s.io/v1beta1 API version of Lease is no longer served as of v1.22.
Migrate manifests and API clients to use the coordination.k8s.io/v1 API version, available since v1.14.
All existing persisted objects are accessible via the new API
No notable changes
Ingress
The extensions/v1beta1 and networking.k8s.io/v1beta1 API versions of Ingress is no longer served as of v1.22.
Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
Notable changes:
spec.backend is renamed to spec.defaultBackend
The backend serviceName field is renamed to service.name
Numeric backend servicePort fields are renamed to service.port.number
String backend servicePort fields are renamed to service.port.name
pathType is now required for each specified path. Options are Prefix,
Exact, and ImplementationSpecific. To match the undefined v1beta1 behavior, use ImplementationSpecific.
IngressClass
The networking.k8s.io/v1beta1 API version of IngressClass is no longer served as of v1.22.
Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
No notable changes
RBAC resources
The rbac.authorization.k8s.io/v1beta1 API version of ClusterRole, ClusterRoleBinding,
Role, and RoleBinding is no longer served as of v1.22.
Migrate manifests and API clients to use the rbac.authorization.k8s.io/v1 API version, available since v1.8.
All existing persisted objects are accessible via the new APIs
No notable changes
PriorityClass
The scheduling.k8s.io/v1beta1 API version of PriorityClass is no longer served as of v1.22.
Migrate manifests and API clients to use the scheduling.k8s.io/v1 API version, available since v1.14.
All existing persisted objects are accessible via the new API
No notable changes
Storage resources
The storage.k8s.io/v1beta1 API version of CSIDriver, CSINode, StorageClass, and VolumeAttachment is no longer served as of v1.22.
Migrate manifests and API clients to use the storage.k8s.io/v1 API version
CSIDriver is available in storage.k8s.io/v1 since v1.19.
CSINode is available in storage.k8s.io/v1 since v1.17
StorageClass is available in storage.k8s.io/v1 since v1.6
VolumeAttachment is available in storage.k8s.io/v1 v1.13
All existing persisted objects are accessible via the new APIs
No notable changes
v1.16
The v1.16 release stopped serving the following deprecated API versions:
NetworkPolicy
The extensions/v1beta1 API version of NetworkPolicy is no longer served as of v1.16.
Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.8.
All existing persisted objects are accessible via the new API
DaemonSet
The extensions/v1beta1 and apps/v1beta2 API versions of DaemonSet are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.templateGeneration is removed
spec.selector is now required and immutable after creation; use the existing
template labels as the selector for seamless upgrades
spec.updateStrategy.type now defaults to RollingUpdate
(the default in extensions/v1beta1 was OnDelete)
Deployment
The extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API versions of Deployment are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.rollbackTo is removed
spec.selector is now required and immutable after creation; use the existing
template labels as the selector for seamless upgrades
spec.progressDeadlineSeconds now defaults to 600 seconds
(the default in extensions/v1beta1 was no deadline)
spec.revisionHistoryLimit now defaults to 10
(the default in apps/v1beta1 was 2, the default in extensions/v1beta1 was to retain all)
maxSurge and maxUnavailable now default to 25%
(the default in extensions/v1beta1 was 1)
StatefulSet
The apps/v1beta1 and apps/v1beta2 API versions of StatefulSet are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.selector is now required and immutable after creation;
use the existing template labels as the selector for seamless upgrades
spec.updateStrategy.type now defaults to RollingUpdate
(the default in apps/v1beta1 was OnDelete)
ReplicaSet
The extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API versions of ReplicaSet are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.selector is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
PodSecurityPolicy
The extensions/v1beta1 API version of PodSecurityPolicy is no longer served as of v1.16.
Migrate manifests and API client to use the policy/v1beta1 API version, available since v1.10.
Note that the policy/v1beta1 API version of PodSecurityPolicy will be removed in v1.25.
What to do
Test with deprecated APIs disabled
You can test your clusters by starting an API server with specific API versions disabled
to simulate upcoming removals. Add the following flag to the API server startup arguments:
This conversion may use non-ideal default values. To learn more about a specific
resource, check the Kubernetes API reference.
Note:
The kubectl convert tool is not installed by default, although
in fact it once was part of kubectl itself. For more details, you can read the
deprecation and removal issue
for the built-in subcommand.
To learn how to set up kubectl convert on your computer, visit the page that is right for your
operating system:
Linux,
macOS, or
Windows.
2.7 - Kubernetes API health endpoints
The Kubernetes API server provides API endpoints to indicate the current status of the API server.
This page describes these API endpoints and explains how you can use them.
API endpoints for health
The Kubernetes API server provides 3 API endpoints (healthz, livez and readyz) to indicate the current status of the API server.
The healthz endpoint is deprecated (since Kubernetes v1.16), and you should use the more specific livez and readyz endpoints instead.
The livez endpoint can be used with the --livez-grace-periodflag to specify the startup duration.
For a graceful shutdown you can specify the --shutdown-delay-durationflag with the /readyz endpoint.
Machines that check the healthz/livez/readyz of the API server should rely on the HTTP status code.
A status code 200 indicates the API server is healthy/live/ready, depending on the called endpoint.
The more verbose options shown below are intended to be used by human operators to debug their cluster or understand the state of the API server.
The following examples will show how you can interact with the health API endpoints.
For all endpoints, you can use the verbose parameter to print out the checks and their status.
This can be useful for a human operator to debug the current status of the API server, it is not intended to be consumed by a machine:
curl -k https://localhost:6443/livez?verbose
or from a remote host with authentication:
kubectl get --raw='/readyz?verbose'
The output will look like this:
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed
The Kubernetes API server also supports to exclude specific checks.
The query parameters can also be combined like in this example:
[+]ping ok
[+]log ok
[+]etcd excluded: ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
healthz check passed
Individual health checks
FEATURE STATE:Kubernetes v1.31 [alpha]
Each individual health check exposes an HTTP endpoint and can be checked individually.
The schema for the individual health checks is /livez/<healthcheck-name> or /readyz/<healthcheck-name>, where livez and readyz can be used to indicate if you want to check the liveness or the readiness of the API server, respectively.
The <healthcheck-name> path can be discovered using the verbose flag from above and take the path between [+] and ok.
These individual health checks should not be consumed by machines but can be helpful for a human operator to debug a system:
All Kubernetes clusters have two categories of users: service accounts managed
by Kubernetes, and normal users.
It is assumed that a cluster-independent service manages normal users in the following ways:
an administrator distributing private keys
a user store like Keystone or Google Accounts
a file with a list of usernames and passwords
In this regard, Kubernetes does not have objects which represent normal user accounts.
Normal users cannot be added to a cluster through an API call.
Even though a normal user cannot be added via an API call, any user that
presents a valid certificate signed by the cluster's certificate authority
(CA) is considered authenticated. In this configuration, Kubernetes determines
the username from the common name field in the 'subject' of the cert (e.g.,
"/CN=bob"). From there, the role based access control (RBAC) sub-system would
determine whether the user is authorized to perform a specific operation on a
resource. For more details, refer to the normal users topic in
certificate request
for more details about this.
In contrast, service accounts are users managed by the Kubernetes API. They are
bound to specific namespaces, and created automatically by the API server or
manually through API calls. Service accounts are tied to a set of credentials
stored as Secrets, which are mounted into pods allowing in-cluster processes
to talk to the Kubernetes API.
API requests are tied to either a normal user or a service account, or are treated
as anonymous requests. This means every process inside or outside the cluster, from
a human user typing kubectl on a workstation, to kubelets on nodes, to members
of the control plane, must authenticate when making requests to the API server,
or be treated as an anonymous user.
Authentication strategies
Kubernetes uses client certificates, bearer tokens, or an authenticating proxy to
authenticate API requests through authentication plugins. As HTTP requests are
made to the API server, plugins attempt to associate the following attributes
with the request:
Username: a string which identifies the end user. Common values might be kube-admin or jane@example.com.
UID: a string which identifies the end user and attempts to be more consistent and unique than username.
Groups: a set of strings, each of which indicates the user's membership in a named logical collection of users.
Common values might be system:masters or devops-team.
Extra fields: a map of strings to list of strings which holds additional information authorizers may find useful.
All values are opaque to the authentication system and only hold significance
when interpreted by an authorizer.
You can enable multiple authentication methods at once. You should usually use at least two methods:
service account tokens for service accounts
at least one other method for user authentication.
When multiple authenticator modules are enabled, the first module
to successfully authenticate the request short-circuits evaluation.
The API server does not guarantee the order authenticators run in.
The system:authenticated group is included in the list of groups for all authenticated users.
Integrations with other authentication protocols (LDAP, SAML, Kerberos, alternate x509 schemes, etc)
can be accomplished using an authenticating proxy or the
authentication webhook.
X509 client certificates
Client certificate authentication is enabled by passing the --client-ca-file=SOMEFILE
option to API server. The referenced file must contain one or more certificate authorities
to use to validate client certificates presented to the API server. If a client certificate
is presented and verified, the common name of the subject is used as the user name for the
request. As of Kubernetes 1.4, client certificates can also indicate a user's group memberships
using the certificate's organization fields. To include multiple group memberships for a user,
include multiple organization fields in the certificate.
For example, using the openssl command line tool to generate a certificate signing request:
The API server reads bearer tokens from a file when given the --token-auth-file=SOMEFILE option
on the command line. Currently, tokens last indefinitely, and the token list cannot be
changed without restarting the API server.
The token file is a csv file with a minimum of 3 columns: token, user name, user uid,
followed by optional group names.
Note:
If you have more than one group, the column must be double quoted e.g.
token,user,uid,"group1,group2,group3"
Putting a bearer token in a request
When using bearer token authentication from an http client, the API
server expects an Authorization header with a value of Bearer <token>. The bearer token must be a character sequence that can be
put in an HTTP header value using no more than the encoding and
quoting facilities of HTTP. For example: if the bearer token is
31ada4fd-adec-460c-809a-9e56ceb75269 then it would appear in an HTTP
header as shown below.
To allow for streamlined bootstrapping for new clusters, Kubernetes includes a
dynamically-managed Bearer token type called a Bootstrap Token. These tokens
are stored as Secrets in the kube-system namespace, where they can be
dynamically managed and created. Controller Manager contains a TokenCleaner
controller that deletes bootstrap tokens as they expire.
The tokens are of the form [a-z0-9]{6}.[a-z0-9]{16}. The first component is a
Token ID and the second component is the Token Secret. You specify the token
in an HTTP header as follows:
Authorization: Bearer 781292.db7bc3a58fc5f07e
You must enable the Bootstrap Token Authenticator with the
--enable-bootstrap-token-auth flag on the API Server. You must enable
the TokenCleaner controller via the --controllers flag on the Controller
Manager. This is done with something like --controllers=*,tokencleaner.
kubeadm will do this for you if you are using it to bootstrap a cluster.
The authenticator authenticates as system:bootstrap:<Token ID>. It is
included in the system:bootstrappers group. The naming and groups are
intentionally limited to discourage users from using these tokens past
bootstrapping. The user names and group can be used (and are used by kubeadm)
to craft the appropriate authorization policies to support bootstrapping a
cluster.
Please see Bootstrap Tokens for in depth
documentation on the Bootstrap Token authenticator and controllers along with
how to manage these tokens with kubeadm.
Service account tokens
A service account is an automatically enabled authenticator that uses signed
bearer tokens to verify requests. The plugin takes two optional flags:
--service-account-key-file File containing PEM-encoded x509 RSA or ECDSA
private or public keys, used to verify ServiceAccount tokens. The specified file
can contain multiple keys, and the flag can be specified multiple times with
different files. If unspecified, --tls-private-key-file is used.
--service-account-lookup If enabled, tokens which are deleted from the API will be revoked.
Service accounts are usually created automatically by the API server and
associated with pods running in the cluster through the ServiceAccountAdmission Controller. Bearer tokens are
mounted into pods at well-known locations, and allow in-cluster processes to
talk to the API server. Accounts may be explicitly associated with pods using the
serviceAccountName field of a PodSpec.
Note:
serviceAccountName is usually omitted because this is done automatically.
apiVersion:apps/v1# this apiVersion is relevant as of Kubernetes 1.9kind:Deploymentmetadata:name:nginx-deploymentnamespace:defaultspec:replicas:3template:metadata:# ...spec:serviceAccountName:bob-the-botcontainers:- name:nginximage:nginx:1.14.2
Service account bearer tokens are perfectly valid to use outside the cluster and
can be used to create identities for long standing jobs that wish to talk to the
Kubernetes API. To manually create a service account, use the kubectl create serviceaccount (NAME) command. This creates a service account in the current
namespace.
kubectl create serviceaccount jenkins
serviceaccount/jenkins created
Create an associated token:
kubectl create token jenkins
eyJhbGciOiJSUzI1NiIsImtp...
The created token is a signed JSON Web Token (JWT).
The signed JWT can be used as a bearer token to authenticate as the given service
account. See above for how the token is included
in a request. Normally these tokens are mounted into pods for in-cluster access to
the API server, but can be used from outside the cluster as well.
Service accounts authenticate with the username system:serviceaccount:(NAMESPACE):(SERVICEACCOUNT),
and are assigned to the groups system:serviceaccounts and system:serviceaccounts:(NAMESPACE).
Warning:
Because service account tokens can also be stored in Secret API objects, any user with
write access to Secrets can request a token, and any user with read access to those
Secrets can authenticate as the service account. Be cautious when granting permissions
to service accounts and read or write capabilities for Secrets.
OpenID Connect Tokens
OpenID Connect is a flavor of OAuth2 supported by
some OAuth2 providers, notably Microsoft Entra ID, Salesforce, and Google.
The protocol's main extension of OAuth2 is an additional field returned with
the access token called an ID Token.
This token is a JSON Web Token (JWT) with well known fields, such as a user's
email, signed by the server.
To identify the user, the authenticator uses the id_token (not the access_token)
from the OAuth2 token response
as a bearer token. See above for how the token
is included in a request.
Log in to your identity provider
Your identity provider will provide you with an access_token, id_token and a refresh_token
When using kubectl, use your id_token with the --token flag or add it directly to your kubeconfig
kubectl sends your id_token in a header called Authorization to the API server
The API server will make sure the JWT signature is valid
Check to make sure the id_token hasn't expired
Perform claim and/or user validation if CEL expressions are configured with AuthenticationConfiguration.
Make sure the user is authorized
Once authorized the API server returns a response to kubectl
kubectl provides feedback to the user
Since all of the data needed to validate who you are is in the id_token, Kubernetes doesn't need to
"phone home" to the identity provider. In a model where every request is stateless this provides a
very scalable solution for authentication. It does offer a few challenges:
Kubernetes has no "web interface" to trigger the authentication process. There is no browser or
interface to collect credentials which is why you need to authenticate to your identity provider first.
The id_token can't be revoked, it's like a certificate so it should be short-lived (only a few minutes)
so it can be very annoying to have to get a new token every few minutes.
To authenticate to the Kubernetes dashboard, you must use the kubectl proxy command or a reverse proxy
that injects the id_token.
Configuring the API Server
Using flags
To enable the plugin, configure the following flags on the API server:
Parameter
Description
Example
Required
--oidc-issuer-url
URL of the provider that allows the API server to discover public signing keys. Only URLs that use the https:// scheme are accepted. This is typically the provider's discovery URL, changed to have an empty path.
If the issuer's OIDC discovery URL is https://accounts.provider.example/.well-known/openid-configuration, the value should be https://accounts.provider.example
Yes
--oidc-client-id
A client id that all tokens must be issued for.
kubernetes
Yes
--oidc-username-claim
JWT claim to use as the user name. By default sub, which is expected to be a unique identifier of the end user. Admins can choose other claims, such as email or name, depending on their provider. However, claims other than email will be prefixed with the issuer URL to prevent naming clashes with other plugins.
sub
No
--oidc-username-prefix
Prefix prepended to username claims to prevent clashes with existing names (such as system: users). For example, the value oidc: will create usernames like oidc:jane.doe. If this flag isn't provided and --oidc-username-claim is a value other than email the prefix defaults to ( Issuer URL )# where ( Issuer URL ) is the value of --oidc-issuer-url. The value - can be used to disable all prefixing.
oidc:
No
--oidc-groups-claim
JWT claim to use as the user's group. If the claim is present it must be an array of strings.
groups
No
--oidc-groups-prefix
Prefix prepended to group claims to prevent clashes with existing names (such as system: groups). For example, the value oidc: will create group names like oidc:engineering and oidc:infra.
oidc:
No
--oidc-required-claim
A key=value pair that describes a required claim in the ID Token. If set, the claim is verified to be present in the ID Token with a matching value. Repeat this flag to specify multiple claims.
claim=value
No
--oidc-ca-file
The path to the certificate for the CA that signed your identity provider's web certificate. Defaults to the host's root CAs.
/etc/kubernetes/ssl/kc-ca.pem
No
--oidc-signing-algs
The signing algorithms accepted. Default is "RS256".
RS512
No
Authentication configuration from a file
FEATURE STATE:Kubernetes v1.30 [beta]
JWT Authenticator is an authenticator to authenticate Kubernetes users using JWT compliant tokens.
The authenticator will attempt to parse a raw ID token, verify it's been signed by the configured issuer.
The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
The minimum valid JWT payload must contain the following claims:
{
"iss": "https://example.com", // must match the issuer.url
"aud": ["my-app"], // at least one of the entries in issuer.audiences must match the "aud" claim in presented JWTs.
"exp": 1234567890, // token expiration as Unix time (the number of seconds elapsed since January 1, 1970 UTC)
"<username-claim>": "user"// this is the username claim configured in the claimMappings.username.claim or claimMappings.username.expression
}
The configuration file approach allows you to configure multiple JWT authenticators, each with a unique
issuer.url and issuer.discoveryURL. The configuration file even allows you to specify CEL
expressions to map claims to user attributes, and to validate claims and user information.
The API server also automatically reloads the authenticators when the configuration file is modified.
You can use apiserver_authentication_config_controller_automatic_reload_last_timestamp_seconds metric
to monitor the last time the configuration was reloaded by the API server.
You must specify the path to the authentication configuration using the --authentication-config flag
on the API server. If you want to use command line flags instead of the configuration file, those will
continue to work as-is. To access the new capabilities like configuring multiple authenticators,
setting multiple audiences for an issuer, switch to using the configuration file.
For Kubernetes v1.31, the structured authentication configuration file format
is beta-level, and the mechanism for using that configuration is also beta. Provided you didn't specifically
disable the StructuredAuthenticationConfigurationfeature gate for your cluster,
you can turn on structured authentication by specifying the --authentication-config command line
argument to the kube-apiserver. An example of the structured authentication configuration file is shown below.
Note:
If you specify --authentication-config along with any of the --oidc-* command line arguments, this is
a misconfiguration. In this situation, the API server reports an error and then immediately exits.
If you want to switch to using structured authentication configuration, you have to remove the --oidc-*
command line arguments, and use the configuration file instead.
---## CAUTION: this is an example configuration.# Do not use this for your own cluster!#apiVersion:apiserver.config.k8s.io/v1beta1kind:AuthenticationConfiguration# list of authenticators to authenticate Kubernetes users using JWT compliant tokens.# the maximum number of allowed authenticators is 64.jwt:- issuer:# url must be unique across all authenticators.# url must not conflict with issuer configured in --service-account-issuer.url:https://example.com# Same as --oidc-issuer-url.# discoveryURL, if specified, overrides the URL used to fetch discovery# information instead of using "{url}/.well-known/openid-configuration".# The exact value specified is used, so "/.well-known/openid-configuration"# must be included in discoveryURL if needed.## The "issuer" field in the fetched discovery information must match the "issuer.url" field# in the AuthenticationConfiguration and will be used to validate the "iss" claim in the presented JWT.# This is for scenarios where the well-known and jwks endpoints are hosted at a different# location than the issuer (such as locally in the cluster).# discoveryURL must be different from url if specified and must be unique across all authenticators.discoveryURL:https://discovery.example.com/.well-known/openid-configuration# PEM encoded CA certificates used to validate the connection when fetching# discovery information. If not set, the system verifier will be used.# Same value as the content of the file referenced by the --oidc-ca-file flag.certificateAuthority:<PEM encoded CA certificates> # audiences is the set of acceptable audiences the JWT must be issued to.# At least one of the entries must match the "aud" claim in presented JWTs.audiences:- my-app# Same as --oidc-client-id.- my-other-app# this is required to be set to "MatchAny" when multiple audiences are specified.audienceMatchPolicy:MatchAny# rules applied to validate token claims to authenticate users.claimValidationRules:# Same as --oidc-required-claim key=value.- claim:hdrequiredValue:example.com# Instead of claim and requiredValue, you can use expression to validate the claim.# expression is a CEL expression that evaluates to a boolean.# all the expressions must evaluate to true for validation to succeed.- expression:'claims.hd == "example.com"'# Message customizes the error message seen in the API server logs when the validation fails.message:the hd claim must be set to example.com- expression:'claims.exp - claims.nbf <= 86400'message:total token lifetime must not exceed 24 hoursclaimMappings:# username represents an option for the username attribute.# This is the only required attribute.username:# Same as --oidc-username-claim. Mutually exclusive with username.expression.claim:"sub"# Same as --oidc-username-prefix. Mutually exclusive with username.expression.# if username.claim is set, username.prefix is required.# Explicitly set it to "" if no prefix is desired.prefix:""# Mutually exclusive with username.claim and username.prefix.# expression is a CEL expression that evaluates to a string.## 1. If username.expression uses 'claims.email', then 'claims.email_verified' must be used in# username.expression or extra[*].valueExpression or claimValidationRules[*].expression.# An example claim validation rule expression that matches the validation automatically# applied when username.claim is set to 'email' is 'claims.?email_verified.orValue(true)'.# 2. If the username asserted based on username.expression is the empty string, the authentication# request will fail.expression:'claims.username + ":external-user"'# groups represents an option for the groups attribute.groups:# Same as --oidc-groups-claim. Mutually exclusive with groups.expression.claim:"sub"# Same as --oidc-groups-prefix. Mutually exclusive with groups.expression.# if groups.claim is set, groups.prefix is required.# Explicitly set it to "" if no prefix is desired.prefix:""# Mutually exclusive with groups.claim and groups.prefix.# expression is a CEL expression that evaluates to a string or a list of strings.expression:'claims.roles.split(",")'# uid represents an option for the uid attribute.uid:# Mutually exclusive with uid.expression.claim:'sub'# Mutually exclusive with uid.claim# expression is a CEL expression that evaluates to a string.expression:'claims.sub'# extra attributes to be added to the UserInfo object. Keys must be domain-prefix path and must be unique.extra:- key:'example.com/tenant'# valueExpression is a CEL expression that evaluates to a string or a list of strings.valueExpression:'claims.tenant'# validation rules applied to the final user object.userValidationRules:# expression is a CEL expression that evaluates to a boolean.# all the expressions must evaluate to true for the user to be valid.- expression:"!user.username.startsWith('system:')"# Message customizes the error message seen in the API server logs when the validation fails.message: 'username cannot used reserved system:prefix'- expression:"user.groups.all(group, !group.startsWith('system:'))"message: 'groups cannot used reserved system:prefix'
Claim validation rule expression
jwt.claimValidationRules[i].expression represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of the token payload, organized into claims CEL variable.
claims is a map of claim names (as strings) to claim values (of any type).
User validation rule expression
jwt.userValidationRules[i].expression represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of userInfo, organized into user CEL variable.
Refer to the UserInfo
API documentation for the schema of user.
Claim mapping expression
jwt.claimMappings.username.expression, jwt.claimMappings.groups.expression, jwt.claimMappings.uid.expressionjwt.claimMappings.extra[i].valueExpression represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of the token payload, organized into claims CEL variable.
claims is a map of claim names (as strings) to claim values (of any type).
apiVersion:apiserver.config.k8s.io/v1beta1kind:AuthenticationConfigurationjwt:- issuer:url:https://example.comaudiences:- my-appclaimMappings:username:expression:'claims.username + ":external-user"'groups:expression:'claims.roles.split(",")'uid:expression:'claims.sub'extra:- key:'example.com/tenant'valueExpression:'claims.tenant'userValidationRules:- expression:"!user.username.startsWith('system:')"# the expression will evaluate to true, so validation will succeed.message: 'username cannot used reserved system:prefix'
apiVersion:apiserver.config.k8s.io/v1beta1kind:AuthenticationConfigurationjwt:- issuer:url:https://example.comaudiences:- my-appclaimValidationRules:- expression:'claims.hd == "example.com"'# the token below does not have this claim, so validation will fail.message:the hd claim must be set to example.comclaimMappings:username:expression:'claims.username + ":external-user"'groups:expression:'claims.roles.split(",")'uid:expression:'claims.sub'extra:- key:'example.com/tenant'valueExpression:'claims.tenant'userValidationRules:- expression:"!user.username.startsWith('system:')"# the expression will evaluate to true, so validation will succeed.message: 'username cannot used reserved system:prefix'
The token with the above AuthenticationConfiguration will fail to authenticate because the
hd claim is not set to example.com. The API server will return 401 Unauthorized error.
apiVersion:apiserver.config.k8s.io/v1beta1kind:AuthenticationConfigurationjwt:- issuer:url:https://example.comaudiences:- my-appclaimValidationRules:- expression:'claims.hd == "example.com"'message:the hd claim must be set to example.comclaimMappings:username:expression:'"system:" + claims.username'# this will prefix the username with "system:" and will fail user validation.groups:expression:'claims.roles.split(",")'uid:expression:'claims.sub'extra:- key:'example.com/tenant'valueExpression:'claims.tenant'userValidationRules:- expression:"!user.username.startsWith('system:')"# the username will be system:foo and expression will evaluate to false, so validation will fail.message: 'username cannot used reserved system:prefix'
which will fail user validation because the username starts with system:.
The API server will return 401 Unauthorized error.
Limitations
Distributed claims do not work via CEL expressions.
Egress selector configuration is not supported for calls to issuer.url and issuer.discoveryURL.
Kubernetes does not provide an OpenID Connect Identity Provider.
You can use an existing public OpenID Connect Identity Provider (such as Google, or
others).
Or, you can run your own Identity Provider, such as dex,
Keycloak,
CloudFoundry UAA, or
Tremolo Security's OpenUnison.
For an identity provider to work with Kubernetes it must:
The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
If you're using the authentication configuration file, the identity provider doesn't need to publicly expose the discovery endpoint.
You can host the discovery endpoint at a different location than the issuer (such as locally in the cluster) and specify the
issuer.discoveryURL in the configuration file.
Run in TLS with non-obsolete ciphers
Have a CA signed certificate (even if the CA is not a commercial CA or is self signed)
A note about requirement #3 above, requiring a CA signed certificate. If you deploy your own
identity provider (as opposed to one of the cloud providers like Google or Microsoft) you MUST
have your identity provider's web server certificate signed by a certificate with the CA flag
set to TRUE, even if it is self signed. This is due to GoLang's TLS client implementation
being very strict to the standards around certificate validation. If you don't have a CA handy,
you can use the gencert script
from the Dex team to create a simple CA and a signed certificate and key pair. Or you can use
this similar script
that generates SHA256 certs with a longer life and larger key size.
The first option is to use the kubectl oidc authenticator, which sets the id_token as a bearer token
for all requests and refreshes the token once it expires. After you've logged into your provider, use
kubectl to add your id_token, refresh_token, client_id, and client_secret to configure the plugin.
Providers that don't return an id_token as part of their refresh token response aren't supported
by this plugin and should use "Option 2" below.
kubectl config set-credentials USER_NAME \
--auth-provider=oidc \
--auth-provider-arg=idp-issuer-url=( issuer url )\
--auth-provider-arg=client-id=( your client id )\
--auth-provider-arg=client-secret=( your client secret )\
--auth-provider-arg=refresh-token=( your refresh token )\
--auth-provider-arg=idp-certificate-authority=( path to your ca certificate )\
--auth-provider-arg=id-token=( your id_token )
As an example, running the below command after authenticating to your identity provider:
Once your id_token expires, kubectl will attempt to refresh your id_token using your refresh_token
and client_secret storing the new values for the refresh_token and id_token in your .kube/config.
Option 2 - Use the --token Option
The kubectl command lets you pass in a token using the --token option.
Copy and paste the id_token into this option:
kubectl --token=eyJhbGciOiJSUzI1NiJ9.eyJpc3MiOiJodHRwczovL21sYi50cmVtb2xvLmxhbjo4MDQzL2F1dGgvaWRwL29pZGMiLCJhdWQiOiJrdWJlcm5ldGVzIiwiZXhwIjoxNDc0NTk2NjY5LCJqdGkiOiI2RDUzNXoxUEpFNjJOR3QxaWVyYm9RIiwiaWF0IjoxNDc0NTk2MzY5LCJuYmYiOjE0NzQ1OTYyNDksInN1YiI6Im13aW5kdSIsInVzZXJfcm9sZSI6WyJ1c2VycyIsIm5ldy1uYW1lc3BhY2Utdmlld2VyIl0sImVtYWlsIjoibXdpbmR1QG5vbW9yZWplZGkuY29tIn0.f2As579n9VNoaKzoF-dOQGmXkFKf1FMyNV0-va_B63jn-_n9LGSCca_6IVMP8pO-Zb4KvRqGyTP0r3HkHxYy5c81AnIh8ijarruczl-TK_yF5akjSTHFZD-0gRzlevBDiH8Q79NAr-ky0P4iIXS8lY9Vnjch5MF74Zx0c3alKJHJUnnpjIACByfF2SCaYzbWFMUNat-K1PaUk5-ujMBG7yYnr95xD-63n8CO8teGUAAEMx6zRjzfhnhbzX-ajwZLGwGUBT4WqjMs70-6a7_8gZmLZb2az1cZynkFRj2BaCkVT3A2RrjeEwZEtGXlMqKJ1_I2ulrOVsYx01_yD35-rw get nodes
Webhook Token Authentication
Webhook authentication is a hook for verifying bearer tokens.
--authentication-token-webhook-config-file a configuration file describing how to access the remote webhook service.
--authentication-token-webhook-cache-ttl how long to cache authentication decisions. Defaults to two minutes.
--authentication-token-webhook-version determines whether to use authentication.k8s.io/v1beta1 or authentication.k8s.io/v1TokenReview objects to send/receive information from the webhook. Defaults to v1beta1.
The configuration file uses the kubeconfig
file format. Within the file, clusters refers to the remote service and
users refers to the API server webhook. An example would be:
# Kubernetes API versionapiVersion:v1# kind of the API objectkind:Config# clusters refers to the remote service.clusters:- name:name-of-remote-authn-servicecluster:certificate-authority:/path/to/ca.pem # CA for verifying the remote service.server:https://authn.example.com/authenticate# URL of remote service to query. 'https' recommended for production.# users refers to the API server's webhook configuration.users:- name:name-of-api-serveruser:client-certificate:/path/to/cert.pem# cert for the webhook plugin to useclient-key:/path/to/key.pem # key matching the cert# kubeconfig files require a context. Provide one for the API server.current-context:webhookcontexts:- context:cluster:name-of-remote-authn-serviceuser:name-of-api-servername:webhook
When a client attempts to authenticate with the API server using a bearer token as discussed
above, the authentication webhook POSTs a JSON-serialized
TokenReview object containing the token to the remote service.
Note that webhook API objects are subject to the same versioning compatibility rules
as other Kubernetes API objects. Implementers should check the apiVersion field of the request to ensure correct deserialization,
and must respond with a TokenReview object of the same version as the request.
The Kubernetes API server defaults to sending authentication.k8s.io/v1beta1 token reviews for backwards compatibility.
To opt into receiving authentication.k8s.io/v1 token reviews, the API server must be started with --authentication-token-webhook-version=v1.
{"apiVersion": "authentication.k8s.io/v1","kind": "TokenReview","spec": {# Opaque bearer token sent to the API server"token": "014fbff9a07c...",# Optional list of the audience identifiers for the server the token was presented to.# Audience-aware token authenticators (for example, OIDC token authenticators)# should verify the token was intended for at least one of the audiences in this list,# and return the intersection of this list and the valid audiences for the token in the response status.# This ensures the token is valid to authenticate to the server it was presented to.# If no audiences are provided, the token should be validated to authenticate to the Kubernetes API server."audiences": ["https://myserver.example.com","https://myserver.internal.example.com"]}}
{"apiVersion": "authentication.k8s.io/v1beta1","kind": "TokenReview","spec": {# Opaque bearer token sent to the API server"token": "014fbff9a07c...",# Optional list of the audience identifiers for the server the token was presented to.# Audience-aware token authenticators (for example, OIDC token authenticators)# should verify the token was intended for at least one of the audiences in this list,# and return the intersection of this list and the valid audiences for the token in the response status.# This ensures the token is valid to authenticate to the server it was presented to.# If no audiences are provided, the token should be validated to authenticate to the Kubernetes API server."audiences": ["https://myserver.example.com","https://myserver.internal.example.com"]}}
The remote service is expected to fill the status field of the request to indicate the success of the login.
The response body's spec field is ignored and may be omitted.
The remote service must return a response using the same TokenReview API version that it received.
A successful validation of the bearer token would return:
{"apiVersion": "authentication.k8s.io/v1","kind": "TokenReview","status": {"authenticated": true,"user": {# Required"username": "janedoe@example.com",# Optional"uid": "42",# Optional group memberships"groups": ["developers","qa"],# Optional additional information provided by the authenticator.# This should not contain confidential data, as it can be recorded in logs# or API objects, and is made available to admission webhooks."extra": {"extrafield1": ["extravalue1","extravalue2"]}},# Optional list audience-aware token authenticators can return,# containing the audiences from the `spec.audiences` list for which the provided token was valid.# If this is omitted, the token is considered to be valid to authenticate to the Kubernetes API server."audiences": ["https://myserver.example.com"]}}
{"apiVersion": "authentication.k8s.io/v1beta1","kind": "TokenReview","status": {"authenticated": true,"user": {# Required"username": "janedoe@example.com",# Optional"uid": "42",# Optional group memberships"groups": ["developers","qa"],# Optional additional information provided by the authenticator.# This should not contain confidential data, as it can be recorded in logs# or API objects, and is made available to admission webhooks."extra": {"extrafield1": ["extravalue1","extravalue2"]}},# Optional list audience-aware token authenticators can return,# containing the audiences from the `spec.audiences` list for which the provided token was valid.# If this is omitted, the token is considered to be valid to authenticate to the Kubernetes API server."audiences": ["https://myserver.example.com"]}}
{"apiVersion": "authentication.k8s.io/v1","kind": "TokenReview","status": {"authenticated": false,# Optionally include details about why authentication failed.# If no error is provided, the API will return a generic Unauthorized message.# The error field is ignored when authenticated=true."error": "Credentials are expired"}}
{"apiVersion": "authentication.k8s.io/v1beta1","kind": "TokenReview","status": {"authenticated": false,# Optionally include details about why authentication failed.# If no error is provided, the API will return a generic Unauthorized message.# The error field is ignored when authenticated=true."error": "Credentials are expired"}}
Authenticating Proxy
The API server can be configured to identify users from request header values, such as X-Remote-User.
It is designed for use in combination with an authenticating proxy, which sets the request header value.
--requestheader-username-headers Required, case-insensitive. Header names to check, in order,
for the user identity. The first header containing a value is used as the username.
--requestheader-group-headers 1.6+. Optional, case-insensitive. "X-Remote-Group" is suggested.
Header names to check, in order, for the user's groups. All values in all specified headers are used as group names.
--requestheader-extra-headers-prefix 1.6+. Optional, case-insensitive. "X-Remote-Extra-" is suggested.
Header prefixes to look for to determine extra information about the user (typically used by the configured authorization plugin).
Any headers beginning with any of the specified prefixes have the prefix removed.
The remainder of the header name is lowercased and percent-decoded
and becomes the extra key, and the header value is the extra value.
Note:
Prior to 1.11.3 (and 1.10.7, 1.9.11), the extra key could only contain characters which
were legal in HTTP header labels.
In order to prevent header spoofing, the authenticating proxy is required to present a valid client
certificate to the API server for validation against the specified CA before the request headers are
checked. WARNING: do not reuse a CA that is used in a different context unless you understand
the risks and the mechanisms to protect the CA's usage.
--requestheader-client-ca-file Required. PEM-encoded certificate bundle. A valid client certificate
must be presented and validated against the certificate authorities in the specified file before the
request headers are checked for user names.
--requestheader-allowed-names Optional. List of Common Name values (CNs). If set, a valid client
certificate with a CN in the specified list must be presented before the request headers are checked
for user names. If empty, any CN is allowed.
Anonymous requests
When enabled, requests that are not rejected by other configured authentication methods are
treated as anonymous requests, and given a username of system:anonymous and a group of
system:unauthenticated.
For example, on a server with token authentication configured, and anonymous access enabled,
a request providing an invalid bearer token would receive a 401 Unauthorized error.
A request providing no bearer token would be treated as an anonymous request.
In 1.5.1-1.5.x, anonymous access is disabled by default, and can be enabled by
passing the --anonymous-auth=true option to the API server.
In 1.6+, anonymous access is enabled by default if an authorization mode other than AlwaysAllow
is used, and can be disabled by passing the --anonymous-auth=false option to the API server.
Starting in 1.6, the ABAC and RBAC authorizers require explicit authorization of the
system:anonymous user or the system:unauthenticated group, so legacy policy rules
that grant access to the * user or * group do not include anonymous users.
Anonymous Authenticator Configuration
FEATURE STATE:Kubernetes v1.31 [alpha]
The AuthenticationConfiguration can be used to configure the anonymous
authenticator. To enable configuring anonymous auth via the config file you need
enable the AnonymousAuthConfigurableEndpoints feature gate. When this feature
gate is enabled you cannot set the --anonymous-auth flag.
The main advantage of configuring anonymous authenticator using the authentication
configuration file is that in addition to enabling and disabling anonymous authentication
you can also configure which endpoints support anonymous authentication.
A sample authentication configuration file is below:
---## CAUTION: this is an example configuration.# Do not use this for your own cluster!#apiVersion:apiserver.config.k8s.io/v1beta1kind:AuthenticationConfigurationanonymous:enabled:trueconditions:- path:/livez- path:/readyz- path:/healthz
In the configuration above only the /livez, /readyz and /healthz endpoints
are reachable by anonymous requests. Any other endpoints will not be reachable
even if it is allowed by RBAC configuration.
User impersonation
A user can act as another user through impersonation headers. These let requests
manually override the user info a request authenticates as. For example, an admin
could use this feature to debug an authorization policy by temporarily
impersonating another user and seeing if a request was denied.
Impersonation requests first authenticate as the requesting user, then switch
to the impersonated user info.
A user makes an API call with their credentials and impersonation headers.
API server authenticates the user.
API server ensures the authenticated users have impersonation privileges.
Request user info is replaced with impersonation values.
Request is evaluated, authorization acts on impersonated user info.
The following HTTP headers can be used to performing an impersonation request:
Impersonate-User: The username to act as.
Impersonate-Group: A group name to act as. Can be provided multiple times to set multiple groups.
Optional. Requires "Impersonate-User".
Impersonate-Extra-( extra name ): A dynamic header used to associate extra fields with the user.
Optional. Requires "Impersonate-User". In order to be preserved consistently, ( extra name )
must be lower-case, and any characters which aren't legal in HTTP header labels
MUST be utf8 and percent-encoded.
Impersonate-Uid: A unique identifier that represents the user being impersonated. Optional.
Requires "Impersonate-User". Kubernetes does not impose any format requirements on this string.
Note:
Prior to 1.11.3 (and 1.10.7, 1.9.11), ( extra name ) could only contain characters which
were legal in HTTP header labels.
Note:
Impersonate-Uid is only available in versions 1.22.0 and higher.
An example of the impersonation headers used when impersonating a user with groups:
To impersonate a user, group, user identifier (UID) or extra fields, the impersonating user must
have the ability to perform the "impersonate" verb on the kind of attribute
being impersonated ("user", "group", "uid", etc.). For clusters that enable the RBAC
authorization plugin, the following ClusterRole encompasses the rules needed to
set user and group impersonation headers:
For impersonation, extra fields and impersonated UIDs are both under the "authentication.k8s.io" apiGroup.
Extra fields are evaluated as sub-resources of the resource "userextras". To
allow a user to use impersonation headers for the extra field "scopes" and
for UIDs, a user should be granted the following role:
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:scopes-and-uid-impersonatorrules:# Can set "Impersonate-Extra-scopes" header and the "Impersonate-Uid" header.- apiGroups:["authentication.k8s.io"]resources:["userextras/scopes","uids"]verbs:["impersonate"]
The values of impersonation headers can also be restricted by limiting the set
of resourceNames a resource can take.
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:limited-impersonatorrules:# Can impersonate the user "jane.doe@example.com"- apiGroups:[""]resources:["users"]verbs:["impersonate"]resourceNames:["jane.doe@example.com"]# Can impersonate the groups "developers" and "admins"- apiGroups:[""]resources:["groups"]verbs:["impersonate"]resourceNames:["developers","admins"]# Can impersonate the extras field "scopes" with the values "view" and "development"- apiGroups:["authentication.k8s.io"]resources:["userextras/scopes"]verbs:["impersonate"]resourceNames:["view","development"]# Can impersonate the uid "06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b"- apiGroups:["authentication.k8s.io"]resources:["uids"]verbs:["impersonate"]resourceNames:["06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b"]
Note:
Impersonating a user or group allows you to perform any action as if you were that user or group;
for that reason, impersonation is not namespace scoped.
If you want to allow impersonation using Kubernetes RBAC,
this requires using a ClusterRole and a ClusterRoleBinding,
not a Role and RoleBinding.
client-go credential plugins
FEATURE STATE:Kubernetes v1.22 [stable]
k8s.io/client-go and tools using it such as kubectl and kubelet are able to execute an
external command to receive user credentials.
This feature is intended for client side integrations with authentication protocols not natively
supported by k8s.io/client-go (LDAP, Kerberos, OAuth2, SAML, etc.). The plugin implements the
protocol specific logic, then returns opaque credentials to use. Almost all credential plugin
use cases require a server side component with support for the webhook token authenticator
to interpret the credential format produced by the client plugin.
Note:
Earlier versions of kubectl included built-in support for authenticating to AKS and GKE, but this is no longer present.
Example use case
In a hypothetical use case, an organization would run an external service that exchanges LDAP credentials
for user specific, signed tokens. The service would also be capable of responding to webhook token
authenticator requests to validate the tokens. Users would be required
to install a credential plugin on their workstation.
To authenticate against the API:
The user issues a kubectl command.
Credential plugin prompts the user for LDAP credentials, exchanges credentials with external service for a token.
Credential plugin returns token to client-go, which uses it as a bearer token against the API server.
apiVersion:v1kind:Configusers:- name:my-useruser:exec:# Command to execute. Required.command:"example-client-go-exec-plugin"# API version to use when decoding the ExecCredentials resource. Required.## The API version returned by the plugin MUST match the version listed here.## To integrate with tools that support multiple versions (such as client.authentication.k8s.io/v1beta1),# set an environment variable, pass an argument to the tool that indicates which version the exec plugin expects,# or read the version from the ExecCredential object in the KUBERNETES_EXEC_INFO environment variable.apiVersion:"client.authentication.k8s.io/v1"# Environment variables to set when executing the plugin. Optional.env:- name:"FOO"value:"bar"# Arguments to pass when executing the plugin. Optional.args:- "arg1"- "arg2"# Text shown to the user when the executable doesn't seem to be present. Optional.installHint:| example-client-go-exec-plugin is required to authenticate
to the current cluster. It can be installed:
On macOS: brew install example-client-go-exec-plugin
On Ubuntu: apt-get install example-client-go-exec-plugin
On Fedora: dnf install example-client-go-exec-plugin
...# Whether or not to provide cluster information, which could potentially contain# very large CA data, to this exec plugin as a part of the KUBERNETES_EXEC_INFO# environment variable.provideClusterInfo:true# The contract between the exec plugin and the standard input I/O stream. If the# contract cannot be satisfied, this plugin will not be run and an error will be# returned. Valid values are "Never" (this exec plugin never uses standard input),# "IfAvailable" (this exec plugin wants to use standard input if it is available),# or "Always" (this exec plugin requires standard input to function). Required.interactiveMode:Neverclusters:- name:my-clustercluster:server:"https://172.17.4.100:6443"certificate-authority:"/etc/kubernetes/ca.pem"extensions:- name:client.authentication.k8s.io/exec# reserved extension name for per cluster exec configextension:arbitrary:configthis:can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfoyou:["can","put","anything","here"]contexts:- name:my-clustercontext:cluster:my-clusteruser:my-usercurrent-context:my-cluster
apiVersion:v1kind:Configusers:- name:my-useruser:exec:# Command to execute. Required.command:"example-client-go-exec-plugin"# API version to use when decoding the ExecCredentials resource. Required.## The API version returned by the plugin MUST match the version listed here.## To integrate with tools that support multiple versions (such as client.authentication.k8s.io/v1),# set an environment variable, pass an argument to the tool that indicates which version the exec plugin expects,# or read the version from the ExecCredential object in the KUBERNETES_EXEC_INFO environment variable.apiVersion:"client.authentication.k8s.io/v1beta1"# Environment variables to set when executing the plugin. Optional.env:- name:"FOO"value:"bar"# Arguments to pass when executing the plugin. Optional.args:- "arg1"- "arg2"# Text shown to the user when the executable doesn't seem to be present. Optional.installHint:| example-client-go-exec-plugin is required to authenticate
to the current cluster. It can be installed:
On macOS: brew install example-client-go-exec-plugin
On Ubuntu: apt-get install example-client-go-exec-plugin
On Fedora: dnf install example-client-go-exec-plugin
...# Whether or not to provide cluster information, which could potentially contain# very large CA data, to this exec plugin as a part of the KUBERNETES_EXEC_INFO# environment variable.provideClusterInfo:true# The contract between the exec plugin and the standard input I/O stream. If the# contract cannot be satisfied, this plugin will not be run and an error will be# returned. Valid values are "Never" (this exec plugin never uses standard input),# "IfAvailable" (this exec plugin wants to use standard input if it is available),# or "Always" (this exec plugin requires standard input to function). Optional.# Defaults to "IfAvailable".interactiveMode:Neverclusters:- name:my-clustercluster:server:"https://172.17.4.100:6443"certificate-authority:"/etc/kubernetes/ca.pem"extensions:- name:client.authentication.k8s.io/exec# reserved extension name for per cluster exec configextension:arbitrary:configthis:can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfoyou:["can","put","anything","here"]contexts:- name:my-clustercontext:cluster:my-clusteruser:my-usercurrent-context:my-cluster
Relative command paths are interpreted as relative to the directory of the config file. If
KUBECONFIG is set to /home/jane/kubeconfig and the exec command is ./bin/example-client-go-exec-plugin,
the binary /home/jane/bin/example-client-go-exec-plugin is executed.
- name:my-useruser:exec:# Path relative to the directory of the kubeconfigcommand:"./bin/example-client-go-exec-plugin"apiVersion:"client.authentication.k8s.io/v1"interactiveMode:Never
Input and output formats
The executed command prints an ExecCredential object to stdout. k8s.io/client-go
authenticates against the Kubernetes API using the returned credentials in the status.
The executed command is passed an ExecCredential object as input via the KUBERNETES_EXEC_INFO
environment variable. This input contains helpful information like the expected API version
of the returned ExecCredential object and whether or not the plugin can use stdin to interact
with the user.
When run from an interactive session (i.e., a terminal), stdin can be exposed directly
to the plugin. Plugins should use the spec.interactive field of the input
ExecCredential object from the KUBERNETES_EXEC_INFO environment variable in order to
determine if stdin has been provided. A plugin's stdin requirements (i.e., whether
stdin is optional, strictly required, or never used in order for the plugin
to run successfully) is declared via the user.exec.interactiveMode field in the
kubeconfig
(see table below for valid values). The user.exec.interactiveMode field is optional
in client.authentication.k8s.io/v1beta1 and required in client.authentication.k8s.io/v1.
interactiveMode values
interactiveMode Value
Meaning
Never
This exec plugin never needs to use standard input, and therefore the exec plugin will be run regardless of whether standard input is available for user input.
IfAvailable
This exec plugin would like to use standard input if it is available, but can still operate if standard input is not available. Therefore, the exec plugin will be run regardless of whether stdin is available for user input. If standard input is available for user input, then it will be provided to this exec plugin.
Always
This exec plugin requires standard input in order to run, and therefore the exec plugin will only be run if standard input is available for user input. If standard input is not available for user input, then the exec plugin will not be run and an error will be returned by the exec plugin runner.
To use bearer token credentials, the plugin returns a token in the status of the
ExecCredential
Alternatively, a PEM-encoded client certificate and key can be returned to use TLS client auth.
If the plugin returns a different certificate and key on a subsequent call, k8s.io/client-go
will close existing connections with the server to force a new TLS handshake.
If specified, clientKeyData and clientCertificateData must both must be present.
clientCertificateData may contain additional intermediate certificates to send to the server.
Optionally, the response can include the expiry of the credential formatted as a
RFC 3339 timestamp.
Presence or absence of an expiry has the following impact:
If an expiry is included, the bearer token and TLS credentials are cached until
the expiry time is reached, or if the server responds with a 401 HTTP status code,
or when the process exits.
If an expiry is omitted, the bearer token and TLS credentials are cached until
the server responds with a 401 HTTP status code or until the process exits.
To enable the exec plugin to obtain cluster-specific information, set provideClusterInfo on the user.exec
field in the kubeconfig.
The plugin will then be supplied this cluster-specific information in the KUBERNETES_EXEC_INFO environment variable.
Information from this environment variable can be used to perform cluster-specific
credential acquisition logic.
The following ExecCredential manifest describes a cluster information sample.
{
"apiVersion": "client.authentication.k8s.io/v1",
"kind": "ExecCredential",
"spec": {
"cluster": {
"server": "https://172.17.4.100:6443",
"certificate-authority-data": "LS0t...",
"config": {
"arbitrary": "config",
"this": "can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfo",
"you": ["can", "put", "anything", "here"]
}
},
"interactive": true }
}
{
"apiVersion": "client.authentication.k8s.io/v1beta1",
"kind": "ExecCredential",
"spec": {
"cluster": {
"server": "https://172.17.4.100:6443",
"certificate-authority-data": "LS0t...",
"config": {
"arbitrary": "config",
"this": "can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfo",
"you": ["can", "put", "anything", "here"]
}
},
"interactive": true }
}
API access to authentication information for a client
FEATURE STATE:Kubernetes v1.28 [stable]
If your cluster has the API enabled, you can use the SelfSubjectReview API to find out
how your Kubernetes cluster maps your authentication information to identify you as a client.
This works whether you are authenticating as a user (typically representing
a real person) or as a ServiceAccount.
SelfSubjectReview objects do not have any configurable fields. On receiving a request,
the Kubernetes API server fills the status with the user attributes and returns it to the user.
Request example (the body would be a SelfSubjectReview):
POST /apis/authentication.k8s.io/v1/selfsubjectreviews
For convenience, the kubectl auth whoami command is present. Executing this command will
produce the following output (yet different user attributes will be shown):
Simple output example
ATTRIBUTE VALUE
Username jane.doe
Groups [system:authenticated]
Complex example including extra attributes
ATTRIBUTE VALUE
Username jane.doe
UID b79dbf30-0c6a-11ed-861d-0242ac120002
Groups [students teachers system:authenticated]
Extra: skills [reading learning]
Extra: subjects [math sports]
By providing the output flag, it is also possible to print the JSON or YAML representation of the result:
The Kubernetes API server fills the userInfo after all authentication mechanisms are applied,
including impersonation.
If you, or an authentication proxy, make a SelfSubjectReview using impersonation,
you see the user details and properties for the user that was impersonated.
By default, all authenticated users can create SelfSubjectReview objects when the APISelfSubjectReview
feature is enabled. It is allowed by the system:basic-user cluster role.
Note:
You can only make SelfSubjectReview requests if:
the APISelfSubjectReviewfeature gate
is enabled for your cluster (not needed for Kubernetes 1.31, but older
Kubernetes versions might not offer this feature gate, or might default it to be off)
(if you are running a version of Kubernetes older than v1.28) the API server for your
cluster has the authentication.k8s.io/v1alpha1 or authentication.k8s.io/v1beta1API group
enabled.
Bootstrap tokens are a simple bearer token that is meant to be used when
creating new clusters or joining new nodes to an existing cluster.
It was built to support kubeadm, but can be used in other contexts
for users that wish to start clusters without kubeadm. It is also built to
work, via RBAC policy, with the
kubelet TLS Bootstrapping system.
Bootstrap Tokens Overview
Bootstrap Tokens are defined with a specific type
(bootstrap.kubernetes.io/token) of secrets that lives in the kube-system
namespace. These Secrets are then read by the Bootstrap Authenticator in the
API Server. Expired tokens are removed with the TokenCleaner controller in the
Controller Manager. The tokens are also used to create a signature for a
specific ConfigMap used in a "discovery" process through a BootstrapSigner
controller.
Token Format
Bootstrap Tokens take the form of abcdef.0123456789abcdef.
More formally, they must match the regular expression [a-z0-9]{6}\.[a-z0-9]{16}.
The first part of the token is the "Token ID" and is considered public
information. It is used when referring to a token without leaking the secret
part used for authentication. The second part is the "Token Secret" and should
only be shared with trusted parties.
Enabling Bootstrap Token Authentication
The Bootstrap Token authenticator can be enabled using the following flag on the
API server:
--enable-bootstrap-token-auth
When enabled, bootstrapping tokens can be used as bearer token credentials to
authenticate requests against the API server.
Authorization: Bearer 07401b.f395accd246ae52d
Tokens authenticate as the username system:bootstrap:<token id> and are members
of the group system:bootstrappers.
Additional groups may be specified in the token's Secret.
Expired tokens can be deleted automatically by enabling the tokencleaner
controller on the controller manager.
--controllers=*,tokencleaner
Bootstrap Token Secret Format
Each valid token is backed by a secret in the kube-system namespace. You can
find the full design doc
here.
Here is what the secret looks like.
apiVersion:v1kind:Secretmetadata:# Name MUST be of form "bootstrap-token-<token id>"name:bootstrap-token-07401bnamespace:kube-system# Type MUST be 'bootstrap.kubernetes.io/token'type:bootstrap.kubernetes.io/tokenstringData:# Human readable description. Optional.description:"The default bootstrap token generated by 'kubeadm init'."# Token ID and secret. Required.token-id:07401btoken-secret:f395accd246ae52d# Expiration. Optional.expiration:2017-03-10T03:22:11Z# Allowed usages.usage-bootstrap-authentication:"true"usage-bootstrap-signing:"true"# Extra groups to authenticate the token as. Must start with "system:bootstrappers:"auth-extra-groups:system:bootstrappers:worker,system:bootstrappers:ingress
The type of the secret must be bootstrap.kubernetes.io/token and the name must
be bootstrap-token-<token id>. It must also exist in the kube-system namespace.
The usage-bootstrap-* members indicate what this secret is intended to be used for.
A value must be set to true to be enabled.
usage-bootstrap-authentication indicates that the token can be used to
authenticate to the API server as a bearer token.
usage-bootstrap-signing indicates that the token may be used to sign the
cluster-info ConfigMap as described below.
The expiration field controls the expiry of the token. Expired tokens are
rejected when used for authentication and ignored during ConfigMap signing.
The expiry value is encoded as an absolute UTC time using RFC3339. Enable the
tokencleaner controller to automatically delete expired tokens.
Token Management with kubeadm
You can use the kubeadm tool to manage tokens on a running cluster. See the
kubeadm token docs for details.
ConfigMap Signing
In addition to authentication, the tokens can be used to sign a ConfigMap.
This is used early in a cluster bootstrap process before the client trusts the API
server. The signed ConfigMap can be authenticated by the shared token.
Enable ConfigMap signing by enabling the bootstrapsigner controller on the
Controller Manager.
--controllers=*,bootstrapsigner
The ConfigMap that is signed is cluster-info in the kube-public namespace.
The typical flow is that a client reads this ConfigMap while unauthenticated and
ignoring TLS errors. It then validates the payload of the ConfigMap by looking
at a signature embedded in the ConfigMap.
The kubeconfig member of the ConfigMap is a config file with only the cluster
information filled out. The key thing being communicated here is the
certificate-authority-data. This may be expanded in the future.
The signature is a JWS signature using the "detached" mode. To validate the
signature, the user should encode the kubeconfig payload according to JWS
rules (base64 encoded while discarding any trailing =). That encoded payload
is then used to form a whole JWS by inserting it between the 2 dots. You can
verify the JWS using the HS256 scheme (HMAC-SHA256) with the full token (e.g.
07401b.f395accd246ae52d) as the shared secret. Users must verify that HS256
is used.
Warning:
Any party with a bootstrapping token can create a valid signature for that
token. When using ConfigMap signing it's discouraged to share the same token with
many clients, since a compromised client can potentially man-in-the middle another
client relying on the signature to bootstrap TLS trust.
Details of Kubernetes authorization mechanisms and supported authorization modes.
Kubernetes authorization takes place following
authentication.
Usually, a client making a request must be authenticated (logged in) before its
request can be allowed; however, Kubernetes also allows anonymous requests in
some circumstances.
Kubernetes authorization of API requests takes place within the API server.
The API server evaluates all of the request attributes against all policies,
potentially also consulting external services, and then allows or denies the
request.
All parts of an API request must be allowed by some authorization
mechanism in order to proceed. In other words: access is denied by default.
Note:
Access controls and policies that
depend on specific fields of specific kinds of objects are handled by
admission controllers.
Kubernetes admission control happens after authorization has completed (and,
therefore, only when the authorization decision was to allow the request).
When multiple authorization modules are configured,
each is checked in sequence.
If any authorizer approves or denies a request, that decision is immediately
returned and no other authorizer is consulted. If all modules have no opinion
on the request, then the request is denied. An overall deny verdict means that
the API server rejects the request and responds with an HTTP 403 (Forbidden)
status.
Request attributes used in authorization
Kubernetes reviews only the following API request attributes:
user - The user string provided during authentication.
group - The list of group names to which the authenticated user belongs.
extra - A map of arbitrary string keys to string values, provided by the authentication layer.
API - Indicates whether the request is for an API resource.
Request path - Path to miscellaneous non-resource endpoints like /api or /healthz.
API request verb - API verbs like get, list, create, update, patch, watch, delete, and deletecollection are used for resource requests. To determine the request verb for a resource API endpoint, see request verbs and authorization.
HTTP request verb - Lowercased HTTP methods like get, post, put, and delete are used for non-resource requests.
Resource - The ID or name of the resource that is being accessed (for resource requests only) -- For resource requests using get, update, patch, and delete verbs, you must provide the resource name.
Subresource - The subresource that is being accessed (for resource requests only).
Namespace - The namespace of the object that is being accessed (for namespaced resource requests only).
API group - The API Group being accessed (for resource requests only). An empty string designates the coreAPI group.
Request verbs and authorization
Non-resource requests
Requests to endpoints other than /api/v1/... or /apis/<group>/<version>/...
are considered non-resource requests, and use the lower-cased HTTP method of the request as the verb.
For example, making a GET request using HTTP to endpoints such as /api or /healthz would use get as the verb.
Resource requests
To determine the request verb for a resource API endpoint, Kubernetes maps the HTTP verb
used and considers whether or not the request acts on an individual resource or on a
collection of resources:
HTTP verb
request verb
POST
create
GET, HEAD
get (for individual resources), list (for collections, including full object content), watch (for watching an individual resource or collection of resources)
+The get, list and watch verbs can all return the full details of a resource. In
terms of access to the returned data they are equivalent. For example, list on secrets
will reveal the data attributes of any returned resources.
Kubernetes sometimes checks authorization for additional permissions using specialized verbs. For example:
bind and escalate verbs on roles and clusterroles resources in the rbac.authorization.k8s.io API group.
Authorization context
Kubernetes expects attributes that are common to REST API requests. This means
that Kubernetes authorization works with existing organization-wide or
cloud-provider-wide access control systems which may handle other APIs besides
the Kubernetes API.
Authorization modes
The Kubernetes API server may authorize a request using one of several authorization modes:
AlwaysAllow
This mode allows all requests, which brings security risks. Use this authorization mode only if you do not require authorization for your API requests (for example, for testing).
AlwaysDeny
This mode blocks all requests. Use this authorization mode only for testing.
Kubernetes ABAC mode defines an access control paradigm whereby access rights are granted to users through the use of policies which combine attributes together. The policies can use any type of attributes (user attributes, resource attributes, object, environment attributes, etc).
Kubernetes RBAC is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise. In this context, access is the ability of an individual user to perform a specific task, such as view, create, or modify a file. In this mode, Kubernetes uses the rbac.authorization.k8s.io API group to drive authorization decisions, allowing you to dynamically configure permission policies through the Kubernetes API.
Node
A special-purpose authorization mode that grants permissions to kubelets based on the pods they are scheduled to run. To learn more about the Node authorization mode, see Node Authorization.
Webhook
Kubernetes webhook mode for authorization makes a synchronous HTTP callout, blocking the request until the remote HTTP service responds to the query.You can write your own software to handle the callout, or use solutions from the ecosystem.
You have to pick one of the two configuration approaches; setting both --authorization-config
path and configuring an authorization webhook using the --authorization-mode and
--authorization-webhook-* command line arguments is not allowed.
If you try this, the API server reports an error message during startup, then exits immediately.
Command line authorization mode configuration
FEATURE STATE:Kubernetes v1.8 [stable]
You can use the following modes:
--authorization-mode=ABAC (Attribute-based access control mode)
--authorization-mode=RBAC (Role-based access control mode)
You can choose more than one authorization mode; for example:
--authorization-mode=Node,Webhook
Kubernetes checks authorization modules based on the order that you specify them
on the API server's command line, so an earlier module has higher priority to allow
or deny a request.
The configuration file approach even allows you to specify
CEL rules to pre-filter requests before they are dispatched
to webhooks, helping you to prevent unnecessary invocations. The API server also automatically
reloads the authorizer chain when the configuration file is modified.
You specify the path to the authorization configuration using the
--authorization-config command line argument.
If you want to use command line arguments instead of a configuration file, that's also a valid and supported approach.
Some authorization capabilities (for example: multiple webhooks, webhook failure policy, and pre-filter rules)
are only available if you use an authorization configuration file.
Example configuration
---## DO NOT USE THE CONFIG AS IS. THIS IS AN EXAMPLE.#apiVersion:apiserver.config.k8s.io/v1beta1kind:AuthorizationConfigurationauthorizers:- type:Webhook# Name used to describe the authorizer# This is explicitly used in monitoring machinery for metrics# Note:# - Validation for this field is similar to how K8s labels are validated today.# Required, with no defaultname:webhookwebhook:# The duration to cache 'authorized' responses from the webhook# authorizer.# Same as setting `--authorization-webhook-cache-authorized-ttl` flag# Default: 5m0sauthorizedTTL:30s# The duration to cache 'unauthorized' responses from the webhook# authorizer.# Same as setting `--authorization-webhook-cache-unauthorized-ttl` flag# Default: 30sunauthorizedTTL:30s# Timeout for the webhook request# Maximum allowed is 30s.# Required, with no default.timeout:3s# The API version of the authorization.k8s.io SubjectAccessReview to# send to and expect from the webhook.# Same as setting `--authorization-webhook-version` flag# Required, with no default# Valid values: v1beta1, v1subjectAccessReviewVersion:v1# MatchConditionSubjectAccessReviewVersion specifies the SubjectAccessReview# version the CEL expressions are evaluated against# Valid values: v1# Required, no default valuematchConditionSubjectAccessReviewVersion:v1# Controls the authorization decision when a webhook request fails to# complete or returns a malformed response or errors evaluating# matchConditions.# Valid values:# - NoOpinion: continue to subsequent authorizers to see if one of# them allows the request# - Deny: reject the request without consulting subsequent authorizers# Required, with no default.failurePolicy:DenyconnectionInfo:# Controls how the webhook should communicate with the server.# Valid values:# - KubeConfig: use the file specified in kubeConfigFile to locate the# server.# - InClusterConfig: use the in-cluster configuration to call the# SubjectAccessReview API hosted by kube-apiserver. This mode is not# allowed for kube-apiserver.type:KubeConfig# Path to KubeConfigFile for connection info# Required, if connectionInfo.Type is KubeConfigkubeConfigFile:/kube-system-authz-webhook.yaml# matchConditions is a list of conditions that must be met for a request to be sent to this# webhook. An empty list of matchConditions matches all requests.# There are a maximum of 64 match conditions allowed.## The exact matching logic is (in order):# 1. If at least one matchCondition evaluates to FALSE, then the webhook is skipped.# 2. If ALL matchConditions evaluate to TRUE, then the webhook is called.# 3. If at least one matchCondition evaluates to an error (but none are FALSE):# - If failurePolicy=Deny, then the webhook rejects the request# - If failurePolicy=NoOpinion, then the error is ignored and the webhook is skippedmatchConditions:# expression represents the expression which will be evaluated by CEL. Must evaluate to bool.# CEL expressions have access to the contents of the SubjectAccessReview in v1 version.# If version specified by subjectAccessReviewVersion in the request variable is v1beta1,# the contents would be converted to the v1 version before evaluating the CEL expression.## Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/## only send resource requests to the webhook- expression:has(request.resourceAttributes)# only intercept requests to kube-system- expression:request.resourceAttributes.namespace == 'kube-system'# don't intercept requests from kube-system service accounts- expression:!('system:serviceaccounts:kube-system' in request.user.groups)- type:Nodename:node- type:RBACname:rbac- type:Webhookname:in-cluster-authorizerwebhook:authorizedTTL:5munauthorizedTTL:30stimeout:3ssubjectAccessReviewVersion:v1failurePolicy:NoOpinionconnectionInfo:type:InClusterConfig
When configuring the authorizer chain using a configuration file, make sure all the
control plane nodes have the same file contents. Take a note of the API server
configuration when upgrading / downgrading your clusters. For example, if upgrading
from Kubernetes 1.30 to Kubernetes 1.31,
you would need to make sure the config file is in a format that Kubernetes 1.31
can understand, before you upgrade the cluster. If you downgrade to 1.30,
you would need to set the configuration appropriately.
Authorization configuration and reloads
Kubernetes reloads the authorization configuration file when the API server observes a change
to the file, and also on a 60 second schedule if no change events were observed.
Note:
You must ensure that all non-webhook authorizer types remain unchanged in the file on reload.
A reload must not add or remove Node or RBAC authorizers (they can be reordered,
but cannot be added or removed).
Privilege escalation via workload creation or edits
Users who can create/edit pods in a namespace, either directly or through an object that
enables indirect workload management, may be
able to escalate their privileges in that namespace. The potential routes to privilege
escalation include Kubernetes API extensions
and their associated controllers.
Caution:
As a cluster administrator, use caution when granting access to create or edit workloads.
Some details of how these can be misused are documented in
escalation paths.
Escalation paths
There are different ways that an attacker or untrustworthy user could gain additional
privilege within a namespace, if you allow them to run arbitrary Pods in that namespace:
Mounting arbitrary Secrets in that namespace
Can be used to access confidential information meant for other workloads
Can be used to obtain a more privileged ServiceAccount's service account token
Using arbitrary ServiceAccounts in that namespace
Can perform Kubernetes API actions as another workload (impersonation)
Can perform any privileged actions that ServiceAccount has
Mounting or using ConfigMaps meant for other workloads in that namespace
Can be used to obtain information meant for other workloads, such as database host names.
Mounting volumes meant for other workloads in that namespace
Can be used to obtain information meant for other workloads, and change it.
Caution:
As a system administrator, you should be cautious when deploying CustomResourceDefinitions
that let users make changes to the above areas. These may open privilege escalations paths.
Consider the consequences of this kind of change when deciding on your authorization controls.
Checking API access
kubectl provides the auth can-i subcommand for quickly querying the API authorization layer.
The command uses the SelfSubjectAccessReview API to determine if the current user can perform
a given action, and works regardless of the authorization mode used.
kubectl auth can-i create deployments --namespace dev
SelfSubjectAccessReview is part of the authorization.k8s.io API group, which
exposes the API server authorization to external services. Other resources in
this group include:
SubjectAccessReview
Access review for any user, not only the current one. Useful for delegating authorization decisions to the API server. For example, the kubelet and extension API servers use this to determine user access to their own APIs.
LocalSubjectAccessReview
Like SubjectAccessReview but restricted to a specific namespace.
SelfSubjectRulesReview
A review which returns the set of actions a user can perform within a namespace. Useful for users to quickly summarize their own access, or for UIs to hide/show actions.
These APIs can be queried by creating normal Kubernetes resources, where the response status
field of the returned object is the result of the query. For example:
Role-based access control (RBAC) is a method of regulating access to computer or
network resources based on the roles of individual users within your organization.
RBAC authorization uses the rbac.authorization.k8s.ioAPI group to drive authorization
decisions, allowing you to dynamically configure policies through the Kubernetes API.
To enable RBAC, start the API server
with the --authorization-mode flag set to a comma-separated list that includes RBAC;
for example:
The RBAC API declares four kinds of Kubernetes object: Role, ClusterRole,
RoleBinding and ClusterRoleBinding. You can describe or amend the RBAC
objects
using tools such as kubectl, just like any other Kubernetes object.
Caution:
These objects, by design, impose access restrictions. If you are making changes
to a cluster as you learn, see
privilege escalation prevention and bootstrapping
to understand how those restrictions can prevent you making some changes.
Role and ClusterRole
An RBAC Role or ClusterRole contains rules that represent a set of permissions.
Permissions are purely additive (there are no "deny" rules).
A Role always sets permissions within a particular namespace;
when you create a Role, you have to specify the namespace it belongs in.
ClusterRole, by contrast, is a non-namespaced resource. The resources have different names (Role
and ClusterRole) because a Kubernetes object always has to be either namespaced or not namespaced;
it can't be both.
ClusterRoles have several uses. You can use a ClusterRole to:
define permissions on namespaced resources and be granted access within individual namespace(s)
define permissions on namespaced resources and be granted access across all namespaces
define permissions on cluster-scoped resources
If you want to define a role within a namespace, use a Role; if you want to define
a role cluster-wide, use a ClusterRole.
Role example
Here's an example Role in the "default" namespace that can be used to grant read access to
pods:
apiVersion:rbac.authorization.k8s.io/v1kind:Rolemetadata:namespace:defaultname:pod-readerrules:- apiGroups:[""]# "" indicates the core API groupresources:["pods"]verbs:["get","watch","list"]
ClusterRole example
A ClusterRole can be used to grant the same permissions as a Role.
Because ClusterRoles are cluster-scoped, you can also use them to grant access to:
namespaced resources (like Pods), across all namespaces
For example: you can use a ClusterRole to allow a particular user to run
kubectl get pods --all-namespaces
Here is an example of a ClusterRole that can be used to grant read access to
secrets in any particular namespace,
or across all namespaces (depending on how it is bound):
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:# "namespace" omitted since ClusterRoles are not namespacedname:secret-readerrules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Secret# objects is "secrets"resources:["secrets"]verbs:["get","watch","list"]
The name of a Role or a ClusterRole object must be a valid
path segment name.
RoleBinding and ClusterRoleBinding
A role binding grants the permissions defined in a role to a user or set of users.
It holds a list of subjects (users, groups, or service accounts), and a reference to the
role being granted.
A RoleBinding grants permissions within a specific namespace whereas a ClusterRoleBinding
grants that access cluster-wide.
A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding
can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding.
If you want to bind a ClusterRole to all the namespaces in your cluster, you use a
ClusterRoleBinding.
The name of a RoleBinding or ClusterRoleBinding object must be a valid
path segment name.
RoleBinding examples
Here is an example of a RoleBinding that grants the "pod-reader" Role to the user "jane"
within the "default" namespace.
This allows "jane" to read pods in the "default" namespace.
apiVersion:rbac.authorization.k8s.io/v1# This role binding allows "jane" to read pods in the "default" namespace.# You need to already have a Role named "pod-reader" in that namespace.kind:RoleBindingmetadata:name:read-podsnamespace:defaultsubjects:# You can specify more than one "subject"- kind:Username:jane# "name" is case sensitiveapiGroup:rbac.authorization.k8s.ioroleRef:# "roleRef" specifies the binding to a Role / ClusterRolekind:Role#this must be Role or ClusterRolename:pod-reader# this must match the name of the Role or ClusterRole you wish to bind toapiGroup:rbac.authorization.k8s.io
A RoleBinding can also reference a ClusterRole to grant the permissions defined in that
ClusterRole to resources inside the RoleBinding's namespace. This kind of reference
lets you define a set of common roles across your cluster, then reuse them within
multiple namespaces.
For instance, even though the following RoleBinding refers to a ClusterRole,
"dave" (the subject, case sensitive) will only be able to read Secrets in the "development"
namespace, because the RoleBinding's namespace (in its metadata) is "development".
apiVersion:rbac.authorization.k8s.io/v1# This role binding allows "dave" to read secrets in the "development" namespace.# You need to already have a ClusterRole named "secret-reader".kind:RoleBindingmetadata:name:read-secrets## The namespace of the RoleBinding determines where the permissions are granted.# This only grants permissions within the "development" namespace.namespace:developmentsubjects:- kind:Username:dave# Name is case sensitiveapiGroup:rbac.authorization.k8s.ioroleRef:kind:ClusterRolename:secret-readerapiGroup:rbac.authorization.k8s.io
ClusterRoleBinding example
To grant permissions across a whole cluster, you can use a ClusterRoleBinding.
The following ClusterRoleBinding allows any user in the group "manager" to read
secrets in any namespace.
apiVersion:rbac.authorization.k8s.io/v1# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.kind:ClusterRoleBindingmetadata:name:read-secrets-globalsubjects:- kind:Groupname:manager# Name is case sensitiveapiGroup:rbac.authorization.k8s.ioroleRef:kind:ClusterRolename:secret-readerapiGroup:rbac.authorization.k8s.io
After you create a binding, you cannot change the Role or ClusterRole that it refers to.
If you try to change a binding's roleRef, you get a validation error. If you do want
to change the roleRef for a binding, you need to remove the binding object and create
a replacement.
There are two reasons for this restriction:
Making roleRef immutable allows granting someone update permission on an existing binding
object, so that they can manage the list of subjects, without being able to change
the role that is granted to those subjects.
A binding to a different role is a fundamentally different binding.
Requiring a binding to be deleted/recreated in order to change the roleRef
ensures the full list of subjects in the binding is intended to be granted
the new role (as opposed to enabling or accidentally modifying only the roleRef
without verifying all of the existing subjects should be given the new role's
permissions).
The kubectl auth reconcile command-line utility creates or updates a manifest file containing RBAC objects,
and handles deleting and recreating binding objects if required to change the role they refer to.
See command usage and examples for more information.
Referring to resources
In the Kubernetes API, most resources are represented and accessed using a string representation of
their object name, such as pods for a Pod. RBAC refers to resources using exactly the same
name that appears in the URL for the relevant API endpoint.
Some Kubernetes APIs involve a
subresource, such as the logs for a Pod. A request for a Pod's logs looks like:
GET /api/v1/namespaces/{namespace}/pods/{name}/log
In this case, pods is the namespaced resource for Pod resources, and log is a
subresource of pods. To represent this in an RBAC role, use a slash (/) to
delimit the resource and subresource. To allow a subject to read pods and
also access the log subresource for each of those Pods, you write:
You can also refer to resources by name for certain requests through the resourceNames list.
When specified, requests can be restricted to individual instances of a resource.
Here is an example that restricts its subject to only get or update a
ConfigMap named my-configmap:
apiVersion:rbac.authorization.k8s.io/v1kind:Rolemetadata:namespace:defaultname:configmap-updaterrules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing ConfigMap# objects is "configmaps"resources:["configmaps"]resourceNames:["my-configmap"]verbs:["update","get"]
Note:
You cannot restrict create or deletecollection requests by their resource name.
For create, this limitation is because the name of the new object may not be known at authorization time.
If you restrict list or watch by resourceName, clients must include a metadata.name field selector in their list or watch request that matches the specified resourceName in order to be authorized.
For example, kubectl get configmaps --field-selector=metadata.name=my-configmap
Rather than referring to individual resources, apiGroups, and verbs,
you can use the wildcard * symbol to refer to all such objects.
For nonResourceURLs, you can use the wildcard * as a suffix glob match.
For resourceNames, an empty set means that everything is allowed.
Here is an example that allows access to perform any current and future action on
all current and future resources in the example.com API group.
This is similar to the built-in cluster-admin role.
apiVersion:rbac.authorization.k8s.io/v1kind:Rolemetadata:namespace:defaultname:example.com-superuser# DO NOT USE THIS ROLE, IT IS JUST AN EXAMPLErules:- apiGroups:["example.com"]resources:["*"]verbs:["*"]
Caution:
Using wildcards in resource and verb entries could result in overly permissive access being granted
to sensitive resources.
For instance, if a new resource type is added, or a new subresource is added,
or a new custom verb is checked, the wildcard entry automatically grants access, which may be undesirable.
The principle of least privilege
should be employed, using specific resources and verbs to ensure only the permissions required for the
workload to function correctly are applied.
Aggregated ClusterRoles
You can aggregate several ClusterRoles into one combined ClusterRole.
A controller, running as part of the cluster control plane, watches for ClusterRole
objects with an aggregationRule set. The aggregationRule defines a label
selector that the controller
uses to match other ClusterRole objects that should be combined into the rules
field of this one.
Caution:
The control plane overwrites any values that you manually specify in the rules field of an
aggregate ClusterRole. If you want to change or add rules, do so in the ClusterRole objects
that are selected by the aggregationRule.
Here is an example aggregated ClusterRole:
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:monitoringaggregationRule:clusterRoleSelectors:- matchLabels:rbac.example.com/aggregate-to-monitoring:"true"rules:[]# The control plane automatically fills in the rules
If you create a new ClusterRole that matches the label selector of an existing aggregated ClusterRole,
that change triggers adding the new rules into the aggregated ClusterRole.
Here is an example that adds rules to the "monitoring" ClusterRole, by creating another
ClusterRole labeled rbac.example.com/aggregate-to-monitoring: true.
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:monitoring-endpointslabels:rbac.example.com/aggregate-to-monitoring:"true"# When you create the "monitoring-endpoints" ClusterRole,# the rules below will be added to the "monitoring" ClusterRole.rules:- apiGroups:[""]resources:["services","endpointslices","pods"]verbs:["get","list","watch"]
The default user-facing roles use ClusterRole aggregation. This lets you,
as a cluster administrator, include rules for custom resources, such as those served by
CustomResourceDefinitions
or aggregated API servers, to extend the default roles.
For example: the following ClusterRoles let the "admin" and "edit" default roles manage the custom resource
named CronTab, whereas the "view" role can perform only read actions on CronTab resources.
You can assume that CronTab objects are named "crontabs" in URLs as seen by the API server.
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:aggregate-cron-tabs-editlabels:# Add these permissions to the "admin" and "edit" default roles.rbac.authorization.k8s.io/aggregate-to-admin:"true"rbac.authorization.k8s.io/aggregate-to-edit:"true"rules:- apiGroups:["stable.example.com"]resources:["crontabs"]verbs:["get","list","watch","create","update","patch","delete"]---kind:ClusterRoleapiVersion:rbac.authorization.k8s.io/v1metadata:name:aggregate-cron-tabs-viewlabels:# Add these permissions to the "view" default role.rbac.authorization.k8s.io/aggregate-to-view:"true"rules:- apiGroups:["stable.example.com"]resources:["crontabs"]verbs:["get","list","watch"]
Role examples
The following examples are excerpts from Role or ClusterRole objects, showing only
the rules section.
Allow reading "pods" resources in the core
API Group:
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Pod# objects is "pods"resources:["pods"]verbs:["get","list","watch"]
Allow reading/writing Deployments (at the HTTP level: objects with "deployments"
in the resource part of their URL) in the "apps" API groups:
rules:- apiGroups:["apps"]## at the HTTP level, the name of the resource for accessing Deployment# objects is "deployments"resources:["deployments"]verbs:["get","list","watch","create","update","patch","delete"]
Allow reading Pods in the core API group, as well as reading or writing Job
resources in the "batch" API group:
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Pod# objects is "pods"resources:["pods"]verbs:["get","list","watch"]- apiGroups:["batch"]## at the HTTP level, the name of the resource for accessing Job# objects is "jobs"resources:["jobs"]verbs:["get","list","watch","create","update","patch","delete"]
Allow reading a ConfigMap named "my-config" (must be bound with a
RoleBinding to limit to a single ConfigMap in a single namespace):
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing ConfigMap# objects is "configmaps"resources:["configmaps"]resourceNames:["my-config"]verbs:["get"]
Allow reading the resource "nodes" in the core group (because a
Node is cluster-scoped, this must be in a ClusterRole bound with a
ClusterRoleBinding to be effective):
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Node# objects is "nodes"resources:["nodes"]verbs:["get","list","watch"]
Allow GET and POST requests to the non-resource endpoint /healthz and
all subpaths (must be in a ClusterRole bound with a ClusterRoleBinding
to be effective):
rules:- nonResourceURLs:["/healthz","/healthz/*"]# '*' in a nonResourceURL is a suffix glob matchverbs:["get","post"]
Referring to subjects
A RoleBinding or ClusterRoleBinding binds a role to subjects.
Subjects can be groups, users or
ServiceAccounts.
Kubernetes represents usernames as strings.
These can be: plain names, such as "alice"; email-style names, like "bob@example.com";
or numeric user IDs represented as a string. It is up to you as a cluster administrator
to configure the authentication modules
so that authentication produces usernames in the format you want.
Caution:
The prefix system: is reserved for Kubernetes system use, so you should ensure
that you don't have users or groups with names that start with system: by
accident.
Other than this special prefix, the RBAC authorization system does not require any format
for usernames.
In Kubernetes, Authenticator modules provide group information.
Groups, like users, are represented as strings, and that string has no format requirements,
other than that the prefix system: is reserved.
ServiceAccounts have names prefixed
with system:serviceaccount:, and belong to groups that have names prefixed with system:serviceaccounts:.
Note:
system:serviceaccount: (singular) is the prefix for service account usernames.
system:serviceaccounts: (plural) is the prefix for service account groups.
RoleBinding examples
The following examples are RoleBinding excerpts that only
show the subjects section.
API servers create a set of default ClusterRole and ClusterRoleBinding objects.
Many of these are system: prefixed, which indicates that the resource is directly
managed by the cluster control plane.
All of the default ClusterRoles and ClusterRoleBindings are labeled with kubernetes.io/bootstrapping=rbac-defaults.
Caution:
Take care when modifying ClusterRoles and ClusterRoleBindings with names
that have a system: prefix.
Modifications to these resources can result in non-functional clusters.
Auto-reconciliation
At each start-up, the API server updates default cluster roles with any missing permissions,
and updates default cluster role bindings with any missing subjects.
This allows the cluster to repair accidental modifications, and helps to keep roles and role bindings
up-to-date as permissions and subjects change in new Kubernetes releases.
To opt out of this reconciliation, set the rbac.authorization.kubernetes.io/autoupdate
annotation on a default cluster role or default cluster RoleBinding to false.
Be aware that missing default permissions and subjects can result in non-functional clusters.
Auto-reconciliation is enabled by default if the RBAC authorizer is active.
API discovery roles
Default cluster role bindings authorize unauthenticated and authenticated users to read API information
that is deemed safe to be publicly accessible (including CustomResourceDefinitions).
To disable anonymous unauthenticated access, add --anonymous-auth=false flag to
the API server configuration.
To view the configuration of these roles via kubectl run:
kubectl get clusterroles system:discovery -o yaml
Note:
If you edit that ClusterRole, your changes will be overwritten on API server restart
via auto-reconciliation. To avoid that overwriting,
either do not manually edit the role, or disable auto-reconciliation.
Kubernetes RBAC API discovery roles
Default ClusterRole
Default ClusterRoleBinding
Description
system:basic-user
system:authenticated group
Allows a user read-only access to basic information about themselves. Prior to v1.14, this role was also bound to system:unauthenticated by default.
system:discovery
system:authenticated group
Allows read-only access to API discovery endpoints needed to discover and negotiate an API level. Prior to v1.14, this role was also bound to system:unauthenticated by default.
system:public-info-viewer
system:authenticated and system:unauthenticated groups
Allows read-only access to non-sensitive information about the cluster. Introduced in Kubernetes v1.14.
User-facing roles
Some of the default ClusterRoles are not system: prefixed. These are intended to be user-facing roles.
They include super-user roles (cluster-admin), roles intended to be granted cluster-wide
using ClusterRoleBindings, and roles intended to be granted within particular
namespaces using RoleBindings (admin, edit, view).
User-facing ClusterRoles use ClusterRole aggregation to allow admins to include
rules for custom resources on these ClusterRoles. To add rules to the admin, edit, or view roles, create
a ClusterRole with one or more of the following labels:
Allows super-user access to perform any action on any resource.
When used in a ClusterRoleBinding, it gives full control over every resource in the cluster and in all namespaces.
When used in a RoleBinding, it gives full control over every resource in the role binding's namespace, including the namespace itself.
admin
None
Allows admin access, intended to be granted within a namespace using a RoleBinding.
If used in a RoleBinding, allows read/write access to most resources in a namespace,
including the ability to create roles and role bindings within the namespace.
This role does not allow write access to resource quota or to the namespace itself.
This role also does not allow write access to EndpointSlices (or Endpoints) in clusters created
using Kubernetes v1.22+. More information is available in the
"Write Access for EndpointSlices and Endpoints" section.
edit
None
Allows read/write access to most objects in a namespace.
This role does not allow viewing or modifying roles or role bindings.
However, this role allows accessing Secrets and running Pods as any ServiceAccount in
the namespace, so it can be used to gain the API access levels of any ServiceAccount in
the namespace. This role also does not allow write access to EndpointSlices (or Endpoints) in
clusters created using Kubernetes v1.22+. More information is available in the
"Write Access for EndpointSlices and Endpoints" section.
view
None
Allows read-only access to see most objects in a namespace.
It does not allow viewing roles or role bindings.
This role does not allow viewing Secrets, since reading
the contents of Secrets enables access to ServiceAccount credentials
in the namespace, which would allow API access as any ServiceAccount
in the namespace (a form of privilege escalation).
Core component roles
Default ClusterRole
Default ClusterRoleBinding
Description
system:kube-scheduler
system:kube-scheduler user
Allows access to the resources required by the scheduler component.
system:volume-scheduler
system:kube-scheduler user
Allows access to the volume resources required by the kube-scheduler component.
system:kube-controller-manager
system:kube-controller-manager user
Allows access to the resources required by the controller manager component.
The permissions required by individual controllers are detailed in the controller roles.
system:node
None
Allows access to resources required by the kubelet, including read access to all secrets, and write access to all pod status objects.
You should use the Node authorizer and NodeRestriction admission plugin instead of the system:node role, and allow granting API access to kubelets based on the Pods scheduled to run on them.
The system:node role only exists for compatibility with Kubernetes clusters upgraded from versions prior to v1.8.
system:node-proxier
system:kube-proxy user
Allows access to the resources required by the kube-proxy component.
Other component roles
Default ClusterRole
Default ClusterRoleBinding
Description
system:auth-delegator
None
Allows delegated authentication and authorization checks.
This is commonly used by add-on API servers for unified authentication and authorization.
Allows read access to control-plane monitoring endpoints (i.e. kube-apiserver liveness and readiness endpoints (/healthz, /livez, /readyz), the individual health-check endpoints (/healthz/*, /livez/*, /readyz/*), and /metrics). Note that individual health check endpoints and the metric endpoint may expose sensitive information.
Roles for built-in controllers
The Kubernetes controller manager runs
controllers that are built in to the Kubernetes
control plane.
When invoked with --use-service-account-credentials, kube-controller-manager starts each controller
using a separate service account.
Corresponding roles exist for each built-in controller, prefixed with system:controller:.
If the controller manager is not started with --use-service-account-credentials, it runs all control loops
using its own credential, which must be granted all the relevant roles.
These roles include:
The RBAC API prevents users from escalating privileges by editing roles or role bindings.
Because this is enforced at the API level, it applies even when the RBAC authorizer is not in use.
Restrictions on role creation or update
You can only create/update a role if at least one of the following things is true:
You already have all the permissions contained in the role, at the same scope as the object being modified
(cluster-wide for a ClusterRole, within the same namespace or cluster-wide for a Role).
You are granted explicit permission to perform the escalate verb on the roles or
clusterroles resource in the rbac.authorization.k8s.io API group.
For example, if user-1 does not have the ability to list Secrets cluster-wide, they cannot create a ClusterRole
containing that permission. To allow a user to create/update roles:
Grant them a role that allows them to create/update Role or ClusterRole objects, as desired.
Grant them permission to include specific permissions in the roles they create/update:
implicitly, by giving them those permissions (if they attempt to create or modify a Role or
ClusterRole with permissions they themselves have not been granted, the API request will be forbidden)
or explicitly allow specifying any permission in a Role or ClusterRole by giving them
permission to perform the escalate verb on roles or clusterroles resources in the
rbac.authorization.k8s.io API group
Restrictions on role binding creation or update
You can only create/update a role binding if you already have all the permissions contained in the referenced role
(at the same scope as the role binding) or if you have been authorized to perform the bind verb on the referenced role.
For example, if user-1 does not have the ability to list Secrets cluster-wide, they cannot create a ClusterRoleBinding
to a role that grants that permission. To allow a user to create/update role bindings:
Grant them a role that allows them to create/update RoleBinding or ClusterRoleBinding objects, as desired.
Grant them permissions needed to bind a particular role:
implicitly, by giving them the permissions contained in the role.
explicitly, by giving them permission to perform the bind verb on the particular Role (or ClusterRole).
For example, this ClusterRole and RoleBinding would allow user-1 to grant other users the admin, edit, and view roles in the namespace user-1-namespace:
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:role-grantorrules:- apiGroups:["rbac.authorization.k8s.io"]resources:["rolebindings"]verbs:["create"]- apiGroups:["rbac.authorization.k8s.io"]resources:["clusterroles"]verbs:["bind"]# omit resourceNames to allow binding any ClusterRoleresourceNames:["admin","edit","view"]---apiVersion:rbac.authorization.k8s.io/v1kind:RoleBindingmetadata:name:role-grantor-bindingnamespace:user-1-namespaceroleRef:apiGroup:rbac.authorization.k8s.iokind:ClusterRolename:role-grantorsubjects:- apiGroup:rbac.authorization.k8s.iokind:Username:user-1
When bootstrapping the first roles and role bindings, it is necessary for the initial user to grant permissions they do not yet have.
To bootstrap initial roles and role bindings:
Use a credential with the "system:masters" group, which is bound to the "cluster-admin" super-user role by the default bindings.
Command-line utilities
kubectl create role
Creates a Role object defining permissions within a single namespace. Examples:
Create a Role named "pod-reader" that allows users to perform get, watch and list on pods:
kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods
Create a Role named "pod-reader" with resourceNames specified:
kubectl create role pod-reader --verb=get --resource=pods --resource-name=readablepod --resource-name=anotherpod
Create a Role named "foo" with apiGroups specified:
kubectl create role foo --verb=get,list,watch --resource=replicasets.apps
Create a Role named "foo" with subresource permissions:
kubectl create role foo --verb=get,list,watch --resource=pods,pods/status
Create a Role named "my-component-lease-holder" with permissions to get/update a resource with a specific name:
kubectl create role my-component-lease-holder --verb=get,list,watch,update --resource=lease --resource-name=my-component
kubectl create clusterrole
Creates a ClusterRole. Examples:
Create a ClusterRole named "pod-reader" that allows user to perform get, watch and list on pods:
Default RBAC policies grant scoped permissions to control-plane components, nodes,
and controllers, but grant no permissions to service accounts outside the kube-system namespace
(beyond the permissions given by API discovery roles).
This allows you to grant particular roles to particular ServiceAccounts as needed.
Fine-grained role bindings provide greater security, but require more effort to administrate.
Broader grants can give unnecessary (and potentially escalating) API access to
ServiceAccounts, but are easier to administrate.
In order from most secure to least secure, the approaches are:
Grant a role to an application-specific service account (best practice)
This requires the application to specify a serviceAccountName in its pod spec,
and for the service account to be created (via the API, application manifest, kubectl create serviceaccount, etc.).
For example, grant read-only permission within "my-namespace" to the "my-sa" service account:
Many add-ons run as the
"default" service account in the kube-system namespace.
To allow those add-ons to run with super-user access, grant cluster-admin
permissions to the "default" service account in the kube-system namespace.
Caution:
Enabling this means the kube-system namespace contains Secrets
that grant super-user access to your cluster's API.
Grant a role to all service accounts in a namespace
If you want all applications in a namespace to have a role, no matter what service account they use,
you can grant a role to the service account group for that namespace.
For example, grant read-only permission within "my-namespace" to all service accounts in that namespace:
Grant super-user access to all service accounts cluster-wide (strongly discouraged)
If you don't care about partitioning permissions at all, you can grant super-user access to all service accounts.
Warning:
This allows any application full access to your cluster, and also grants
any user with read access to Secrets (or the ability to create any pod)
full access to your cluster.
Kubernetes clusters created before Kubernetes v1.22 include write access to
EndpointSlices (and Endpoints) in the aggregated "edit" and "admin" roles.
As a mitigation for CVE-2021-25740,
this access is not part of the aggregated roles in clusters that you create using
Kubernetes v1.22 or later.
Existing clusters that have been upgraded to Kubernetes v1.22 will not be
subject to this change. The CVE
announcement includes
guidance for restricting this access in existing clusters.
If you want new clusters to retain this level of access in the aggregated roles,
you can create the following ClusterRole:
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:annotations:kubernetes.io/description:|- Add endpoints write permissions to the edit and admin roles. This was
removed by default in 1.22 because of CVE-2021-25740. See
https://issue.k8s.io/103675. This can allow writers to direct LoadBalancer
or Ingress implementations to expose backend IPs that would not otherwise
be accessible, and can circumvent network policies or security controls
intended to prevent/isolate access to those backends.
EndpointSlices were never included in the edit or admin roles, so there
is nothing to restore for the EndpointSlice API.labels:rbac.authorization.k8s.io/aggregate-to-edit:"true"name:custom:aggregate-to-edit:endpoints# you can change this if you wishrules:- apiGroups:[""]resources:["endpoints"]verbs:["create","delete","deletecollection","patch","update"]
Upgrading from ABAC
Clusters that originally ran older Kubernetes versions often used
permissive ABAC policies, including granting full API access to all
service accounts.
Default RBAC policies grant scoped permissions to control-plane components, nodes,
and controllers, but grant no permissions to service accounts outside the kube-system namespace
(beyond the permissions given by API discovery roles).
While far more secure, this can be disruptive to existing workloads expecting to automatically receive API permissions.
Here are two approaches for managing this transition:
Parallel authorizers
Run both the RBAC and ABAC authorizers, and specify a policy file that contains
the legacy ABAC policy:
To explain that first command line option in detail: if earlier authorizers, such as Node,
deny a request, then the RBAC authorizer attempts to authorize the API request. If RBAC
also denies that API request, the ABAC authorizer is then run. This means that any request
allowed by either the RBAC or ABAC policies is allowed.
When the kube-apiserver is run with a log level of 5 or higher for the RBAC component
(--vmodule=rbac*=5 or --v=5), you can see RBAC denials in the API server log
(prefixed with RBAC).
You can use that information to determine which roles need to be granted to which users, groups, or service accounts.
Once you have granted roles to service accounts and workloads
are running with no RBAC denial messages in the server logs, you can remove the ABAC authorizer.
Permissive RBAC permissions
You can replicate a permissive ABAC policy using RBAC role bindings.
Warning:
The following policy allows ALL service accounts to act as cluster administrators.
Any application running in a container receives service account credentials automatically,
and could perform any action against the API, including viewing secrets and modifying permissions.
This is not a recommended policy.
After you have transitioned to use RBAC, you should adjust the access controls
for your cluster to ensure that these meet your information security needs.
3.5 - Using Node Authorization
Node authorization is a special-purpose authorization mode that specifically
authorizes API requests made by kubelets.
Overview
The Node authorizer allows a kubelet to perform API operations. This includes:
Read operations:
services
endpoints
nodes
pods
secrets, configmaps, persistent volume claims and persistent volumes related
to pods bound to the kubelet's node
FEATURE STATE:Kubernetes v1.31 [alpha]
When the AuthorizeNodeWithSelectors feature is enabled
(along with the pre-requisite AuthorizeWithSelectors feature),
kubelets are only allowed to read their own Node objects,
and are only allowed to read pods bound to their node.
Write operations:
nodes and node status (enable the NodeRestriction admission plugin to limit
a kubelet to modify its own node)
pods and pod status (enable the NodeRestriction admission plugin to limit a
kubelet to modify pods bound to itself)
the ability to create TokenReviews and SubjectAccessReviews for delegated
authentication/authorization checks
In future releases, the node authorizer may add or remove permissions to ensure
kubelets have the minimal set of permissions required to operate correctly.
In order to be authorized by the Node authorizer, kubelets must use a credential
that identifies them as being in the system:nodes group, with a username of
system:node:<nodeName>.
This group and user name format match the identity created for each kubelet as part of
kubelet TLS bootstrapping.
The value of <nodeName>must match precisely the name of the node as
registered by the kubelet. By default, this is the host name as provided by
hostname, or overridden via the
kubelet option--hostname-override. However, when using the --cloud-provider kubelet
option, the specific hostname may be determined by the cloud provider, ignoring
the local hostname and the --hostname-override option.
For specifics about how the kubelet determines the hostname, see the
kubelet options reference.
To enable the Node authorizer, start the apiserver with --authorization-mode=Node.
To limit the API objects kubelets are able to write, enable the
NodeRestriction
admission plugin by starting the apiserver with
--enable-admission-plugins=...,NodeRestriction,...
Migration considerations
Kubelets outside the system:nodes group
Kubelets outside the system:nodes group would not be authorized by the Node
authorization mode, and would need to continue to be authorized via whatever
mechanism currently authorizes them.
The node admission plugin would not restrict requests from these kubelets.
Kubelets with undifferentiated usernames
In some deployments, kubelets have credentials that place them in the system:nodes group,
but do not identify the particular node they are associated with,
because they do not have a username in the system:node:... format.
These kubelets would not be authorized by the Node authorization mode,
and would need to continue to be authorized via whatever mechanism currently authorizes them.
The NodeRestriction admission plugin would ignore requests from these kubelets,
since the default node identifier implementation would not consider that a node identity.
3.6 - Webhook Mode
A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event-notification via HTTP POST. A web application implementing WebHooks will POST a message to a URL when certain things happen.
When specified, mode Webhook causes Kubernetes to query an outside REST
service when determining user privileges.
Configuration File Format
Mode Webhook requires a file for HTTP configuration, specify by the
--authorization-webhook-config-file=SOME_FILENAME flag.
The configuration file uses the kubeconfig
file format. Within the file "users" refers to the API Server webhook and
"clusters" refers to the remote service.
A configuration example which uses HTTPS client auth:
# Kubernetes API versionapiVersion:v1# kind of the API objectkind:Config# clusters refers to the remote service.clusters:- name:name-of-remote-authz-servicecluster:# CA for verifying the remote service.certificate-authority:/path/to/ca.pem# URL of remote service to query. Must use 'https'. May not include parameters.server:https://authz.example.com/authorize# users refers to the API Server's webhook configuration.users:- name:name-of-api-serveruser:client-certificate:/path/to/cert.pem# cert for the webhook plugin to useclient-key:/path/to/key.pem # key matching the cert# kubeconfig files require a context. Provide one for the API Server.current-context:webhookcontexts:- context:cluster:name-of-remote-authz-serviceuser:name-of-api-servername:webhook
Request Payloads
When faced with an authorization decision, the API Server POSTs a JSON-
serialized authorization.k8s.io/v1beta1SubjectAccessReview object describing the
action. This object contains fields describing the user attempting to make the
request, and either details about the resource being accessed or requests
attributes.
Note that webhook API objects are subject to the same versioning compatibility rules
as other Kubernetes API objects. Implementers should be aware of looser
compatibility promises for beta objects and check the "apiVersion" field of the
request to ensure correct deserialization. Additionally, the API Server must
enable the authorization.k8s.io/v1beta1 API extensions group (--runtime-config=authorization.k8s.io/v1beta1=true).
The remote service is expected to fill the status field of
the request and respond to either allow or disallow access. The response body's
spec field is ignored and may be omitted. A permissive response would return:
The first method is preferred in most cases, and indicates the authorization
webhook does not allow, or has "no opinion" about the request, but if other
authorizers are configured, they are given a chance to allow the request.
If there are no other authorizers, or none of them allow the request, the
request is forbidden. The webhook would return:
{
"apiVersion": "authorization.k8s.io/v1beta1",
"kind": "SubjectAccessReview",
"status": {
"allowed": false,
"reason": "user does not have read access to the namespace" }
}
The second method denies immediately, short-circuiting evaluation by other
configured authorizers. This should only be used by webhooks that have
detailed knowledge of the full authorizer configuration of the cluster.
The webhook would return:
{
"apiVersion": "authorization.k8s.io/v1beta1",
"kind": "SubjectAccessReview",
"status": {
"allowed": false,
"denied": true,
"reason": "user does not have read access to the namespace" }
}
With the AuthorizeWithSelectors feature enabled, field and label selectors in the request
are passed to the authorization webhook. The webhook can make authorization decisions
informed by the scoped field and label selectors, if it wishes.
The SubjectAccessReview API documentation
gives guidelines for how these fields should be interpreted and handled by authorization webhooks,
specifically using the parsed requirements rather than the raw selector strings,
and how to handle unrecognized operators safely.
Non-resource paths include: /api, /apis, /metrics,
/logs, /debug, /healthz, /livez, /openapi/v2, /readyz, and
/version. Clients require access to /api, /api/*, /apis, /apis/*,
and /version to discover what resources and versions are present on the server.
Access to other non-resource paths can be disallowed without restricting access
to the REST api.
Attribute-based access control (ABAC) defines an access control paradigm whereby access rights are granted
to users through the use of policies which combine attributes together.
Policy File Format
To enable ABAC mode, specify --authorization-policy-file=SOME_FILENAME and --authorization-mode=ABAC
on startup.
The file format is one JSON object per line. There
should be no enclosing list or map, only one map per line.
Each line is a "policy object", where each such object is a map with the following
properties:
Versioning properties:
apiVersion, type string; valid values are "abac.authorization.kubernetes.io/v1beta1". Allows versioning
and conversion of the policy format.
kind, type string: valid values are "Policy". Allows versioning and conversion of the policy format.
spec property set to a map with the following properties:
Subject-matching properties:
user, type string; the user-string from --token-auth-file. If you specify user, it must match the
username of the authenticated user.
group, type string; if you specify group, it must match one of the groups of the authenticated user.
system:authenticated matches all authenticated requests. system:unauthenticated matches all
unauthenticated requests.
Resource-matching properties:
apiGroup, type string; an API group.
Ex: apps, networking.k8s.io
Wildcard: * matches all API groups.
namespace, type string; a namespace.
Ex: kube-system
Wildcard: * matches all resource requests.
resource, type string; a resource type
Ex: pods, deployments
Wildcard: * matches all resource requests.
Non-resource-matching properties:
nonResourcePath, type string; non-resource request paths.
Ex: /version or /apis
Wildcard:
* matches all non-resource requests.
/foo/* matches all subpaths of /foo/.
readonly, type boolean, when true, means that the Resource-matching policy only applies to get, list,
and watch operations, Non-resource-matching policy only applies to get operation.
Note:
An unset property is the same as a property set to the zero value for its type
(e.g. empty string, 0, false). However, unset should be preferred for
readability.
In the future, policies may be expressed in a JSON format, and managed via a
REST interface.
Authorization Algorithm
A request has attributes which correspond to the properties of a policy object.
When a request is received, the attributes are determined. Unknown attributes
are set to the zero value of its type (e.g. empty string, 0, false).
A property set to "*" will match any value of the corresponding attribute.
The tuple of attributes is checked for a match against every policy in the
policy file. If at least one line matches the request attributes, then the
request is authorized (but may fail later validation).
To permit any authenticated user to do something, write a policy with the
group property set to "system:authenticated".
To permit any unauthenticated user to do something, write a policy with the
group property set to "system:unauthenticated".
To permit a user to do anything, write a policy with the apiGroup, namespace,
resource, and nonResourcePath properties set to "*".
Kubectl
Kubectl uses the /api and /apis endpoints of apiserver to discover
served resource types, and validates objects sent to the API by create/update
operations using schema information located at /openapi/v2.
When using ABAC authorization, those special resources have to be explicitly
exposed via the nonResourcePath property in a policy (see examples below):
/api, /api/*, /apis, and /apis/* for API version negotiation.
/version for retrieving the server version via kubectl version.
/swaggerapi/* for create/update operations.
To inspect the HTTP calls involved in a specific kubectl operation you can turn
up the verbosity:
Creating a new namespace leads to the creation of a new service account in the following format:
system:serviceaccount:<namespace>:default
For example, if you wanted to grant the default service account (in the kube-system namespace) full
privilege to the API using ABAC, you would add this line to your policy file:
The apiserver will need to be restarted to pick up the new policy lines.
3.8 - Admission Controllers Reference
This page provides an overview of Admission Controllers.
What are they?
An admission controller is a piece of code that intercepts requests to the
Kubernetes API server prior to persistence of the object, but after the request
is authenticated and authorized.
Admission controllers may be validating, mutating, or both. Mutating
controllers may modify objects related to the requests they admit; validating controllers may not.
Admission controllers limit requests to create, delete, modify objects. Admission
controllers can also block custom verbs, such as a request connect to a Pod via
an API server proxy. Admission controllers do not (and cannot) block requests
to read (get, watch or list) objects.
The admission controllers in Kubernetes 1.31 consist of the
list below, are compiled into the
kube-apiserver binary, and may only be configured by the cluster
administrator. In that list, there are two special controllers:
MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the
mutating and validating (respectively)
admission control webhooks
which are configured in the API.
Admission control phases
The admission control process proceeds in two phases. In the first phase,
mutating admission controllers are run. In the second phase, validating
admission controllers are run. Note again that some of the controllers are
both.
If any of the controllers in either phase reject the request, the entire
request is rejected immediately and an error is returned to the end-user.
Finally, in addition to sometimes mutating the object in question, admission
controllers may sometimes have side effects, that is, mutate related
resources as part of request processing. Incrementing quota usage is the
canonical example of why this is necessary. Any such side-effect needs a
corresponding reclamation or reconciliation process, as a given admission
controller does not know for sure that a given request will pass all of the
other admission controllers.
Why do I need them?
Several important features of Kubernetes require an admission controller to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission controllers is an incomplete server and will not
support all the features you expect.
How do I turn on an admission controller?
The Kubernetes API server flag enable-admission-plugins takes a comma-delimited list of admission control plugins to invoke prior to modifying objects in the cluster.
For example, the following command line enables the NamespaceLifecycle and the LimitRanger
admission control plugins:
Depending on the way your Kubernetes cluster is deployed and how the API server is
started, you may need to apply the settings in different ways. For example, you may
have to modify the systemd unit file if the API server is deployed as a systemd
service, you may modify the manifest file for the API server if Kubernetes is deployed
in a self-hosted way.
How do I turn off an admission controller?
The Kubernetes API server flag disable-admission-plugins takes a comma-delimited list of admission control plugins to be disabled, even if they are in the list of plugins enabled by default.
The ValidatingAdmissionPolicy admission plugin is enabled
by default, but is only active if you enable the ValidatingAdmissionPolicyfeature gateand
the admissionregistration.k8s.io/v1alpha1 API.
What does each admission controller do?
AlwaysAdmit
FEATURE STATE:Kubernetes v1.13 [deprecated]
Type: Validating.
This admission controller allows all pods into the cluster. It is deprecated because
its behavior is the same as if there were no admission controller at all.
AlwaysDeny
FEATURE STATE:Kubernetes v1.13 [deprecated]
Type: Validating.
Rejects all requests. AlwaysDeny is deprecated as it has no real meaning.
AlwaysPullImages
Type: Mutating and Validating.
This admission controller modifies every new Pod to force the image pull policy to Always. This is useful in a
multitenant cluster so that users can be assured that their private images can only be used by those
who have the credentials to pull them. Without this admission controller, once an image has been pulled to a
node, any pod from any user can use it by knowing the image's name (assuming the Pod is
scheduled onto the right node), without any authorization check against the image. When this admission controller
is enabled, images are always pulled prior to starting containers, which means valid credentials are
required.
CertificateApproval
Type: Validating.
This admission controller observes requests to approve CertificateSigningRequest resources and performs additional
authorization checks to ensure the approving user has permission to approve certificate requests with the
spec.signerName requested on the CertificateSigningRequest resource.
See Certificate Signing Requests for more
information on the permissions required to perform different actions on CertificateSigningRequest resources.
CertificateSigning
Type: Validating.
This admission controller observes updates to the status.certificate field of CertificateSigningRequest resources
and performs an additional authorization checks to ensure the signing user has permission to sign certificate
requests with the spec.signerName requested on the CertificateSigningRequest resource.
See Certificate Signing Requests for more
information on the permissions required to perform different actions on CertificateSigningRequest resources.
CertificateSubjectRestriction
Type: Validating.
This admission controller observes creation of CertificateSigningRequest resources that have a spec.signerName
of kubernetes.io/kube-apiserver-client. It rejects any request that specifies a 'group' (or 'organization attribute')
of system:masters.
DefaultIngressClass
Type: Mutating.
This admission controller observes creation of Ingress objects that do not request any specific
ingress class and automatically adds a default ingress class to them. This way, users that do not
request any special ingress class do not need to care about them at all and they will get the
default one.
This admission controller does not do anything when no default ingress class is configured. When more than one ingress
class is marked as default, it rejects any creation of Ingress with an error and an administrator
must revisit their IngressClass objects and mark only one as default (with the annotation
"ingressclass.kubernetes.io/is-default-class"). This admission controller ignores any Ingress
updates; it acts only on creation.
See the Ingress documentation for more about ingress
classes and how to mark one as default.
DefaultStorageClass
Type: Mutating.
This admission controller observes creation of PersistentVolumeClaim objects that do not request any specific storage class
and automatically adds a default storage class to them.
This way, users that do not request any special storage class do not need to care about them at all and they
will get the default one.
This admission controller does not do anything when no default storage class is configured. When more than one storage
class is marked as default, it rejects any creation of PersistentVolumeClaim with an error and an administrator
must revisit their StorageClass objects and mark only one as default.
This admission controller ignores any PersistentVolumeClaim updates; it acts only on creation.
See persistent volume documentation about persistent volume claims and
storage classes and how to mark a storage class as default.
DefaultTolerationSeconds
Type: Mutating.
This admission controller sets the default forgiveness toleration for pods to tolerate
the taints notready:NoExecute and unreachable:NoExecute based on the k8s-apiserver input parameters
default-not-ready-toleration-seconds and default-unreachable-toleration-seconds if the pods don't already
have toleration for taints node.kubernetes.io/not-ready:NoExecute or
node.kubernetes.io/unreachable:NoExecute.
The default value for default-not-ready-toleration-seconds and default-unreachable-toleration-seconds is 5 minutes.
DenyServiceExternalIPs
Type: Validating.
This admission controller rejects all net-new usage of the Service field externalIPs. This
feature is very powerful (allows network traffic interception) and not well
controlled by policy. When enabled, users of the cluster may not create new
Services which use externalIPs and may not add new values to externalIPs on
existing Service objects. Existing uses of externalIPs are not affected,
and users may remove values from externalIPs on existing Service objects.
Most users do not need this feature at all, and cluster admins should consider disabling it.
Clusters that do need to use this feature should consider using some custom policy to manage usage
of it.
This admission controller is disabled by default.
EventRateLimit
FEATURE STATE:Kubernetes v1.13 [alpha]
Type: Validating.
This admission controller mitigates the problem where the API server gets flooded by
requests to store new Events. The cluster admin can specify event rate limits by:
Enabling the EventRateLimit admission controller;
Referencing an EventRateLimit configuration file from the file provided to the API
server's command line flag --admission-control-config-file:
This plug-in facilitates creation of dedicated nodes with extended resources.
If operators want to create dedicated nodes with extended resources (like GPUs, FPGAs etc.), they are expected to
taint the node with the extended resource
name as the key. This admission controller, if enabled, automatically
adds tolerations for such taints to pods requesting extended resources, so users don't have to manually
add these tolerations.
This admission controller is disabled by default.
ImagePolicyWebhook
Type: Validating.
The ImagePolicyWebhook admission controller allows a backend webhook to make admission decisions.
This admission controller is disabled by default.
Configuration file format
ImagePolicyWebhook uses a configuration file to set options for the behavior of the backend.
This file may be json or yaml and has the following format:
imagePolicy:kubeConfigFile:/path/to/kubeconfig/for/backend# time in s to cache approvalallowTTL:50# time in s to cache denialdenyTTL:50# time in ms to wait between retriesretryBackoff:500# determines behavior if the webhook backend failsdefaultAllow:true
Reference the ImagePolicyWebhook configuration file from the file provided to the API server's command line flag --admission-control-config-file:
The ImagePolicyWebhook config file must reference a
kubeconfig
formatted file which sets up the connection to the backend.
It is required that the backend communicate over TLS.
The kubeconfig file's cluster field must point to the remote service, and the user field
must contain the returned authorizer.
# clusters refers to the remote service.clusters:- name:name-of-remote-imagepolicy-servicecluster:certificate-authority:/path/to/ca.pem # CA for verifying the remote service.server:https://images.example.com/policy# URL of remote service to query. Must use 'https'.# users refers to the API server's webhook configuration.users:- name:name-of-api-serveruser:client-certificate:/path/to/cert.pem# cert for the webhook admission controller to useclient-key:/path/to/key.pem # key matching the cert
For additional HTTP configuration, refer to the
kubeconfig documentation.
Request payloads
When faced with an admission decision, the API Server POSTs a JSON serialized
imagepolicy.k8s.io/v1alpha1ImageReview object describing the action.
This object contains fields describing the containers being admitted, as well as
any pod annotations that match *.image-policy.k8s.io/*.
Note:
The webhook API objects are subject to the same versioning compatibility rules
as other Kubernetes API objects. Implementers should be aware of looser compatibility
promises for alpha objects and check the apiVersion field of the request to
ensure correct deserialization.
Additionally, the API Server must enable the imagepolicy.k8s.io/v1alpha1 API extensions
group (--runtime-config=imagepolicy.k8s.io/v1alpha1=true).
The remote service is expected to fill the status field of the request and
respond to either allow or disallow access. The response body's spec field is ignored, and
may be omitted. A permissive response would return:
All annotations on a Pod that match *.image-policy.k8s.io/* are sent to the webhook.
Sending annotations allows users who are aware of the image policy backend to
send extra information to it, and for different backends implementations to
accept different information.
Examples of information you might put here are:
request to "break glass" to override a policy, in case of emergency.
a ticket number from a ticket system that documents the break-glass request
provide a hint to the policy server as to the imageID of the image being provided, to save it a lookup
In any case, the annotations are provided by the user and are not validated by Kubernetes in any way.
LimitPodHardAntiAffinityTopology
Type: Validating.
This admission controller denies any pod that defines AntiAffinity topology key other than
kubernetes.io/hostname in requiredDuringSchedulingRequiredDuringExecution.
This admission controller is disabled by default.
LimitRanger
Type: Mutating and Validating.
This admission controller will observe the incoming request and ensure that it does not violate
any of the constraints enumerated in the LimitRange object in a Namespace. If you are using
LimitRange objects in your Kubernetes deployment, you MUST use this admission controller to
enforce those constraints. LimitRanger can also be used to apply default resource requests to Pods
that don't specify any; currently, the default LimitRanger applies a 0.1 CPU requirement to all
Pods in the default namespace.
This admission controller calls any mutating webhooks which match the request. Matching
webhooks are called in serial; each one may modify the object if it desires.
This admission controller (as implied by the name) only runs in the mutating phase.
If a webhook called by this has side effects (for example, decrementing quota) it
must have a reconciliation system, as it is not guaranteed that subsequent
webhooks or validating admission controllers will permit the request to finish.
If you disable the MutatingAdmissionWebhook, you must also disable the
MutatingWebhookConfiguration object in the admissionregistration.k8s.io/v1
group/version via the --runtime-config flag, both are on by default.
Use caution when authoring and installing mutating webhooks
Users may be confused when the objects they try to create are different from
what they get back.
Built in control loops may break when the objects they try to create are
different when read back.
Setting originally unset fields is less likely to cause problems than
overwriting fields set in the original request. Avoid doing the latter.
Future changes to control loops for built-in resources or third-party resources
may break webhooks that work well today. Even when the webhook installation API
is finalized, not all possible webhook behaviors will be guaranteed to be supported
indefinitely.
NamespaceAutoProvision
Type: Mutating.
This admission controller examines all incoming requests on namespaced resources and checks
if the referenced namespace does exist.
It creates a namespace if it cannot be found.
This admission controller is useful in deployments that do not want to restrict creation of
a namespace prior to its usage.
NamespaceExists
Type: Validating.
This admission controller checks all requests on namespaced resources other than Namespace itself.
If the namespace referenced from a request doesn't exist, the request is rejected.
NamespaceLifecycle
Type: Validating.
This admission controller enforces that a Namespace that is undergoing termination cannot have
new objects created in it, and ensures that requests in a non-existent Namespace are rejected.
This admission controller also prevents deletion of three system reserved namespaces default,
kube-system, kube-public.
A Namespace deletion kicks off a sequence of operations that remove all objects (pods, services,
etc.) in that namespace. In order to enforce integrity of that process, we strongly recommend
running this admission controller.
NodeRestriction
Type: Validating.
This admission controller limits the Node and Pod objects a kubelet can modify. In order to be limited by this admission controller,
kubelets must use credentials in the system:nodes group, with a username in the form system:node:<nodeName>.
Such kubelets will only be allowed to modify their own Node API object, and only modify Pod API objects that are bound to their node.
kubelets are not allowed to update or remove taints from their Node API object.
The NodeRestriction admission plugin prevents kubelets from deleting their Node API object,
and enforces kubelet modification of labels under the kubernetes.io/ or k8s.io/ prefixes as follows:
Prevents kubelets from adding/removing/updating labels with a node-restriction.kubernetes.io/ prefix.
This label prefix is reserved for administrators to label their Node objects for workload isolation purposes,
and kubelets will not be allowed to modify labels with that prefix.
Allows kubelets to add/remove/update these labels and label prefixes:
Use of any other labels under the kubernetes.io or k8s.io prefixes by kubelets is reserved,
and may be disallowed or allowed by the NodeRestriction admission plugin in the future.
Future versions may add additional restrictions to ensure kubelets have the minimal set of
permissions required to operate correctly.
OwnerReferencesPermissionEnforcement
Type: Validating.
This admission controller protects the access to the metadata.ownerReferences of an object
so that only users with delete permission to the object can change it.
This admission controller also protects the access to metadata.ownerReferences[x].blockOwnerDeletion
of an object, so that only users with update permission to the finalizers
subresource of the referenced owner can change it.
PersistentVolumeClaimResize
FEATURE STATE:Kubernetes v1.24 [stable]
Type: Validating.
This admission controller implements additional validations for checking incoming
PersistentVolumeClaim resize requests.
Enabling the PersistentVolumeClaimResize admission controller is recommended.
This admission controller prevents resizing of all claims by default unless a claim's StorageClass
explicitly enables resizing by setting allowVolumeExpansion to true.
For example: all PersistentVolumeClaims created from the following StorageClass support volume expansion:
This admission controller defaults and limits what node selectors may be used within a namespace
by reading a namespace annotation and a global configuration.
This admission controller is disabled by default.
Configuration file format
PodNodeSelector uses a configuration file to set options for the behavior of the backend.
Note that the configuration file format will move to a versioned file in a future release.
This file may be json or yaml and has the following format:
This admission controller has the following behavior:
If the Namespace has an annotation with a key scheduler.alpha.kubernetes.io/node-selector,
use its value as the node selector.
If the namespace lacks such an annotation, use the clusterDefaultNodeSelector defined in the
PodNodeSelector plugin configuration file as the node selector.
Evaluate the pod's node selector against the namespace node selector for conflicts. Conflicts
result in rejection.
Evaluate the pod's node selector against the namespace-specific allowed selector defined the
plugin configuration file. Conflicts result in rejection.
Note:
PodNodeSelector allows forcing pods to run on specifically labeled nodes. Also see the PodTolerationRestriction
admission plugin, which allows preventing pods from running on specifically tainted nodes.
PodSecurity
FEATURE STATE:Kubernetes v1.25 [stable]
Type: Validating.
The PodSecurity admission controller checks new Pods before they are
admitted, determines if it should be admitted based on the requested security context and the restrictions on permitted
Pod Security Standards
for the namespace that the Pod would be in.
PodSecurity replaced an older admission controller named PodSecurityPolicy.
PodTolerationRestriction
FEATURE STATE:Kubernetes v1.7 [alpha]
Type: Mutating and Validating.
The PodTolerationRestriction admission controller verifies any conflict between tolerations of a
pod and the tolerations of its namespace.
It rejects the pod request if there is a conflict.
It then merges the tolerations annotated on the namespace into the tolerations of the pod.
The resulting tolerations are checked against a list of allowed tolerations annotated to the namespace.
If the check succeeds, the pod request is admitted otherwise it is rejected.
If the namespace of the pod does not have any associated default tolerations or allowed
tolerations annotated, the cluster-level default tolerations or cluster-level list of allowed tolerations are used
instead if they are specified.
Tolerations to a namespace are assigned via the scheduler.alpha.kubernetes.io/defaultTolerations annotation key.
The list of allowed tolerations can be added via the scheduler.alpha.kubernetes.io/tolerationsWhitelist annotation key.
The priority admission controller uses the priorityClassName field and populates the integer
value of the priority.
If the priority class is not found, the Pod is rejected.
ResourceQuota
Type: Validating.
This admission controller will observe the incoming request and ensure that it does not violate
any of the constraints enumerated in the ResourceQuota object in a Namespace. If you are
using ResourceQuota objects in your Kubernetes deployment, you MUST use this admission
controller to enforce quota constraints.
If you define a RuntimeClass with Pod overhead
configured, this admission controller checks incoming Pods.
When enabled, this admission controller rejects any Pod create requests
that have the overhead already set.
For Pods that have a RuntimeClass configured and selected in their .spec,
this admission controller sets .spec.overhead in the Pod based on the value
defined in the corresponding RuntimeClass.
This admission controller implements automation for
serviceAccounts.
The Kubernetes project strongly recommends enabling this admission controller.
You should enable this admission controller if you intend to make any use of Kubernetes
ServiceAccount objects.
Regarding the annotation kubernetes.io/enforce-mountable-secrets: While the annotation's name suggests it only concerns the mounting of Secrets,
its enforcement also extends to other ways Secrets are used in the context of a Pod.
Therefore, it is crucial to ensure that all the referenced secrets are correctly specified in the ServiceAccount.
StorageObjectInUseProtection
Type: Mutating.
The StorageObjectInUseProtection plugin adds the kubernetes.io/pvc-protection or kubernetes.io/pv-protection
finalizers to newly created Persistent Volume Claims (PVCs) or Persistent Volumes (PV).
In case a user deletes a PVC or PV the PVC or PV is not removed until the finalizer is removed
from the PVC or PV by PVC or PV Protection Controller.
Refer to the
Storage Object in Use Protection
for more detailed information.
TaintNodesByCondition
Type: Mutating.
This admission controller taints newly created
Nodes as NotReady and NoSchedule. That tainting avoids a race condition that could cause Pods
to be scheduled on new Nodes before their taints were updated to accurately reflect their reported
conditions.
ValidatingAdmissionPolicy
Type: Validating.
This admission controller implements the CEL validation for incoming matched requests.
It is enabled when both feature gate validatingadmissionpolicy and admissionregistration.k8s.io/v1alpha1 group/version are enabled.
If any of the ValidatingAdmissionPolicy fails, the request fails.
ValidatingAdmissionWebhook
Type: Validating.
This admission controller calls any validating webhooks which match the request. Matching
webhooks are called in parallel; if any of them rejects the request, the request
fails. This admission controller only runs in the validation phase; the webhooks it calls may not
mutate the object, as opposed to the webhooks called by the MutatingAdmissionWebhook admission controller.
If a webhook called by this has side effects (for example, decrementing quota) it
must have a reconciliation system, as it is not guaranteed that subsequent
webhooks or other validating admission controllers will permit the request to finish.
If you disable the ValidatingAdmissionWebhook, you must also disable the
ValidatingWebhookConfiguration object in the admissionregistration.k8s.io/v1
group/version via the --runtime-config flag.
Is there a recommended set of admission controllers to use?
Yes. The recommended admission controllers are enabled by default
(shown here),
so you do not need to explicitly specify them.
You can enable additional admission controllers beyond the default set using the
--enable-admission-plugins flag (order doesn't matter).
3.9 - Dynamic Admission Control
In addition to compiled-in admission plugins,
admission plugins can be developed as extensions and run as webhooks configured at runtime.
This page describes how to build, configure, use, and monitor admission webhooks.
What are admission webhooks?
Admission webhooks are HTTP callbacks that receive admission requests and do
something with them. You can define two types of admission webhooks,
validating admission webhook
and
mutating admission webhook.
Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults.
After all object modifications are complete, and after the incoming object is validated by the API server,
validating admission webhooks are invoked and can reject requests to enforce custom policies.
Note:
Admission webhooks that need to guarantee they see the final state of the object in order to enforce policy
should use a validating admission webhook, since objects can be modified after being seen by mutating webhooks.
Experimenting with admission webhooks
Admission webhooks are essentially part of the cluster control-plane. You should
write and deploy them with great caution. Please read the
user guides
for instructions if you intend to write/deploy production-grade admission webhooks.
In the following, we describe how to quickly experiment with admission webhooks.
Prerequisites
Ensure that MutatingAdmissionWebhook and ValidatingAdmissionWebhook
admission controllers are enabled.
Here
is a recommended set of admission controllers to enable in general.
Ensure that the admissionregistration.k8s.io/v1 API is enabled.
Write an admission webhook server
Please refer to the implementation of the admission webhook server
that is validated in a Kubernetes e2e test. The webhook handles the
AdmissionReview request sent by the API servers, and sends back its decision
as an AdmissionReview object in the same version it received.
See the webhook request section for details on the data sent to webhooks.
See the webhook response section for the data expected from webhooks.
The example admission webhook server leaves the ClientAuth field
empty,
which defaults to NoClientCert. This means that the webhook server does not
authenticate the identity of the clients, supposedly API servers. If you need
mutual TLS or other ways to authenticate the clients, see
how to authenticate API servers.
Deploy the admission webhook service
The webhook server in the e2e test is deployed in the Kubernetes cluster, via
the deployment API.
The test also creates a service
as the front-end of the webhook server. See
code.
You may also deploy your webhooks outside of the cluster. You will need to update
your webhook configurations accordingly.
The following is an example ValidatingWebhookConfiguration, a mutating webhook configuration is similar.
See the webhook configuration section for details about each config field.
You must replace the <CA_BUNDLE> in the above example by a valid CA bundle
which is a PEM-encoded (field value is Base64 encoded) CA bundle for validating the webhook's server certificate.
The scope field specifies if only cluster-scoped resources ("Cluster") or namespace-scoped
resources ("Namespaced") will match this rule. "∗" means that there are no scope restrictions.
Note:
When using clientConfig.service, the server cert must be valid for
<svc_name>.<svc_namespace>.svc.
Note:
Default timeout for a webhook call is 10 seconds,
You can set the timeout and it is encouraged to use a short timeout for webhooks.
If the webhook call times out, the request is handled according to the webhook's
failure policy.
When an API server receives a request that matches one of the rules, the
API server sends an admissionReview request to webhook as specified in the
clientConfig.
After you create the webhook configuration, the system will take a few seconds
to honor the new configuration.
Authenticate API servers
If your admission webhooks require authentication, you can configure the
API servers to use basic auth, bearer token, or a cert to authenticate itself to
the webhooks. There are three steps to complete the configuration.
When starting the API server, specify the location of the admission control
configuration file via the --admission-control-config-file flag.
In the admission control configuration file, specify where the
MutatingAdmissionWebhook controller and ValidatingAdmissionWebhook controller
should read the credentials. The credentials are stored in kubeConfig files
(yes, the same schema that's used by kubectl), so the field name is
kubeConfigFile. Here is an example admission control configuration file:
# Deprecated in v1.17 in favor of apiserver.config.k8s.io/v1apiVersion:apiserver.k8s.io/v1alpha1kind:AdmissionConfigurationplugins:- name:ValidatingAdmissionWebhookconfiguration:# Deprecated in v1.17 in favor of apiserver.config.k8s.io/v1, kind=WebhookAdmissionConfigurationapiVersion:apiserver.config.k8s.io/v1alpha1kind:WebhookAdmissionkubeConfigFile:"<path-to-kubeconfig-file>"- name:MutatingAdmissionWebhookconfiguration:# Deprecated in v1.17 in favor of apiserver.config.k8s.io/v1, kind=WebhookAdmissionConfigurationapiVersion:apiserver.config.k8s.io/v1alpha1kind:WebhookAdmissionkubeConfigFile:"<path-to-kubeconfig-file>"
apiVersion:v1kind:Configusers:# name should be set to the DNS name of the service or the host (including port) of the URL the webhook is configured to speak to.# If a non-443 port is used for services, it must be included in the name when configuring 1.16+ API servers.## For a webhook configured to speak to a service on the default port (443), specify the DNS name of the service:# - name: webhook1.ns1.svc# user: ...## For a webhook configured to speak to a service on non-default port (e.g. 8443), specify the DNS name and port of the service in 1.16+:# - name: webhook1.ns1.svc:8443# user: ...# and optionally create a second stanza using only the DNS name of the service for compatibility with 1.15 API servers:# - name: webhook1.ns1.svc# user: ...## For webhooks configured to speak to a URL, match the host (and port) specified in the webhook's URL. Examples:# A webhook with `url: https://www.example.com`:# - name: www.example.com# user: ...## A webhook with `url: https://www.example.com:443`:# - name: www.example.com:443# user: ...## A webhook with `url: https://www.example.com:8443`:# - name: www.example.com:8443# user: ...#- name:'webhook1.ns1.svc'user:client-certificate-data:"<pem encoded certificate>"client-key-data:"<pem encoded key>"# The `name` supports using * to wildcard-match prefixing segments.- name:'*.webhook-company.org'user:password:"<password>"username:"<name>"# '*' is the default match.- name:'*'user:token:"<token>"
Of course you need to set up the webhook server to handle these authentication requests.
Webhook request and response
Request
Webhooks are sent as POST requests, with Content-Type: application/json,
with an AdmissionReview API object in the admission.k8s.io API group
serialized to JSON as the body.
Webhooks can specify what versions of AdmissionReview objects they accept
with the admissionReviewVersions field in their configuration:
admissionReviewVersions is a required field when creating webhook configurations.
Webhooks are required to support at least one AdmissionReview
version understood by the current and previous API server.
API servers send the first AdmissionReview version in the admissionReviewVersions list they support.
If none of the versions in the list are supported by the API server, the configuration will not be allowed to be created.
If an API server encounters a webhook configuration that was previously created and does not support any of the AdmissionReview
versions the API server knows how to send, attempts to call to the webhook will fail and be subject to the failure policy.
This example shows the data contained in an AdmissionReview object
for a request to update the scale subresource of an apps/v1Deployment:
apiVersion:admission.k8s.io/v1kind:AdmissionReviewrequest:# Random uid uniquely identifying this admission calluid:705ab4f5-6393-11e8-b7cc-42010a800002# Fully-qualified group/version/kind of the incoming objectkind:group:autoscalingversion:v1kind:Scale# Fully-qualified group/version/kind of the resource being modifiedresource:group:appsversion:v1resource:deployments# subresource, if the request is to a subresourcesubResource:scale# Fully-qualified group/version/kind of the incoming object in the original request to the API server.# This only differs from `kind` if the webhook specified `matchPolicy: Equivalent` and the# original request to the API server was converted to a version the webhook registered for.requestKind:group:autoscalingversion:v1kind:Scale# Fully-qualified group/version/kind of the resource being modified in the original request to the API server.# This only differs from `resource` if the webhook specified `matchPolicy: Equivalent` and the# original request to the API server was converted to a version the webhook registered for.requestResource:group:appsversion:v1resource:deployments# subresource, if the request is to a subresource# This only differs from `subResource` if the webhook specified `matchPolicy: Equivalent` and the# original request to the API server was converted to a version the webhook registered for.requestSubResource:scale# Name of the resource being modifiedname:my-deployment# Namespace of the resource being modified, if the resource is namespaced (or is a Namespace object)namespace:my-namespace# operation can be CREATE, UPDATE, DELETE, or CONNECToperation:UPDATEuserInfo:# Username of the authenticated user making the request to the API serverusername:admin# UID of the authenticated user making the request to the API serveruid:014fbff9a07c# Group memberships of the authenticated user making the request to the API servergroups:- system:authenticated- my-admin-group# Arbitrary extra info associated with the user making the request to the API server.# This is populated by the API server authentication layer and should be included# if any SubjectAccessReview checks are performed by the webhook.extra:some-key:- some-value1- some-value2# object is the new object being admitted.# It is null for DELETE operations.object:apiVersion:autoscaling/v1kind:Scale# oldObject is the existing object.# It is null for CREATE and CONNECT operations.oldObject:apiVersion:autoscaling/v1kind:Scale# options contains the options for the operation being admitted, like meta.k8s.io/v1 CreateOptions, UpdateOptions, or DeleteOptions.# It is null for CONNECT operations.options:apiVersion:meta.k8s.io/v1kind:UpdateOptions# dryRun indicates the API request is running in dry run mode and will not be persisted.# Webhooks with side effects should avoid actuating those side effects when dryRun is true.# See http://k8s.io/docs/reference/using-api/api-concepts/#make-a-dry-run-request for more details.dryRun:False
Response
Webhooks respond with a 200 HTTP status code, Content-Type: application/json,
and a body containing an AdmissionReview object (in the same version they were sent),
with the response stanza populated, serialized to JSON.
At a minimum, the response stanza must contain the following fields:
uid, copied from the request.uid sent to the webhook
allowed, either set to true or false
Example of a minimal response from a webhook to allow a request:
When rejecting a request, the webhook can customize the http code and message returned to the user
using the status field. The specified status object is returned to the user.
See the API documentation
for details about the status type.
Example of a response to forbid a request, customizing the HTTP status code and message presented to the user:
{
"apiVersion": "admission.k8s.io/v1",
"kind": "AdmissionReview",
"response": {
"uid": "<value from request.uid>",
"allowed": false,
"status": {
"code": 403,
"message": "You cannot do this because it is Tuesday and your name starts with A" }
}
}
When allowing a request, a mutating admission webhook may optionally modify the incoming object as well.
This is done using the patch and patchType fields in the response.
The only currently supported patchType is JSONPatch.
See JSON patch documentation for more details.
For patchType: JSONPatch, the patch field contains a base64-encoded array of JSON patch operations.
As an example, a single patch operation that would set spec.replicas would be
[{"op": "add", "path": "/spec/replicas", "value": 3}]
Base64-encoded, this would be W3sib3AiOiAiYWRkIiwgInBhdGgiOiAiL3NwZWMvcmVwbGljYXMiLCAidmFsdWUiOiAzfV0=
Admission webhooks can optionally return warning messages that are returned to the requesting client
in HTTP Warning headers with a warning code of 299. Warnings can be sent with allowed or rejected admission responses.
If you're implementing a webhook that returns a warning:
Don't include a "Warning:" prefix in the message
Use warning messages to describe problems the client making the API request should correct or be aware of
Limit warnings to 120 characters if possible
Caution:
Individual warning messages over 256 characters may be truncated by the API server before being returned to clients.
If more than 4096 characters of warning messages are added (from all sources), additional warning messages are ignored.
{
"apiVersion": "admission.k8s.io/v1",
"kind": "AdmissionReview",
"response": {
"uid": "<value from request.uid>",
"allowed": true,
"warnings": [
"duplicate envvar entries specified with name MY_ENV",
"memory request less than 4MB specified for container mycontainer, which will not start successfully" ]
}
}
Webhook configuration
To register admission webhooks, create MutatingWebhookConfiguration or ValidatingWebhookConfiguration API objects.
The name of a MutatingWebhookConfiguration or a ValidatingWebhookConfiguration object must be a valid
DNS subdomain name.
Each configuration can contain one or more webhooks.
If multiple webhooks are specified in a single configuration, each must be given a unique name.
This is required in order to make resulting audit logs and metrics easier to match up to active
configurations.
Each webhook defines the following things.
Matching requests: rules
Each webhook must specify a list of rules used to determine if a request to the API server should be sent to the webhook.
Each rule specifies one or more operations, apiGroups, apiVersions, and resources, and a resource scope:
operations lists one or more operations to match. Can be "CREATE", "UPDATE", "DELETE", "CONNECT",
or "*" to match all.
apiGroups lists one or more API groups to match. "" is the core API group. "*" matches all API groups.
apiVersions lists one or more API versions to match. "*" matches all API versions.
resources lists one or more resources to match.
"*" matches all resources, but not subresources.
"*/*" matches all resources and subresources.
"pods/*" matches all subresources of pods.
"*/status" matches all status subresources.
scope specifies a scope to match. Valid values are "Cluster", "Namespaced", and "*".
Subresources match the scope of their parent resource. Default is "*".
"Cluster" means that only cluster-scoped resources will match this rule (Namespace API objects are cluster-scoped).
"Namespaced" means that only namespaced resources will match this rule.
"*" means that there are no scope restrictions.
If an incoming request matches one of the specified operations, groups, versions,
resources, and scope for any of a webhook's rules, the request is sent to the webhook.
Here are other examples of rules that could be used to specify which resources should be intercepted.
Match CREATE or UPDATE requests to apps/v1 and apps/v1beta1deployments and replicasets:
Webhooks may optionally limit which requests are intercepted based on the labels of the
objects they would be sent, by specifying an objectSelector. If specified, the objectSelector
is evaluated against both the object and oldObject that would be sent to the webhook,
and is considered to match if either object matches the selector.
A null object (oldObject in the case of create, or newObject in the case of delete),
or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object)
is not considered to match.
Use the object selector only if the webhook is opt-in, because end users may skip
the admission webhook by setting the labels.
This example shows a mutating webhook that would match a CREATE of any resource (but not subresources) with the label foo: bar:
See labels concept
for more examples of label selectors.
Matching requests: namespaceSelector
Webhooks may optionally limit which requests for namespaced resources are intercepted,
based on the labels of the containing namespace, by specifying a namespaceSelector.
The namespaceSelector decides whether to run the webhook on a request for a namespaced resource
(or a Namespace object), based on whether the namespace's labels match the selector.
If the object itself is a namespace, the matching is performed on object.metadata.labels.
If the object is a cluster scoped resource other than a Namespace, namespaceSelector has no effect.
This example shows a mutating webhook that matches a CREATE of any namespaced resource inside a namespace
that does not have a "runlevel" label of "0" or "1":
This example shows a validating webhook that matches a CREATE of any namespaced resource inside
a namespace that is associated with the "environment" of "prod" or "staging":
See labels concept
for more examples of label selectors.
Matching requests: matchPolicy
API servers can make objects available via multiple API groups or versions.
For example, if a webhook only specified a rule for some API groups/versions
(like apiGroups:["apps"], apiVersions:["v1","v1beta1"]),
and a request was made to modify the resource via another API group/version (like extensions/v1beta1),
the request would not be sent to the webhook.
The matchPolicy lets a webhook define how its rules are used to match incoming requests.
Allowed values are Exact or Equivalent.
Exact means a request should be intercepted only if it exactly matches a specified rule.
Equivalent means a request should be intercepted if it modifies a resource listed in rules,
even via another API group or version.
In the example given above, the webhook that only registered for apps/v1 could use matchPolicy:
matchPolicy: Exact would mean the extensions/v1beta1 request would not be sent to the webhook
matchPolicy: Equivalent means the extensions/v1beta1 request would be sent to the webhook
(with the objects converted to a version the webhook had specified: apps/v1)
Specifying Equivalent is recommended, and ensures that webhooks continue to intercept the
resources they expect when upgrades enable new versions of the resource in the API server.
When a resource stops being served by the API server, it is no longer considered equivalent to
other versions of that resource that are still served.
For example, extensions/v1beta1 deployments were first deprecated and then removed (in Kubernetes v1.16).
Since that removal, a webhook with a apiGroups:["extensions"], apiVersions:["v1beta1"], resources:["deployments"] rule
does not intercept deployments created via apps/v1 APIs. For that reason, webhooks should prefer registering
for stable versions of resources.
This example shows a validating webhook that intercepts modifications to deployments (no matter the API group or version),
and is always sent an apps/v1Deployment object:
The matchPolicy for an admission webhooks defaults to Equivalent.
Matching requests: matchConditions
FEATURE STATE:Kubernetes v1.30 [stable]
You can define match conditions for webhooks if you need fine-grained request filtering. These
conditions are useful if you find that match rules, objectSelectors and namespaceSelectors still
doesn't provide the filtering you want over when to call out over HTTP. Match conditions are
CEL expressions. All match conditions must evaluate to true for the
webhook to be called.
Here is an example illustrating a few different uses for match conditions:
apiVersion:admissionregistration.k8s.io/v1kind:ValidatingWebhookConfigurationwebhooks:- name:my-webhook.example.commatchPolicy:Equivalentrules:- operations:['CREATE','UPDATE']apiGroups:['*']apiVersions:['*']resources:['*']failurePolicy:'Ignore'# Fail-open (optional)sideEffects:NoneclientConfig:service:namespace:my-namespacename:my-webhookcaBundle:'<omitted>'# You can have up to 64 matchConditions per webhookmatchConditions:- name:'exclude-leases'# Each match condition must have a unique nameexpression:'!(request.resource.group == "coordination.k8s.io" && request.resource.resource == "leases")'# Match non-lease resources.- name:'exclude-kubelet-requests'expression:'!("system:nodes" in request.userInfo.groups)'# Match requests made by non-node users.- name:'rbac'# Skip RBAC requests, which are handled by the second webhook.expression:'request.resource.group != "rbac.authorization.k8s.io"'# This example illustrates the use of the 'authorizer'. The authorization check is more expensive# than a simple expression, so in this example it is scoped to only RBAC requests by using a second# webhook. Both webhooks can be served by the same endpoint.- name:rbac.my-webhook.example.commatchPolicy:Equivalentrules:- operations:['CREATE','UPDATE']apiGroups:['rbac.authorization.k8s.io']apiVersions:['*']resources:['*']failurePolicy:'Fail'# Fail-closed (the default)sideEffects:NoneclientConfig:service:namespace:my-namespacename:my-webhookcaBundle:'<omitted>'# You can have up to 64 matchConditions per webhookmatchConditions:- name:'breakglass'# Skip requests made by users authorized to 'breakglass' on this webhook.# The 'breakglass' API verb does not need to exist outside this check.expression:'!authorizer.group("admissionregistration.k8s.io").resource("validatingwebhookconfigurations").name("my-webhook.example.com").check("breakglass").allowed()'
Note:
You can define up to 64 elements in the matchConditions field per webhook.
Match conditions have access to the following CEL variables:
object - The object from the incoming request. The value is null for DELETE requests. The object
version may be converted based on the matchPolicy.
oldObject - The existing object. The value is null for CREATE requests.
request - The request portion of the AdmissionReview, excluding object and oldObject.
authorizer - A CEL Authorizer. May be used to perform authorization checks for the principal
(authenticated user) of the request. See
Authz in the Kubernetes CEL library
documentation for more details.
authorizer.requestResource - A shortcut for an authorization check configured with the request
resource (group, resource, (subresource), namespace, name).
Once the API server has determined a request should be sent to a webhook,
it needs to know how to contact the webhook. This is specified in the clientConfig
stanza of the webhook configuration.
Webhooks can either be called via a URL or a service reference,
and can optionally include a custom CA bundle to use to verify the TLS connection.
URL
url gives the location of the webhook, in standard URL form
(scheme://host:port/path).
The host should not refer to a service running in the cluster; use
a service reference by specifying the service field instead.
The host might be resolved via external DNS in some API servers
(e.g., kube-apiserver cannot resolve in-cluster DNS as that would
be a layering violation). host may also be an IP address.
Please note that using localhost or 127.0.0.1 as a host is
risky unless you take great care to run this webhook on all hosts
which run an API server which might need to make calls to this
webhook. Such installations are likely to be non-portable or not readily
run in a new cluster.
The scheme must be "https"; the URL must begin with "https://".
Attempting to use a user or basic auth (for example user:password@) is not allowed.
Fragments (#...) and query parameters (?...) are also not allowed.
Here is an example of a mutating webhook configured to call a URL
(and expects the TLS certificate to be verified using system trust roots, so does not specify a caBundle):
The service stanza inside clientConfig is a reference to the service for this webhook.
If the webhook is running within the cluster, then you should use service instead of url.
The service namespace and name are required. The port is optional and defaults to 443.
The path is optional and defaults to "/".
Here is an example of a mutating webhook configured to call a service on port "1234"
at the subpath "/my-path", and to verify the TLS connection against the ServerName
my-service-name.my-service-namespace.svc using a custom CA bundle:
You must replace the <CA_BUNDLE> in the above example by a valid CA bundle
which is a PEM-encoded CA bundle for validating the webhook's server certificate.
Side effects
Webhooks typically operate only on the content of the AdmissionReview sent to them.
Some webhooks, however, make out-of-band changes as part of processing admission requests.
Webhooks that make out-of-band changes ("side effects") must also have a reconciliation mechanism
(like a controller) that periodically determines the actual state of the world, and adjusts
the out-of-band data modified by the admission webhook to reflect reality.
This is because a call to an admission webhook does not guarantee the admitted object will be persisted as is, or at all.
Later webhooks can modify the content of the object, a conflict could be encountered while writing to storage,
or the server could power off before persisting the object.
Additionally, webhooks with side effects must skip those side-effects when dryRun: true admission requests are handled.
A webhook must explicitly indicate that it will not have side-effects when run with dryRun,
or the dry-run request will not be sent to the webhook and the API request will fail instead.
Webhooks indicate whether they have side effects using the sideEffects field in the webhook configuration:
None: calling the webhook will have no side effects.
NoneOnDryRun: calling the webhook will possibly have side effects, but if a request with
dryRun: true is sent to the webhook, the webhook will suppress the side effects (the webhook
is dryRun-aware).
Here is an example of a validating webhook indicating it has no side effects on dryRun: true requests:
Because webhooks add to API request latency, they should evaluate as quickly as possible.
timeoutSeconds allows configuring how long the API server should wait for a webhook to respond
before treating the call as a failure.
If the timeout expires before the webhook responds, the webhook call will be ignored or
the API call will be rejected based on the failure policy.
The timeout value must be between 1 and 30 seconds.
Here is an example of a validating webhook with a custom timeout of 2 seconds:
The timeout for an admission webhook defaults to 10 seconds.
Reinvocation policy
A single ordering of mutating admissions plugins (including webhooks) does not work for all cases
(see https://issue.k8s.io/64333 as an example). A mutating webhook can add a new sub-structure
to the object (like adding a container to a pod), and other mutating plugins which have already
run may have opinions on those new structures (like setting an imagePullPolicy on all containers).
To allow mutating admission plugins to observe changes made by other plugins,
built-in mutating admission plugins are re-run if a mutating webhook modifies an object,
and mutating webhooks can specify a reinvocationPolicy to control whether they are reinvoked as well.
reinvocationPolicy may be set to Never or IfNeeded. It defaults to Never.
Never: the webhook must not be called more than once in a single admission evaluation.
IfNeeded: the webhook may be called again as part of the admission evaluation if the object
being admitted is modified by other admission plugins after the initial webhook call.
The important elements to note are:
The number of additional invocations is not guaranteed to be exactly one.
If additional invocations result in further modifications to the object, webhooks are not
guaranteed to be invoked again.
Webhooks that use this option may be reordered to minimize the number of additional invocations.
To validate an object after all mutations are guaranteed complete, use a validating admission
webhook instead (recommended for webhooks with side-effects).
Here is an example of a mutating webhook opting into being re-invoked if later admission plugins
modify the object:
Mutating webhooks must be idempotent, able to successfully process an object they have already admitted
and potentially modified. This is true for all mutating admission webhooks, since any change they can make
in an object could already exist in the user-provided object, but it is essential for webhooks that opt into reinvocation.
Failure policy
failurePolicy defines how unrecognized errors and timeout errors from the admission webhook
are handled. Allowed values are Ignore or Fail.
Ignore means that an error calling the webhook is ignored and the API request is allowed to continue.
Fail means that an error calling the webhook causes the admission to fail and the API request to be rejected.
Here is a mutating webhook configured to reject an API request if errors are encountered calling the admission webhook:
The default failurePolicy for an admission webhooks is Fail.
Monitoring admission webhooks
The API server provides ways to monitor admission webhook behaviors. These
monitoring mechanisms help cluster admins to answer questions like:
Which mutating webhook mutated the object in a API request?
What change did the mutating webhook applied to the object?
Which webhooks are frequently rejecting API requests? What's the reason for a rejection?
Mutating webhook auditing annotations
Sometimes it's useful to know which mutating webhook mutated the object in a API request, and what change did the
webhook apply.
The Kubernetes API server performs auditing on each
mutating webhook invocation. Each invocation generates an auditing annotation
capturing if a request object is mutated by the invocation, and optionally generates an annotation
capturing the applied patch from the webhook admission response. The annotations are set in the
audit event for given request on given stage of its execution, which is then pre-processed
according to a certain policy and written to a backend.
The audit level of a event determines which annotations get recorded:
At Metadata audit level or higher, an annotation with key
mutation.webhook.admission.k8s.io/round_{round idx}_index_{order idx} gets logged with JSON
payload indicating a webhook gets invoked for given request and whether it mutated the object or not.
For example, the following annotation gets recorded for a webhook being reinvoked. The webhook is
ordered the third in the mutating webhook chain, and didn't mutated the request object during the
invocation.
# the audit event recorded{"kind": "Event","apiVersion": "audit.k8s.io/v1","annotations": {"mutation.webhook.admission.k8s.io/round_1_index_2": "{\"configuration\":\"my-mutating-webhook-configuration.example.com\",\"webhook\":\"my-webhook.example.com\",\"mutated\": false}"# other annotations...}# other fields...}
# the annotation value deserialized{"configuration": "my-mutating-webhook-configuration.example.com","webhook": "my-webhook.example.com","mutated": false}
The following annotation gets recorded for a webhook being invoked in the first round. The webhook
is ordered the first in the mutating webhook chain, and mutated the request object during the
invocation.
# the audit event recorded{"kind": "Event","apiVersion": "audit.k8s.io/v1","annotations": {"mutation.webhook.admission.k8s.io/round_0_index_0": "{\"configuration\":\"my-mutating-webhook-configuration.example.com\",\"webhook\":\"my-webhook-always-mutate.example.com\",\"mutated\": true}"# other annotations...}# other fields...}
# the annotation value deserialized{"configuration": "my-mutating-webhook-configuration.example.com","webhook": "my-webhook-always-mutate.example.com","mutated": true}
At Request audit level or higher, an annotation with key
patch.webhook.admission.k8s.io/round_{round idx}_index_{order idx} gets logged with JSON payload indicating
a webhook gets invoked for given request and what patch gets applied to the request object.
For example, the following annotation gets recorded for a webhook being reinvoked. The webhook is ordered the fourth in the
mutating webhook chain, and responded with a JSON patch which got applied to the request object.
# the audit event recorded{"kind": "Event","apiVersion": "audit.k8s.io/v1","annotations": {"patch.webhook.admission.k8s.io/round_1_index_3": "{\"configuration\":\"my-other-mutating-webhook-configuration.example.com\",\"webhook\":\"my-webhook-always-mutate.example.com\",\"patch\":[{\"op\":\"add\",\"path\":\"/data/mutation-stage\",\"value\":\"yes\"}],\"patchType\":\"JSONPatch\"}"# other annotations...}# other fields...}
# the annotation value deserialized{"configuration": "my-other-mutating-webhook-configuration.example.com","webhook": "my-webhook-always-mutate.example.com","patchType": "JSONPatch","patch": [{"op": "add","path": "/data/mutation-stage","value": "yes"}]}
Admission webhook metrics
The API server exposes Prometheus metrics from the /metrics endpoint, which can be used for monitoring and
diagnosing API server status. The following metrics record status related to admission webhooks.
API server admission webhook rejection count
Sometimes it's useful to know which admission webhooks are frequently rejecting API requests, and the
reason for a rejection.
The API server exposes a Prometheus counter metric recording admission webhook rejections. The
metrics are labelled to identify the causes of webhook rejection(s):
name: the name of the webhook that rejected a request.
operation: the operation type of the request, can be one of CREATE,
UPDATE, DELETE and CONNECT.
type: the admission webhook type, can be one of admit and validating.
error_type: identifies if an error occurred during the webhook invocation
that caused the rejection. Its value can be one of:
calling_webhook_error: unrecognized errors or timeout errors from the admission webhook happened and the
webhook's Failure policy is set to Fail.
no_error: no error occurred. The webhook rejected the request with allowed: false in the admission
response. The metrics label rejection_code records the .status.code set in the admission response.
apiserver_internal_error: an API server internal error happened.
rejection_code: the HTTP status code set in the admission response when a
webhook rejected a request.
Example of the rejection count metrics:
# HELP apiserver_admission_webhook_rejection_count [ALPHA] Admission webhook rejection count, identified by name and broken out for each admission type (validating or admit) and operation. Additional labels specify an error type (calling_webhook_error or apiserver_internal_error if an error occurred; no_error otherwise) and optionally a non-zero rejection code if the webhook rejects the request with an HTTP status code (honored by the apiserver when the code is greater or equal to 400). Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded.
# TYPE apiserver_admission_webhook_rejection_count counter
apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",name="always-timeout-webhook.example.com",operation="CREATE",rejection_code="0",type="validating"} 1
apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",name="invalid-admission-response-webhook.example.com",operation="CREATE",rejection_code="0",type="validating"} 1
apiserver_admission_webhook_rejection_count{error_type="no_error",name="deny-unwanted-configmap-data.example.com",operation="CREATE",rejection_code="400",type="validating"} 13
Best practices and warnings
Idempotence
An idempotent mutating admission webhook is able to successfully process an object it has already admitted
and potentially modified. The admission can be applied multiple times without changing the result beyond
the initial application.
Example of idempotent mutating admission webhooks:
For a CREATE pod request, set the field .spec.securityContext.runAsNonRoot of the
pod to true, to enforce security best practices.
For a CREATE pod request, if the field .spec.containers[].resources.limits
of a container is not set, set default resource limits.
For a CREATE pod request, inject a sidecar container with name foo-sidecar if no container
with the name foo-sidecar already exists.
In the cases above, the webhook can be safely reinvoked, or admit an object that already has the fields set.
Example of non-idempotent mutating admission webhooks:
For a CREATE pod request, inject a sidecar container with name foo-sidecar
suffixed with the current timestamp (e.g. foo-sidecar-19700101-000000).
For a CREATE/UPDATE pod request, reject if the pod has label "env" set,
otherwise add an "env": "prod" label to the pod.
For a CREATE pod request, blindly append a sidecar container named
foo-sidecar without looking to see if there is already a foo-sidecar
container in the pod.
In the first case above, reinvoking the webhook can result in the same sidecar being injected multiple times to a pod, each time
with a different container name. Similarly the webhook can inject duplicated containers if the sidecar already exists in
a user-provided pod.
In the second case above, reinvoking the webhook will result in the webhook failing on its own output.
In the third case above, reinvoking the webhook will result in duplicated containers in the pod spec, which makes
the request invalid and rejected by the API server.
Intercepting all versions of an object
It is recommended that admission webhooks should always intercept all versions of an object by setting .webhooks[].matchPolicy
to Equivalent. It is also recommended that admission webhooks should prefer registering for stable versions of resources.
Failure to intercept all versions of an object can result in admission policies not being enforced for requests in certain
versions. See Matching requests: matchPolicy for examples.
Availability
It is recommended that admission webhooks should evaluate as quickly as possible (typically in
milliseconds), since they add to API request latency.
It is encouraged to use a small timeout for webhooks. See Timeouts for more detail.
It is recommended that admission webhooks should leverage some format of load-balancing, to
provide high availability and performance benefits. If a webhook is running within the cluster,
you can run multiple webhook backends behind a service to leverage the load-balancing that service
supports.
Guaranteeing the final state of the object is seen
Admission webhooks that need to guarantee they see the final state of the object in order to enforce policy
should use a validating admission webhook, since objects can be modified after being seen by mutating webhooks.
For example, a mutating admission webhook is configured to inject a sidecar container with name
"foo-sidecar" on every CREATE pod request. If the sidecar must be present, a validating
admisson webhook should also be configured to intercept CREATE pod requests, and validate that a
container with name "foo-sidecar" with the expected configuration exists in the to-be-created
object.
Avoiding deadlocks in self-hosted webhooks
A webhook running inside the cluster might cause deadlocks for its own deployment if it is configured
to intercept resources required to start its own pods.
For example, a mutating admission webhook is configured to admit CREATE pod requests only if a certain label is set in the
pod (e.g. "env": "prod"). The webhook server runs in a deployment which doesn't set the "env" label.
When a node that runs the webhook server pods
becomes unhealthy, the webhook deployment will try to reschedule the pods to another node. However the requests will
get rejected by the existing webhook server since the "env" label is unset, and the migration cannot happen.
It is recommended to exclude the namespace where your webhook is running with a
namespaceSelector.
Side effects
It is recommended that admission webhooks should avoid side effects if possible, which means the webhooks operate only on the
content of the AdmissionReview sent to them, and do not make out-of-band changes. The .webhooks[].sideEffects field should
be set to None if a webhook doesn't have any side effect.
If side effects are required during the admission evaluation, they must be suppressed when processing an
AdmissionReview object with dryRun set to true, and the .webhooks[].sideEffects field should be
set to NoneOnDryRun. See Side effects for more detail.
Avoiding operating on the kube-system namespace
The kube-system namespace contains objects created by the Kubernetes system,
e.g. service accounts for the control plane components, pods like kube-dns.
Accidentally mutating or rejecting requests in the kube-system namespace may
cause the control plane components to stop functioning or introduce unknown behavior.
If your admission webhooks don't intend to modify the behavior of the Kubernetes control
plane, exclude the kube-system namespace from being intercepted using a
namespaceSelector.
3.10 - Managing Service Accounts
A ServiceAccount provides an identity for processes that run in a Pod.
A process inside a Pod can use the identity of its associated service account to
authenticate to the cluster's API server.
This task guide explains some of the concepts behind ServiceAccounts. The
guide also explains how to obtain or revoke tokens that represent
ServiceAccounts.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must
be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a
cluster, you can create one by using
minikube
or you can use one of these Kubernetes playgrounds:
To be able to follow these steps exactly, ensure you have a namespace named
examplens.
If you don't, create one by running:
kubectl create namespace examplens
User accounts versus service accounts
Kubernetes distinguishes between the concept of a user account and a service account
for a number of reasons:
User accounts are for humans. Service accounts are for application processes,
which (for Kubernetes) run in containers that are part of pods.
User accounts are intended to be global: names must be unique across all
namespaces of a cluster. No matter what namespace you look at, a particular
username that represents a user represents the same user.
In Kubernetes, service accounts are namespaced: two different namespaces can
contain ServiceAccounts that have identical names.
Typically, a cluster's user accounts might be synchronised from a corporate
database, where new user account creation requires special privileges and is
tied to complex business processes. By contrast, service account creation is
intended to be more lightweight, allowing cluster users to create service accounts
for specific tasks on demand. Separating ServiceAccount creation from the steps to
onboard human users makes it easier for workloads to follow the principle of
least privilege.
Auditing considerations for humans and service accounts may differ; the separation
makes that easier to achieve.
A configuration bundle for a complex system may include definition of various service
accounts for components of that system. Because service accounts can be created
without many constraints and have namespaced names, such configuration is
usually portable.
Bound service account tokens
ServiceAccount tokens can be bound to API objects that exist in the kube-apiserver.
This can be used to tie the validity of a token to the existence of another API object.
Supported object types are as follows:
Pod (used for projected volume mounts, see below)
Secret (can be used to allow revoking a token by deleting the Secret)
Node (in v1.30, creating new node-bound tokens is alpha, using existing node-bound tokens is beta)
When a token is bound to an object, the object's metadata.name and metadata.uid are
stored as extra 'private claims' in the issued JWT.
When a bound token is presented to the kube-apiserver, the service account authenticator
will extract and verify these claims.
If the referenced object or the ServiceAccount is pending deletion (for example, due to finalizers),
then for any instant that is 60 seconds (or more) after the .metadata.deletionTimestamp date,
authentication with that token would fail.
If the referenced object no longer exists (or its metadata.uid does not match),
the request will not be authenticated.
Additional metadata in Pod bound tokens
FEATURE STATE:Kubernetes v1.30 [beta]
When a service account token is bound to a Pod object, additional metadata is also
embedded into the token that indicates the value of the bound pod's spec.nodeName field,
and the uid of that Node, if available.
This node information is not verified by the kube-apiserver when the token is used for authentication.
It is included so integrators do not have to fetch Pod or Node API objects to check the associated Node name
and uid when inspecting a JWT.
Verifying and inspecting private claims
The TokenReview API can be used to verify and extract private claims from a token:
First, assume you have a pod named test-pod and a service account named my-sa.
Despite using kubectl create -f to create this resource, and defining it similar to
other resource types in Kubernetes, TokenReview is a special type and the kube-apiserver
does not actually persist the TokenReview object into etcd.
Hence kubectl get tokenreview is not a valid command.
Here's an example of how that looks for a launched Pod:
...- name:kube-api-access-<random-suffix>projected:sources:- serviceAccountToken:path:token# must match the path the app expects- configMap:items:- key:ca.crtpath:ca.crtname:kube-root-ca.crt- downwardAPI:items:- fieldRef:apiVersion:v1fieldPath:metadata.namespacepath:namespace
That manifest snippet defines a projected volume that consists of three sources. In this case,
each source also represents a single path within that volume. The three sources are:
A serviceAccountToken source, that contains a token that the kubelet acquires from kube-apiserver.
The kubelet fetches time-bound tokens using the TokenRequest API. A token served for a TokenRequest expires
either when the pod is deleted or after a defined lifespan (by default, that is 1 hour).
The kubelet also refreshes that token before the token expires.
The token is bound to the specific Pod and has the kube-apiserver as its audience.
This mechanism superseded an earlier mechanism that added a volume based on a Secret,
where the Secret represented the ServiceAccount for the Pod, but did not expire.
A configMap source. The ConfigMap contains a bundle of certificate authority data. Pods can use these
certificates to make sure that they are connecting to your cluster's kube-apiserver (and not to middlebox
or an accidentally misconfigured peer).
A downwardAPI source that looks up the name of the namespace containing the Pod, and makes
that name information available to application code running inside the Pod.
Any container within the Pod that mounts this particular volume can access the above information.
Note:
There is no specific mechanism to invalidate a token issued via TokenRequest. If you no longer
trust a bound service account token for a Pod, you can delete that Pod. Deleting a Pod expires
its bound service account tokens.
Manual Secret management for ServiceAccounts
Versions of Kubernetes before v1.22 automatically created credentials for accessing
the Kubernetes API. This older mechanism was based on creating token Secrets that
could then be mounted into running Pods.
In more recent versions, including Kubernetes v1.31, API credentials
are obtained directly using the
TokenRequest API,
and are mounted into Pods using a projected volume.
The tokens obtained using this method have bounded lifetimes, and are automatically
invalidated when the Pod they are mounted into is deleted.
You can still manually create a Secret to hold a service account token; for example, if you need a token that never expires.
Once you manually create a Secret and link it to a ServiceAccount, the Kubernetes control plane automatically populates the token into that Secret.
Note:
Although the manual mechanism for creating a long-lived ServiceAccount token exists,
using TokenRequest
to obtain short-lived API access tokens is recommended instead.
Auto-generated legacy ServiceAccount token clean up
Before version 1.24, Kubernetes automatically generated Secret-based tokens for
ServiceAccounts. To distinguish between automatically generated tokens and
manually created ones, Kubernetes checks for a reference from the
ServiceAccount's secrets field. If the Secret is referenced in the secrets
field, it is considered an auto-generated legacy token. Otherwise, it is
considered a manually created legacy token. For example:
apiVersion:v1kind:ServiceAccountmetadata:name:build-robotnamespace:defaultsecrets:- name:build-robot-secret# usually NOT present for a manually generated token
Beginning from version 1.29, legacy ServiceAccount tokens that were generated
automatically will be marked as invalid if they remain unused for a certain
period of time (set to default at one year). Tokens that continue to be unused
for this defined period (again, by default, one year) will subsequently be
purged by the control plane.
If users use an invalidated auto-generated token, the token validator will
add an audit annotation for the key-value pair
authentication.k8s.io/legacy-token-invalidated: <secret name>/<namespace>,
increment the invalid_legacy_auto_token_uses_total metric count,
update the Secret label kubernetes.io/legacy-token-last-used with the new
date,
return an error indicating that the token has been invalidated.
When receiving this validation error, users can update the Secret to remove the
kubernetes.io/legacy-token-invalid-since label to temporarily allow use of
this token.
Here's an example of an auto-generated legacy token that has been marked with the
kubernetes.io/legacy-token-last-used and kubernetes.io/legacy-token-invalid-since
labels:
A ServiceAccount controller manages the ServiceAccounts inside namespaces, and
ensures a ServiceAccount named "default" exists in every active namespace.
Token controller
The service account token controller runs as part of kube-controller-manager.
This controller acts asynchronously. It:
watches for ServiceAccount deletion and deletes all corresponding ServiceAccount
token Secrets.
watches for ServiceAccount token Secret addition, and ensures the referenced
ServiceAccount exists, and adds a token to the Secret if needed.
watches for Secret deletion and removes a reference from the corresponding
ServiceAccount if needed.
You must pass a service account private key file to the token controller in
the kube-controller-manager using the --service-account-private-key-file
flag. The private key is used to sign generated service account tokens.
Similarly, you must pass the corresponding public key to the kube-apiserver
using the --service-account-key-file flag. The public key will be used to
verify the tokens during authentication.
ServiceAccount admission controller
The modification of pods is implemented via a plugin
called an Admission Controller.
It is part of the API server.
This admission controller acts synchronously to modify pods as they are created.
When this plugin is active (and it is by default on most distributions), then
it does the following when a Pod is created:
If the pod does not have a .spec.serviceAccountName set, the admission controller sets the name of the
ServiceAccount for this incoming Pod to default.
The admission controller ensures that the ServiceAccount referenced by the incoming Pod exists. If there
is no ServiceAccount with a matching name, the admission controller rejects the incoming Pod. That check
applies even for the default ServiceAccount.
Provided that neither the ServiceAccount's automountServiceAccountToken field nor the
Pod's automountServiceAccountToken field is set to false:
the admission controller mutates the incoming Pod, adding an extra
volume that contains
a token for API access.
the admission controller adds a volumeMount to each container in the Pod,
skipping any containers that already have a volume mount defined for the path
/var/run/secrets/kubernetes.io/serviceaccount.
For Linux containers, that volume is mounted at /var/run/secrets/kubernetes.io/serviceaccount;
on Windows nodes, the mount is at the equivalent path.
If the spec of the incoming Pod doesn't already contain any imagePullSecrets, then the
admission controller adds imagePullSecrets, copying them from the ServiceAccount.
Legacy ServiceAccount token tracking controller
FEATURE STATE:Kubernetes v1.28 [stable]
This controller generates a ConfigMap called
kube-system/kube-apiserver-legacy-service-account-token-tracking in the
kube-system namespace. The ConfigMap records the timestamp when legacy service
account tokens began to be monitored by the system.
Legacy ServiceAccount token cleaner
FEATURE STATE:Kubernetes v1.30 [stable]
The legacy ServiceAccount token cleaner runs as part of the
kube-controller-manager and checks every 24 hours to see if any auto-generated
legacy ServiceAccount token has not been used in a specified amount of time.
If so, the cleaner marks those tokens as invalid.
The cleaner works by first checking the ConfigMap created by the control plane
(provided that LegacyServiceAccountTokenTracking is enabled). If the current
time is a specified amount of time after the date in the ConfigMap, the
cleaner then loops through the list of Secrets in the cluster and evaluates each
Secret that has the type kubernetes.io/service-account-token.
If a Secret meets all of the following conditions, the cleaner marks it as
invalid:
The Secret is auto-generated, meaning that it is bi-directionally referenced
by a ServiceAccount.
The Secret is not currently mounted by any pods.
The Secret has not been used in a specified amount of time since it was
created or since it was last used.
The cleaner marks a Secret invalid by adding a label called
kubernetes.io/legacy-token-invalid-since to the Secret, with the current date
as the value. If an invalid Secret is not used in a specified amount of time,
the cleaner will delete it.
Note:
All the specified amount of time above defaults to one year. The cluster
administrator can configure this value through the
--legacy-service-account-token-clean-up-period command line argument for the
kube-controller-manager component.
TokenRequest API
FEATURE STATE:Kubernetes v1.22 [stable]
You use the TokenRequest
subresource of a ServiceAccount to obtain a time-bound token for that ServiceAccount.
You don't need to call this to obtain an API token for use within a container, since
the kubelet sets this up for you using a projected volume.
The Kubernetes control plane (specifically, the ServiceAccount admission controller)
adds a projected volume to Pods, and the kubelet ensures that this volume contains a token
that lets containers authenticate as the right ServiceAccount.
(This mechanism superseded an earlier mechanism that added a volume based on a Secret,
where the Secret represented the ServiceAccount for the Pod but did not expire.)
Here's an example of how that looks for a launched Pod:
That manifest snippet defines a projected volume that combines information from three sources:
A serviceAccountToken source, that contains a token that the kubelet acquires from kube-apiserver.
The kubelet fetches time-bound tokens using the TokenRequest API. A token served for a TokenRequest expires
either when the pod is deleted or after a defined lifespan (by default, that is 1 hour).
The token is bound to the specific Pod and has the kube-apiserver as its audience.
A configMap source. The ConfigMap contains a bundle of certificate authority data. Pods can use these
certificates to make sure that they are connecting to your cluster's kube-apiserver (and not to middlebox
or an accidentally misconfigured peer).
A downwardAPI source. This downwardAPI volume makes the name of the namespace containing the Pod available
to application code running inside the Pod.
Any container within the Pod that mounts this volume can access the above information.
Create additional API tokens
Caution:
Only create long-lived API tokens if the token request mechanism
is not suitable. The token request mechanism provides time-limited tokens; because these
expire, they represent a lower risk to information security.
To create a non-expiring, persisted API token for a ServiceAccount, create a
Secret of type kubernetes.io/service-account-token with an annotation
referencing the ServiceAccount. The control plane then generates a long-lived token and
updates that Secret with that generated token data.
If you launch a new Pod into the examplens namespace, it can use the myserviceaccount
service-account-token Secret that you just created.
Caution:
Do not reference manually created Secrets in the secrets field of a
ServiceAccount. Or the manually created Secrets will be cleaned if it is not used for a long
time. Please refer to auto-generated legacy ServiceAccount token clean up.
Delete/invalidate a ServiceAccount token
If you know the name of the Secret that contains the token you want to remove:
kubectl delete secret name-of-secret
Otherwise, first find the Secret for the ServiceAccount.
# This assumes that you already have a namespace named 'examplens'kubectl -n examplens get serviceaccount/example-automated-thing -o yaml
3.11 - Certificates and Certificate Signing Requests
Kubernetes certificate and trust bundle APIs enable automation of
X.509 credential provisioning by providing
a programmatic interface for clients of the Kubernetes API to request and obtain
X.509 certificates from a Certificate Authority (CA).
There is also experimental (alpha) support for distributing trust bundles.
Certificate signing requests
FEATURE STATE:Kubernetes v1.19 [stable]
A CertificateSigningRequest
(CSR) resource is used to request that a certificate be signed
by a denoted signer, after which the request may be approved or denied before
finally being signed.
Request signing process
The CertificateSigningRequest resource type allows a client to ask for an X.509 certificate
be issued, based on a signing request.
The CertificateSigningRequest object includes a PEM-encoded PKCS#10 signing request in
the spec.request field. The CertificateSigningRequest denotes the signer (the
recipient that the request is being made to) using the spec.signerName field.
Note that spec.signerName is a required key after API version certificates.k8s.io/v1.
In Kubernetes v1.22 and later, clients may optionally set the spec.expirationSeconds
field to request a particular lifetime for the issued certificate. The minimum valid
value for this field is 600, i.e. ten minutes.
Once created, a CertificateSigningRequest must be approved before it can be signed.
Depending on the signer selected, a CertificateSigningRequest may be automatically approved
by a controller.
Otherwise, a CertificateSigningRequest must be manually approved either via the REST API (or client-go)
or by running kubectl certificate approve. Likewise, a CertificateSigningRequest may also be denied,
which tells the configured signer that it must not sign the request.
For certificates that have been approved, the next step is signing. The relevant signing controller
first validates that the signing conditions are met and then creates a certificate.
The signing controller then updates the CertificateSigningRequest, storing the new certificate into
the status.certificate field of the existing CertificateSigningRequest object. The
status.certificate field is either empty or contains a X.509 certificate, encoded in PEM format.
The CertificateSigningRequest status.certificate field is empty until the signer does this.
Once the status.certificate field has been populated, the request has been completed and clients can now
fetch the signed certificate PEM data from the CertificateSigningRequest resource.
The signers can instead deny certificate signing if the approval conditions are not met.
In order to reduce the number of old CertificateSigningRequest resources left in a cluster, a garbage collection
controller runs periodically. The garbage collection removes CertificateSigningRequests that have not changed
state for some duration:
Approved requests: automatically deleted after 1 hour
Denied requests: automatically deleted after 1 hour
Failed requests: automatically deleted after 1 hour
Pending requests: automatically deleted after 24 hours
All requests: automatically deleted after the issued certificate has expired
Certificate signing authorization
To allow creating a CertificateSigningRequest and retrieving any CertificateSigningRequest:
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:csr-approverrules:- apiGroups:- certificates.k8s.ioresources:- certificatesigningrequestsverbs:- get- list- watch- apiGroups:- certificates.k8s.ioresources:- certificatesigningrequests/approvalverbs:- update- apiGroups:- certificates.k8s.ioresources:- signersresourceNames:- example.com/my-signer-name# example.com/* can be used to authorize for all signers in the 'example.com' domainverbs:- approve
apiVersion:rbac.authorization.k8s.io/v1kind:ClusterRolemetadata:name:csr-signerrules:- apiGroups:- certificates.k8s.ioresources:- certificatesigningrequestsverbs:- get- list- watch- apiGroups:- certificates.k8s.ioresources:- certificatesigningrequests/statusverbs:- update- apiGroups:- certificates.k8s.ioresources:- signersresourceNames:- example.com/my-signer-name# example.com/* can be used to authorize for all signers in the 'example.com' domainverbs:- sign
Signers
Signers abstractly represent the entity or entities that might sign, or have
signed, a security certificate.
Any signer that is made available for outside a particular cluster should provide information
about how the signer works, so that consumers can understand what that means for CertifcateSigningRequests
and (if enabled) ClusterTrustBundles.
This includes:
Trust distribution: how trust anchors (CA certificates or certificate bundles) are distributed.
Permitted subjects: any restrictions on and behavior when a disallowed subject is requested.
Permitted x509 extensions: including IP subjectAltNames, DNS subjectAltNames,
Email subjectAltNames, URI subjectAltNames etc, and behavior when a disallowed extension is requested.
Permitted key usages / extended key usages: any restrictions on and behavior
when usages different than the signer-determined usages are specified in the CSR.
Expiration/certificate lifetime: whether it is fixed by the signer, configurable by the admin, determined by the CSR spec.expirationSeconds field, etc
and the behavior when the signer-determined expiration is different from the CSR spec.expirationSeconds field.
CA bit allowed/disallowed: and behavior if a CSR contains a request a for a CA certificate when the signer does not permit it.
Commonly, the status.certificate field of a CertificateSigningRequest contains a
single PEM-encoded X.509 certificate once the CSR is approved and the certificate is issued.
Some signers store multiple certificates into the status.certificate field. In
that case, the documentation for the signer should specify the meaning of
additional certificates; for example, this might be the certificate plus
intermediates to be presented during TLS handshakes.
If you want to make the trust anchor (root certificate) available, this should be done
separately from a CertificateSigningRequest and its status.certificate field. For example,
you could use a ClusterTrustBundle.
The PKCS#10 signing request format does not have a standard mechanism to specify a
certificate expiration or lifetime. The expiration or lifetime therefore has to be set
through the spec.expirationSeconds field of the CSR object. The built-in signers
use the ClusterSigningDuration configuration option, which defaults to 1 year,
(the --cluster-signing-duration command-line flag of the kube-controller-manager)
as the default when no spec.expirationSeconds is specified. When spec.expirationSeconds
is specified, the minimum of spec.expirationSeconds and ClusterSigningDuration is
used.
Note:
The spec.expirationSeconds field was added in Kubernetes v1.22. Earlier versions of Kubernetes do not honor this field.
Kubernetes API servers prior to v1.22 will silently drop this field when the object is created.
Kubernetes signers
Kubernetes provides built-in signers that each have a well-known signerName:
kubernetes.io/kube-apiserver-client: signs certificates that will be honored as client certificates by the API server.
Never auto-approved by kube-controller-manager.
Trust distribution: signed certificates must be honored as client certificates by the API server. The CA bundle is not distributed by any other means.
Permitted subjects - no subject restrictions, but approvers and signers may choose not to approve or sign.
Certain subjects like cluster-admin level users or groups vary between distributions and installations,
but deserve additional scrutiny before approval and signing.
The CertificateSubjectRestriction admission plugin is enabled by default to restrict system:masters,
but it is often not the only cluster-admin subject in a cluster.
Permitted x509 extensions - honors subjectAltName and key usage extensions and discards other extensions.
Permitted key usages - must include ["client auth"]. Must not include key usages beyond ["digital signature", "key encipherment", "client auth"].
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
kubernetes.io/kube-apiserver-client-kubelet: signs client certificates that will be honored as client certificates by the
API server.
May be auto-approved by kube-controller-manager.
Trust distribution: signed certificates must be honored as client certificates by the API server. The CA bundle
is not distributed by any other means.
Permitted subjects - organizations are exactly ["system:nodes"], common name is "system:node:${NODE_NAME}".
Permitted x509 extensions - honors key usage extensions, forbids subjectAltName extensions and drops other extensions.
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
kubernetes.io/kubelet-serving: signs serving certificates that are honored as a valid kubelet serving certificate
by the API server, but has no other guarantees.
Never auto-approved by kube-controller-manager.
Trust distribution: signed certificates must be honored by the API server as valid to terminate connections to a kubelet.
The CA bundle is not distributed by any other means.
Permitted subjects - organizations are exactly ["system:nodes"], common name is "system:node:${NODE_NAME}".
Permitted x509 extensions - honors key usage and DNSName/IPAddress subjectAltName extensions, forbids EmailAddress and
URI subjectAltName extensions, drops other extensions. At least one DNS or IP subjectAltName must be present.
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
kubernetes.io/legacy-unknown: has no guarantees for trust at all. Some third-party distributions of Kubernetes
may honor client certificates signed by it. The stable CertificateSigningRequest API (version certificates.k8s.io/v1 and later)
does not allow to set the signerName as kubernetes.io/legacy-unknown.
Never auto-approved by kube-controller-manager.
Trust distribution: None. There is no standard trust or distribution for this signer in a Kubernetes cluster.
Permitted subjects - any
Permitted x509 extensions - honors subjectAltName and key usage extensions and discards other extensions.
Permitted key usages - any
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
The kube-controller-manager implements control plane signing for each of the built in
signers. Failures for all of these are only reported in kube-controller-manager logs.
Note:
The spec.expirationSeconds field was added in Kubernetes v1.22. Earlier versions of Kubernetes do not honor this field.
Kubernetes API servers prior to v1.22 will silently drop this field when the object is created.
Distribution of trust happens out of band for these signers. Any trust outside of those described above are strictly
coincidental. For instance, some distributions may honor kubernetes.io/legacy-unknown as client certificates for the
kube-apiserver, but this is not a standard.
None of these usages are related to ServiceAccount token secrets .data[ca.crt] in any way. That CA bundle is only
guaranteed to verify a connection to the API server using the default service (kubernetes.default.svc).
Custom signers
You can also introduce your own custom signer, which should have a similar prefixed name but using your
own domain name. For example, if you represent an open source project that uses the domain open-fictional.example
then you might use issuer.open-fictional.example/service-mesh as a signer name.
A custom signer uses the Kubernetes API to issue a certificate. See API-based signers.
Signing
Control plane signer
The Kubernetes control plane implements each of the
Kubernetes signers,
as part of the kube-controller-manager.
Note:
Prior to Kubernetes v1.18, the kube-controller-manager would sign any CSRs that
were marked as approved.
Note:
The spec.expirationSeconds field was added in Kubernetes v1.22.
Earlier versions of Kubernetes do not honor this field.
Kubernetes API servers prior to v1.22 will silently drop this field when the object is created.
API-based signers
Users of the REST API can sign CSRs by submitting an UPDATE request to the status
subresource of the CSR to be signed.
As part of this request, the status.certificate field should be set to contain the
signed certificate. This field contains one or more PEM-encoded certificates.
All PEM blocks must have the "CERTIFICATE" label, contain no headers,
and the encoded data must be a BER-encoded ASN.1 Certificate structure
as described in section 4 of RFC5280.
Non-PEM content may appear before or after the CERTIFICATE PEM blocks and is unvalidated,
to allow for explanatory text as described in section 5.2 of RFC7468.
When encoded in JSON or YAML, this field is base-64 encoded.
A CertificateSigningRequest containing the example certificate above would look like this:
Before a signer issues a certificate based on a CertificateSigningRequest,
the signer typically checks that the issuance for that CSR has been approved.
Control plane automated approval
The kube-controller-manager ships with a built-in approver for certificates with
a signerName of kubernetes.io/kube-apiserver-client-kubelet that delegates various
permissions on CSRs for node credentials to authorization.
The kube-controller-manager POSTs SubjectAccessReview resources to the API server
in order to check authorization for certificate approval.
Approval or rejection using kubectl
A Kubernetes administrator (with appropriate permissions) can manually approve
(or deny) CertificateSigningRequests by using the kubectl certificate approve and kubectl certificate deny commands.
Users of the REST API can approve CSRs by submitting an UPDATE request to the approval
subresource of the CSR to be approved. For example, you could write an
operator that watches for a particular
kind of CSR and then sends an UPDATE to approve them.
When you make an approval or rejection request, set either the Approved or Denied
status condition based on the state you determine:
For Approved CSRs:
apiVersion:certificates.k8s.io/v1kind:CertificateSigningRequest...status:conditions:- lastUpdateTime:"2020-02-08T11:37:35Z"lastTransitionTime:"2020-02-08T11:37:35Z"message:Approved by my custom approver controllerreason:ApprovedByMyPolicy# You can set this to any stringtype:Approved
For Denied CSRs:
apiVersion:certificates.k8s.io/v1kind:CertificateSigningRequest...status:conditions:- lastUpdateTime:"2020-02-08T11:37:35Z"lastTransitionTime:"2020-02-08T11:37:35Z"message:Denied by my custom approver controllerreason:DeniedByMyPolicy# You can set this to any stringtype:Denied
It's usual to set status.conditions.reason to a machine-friendly reason
code using TitleCase; this is a convention but you can set it to anything
you like. If you want to add a note for human consumption, use the
status.conditions.message field.
Cluster trust bundles
FEATURE STATE:Kubernetes v1.27 [alpha]
Note:
In Kubernetes 1.31, you must enable the ClusterTrustBundlefeature gateand the certificates.k8s.io/v1alpha1API group in order to use
this API.
A ClusterTrustBundles is a cluster-scoped object for distributing X.509 trust
anchors (root certificates) to workloads within the cluster. They're designed
to work well with the signer concept from CertificateSigningRequests.
All ClusterTrustBundle objects have strong validation on the contents of their
trustBundle field. That field must contain one or more X.509 certificates,
DER-serialized, each wrapped in a PEM CERTIFICATE block. The certificates
must parse as valid X.509 certificates.
Esoteric PEM features like inter-block data and intra-block headers are either
rejected during object validation, or can be ignored by consumers of the object.
Additionally, consumers are allowed to reorder the certificates in
the bundle with their own arbitrary but stable ordering.
ClusterTrustBundle objects should be considered world-readable within the
cluster. If your cluster uses RBAC
authorization, all ServiceAccounts have a default grant that allows them to
get, list, and watch all ClusterTrustBundle objects.
If you use your own authorization mechanism and you have enabled
ClusterTrustBundles in your cluster, you should set up an equivalent rule to
make these objects public within the cluster, so that they work as intended.
If you do not have permission to list cluster trust bundles by default in your
cluster, you can impersonate a service account you have access to in order to
see available ClusterTrustBundles:
kubectl get clustertrustbundles --as='system:serviceaccount:mynamespace:default'
Signer-linked ClusterTrustBundles
Signer-linked ClusterTrustBundles are associated with a signer name, like this:
apiVersion:certificates.k8s.io/v1alpha1kind:ClusterTrustBundlemetadata:name:example.com:mysigner:foospec:signerName:example.com/mysignertrustBundle:"<... PEM data ...>"
These ClusterTrustBundles are intended to be maintained by a signer-specific
controller in the cluster, so they have several security features:
To create or update a signer-linked ClusterTrustBundle, you must be permitted
to attest on the signer (custom authorization verb attest,
API group certificates.k8s.io; resource path signers). You can configure
authorization for the specific resource name
<signerNameDomain>/<signerNamePath> or match a pattern such as
<signerNameDomain>/*.
Signer-linked ClusterTrustBundles must be named with a prefix derived from
their spec.signerName field. Slashes (/) are replaced with colons (:),
and a final colon is appended. This is followed by an arbitrary name. For
example, the signer example.com/mysigner can be linked to a
ClusterTrustBundle example.com:mysigner:<arbitrary-name>.
Signer-linked ClusterTrustBundles will typically be consumed in workloads
by a combination of a
field selector on the signer name, and a separate
label selector.
Signer-unlinked ClusterTrustBundles
Signer-unlinked ClusterTrustBundles have an empty spec.signerName field, like this:
apiVersion:certificates.k8s.io/v1alpha1kind:ClusterTrustBundlemetadata:name:foospec:# no signerName specified, so the field is blanktrustBundle:"<... PEM data ...>"
They are primarily intended for cluster configuration use cases.
Each signer-unlinked ClusterTrustBundle is an independent object, in contrast to the
customary grouping behavior of signer-linked ClusterTrustBundles.
Signer-unlinked ClusterTrustBundles have no attest verb requirement.
Instead, you control access to them directly using the usual mechanisms,
such as role-based access control.
To distinguish them from signer-linked ClusterTrustBundles, the names of
signer-unlinked ClusterTrustBundles must not contain a colon (:).
Accessing ClusterTrustBundles from pods
FEATURE STATE:Kubernetes v1.29 [alpha]
The contents of ClusterTrustBundles can be injected into the container filesystem, similar to ConfigMaps and Secrets.
See the clusterTrustBundle projected volume source for more details.
How to issue a certificate for a user
A few steps are required in order to get a normal user to be able to
authenticate and invoke an API. First, this user must have a certificate issued
by the Kubernetes cluster, and then present that certificate to the Kubernetes API.
Create private key
The following scripts show how to generate PKI private key and CSR. It is
important to set CN and O attribute of the CSR. CN is the name of the user and
O is the group that this user will belong to. You can refer to
RBAC for standard groups.