kubeadm - CLI tool to easily provision a secure Kubernetes cluster.
Components
kubelet - The
primary agent that runs on each node. The kubelet takes a set of PodSpecs
and ensures that the described containers are running and healthy.
kube-apiserver -
REST API that validates and configures data for API objects such as pods,
services, replication controllers.
kube-controller-manager -
Daemon that embeds the core control loops shipped with Kubernetes.
kube-proxy - Can
do simple TCP/UDP stream forwarding or round-robin TCP/UDP forwarding across
a set of back-ends.
kube-scheduler -
Scheduler that manages availability, performance, and capacity.
List of ports and protocols that
should be open on control plane and worker nodes
Config APIs
This section hosts the documentation for "unpublished" APIs which are used to
configure Kubernetes components or tools. Most of these APIs are not exposed
by the API server in a RESTful way though they are essential for a user or an
operator to use or manage a cluster.
This section provides reference information for the Kubernetes API.
The REST API is the fundamental fabric of Kubernetes. All operations and
communications between components, and external user commands are REST API
calls that the API Server handles. Consequently, everything in the Kubernetes
platform is treated as an API object and has a corresponding entry in the
API.
The JSON and Protobuf serialization schemas follow the same guidelines for
schema changes. The following descriptions cover both formats.
The API versioning and software versioning are indirectly related.
The API and release versioning proposal
describes the relationship between API versioning and software versioning.
Different API versions indicate different levels of stability and support. You
can find more information about the criteria for each level in the
API Changes documentation.
Here's a summary of each level:
Alpha:
The version names contain alpha (for example, v1alpha1).
Built-in alpha API versions are disabled by default and must be explicitly enabled in the kube-apiserver configuration to be used.
The software may contain bugs. Enabling a feature may expose bugs.
Support for an alpha API may be dropped at any time without notice.
The API may change in incompatible ways in a later software release without notice.
The software is recommended for use only in short-lived testing clusters,
due to increased risk of bugs and lack of long-term support.
Beta:
The version names contain beta (for example, v2beta3).
Built-in beta API versions are disabled by default and must be explicitly enabled in the kube-apiserver configuration to be used
(except for beta versions of APIs introduced prior to Kubernetes 1.22, which were enabled by default).
Built-in beta API versions have a maximum lifetime of 9 months or 3 minor releases (whichever is longer) from introduction
to deprecation, and 9 months or 3 minor releases (whichever is longer) from deprecation to removal.
The software is well tested. Enabling a feature is considered safe.
The support for a feature will not be dropped, though the details may change.
The schema and/or semantics of objects may change in incompatible ways in
a subsequent beta or stable API version. When this happens, migration
instructions are provided. Adapting to a subsequent beta or stable API version
may require editing or re-creating API objects, and may not be straightforward.
The migration may require downtime for applications that rely on the feature.
The software is not recommended for production uses. Subsequent releases
may introduce incompatible changes. Use of beta API versions is
required to transition to subsequent beta or stable API versions
once the beta API version is deprecated and no longer served.
Note:
Please try beta features and provide feedback. After the features exit beta, it
may not be practical to make more changes.
Stable:
The version name is vX where X is an integer.
Stable API versions remain available for all future releases within a Kubernetes major version,
and there are no current plans for a major version revision of Kubernetes that removes stable APIs.
API groups
API groups
make it easier to extend the Kubernetes API.
The API group is specified in a REST path and in the apiVersion field of a
serialized object.
There are several API groups in Kubernetes:
The core (also called legacy) group is found at REST path /api/v1.
The core group is not specified as part of the apiVersion field, for
example, apiVersion: v1.
The named groups are at REST path /apis/$GROUP_NAME/$VERSION and use
apiVersion: $GROUP_NAME/$VERSION (for example, apiVersion: batch/v1).
You can find the full list of supported API groups in
Kubernetes API reference.
Enabling or disabling API groups
Certain resources and API groups are enabled by default. You can enable or
disable them by setting --runtime-config on the API server. The
--runtime-config flag accepts comma separated <key>[=<value>] pairs
describing the runtime configuration of the API server. If the =<value>
part is omitted, it is treated as if =true is specified. For example:
to disable batch/v1, set --runtime-config=batch/v1=false
to enable batch/v2alpha1, set --runtime-config=batch/v2alpha1
to enable a specific version of an API, such as storage.k8s.io/v1beta1/csistoragecapacities, set --runtime-config=storage.k8s.io/v1beta1/csistoragecapacities
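As an illustrative sketch combining two of the examples above, several settings can be passed in one flag as comma-separated pairs:
--runtime-config=batch/v1=false,storage.k8s.io/v1beta1/csistoragecapacities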
Note:
When you enable or disable groups or resources, you need to restart the API
server and controller manager to pick up the --runtime-config changes.
Persistence
Kubernetes stores its serialized state in terms of the API resources by writing them into
etcd.
Kubernetes 1.33 includes optional declarative validation for APIs. When enabled, the Kubernetes API server can use this mechanism rather than the legacy approach that relies on hand-written Go
code (validation.go files) to ensure that requests against the API are valid.
Kubernetes developers, and people extending the Kubernetes API,
can define validation rules directly alongside the API type definitions (types.go files). Code authors define
special comment tags (e.g., +k8s:minimum=0). A code generator (validation-gen) then uses these tags to produce
optimized Go code for API validation.
While primarily a feature impacting Kubernetes contributors and potentially developers of extension API servers, cluster administrators should understand its behavior, especially during its rollout phases.
Declarative validation is being rolled out gradually.
In Kubernetes 1.33, only a limited set of APIs use declarative validation;
for the beta of this feature, Kubernetes is intentionally using a superseded API as a test bed for the change.
Future Kubernetes releases may roll this out to more APIs.
DeclarativeValidation: (Beta, Default: true) When enabled, the API server runs both the new declarative validation and the old hand-written validation for migrated types/fields. The results are compared internally.
DeclarativeValidationTakeover: (Beta, Default: false) This gate determines which validation result is authoritative (i.e., returned to the user and used for admission decisions).
Default Behavior (Kubernetes 1.33):
With DeclarativeValidation=true and DeclarativeValidationTakeover=false (the default values for the gates), both validation systems run.
The results of the hand-written validation are used. The declarative validation runs in a mismatch mode for comparison.
Mismatches between the two validation systems are logged by the API server and increment the declarative_validation_mismatch_total metric. This helps developers identify and fix discrepancies during the Beta phase.
Cluster upgrades should be safe regarding this feature, as the authoritative validation logic doesn't change by default.
Administrators can choose to explicitly enable DeclarativeValidationTakeover=true to make the declarative validation authoritative for migrated fields, typically after verifying stability in their environment (e.g., by monitoring the mismatch metric).
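For example, a kube-apiserver flag along the following lines would opt in to authoritative declarative validation (a sketch; how you pass flags depends on how your control plane is deployed):
--feature-gates=DeclarativeValidation=true,DeclarativeValidationTakeover=true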
Disabling declarative validation
As a cluster administrator, you might consider disabling declarative validation whilst it is still beta, under specific circumstances:
Unexpected Validation Behavior: If enabling DeclarativeValidationTakeover leads to unexpected validation errors or allows objects that were previously invalid.
Performance Regressions: If monitoring indicates significant latency increases (e.g., in apiserver_request_duration_seconds) correlated with the feature's enablement.
High Mismatch Rate: If the declarative_validation_mismatch_total metric shows frequent mismatches, suggesting potential bugs in the declarative rules affecting the cluster's workloads, even if DeclarativeValidationTakeover is false.
To revert to only using hand-written validation (as used before Kubernetes v1.33), disable the DeclarativeValidation feature gate, for example
via the command-line argument --feature-gates=DeclarativeValidation=false. This also implicitly disables the effect of DeclarativeValidationTakeover.
Considerations for downgrade and rollback
Disabling the feature acts as a safety mechanism. However, be aware of a potential edge case (considered unlikely due to extensive testing): If a bug in declarative validation (when DeclarativeValidationTakeover=true) incorrectly allowed an invalid object to be persisted, disabling the feature gates might then cause subsequent updates to that specific object to be blocked by the now-authoritative (and correct) hand-written validation. Resolving this might require manual correction of the stored object, potentially via direct etcd modification in rare cases.
For details on managing feature gates, see Feature Gates.
2.2 - Kubernetes API Concepts
The Kubernetes API is a resource-based (RESTful) programmatic interface
provided via HTTP. It supports retrieving, creating, updating, and deleting
primary resources via the standard HTTP verbs (POST, PUT, PATCH, DELETE,
GET).
For some resources, the API includes additional subresources that allow
fine-grained authorization (such as separate views for Pod details and
log retrievals), and can accept and serve those resources in different
representations for convenience or efficiency.
Kubernetes supports efficient change notifications on resources via
watches:
in the Kubernetes API, watch is a verb that is used to track changes to an object in Kubernetes as a stream.
It is used for the efficient detection of changes.
Kubernetes also provides consistent list operations so that API clients can
effectively cache, track, and synchronize the state of resources.
You can view the API reference online,
or read on to learn about the API in general.
Kubernetes API terminology
Kubernetes generally leverages common RESTful terminology to describe the
API concepts:
A resource type is the name used in the URL (pods, namespaces, services)
All resource types have a concrete representation (their object schema) which is called a kind
A list of instances of a resource type is known as a collection
A single instance of a resource type is called a resource, and also usually represents an object
For some resource types, the API includes one or more sub-resources, which are represented as URI paths below the resource
Most Kubernetes API resource types are
objects –
they represent a concrete instance of a concept on the cluster, like a
pod or namespace. A smaller number of API resource types are virtual in
that they often represent operations on objects, rather than objects, such
as a permission check
(use a POST with a JSON-encoded body of SubjectAccessReview to the
subjectaccessreviews resource), or the eviction sub-resource of a Pod
(used to trigger
API-initiated eviction).
Object names
All objects you can create via the API have a unique object
name to allow idempotent creation and
retrieval, except that virtual resource types may not have unique names if they are
not retrievable, or do not rely on idempotency.
Within a namespace, only one object
of a given kind can have a given name at a time. However, if you delete the object,
you can make a new object with the same name. Some objects are not namespaced (for
example: Nodes), and so their names must be unique across the whole cluster.
API verbs
Almost all object resource types support the standard HTTP verbs - GET, POST, PUT, PATCH,
and DELETE. Kubernetes also uses its own verbs, which are often written in lowercase to distinguish
them from HTTP verbs.
Kubernetes uses the term list to describe the action of returning a collection of
resources, to distinguish it from retrieving a single resource which is usually called
a get. If you send an HTTP GET request with the ?watch query parameter,
Kubernetes calls this a watch and not a get
(see Efficient detection of changes for more details).
For PUT requests, Kubernetes internally classifies these as either create or update
based on the state of the existing object. An update is different from a patch; the
HTTP verb for a patch is PATCH.
Resource URIs
All resource types are either scoped by the cluster (/apis/GROUP/VERSION/*) or to a
namespace (/apis/GROUP/VERSION/namespaces/NAMESPACE/*). A namespace-scoped resource
type will be deleted when its namespace is deleted and access to that resource type
is controlled by authorization checks on the namespace scope.
Note: core resources use /api instead of /apis and omit the GROUP path segment.
You can also access collections of resources (for example: listing all Nodes).
The following paths are used to retrieve collections and resources:
Cluster-scoped resources:
GET /apis/GROUP/VERSION/RESOURCETYPE - return the collection of resources of the resource type
GET /apis/GROUP/VERSION/RESOURCETYPE/NAME - return the resource with NAME under the resource type
Namespace-scoped resources:
GET /apis/GROUP/VERSION/RESOURCETYPE - return the collection of all
instances of the resource type across all namespaces
GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE - return
collection of all instances of the resource type in NAMESPACE
GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME -
return the instance of the resource type with NAME in NAMESPACE
Since a namespace is a cluster-scoped resource type, you can retrieve the list
(“collection”) of all namespaces with GET /api/v1/namespaces and details about
a particular namespace with GET /api/v1/namespaces/NAME.
Cluster-scoped subresource: GET /apis/GROUP/VERSION/RESOURCETYPE/NAME/SUBRESOURCE
Namespace-scoped subresource: GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME/SUBRESOURCE
The verbs supported for each subresource will differ depending on the object -
see the API reference for more information. It
is not possible to access sub-resources across multiple resources - generally a new
virtual resource type would be used if that becomes necessary.
HTTP media types
Over HTTP, Kubernetes supports JSON and Protobuf wire encodings.
By default, Kubernetes returns objects in JSON serialization, using the
application/json media type. Although JSON is the default, clients may request a response in
YAML, or use the more efficient binary Protobuf representation for better performance at scale.
The Kubernetes API implements standard HTTP content type negotiation: passing an
Accept header with a GET call will request that the server tries to return
a response in your preferred media type. If you want to send an object in Protobuf to
the server for a PUT or POST request, you must set the Content-Type request header
appropriately.
If you request an available media type, the API server returns a response with a suitable
Content-Type; if none of the media types you request are supported, the API server returns
a 406 Not acceptable error message.
All built-in resource types support the application/json media type.
JSON resource encoding
The Kubernetes API defaults to using JSON for encoding
HTTP message bodies.
For example:
List all of the pods on a cluster, without specifying a preferred format
GET /api/v1/pods
200 OK
Content-Type: application/json
… JSON encoded collection of Pods (PodList object)
Create a pod by sending JSON to the server, requesting a JSON response.
POST /api/v1/namespaces/test/pods
Content-Type: application/json
Accept: application/json
… JSON encoded Pod object
Kubernetes also supports the application/yaml
media type for both requests and responses. YAML
can be used for defining Kubernetes manifests and API interactions.
For example:
List all of the pods on a cluster in YAML format
GET /api/v1/pods
Accept: application/yaml
200 OK
Content-Type: application/yaml
… YAML encoded collection of Pods (PodList object)
Create a pod by sending YAML-encoded data to the server, requesting a YAML response:
POST /api/v1/namespaces/test/pods
Content-Type: application/yaml
Accept: application/yaml
… YAML encoded Pod object
200 OK
Content-Type: application/yaml
apiVersion: v1
kind: Pod
metadata:
name: my-pod
…
Kubernetes Protobuf encoding
Kubernetes uses an envelope wrapper to encode Protobuf responses.
That wrapper starts with a 4-byte magic number to help identify content on disk or in etcd as Protobuf
(as opposed to JSON). The 4-byte magic number is followed by a Protobuf encoded wrapper message, which
describes the encoding and type of the underlying object. Within the Protobuf wrapper message,
the inner object data is recorded using the raw field of Unknown (see the IDL
for more detail).
For example:
List all of the pods on a cluster in Protobuf format.
GET /api/v1/pods
Accept: application/vnd.kubernetes.protobuf
200 OK
Content-Type: application/vnd.kubernetes.protobuf
… Protobuf encoded collection of Pods (PodList object)
Create a pod by sending Protobuf encoded data to the server, but request a response
in JSON.
POST /api/v1/namespaces/test/pods
Content-Type: application/vnd.kubernetes.protobuf
Accept: application/json
… binary encoded Pod object
You can use both techniques together and use Kubernetes' Protobuf encoding to interact with any API that
supports it, for both reads and writes. Only some API resource types are compatible
with Protobuf.
As a client, if you might need to work with extension types you should specify multiple
content types in the request Accept header to support fallback to JSON.
For example:
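A request header along these lines (illustrative) asks for Protobuf but lets the server fall back to JSON for types that are not Protobuf-compatible:
Accept: application/vnd.kubernetes.protobuf, application/json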
FEATURE STATE: Kubernetes v1.32 [alpha] (enabled by default: false)
With the CBORServingAndStorage feature gate
enabled, request and response bodies for all built-in resource types and all resources defined by a
CustomResourceDefinition may be encoded to the
CBOR binary data format. CBOR is also supported at the
aggregation layer if it is enabled in
individual aggregated API servers.
Clients should indicate the IANA media type application/cbor in the Content-Type HTTP request
header when the request body contains a single CBOR
encoded data item, and in the Accept HTTP request
header when prepared to accept a CBOR encoded data item in the response. API servers will use
application/cbor in the Content-Type HTTP response header when the response body contains a
CBOR-encoded object.
If an API server encodes its response to a watch request using
CBOR, the response body will be a CBOR Sequence and the
Content-Type HTTP response header will use the IANA media type application/cbor-seq. Each entry
of the sequence (if any) is a single CBOR-encoded watch event.
In addition to the existing application/apply-patch+yaml media type for YAML-encoded
server-side apply configurations, API servers that enable CBOR will accept the
application/apply-patch+cbor media type for CBOR-encoded server-side apply configurations. There
is no supported CBOR equivalent for application/json-patch+json or application/merge-patch+json,
or application/strategic-merge-patch+json.
Efficient detection of changes
The Kubernetes API allows clients to make an initial request for an object or a
collection, and then to track changes since that initial request: a watch. Clients
can send a list or a get and then make a follow-up watch request.
To make this change tracking possible, every Kubernetes object has a resourceVersion
field representing the version of that resource as stored in the underlying persistence
layer. When retrieving a collection of resources (either namespace or cluster scoped),
the response from the API server contains a resourceVersion value. The client can
use that resourceVersion to initiate a watch against the API server.
When you send a watch request, the API server responds with a stream of
changes. These changes itemize the outcome of operations (such as create, delete,
and update) that occurred after the resourceVersion you specified as a parameter
to the watch request. The overall watch mechanism allows a client to fetch
the current state and then subscribe to subsequent changes, without missing any events.
If a client watch is disconnected then that client can start a new watch from
the last returned resourceVersion; the client could also perform a fresh get /
list request and begin again. See Resource Version Semantics
for more detail.
For example:
List all of the pods in a given namespace.
GET /api/v1/namespaces/test/pods
---
200 OK
Content-Type: application/json
{
"kind": "PodList",
"apiVersion": "v1",
"metadata": {"resourceVersion":"10245"},
"items": [...]
}
Starting from resource version 10245, receive notifications of any API operations
(such as create, delete, patch or update) that affect Pods in the
test namespace. Each change notification is a JSON document. The HTTP response body
(served as application/json) consists of a series of JSON documents.
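A watch request for that example might look like the following sketch (the resourceVersion values inside the events are illustrative):
GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json
{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10596", ...}, ...}
}
{
  "type": "MODIFIED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "11020", ...}, ...}
}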
A given Kubernetes server will only preserve a historical record of changes for a
limited time. Clusters using etcd 3 preserve changes in the last 5 minutes by default.
When the requested watch operations fail because the historical version of that
resource is not available, clients must handle the case by recognizing the status code
410 Gone, clearing their local cache, performing a new get or list operation,
and starting the watch from the resourceVersion that was returned.
For subscribing to collections, Kubernetes client libraries typically offer some form
of standard tool for this list-then-watch logic. (In the Go client library,
this is called a Reflector and is located in the k8s.io/client-go/tools/cache package.)
Watch bookmarks
To mitigate the impact of the short history window, the Kubernetes API provides a watch
event named BOOKMARK. It is a special kind of event to mark that all changes up
to a given resourceVersion the client is requesting have already been sent. The
document representing the BOOKMARK event is of the type requested by the request,
but only includes a .metadata.resourceVersion field. For example:
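A sketch of what this can look like (resource versions are illustrative):
GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245&allowWatchBookmarks=true
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json
{
  "type": "BOOKMARK",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"}}
}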
As a client, you can request BOOKMARK events by setting the
allowWatchBookmarks=true query parameter to a watch request, but you shouldn't
assume bookmarks are returned at any specific interval, nor can clients assume that
the API server will send any BOOKMARK event even when requested.
Streaming lists
FEATURE STATE: Kubernetes v1.33 [beta] (enabled by default: false)
On large clusters, retrieving the collection of some resource types may result in
a significant increase of resource usage (primarily RAM) on the control plane.
To alleviate the impact and simplify the user experience of the list + watch
pattern, Kubernetes v1.32 promotes to beta the feature that allows requesting the initial state
(previously requested via the list request) as part of the watch request.
On the client-side the initial state can be requested by specifying sendInitialEvents=true as query string parameter
in a watch request. If set, the API server starts the watch stream with synthetic init
events (of type ADDED) to build the whole state of all existing objects followed by a
BOOKMARK event
(if requested via the allowWatchBookmarks=true option). The bookmark event includes the resource version
to which the stream is synced. After sending the bookmark event, the API server continues as for any other watch
request.
When you set sendInitialEvents=true in the query string, Kubernetes also requires that you set
resourceVersionMatch to NotOlderThan value.
If you provide resourceVersion in the query string without a value, or don't provide
it at all, this is interpreted as a request for a consistent read;
the bookmark event is sent when the state is synced at least to the moment of a consistent read
from when the request started to be processed. If you specify resourceVersion (in the query string),
the bookmark event is sent when the state is synced at least to the provided resource version.
Example
An example: you want to watch a collection of Pods. For that collection, the current resource version
is 10245 and there are two pods: foo and bar. Then sending the following request (explicitly requesting
consistent read by setting empty resource version using resourceVersion=) could result
in the following sequence of events:
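A sketch of that exchange, assuming the Pods live in the test namespace (resource versions other than 10245 are illustrative):
GET /api/v1/namespaces/test/pods?watch=1&sendInitialEvents=true&allowWatchBookmarks=true&resourceVersion=&resourceVersionMatch=NotOlderThan
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json
{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "8467", "name": "foo"}, ...}
}
{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "5726", "name": "bar"}, ...}
}
{
  "type": "BOOKMARK",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10245"}}
}
… followed by the regular watch stream starting from resourceVersion "10245".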
FEATURE STATE: Kubernetes v1.16 [beta] (enabled by default: true)
APIResponseCompression is an option that allows the API server to compress the responses for get
and list requests, reducing the network bandwidth and improving the performance of large-scale clusters.
It is enabled by default since Kubernetes 1.16 and it can be disabled by including
APIResponseCompression=false in the --feature-gates flag on the API server.
API response compression can significantly reduce the size of the response, especially for large resources or
collections.
For example, a list request for pods can return hundreds of kilobytes or even megabytes of data,
depending on the number of pods and their attributes. By compressing the response, the network bandwidth
can be saved and the latency can be reduced.
To verify if APIResponseCompression is working, you can send a get or list request to the
API server with an Accept-Encoding header, and check the response size and headers. For example:
GET /api/v1/pods
Accept-Encoding: gzip
---
200 OK
Content-Type: application/json
content-encoding: gzip
...
The content-encoding header indicates that the response is compressed with gzip.
Retrieving large results sets in chunks
FEATURE STATE: Kubernetes v1.29 [stable] (enabled by default: true)
On large clusters, retrieving the collection of some resource types may result in
very large responses that can impact the server and client. For instance, a cluster
may have tens of thousands of Pods, each of which is equivalent to roughly 2 KiB of
encoded JSON. Retrieving all pods across all namespaces may result in a very large
response (10-20MB) and consume a large amount of server resources.
The Kubernetes API server supports the ability to break a single large collection request
into many smaller chunks while preserving the consistency of the total request. Each
chunk can be returned sequentially which reduces both the total size of the request and
allows user-oriented clients to display results incrementally to improve responsiveness.
You can request that the API server handle a list by serving a single collection
using pages (which Kubernetes calls chunks). To retrieve a single collection in
chunks, two query parameters limit and continue are supported on requests against
collections, and a response field continue is returned from all list operations
in the collection's metadata field. A client should specify the maximum results they
wish to receive in each chunk with limit and the server will return up to limit
resources in the result and include a continue value if there are more resources
in the collection.
As an API client, you can then pass this continue value to the API server on the
next request, to instruct the server to return the next page (chunk) of results. By
continuing until the server returns an empty continue value, you can retrieve the
entire collection.
Like a watch operation, a continue token will expire after a short amount
of time (by default 5 minutes) and return a 410 Gone if more results cannot be
returned. In this case, the client will need to start from the beginning or omit the
limit parameter.
For example, if there are 1,253 pods on the cluster and you want to receive chunks
of 500 pods at a time, request those chunks as follows:
List all of the pods on a cluster, retrieving up to 500 pods each time.
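A first request along these lines (the continue token value is a placeholder) returns the first 500 pods and a token for the next chunk:
GET /api/v1/pods?limit=500
---
200 OK
Content-Type: application/json
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion":"10245",
    "continue": "ENCODED_CONTINUE_TOKEN",
    "remainingItemCount": 753,
    ...
  },
  "items": [...] // returns pods 1-500
}
A second request with ?limit=500&continue=ENCODED_CONTINUE_TOKEN would return pods 501-1000 along with a further token, ENCODED_CONTINUE_TOKEN_2.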
Continue the previous call, retrieving the last 253 pods.
GET /api/v1/pods?limit=500&continue=ENCODED_CONTINUE_TOKEN_2
---
200 OK
Content-Type: application/json
{
"kind": "PodList",
"apiVersion": "v1",
"metadata": {
"resourceVersion":"10245",
"continue": "", // continue token is empty because we have reached the end of the list
...
},
"items": [...] // returns pods 1001-1253
}
Notice that the resourceVersion of the collection remains constant across each request,
indicating the server is showing you a consistent snapshot of the pods. Pods that
are created, updated, or deleted after version 10245 would not be shown unless
you make a separate list request without the continue token. This allows you
to break large requests into smaller chunks and then perform a watch operation
on the full set without missing any updates.
remainingItemCount is the number of subsequent items in the collection that are not
included in this response. If the list request contained label or field
selectors then the number of
remaining items is unknown and the API server does not include a remainingItemCount
field in its response.
If the list is complete (either because it is not chunking, or because this is the
last chunk), then there are no more remaining items and the API server does not include a
remainingItemCount field in its response. The intended use of the remainingItemCount
is estimating the size of a collection.
Collections
In Kubernetes terminology, the response you get from a list is
a collection. However, Kubernetes defines concrete kinds for
collections of different types of resource. Collections have a kind
named for the resource kind, with List appended.
When you query the API for a particular type, all items returned by that query are
of that type. For example, when you list Services, the collection response
has kind set to
ServiceList;
each item in that collection represents a single Service. For example:
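A trimmed-down sketch of such a response (field values are illustrative):
{
  "kind": "ServiceList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "2947301"
  },
  "items": [
    {
      "metadata": {
        "name": "kubernetes",
        "namespace": "default",
        ...
      },
      ...
    }
  ]
}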
There are dozens of collection types (such as PodList, ServiceList,
and NodeList) defined in the Kubernetes API.
You can get more information about each collection type from the
Kubernetes API documentation.
Some tools, such as kubectl, represent the Kubernetes collection
mechanism slightly differently from the Kubernetes API itself.
Because the output of kubectl might include the response from
multiple list operations at the API level, kubectl represents
a list of items using kind: List. For example:
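A trimmed-down sketch of that client-side representation:
apiVersion: v1
kind: List
metadata:
  resourceVersion: ""
items:
- apiVersion: v1
  kind: Service
  metadata:
    name: kubernetes
    namespace: default
  ...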
Keep in mind that the Kubernetes API does not have a kind named List.
kind: List is a client-side, internal implementation detail for processing
collections that might be of different kinds of object. Avoid depending on
kind: List in automation or other code.
Receiving resources as Tables
When you run kubectl get, the default output format is a simple tabular
representation of one or more instances of a particular resource type. In the past,
clients were required to reproduce the tabular and describe output implemented in
kubectl to perform simple lists of objects.
A few limitations of that approach include non-trivial logic when dealing with
certain objects. Additionally, types provided by API aggregation or third party
resources are not known at compile time. This means that generic implementations
had to be in place for types unrecognized by a client.
In order to avoid potential limitations as described above, clients may request
the Table representation of objects, delegating specific details of printing to the
server. The Kubernetes API implements standard HTTP content type negotiation: passing
an Accept header containing a value of application/json;as=Table;g=meta.k8s.io;v=v1
with a GET call will request that the server return objects in the Table content
type.
For example, list all of the pods on a cluster in the Table format.
GET /api/v1/pods
Accept: application/json;as=Table;g=meta.k8s.io;v=v1
---
200 OK
Content-Type: application/json
{
"kind": "Table",
"apiVersion": "meta.k8s.io/v1",
...
"columnDefinitions": [
...
]
}
For API resource types that do not have a custom Table definition known to the control
plane, the API server returns a default Table response that consists of the resource's
name and creationTimestamp fields.
Not all API resource types support a Table response; for example, a
CustomResourceDefinition
might not define field-to-table mappings, and an APIService that
extends the core Kubernetes API
might not serve Table responses at all. If you are implementing a client that
uses the Table information and must work against all resource types, including
extensions, you should make requests that specify multiple content types in the
Accept header. For example:
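An Accept header along these lines (illustrative) requests the Table representation but allows the server to fall back to a regular JSON object response:
Accept: application/json;as=Table;g=meta.k8s.io;v=v1, application/json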
Resource deletion
When a client first sends a delete to request the removal of a resource,
the .metadata.deletionTimestamp is set to the current time.
Once the .metadata.deletionTimestamp is set, external controllers that act on finalizers
may start performing their cleanup work at any time, in any order.
Order is not enforced between finalizers because it would introduce significant
risk of stuck .metadata.finalizers.
The .metadata.finalizers field is shared: any actor with permission can reorder it.
If the finalizer list were processed in order, then this might lead to a situation
in which the component responsible for the first finalizer in the list is
waiting for some signal (field value, external system, or other) produced by a
component responsible for a finalizer later in the list, resulting in a deadlock.
Without enforced ordering, finalizers are free to order amongst themselves and are
not vulnerable to ordering changes in the list.
Once the last finalizer is removed, the resource is actually removed from etcd.
Force deletion
FEATURE STATE: Kubernetes v1.32 [alpha] (enabled by default: false)
Caution:
This may break the workload associated with the resource being force deleted, if it
relies on the normal deletion flow, so cluster breaking consequences may apply.
By enabling the delete option ignoreStoreReadErrorWithClusterBreakingPotential, the
user can perform an unsafe force delete operation of an undecryptable/corrupt
resource. This option is behind an ALPHA feature gate, and it is disabled by
default. In order to use this option, the cluster operator must enable the feature by
setting the command line option --feature-gates=AllowUnsafeMalformedObjectDeletion=true.
Note:
The user performing the force delete operation must have the privileges to do both
the delete and unsafe-delete-ignore-read-errors verbs on the given resource.
A resource is considered corrupt if it cannot be successfully retrieved from the
storage due to:
transformation error (for example: decryption failure), or
the object failed to decode.
The API server first attempts a normal deletion, and if it fails with
a corrupt resource error then it triggers the force delete. A force delete operation
is unsafe because it ignores finalizer constraints, and skips precondition checks.
The default value for this option is false, which maintains backward compatibility.
For a delete request with ignoreStoreReadErrorWithClusterBreakingPotential
set to true, the fields dryRun, gracePeriodSeconds, orphanDependents,
preconditions, and propagationPolicy must be left unset.
Note:
If the user issues a delete request with ignoreStoreReadErrorWithClusterBreakingPotential
set to true on an otherwise readable resource, the API server aborts the request with an error.
Single resource API
The Kubernetes API verbs get, create, update, patch,
delete and proxy support single resources only.
These verbs with single resource support have no support for submitting multiple
resources together in an ordered or unordered list or transaction.
When clients (including kubectl) act on a set of resources, the client makes a series
of single-resource API requests, then aggregates the responses if needed.
By contrast, the Kubernetes API verbs list and watch allow getting multiple
resources, and deletecollection allows deleting multiple resources.
Field validation
Kubernetes always validates the type of fields. For example, if a field in the
API is defined as a number, you cannot set the field to a text value. If a field
is defined as an array of strings, you can only provide an array. Some fields
allow you to omit them, other fields are required. Omitting a required field
from an API request is an error.
If you make a request with an extra field, one that the cluster's control plane
does not recognize, then the behavior of the API server is more complicated.
By default, the API server drops fields that it does not recognize
from an input that it receives (for example, the JSON body of a PUT request).
There are two situations where the API server drops fields that you supplied in
an HTTP request.
These situations are:
The field is unrecognized because it is not in the resource's OpenAPI schema. (One
exception to this is for CRDs
that explicitly choose not to prune unknown fields via x-kubernetes-preserve-unknown-fields).
The field is duplicated in the object.
Validation for unrecognized or duplicate fields
FEATURE STATE: Kubernetes v1.27 [stable] (enabled by default: true)
From 1.25 onward, unrecognized or duplicate fields in an object are detected via
validation on the server when you use HTTP verbs that can submit data (POST, PUT, and PATCH).
Possible levels of validation are Ignore, Warn (default), and Strict.
Ignore
The API server succeeds in handling the request as it would without the erroneous fields
being set, dropping all unknown and duplicate fields and giving no indication it
has done so.
Warn
(Default) The API server succeeds in handling the request, and reports a
warning to the client. The warning is sent using the Warning: response header,
adding one warning item for each unknown or duplicate field. For more
information about warnings and the Kubernetes API, see the blog article
Warning: Helpful Warnings Ahead.
Strict
The API server rejects the request with a 400 Bad Request error when it
detects any unknown or duplicate fields. The response message from the API
server specifies all the unknown or duplicate fields that the API server has
detected.
The field validation level is set by the fieldValidation query parameter.
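For example, a create request that asks for strict validation might look like this sketch:
POST /api/v1/namespaces/test/pods?fieldValidation=Strict
Content-Type: application/json
Accept: application/json
… JSON encoded Pod object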
Note:
If you submit a request that specifies an unrecognized field, and that is also invalid for
a different reason (for example, the request provides a string value where the API expects
an integer for a known field), then the API server responds with a 400 Bad Request error, but will
not provide any information on unknown or duplicate fields (only which fatal
error it encountered first).
You always receive an error response in this case, no matter what field validation level you requested.
Tools that submit requests to the server (such as kubectl), might set their own
defaults that are different from the Warn validation level that the API server uses
by default.
The kubectl tool uses the --validate flag to set the level of field
validation. It accepts the values ignore, warn, and strict while
also accepting the values true (equivalent to strict) and false
(equivalent to ignore). The default validation setting for kubectl is
--validate=true, which means strict server-side field validation.
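For example (the manifest filename is a placeholder):
kubectl apply --validate=strict -f ./my-manifest.yaml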
When kubectl cannot connect to an API server with field validation (API servers
prior to Kubernetes 1.27), it will fall back to using client-side validation.
Client-side validation will be removed entirely in a future version of kubectl.
Note:
Prior to Kubernetes 1.25, kubectl --validate was used to toggle client-side validation on or off as
a boolean flag.
Dry-run
FEATURE STATE: Kubernetes v1.19 [stable] (enabled by default: true)
When you use HTTP verbs that can modify resources (POST, PUT, PATCH, and
DELETE), you can submit your request in a dry run mode. Dry run mode helps to
evaluate a request through the typical request stages (admission chain, validation,
merge conflicts) up until persisting objects to storage. The response body for the
request is as close as possible to a non-dry-run response. Kubernetes guarantees that
dry-run requests will not be persisted in storage or have any other side effects.
Make a dry-run request
Dry-run is triggered by setting the dryRun query parameter. This parameter is a
string, working as an enum, and the only accepted values are:
[no value set]
Allow side effects. You request this with a query string such as ?dryRun
or ?dryRun&pretty=true. The response is the final object that would have been
persisted, or an error if the request could not be fulfilled.
All
Every stage runs as normal, except for the final storage stage where side effects
are prevented.
When you set ?dryRun=All, any relevant
admission controllers
are run, validating admission controllers check the request post-mutation, merge is
performed on PATCH, fields are defaulted, and schema validation occurs. The changes
are not persisted to the underlying storage, but the final object which would have
been persisted is still returned to the user, along with the normal status code.
If the non-dry-run version of a request would trigger an admission controller that has
side effects, the request will be failed rather than risk an unwanted side effect. All
built-in admission control plugins support dry-run. Additionally, admission webhooks can
declare in their
configuration object
that they do not have side effects, by setting their sideEffects field to None.
Note:
If a webhook actually does have side effects, then the sideEffects field should be
set to "NoneOnDryRun". That change is appropriate provided that the webhook is also
be modified to understand the DryRun field in AdmissionReview, and to prevent side
effects on any request marked as dry runs.
Here is an example dry-run request that uses ?dryRun=All:
POST /api/v1/namespaces/test/pods?dryRun=All
Content-Type: application/json
Accept: application/json
The response would look the same as for non-dry-run request, but the values of some
generated fields may differ.
Generated values
Some values of an object are typically generated before the object is persisted. It
is important not to rely upon the values of these fields set by a dry-run request,
since these values will likely be different in dry-run mode from when the real
request is made. Some of these fields are:
name: if generateName is set, name will have a unique random name
creationTimestamp / deletionTimestamp: records the time of creation/deletion
UID: uniquely identifies
the object and is randomly generated (non-deterministic)
resourceVersion: tracks the persisted version of the object
Any field set by a mutating admission controller
For the Service resource: Ports or IP addresses that the kube-apiserver assigns to Service objects
Dry-run authorization
Authorization for dry-run and non-dry-run requests is identical. Thus, to make
a dry-run request, you must be authorized to make the non-dry-run request.
For example, to run a dry-run patch for a Deployment, you must be authorized
to perform that patch. Here is an example of a rule for Kubernetes
RBAC that allows patching
Deployments:
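A minimal sketch of such a rule, as part of a namespaced Role (the Role name and namespace are placeholders):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: deployment-patcher
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["patch"]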
Updates to existing resources
Kubernetes provides several ways to update existing objects.
You can read choosing an update mechanism to
learn about which approach might be best for your use case.
You can overwrite (update) an existing resource - for example, a ConfigMap -
using an HTTP PUT. For a PUT request, it is the client's responsibility to specify
the resourceVersion (taking this from the object being updated). Kubernetes uses
that resourceVersion information so that the API server can detect lost updates
and reject requests made by a client that is out of date with the cluster.
In the event that the resource has changed (the resourceVersion the client
provided is stale), the API server returns a 409 Conflict error response.
Instead of sending a PUT request, the client can send an instruction to the API
server to patch an existing resource. A patch is typically appropriate
if the change that the client wants to make isn't conditional on the existing data.
Clients that need effective detection of lost updates should consider
making their request conditional on the existing resourceVersion (either HTTP PUT or HTTP PATCH),
and then handle any retries that are needed in case there is a conflict.
The Kubernetes API supports four different PATCH operations, determined by their
corresponding HTTP Content-Type header:
application/apply-patch+yaml
Server Side Apply YAML (a Kubernetes-specific extension, based on YAML).
All JSON documents are valid YAML, so you can also submit JSON using this
media type. See Server Side Apply serialization
for more details.
To Kubernetes, this is a create operation if the object does not exist,
or a patch operation if the object already exists.
application/json-patch+json
JSON Patch, as defined in RFC6902.
A JSON patch is a sequence of operations that are executed on the resource;
for example {"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}.
To Kubernetes, this is a patch operation.
A patch using application/json-patch+json can include conditions to
validate consistency, allowing the operation to fail if those conditions
are not met (for example, to avoid a lost update).
application/merge-patch+json
JSON Merge Patch, as defined in RFC7386.
A JSON Merge Patch is essentially a partial representation of the resource.
The submitted JSON is combined with the current resource to create a new one,
then the new one is saved.
To Kubernetes, this is a patch operation.
application/strategic-merge-patch+json
Strategic Merge Patch (a Kubernetes-specific extension based on JSON).
Strategic Merge Patch is a custom implementation of JSON Merge Patch.
You can only use Strategic Merge Patch with built-in APIs, or with aggregated
API servers that have special support for it. You cannot use
application/strategic-merge-patch+json with any API
defined using a CustomResourceDefinition.
Note:
The Kubernetes server side apply mechanism has superseded Strategic Merge
Patch.
Kubernetes' Server Side Apply
feature allows the control plane to track managed fields for newly created objects.
Server Side Apply provides a clear pattern for managing field conflicts,
offers server-side apply and update operations, and replaces the
client-side functionality of kubectl apply.
For Server-Side Apply, Kubernetes treats the request as a create if the object
does not yet exist, and a patch otherwise. For other requests that use PATCH
at the HTTP level, the logical Kubernetes operation is always patch.
The update (HTTP PUT) operation is simple to implement and flexible,
but has drawbacks:
You need to handle conflicts where the resourceVersion of the object changes
between your client reading it and trying to write it back. Kubernetes always
detects the conflict, but you as the client author need to implement retries.
You might accidentally drop fields if you decode an object locally (for example,
using client-go, you could receive fields that your client does not know how to
handle) and then drop them as part of your update.
If there's a lot of contention on the object (even on a field, or set of fields,
that you're not trying to edit), you might have trouble sending the update.
The problem is worse for larger objects and for objects with many fields.
HTTP PATCH using JSON Patch
A patch update is helpful, because:
As you're only sending differences, you have less data to send in the PATCH
request.
You can make changes that rely on existing values, such as copying the
value of a particular field into an annotation.
Unlike with an update (HTTP PUT), making your change can happen right away
even if there are frequent changes to unrelated fields: you usually would
not need to retry.
You might still need to specify the resourceVersion (to match an existing object)
if you want to be extra careful to avoid lost updates
It's still good practice to write in some retry logic in case of errors.
You can use test conditions to carefully craft specific update conditions (see the sketch just after this list).
For example, you can increment a counter without reading it if the existing
value matches what you expect. You can do this with no lost update risk,
even if the object has changed in other ways since you last wrote to it.
(If the test condition fails, you can fall back to reading the current value
and then write back the changed number).
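A sketch of that pattern against a hypothetical Deployment named my-deployment in the test namespace:
PATCH /apis/apps/v1/namespaces/test/deployments/my-deployment
Content-Type: application/json-patch+json
[
  { "op": "test", "path": "/spec/replicas", "value": 3 },
  { "op": "replace", "path": "/spec/replicas", "value": 4 }
]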
However:
You need more local (client) logic to build the patch; it helps a lot if you have
a library implementation of JSON Patch, or even one for making a JSON Patch specifically against Kubernetes.
As the author of client software, you need to be careful when building the patch
(the HTTP request body) not to drop fields (the order of operations matters).
HTTP PATCH using Server-Side Apply
Server-Side Apply has some clear benefits:
A single round trip: it rarely requires making a GET request first,
and you can still detect conflicts for unexpected changes;
you have the option to force override a conflict, if appropriate.
Client implementations are easy to make.
You get an atomic create-or-update operation without extra effort
(similar to UPSERT in some SQL dialects).
However:
Server-Side Apply does not work at all for field changes that depend on a current value of the object.
You can only apply updates to objects. Some resources in the Kubernetes HTTP API are
not objects (they do not have a .metadata field), and Server-Side Apply
is only relevant for Kubernetes objects.
Resource versions
Resource versions are strings that identify the server's internal version of an
object. Resource versions can be used by clients to determine when objects have
changed, or to express data consistency requirements when getting, listing and
watching resources. Resource versions must be treated as opaque by clients and passed
unmodified back to the server.
You must not assume resource versions are numeric or collatable. API clients may
only compare two resource versions for equality (this means that you must not compare
resource versions for greater-than or less-than relationships).
resourceVersion fields in metadata
Clients find resource versions in resources, including the resources from the response
stream for a watch, or when using list to enumerate resources.
v1.meta/ObjectMeta -
The metadata.resourceVersion of a resource instance identifies the resource version the instance was last modified at.
v1.meta/ListMeta -
The metadata.resourceVersion of a resource collection (the response to a list) identifies the
resource version at which the collection was constructed.
resourceVersion parameters in query strings
The get, list, and watch operations support the resourceVersion parameter.
From version v1.19, Kubernetes API servers also support the resourceVersionMatch
parameter on list requests.
The API server interprets the resourceVersion parameter differently depending
on the operation you request, and on the value of resourceVersion. If you set
resourceVersionMatch then this also affects the way matching happens.
Semantics for get and list
For get and list, the semantics of resourceVersion are:
get:
resourceVersion unset: Most Recent
resourceVersion="0": Any
resourceVersion="{value other than 0}": Not older than
list:
From version v1.19, Kubernetes API servers support the resourceVersionMatch parameter
on list requests. If you set both resourceVersion and resourceVersionMatch, the
resourceVersionMatch parameter determines how the API server interprets
resourceVersion.
You should always set the resourceVersionMatch parameter when setting
resourceVersion on a list request. However, be prepared to handle the case
where the API server that responds is unaware of resourceVersionMatch
and ignores it.
Unless you have strong consistency requirements, using resourceVersionMatch=NotOlderThan and
a known resourceVersion is preferable since it can achieve better performance and scalability
of your cluster than leaving resourceVersion and resourceVersionMatch unset, which requires
a quorum read to be served.
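For example, a sketch of such a request against the Pods in the test namespace:
GET /api/v1/namespaces/test/pods?resourceVersion=10245&resourceVersionMatch=NotOlderThan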
Setting the resourceVersionMatch parameter without setting resourceVersion is not valid.
This table explains the behavior of list requests with various combinations of
resourceVersion and resourceVersionMatch:
resourceVersionMatch and paging parameters for list:
resourceVersionMatch unset, limit unset:
  resourceVersion not set: Most Recent
  resourceVersion="0": Any
  resourceVersion="{value other than 0}": Not older than
resourceVersionMatch unset, limit=<n>, continue unset:
  resourceVersion not set: Most Recent
  resourceVersion="0": Any
  resourceVersion="{value other than 0}": Exact
resourceVersionMatch unset, limit=<n>, continue=<token>:
  resourceVersion not set: Continuation
  resourceVersion="0": Continuation
  resourceVersion="{value other than 0}": Invalid, HTTP 400 Bad Request
resourceVersionMatch=Exact, limit unset:
  resourceVersion not set: Invalid
  resourceVersion="0": Invalid
  resourceVersion="{value other than 0}": Exact
resourceVersionMatch=Exact, limit=<n>, continue unset:
  resourceVersion not set: Invalid
  resourceVersion="0": Invalid
  resourceVersion="{value other than 0}": Exact
resourceVersionMatch=NotOlderThan, limit unset:
  resourceVersion not set: Invalid
  resourceVersion="0": Any
  resourceVersion="{value other than 0}": Not older than
resourceVersionMatch=NotOlderThan, limit=<n>, continue unset:
  resourceVersion not set: Invalid
  resourceVersion="0": Any
  resourceVersion="{value other than 0}": Not older than
Note:
If your cluster's API server does not honor the resourceVersionMatch parameter,
the behavior is the same as if you did not set it.
The meaning of the get and list semantics are:
Any
Return data at any resource version. The newest available resource version is preferred,
but strong consistency is not required; data at any resource version may be served. It is possible
for the request to return data at a much older resource version than the client has previously
observed, particularly in high availability configurations, due to partitions or stale
caches. Clients that cannot tolerate this should not use this semantic.
Always served from watch cache, improving performance and reducing etcd load.
Most recent
Return data at the most recent resource version. The returned data must be
consistent (in detail: served from etcd via a quorum read).
For etcd v3.4.31+ and v3.5.13+, Kubernetes 1.33 serves “most recent” reads from the watch cache:
an internal, in-memory store within the API server that caches and mirrors the state of data
persisted into etcd. Kubernetes requests progress notification to maintain cache consistency against
the etcd persistence layer. Kubernetes v1.28 through to v1.30 also supported this
feature, although as Alpha it was not recommended for production nor enabled by default until the v1.31 release.
Not older than
Return data at least as new as the provided resourceVersion. The newest
available data is preferred, but any data not older than the provided resourceVersion may be
served. For list requests to servers that honor the resourceVersionMatch parameter, this
guarantees that the collection's .metadata.resourceVersion is not older than the requested
resourceVersion, but does not make any guarantee about the .metadata.resourceVersion of any
of the items in that collection.
Always served from watch cache, improving performance and reducing etcd load.
Exact
Return data at the exact resource version provided. If the provided resourceVersion is
unavailable, the server responds with HTTP 410 Gone. For list requests to servers that honor the
resourceVersionMatch parameter, this guarantees that the collection's .metadata.resourceVersion
is the same as the resourceVersion you requested in the query string. That guarantee does
not apply to the .metadata.resourceVersion of any items within that collection.
By default this is served from etcd, but with the ListFromCacheSnapshot feature gate enabled,
the API server will attempt to serve the response from a snapshot if available.
This improves performance and reduces etcd load. Cache snapshots are kept by default for 75 seconds,
so if the provided resourceVersion is unavailable, the server will fall back to etcd.
Continuation
Return the next page of data for a paginated list request, ensuring consistency with the exact resourceVersion established by the initial request in the sequence.
Responses to list requests with limit set include a continue token, which encodes the resourceVersion and last observed position from which to resume the list.
If the resourceVersion in the provided continue token is unavailable, the server responds with HTTP 410 Gone.
By default this is served from etcd, but with the ListFromCacheSnapshot feature gate enabled,
the API server will attempt to serve the response from a snapshot if available.
This improves performance and reduces etcd load. Cache snapshots are kept by default for 75 seconds,
so if the resourceVersion in the provided continue token is unavailable, the server will fall back to etcd.
Note:
When you list resources and receive a collection response, the response includes the
list metadata
of the collection as well as
object metadata
for each item in that collection. For individual objects found within a collection response,
.metadata.resourceVersion tracks when that object was last updated, and not how up-to-date
the object is when served.
When using resourceVersionMatch=NotOlderThan and limit is set, clients must
handle HTTP 410 Gone responses. For example, the client might retry with a
newer resourceVersion or fall back to resourceVersion="".
When using resourceVersionMatch=Exact and limit is unset, clients must
verify that the collection's .metadata.resourceVersion matches
the requested resourceVersion, and handle the case where it does not. For
example, the client might fall back to a request with limit set.
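As an illustration of how these semantics are selected, resourceVersion and resourceVersionMatch are ordinary query parameters on list requests; a hedged sketch using kubectl get --raw (the namespace, resource, and version number are illustrative):
# "Any" semantics: serve from the watch cache at whatever version it currently has
kubectl get --raw "/api/v1/namespaces/default/pods?resourceVersion=0"

# "Exact" semantics: may return 410 Gone if that revision is no longer available
kubectl get --raw "/api/v1/namespaces/default/pods?resourceVersion=12345&resourceVersionMatch=Exact"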
Semantics for watch
For watch, the semantics of resource version are:
resourceVersion unset: Get State and Start at Most Recent
resourceVersion="0": Get State and Start at Any
resourceVersion="{value other than 0}": Start at Exact
The meaning of those watch semantics is:
Get State and Start at Any
Start a watch at any resource version; the most recent resource version
available is preferred, but not required. Any starting resource version is
allowed. It is possible for the watch to start at a much older resource
version that the client has previously observed, particularly in high availability
configurations, due to partitions or stale caches. Clients that cannot tolerate
this apparent rewinding should not start a watch with this semantic. To
establish initial state, the watch begins with synthetic "Added" events for
all resource instances that exist at the starting resource version. All following
watch events are for all changes that occurred after the resource version the
watch started at.
Caution:
watches initialized this way may return arbitrarily stale
data. Please review this semantic before using it, and favor the other semantics
where possible.
Get State and Start at Most Recent
Start a watch at the most recent resource version, which must be consistent
(in detail: served from etcd via a quorum read). To establish initial state,
the watch begins with synthetic "Added" events for all resource instances
that exist at the starting resource version. All following watch events are for
all changes that occurred after the resource version the watch started at.
Start at Exact
Start a watch at an exact resource version. The watch events are for all changes
after the provided resource version. Unlike "Get State and Start at Most Recent"
and "Get State and Start at Any", the watch is not started with synthetic
"Added" events for the provided resource version. The client is assumed to already
have the initial state at the starting resource version since the client provided
the resource version.
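A hedged sketch of starting a watch at an exact resource version previously returned by a list (the namespace, resource, and version number are illustrative):
# "Start at Exact": only changes that occurred after resourceVersion 12345 are streamed
kubectl get --raw "/api/v1/namespaces/default/pods?watch=1&resourceVersion=12345"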
"410 Gone" responses
Servers are not required to serve all older resource versions and may return a HTTP
410 (Gone) status code if a client requests a resourceVersion older than the
server has retained. Clients must be able to tolerate 410 (Gone) responses. See
Efficient detection of changes for details on
how to handle 410 (Gone) responses when watching resources.
If you request a resourceVersion outside the applicable limit then, depending
on whether a request is served from cache or not, the API server may reply with a
410 Gone HTTP response.
Unavailable resource versions
Servers are not required to serve unrecognized resource versions. If you request
list or get for a resource version that the API server does not recognize,
then the API server may either:
wait briefly for the resource version to become available, then time out with a
504 (Gateway Timeout) if the provided resource version does not become available
in a reasonable amount of time; or
respond with a Retry-After response header indicating how many seconds a client
should wait before retrying the request.
If you request a resource version that an API server does not recognize, the
kube-apiserver additionally identifies its error responses with a message
Too large resource version.
If you make a watch request for an unrecognized resource version, the API server
may wait indefinitely (until the request timeout) for the resource version to become
available.
2.3 - Server-Side Apply
FEATURE STATE: Kubernetes v1.22 [stable] (enabled by default: true)
Kubernetes supports multiple appliers collaborating to manage the fields
of a single object.
Server-Side Apply provides an optional mechanism for your cluster's control plane to track
changes to an object's fields. At the level of a specific resource, Server-Side
Apply records and tracks information about control over the fields of that object.
Server-Side Apply helps users and controllers
manage their resources through declarative configuration. Clients can create and modify
objects
declaratively by submitting their fully specified intent.
A fully specified intent is a partial object that only includes the fields and
values for which the user has an opinion. That intent either creates a new
object (using default values for unspecified fields), or is
combined, by the API server, with the existing object.
Comparison with Client-Side Apply explains
how Server-Side Apply differs from the original, client-side kubectl apply
implementation.
Field management
The Kubernetes API server tracks managed fields for all newly created objects.
When trying to apply an object, fields that have a different value and are owned by
another manager will result in a conflict. This is done
in order to signal that the operation might undo another collaborator's changes.
Writes to objects with managed fields can be forced, in which case the value of any
conflicted field will be overridden, and the ownership will be transferred.
Whenever a field's value does change, ownership moves from its current manager to the
manager making the change.
When an applied configuration omits a field that the applier previously applied, Apply
checks whether any other field managers also own that field. If the field is not owned
by any other field manager, it is set to its default value (if there is one), or
otherwise is deleted from the object.
The same rule applies to fields that are lists, associative lists, or maps.
For a user to manage a field, in the Server-Side Apply sense, means that the
user relies on and expects the value of the field not to change. The user who
last made an assertion about the value of a field will be recorded as the
current field manager. This can be done by changing the field manager
details explicitly using HTTP POST (create), PUT (update), or non-apply
PATCH (patch). You can also declare and record a field manager
by including a value for that field in a Server-Side Apply operation.
A Server-Side Apply patch request requires the client to provide its identity
as a field manager. When using Server-Side Apply, trying to change a
field that is controlled by a different manager results in a rejected
request unless the client forces an override.
For details of overrides, see Conflicts.
When two or more appliers set a field to the same value, they share ownership of
that field. Any subsequent attempt to change the value of the shared field, by any of
the appliers, results in a conflict. Shared field owners may give up ownership
of a field by making a Server-Side Apply patch request that doesn't include
that field.
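A hedged sketch of how shared ownership can come about with kubectl (the manifest file and manager names are illustrative); both appliers apply the same manifest, so each is recorded as an owner of the fields it set:
kubectl apply --server-side --field-manager=manager-one -f test-cm.yaml
kubectl apply --server-side --field-manager=manager-two -f test-cm.yaml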
Field management details are stored in a managedFields field that is part of an
object's metadata.
If you remove a field from a manifest and apply that manifest, Server-Side
Apply checks if there are any other field managers that also own the field.
If the field is not owned by any other field managers, it is either deleted
from the live object or reset to its default value, if it has one.
The same rule applies to associative list or map items.
Compared to the (legacy)
kubectl.kubernetes.io/last-applied-configuration
annotation managed by kubectl, Server-Side Apply uses a more declarative
approach, that tracks a user's (or client's) field management, rather than
a user's last applied state. As a side effect of using Server-Side Apply,
information about which field manager manages each field in an object also
becomes available.
Example
A simple example of an object created using Server-Side Apply could look like this:
Note:
kubectl get omits managed fields by default.
Add --show-managed-fields to show managedFields when the output format is either json or yaml.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-cm
  namespace: default
  labels:
    test-label: test
  managedFields:
  - manager: kubectl
    operation: Apply # note capitalization: "Apply" (or "Update")
    apiVersion: v1
    time: "2010-10-10T0:00:00Z"
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:test-label: {}
      f:data:
        f:key: {}
data:
  key: some value
That example ConfigMap object contains a single field management record in
.metadata.managedFields. The field management record consists of basic information
about the managing entity itself, plus details about the fields being managed and
the relevant operation (Apply or Update). If the request that last changed that
field was a Server-Side Apply patch then the value of operation is Apply;
otherwise, it is Update.
There is another possible outcome. A client could submit an invalid request
body. If the fully specified intent does not produce a valid object, the
request fails.
It is however possible to change .metadata.managedFields through an
update, or through a patch operation that does not use Server-Side Apply.
Doing so is highly discouraged, but might be a reasonable option to try if,
for example, .metadata.managedFields gets into an inconsistent state
(which should not happen in normal operations).
The format of managedFields is described
in the Kubernetes API reference.
Caution:
The .metadata.managedFields field is managed by the API server.
You should avoid updating it manually.
Conflicts
A conflict is a special status error that occurs when an Apply operation tries
to change a field that another manager also claims to manage. This prevents an
applier from unintentionally overwriting the value set by another user. When
this occurs, the applier has 3 options to resolve the conflicts:
Overwrite value, become sole manager: If overwriting the value was
intentional (or if the applier is an automated process like a controller) the
applier should set the force query parameter to true (for kubectl apply,
you use the --force-conflicts command line parameter), and make the request
again. This forces the operation to succeed, changes the value of the field,
and removes the field from all other managers' entries in managedFields.
Don't overwrite value, give up management claim: If the applier doesn't
care about the value of the field any more, the applier can remove it from their
local model of the resource, and make a new request with that particular field
omitted. This leaves the value unchanged, and causes the field to be removed
from the applier's entry in managedFields.
Don't overwrite value, become shared manager: If the applier still cares
about the value of a field, but doesn't want to overwrite it, they can
change the value of that field in their local model of the resource so as to
match the value of the object on the server, and then make a new request that
takes into account that local update. Doing so leaves the value unchanged,
and causes that field's management to be shared by the applier along with all
other field managers that already claimed to manage it.
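For example, the first option looks like this with kubectl (a sketch; the manifest name is illustrative):
kubectl apply --server-side --force-conflicts -f nginx-deployment.yaml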
Field managers
Managers identify distinct workflows that are modifying the object (especially
useful on conflicts!), and can be specified through the
fieldManager
query parameter as part of a modifying request. When you Apply to a resource,
the fieldManager parameter is required.
For other updates, the API server infers a field manager identity from the
"User-Agent:" HTTP header (if present).
When you use the kubectl tool to perform a Server-Side Apply operation, kubectl
sets the manager identity to "kubectl" by default.
Serialization
At the protocol level, Kubernetes represents Server-Side Apply message bodies
as YAML, with the media type application/apply-patch+yaml.
Note:
Whether you are submitting JSON data or YAML data, use
application/apply-patch+yaml as the Content-Type header value.
All JSON documents are valid YAML. However, Kubernetes has a bug where it uses a YAML
parser that does not fully implement the YAML specification. Some JSON escapes may
not be recognized.
The serialization is the same as for Kubernetes objects, with the exception that
clients are not required to send a complete object.
Here's an example of a Server-Side Apply message body (fully specified intent):
{"apiVersion": "v1","kind": "ConfigMap"}
(this would make a no-change update, provided that it was sent as the body
of a patch request to a valid v1/configmaps resource, and with the
appropriate request Content-Type).
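A hedged sketch of sending such a request directly to the API server via kubectl proxy (the ConfigMap name, manager name, and data are illustrative):
# expose the API server on localhost
kubectl proxy --port=8001 &

# apply a fully specified intent to a ConfigMap
curl -X PATCH "http://127.0.0.1:8001/api/v1/namespaces/default/configmaps/test-cm?fieldManager=example-manager" \
  -H "Content-Type: application/apply-patch+yaml" \
  --data-binary '{"apiVersion":"v1","kind":"ConfigMap","metadata":{"name":"test-cm"},"data":{"key":"some value"}}'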
Operations in scope for field management
The Kubernetes API operations where field management is considered are:
Server-Side Apply (HTTP PATCH, with content type application/apply-patch+yaml)
Replacing an existing object (update to Kubernetes; PUT at the HTTP level)
Both operations update .metadata.managedFields, but behave a little differently.
Unless you specify a forced override, an apply operation that encounters field-level
conflicts always fails; by contrast, if you make a change using update that would
affect a managed field, a conflict never provokes failure of the operation.
All Server-Side Apply patch requests are required to identify themselves by providing a
fieldManager query parameter, while the query parameter is optional for update
operations. Finally, when using the Apply operation you cannot define managedFields in
the body of the request that you submit.
An example object with multiple managers could look like this:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-cm
  namespace: default
  labels:
    test-label: test
  managedFields:
  - manager: kubectl
    operation: Apply
    time: '2019-03-30T15:00:00.000Z'
    apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:test-label: {}
  - manager: kube-controller-manager
    operation: Update
    apiVersion: v1
    time: '2019-03-30T16:00:00.000Z'
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:key: {}
data:
  key: new value
In this example, a second operation was run as an update by the manager called
kube-controller-manager. The update request succeeded and changed a value in the data
field, which caused that field's management to change to the kube-controller-manager.
If this update had instead been attempted using Server-Side Apply, the request
would have failed due to conflicting ownership.
Merge strategy
The merging strategy, implemented with Server-Side Apply, provides a generally
more stable object lifecycle. Server-Side Apply tries to merge fields based on
the actor who manages them instead of overruling based on values. This way
multiple actors can update the same object without causing unexpected interference.
When a user sends a fully-specified intent object to the Server-Side Apply
endpoint, the server merges it with the live object favoring the value from the
request body if it is specified in both places. If the set of items present in
the applied config is not a superset of the items applied by the same user last
time, each missing item not managed by any other appliers is removed. For
more information about how an object's schema is used to make decisions when
merging, see
sigs.k8s.io/structured-merge-diff.
The Kubernetes API (and the Go code that implements that API for Kubernetes) allows
defining merge strategy markers. These markers describe the merge strategy supported
for fields within Kubernetes objects.
For a CustomResourceDefinition,
you can set these markers when you define the custom resource.
//+listType (OpenAPI extension: x-kubernetes-list-type; possible values: atomic, set, map)
Applicable to lists. set applies to lists that include only scalar elements. These elements must be unique. map applies to lists of nested types only. The key values (see listMapKey) must be unique in the list. atomic can apply to any list. If configured as atomic, the entire list is replaced during merge. At any point in time, a single manager owns the list. If set or map, different managers can manage entries separately.
//+listMapKey (OpenAPI extension: x-kubernetes-list-map-keys; value: a list of field names, e.g. ["port", "protocol"])
Only applicable when +listType=map. A list of field names whose values uniquely identify entries in the list. While there can be multiple keys, listMapKey is singular because keys need to be specified individually in the Go type. The key fields must be scalars.
//+mapType (OpenAPI extension: x-kubernetes-map-type; possible values: atomic, granular)
Applicable to maps. atomic means that the map can only be entirely replaced by a single manager. granular means that the map supports separate managers updating individual fields.
//+structType (OpenAPI extension: x-kubernetes-map-type; possible values: atomic, granular)
Applicable to structs; otherwise same usage and OpenAPI annotation as //+mapType.
If listType is missing, the API server interprets a
patchStrategy=merge marker as a listType=map and the
corresponding patchMergeKey marker as a listMapKey.
The atomic list type is recursive.
(In the Go code for Kubernetes, these markers are specified as
comments and code authors need not repeat them as field tags).
Custom resources and Server-Side Apply
By default, Server-Side Apply treats custom resources as unstructured data. All
keys are treated the same as struct fields, and all lists are considered atomic.
If the CustomResourceDefinition defines a
schema
that contains annotations as defined in the previous Merge Strategy
section, these annotations will be used when merging objects of this
type.
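For instance, a hedged fragment of a CustomResourceDefinition schema that declares these markers (the spec fields, ports and settings, are illustrative):
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        ports:
          type: array
          x-kubernetes-list-type: map
          x-kubernetes-list-map-keys: ["port", "protocol"]
          items:
            type: object
            required: ["port", "protocol"]
            properties:
              port:
                type: integer
              protocol:
                type: string
        settings:
          type: object
          x-kubernetes-map-type: granular
          additionalProperties:
            type: string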
Compatibility across topology changes
On rare occurrences, the author for a CustomResourceDefinition (CRD) or built-in
may want to change the specific topology of a field in their resource,
without incrementing its API version. Changing the topology of types,
by upgrading the cluster or updating the CRD, has different consequences when
updating existing objects. There are two categories of changes: when a field goes from
map/set/granular to atomic, and the other way around.
When the listType, mapType, or structType changes from
map/set/granular to atomic, the whole list, map, or struct of
existing objects will end-up being owned by actors who owned an element
of these types. This means that any further change to these objects
would cause a conflict.
When a listType, mapType, or structType changes from atomic to
map/set/granular, the API server is unable to infer the new
ownership of these fields. Because of that, no conflict will be produced
when objects have these fields updated. For that reason, it is not
recommended to change a type from atomic to map/set/granular.
Before spec.data gets changed from atomic to granular,
manager-one owns the field spec.data, and all the fields within it
(key1 and key2). When the CRD gets changed to make spec.data granular, manager-one continues to own the top-level field
spec.data (meaning no other managers can delete the map called data
without a conflict), but it no longer owns key1 and key2, so another
manager can then modify or delete those fields without conflict.
Using Server-Side Apply in a controller
As a developer of a controller, you can use Server-Side Apply as a way to
simplify the update logic of your controller. The main differences with a
read-modify-write and/or patch are the following:
the applied object must contain all the fields that the controller cares about.
there is no way to remove fields that haven't been applied by the controller
before (the controller can still send a patch or update for these use cases).
the object doesn't have to be read beforehand; resourceVersion doesn't have
to be specified.
It is strongly recommended for controllers to always force conflicts on objects that
they own and manage, since they might not be able to resolve or act on these conflicts.
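At the HTTP level, forcing is just the force query parameter on the apply patch request; a hedged sketch (the resource, manager name, and fields are illustrative):
curl -X PATCH "http://127.0.0.1:8001/apis/apps/v1/namespaces/default/deployments/example?fieldManager=example-controller&force=true" \
  -H "Content-Type: application/apply-patch+yaml" \
  --data-binary '{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"name":"example"},"spec":{"replicas":2}}'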
Transferring ownership
In addition to the concurrency controls provided by conflict resolution,
Server-Side Apply provides ways to perform coordinated
field ownership transfers from users to controllers.
This is best explained by example. Let's look at how to safely transfer
ownership of the replicas field from a user to a controller while enabling
automatic horizontal scaling for a Deployment, using the HorizontalPodAutoscaler
resource and its accompanying controller.
Say a user has defined a Deployment with replicas set to the desired value:
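A hedged sketch of such a manifest (the name nginx-deployment matches the replicas-only manifest used later; the image and labels are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2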
Now, the user would like to remove replicas from their configuration, so they
don't accidentally fight with the HorizontalPodAutoscaler (HPA) and its controller.
However, there is a race: it might take some time before the HPA feels the need
to adjust .spec.replicas; if the user removes .spec.replicas before the HPA writes
to the field and becomes its owner, then the API server would set .spec.replicas to
1 (the default replica count for Deployment).
This is not what the user wants to happen, even temporarily - it might well degrade
a running workload.
There are two solutions:
(basic) Leave replicas in the configuration; when the HPA eventually writes to that
field, the system gives the user a conflict over it. At that point, it is safe
to remove replicas from the configuration.
(more advanced) If, however, the user doesn't want to wait, for example
because they want to keep the cluster legible to their colleagues, then they
can take the following steps to make it safe to remove replicas from their
configuration:
First, the user defines a new manifest containing only the replicas field:
# Save this file as 'nginx-deployment-replicas-only.yaml'.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
Note:
The YAML file for SSA in this case only contains the fields you want to change.
You are not supposed to provide a fully compliant Deployment manifest if you only
want to modify the spec.replicas field using SSA.
The user applies that manifest using a private field manager name. In this example,
the user picked handover-to-hpa:
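For example (a sketch, using the manifest saved above):
kubectl apply --server-side --field-manager=handover-to-hpa \
  -f nginx-deployment-replicas-only.yaml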
If the apply results in a conflict with the HPA controller, then do nothing. The
conflict indicates the controller has claimed the field earlier in the
process than it sometimes does.
At this point the user may remove the replicas field from their manifest:
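That is, they delete the replicas entry from their main Deployment manifest and apply it again under their usual field manager, for example (a sketch):
kubectl apply --server-side -f nginx-deployment.yaml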
Note that whenever the HPA controller sets the replicas field to a new value,
the temporary field manager will no longer own any fields and will be
automatically deleted. No further clean up is required.
Transferring ownership between managers
Field managers can transfer ownership of a field between each other by setting the field
to the same value in both of their applied configurations, causing them to share
ownership of the field. Once the managers share ownership of the field, one of them
can remove the field from their applied configuration to give up ownership and
complete the transfer to the other field manager.
Comparison with Client-Side Apply
Server-Side Apply is meant both as a replacement for the original client-side
implementation of the kubectl apply subcommand, and as a simple and effective
mechanism for controllers
to enact their changes.
Compared to the last-applied annotation managed by kubectl, Server-Side
Apply uses a more declarative approach, which tracks an object's field management,
rather than a user's last applied state. This means that as a side effect of
using Server-Side Apply, information about which field manager manages each
field in an object also becomes available.
A consequence of the conflict detection and resolution implemented by Server-Side
Apply is that an applier always has up to date field values in their local
state. If they don't, they get a conflict the next time they apply. Any of the
three options to resolve conflicts results in the applied configuration being an
up to date subset of the object on the server's fields.
This is different from Client-Side Apply, where outdated values which have been
overwritten by other users are left in an applier's local config. These values
only become accurate when the user updates that specific field, if ever, and an
applier has no way of knowing whether their next apply will overwrite other
users' changes.
Another difference is that an applier using Client-Side Apply is unable to
change the API version they are using, but Server-Side Apply supports this use
case.
Migration between client-side and server-side apply
Upgrading from client-side apply to server-side apply
Client-side apply users who manage a resource with kubectl apply can start
using server-side apply with the following flag.
kubectl apply --server-side [--dry-run=server]
By default, field management of the object transfers from client-side apply to
kubectl server-side apply, without encountering conflicts.
Caution:
Keep the last-applied-configuration annotation up to date.
The annotation infers client-side apply's managed fields.
Any fields not managed by client-side apply raise conflicts.
For example, if you used kubectl scale to update the replicas field after
client-side apply, then this field is not owned by client-side apply and
creates conflicts on kubectl apply --server-side.
This behavior applies to server-side apply with the kubectl field manager.
As an exception, you can opt-out of this behavior by specifying a different,
non-default field manager, as seen in the following example. The default field
manager for kubectl server-side apply is kubectl.
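For example (the manager name my-manager is illustrative):
kubectl apply --server-side --field-manager=my-manager [--dry-run=server]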
Downgrading from server-side apply to client-side apply
If you manage a resource with kubectl apply --server-side,
you can downgrade to client-side apply directly with kubectl apply.
Downgrading works because kubectl Server-Side Apply keeps the
last-applied-configuration annotation up-to-date if you use
kubectl apply.
This behavior applies to Server-Side Apply with the kubectl field manager.
As an exception, you can opt-out of this behavior by specifying a different,
non-default field manager, as seen in the following example. The default field
manager for kubectl server-side apply is kubectl.
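The command has the same shape as in the upgrade case (my-manager is again an illustrative manager name):
kubectl apply --server-side --field-manager=my-manager [--dry-run=server]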
The PATCH verb for a resource that supports Server-Side Apply can accept the
unofficial application/apply-patch+yaml content type. Users of Server-Side
Apply can send partially specified objects as YAML as the body of a PATCH request
to the URI of a resource. When applying a configuration, you should always include all the
fields that are important to the outcome (such as a desired state) that you want to define.
All JSON messages are valid YAML. Therefore, in addition to using YAML request bodies for Server-Side Apply requests, you can also use JSON request bodies, as they are also valid YAML.
In either case, use the media type application/apply-patch+yaml for the HTTP request.
Access control and permissions
Since Server-Side Apply is a type of PATCH, a principal (such as a Role for Kubernetes
RBAC) requires the patch permission to
edit existing resources, and also needs the create verb permission in order to create
new resources with Server-Side Apply.
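A hedged sketch of a Role granting what an applier of ConfigMaps in one namespace needs (names are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-applier
  namespace: default
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["patch", "create"]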
Clearing managedFields
It is possible to strip all managedFields from an object by overwriting them
using a patch (JSON Merge Patch, Strategic Merge Patch, JSON Patch), or
through an update (HTTP PUT); in other words, through every write operation
other than apply. This can be done by overwriting the managedFields field
with an empty entry. Two examples are:
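For example (sketches, sent as the body of a PATCH request to the resource's URL; the ConfigMap name example-cm is illustrative), a JSON Merge Patch and a JSON Patch that overwrite managedFields with a single empty entry:
PATCH /api/v1/namespaces/default/configmaps/example-cm
Content-Type: application/merge-patch+json

{"metadata":{"managedFields":[{}]}}

PATCH /api/v1/namespaces/default/configmaps/example-cm
Content-Type: application/json-patch+json

[{"op": "replace", "path": "/metadata/managedFields", "value": [{}]}]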
This will overwrite the managedFields with a list containing a single empty
entry that then results in the managedFields being stripped entirely from the
object. Note that setting the managedFields to an empty list will not
reset the field. This is on purpose, so managedFields never get stripped by
clients not aware of the field.
In cases where the reset operation is combined with changes to other fields
than the managedFields, this will result in the managedFields being reset
first and the other changes being processed afterwards. As a result the
applier takes ownership of any fields updated in the same request.
Note:
Server-Side Apply does not correctly track ownership on
sub-resources that don't receive the resource object type. If you are
using Server-Side Apply with such a sub-resource, the changed fields
may not be tracked.
What's next
You can read about managedFields within the Kubernetes API reference for the
metadata
top level field.
2.4 - Client Libraries
This page contains an overview of the client libraries for using the Kubernetes
API from various programming languages.
To write applications using the Kubernetes REST API,
you do not need to implement the API calls and request/response types yourself.
You can use a client library for the programming language you are using.
Client libraries often handle common tasks such as authentication for you.
Most client libraries can discover and use the Kubernetes Service Account to
authenticate if the API client is running inside the Kubernetes cluster, or can
understand the kubeconfig file
format to read the credentials and the API Server address.
Note: This section links to third party projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for these projects, which are listed alphabetically. To add a project to this list, read the content guide before submitting a change.
The following Kubernetes API client libraries are provided and maintained by
their authors, not the Kubernetes team.
2.5 - Common Expression Language in Kubernetes
The Common Expression Language (CEL) is used
in the Kubernetes API to declare validation rules, policy rules, and other
constraints or conditions.
CEL expressions are evaluated directly in the
API server, making CEL a
convenient alternative to out-of-process mechanisms, such as webhooks, for many
extensibility use cases. Your CEL expressions continue to execute so long as the
control plane's API server component remains available.
Language overview
The CEL language
has a straightforward syntax that is similar to the expressions in C, C++, Java,
JavaScript and Go.
CEL was designed to be embedded into applications. Each CEL "program" is a
single expression that evaluates to a single value. CEL expressions are
typically short "one-liners" that inline well into the string fields of Kubernetes
API resources.
Inputs to a CEL program are "variables". Each Kubernetes API field that contains
CEL declares in the API documentation which variables are available to use for
that field. For example, in the x-kubernetes-validations[i].rules field of
CustomResourceDefinitions, the self and oldSelf variables are available and
refer to the current and previous state of the custom resource data to be
validated by the CEL expression. Other Kubernetes API fields may declare
different variables. See the API documentation of the API fields to learn which
variables are available for that field.
Example CEL expressions:
Examples of CEL expressions and the purpose of each
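For illustration, hedged examples of the kind of one-line rules that appear in x-kubernetes-validations (field names such as minReplicas and stateCounts are illustrative):
// the replica fields on the object being validated are ordered correctly
self.minReplicas <= self.replicas && self.replicas <= self.maxReplicas

// the stateCounts map contains an entry with the key 'Available'
'Available' in self.stateCounts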
CEL functions, features and language settings support Kubernetes control plane
rollbacks. For example, CEL Optional Values was introduced at Kubernetes 1.29
and so only API servers at that version or newer will accept write requests to
CEL expressions that use CEL Optional Values. However, when a cluster is
rolled back to Kubernetes 1.28, CEL expressions using "CEL Optional Values" that
are already stored in API resources will continue to evaluate correctly.
Kubernetes CEL libraries
In addition to the CEL community libraries, Kubernetes includes CEL libraries
that are available everywhere CEL is used in Kubernetes.
Kubernetes list library
The list library includes indexOf and lastIndexOf, which work similarly to the
strings functions of the same names. These functions return either the first or last
positional index of the provided element in the list.
The list library also includes min, max and sum. Sum is supported on all
number types as well as the duration type. Min and max are supported on all
comparable types.
isSorted is also provided as a convenience function and is supported on all
comparable types.
Examples:
Examples of CEL expressions using list library functions
names.isSorted(): Verify that a list of names is kept in alphabetical order
items.map(x, x.weight).sum() == 1.0: Verify that the "weights" of a list of objects sum to 1.0
In addition to the matches function provided by the CEL standard library, the
regex library provides find and findAll, enabling a much wider range of
regex operations.
Examples:
Examples of CEL expressions using regex library functions
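For illustration, hedged examples of the kind of expressions these functions enable (the input strings are arbitrary):
// find the first run of digits in a string (evaluates to '123')
"abc 123".find('[0-9]+') == '123'

// find every run of digits (evaluates to ['123', '456'])
"123 abc 456".findAll('[0-9]+') == ['123', '456']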
For CEL expressions in the API where a variable of type Authorizer is available,
the authorizer may be used to perform authorization checks for the principal
(authenticated user) of the request.
API resource checks are performed as follows:
Specify the group and resource to check: Authorizer.group(string).resource(string) ResourceCheck
Optionally call any combination of the following builder functions to further narrow the authorization check.
Note that these functions return the receiver type and can be chained:
ResourceCheck.subresource(string) ResourceCheck
ResourceCheck.namespace(string) ResourceCheck
ResourceCheck.name(string) ResourceCheck
Call ResourceCheck.check(verb string) Decision to perform the authorization check.
Call allowed() bool or reason() string to inspect the result of the authorization check.
Non-resource authorization checks are performed as follows:
Specify only a path: Authorizer.path(string) PathCheck
Call PathCheck.check(httpVerb string) Decision to perform the authorization check.
Call allowed() bool or reason() string to inspect the result of the authorization check.
To perform an authorization check for a service account:
Authorizer.serviceAccount(namespace string, name string) Authorizer
Examples of CEL expressions using the authorizer library
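For illustration (hedged; the resource and path checked are arbitrary):
// true if the request's principal may create pods in the 'default' namespace
authorizer.group('').resource('pods').namespace('default').check('create').allowed()

// true if the principal may issue an HTTP GET request against the /healthz path
authorizer.path('/healthz').check('get').allowed()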
quantity(string) Quantity converts a string to a Quantity or results in an error if the
string is not a valid quantity.
Once parsed via the quantity function, the resulting Quantity object has the
following library of member functions:
Available member functions of a Quantity:
isInteger() returns bool: Returns true if and only if asInteger is safe to call without an error.
asInteger() returns int: Returns a representation of the current value as an int64 if possible, or results in an error if conversion would result in overflow or loss of precision.
asApproximateFloat() returns float: Returns a float64 representation of the quantity which may lose precision. If the value of the quantity is outside the range of a float64, +Inf/-Inf will be returned.
sign() returns int: Returns 1 if the quantity is positive, -1 if it is negative, or 0 if it is zero.
add(<Quantity>) returns Quantity: Returns the sum of two quantities.
add(<int>) returns Quantity: Returns the sum of a quantity and an integer.
sub(<Quantity>) returns Quantity: Returns the difference between two quantities.
sub(<int>) returns Quantity: Returns the difference between a quantity and an integer.
isLessThan(<Quantity>) returns bool: Returns true if and only if the receiver is less than the operand.
isGreaterThan(<Quantity>) returns bool: Returns true if and only if the receiver is greater than the operand.
compareTo(<Quantity>) returns int: Compares the receiver to the operand and returns 0 if they are equal, 1 if the receiver is greater, or -1 if the receiver is less than the operand.
Examples:
Examples of CEL expressions using the quantity library:
quantity("500000G").isInteger(): Test if conversion to integer would throw an error
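A couple more hedged illustrations using the member functions listed above:
// true: 1Gi (1024 * 1024 * 1024 bytes) is greater than 500M (500 * 1000 * 1000 bytes)
quantity("1Gi").isGreaterThan(quantity("500M"))

// 500m + 500m equals 1, so compareTo returns 0
quantity("500m").add(quantity("500m")).compareTo(quantity("1")) == 0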
Some Kubernetes API fields contain partially type checked CEL expressions. A
partially type checked expression is an expression where some of the variables
are statically typed but others are dynamically typed. For example, in the CEL
expressions of
ValidatingAdmissionPolicies
the request variable is typed, but the object variable is dynamically typed.
As a result, an expression containing request.namex would fail type checking
because the namex field is not defined. However, object.namex would pass
type checking even when the namex field is not defined for the resource kinds
that object refers to, because object is dynamically typed.
The has() macro in CEL may be used in CEL expressions to check if a field of a
dynamically typed variable is accessible before attempting to access the field's
value. For example:
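For instance (a hedged illustration; namex is just the undefined field name used above):
// only compare object.namex when it is actually set; otherwise fall back to request.name
has(object.namex) ? object.namex == 'special' : request.name == 'special'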
Equality comparison for arrays with x-kubernetes-list-type of set or map ignores element
order. For example [1, 2] == [2, 1] if the arrays represent Kubernetes set values.
Concatenation on arrays with x-kubernetes-list-type use the semantics of the
list type:
set
X + Y performs a union where the array positions of all elements in
X are preserved and non-intersecting elements in Y are appended, retaining
their partial order.
map
X + Y performs a merge where the array positions of all keys in X
are preserved but the values are overwritten by values in Y when the key
sets of X and Y intersect. Elements in Y with non-intersecting keys are
appended, retaining their partial order.
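As a hedged illustration (self.set1 and self.set2 are illustrative fields declared with x-kubernetes-list-type: set):
// with set semantics, concatenation is a union that preserves partial order:
// if self.set1 is [1, 2] and self.set2 is [2, 3], the result is [1, 2, 3]
self.set1 + self.set2 == [1, 2, 3]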
Escaping
Only Kubernetes resource property names of the form
[a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible from CEL. Accessible property
names are escaped according to the following rules when accessed in the
expression:
Escaping one of CEL's RESERVED keywords applies only when the property name matches
the keyword exactly, using the underscore escaping
(for example, int within the word sprint would not be escaped, nor would it need to be).
Examples of escaped CEL identifiers (property name, followed by a rule using the escaped name):
namespace: self.__namespace__ > 0
x-prop: self.x__dash__prop > 0
redact_d: self.redact__underscores__d > 0
string: self.startsWith('kube')
Resource constraints
CEL is non-Turing complete and offers a variety of production safety controls to
limit execution time. CEL's resource constraint features provide feedback to
developers about expression complexity and help protect the API server from
excessive resource consumption during evaluation. CEL's resource constraint
features are used to prevent CEL evaluation from consuming excessive API server
resources.
A key element of the resource constraint features is a cost unit that CEL
defines as a way of tracking CPU utilization. Cost units are independent of
system load and hardware. Cost units are also deterministic; for any given CEL
expression and input data, evaluation of the expression by the CEL interpreter
will always result in the same cost.
Many of CEL's core operations have fixed costs. The simplest operations, such as
comparisons (e.g. <) have a cost of 1. Some have a higher fixed cost, for
example list literal declarations have a fixed base cost of 40 cost units.
Calls to functions implemented in native code approximate cost based on the time
complexity of the operation. For example: operations that use regular
expressions, such as match and find, are estimated using an approximated
cost of length(regexString)*length(inputString). The approximated cost
reflects the worst case time complexity of Go's RE2 implementation.
Runtime cost budget
All CEL expressions evaluated by Kubernetes are constrained by a runtime cost
budget. The runtime cost budget is an estimate of actual CPU utilization
computed by incrementing a cost unit counter while interpreting a CEL
expression. If the CEL interpreter executes too many instructions, the runtime
cost budget will be exceeded, execution of the expressions will be halted, and
an error will result.
Some Kubernetes resources define an additional runtime cost budget that bounds
the execution of multiple expressions. If the sum total of the cost of
expressions exceed the budget, execution of the expressions will be halted, and
an error will result. For example the validation of a custom resource has a
per-validation runtime cost budget for all
Validation Rules
evaluated to validate the custom resource.
Estimated cost limits
For some Kubernetes resources, the API server may also check whether the worst case
estimated running time of CEL expressions would be prohibitively expensive to
execute. If so, the API server prevents the CEL expression from being written to
API resources by rejecting create or update operations containing the CEL
expression to the API resources. This feature offers a stronger assurance that
CEL expressions written to the API resource will be evaluated at runtime without
exceeding the runtime cost budget.
2.6 - Kubernetes Deprecation Policy
This document details the deprecation policy for various facets of the system.
Kubernetes is a large system with many components and many contributors. As
with any such software, the feature set naturally evolves over time, and
sometimes a feature may need to be removed. This could include an API, a flag,
or even an entire feature. To avoid breaking existing users, Kubernetes follows
a deprecation policy for aspects of the system that are slated to be removed.
Deprecating parts of the API
Since Kubernetes is an API-driven system, the API has evolved over time to
reflect the evolving understanding of the problem space. The Kubernetes API is
actually a set of APIs, called "API groups", and each API group is
independently versioned. API versions fall
into 3 main tracks, each of which has different policies for deprecation:
v1: GA (generally available, stable)
v1beta1: Beta (pre-release)
v1alpha1: Alpha (experimental)
A given release of Kubernetes can support any number of API groups and any
number of versions of each.
The following rules govern the deprecation of elements of the API. This
includes:
REST resources (aka API objects)
Fields of REST resources
Annotations on REST resources, including "beta" annotations but not
including "alpha" annotations.
Enumerated or constant values
Component config structures
These rules are enforced between official releases, not between
arbitrary commits to master or release branches.
Rule #1: API elements may only be removed by incrementing the version of the
API group.
Once an API element has been added to an API group at a particular version, it
can not be removed from that version or have its behavior significantly
changed, regardless of track.
Note:
For historical reasons, there are 2 "monolithic" API groups - "core" (no
group name) and "extensions". Resources will incrementally be moved from these
legacy API groups into more domain-specific API groups.
Rule #2: API objects must be able to round-trip between API versions in a given
release without information loss, with the exception of whole REST resources
that do not exist in some versions.
For example, an object can be written as v1 and then read back as v2 and
converted to v1, and the resulting v1 resource will be identical to the
original. The representation in v2 might be different from v1, but the system
knows how to convert between them in both directions. Additionally, any new
field added in v2 must be able to round-trip to v1 and back, which means v1
might have to add an equivalent field or represent it as an annotation.
Rule #3: An API version in a given track may not be deprecated in favor of a less stable API version.
GA API versions can replace beta and alpha API versions.
Beta API versions can replace earlier beta and alpha API versions, but may not replace GA API versions.
Alpha API versions can replace earlier alpha API versions, but may not replace GA or beta API versions.
Rule #4a: API lifetime is determined by the API stability level
GA API versions may be marked as deprecated, but must not be removed within a major version of Kubernetes
Beta API versions are deprecated no more than 9 months or 3 minor releases after introduction (whichever is longer),
and are no longer served 9 months or 3 minor releases after deprecation (whichever is longer)
Alpha API versions may be removed in any release without prior deprecation notice
This ensures beta API support covers the maximum supported version skew of 2 releases,
and that APIs don't stagnate on unstable beta versions, accumulating production usage that will be
disrupted when support for the beta API ends.
Note:
There are no current plans for a major version revision of Kubernetes that removes GA APIs.
Note:
Until #52185 is
resolved, no API versions that have been persisted to storage may be removed.
Serving REST endpoints for those versions may be disabled (subject to the
deprecation timelines in this document), but the API server must remain capable
of decoding/converting previously persisted data from storage.
Rule #4b: The "preferred" API version and the "storage version" for a given
group may not advance until after a release has been made that supports both the
new version and the previous version
Users must be able to upgrade to a new release of Kubernetes and then roll back
to a previous release, without converting anything to the new API version or
suffering breakages (unless they explicitly used features only available in the
newer version). This is particularly evident in the stored representation of
objects.
All of this is best illustrated by examples. Imagine a Kubernetes release,
version X, which introduces a new API group. A new Kubernetes release is made
approximately every 4 months (3 per year). The following table describes which
API versions are supported in a series of subsequent releases.
Release X: API versions v1alpha1; preferred/storage version v1alpha1.
Release X+1: API versions v1alpha2; preferred/storage version v1alpha2. v1alpha1 is removed. See release notes for required actions.
Release X+2: API versions v1beta1; preferred/storage version v1beta1. v1alpha2 is removed. See release notes for required actions.
Release X+3: API versions v1beta2, v1beta1 (deprecated); preferred/storage version v1beta1. v1beta1 is deprecated. See release notes for required actions.
Release X+4: API versions v1beta2, v1beta1 (deprecated); preferred/storage version v1beta2.
Release X+5: API versions v1, v1beta1 (deprecated), v1beta2 (deprecated); preferred/storage version v1beta2. v1beta2 is deprecated. See release notes for required actions.
Release X+6: API versions v1, v1beta2 (deprecated); preferred/storage version v1. v1beta1 is removed. See release notes for required actions.
Release X+7: API versions v1, v1beta2 (deprecated); preferred/storage version v1.
Release X+8: API versions v2alpha1, v1; preferred/storage version v1. v1beta2 is removed. See release notes for required actions.
Release X+9: API versions v2alpha2, v1; preferred/storage version v1. v2alpha1 is removed. See release notes for required actions.
Release X+10: API versions v2beta1, v1; preferred/storage version v1. v2alpha2 is removed. See release notes for required actions.
Release X+11: API versions v2beta2, v2beta1 (deprecated), v1; preferred/storage version v1. v2beta1 is deprecated. See release notes for required actions.
v2beta1 is removed. See release notes for required actions.
Release X+15: API versions v2, v1 (deprecated); preferred/storage version v2. v2beta2 is removed. See release notes for required actions.
REST resources (aka API objects)
Consider a hypothetical REST resource named Widget, which was present in API v1
in the above timeline, and which needs to be deprecated. We document and
announce the
deprecation in sync with release X+1. The Widget resource still exists in API
version v1 (deprecated) but not in v2alpha1. The Widget resource continues to
exist and function in releases up to and including X+8. Only in release X+9,
when API v1 has aged out, does the Widget resource cease to exist, and the
behavior get removed.
Starting in Kubernetes v1.19, making an API request to a deprecated REST API endpoint:
Returns a Warning header
(as defined in RFC7234, Section 5.5) in the API response.
Adds a "k8s.io/deprecated":"true" annotation to the
audit event recorded for the request.
Sets an apiserver_requested_deprecated_apis gauge metric to 1 in the kube-apiserver
process. The metric has labels for group, version, resource, subresource that can be joined
to the apiserver_request_total metric, and a removed_release label that indicates the
Kubernetes release in which the API will no longer be served. The following Prometheus query
returns information about requests made to deprecated APIs which will be removed in v1.22:
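One way to express that query (a sketch, using the metric and label names described above):
apiserver_requested_deprecated_apis{removed_release="1.22"} * on(group,version,resource,subresource) group_right() apiserver_request_total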
As with whole REST resources, an individual field which was present in API v1
must exist and function until API v1 is removed. Unlike whole resources, the
v2 APIs may choose a different representation for the field, as long as it can
be round-tripped. For example a v1 field named "magnitude" which was
deprecated might be named "deprecatedMagnitude" in API v2. When v1 is
eventually removed, the deprecated field can be removed from v2.
Enumerated or constant values
As with whole REST resources and fields thereof, a constant value which was
supported in API v1 must exist and function until API v1 is removed.
Component config structures
Component configs are versioned and managed similar to REST resources.
Future work
Over time, Kubernetes will introduce more fine-grained API versions, at which
point these rules will be adjusted as needed.
Deprecating a flag or CLI
The Kubernetes system is composed of several different programs cooperating.
Sometimes, a Kubernetes release might remove flags or CLI commands
(collectively "CLI elements") in these programs. The individual programs
naturally sort into two main groups - user-facing and admin-facing programs,
which vary slightly in their deprecation policies. Unless a flag is explicitly
prefixed or documented as "alpha" or "beta", it is considered GA.
CLI elements are effectively part of the API to the system, but since they are
not versioned in the same way as the REST API, the rules for deprecation are as
follows:
Rule #5a: CLI elements of user-facing components (e.g. kubectl) must function
after their announced deprecation for no less than:
GA: 12 months or 2 releases (whichever is longer)
Beta: 3 months or 1 release (whichever is longer)
Alpha: 0 releases
Rule #5b: CLI elements of admin-facing components (e.g. kubelet) must function
after their announced deprecation for no less than:
GA: 6 months or 1 release (whichever is longer)
Beta: 3 months or 1 release (whichever is longer)
Alpha: 0 releases
Rule #5c: Command line interface (CLI) elements cannot be deprecated in favor of
less stable CLI elements
Similar to the Rule #3 for APIs, if an element of a command line interface is being replaced with an
alternative implementation, such as by renaming an existing element, or by switching to
use configuration sourced from a file
instead of a command line argument, that recommended alternative must be of
the same or higher stability level.
Rule #6: Deprecated CLI elements must emit warnings (which can optionally be disabled)
when used.
Deprecating a feature or behavior
Occasionally a Kubernetes release needs to deprecate some feature or behavior
of the system that is not controlled by the API or CLI. In this case, the
rules for deprecation are as follows:
Rule #7: Deprecated behaviors must function for no less than 1 year after their
announced deprecation.
If the feature or behavior is being replaced with an alternative implementation
that requires work to adopt the change, there should be an effort to simplify
the transition whenever possible. If an alternative implementation is under
Kubernetes organization control, the following rules apply:
Rule #8: The feature or behavior must not be deprecated in favor of an alternative
implementation that is less stable
For example, a generally available feature cannot be deprecated in favor of a Beta replacement.
The Kubernetes project does, however, encourage users to adopt and transition to alternative
implementations even before they reach the same maturity level. This is particularly important
for exploring new use cases of a feature or getting early feedback on the replacement.
Alternative implementations may sometimes be external tools or products,
for example a feature may move from the kubelet to a container runtime
that is not under Kubernetes project control. In such cases, the rule cannot be
applied, but there must be an effort to ensure that there is a transition path
that does not compromise on components' maturity levels. In the example with
container runtimes, the effort may involve trying to ensure that popular container runtimes
have versions that offer the same level of stability while implementing that replacement behavior.
Deprecation rules for features and behaviors do not imply that all changes
to the system are governed by this policy.
These rules apply only to significant, user-visible behaviors which impact the
correctness of applications running on Kubernetes or that impact the
administration of Kubernetes clusters, and which are being removed entirely.
An exception to the above rule is feature gates. Feature gates are key=value
pairs that allow for users to enable/disable experimental features.
Feature gates are intended to cover the development life cycle of a feature - they
are not intended to be long-term APIs. As such, they are expected to be deprecated
and removed after a feature becomes GA or is dropped.
As a feature moves through the stages, the associated feature gate evolves.
The feature life cycle matched to its corresponding feature gate is:
Alpha: the feature gate is disabled by default and can be enabled by the user.
Beta: the feature gate is enabled by default and can be disabled by the user.
GA: the feature gate is deprecated (see "Deprecation") and becomes
non-operational.
GA, deprecation window complete: the feature gate is removed and calls to it are
no longer accepted.
Deprecation
Features can be removed at any point in the life cycle prior to GA. When features are
removed prior to GA, their associated feature gates are also deprecated.
When an invocation tries to disable a non-operational feature gate, the call fails in order
to avoid unsupported scenarios that might otherwise run silently.
In some cases, removing pre-GA features requires considerable time. Feature gates can remain
operational until their associated feature is fully removed, at which point the feature gate
itself can be deprecated.
When removing a feature gate for a GA feature also requires considerable time, calls to
feature gates may remain operational if the feature gate has no effect on the feature,
and if the feature gate causes no errors.
Features intended to be disabled by users should include a mechanism for disabling the
feature in the associated feature gate.
Versioning for feature gates is different from the previously discussed components,
therefore the rules for deprecation are as follows:
Rule #9: Feature gates must be deprecated when the corresponding feature they control
transitions a lifecycle stage as follows. Feature gates must function for no less than:
Beta feature to GA: 6 months or 2 releases (whichever is longer)
Beta feature to EOL: 3 months or 1 release (whichever is longer)
Alpha feature to EOL: 0 releases
Rule #10: Deprecated feature gates must respond with a warning when used. When a feature gate
is deprecated it must be documented in both the release notes and the corresponding CLI help.
Both warnings and documentation must indicate whether a feature gate is non-operational.
Deprecating a metric
Each component of the Kubernetes control-plane exposes metrics (usually the
/metrics endpoint), which are typically ingested by cluster administrators.
Not all metrics are the same: some metrics are commonly used as SLIs or used
to determine SLOs, these tend to have greater import. Other metrics are more
experimental in nature or are used primarily in the Kubernetes development
process.
Accordingly, metrics fall under three stability classes (ALPHA, BETA, STABLE);
this impacts removal of a metric during a Kubernetes release. These classes
are determined by the perceived importance of the metric. The rules for
deprecating and removing a metric are as follows:
Rule #11a: Metrics, for the corresponding stability class, must function for no less than:
STABLE: 4 releases or 12 months (whichever is longer)
BETA: 2 releases or 8 months (whichever is longer)
ALPHA: 0 releases
Rule #11b: Metrics, after their announced deprecation, must function for no less than:
STABLE: 3 releases or 9 months (whichever is longer)
BETA: 1 release or 4 months (whichever is longer)
ALPHA: 0 releases
Deprecated metrics will have their description text prefixed with a deprecation notice
string '(Deprecated from x.y)' and a warning log will be emitted during metric
registration. Like their stable undeprecated counterparts, deprecated metrics will
be automatically registered to the metrics endpoint and therefore visible.
On a subsequent release (when the metric's deprecatedVersion is equal to
current_kubernetes_version - 3), a deprecated metric will become a hidden metric.
Unlike their deprecated counterparts, hidden metrics will no longer be
automatically registered to the metrics endpoint (hence hidden). However, they
can be explicitly enabled through a command line flag on the binary
(--show-hidden-metrics-for-version=). This provides cluster admins an
escape hatch to properly migrate off of a deprecated metric, if they were not
able to react to the earlier deprecation warnings. Hidden metrics should be
deleted after one release.
Exceptions
No policy can cover every possible situation. This policy is a living
document, and will evolve over time. In practice, there will be situations
that do not fit neatly into this policy, or for which this policy becomes a
serious impediment. Such situations should be discussed with SIGs and project
leaders to find the best solutions for those specific cases, always bearing in
mind that Kubernetes is committed to being a stable system that, as much as
possible, never breaks users. Exceptions will always be announced in all
relevant release notes.
2.7 - Deprecated API Migration Guide
As the Kubernetes API evolves, APIs are periodically reorganized or upgraded.
When APIs evolve, the old API is deprecated and eventually removed.
This page contains information you need to know when migrating from
deprecated API versions to newer and more stable API versions.
Removed APIs by release
v1.32
The v1.32 release stopped serving the following deprecated API versions:
Flow control resources
The flowcontrol.apiserver.k8s.io/v1beta3 API version of FlowSchema and PriorityLevelConfiguration is no longer served as of v1.32.
Migrate manifests and API clients to use the flowcontrol.apiserver.k8s.io/v1 API version, available since v1.29.
All existing persisted objects are accessible via the new API
Notable changes in flowcontrol.apiserver.k8s.io/v1:
The PriorityLevelConfiguration spec.limited.nominalConcurrencyShares field only defaults to 30 when unspecified, and an explicit value of 0 is not changed to 30.
v1.29
The v1.29 release stopped serving the following deprecated API versions:
Flow control resources
The flowcontrol.apiserver.k8s.io/v1beta2 API version of FlowSchema and PriorityLevelConfiguration is no longer served as of v1.29.
Migrate manifests and API clients to use the flowcontrol.apiserver.k8s.io/v1 API version, available since v1.29, or the flowcontrol.apiserver.k8s.io/v1beta3 API version, available since v1.26.
All existing persisted objects are accessible via the new API
Notable changes in flowcontrol.apiserver.k8s.io/v1:
The PriorityLevelConfiguration spec.limited.assuredConcurrencyShares field is renamed to spec.limited.nominalConcurrencyShares and only defaults to 30 when unspecified, and an explicit value of 0 is not changed to 30.
Notable changes in flowcontrol.apiserver.k8s.io/v1beta3:
The PriorityLevelConfiguration spec.limited.assuredConcurrencyShares field is renamed to spec.limited.nominalConcurrencyShares
v1.27
The v1.27 release stopped serving the following deprecated API versions:
CSIStorageCapacity
The storage.k8s.io/v1beta1 API version of CSIStorageCapacity is no longer served as of v1.27.
Migrate manifests and API clients to use the storage.k8s.io/v1 API version, available since v1.24.
All existing persisted objects are accessible via the new API
No notable changes
v1.26
The v1.26 release stopped serving the following deprecated API versions:
Flow control resources
The flowcontrol.apiserver.k8s.io/v1beta1 API version of FlowSchema and PriorityLevelConfiguration is no longer served as of v1.26.
Migrate manifests and API clients to use the flowcontrol.apiserver.k8s.io/v1beta2 API version.
All existing persisted objects are accessible via the new API
No notable changes
HorizontalPodAutoscaler
The autoscaling/v2beta2 API version of HorizontalPodAutoscaler is no longer served as of v1.26.
Migrate manifests and API clients to use the autoscaling/v2 API version, available since v1.23.
All existing persisted objects are accessible via the new API
v1.25
The v1.25 release stopped serving the following deprecated API versions:
CronJob
The batch/v1beta1 API version of CronJob is no longer served as of v1.25.
Migrate manifests and API clients to use the batch/v1 API version, available since v1.21.
All existing persisted objects are accessible via the new API
No notable changes
EndpointSlice
The discovery.k8s.io/v1beta1 API version of EndpointSlice is no longer served as of v1.25.
Migrate manifests and API clients to use the discovery.k8s.io/v1 API version, available since v1.21.
All existing persisted objects are accessible via the new API
Notable changes in discovery.k8s.io/v1:
use per Endpoint nodeName field instead of deprecated topology["kubernetes.io/hostname"] field
use per Endpoint zone field instead of deprecated topology["topology.kubernetes.io/zone"] field
topology is replaced with the deprecatedTopology field which is not writable in v1
Event
The events.k8s.io/v1beta1 API version of Event is no longer served as of v1.25.
Migrate manifests and API clients to use the events.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
Notable changes in events.k8s.io/v1:
type is limited to Normal and Warning
involvedObject is renamed to regarding
action, reason, reportingController, and reportingInstance are required
when creating new events.k8s.io/v1 Events
use eventTime instead of the deprecated firstTimestamp field (which is renamed
to deprecatedFirstTimestamp and not permitted in new events.k8s.io/v1 Events)
use series.lastObservedTime instead of the deprecated lastTimestamp field
(which is renamed to deprecatedLastTimestamp and not permitted in new events.k8s.io/v1 Events)
use series.count instead of the deprecated count field
(which is renamed to deprecatedCount and not permitted in new events.k8s.io/v1 Events)
use reportingController instead of the deprecated source.component field
(which is renamed to deprecatedSource.component and not permitted in new events.k8s.io/v1 Events)
use reportingInstance instead of the deprecated source.host field
(which is renamed to deprecatedSource.host and not permitted in new events.k8s.io/v1 Events)
HorizontalPodAutoscaler
The autoscaling/v2beta1 API version of HorizontalPodAutoscaler is no longer served as of v1.25.
Migrate manifests and API clients to use the autoscaling/v2 API version, available since v1.23.
All existing persisted objects are accessible via the new API
PodDisruptionBudget
The policy/v1beta1 API version of PodDisruptionBudget is no longer served as of v1.25.
Migrate manifests and API clients to use the policy/v1 API version, available since v1.21.
All existing persisted objects are accessible via the new API
Notable changes in policy/v1:
an empty spec.selector ({}) written to a policy/v1 PodDisruptionBudget selects all
pods in the namespace (in policy/v1beta1 an empty spec.selector selected no pods).
An unset spec.selector selects no pods in either API version.
PodSecurityPolicy
PodSecurityPolicy in the policy/v1beta1 API version is no longer served as of v1.25,
and the PodSecurityPolicy admission controller will be removed.
RuntimeClass
RuntimeClass in the node.k8s.io/v1beta1 API version is no longer served as of v1.25.
Migrate manifests and API clients to use the node.k8s.io/v1 API version, available since v1.20.
All existing persisted objects are accessible via the new API
No notable changes
v1.22
The v1.22 release stopped serving the following deprecated API versions:
Webhook resources
The admissionregistration.k8s.io/v1beta1 API version of MutatingWebhookConfiguration
and ValidatingWebhookConfiguration is no longer served as of v1.22.
Migrate manifests and API clients to use the admissionregistration.k8s.io/v1 API version, available since v1.16.
All existing persisted objects are accessible via the new APIs
Notable changes:
webhooks[*].failurePolicy default changed from Ignore to Fail for v1
webhooks[*].matchPolicy default changed from Exact to Equivalent for v1
webhooks[*].timeoutSeconds default changed from 30s to 10s for v1
webhooks[*].sideEffects default value is removed, and the field made required,
and only None and NoneOnDryRun are permitted for v1
webhooks[*].admissionReviewVersions default value is removed and the field made
required for v1 (supported versions for AdmissionReview are v1 and v1beta1)
webhooks[*].name must be unique in the list for objects created via admissionregistration.k8s.io/v1
CustomResourceDefinition
The apiextensions.k8s.io/v1beta1 API version of CustomResourceDefinition is no longer served as of v1.22.
Migrate manifests and API clients to use the apiextensions.k8s.io/v1 API version, available since v1.16.
All existing persisted objects are accessible via the new API
Notable changes:
spec.scope is no longer defaulted to Namespaced and must be explicitly specified
spec.version is removed in v1; use spec.versions instead
spec.validation is removed in v1; use spec.versions[*].schema instead
spec.subresources is removed in v1; use spec.versions[*].subresources instead
spec.additionalPrinterColumns is removed in v1; use spec.versions[*].additionalPrinterColumns instead
spec.conversion.webhookClientConfig is moved to spec.conversion.webhook.clientConfig in v1
spec.conversion.conversionReviewVersions is moved to spec.conversion.webhook.conversionReviewVersions in v1
spec.versions[*].schema.openAPIV3Schema is now required when creating v1 CustomResourceDefinition objects,
and must be a structural schema
spec.preserveUnknownFields: true is disallowed when creating v1 CustomResourceDefinition objects;
it must be specified within schema definitions as x-kubernetes-preserve-unknown-fields: true
In additionalPrinterColumns items, the JSONPath field was renamed to jsonPath in v1
(fixes #66531)
APIService
The apiregistration.k8s.io/v1beta1 API version of APIService is no longer served as of v1.22.
Migrate manifests and API clients to use the apiregistration.k8s.io/v1 API version, available since v1.10.
All existing persisted objects are accessible via the new API
No notable changes
TokenReview
The authentication.k8s.io/v1beta1 API version of TokenReview is no longer served as of v1.22.
Migrate manifests and API clients to use the authentication.k8s.io/v1 API version, available since v1.6.
No notable changes
SubjectAccessReview resources
The authorization.k8s.io/v1beta1 API version of LocalSubjectAccessReview,
SelfSubjectAccessReview, SubjectAccessReview, and SelfSubjectRulesReview is no longer served as of v1.22.
Migrate manifests and API clients to use the authorization.k8s.io/v1 API version, available since v1.6.
Notable changes:
spec.group was renamed to spec.groups in v1 (fixes #32709)
CertificateSigningRequest
The certificates.k8s.io/v1beta1 API version of CertificateSigningRequest is no longer served as of v1.22.
Migrate manifests and API clients to use the certificates.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
Notable changes in certificates.k8s.io/v1:
For API clients requesting certificates:
spec.signerName is now required
(see known Kubernetes signers),
and requests for kubernetes.io/legacy-unknown are not allowed to be created via the certificates.k8s.io/v1 API
spec.usages is now required, may not contain duplicate values, and must only contain known usages
For API clients approving or signing certificates:
status.conditions may not contain duplicate types
status.conditions[*].status is now required
status.certificate must be PEM-encoded, and contain only CERTIFICATE blocks
Lease
The coordination.k8s.io/v1beta1 API version of Lease is no longer served as of v1.22.
Migrate manifests and API clients to use the coordination.k8s.io/v1 API version, available since v1.14.
All existing persisted objects are accessible via the new API
No notable changes
Ingress
The extensions/v1beta1 and networking.k8s.io/v1beta1 API versions of Ingress are no longer served as of v1.22.
Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
Notable changes (see the example manifest after this list):
spec.backend is renamed to spec.defaultBackend
The backend serviceName field is renamed to service.name
Numeric backend servicePort fields are renamed to service.port.number
String backend servicePort fields are renamed to service.port.name
pathType is now required for each specified path. Options are Prefix,
Exact, and ImplementationSpecific. To match the undefined v1beta1 behavior, use ImplementationSpecific.
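To illustrate these renames, here is a minimal sketch of an Ingress written against networking.k8s.io/v1 (the service name, port, and path are hypothetical):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  defaultBackend:          # formerly spec.backend
    service:
      name: web            # formerly serviceName
      port:
        number: 8080       # formerly a numeric servicePort
  rules:
  - http:
      paths:
      - path: /app
        pathType: Prefix   # pathType is required in v1
        backend:
          service:
            name: web
            port:
              number: 8080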
IngressClass
The networking.k8s.io/v1beta1 API version of IngressClass is no longer served as of v1.22.
Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.19.
All existing persisted objects are accessible via the new API
No notable changes
RBAC resources
The rbac.authorization.k8s.io/v1beta1 API version of ClusterRole, ClusterRoleBinding,
Role, and RoleBinding is no longer served as of v1.22.
Migrate manifests and API clients to use the rbac.authorization.k8s.io/v1 API version, available since v1.8.
All existing persisted objects are accessible via the new APIs
No notable changes
PriorityClass
The scheduling.k8s.io/v1beta1 API version of PriorityClass is no longer served as of v1.22.
Migrate manifests and API clients to use the scheduling.k8s.io/v1 API version, available since v1.14.
All existing persisted objects are accessible via the new API
No notable changes
Storage resources
The storage.k8s.io/v1beta1 API version of CSIDriver, CSINode, StorageClass, and VolumeAttachment is no longer served as of v1.22.
Migrate manifests and API clients to use the storage.k8s.io/v1 API version
CSIDriver is available in storage.k8s.io/v1 since v1.19.
CSINode is available in storage.k8s.io/v1 since v1.17.
StorageClass is available in storage.k8s.io/v1 since v1.6.
VolumeAttachment is available in storage.k8s.io/v1 since v1.13.
All existing persisted objects are accessible via the new APIs
No notable changes
v1.16
The v1.16 release stopped serving the following deprecated API versions:
NetworkPolicy
The extensions/v1beta1 API version of NetworkPolicy is no longer served as of v1.16.
Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.8.
All existing persisted objects are accessible via the new API
DaemonSet
The extensions/v1beta1 and apps/v1beta2 API versions of DaemonSet are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.templateGeneration is removed
spec.selector is now required and immutable after creation; use the existing
template labels as the selector for seamless upgrades
spec.updateStrategy.type now defaults to RollingUpdate
(the default in extensions/v1beta1 was OnDelete)
Deployment
The extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API versions of Deployment are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes (see the example manifest after this list):
spec.rollbackTo is removed
spec.selector is now required and immutable after creation; use the existing
template labels as the selector for seamless upgrades
spec.progressDeadlineSeconds now defaults to 600 seconds
(the default in extensions/v1beta1 was no deadline)
spec.revisionHistoryLimit now defaults to 10
(the default in apps/v1beta1 was 2, the default in extensions/v1beta1 was to retain all)
maxSurge and maxUnavailable now default to 25%
(the default in extensions/v1beta1 was 1)
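To illustrate the selector requirement, here is a minimal sketch of an apps/v1 Deployment (names and image are hypothetical); the selector must match the template labels and cannot be changed after creation:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web        # required in apps/v1 and immutable after creation
  template:
    metadata:
      labels:
        app: web      # must match spec.selector for seamless upgrades
    spec:
      containers:
      - name: web
        image: nginx:1.25.3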
StatefulSet
The apps/v1beta1 and apps/v1beta2 API versions of StatefulSet are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.selector is now required and immutable after creation;
use the existing template labels as the selector for seamless upgrades
spec.updateStrategy.type now defaults to RollingUpdate
(the default in apps/v1beta1 was OnDelete)
ReplicaSet
The extensions/v1beta1, apps/v1beta1, and apps/v1beta2 API versions of ReplicaSet are no longer served as of v1.16.
Migrate manifests and API clients to use the apps/v1 API version, available since v1.9.
All existing persisted objects are accessible via the new API
Notable changes:
spec.selector is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
PodSecurityPolicy
The extensions/v1beta1 API version of PodSecurityPolicy is no longer served as of v1.16.
Migrate manifests and API clients to use the policy/v1beta1 API version, available since v1.10.
Note that the policy/v1beta1 API version of PodSecurityPolicy will be removed in v1.25.
What to do
Test with deprecated APIs disabled
You can test your clusters by starting an API server with specific API versions disabled
to simulate upcoming removals. Add the following flag to the API server startup arguments:
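A sketch of the relevant flag (the specific group and version below are illustrative; substitute the API you want to verify):
--runtime-config=<group>/<version>=false
For example, to confirm that nothing still depends on the flowcontrol v1beta3 API before upgrading to v1.32:
kube-apiserver --runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false
Migrate to non-deprecated APIs
Update manifests and API clients to reference the newer API version. You can edit manifests by hand, or use the kubectl convert subcommand to convert an existing manifest to a different API version, for example:
kubectl convert -f ./my-ingress.yaml --output-version networking.k8s.io/v1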
This conversion may use non-ideal default values. To learn more about a specific
resource, check the Kubernetes API reference.
Note:
The kubectl convert tool is not installed by default, although it was once part of
kubectl itself. For more details, you can read the
deprecation and removal issue
for the built-in subcommand.
To learn how to set up kubectl convert on your computer, visit the page that is right for your
operating system:
Linux,
macOS, or
Windows.
2.8 - Kubernetes API health endpoints
The Kubernetes API server provides API endpoints to indicate the current status of the API server.
This page describes these API endpoints and explains how you can use them.
API endpoints for health
The Kubernetes API server provides 3 API endpoints (healthz, livez and readyz) to indicate the current status of the API server.
The healthz endpoint is deprecated (since Kubernetes v1.16), and you should use the more specific livez and readyz endpoints instead.
The livez endpoint can be used with the --livez-grace-period flag to specify the startup duration.
For a graceful shutdown you can specify the --shutdown-delay-duration flag with the /readyz endpoint.
Machines that check the healthz/livez/readyz of the API server should rely on the HTTP status code.
A status code 200 indicates the API server is healthy/live/ready, depending on the called endpoint.
The more verbose options shown below are intended to be used by human operators to debug their cluster or understand the state of the API server.
The following examples will show how you can interact with the health API endpoints.
For all endpoints, you can use the verbose parameter to print out the checks and their status.
This can be useful for a human operator to debug the current status of the API server; it is not intended to be consumed by a machine:
curl -k https://localhost:6443/livez?verbose
or from a remote host with authentication:
kubectl get --raw='/readyz?verbose'
The output will look like this:
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed
The Kubernetes API server also supports excluding specific checks.
The query parameters can also be combined, as in this example:
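For example, to exclude the etcd check against the same local endpoint as above (the excluded entry appears in the output that follows):
curl -k 'https://localhost:6443/readyz?verbose&exclude=etcd'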
[+]ping ok
[+]log ok
[+]etcd excluded: ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
healthz check passed
Individual health checks
FEATURE STATE: Kubernetes v1.33 [alpha]
Each individual health check exposes an HTTP endpoint and can be checked individually.
The schema for the individual health checks is /livez/<healthcheck-name> or /readyz/<healthcheck-name>, where livez and readyz can be used to indicate if you want to check the liveness or the readiness of the API server, respectively.
The <healthcheck-name> path can be discovered using the verbose flag from above; it corresponds to the text between [+] and ok in that output.
These individual health checks should not be consumed by machines but can be helpful for a human operator to debug a system:
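For example, to query only the etcd check (assuming the same endpoint and credentials as the earlier examples):
curl -k https://localhost:6443/livez/etcd
or
kubectl get --raw='/readyz/etcd'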
2.9 - Authenticating
Users in Kubernetes
All Kubernetes clusters have two categories of users: service accounts managed
by Kubernetes, and normal users.
It is assumed that a cluster-independent service manages normal users in the following ways:
an administrator distributing private keys
a user store like Keystone or Google Accounts
a file with a list of usernames and passwords
In this regard, Kubernetes does not have objects which represent normal user accounts.
Normal users cannot be added to a cluster through an API call.
Even though a normal user cannot be added via an API call, any user that
presents a valid certificate signed by the cluster's certificate authority
(CA) is considered authenticated. In this configuration, Kubernetes determines
the username from the common name field in the 'subject' of the cert (e.g.,
"/CN=bob"). From there, the role based access control (RBAC) sub-system would
determine whether the user is authorized to perform a specific operation on a
resource.
In contrast, service accounts are users managed by the Kubernetes API. They are
bound to specific namespaces, and created automatically by the API server or
manually through API calls. Service accounts are tied to a set of credentials
stored as Secrets, which are mounted into pods allowing in-cluster processes
to talk to the Kubernetes API.
API requests are tied to either a normal user or a service account, or are treated
as anonymous requests. This means every process inside or outside the cluster, from
a human user typing kubectl on a workstation, to kubelets on nodes, to members
of the control plane, must authenticate when making requests to the API server,
or be treated as an anonymous user.
Authentication strategies
Kubernetes uses client certificates, bearer tokens, or an authenticating proxy to
authenticate API requests through authentication plugins. As HTTP requests are
made to the API server, plugins attempt to associate the following attributes
with the request:
Username: a string which identifies the end user. Common values might be kube-admin or jane@example.com.
UID: a string which identifies the end user and attempts to be more consistent and unique than username.
Groups: a set of strings, each of which indicates the user's membership in a named logical collection of users.
Common values might be system:masters or devops-team.
Extra fields: a map of strings to list of strings which holds additional information authorizers may find useful.
All values are opaque to the authentication system and only hold significance
when interpreted by an authorizer.
You can enable multiple authentication methods at once. You should usually use at least two methods:
service account tokens for service accounts
at least one other method for user authentication.
When multiple authenticator modules are enabled, the first module
to successfully authenticate the request short-circuits evaluation.
The API server does not guarantee the order authenticators run in.
The system:authenticated group is included in the list of groups for all authenticated users.
Integrations with other authentication protocols (LDAP, SAML, Kerberos, alternate x509 schemes, etc)
can be accomplished using an authenticating proxy or the
authentication webhook.
X509 client certificates
Client certificate authentication is enabled by passing the --client-ca-file=SOMEFILE
option to API server. The referenced file must contain one or more certificate authorities
to use to validate client certificates presented to the API server. If a client certificate
is presented and verified, the common name of the subject is used as the user name for the
request. As of Kubernetes 1.4, client certificates can also indicate a user's group memberships
using the certificate's organization fields. To include multiple group memberships for a user,
include multiple organization fields in the certificate.
For example, using the openssl command line tool to generate a certificate signing request:
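A minimal sketch with hypothetical file and subject names; the common name (CN) becomes the username and each organization (O) field becomes a group membership:
openssl req -new -key jane-key.pem -out jane-csr.pem -subj "/CN=jane/O=app1/O=app2"
This would request a certificate for the user jane, belonging to the groups app1 and app2.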
Static token file
The API server reads bearer tokens from a file when given the --token-auth-file=SOMEFILE option
on the command line. Currently, tokens last indefinitely, and the token list cannot be
changed without restarting the API server.
The token file is a csv file with a minimum of 3 columns: token, user name, user uid,
followed by optional group names.
Note:
If you have more than one group, the column must be double quoted e.g.
token,user,uid,"group1,group2,group3"
Putting a bearer token in a request
When using bearer token authentication from an http client, the API
server expects an Authorization header with a value of Bearer <token>. The bearer token must be a character sequence that can be
put in an HTTP header value using no more than the encoding and
quoting facilities of HTTP. For example: if the bearer token is
31ada4fd-adec-460c-809a-9e56ceb75269 then it would appear in an HTTP
header as shown below.
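Authorization: Bearer 31ada4fd-adec-460c-809a-9e56ceb75269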
To allow for streamlined bootstrapping for new clusters, Kubernetes includes a
dynamically-managed Bearer token type called a Bootstrap Token. These tokens
are stored as Secrets in the kube-system namespace, where they can be
dynamically managed and created. Controller Manager contains a TokenCleaner
controller that deletes bootstrap tokens as they expire.
The tokens are of the form [a-z0-9]{6}.[a-z0-9]{16}. The first component is a
Token ID and the second component is the Token Secret. You specify the token
in an HTTP header as follows:
Authorization: Bearer 781292.db7bc3a58fc5f07e
You must enable the Bootstrap Token Authenticator with the
--enable-bootstrap-token-auth flag on the API Server. You must enable
the TokenCleaner controller via the --controllers flag on the Controller
Manager. This is done with something like --controllers=*,tokencleaner.
kubeadm will do this for you if you are using it to bootstrap a cluster.
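A minimal sketch of the relevant startup arguments (each component also needs its usual flags, which are omitted here):
kube-apiserver --enable-bootstrap-token-auth=true
kube-controller-manager --controllers=*,tokencleaner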
The authenticator authenticates as system:bootstrap:<Token ID>. It is
included in the system:bootstrappers group. The naming and groups are
intentionally limited to discourage users from using these tokens past
bootstrapping. The user names and group can be used (and are used by kubeadm)
to craft the appropriate authorization policies to support bootstrapping a
cluster.
Please see Bootstrap Tokens for in depth
documentation on the Bootstrap Token authenticator and controllers along with
how to manage these tokens with kubeadm.
Service account tokens
A service account is an automatically enabled authenticator that uses signed
bearer tokens to verify requests. The plugin takes two optional flags:
--service-account-key-file File containing PEM-encoded x509 RSA or ECDSA
private or public keys, used to verify ServiceAccount tokens. The specified file
can contain multiple keys, and the flag can be specified multiple times with
different files. If unspecified, --tls-private-key-file is used.
--service-account-lookup If enabled, tokens which are deleted from the API will be revoked.
Service accounts are usually created automatically by the API server and
associated with pods running in the cluster through the ServiceAccount Admission Controller. Bearer tokens are
mounted into pods at well-known locations, and allow in-cluster processes to
talk to the API server. Accounts may be explicitly associated with pods using the
serviceAccountName field of a PodSpec.
Note:
serviceAccountName is usually omitted because this is done automatically.
apiVersion: apps/v1 # this apiVersion is relevant as of Kubernetes 1.9
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
spec:
  replicas: 3
  template:
    metadata:
      # ...
    spec:
      serviceAccountName: bob-the-bot
      containers:
      - name: nginx
        image: nginx:1.14.2
Service account bearer tokens are perfectly valid to use outside the cluster and
can be used to create identities for long standing jobs that wish to talk to the
Kubernetes API. To manually create a service account, use the kubectl create serviceaccount (NAME) command. This creates a service account in the current
namespace.
kubectl create serviceaccount jenkins
serviceaccount/jenkins created
Create an associated token:
kubectl create token jenkins
eyJhbGciOiJSUzI1NiIsImtp...
The created token is a signed JSON Web Token (JWT).
The signed JWT can be used as a bearer token to authenticate as the given service
account. See above for how the token is included
in a request. Normally these tokens are mounted into pods for in-cluster access to
the API server, but can be used from outside the cluster as well.
Service accounts authenticate with the username system:serviceaccount:(NAMESPACE):(SERVICEACCOUNT),
and are assigned to the groups system:serviceaccounts and system:serviceaccounts:(NAMESPACE).
Warning:
Because service account tokens can also be stored in Secret API objects, any user with
write access to Secrets can request a token, and any user with read access to those
Secrets can authenticate as the service account. Be cautious when granting permissions
to service accounts and read or write capabilities for Secrets.
OpenID Connect Tokens
OpenID Connect is a flavor of OAuth2 supported by
some OAuth2 providers, notably Microsoft Entra ID, Salesforce, and Google.
The protocol's main extension of OAuth2 is an additional field returned with
the access token called an ID Token.
This token is a JSON Web Token (JWT) with well known fields, such as a user's
email, signed by the server.
To identify the user, the authenticator uses the id_token (not the access_token)
from the OAuth2 token response
as a bearer token. See above for how the token
is included in a request.
sequenceDiagram
participant user as User
participant idp as Identity Provider
participant kube as kubectl
participant api as API Server
user ->> idp: 1. Log in to IdP
activate idp
idp -->> user: 2. Provide access_token, id_token, and refresh_token
deactivate idp
activate user
user ->> kube: 3. Call kubectl with --token being the id_token OR add tokens to .kube/config
deactivate user
activate kube
kube ->> api: 4. Authorization: Bearer...
deactivate kube
activate api
api ->> api: 5. Is JWT signature valid?
api ->> api: 6. Has the JWT expired? (iat+exp)
api ->> api: 7. User authorized?
api -->> kube: 8. Authorized: Perform action and return result
deactivate api
activate kube
kube --x user: 9. Return result
deactivate kube
Log in to your identity provider
Your identity provider will provide you with an access_token, id_token and a refresh_token
When using kubectl, use your id_token with the --token flag or add it directly to your kubeconfig
kubectl sends your id_token in a header called Authorization to the API server
The API server will make sure the JWT signature is valid
Check to make sure the id_token hasn't expired
Perform claim and/or user validation if CEL expressions are configured with AuthenticationConfiguration.
Make sure the user is authorized
Once authorized the API server returns a response to kubectl
kubectl provides feedback to the user
Since all of the data needed to validate who you are is in the id_token, Kubernetes doesn't need to
"phone home" to the identity provider. In a model where every request is stateless this provides a
very scalable solution for authentication. It does offer a few challenges:
Kubernetes has no "web interface" to trigger the authentication process. There is no browser or
interface to collect credentials which is why you need to authenticate to your identity provider first.
The id_token can't be revoked, it's like a certificate so it should be short-lived (only a few minutes)
so it can be very annoying to have to get a new token every few minutes.
To authenticate to the Kubernetes dashboard, you must use the kubectl proxy command or a reverse proxy
that injects the id_token.
Configuring the API Server
Using flags
To enable the plugin, configure the following flags on the API server:
--oidc-issuer-url
Description: URL of the provider that allows the API server to discover public signing keys. Only URLs that use the https:// scheme are accepted. This is typically the provider's discovery URL, changed to have an empty path.
Example: If the issuer's OIDC discovery URL is https://accounts.provider.example/.well-known/openid-configuration, the value should be https://accounts.provider.example
Required: Yes
--oidc-client-id
Description: A client id that all tokens must be issued for.
Example: kubernetes
Required: Yes
--oidc-username-claim
Description: JWT claim to use as the user name. By default sub, which is expected to be a unique identifier of the end user. Admins can choose other claims, such as email or name, depending on their provider. However, claims other than email will be prefixed with the issuer URL to prevent naming clashes with other plugins.
Example: sub
Required: No
--oidc-username-prefix
Description: Prefix prepended to username claims to prevent clashes with existing names (such as system: users). For example, the value oidc: will create usernames like oidc:jane.doe. If this flag isn't provided and --oidc-username-claim is a value other than email the prefix defaults to ( Issuer URL )# where ( Issuer URL ) is the value of --oidc-issuer-url. The value - can be used to disable all prefixing.
Example: oidc:
Required: No
--oidc-groups-claim
Description: JWT claim to use as the user's group. If the claim is present it must be an array of strings.
Example: groups
Required: No
--oidc-groups-prefix
Description: Prefix prepended to group claims to prevent clashes with existing names (such as system: groups). For example, the value oidc: will create group names like oidc:engineering and oidc:infra.
Example: oidc:
Required: No
--oidc-required-claim
Description: A key=value pair that describes a required claim in the ID Token. If set, the claim is verified to be present in the ID Token with a matching value. Repeat this flag to specify multiple claims.
Example: claim=value
Required: No
--oidc-ca-file
Description: The path to the certificate for the CA that signed your identity provider's web certificate. Defaults to the host's root CAs.
Example: /etc/kubernetes/ssl/kc-ca.pem
Required: No
--oidc-signing-algs
Description: The signing algorithms accepted. Default is "RS256".
Example: RS512
Required: No
Authentication configuration from a file
FEATURE STATE: Kubernetes v1.30 [beta] (enabled by default: true)
JWT Authenticator is an authenticator to authenticate Kubernetes users using JWT compliant tokens.
The authenticator will attempt to parse a raw ID token and verify that it has been signed by the configured issuer.
The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
The minimum valid JWT payload must contain the following claims:
{
"iss": "https://example.com", // must match the issuer.url
"aud": ["my-app"], // at least one of the entries in issuer.audiences must match the "aud" claim in presented JWTs.
"exp": 1234567890, // token expiration as Unix time (the number of seconds elapsed since January 1, 1970 UTC)
"<username-claim>": "user"// this is the username claim configured in the claimMappings.username.claim or claimMappings.username.expression
}
The configuration file approach allows you to configure multiple JWT authenticators, each with a unique
issuer.url and issuer.discoveryURL. The configuration file even allows you to specify CEL
expressions to map claims to user attributes, and to validate claims and user information.
The API server also automatically reloads the authenticators when the configuration file is modified.
You can use apiserver_authentication_config_controller_automatic_reload_last_timestamp_seconds metric
to monitor the last time the configuration was reloaded by the API server.
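For example (a sketch that assumes your kubeconfig user is allowed to read the API server's /metrics endpoint):
kubectl get --raw /metrics | grep apiserver_authentication_config_controller_automatic_reload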
You must specify the path to the authentication configuration using the --authentication-config flag
on the API server. If you want to use command line flags instead of the configuration file, those will
continue to work as-is. To access new capabilities, such as configuring multiple authenticators or
setting multiple audiences for an issuer, switch to using the configuration file.
For Kubernetes v1.33, the structured authentication configuration file format
is beta-level, and the mechanism for using that configuration is also beta. Provided you didn't specifically
disable the StructuredAuthenticationConfiguration feature gate for your cluster,
you can turn on structured authentication by specifying the --authentication-config command line
argument to the kube-apiserver. An example of the structured authentication configuration file is shown below.
Note:
If you specify --authentication-config along with any of the --oidc-* command line arguments, this is
a misconfiguration. In this situation, the API server reports an error and then immediately exits.
If you want to switch to using structured authentication configuration, you have to remove the --oidc-*
command line arguments, and use the configuration file instead.
---
#
# CAUTION: this is an example configuration.
# Do not use this for your own cluster!
#
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
# list of authenticators to authenticate Kubernetes users using JWT compliant tokens.
# the maximum number of allowed authenticators is 64.
jwt:
- issuer:
    # url must be unique across all authenticators.
    # url must not conflict with issuer configured in --service-account-issuer.
    url: https://example.com # Same as --oidc-issuer-url.
    # discoveryURL, if specified, overrides the URL used to fetch discovery
    # information instead of using "{url}/.well-known/openid-configuration".
    # The exact value specified is used, so "/.well-known/openid-configuration"
    # must be included in discoveryURL if needed.
    #
    # The "issuer" field in the fetched discovery information must match the "issuer.url" field
    # in the AuthenticationConfiguration and will be used to validate the "iss" claim in the presented JWT.
    # This is for scenarios where the well-known and jwks endpoints are hosted at a different
    # location than the issuer (such as locally in the cluster).
    # discoveryURL must be different from url if specified and must be unique across all authenticators.
    discoveryURL: https://discovery.example.com/.well-known/openid-configuration
    # PEM encoded CA certificates used to validate the connection when fetching
    # discovery information. If not set, the system verifier will be used.
    # Same value as the content of the file referenced by the --oidc-ca-file flag.
    certificateAuthority: <PEM encoded CA certificates>
    # audiences is the set of acceptable audiences the JWT must be issued to.
    # At least one of the entries must match the "aud" claim in presented JWTs.
    audiences:
    - my-app # Same as --oidc-client-id.
    - my-other-app
    # this is required to be set to "MatchAny" when multiple audiences are specified.
    audienceMatchPolicy: MatchAny
  # rules applied to validate token claims to authenticate users.
  claimValidationRules:
    # Same as --oidc-required-claim key=value.
  - claim: hd
    requiredValue: example.com
    # Instead of claim and requiredValue, you can use expression to validate the claim.
    # expression is a CEL expression that evaluates to a boolean.
    # all the expressions must evaluate to true for validation to succeed.
  - expression: 'claims.hd == "example.com"'
    # Message customizes the error message seen in the API server logs when the validation fails.
    message: the hd claim must be set to example.com
  - expression: 'claims.exp - claims.nbf <= 86400'
    message: total token lifetime must not exceed 24 hours
  claimMappings:
    # username represents an option for the username attribute.
    # This is the only required attribute.
    username:
      # Same as --oidc-username-claim. Mutually exclusive with username.expression.
      claim: "sub"
      # Same as --oidc-username-prefix. Mutually exclusive with username.expression.
      # if username.claim is set, username.prefix is required.
      # Explicitly set it to "" if no prefix is desired.
      prefix: ""
      # Mutually exclusive with username.claim and username.prefix.
      # expression is a CEL expression that evaluates to a string.
      #
      # 1. If username.expression uses 'claims.email', then 'claims.email_verified' must be used in
      #    username.expression or extra[*].valueExpression or claimValidationRules[*].expression.
      #    An example claim validation rule expression that matches the validation automatically
      #    applied when username.claim is set to 'email' is 'claims.?email_verified.orValue(true) == true'.
      #    By explicitly comparing the value to true, we let type-checking see the result will be a boolean, and
      #    to make sure a non-boolean email_verified claim will be caught at runtime.
      # 2. If the username asserted based on username.expression is the empty string, the authentication
      #    request will fail.
      expression: 'claims.username + ":external-user"'
    # groups represents an option for the groups attribute.
    groups:
      # Same as --oidc-groups-claim. Mutually exclusive with groups.expression.
      claim: "sub"
      # Same as --oidc-groups-prefix. Mutually exclusive with groups.expression.
      # if groups.claim is set, groups.prefix is required.
      # Explicitly set it to "" if no prefix is desired.
      prefix: ""
      # Mutually exclusive with groups.claim and groups.prefix.
      # expression is a CEL expression that evaluates to a string or a list of strings.
      expression: 'claims.roles.split(",")'
    # uid represents an option for the uid attribute.
    uid:
      # Mutually exclusive with uid.expression.
      claim: 'sub'
      # Mutually exclusive with uid.claim
      # expression is a CEL expression that evaluates to a string.
      expression: 'claims.sub'
    # extra attributes to be added to the UserInfo object. Keys must be domain-prefix path and must be unique.
    extra:
      # key is a string to use as the extra attribute key.
      # key must be a domain-prefix path (e.g. example.org/foo). All characters before the first "/" must be a valid
      # subdomain as defined by RFC 1123. All characters trailing the first "/" must
      # be valid HTTP Path characters as defined by RFC 3986.
      # k8s.io, kubernetes.io and their subdomains are reserved for Kubernetes use and cannot be used.
      # key must be lowercase and unique across all extra attributes.
    - key: 'example.com/tenant'
      # valueExpression is a CEL expression that evaluates to a string or a list of strings.
      valueExpression: 'claims.tenant'
  # validation rules applied to the final user object.
  userValidationRules:
    # expression is a CEL expression that evaluates to a boolean.
    # all the expressions must evaluate to true for the user to be valid.
  - expression: "!user.username.startsWith('system:')"
    # Message customizes the error message seen in the API server logs when the validation fails.
    message: 'username cannot used reserved system:prefix'
  - expression: "user.groups.all(group, !group.startsWith('system:'))"
    message: 'groups cannot used reserved system:prefix'
Claim validation rule expression
jwt.claimValidationRules[i].expression represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of the token payload, organized into claims CEL variable.
claims is a map of claim names (as strings) to claim values (of any type).
User validation rule expression
jwt.userValidationRules[i].expression represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of userInfo, organized into user CEL variable.
Refer to the UserInfo
API documentation for the schema of user.
Claim mapping expression
jwt.claimMappings.username.expression, jwt.claimMappings.groups.expression, jwt.claimMappings.uid.expression and jwt.claimMappings.extra[i].valueExpression represent the expressions which will be evaluated by CEL.
CEL expressions have access to the contents of the token payload, organized into claims CEL variable.
claims is a map of claim names (as strings) to claim values (of any type).
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://example.com
    audiences:
    - my-app
  claimMappings:
    username:
      expression: 'claims.username + ":external-user"'
    groups:
      expression: 'claims.roles.split(",")'
    uid:
      expression: 'claims.sub'
    extra:
    - key: 'example.com/tenant'
      valueExpression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')" # the expression will evaluate to true, so validation will succeed.
    message: 'username cannot used reserved system:prefix'
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://example.com
    audiences:
    - my-app
  claimValidationRules:
  - expression: 'claims.hd == "example.com"' # the token below does not have this claim, so validation will fail.
    message: the hd claim must be set to example.com
  claimMappings:
    username:
      expression: 'claims.username + ":external-user"'
    groups:
      expression: 'claims.roles.split(",")'
    uid:
      expression: 'claims.sub'
    extra:
    - key: 'example.com/tenant'
      valueExpression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')" # the expression will evaluate to true, so validation will succeed.
    message: 'username cannot used reserved system:prefix'
The token with the above AuthenticationConfiguration will fail to authenticate because the
hd claim is not set to example.com. The API server will return a 401 Unauthorized error.
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://example.com
    audiences:
    - my-app
  claimValidationRules:
  - expression: 'claims.hd == "example.com"'
    message: the hd claim must be set to example.com
  claimMappings:
    username:
      expression: '"system:" + claims.username' # this will prefix the username with "system:" and will fail user validation.
    groups:
      expression: 'claims.roles.split(",")'
    uid:
      expression: 'claims.sub'
    extra:
    - key: 'example.com/tenant'
      valueExpression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')" # the username will be system:foo and expression will evaluate to false, so validation will fail.
    message: 'username cannot used reserved system:prefix'
The token with the above AuthenticationConfiguration will fail user validation because the mapped username starts with system:.
The API server will return a 401 Unauthorized error.
Limitations
Distributed claims do not work via CEL expressions.
Egress selector configuration is not supported for calls to issuer.url and issuer.discoveryURL.
Kubernetes does not provide an OpenID Connect Identity Provider.
You can use an existing public OpenID Connect Identity Provider (such as Google, or
others).
Or, you can run your own Identity Provider, such as dex,
Keycloak,
CloudFoundry UAA, or
Tremolo Security's OpenUnison.
For an identity provider to work with Kubernetes it must:
Support OpenID Connect discovery.
The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
If you're using the authentication configuration file, the identity provider doesn't need to publicly expose the discovery endpoint.
You can host the discovery endpoint at a different location than the issuer (such as locally in the cluster) and specify the
issuer.discoveryURL in the configuration file.
Run in TLS with non-obsolete ciphers
Have a CA signed certificate (even if the CA is not a commercial CA or is self signed)
A note about requirement #3 above, requiring a CA signed certificate. If you deploy your own
identity provider (as opposed to one of the cloud providers like Google or Microsoft) you MUST
have your identity provider's web server certificate signed by a certificate with the CA flag
set to TRUE, even if it is self signed. This is due to GoLang's TLS client implementation
being very strict to the standards around certificate validation. If you don't have a CA handy,
you can use the gencert script
from the Dex team to create a simple CA and a signed certificate and key pair. Or you can use
this similar script
that generates SHA256 certs with a longer life and larger key size.
Option 1 - OIDC Authenticator
The first option is to use the kubectl oidc authenticator, which sets the id_token as a bearer token
for all requests and refreshes the token once it expires. After you've logged into your provider, use
kubectl to add your id_token, refresh_token, client_id, and client_secret to configure the plugin.
Providers that don't return an id_token as part of their refresh token response aren't supported
by this plugin and should use "Option 2" below.
kubectl config set-credentials USER_NAME \
--auth-provider=oidc \
--auth-provider-arg=idp-issuer-url=( issuer url )\
--auth-provider-arg=client-id=( your client id )\
--auth-provider-arg=client-secret=( your client secret )\
--auth-provider-arg=refresh-token=( your refresh token )\
--auth-provider-arg=idp-certificate-authority=( path to your ca certificate )\
--auth-provider-arg=id-token=( your id_token )
As an example, running the below command after authenticating to your identity provider:
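A purely illustrative invocation with hypothetical values (substitute the values issued by your own provider):
kubectl config set-credentials jane.doe \
  --auth-provider=oidc \
  --auth-provider-arg=idp-issuer-url=https://idp.example.com \
  --auth-provider-arg=client-id=kubernetes \
  --auth-provider-arg=client-secret=REPLACE_WITH_CLIENT_SECRET \
  --auth-provider-arg=refresh-token=REPLACE_WITH_REFRESH_TOKEN \
  --auth-provider-arg=idp-certificate-authority=/etc/kubernetes/ssl/kc-ca.pem \
  --auth-provider-arg=id-token=REPLACE_WITH_ID_TOKEN
This stores the credentials under the jane.doe user entry in your kubeconfig.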
Once your id_token expires, kubectl will attempt to refresh your id_token using your refresh_token
and client_secret, storing the new values for the refresh_token and id_token in your .kube/config.
Option 2 - Use the --token Option
The kubectl command lets you pass in a token using the --token option.
Copy and paste the id_token into this option:
kubectl --token=eyJhbGciOiJSUzI1NiJ9.eyJpc3MiOiJodHRwczovL21sYi50cmVtb2xvLmxhbjo4MDQzL2F1dGgvaWRwL29pZGMiLCJhdWQiOiJrdWJlcm5ldGVzIiwiZXhwIjoxNDc0NTk2NjY5LCJqdGkiOiI2RDUzNXoxUEpFNjJOR3QxaWVyYm9RIiwiaWF0IjoxNDc0NTk2MzY5LCJuYmYiOjE0NzQ1OTYyNDksInN1YiI6Im13aW5kdSIsInVzZXJfcm9sZSI6WyJ1c2VycyIsIm5ldy1uYW1lc3BhY2Utdmlld2VyIl0sImVtYWlsIjoibXdpbmR1QG5vbW9yZWplZGkuY29tIn0.f2As579n9VNoaKzoF-dOQGmXkFKf1FMyNV0-va_B63jn-_n9LGSCca_6IVMP8pO-Zb4KvRqGyTP0r3HkHxYy5c81AnIh8ijarruczl-TK_yF5akjSTHFZD-0gRzlevBDiH8Q79NAr-ky0P4iIXS8lY9Vnjch5MF74Zx0c3alKJHJUnnpjIACByfF2SCaYzbWFMUNat-K1PaUk5-ujMBG7yYnr95xD-63n8CO8teGUAAEMx6zRjzfhnhbzX-ajwZLGwGUBT4WqjMs70-6a7_8gZmLZb2az1cZynkFRj2BaCkVT3A2RrjeEwZEtGXlMqKJ1_I2ulrOVsYx01_yD35-rw get nodes
Webhook Token Authentication
Webhook authentication is a hook for verifying bearer tokens.
--authentication-token-webhook-config-file a configuration file describing how to access the remote webhook service.
--authentication-token-webhook-cache-ttl how long to cache authentication decisions. Defaults to two minutes.
--authentication-token-webhook-version determines whether to use authentication.k8s.io/v1beta1 or authentication.k8s.io/v1 TokenReview objects to send/receive information from the webhook. Defaults to v1beta1.
The configuration file uses the kubeconfig
file format. Within the file, clusters refers to the remote service and
users refers to the API server webhook. An example would be:
# Kubernetes API version
apiVersion: v1
# kind of the API object
kind: Config
# clusters refers to the remote service.
clusters:
- name: name-of-remote-authn-service
  cluster:
    certificate-authority: /path/to/ca.pem # CA for verifying the remote service.
    server: https://authn.example.com/authenticate # URL of remote service to query. 'https' recommended for production.
# users refers to the API server's webhook configuration.
users:
- name: name-of-api-server
  user:
    client-certificate: /path/to/cert.pem # cert for the webhook plugin to use
    client-key: /path/to/key.pem # key matching the cert
# kubeconfig files require a context. Provide one for the API server.
current-context: webhook
contexts:
- context:
    cluster: name-of-remote-authn-service
    user: name-of-api-server
  name: webhook
When a client attempts to authenticate with the API server using a bearer token as discussed
above, the authentication webhook POSTs a JSON-serialized
TokenReview object containing the token to the remote service.
Note that webhook API objects are subject to the same versioning compatibility rules
as other Kubernetes API objects. Implementers should check the apiVersion field of the request to ensure correct deserialization,
and must respond with a TokenReview object of the same version as the request.
The Kubernetes API server defaults to sending authentication.k8s.io/v1beta1 token reviews for backwards compatibility.
To opt into receiving authentication.k8s.io/v1 token reviews, the API server must be started with --authentication-token-webhook-version=v1.
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "spec": {
    # Opaque bearer token sent to the API server
    "token": "014fbff9a07c...",
    # Optional list of the audience identifiers for the server the token was presented to.
    # Audience-aware token authenticators (for example, OIDC token authenticators)
    # should verify the token was intended for at least one of the audiences in this list,
    # and return the intersection of this list and the valid audiences for the token in the response status.
    # This ensures the token is valid to authenticate to the server it was presented to.
    # If no audiences are provided, the token should be validated to authenticate to the Kubernetes API server.
    "audiences": ["https://myserver.example.com", "https://myserver.internal.example.com"]
  }
}
{
  "apiVersion": "authentication.k8s.io/v1beta1",
  "kind": "TokenReview",
  "spec": {
    # Opaque bearer token sent to the API server
    "token": "014fbff9a07c...",
    # Optional list of the audience identifiers for the server the token was presented to.
    # Audience-aware token authenticators (for example, OIDC token authenticators)
    # should verify the token was intended for at least one of the audiences in this list,
    # and return the intersection of this list and the valid audiences for the token in the response status.
    # This ensures the token is valid to authenticate to the server it was presented to.
    # If no audiences are provided, the token should be validated to authenticate to the Kubernetes API server.
    "audiences": ["https://myserver.example.com", "https://myserver.internal.example.com"]
  }
}
The remote service is expected to fill the status field of the request to indicate the success of the login.
The response body's spec field is ignored and may be omitted.
The remote service must return a response using the same TokenReview API version that it received.
A successful validation of the bearer token would return:
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "status": {
    "authenticated": true,
    "user": {
      # Required
      "username": "janedoe@example.com",
      # Optional
      "uid": "42",
      # Optional group memberships
      "groups": ["developers", "qa"],
      # Optional additional information provided by the authenticator.
      # This should not contain confidential data, as it can be recorded in logs
      # or API objects, and is made available to admission webhooks.
      "extra": {
        "extrafield1": ["extravalue1", "extravalue2"]
      }
    },
    # Optional list audience-aware token authenticators can return,
    # containing the audiences from the `spec.audiences` list for which the provided token was valid.
    # If this is omitted, the token is considered to be valid to authenticate to the Kubernetes API server.
    "audiences": ["https://myserver.example.com"]
  }
}
{
  "apiVersion": "authentication.k8s.io/v1beta1",
  "kind": "TokenReview",
  "status": {
    "authenticated": true,
    "user": {
      # Required
      "username": "janedoe@example.com",
      # Optional
      "uid": "42",
      # Optional group memberships
      "groups": ["developers", "qa"],
      # Optional additional information provided by the authenticator.
      # This should not contain confidential data, as it can be recorded in logs
      # or API objects, and is made available to admission webhooks.
      "extra": {
        "extrafield1": ["extravalue1", "extravalue2"]
      }
    },
    # Optional list audience-aware token authenticators can return,
    # containing the audiences from the `spec.audiences` list for which the provided token was valid.
    # If this is omitted, the token is considered to be valid to authenticate to the Kubernetes API server.
    "audiences": ["https://myserver.example.com"]
  }
}
An unsuccessful request would return:
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "status": {
    "authenticated": false,
    # Optionally include details about why authentication failed.
    # If no error is provided, the API will return a generic Unauthorized message.
    # The error field is ignored when authenticated=true.
    "error": "Credentials are expired"
  }
}
{
  "apiVersion": "authentication.k8s.io/v1beta1",
  "kind": "TokenReview",
  "status": {
    "authenticated": false,
    # Optionally include details about why authentication failed.
    # If no error is provided, the API will return a generic Unauthorized message.
    # The error field is ignored when authenticated=true.
    "error": "Credentials are expired"
  }
}
Authenticating Proxy
The API server can be configured to identify users from request header values, such as X-Remote-User.
It is designed for use in combination with an authenticating proxy, which sets the request header value.
--requestheader-username-headers Required, case-insensitive. Header names to check, in order,
for the user identity. The first header containing a value is used as the username.
--requestheader-group-headers 1.6+. Optional, case-insensitive. "X-Remote-Group" is suggested.
Header names to check, in order, for the user's groups. All values in all specified headers are used as group names.
--requestheader-extra-headers-prefix 1.6+. Optional, case-insensitive. "X-Remote-Extra-" is suggested.
Header prefixes to look for to determine extra information about the user (typically used by the configured authorization plugin).
Any headers beginning with any of the specified prefixes have the prefix removed.
The remainder of the header name is lowercased and percent-decoded
and becomes the extra key, and the header value is the extra value.
Note:
Prior to 1.11.3 (and 1.10.7, 1.9.11), the extra key could only contain characters which
were legal in HTTP header labels.
In order to prevent header spoofing, the authenticating proxy is required to present a valid client
certificate to the API server for validation against the specified CA before the request headers are
checked. WARNING: do not reuse a CA that is used in a different context unless you understand
the risks and the mechanisms to protect the CA's usage.
--requestheader-client-ca-file Required. PEM-encoded certificate bundle. A valid client certificate
must be presented and validated against the certificate authorities in the specified file before the
request headers are checked for user names.
--requestheader-allowed-names Optional. List of Common Name values (CNs). If set, a valid client
certificate with a CN in the specified list must be presented before the request headers are checked
for user names. If empty, any CN is allowed.
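As an illustration, the flags above might be combined on the kube-apiserver command line as follows; the header names match the suggestions given above, while the CA file path and the allowed CN "front-proxy-client" are example values, not requirements:
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
--requestheader-allowed-names=front-proxy-client \
--requestheader-username-headers=X-Remote-User \
--requestheader-group-headers=X-Remote-Group \
--requestheader-extra-headers-prefix=X-Remote-Extra-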
Anonymous requests
When enabled, requests that are not rejected by other configured authentication methods are
treated as anonymous requests, and given a username of system:anonymous and a group of
system:unauthenticated.
For example, on a server with token authentication configured, and anonymous access enabled,
a request providing an invalid bearer token would receive a 401 Unauthorized error.
A request providing no bearer token would be treated as an anonymous request.
In 1.5.1-1.5.x, anonymous access is disabled by default, and can be enabled by
passing the --anonymous-auth=true option to the API server.
In 1.6+, anonymous access is enabled by default if an authorization mode other than AlwaysAllow
is used, and can be disabled by passing the --anonymous-auth=false option to the API server.
Starting in 1.6, the ABAC and RBAC authorizers require explicit authorization of the
system:anonymous user or the system:unauthenticated group, so legacy policy rules
that grant access to the * user or * group do not include anonymous users.
Anonymous Authenticator Configuration
FEATURE STATE: Kubernetes v1.32 [beta] (enabled by default: true)
The AuthenticationConfiguration can be used to configure the anonymous
authenticator. If you set the anonymous field in the AuthenticationConfiguration
file then you cannot set the --anonymous-auth flag.
The main advantage of configuring anonymous authenticator using the authentication
configuration file is that in addition to enabling and disabling anonymous authentication
you can also configure which endpoints support anonymous authentication.
A sample authentication configuration file is below:
---
#
# CAUTION: this is an example configuration.
# Do not use this for your own cluster!
#
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
anonymous:
  enabled: true
  conditions:
  - path: /livez
  - path: /readyz
  - path: /healthz
In the configuration above, only the /livez, /readyz and /healthz endpoints
are reachable by anonymous requests. Any other endpoint is not reachable
anonymously, even if requests to it are allowed by the RBAC configuration.
User impersonation
A user can act as another user through impersonation headers. These let requests
manually override the user info a request authenticates as. For example, an admin
could use this feature to debug an authorization policy by temporarily
impersonating another user and seeing if a request was denied.
Impersonation requests first authenticate as the requesting user, then switch
to the impersonated user info.
A user makes an API call with their credentials and impersonation headers.
API server authenticates the user.
API server ensures the authenticated users have impersonation privileges.
Request user info is replaced with impersonation values.
Request is evaluated, authorization acts on impersonated user info.
The following HTTP headers can be used to perform an impersonation request:
Impersonate-User: The username to act as.
Impersonate-Group: A group name to act as. Can be provided multiple times to set multiple groups.
Optional. Requires "Impersonate-User".
Impersonate-Extra-( extra name ): A dynamic header used to associate extra fields with the user.
Optional. Requires "Impersonate-User". In order to be preserved consistently, ( extra name )
must be lower-case, and any characters which aren't legal in HTTP header labels
MUST be utf8 and percent-encoded.
Impersonate-Uid: A unique identifier that represents the user being impersonated. Optional.
Requires "Impersonate-User". Kubernetes does not impose any format requirements on this string.
Note:
Prior to 1.11.3 (and 1.10.7, 1.9.11), ( extra name ) could only contain characters which
were legal in HTTP header labels.
Note:
Impersonate-Uid is only available in versions 1.22.0 and higher.
An example of the impersonation headers used when impersonating a user with groups:
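A sketch of such request headers, using only the header names documented above (the user, group, extra, and UID values are illustrative):
Impersonate-User: jane.doe@example.com
Impersonate-Group: developers
Impersonate-Group: admins
Impersonate-Extra-dn: cn=jane,ou=engineers,dc=example,dc=com
Impersonate-Extra-acme.com%2Fproject: some-project
Impersonate-Extra-scopes: view
Impersonate-Extra-scopes: development
Impersonate-Uid: 06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b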
To impersonate a user, group, user identifier (UID) or extra fields, the impersonating user must
have the ability to perform the "impersonate" verb on the kind of attribute
being impersonated ("user", "group", "uid", etc.). For clusters that enable the RBAC
authorization plugin, the following ClusterRole encompasses the rules needed to
set user and group impersonation headers:
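A minimal sketch of such a ClusterRole; the name "impersonator" is illustrative, and the rules cover only user and group impersonation:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: impersonator
rules:
- apiGroups: [""]
  resources: ["users", "groups"]
  verbs: ["impersonate"]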
For impersonation, extra fields and impersonated UIDs are both under the "authentication.k8s.io" apiGroup.
Extra fields are evaluated as sub-resources of the resource "userextras". To
allow a user to use impersonation headers for the extra field "scopes" and
for UIDs, a user should be granted the following role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: scopes-and-uid-impersonator
rules:
# Can set "Impersonate-Extra-scopes" header and the "Impersonate-Uid" header.
- apiGroups: ["authentication.k8s.io"]
  resources: ["userextras/scopes", "uids"]
  verbs: ["impersonate"]
The values of impersonation headers can also be restricted by limiting the set
of resourceNames a resource can take.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: limited-impersonator
rules:
# Can impersonate the user "jane.doe@example.com"
- apiGroups: [""]
  resources: ["users"]
  verbs: ["impersonate"]
  resourceNames: ["jane.doe@example.com"]
# Can impersonate the groups "developers" and "admins"
- apiGroups: [""]
  resources: ["groups"]
  verbs: ["impersonate"]
  resourceNames: ["developers", "admins"]
# Can impersonate the extras field "scopes" with the values "view" and "development"
- apiGroups: ["authentication.k8s.io"]
  resources: ["userextras/scopes"]
  verbs: ["impersonate"]
  resourceNames: ["view", "development"]
# Can impersonate the uid "06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b"
- apiGroups: ["authentication.k8s.io"]
  resources: ["uids"]
  verbs: ["impersonate"]
  resourceNames: ["06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b"]
Note:
Impersonating a user or group allows you to perform any action as if you were that user or group;
for that reason, impersonation is not namespace scoped.
If you want to allow impersonation using Kubernetes RBAC,
this requires using a ClusterRole and a ClusterRoleBinding,
not a Role and RoleBinding.
client-go credential plugins
FEATURE STATE: Kubernetes v1.22 [stable]
k8s.io/client-go and tools using it such as kubectl and kubelet are able to execute an
external command to receive user credentials.
This feature is intended for client side integrations with authentication protocols not natively
supported by k8s.io/client-go (LDAP, Kerberos, OAuth2, SAML, etc.). The plugin implements the
protocol specific logic, then returns opaque credentials to use. Almost all credential plugin
use cases require a server side component with support for the webhook token authenticator
to interpret the credential format produced by the client plugin.
Note:
Earlier versions of kubectl included built-in support for authenticating to AKS and GKE, but this is no longer present.
Example use case
In a hypothetical use case, an organization would run an external service that exchanges LDAP credentials
for user specific, signed tokens. The service would also be capable of responding to webhook token
authenticator requests to validate the tokens. Users would be required
to install a credential plugin on their workstation.
To authenticate against the API:
The user issues a kubectl command.
Credential plugin prompts the user for LDAP credentials, exchanges credentials with external service for a token.
Credential plugin returns token to client-go, which uses it as a bearer token against the API server.
apiVersion: v1
kind: Config
users:
- name: my-user
  user:
    exec:
      # Command to execute. Required.
      command: "example-client-go-exec-plugin"

      # API version to use when decoding the ExecCredentials resource. Required.
      #
      # The API version returned by the plugin MUST match the version listed here.
      #
      # To integrate with tools that support multiple versions (such as client.authentication.k8s.io/v1beta1),
      # set an environment variable, pass an argument to the tool that indicates which version the exec plugin expects,
      # or read the version from the ExecCredential object in the KUBERNETES_EXEC_INFO environment variable.
      apiVersion: "client.authentication.k8s.io/v1"

      # Environment variables to set when executing the plugin. Optional.
      env:
      - name: "FOO"
        value: "bar"

      # Arguments to pass when executing the plugin. Optional.
      args:
      - "arg1"
      - "arg2"

      # Text shown to the user when the executable doesn't seem to be present. Optional.
      installHint: |
        example-client-go-exec-plugin is required to authenticate
        to the current cluster. It can be installed:

        On macOS: brew install example-client-go-exec-plugin

        On Ubuntu: apt-get install example-client-go-exec-plugin

        On Fedora: dnf install example-client-go-exec-plugin

        ...

      # Whether or not to provide cluster information, which could potentially contain
      # very large CA data, to this exec plugin as a part of the KUBERNETES_EXEC_INFO
      # environment variable.
      provideClusterInfo: true

      # The contract between the exec plugin and the standard input I/O stream. If the
      # contract cannot be satisfied, this plugin will not be run and an error will be
      # returned. Valid values are "Never" (this exec plugin never uses standard input),
      # "IfAvailable" (this exec plugin wants to use standard input if it is available),
      # or "Always" (this exec plugin requires standard input to function). Required.
      interactiveMode: Never
clusters:
- name: my-cluster
  cluster:
    server: "https://172.17.4.100:6443"
    certificate-authority: "/etc/kubernetes/ca.pem"
    extensions:
    - name: client.authentication.k8s.io/exec # reserved extension name for per cluster exec config
      extension:
        arbitrary: config
        this: can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfo
        you: ["can", "put", "anything", "here"]
contexts:
- name: my-cluster
  context:
    cluster: my-cluster
    user: my-user
current-context: my-cluster
apiVersion: v1
kind: Config
users:
- name: my-user
  user:
    exec:
      # Command to execute. Required.
      command: "example-client-go-exec-plugin"

      # API version to use when decoding the ExecCredentials resource. Required.
      #
      # The API version returned by the plugin MUST match the version listed here.
      #
      # To integrate with tools that support multiple versions (such as client.authentication.k8s.io/v1),
      # set an environment variable, pass an argument to the tool that indicates which version the exec plugin expects,
      # or read the version from the ExecCredential object in the KUBERNETES_EXEC_INFO environment variable.
      apiVersion: "client.authentication.k8s.io/v1beta1"

      # Environment variables to set when executing the plugin. Optional.
      env:
      - name: "FOO"
        value: "bar"

      # Arguments to pass when executing the plugin. Optional.
      args:
      - "arg1"
      - "arg2"

      # Text shown to the user when the executable doesn't seem to be present. Optional.
      installHint: |
        example-client-go-exec-plugin is required to authenticate
        to the current cluster. It can be installed:

        On macOS: brew install example-client-go-exec-plugin

        On Ubuntu: apt-get install example-client-go-exec-plugin

        On Fedora: dnf install example-client-go-exec-plugin

        ...

      # Whether or not to provide cluster information, which could potentially contain
      # very large CA data, to this exec plugin as a part of the KUBERNETES_EXEC_INFO
      # environment variable.
      provideClusterInfo: true

      # The contract between the exec plugin and the standard input I/O stream. If the
      # contract cannot be satisfied, this plugin will not be run and an error will be
      # returned. Valid values are "Never" (this exec plugin never uses standard input),
      # "IfAvailable" (this exec plugin wants to use standard input if it is available),
      # or "Always" (this exec plugin requires standard input to function). Optional.
      # Defaults to "IfAvailable".
      interactiveMode: Never
clusters:
- name: my-cluster
  cluster:
    server: "https://172.17.4.100:6443"
    certificate-authority: "/etc/kubernetes/ca.pem"
    extensions:
    - name: client.authentication.k8s.io/exec # reserved extension name for per cluster exec config
      extension:
        arbitrary: config
        this: can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfo
        you: ["can", "put", "anything", "here"]
contexts:
- name: my-cluster
  context:
    cluster: my-cluster
    user: my-user
current-context: my-cluster
Relative command paths are interpreted as relative to the directory of the config file. If
KUBECONFIG is set to /home/jane/kubeconfig and the exec command is ./bin/example-client-go-exec-plugin,
the binary /home/jane/bin/example-client-go-exec-plugin is executed.
- name: my-user
  user:
    exec:
      # Path relative to the directory of the kubeconfig
      command: "./bin/example-client-go-exec-plugin"
      apiVersion: "client.authentication.k8s.io/v1"
      interactiveMode: Never
Input and output formats
The executed command prints an ExecCredential object to stdout. k8s.io/client-go
authenticates against the Kubernetes API using the returned credentials in the status.
The executed command is passed an ExecCredential object as input via the KUBERNETES_EXEC_INFO
environment variable. This input contains helpful information like the expected API version
of the returned ExecCredential object and whether or not the plugin can use stdin to interact
with the user.
When run from an interactive session (i.e., a terminal), stdin can be exposed directly
to the plugin. Plugins should use the spec.interactive field of the input
ExecCredential object from the KUBERNETES_EXEC_INFO environment variable in order to
determine if stdin has been provided. A plugin's stdin requirements (i.e., whether
stdin is optional, strictly required, or never used in order for the plugin
to run successfully) are declared via the user.exec.interactiveMode field in the
kubeconfig
(see table below for valid values). The user.exec.interactiveMode field is optional
in client.authentication.k8s.io/v1beta1 and required in client.authentication.k8s.io/v1.
interactiveMode values
Never: This exec plugin never needs to use standard input, and therefore the exec plugin will be run regardless of whether standard input is available for user input.
IfAvailable: This exec plugin would like to use standard input if it is available, but can still operate if standard input is not available. Therefore, the exec plugin will be run regardless of whether stdin is available for user input. If standard input is available for user input, then it will be provided to this exec plugin.
Always: This exec plugin requires standard input in order to run, and therefore the exec plugin will only be run if standard input is available for user input. If standard input is not available for user input, then the exec plugin will not be run and an error will be returned by the exec plugin runner.
To use bearer token credentials, the plugin returns a token in the status of the
ExecCredential.
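A minimal sketch of such a response printed to stdout by the plugin; the token value is a placeholder:
{
  "apiVersion": "client.authentication.k8s.io/v1",
  "kind": "ExecCredential",
  "status": {
    "token": "my-bearer-token"
  }
}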
Alternatively, a PEM-encoded client certificate and key can be returned to use TLS client auth.
If the plugin returns a different certificate and key on a subsequent call, k8s.io/client-go
will close existing connections with the server to force a new TLS handshake.
If specified, clientKeyData and clientCertificateData must both be present.
clientCertificateData may contain additional intermediate certificates to send to the server.
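A sketch of a TLS client credential response, with the certificate and key contents elided:
{
  "apiVersion": "client.authentication.k8s.io/v1",
  "kind": "ExecCredential",
  "status": {
    "clientCertificateData": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
    "clientKeyData": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
  }
}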
Optionally, the response can include the expiry of the credential formatted as a
RFC 3339 timestamp.
Presence or absence of an expiry has the following impact:
If an expiry is included, the bearer token and TLS credentials are cached until
the expiry time is reached, or if the server responds with a 401 HTTP status code,
or when the process exits.
If an expiry is omitted, the bearer token and TLS credentials are cached until
the server responds with a 401 HTTP status code or until the process exits.
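For example, a response that includes an expiry might look like the following sketch; the expirationTimestamp field carries the RFC 3339 timestamp, and the token value is a placeholder:
{
  "apiVersion": "client.authentication.k8s.io/v1",
  "kind": "ExecCredential",
  "status": {
    "token": "my-bearer-token",
    "expirationTimestamp": "2030-03-05T17:30:20Z"
  }
}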
To enable the exec plugin to obtain cluster-specific information, set provideClusterInfo on the user.exec
field in the kubeconfig.
The plugin will then be supplied this cluster-specific information in the KUBERNETES_EXEC_INFO environment variable.
Information from this environment variable can be used to perform cluster-specific
credential acquisition logic.
The following ExecCredential manifest describes a cluster information sample.
{
"apiVersion": "client.authentication.k8s.io/v1",
"kind": "ExecCredential",
"spec": {
"cluster": {
"server": "https://172.17.4.100:6443",
"certificate-authority-data": "LS0t...",
"config": {
"arbitrary": "config",
"this": "can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfo",
"you": ["can", "put", "anything", "here"]
}
},
"interactive": true }
}
{
"apiVersion": "client.authentication.k8s.io/v1beta1",
"kind": "ExecCredential",
"spec": {
"cluster": {
"server": "https://172.17.4.100:6443",
"certificate-authority-data": "LS0t...",
"config": {
"arbitrary": "config",
"this": "can be provided via the KUBERNETES_EXEC_INFO environment variable upon setting provideClusterInfo",
"you": ["can", "put", "anything", "here"]
}
},
"interactive": true }
}
API access to authentication information for a client
FEATURE STATE: Kubernetes v1.28 [stable]
If your cluster has the API enabled, you can use the SelfSubjectReview API to find out
how your Kubernetes cluster maps your authentication information to identify you as a client.
This works whether you are authenticating as a user (typically representing
a real person) or as a ServiceAccount.
SelfSubjectReview objects do not have any configurable fields. On receiving a request,
the Kubernetes API server fills the status with the user attributes and returns it to the user.
Request example (the body would be a SelfSubjectReview):
POST /apis/authentication.k8s.io/v1/selfsubjectreviews
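A response might look like the following sketch; the user attributes shown are taken from the sample output further below:
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "SelfSubjectReview",
  "status": {
    "userInfo": {
      "username": "jane.doe",
      "uid": "b79dbf30-0c6a-11ed-861d-0242ac120002",
      "groups": ["students", "teachers", "system:authenticated"],
      "extra": {
        "skills": ["reading", "learning"],
        "subjects": ["math", "sports"]
      }
    }
  }
}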
For convenience, the kubectl auth whoami command is available. Executing this command
produces output similar to the following (the user attributes shown will differ for your user):
Simple output example
ATTRIBUTE VALUE
Username jane.doe
Groups [system:authenticated]
Complex example including extra attributes
ATTRIBUTE VALUE
Username jane.doe
UID b79dbf30-0c6a-11ed-861d-0242ac120002
Groups [students teachers system:authenticated]
Extra: skills [reading learning]
Extra: subjects [math sports]
By providing the output flag, it is also possible to print the JSON or YAML representation of the result:
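For example, a sketch of the JSON form, trimmed to the attributes from the simple example above:
kubectl auth whoami -o json

{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "SelfSubjectReview",
  "status": {
    "userInfo": {
      "username": "jane.doe",
      "groups": ["system:authenticated"]
    }
  }
}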
The Kubernetes API server fills the userInfo after all authentication mechanisms are applied,
including impersonation.
If you, or an authentication proxy, make a SelfSubjectReview using impersonation,
you see the user details and properties for the user that was impersonated.
By default, all authenticated users can create SelfSubjectReview objects when the APISelfSubjectReview
feature is enabled. It is allowed by the system:basic-user cluster role.
Note:
You can only make SelfSubjectReview requests if:
the APISelfSubjectReview feature gate
is enabled for your cluster (not needed for Kubernetes 1.33, but older
Kubernetes versions might not offer this feature gate, or might default it to be off)
(if you are running a version of Kubernetes older than v1.28) the API server for your
cluster has the authentication.k8s.io/v1alpha1 or authentication.k8s.io/v1beta1 API group
enabled.
Bootstrap tokens are simple bearer tokens that are meant to be used when
creating new clusters or joining new nodes to an existing cluster.
They were built to support kubeadm, but can be used in other contexts
for users who wish to start clusters without kubeadm. They are also built to
work, via RBAC policy, with the
kubelet TLS Bootstrapping system.
Bootstrap Tokens Overview
Bootstrap Tokens are defined as Secrets of a specific type
(bootstrap.kubernetes.io/token) that live in the kube-system
namespace. These Secrets are then read by the Bootstrap Authenticator in the
API Server. Expired tokens are removed with the TokenCleaner controller in the
Controller Manager. The tokens are also used to create a signature for a
specific ConfigMap used in a "discovery" process through a BootstrapSigner
controller.
Token Format
Bootstrap Tokens take the form of abcdef.0123456789abcdef.
More formally, they must match the regular expression [a-z0-9]{6}\.[a-z0-9]{16}.
The first part of the token is the "Token ID" and is considered public
information. It is used when referring to a token without leaking the secret
part used for authentication. The second part is the "Token Secret" and should
only be shared with trusted parties.
Enabling Bootstrap Token Authentication
The Bootstrap Token authenticator can be enabled using the following flag on the
API server:
--enable-bootstrap-token-auth
When enabled, bootstrapping tokens can be used as bearer token credentials to
authenticate requests against the API server.
Authorization: Bearer 07401b.f395accd246ae52d
Tokens authenticate as the username system:bootstrap:<token id> and are members
of the group system:bootstrappers.
Additional groups may be specified in the token's Secret.
Expired tokens can be deleted automatically by enabling the tokencleaner
controller on the controller manager.
--controllers=*,tokencleaner
Bootstrap Token Secret Format
Each valid token is backed by a secret in the kube-system namespace. You can
find the full design doc
here.
Here is what the secret looks like.
apiVersion: v1
kind: Secret
metadata:
  # Name MUST be of form "bootstrap-token-<token id>"
  name: bootstrap-token-07401b
  namespace: kube-system
# Type MUST be 'bootstrap.kubernetes.io/token'
type: bootstrap.kubernetes.io/token
stringData:
  # Human readable description. Optional.
  description: "The default bootstrap token generated by 'kubeadm init'."
  # Token ID and secret. Required.
  token-id: 07401b
  token-secret: f395accd246ae52d
  # Expiration. Optional.
  expiration: 2017-03-10T03:22:11Z
  # Allowed usages.
  usage-bootstrap-authentication: "true"
  usage-bootstrap-signing: "true"
  # Extra groups to authenticate the token as. Must start with "system:bootstrappers:"
  auth-extra-groups: system:bootstrappers:worker,system:bootstrappers:ingress
The type of the secret must be bootstrap.kubernetes.io/token and the name must
be bootstrap-token-<token id>. It must also exist in the kube-system namespace.
The usage-bootstrap-* members indicate what this secret is intended to be used for.
A value must be set to true to be enabled.
usage-bootstrap-authentication indicates that the token can be used to
authenticate to the API server as a bearer token.
usage-bootstrap-signing indicates that the token may be used to sign the
cluster-info ConfigMap as described below.
The expiration field controls the expiry of the token. Expired tokens are
rejected when used for authentication and ignored during ConfigMap signing.
The expiry value is encoded as an absolute UTC time using RFC3339. Enable the
tokencleaner controller to automatically delete expired tokens.
Token Management with kubeadm
You can use the kubeadm tool to manage tokens on a running cluster. See the
kubeadm token docs for details.
ConfigMap Signing
In addition to authentication, the tokens can be used to sign a ConfigMap.
This is used early in a cluster bootstrap process before the client trusts the API
server. The signed ConfigMap can be authenticated by the shared token.
Enable ConfigMap signing by enabling the bootstrapsigner controller on the
Controller Manager.
--controllers=*,bootstrapsigner
The ConfigMap that is signed is cluster-info in the kube-public namespace.
The typical flow is that a client reads this ConfigMap while unauthenticated and
ignoring TLS errors. It then validates the payload of the ConfigMap by looking
at a signature embedded in the ConfigMap.
The kubeconfig member of the ConfigMap is a config file with only the cluster
information filled out. The key thing being communicated here is the
certificate-authority-data. This may be expanded in the future.
The signature is a JWS signature using the "detached" mode. To validate the
signature, the user should encode the kubeconfig payload according to JWS
rules (base64 encoded while discarding any trailing =). That encoded payload
is then used to form a whole JWS by inserting it between the 2 dots. You can
verify the JWS using the HS256 scheme (HMAC-SHA256) with the full token (e.g.
07401b.f395accd246ae52d) as the shared secret. Users must verify that HS256
is used.
Warning:
Any party with a bootstrapping token can create a valid signature for that
token. When using ConfigMap signing it's discouraged to share the same token with
many clients, since a compromised client can potentially man-in-the-middle another
client relying on the signature to bootstrap TLS trust.
Details of Kubernetes authorization mechanisms and supported authorization modes.
Kubernetes authorization takes place following
authentication.
Usually, a client making a request must be authenticated (logged in) before its
request can be allowed; however, Kubernetes also allows anonymous requests in
some circumstances.
Kubernetes authorization of API requests takes place within the API server.
The API server evaluates all of the request attributes against all policies,
potentially also consulting external services, and then allows or denies the
request.
All parts of an API request must be allowed by some authorization
mechanism in order to proceed. In other words: access is denied by default.
Note:
Access controls and policies that
depend on specific fields of specific kinds of objects are handled by
admission controllers.
Kubernetes admission control happens after authorization has completed (and,
therefore, only when the authorization decision was to allow the request).
When multiple authorization modules are configured,
each is checked in sequence.
If any authorizer approves or denies a request, that decision is immediately
returned and no other authorizer is consulted. If all modules have no opinion
on the request, then the request is denied. An overall deny verdict means that
the API server rejects the request and responds with an HTTP 403 (Forbidden)
status.
Request attributes used in authorization
Kubernetes reviews only the following API request attributes:
user - The user string provided during authentication.
group - The list of group names to which the authenticated user belongs.
extra - A map of arbitrary string keys to string values, provided by the authentication layer.
API - Indicates whether the request is for an API resource.
Request path - Path to miscellaneous non-resource endpoints like /api or /healthz.
API request verb - API verbs like get, list, create, update, patch, watch, delete, and deletecollection are used for resource requests. To determine the request verb for a resource API endpoint, see request verbs and authorization.
HTTP request verb - Lowercased HTTP methods like get, post, put, and delete are used for non-resource requests.
Resource - The ID or name of the resource that is being accessed (for resource requests only) -- For resource requests using get, update, patch, and delete verbs, you must provide the resource name.
Subresource - The subresource that is being accessed (for resource requests only).
Namespace - The namespace of the object that is being accessed (for namespaced resource requests only).
API group - The API Group being accessed (for resource requests only). An empty string designates the core API group.
Request verbs and authorization
Non-resource requests
Requests to endpoints other than /api/v1/... or /apis/<group>/<version>/...
are considered non-resource requests, and use the lower-cased HTTP method of the request as the verb.
For example, making a GET request using HTTP to endpoints such as /api or /healthz would use get as the verb.
Resource requests
To determine the request verb for a resource API endpoint, Kubernetes maps the HTTP verb
used and considers whether or not the request acts on an individual resource or on a
collection of resources:
HTTP verb and the corresponding request verb:
POST: create
GET, HEAD: get (for individual resources), list (for collections, including full object content), watch (for watching an individual resource or collection of resources)
PUT: update
PATCH: patch
DELETE: delete (for individual resources), deletecollection (for collections)
The get, list and watch verbs can all return the full details of a resource. In
terms of access to the returned data they are equivalent. For example, list on secrets
will reveal the data attributes of any returned resources.
Kubernetes sometimes checks authorization for additional permissions using specialized verbs. For example:
bind and escalate verbs on roles and clusterroles resources in the rbac.authorization.k8s.io API group.
Authorization context
Kubernetes expects attributes that are common to REST API requests. This means
that Kubernetes authorization works with existing organization-wide or
cloud-provider-wide access control systems which may handle other APIs besides
the Kubernetes API.
Authorization modes
The Kubernetes API server may authorize a request using one of several authorization modes:
AlwaysAllow
This mode allows all requests, which brings security risks. Use this authorization mode only if you do not require authorization for your API requests (for example, for testing).
AlwaysDeny
This mode blocks all requests. Use this authorization mode only for testing.
Kubernetes ABAC mode defines an access control paradigm whereby access rights are granted to users through the use of policies which combine attributes together. The policies can use any type of attributes (user attributes, resource attributes, object, environment attributes, etc).
Kubernetes RBAC is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise. In this context, access is the ability of an individual user to perform a specific task, such as view, create, or modify a file. In this mode, Kubernetes uses the rbac.authorization.k8s.io API group to drive authorization decisions, allowing you to dynamically configure permission policies through the Kubernetes API.
Node
A special-purpose authorization mode that grants permissions to kubelets based on the pods they are scheduled to run. To learn more about the Node authorization mode, see Node Authorization.
Webhook
Kubernetes webhook mode for authorization makes a synchronous HTTP callout, blocking the request until the remote HTTP service responds to the query. You can write your own software to handle the callout, or use solutions from the ecosystem.
You should not use the AlwaysAllow mode on a Kubernetes cluster where the API server
is reachable from the public internet.
The system:masters group
The system:masters group is a built-in Kubernetes group that grants unrestricted
access to the API server. Any user assigned to this group has full cluster administrator
privileges, bypassing any authorization restrictions imposed by the RBAC or Webhook mechanisms.
Avoid adding users
to this group. If you do need to grant a user cluster-admin rights, you can create a
ClusterRoleBinding
to the built-in cluster-admin ClusterRole.
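A sketch of such a binding; the binding name and the user "admin-user" are illustrative, while cluster-admin is the built-in ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user-cluster-admin
subjects:
- kind: User
  name: admin-user # illustrative user name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io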
You have to pick one of the two configuration approaches; setting both --authorization-config
path and configuring an authorization webhook using the --authorization-mode and
--authorization-webhook-* command line arguments is not allowed.
If you try this, the API server reports an error message during startup, then exits immediately.
The configuration file approach even allows you to specify
CEL rules to pre-filter requests before they are dispatched
to webhooks, helping you to prevent unnecessary invocations. The API server also automatically
reloads the authorizer chain when the configuration file is modified.
You specify the path to the authorization configuration using the
--authorization-config command line argument.
If you want to use command line arguments instead of a configuration file, that's also a valid and supported approach.
Some authorization capabilities (for example: multiple webhooks, webhook failure policy, and pre-filter rules)
are only available if you use an authorization configuration file.
Example configuration
---
#
# DO NOT USE THE CONFIG AS IS. THIS IS AN EXAMPLE.
#
apiVersion: apiserver.config.k8s.io/v1
kind: AuthorizationConfiguration
authorizers:
  - type: Webhook
    # Name used to describe the authorizer
    # This is explicitly used in monitoring machinery for metrics
    # Note:
    #   - Validation for this field is similar to how K8s labels are validated today.
    # Required, with no default
    name: webhook
    webhook:
      # The duration to cache 'authorized' responses from the webhook
      # authorizer.
      # Same as setting `--authorization-webhook-cache-authorized-ttl` flag
      # Default: 5m0s
      authorizedTTL: 30s
      # The duration to cache 'unauthorized' responses from the webhook
      # authorizer.
      # Same as setting `--authorization-webhook-cache-unauthorized-ttl` flag
      # Default: 30s
      unauthorizedTTL: 30s
      # Timeout for the webhook request
      # Maximum allowed is 30s.
      # Required, with no default.
      timeout: 3s
      # The API version of the authorization.k8s.io SubjectAccessReview to
      # send to and expect from the webhook.
      # Same as setting `--authorization-webhook-version` flag
      # Required, with no default
      # Valid values: v1beta1, v1
      subjectAccessReviewVersion: v1
      # MatchConditionSubjectAccessReviewVersion specifies the SubjectAccessReview
      # version the CEL expressions are evaluated against
      # Valid values: v1
      # Required, no default value
      matchConditionSubjectAccessReviewVersion: v1
      # Controls the authorization decision when a webhook request fails to
      # complete or returns a malformed response or errors evaluating
      # matchConditions.
      # Valid values:
      #   - NoOpinion: continue to subsequent authorizers to see if one of
      #     them allows the request
      #   - Deny: reject the request without consulting subsequent authorizers
      # Required, with no default.
      failurePolicy: Deny
      connectionInfo:
        # Controls how the webhook should communicate with the server.
        # Valid values:
        #   - KubeConfigFile: use the file specified in kubeConfigFile to locate the
        #     server.
        #   - InClusterConfig: use the in-cluster configuration to call the
        #     SubjectAccessReview API hosted by kube-apiserver. This mode is not
        #     allowed for kube-apiserver.
        type: KubeConfigFile
        # Path to KubeConfigFile for connection info
        # Required, if connectionInfo.Type is KubeConfigFile
        kubeConfigFile: /kube-system-authz-webhook.yaml
      # matchConditions is a list of conditions that must be met for a request to be sent to this
      # webhook. An empty list of matchConditions matches all requests.
      # There are a maximum of 64 match conditions allowed.
      #
      # The exact matching logic is (in order):
      #   1. If at least one matchCondition evaluates to FALSE, then the webhook is skipped.
      #   2. If ALL matchConditions evaluate to TRUE, then the webhook is called.
      #   3. If at least one matchCondition evaluates to an error (but none are FALSE):
      #      - If failurePolicy=Deny, then the webhook rejects the request
      #      - If failurePolicy=NoOpinion, then the error is ignored and the webhook is skipped
      matchConditions:
      # expression represents the expression which will be evaluated by CEL. Must evaluate to bool.
      # CEL expressions have access to the contents of the SubjectAccessReview in v1 version.
      # If version specified by subjectAccessReviewVersion in the request variable is v1beta1,
      # the contents would be converted to the v1 version before evaluating the CEL expression.
      #
      # Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
      #
      # only send resource requests to the webhook
      - expression: has(request.resourceAttributes)
      # only intercept requests to kube-system
      - expression: request.resourceAttributes.namespace == 'kube-system'
      # don't intercept requests from kube-system service accounts
      - expression: "!('system:serviceaccounts:kube-system' in request.groups)"
  - type: Node
    name: node
  - type: RBAC
    name: rbac
  - type: Webhook
    name: in-cluster-authorizer
    webhook:
      authorizedTTL: 5m
      unauthorizedTTL: 30s
      timeout: 3s
      subjectAccessReviewVersion: v1
      failurePolicy: NoOpinion
      connectionInfo:
        type: InClusterConfig
When configuring the authorizer chain using a configuration file, make sure all the
control plane nodes have the same file contents. Take a note of the API server
configuration when upgrading / downgrading your clusters. For example, if upgrading
from Kubernetes 1.32 to Kubernetes 1.33,
you would need to make sure the config file is in a format that Kubernetes 1.33
can understand, before you upgrade the cluster. If you downgrade to 1.32,
you would need to set the configuration appropriately.
Authorization configuration and reloads
Kubernetes reloads the authorization configuration file when the API server observes a change
to the file, and also on a 60 second schedule if no change events were observed.
Note:
You must ensure that all non-webhook authorizer types remain unchanged in the file on reload.
A reload must not add or remove Node or RBAC authorizers (they can be reordered,
but cannot be added or removed).
Command line authorization mode configuration
You can use the following modes:
--authorization-mode=ABAC (Attribute-based access control mode)
--authorization-mode=RBAC (Role-based access control mode)
You can choose more than one authorization mode; for example:
--authorization-mode=Node,RBAC,Webhook
Kubernetes checks authorization modules based on the order that you specify them
on the API server's command line, so an earlier module has higher priority to allow
or deny a request.
For more information on command line arguments to the API server, read the
kube-apiserver reference.
Privilege escalation via workload creation or edits
Users who can create/edit pods in a namespace, either directly or through an object that
enables indirect workload management, may be
able to escalate their privileges in that namespace. The potential routes to privilege
escalation include Kubernetes API extensions
and their associated controllers.
Caution:
As a cluster administrator, use caution when granting access to create or edit workloads.
Some details of how these can be misused are documented in
escalation paths.
Escalation paths
There are different ways that an attacker or untrustworthy user could gain additional
privilege within a namespace, if you allow them to run arbitrary Pods in that namespace:
Mounting arbitrary Secrets in that namespace
Can be used to access confidential information meant for other workloads
Can be used to obtain a more privileged ServiceAccount's service account token
Using arbitrary ServiceAccounts in that namespace
Can perform Kubernetes API actions as another workload (impersonation)
Can perform any privileged actions that ServiceAccount has
Mounting or using ConfigMaps meant for other workloads in that namespace
Can be used to obtain information meant for other workloads, such as database host names.
Mounting volumes meant for other workloads in that namespace
Can be used to obtain information meant for other workloads, and change it.
Caution:
As a system administrator, you should be cautious when deploying CustomResourceDefinitions
that let users make changes to the above areas. These may open privilege escalation paths.
Consider the consequences of this kind of change when deciding on your authorization controls.
Checking API access
kubectl provides the auth can-i subcommand for quickly querying the API authorization layer.
The command uses the SelfSubjectAccessReview API to determine if the current user can perform
a given action, and works regardless of the authorization mode used.
kubectl auth can-i create deployments --namespace dev
SelfSubjectAccessReview is part of the authorization.k8s.io API group, which
exposes the API server authorization to external services. Other resources in
this group include:
SubjectAccessReview
Access review for any user, not only the current one. Useful for delegating authorization decisions to the API server. For example, the kubelet and extension API servers use this to determine user access to their own APIs.
LocalSubjectAccessReview
Like SubjectAccessReview but restricted to a specific namespace.
SelfSubjectRulesReview
A review which returns the set of actions a user can perform within a namespace. Useful for users to quickly summarize their own access, or for UIs to hide/show actions.
These APIs can be queried by creating normal Kubernetes resources, where the response status
field of the returned object is the result of the query. For example:
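A minimal sketch using kubectl to create a SelfSubjectAccessReview; the namespace, group, and resource are illustrative, and the decision is reported in the returned object's status.allowed field:
kubectl create -f - -o yaml << EOF
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    group: apps
    resource: deployments
    verb: create
    namespace: dev
EOF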
Role-based access control (RBAC) is a method of regulating access to computer or
network resources based on the roles of individual users within your organization.
RBAC authorization uses the rbac.authorization.k8s.io API group to drive authorization
decisions, allowing you to dynamically configure policies through the Kubernetes API.
To enable RBAC, start the API server
with the --authorization-config flag set to a file that includes the RBAC authorizer; for example:
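One minimal sketch of such a file, mirroring the authorizer layout shown earlier in this document (authorizer names are up to you; the older --authorization-mode command line flag with RBAC in its list is an alternative):
apiVersion: apiserver.config.k8s.io/v1
kind: AuthorizationConfiguration
authorizers:
  - type: Node
    name: node
  - type: RBAC
    name: rbac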
The RBAC API declares four kinds of Kubernetes object: Role, ClusterRole,
RoleBinding and ClusterRoleBinding. You can describe or amend the RBAC
objects
using tools such as kubectl, just like any other Kubernetes object.
Caution:
These objects, by design, impose access restrictions. If you are making changes
to a cluster as you learn, see
privilege escalation prevention and bootstrapping
to understand how those restrictions can prevent you making some changes.
Role and ClusterRole
An RBAC Role or ClusterRole contains rules that represent a set of permissions.
Permissions are purely additive (there are no "deny" rules).
A Role always sets permissions within a particular namespace;
when you create a Role, you have to specify the namespace it belongs in.
ClusterRole, by contrast, is a non-namespaced resource. The resources have different names (Role
and ClusterRole) because a Kubernetes object always has to be either namespaced or not namespaced;
it can't be both.
ClusterRoles have several uses. You can use a ClusterRole to:
define permissions on namespaced resources and be granted access within individual namespace(s)
define permissions on namespaced resources and be granted access across all namespaces
define permissions on cluster-scoped resources
If you want to define a role within a namespace, use a Role; if you want to define
a role cluster-wide, use a ClusterRole.
Role example
Here's an example Role in the "default" namespace that can be used to grant read access to
pods:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
ClusterRole example
A ClusterRole can be used to grant the same permissions as a Role.
Because ClusterRoles are cluster-scoped, you can also use them to grant access to:
namespaced resources (like Pods), across all namespaces
For example: you can use a ClusterRole to allow a particular user to run
kubectl get pods --all-namespaces
Here is an example of a ClusterRole that can be used to grant read access to
secrets in any particular namespace,
or across all namespaces (depending on how it is bound):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # "namespace" omitted since ClusterRoles are not namespaced
  name: secret-reader
rules:
- apiGroups: [""]
  #
  # at the HTTP level, the name of the resource for accessing Secret
  # objects is "secrets"
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]
The name of a Role or a ClusterRole object must be a valid
path segment name.
RoleBinding and ClusterRoleBinding
A role binding grants the permissions defined in a role to a user or set of users.
It holds a list of subjects (users, groups, or service accounts), and a reference to the
role being granted.
A RoleBinding grants permissions within a specific namespace whereas a ClusterRoleBinding
grants that access cluster-wide.
A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding
can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding.
If you want to bind a ClusterRole to all the namespaces in your cluster, you use a
ClusterRoleBinding.
The name of a RoleBinding or ClusterRoleBinding object must be a valid
path segment name.
RoleBinding examples
Here is an example of a RoleBinding that grants the "pod-reader" Role to the user "jane"
within the "default" namespace.
This allows "jane" to read pods in the "default" namespace.
apiVersion:rbac.authorization.k8s.io/v1# This role binding allows "jane" to read pods in the "default" namespace.# You need to already have a Role named "pod-reader" in that namespace.kind:RoleBindingmetadata:name:read-podsnamespace:defaultsubjects:# You can specify more than one "subject"- kind:Username:jane# "name" is case sensitiveapiGroup:rbac.authorization.k8s.ioroleRef:# "roleRef" specifies the binding to a Role / ClusterRolekind:Role#this must be Role or ClusterRolename:pod-reader# this must match the name of the Role or ClusterRole you wish to bind toapiGroup:rbac.authorization.k8s.io
A RoleBinding can also reference a ClusterRole to grant the permissions defined in that
ClusterRole to resources inside the RoleBinding's namespace. This kind of reference
lets you define a set of common roles across your cluster, then reuse them within
multiple namespaces.
For instance, even though the following RoleBinding refers to a ClusterRole,
"dave" (the subject, case sensitive) will only be able to read Secrets in the "development"
namespace, because the RoleBinding's namespace (in its metadata) is "development".
apiVersion:rbac.authorization.k8s.io/v1# This role binding allows "dave" to read secrets in the "development" namespace.# You need to already have a ClusterRole named "secret-reader".kind:RoleBindingmetadata:name:read-secrets## The namespace of the RoleBinding determines where the permissions are granted.# This only grants permissions within the "development" namespace.namespace:developmentsubjects:- kind:Username:dave# Name is case sensitiveapiGroup:rbac.authorization.k8s.ioroleRef:kind:ClusterRolename:secret-readerapiGroup:rbac.authorization.k8s.io
ClusterRoleBinding example
To grant permissions across a whole cluster, you can use a ClusterRoleBinding.
The following ClusterRoleBinding allows any user in the group "manager" to read
secrets in any namespace.
apiVersion: rbac.authorization.k8s.io/v1
# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
subjects:
- kind: Group
  name: manager # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
After you create a binding, you cannot change the Role or ClusterRole that it refers to.
If you try to change a binding's roleRef, you get a validation error. If you do want
to change the roleRef for a binding, you need to remove the binding object and create
a replacement.
There are two reasons for this restriction:
Making roleRef immutable allows granting someone update permission on an existing binding
object, so that they can manage the list of subjects, without being able to change
the role that is granted to those subjects.
A binding to a different role is a fundamentally different binding.
Requiring a binding to be deleted/recreated in order to change the roleRef
ensures the full list of subjects in the binding is intended to be granted
the new role (as opposed to enabling or accidentally modifying only the roleRef
without verifying all of the existing subjects should be given the new role's
permissions).
The kubectl auth reconcile command-line utility creates or updates the RBAC objects defined in a manifest file,
and handles deleting and recreating binding objects if required to change the role they refer to.
See command usage and examples for more information.
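For instance, applying a manifest of RBAC objects might look like the following; the file name is illustrative:
kubectl auth reconcile -f my-rbac-rules.yaml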
Referring to resources
In the Kubernetes API, most resources are represented and accessed using a string representation of
their object name, such as pods for a Pod. RBAC refers to resources using exactly the same
name that appears in the URL for the relevant API endpoint.
Some Kubernetes APIs involve a
subresource, such as the logs for a Pod. A request for a Pod's logs looks like:
GET /api/v1/namespaces/{namespace}/pods/{name}/log
In this case, pods is the namespaced resource for Pod resources, and log is a
subresource of pods. To represent this in an RBAC role, use a slash (/) to
delimit the resource and subresource. To allow a subject to read pods and
also access the log subresource for each of those Pods, you write:
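A sketch of such a rule set; the role name is illustrative:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-and-pod-logs-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]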
You can also refer to resources by name for certain requests through the resourceNames list.
When specified, requests can be restricted to individual instances of a resource.
Here is an example that restricts its subject to only get or update a
ConfigMap named my-configmap:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: configmap-updater
rules:
- apiGroups: [""]
  #
  # at the HTTP level, the name of the resource for accessing ConfigMap
  # objects is "configmaps"
  resources: ["configmaps"]
  resourceNames: ["my-configmap"]
  verbs: ["update", "get"]
Note:
You cannot restrict create or deletecollection requests by their resource name.
For create, this limitation is because the name of the new object may not be known at authorization time.
If you restrict list or watch by resourceName, clients must include a metadata.name field selector in their list or watch request that matches the specified resourceName in order to be authorized.
For example, kubectl get configmaps --field-selector=metadata.name=my-configmap
Rather than referring to individual resources, apiGroups, and verbs,
you can use the wildcard * symbol to refer to all such objects.
For nonResourceURLs, you can use the wildcard * as a suffix glob match.
For resourceNames, an empty set means that everything is allowed.
Here is an example that allows access to perform any current and future action on
all current and future resources in the example.com API group.
This is similar to the built-in cluster-admin role.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: example.com-superuser # DO NOT USE THIS ROLE, IT IS JUST AN EXAMPLE
rules:
- apiGroups: ["example.com"]
  resources: ["*"]
  verbs: ["*"]
Caution:
Using wildcards in resource and verb entries could result in overly permissive access being granted
to sensitive resources.
For instance, if a new resource type is added, or a new subresource is added,
or a new custom verb is checked, the wildcard entry automatically grants access, which may be undesirable.
The principle of least privilege
should be employed, using specific resources and verbs to ensure only the permissions required for the
workload to function correctly are applied.
Aggregated ClusterRoles
You can aggregate several ClusterRoles into one combined ClusterRole.
A controller, running as part of the cluster control plane, watches for ClusterRole
objects with an aggregationRule set. The aggregationRule defines a label
selector that the controller
uses to match other ClusterRole objects that should be combined into the rules
field of this one.
Caution:
The control plane overwrites any values that you manually specify in the rules field of an
aggregate ClusterRole. If you want to change or add rules, do so in the ClusterRole objects
that are selected by the aggregationRule.
Here is an example aggregated ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac.example.com/aggregate-to-monitoring: "true"
rules: [] # The control plane automatically fills in the rules
If you create a new ClusterRole that matches the label selector of an existing aggregated ClusterRole,
that change triggers adding the new rules into the aggregated ClusterRole.
Here is an example that adds rules to the "monitoring" ClusterRole, by creating another
ClusterRole labeled rbac.example.com/aggregate-to-monitoring: true.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-endpoints
  labels:
    rbac.example.com/aggregate-to-monitoring: "true"
# When you create the "monitoring-endpoints" ClusterRole,
# the rules below will be added to the "monitoring" ClusterRole.
rules:
- apiGroups: [""]
  resources: ["services", "endpointslices", "pods"]
  verbs: ["get", "list", "watch"]
The default user-facing roles use ClusterRole aggregation. This lets you,
as a cluster administrator, include rules for custom resources, such as those served by
CustomResourceDefinitions
or aggregated API servers, to extend the default roles.
For example: the following ClusterRoles let the "admin" and "edit" default roles manage the custom resource
named CronTab, whereas the "view" role can perform only read actions on CronTab resources.
You can assume that CronTab objects are named "crontabs" in URLs as seen by the API server.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aggregate-cron-tabs-edit
  labels:
    # Add these permissions to the "admin" and "edit" default roles.
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
rules:
- apiGroups: ["stable.example.com"]
  resources: ["crontabs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aggregate-cron-tabs-view
  labels:
    # Add these permissions to the "view" default role.
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["stable.example.com"]
  resources: ["crontabs"]
  verbs: ["get", "list", "watch"]
Role examples
The following examples are excerpts from Role or ClusterRole objects, showing only
the rules section.
Allow reading "pods" resources in the core
API Group:
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Pod# objects is "pods"resources:["pods"]verbs:["get","list","watch"]
Allow reading/writing Deployments (at the HTTP level: objects with "deployments"
in the resource part of their URL) in the "apps" API groups:
rules:- apiGroups:["apps"]## at the HTTP level, the name of the resource for accessing Deployment# objects is "deployments"resources:["deployments"]verbs:["get","list","watch","create","update","patch","delete"]
Allow reading Pods in the core API group, as well as reading or writing Job
resources in the "batch" API group:
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Pod# objects is "pods"resources:["pods"]verbs:["get","list","watch"]- apiGroups:["batch"]## at the HTTP level, the name of the resource for accessing Job# objects is "jobs"resources:["jobs"]verbs:["get","list","watch","create","update","patch","delete"]
Allow reading a ConfigMap named "my-config" (must be bound with a
RoleBinding to limit to a single ConfigMap in a single namespace):
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing ConfigMap# objects is "configmaps"resources:["configmaps"]resourceNames:["my-config"]verbs:["get"]
Allow reading the resource "nodes" in the core group (because a
Node is cluster-scoped, this must be in a ClusterRole bound with a
ClusterRoleBinding to be effective):
rules:- apiGroups:[""]## at the HTTP level, the name of the resource for accessing Node# objects is "nodes"resources:["nodes"]verbs:["get","list","watch"]
Allow GET and POST requests to the non-resource endpoint /healthz and
all subpaths (must be in a ClusterRole bound with a ClusterRoleBinding
to be effective):
rules:
- nonResourceURLs: ["/healthz", "/healthz/*"] # '*' in a nonResourceURL is a suffix glob match
  verbs: ["get", "post"]
Referring to subjects
A RoleBinding or ClusterRoleBinding binds a role to subjects.
Subjects can be groups, users or
ServiceAccounts.
Kubernetes represents usernames as strings.
These can be: plain names, such as "alice"; email-style names, like "bob@example.com";
or numeric user IDs represented as a string. It is up to you as a cluster administrator
to configure the authentication modules
so that authentication produces usernames in the format you want.
Caution:
The prefix system: is reserved for Kubernetes system use, so you should ensure
that you don't have users or groups with names that start with system: by
accident.
Other than this special prefix, the RBAC authorization system does not require any format
for usernames.
In Kubernetes, Authenticator modules provide group information.
Groups, like users, are represented as strings, and that string has no format requirements,
other than that the prefix system: is reserved.
ServiceAccounts have names prefixed
with system:serviceaccount:, and belong to groups that have names prefixed with system:serviceaccounts:.
Note:
system:serviceaccount: (singular) is the prefix for service account usernames.
system:serviceaccounts: (plural) is the prefix for service account groups.
RoleBinding examples
The following examples are RoleBinding excerpts that only
show the subjects section.
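For example, subjects excerpts like the following show the three kinds of subject; the user, group, and namespace names here are illustrative.
For a user named "alice@example.com":
subjects:
- kind: User
  name: "alice@example.com"
  apiGroup: rbac.authorization.k8s.io
For a group named "frontend-admins":
subjects:
- kind: Group
  name: "frontend-admins"
  apiGroup: rbac.authorization.k8s.io
For the default service account in the kube-system namespace:
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system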
Default roles and role bindings
API servers create a set of default ClusterRole and ClusterRoleBinding objects.
Many of these are system: prefixed, which indicates that the resource is directly
managed by the cluster control plane.
All of the default ClusterRoles and ClusterRoleBindings are labeled with kubernetes.io/bootstrapping=rbac-defaults.
Caution:
Take care when modifying ClusterRoles and ClusterRoleBindings with names
that have a system: prefix.
Modifications to these resources can result in non-functional clusters.
Auto-reconciliation
At each start-up, the API server updates default cluster roles with any missing permissions,
and updates default cluster role bindings with any missing subjects.
This allows the cluster to repair accidental modifications, and helps to keep roles and role bindings
up-to-date as permissions and subjects change in new Kubernetes releases.
To opt out of this reconciliation, set the rbac.authorization.kubernetes.io/autoupdate
annotation on a default cluster role or default cluster role binding to false.
Be aware that missing default permissions and subjects can result in non-functional clusters.
Auto-reconciliation is enabled by default if the RBAC authorizer is active.
API discovery roles
Default cluster role bindings authorize unauthenticated and authenticated users to read API information
that is deemed safe to be publicly accessible (including CustomResourceDefinitions).
To disable anonymous unauthenticated access, add the --anonymous-auth=false flag to
the API server configuration.
To view the configuration of these roles via kubectl, run:
kubectl get clusterroles system:discovery -o yaml
Note:
If you edit that ClusterRole, your changes will be overwritten on API server restart
via auto-reconciliation. To avoid that overwriting,
either do not manually edit the role, or disable auto-reconciliation.
Kubernetes RBAC API discovery roles
system:basic-user (default binding: system:authenticated group)
Allows a user read-only access to basic information about themselves. Prior to v1.14, this role was also bound to system:unauthenticated by default.
system:discovery (default binding: system:authenticated group)
Allows read-only access to API discovery endpoints needed to discover and negotiate an API level. Prior to v1.14, this role was also bound to system:unauthenticated by default.
system:public-info-viewer (default binding: system:authenticated and system:unauthenticated groups)
Allows read-only access to non-sensitive information about the cluster. Introduced in Kubernetes v1.14.
User-facing roles
Some of the default ClusterRoles are not system: prefixed. These are intended to be user-facing roles.
They include super-user roles (cluster-admin), roles intended to be granted cluster-wide
using ClusterRoleBindings, and roles intended to be granted within particular
namespaces using RoleBindings (admin, edit, view).
User-facing ClusterRoles use ClusterRole aggregation to allow admins to include
rules for custom resources on these ClusterRoles. To add rules to the admin, edit, or view roles, create
a ClusterRole with one or more of the following labels:
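For example, a metadata excerpt like the following uses the aggregation labels shown earlier; include only the labels for the roles you want to extend:
metadata:
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"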
cluster-admin (default binding: system:masters group)
Allows super-user access to perform any action on any resource.
When used in a ClusterRoleBinding, it gives full control over every resource in the cluster and in all namespaces.
When used in a RoleBinding, it gives full control over every resource in the role binding's namespace, including the namespace itself.
admin (default binding: none)
Allows admin access, intended to be granted within a namespace using a RoleBinding.
If used in a RoleBinding, allows read/write access to most resources in a namespace,
including the ability to create roles and role bindings within the namespace.
This role does not allow write access to resource quota or to the namespace itself.
This role also does not allow write access to EndpointSlices (or Endpoints) in clusters created
using Kubernetes v1.22+. More information is available in the
"Write Access for EndpointSlices" section.
edit (default binding: none)
Allows read/write access to most objects in a namespace.
This role does not allow viewing or modifying roles or role bindings.
However, this role allows accessing Secrets and running Pods as any ServiceAccount in
the namespace, so it can be used to gain the API access levels of any ServiceAccount in
the namespace. This role also does not allow write access to EndpointSlices (or Endpoints) in
clusters created using Kubernetes v1.22+. More information is available in the
"Write Access for EndpointSlices" section.
view (default binding: none)
Allows read-only access to see most objects in a namespace.
It does not allow viewing roles or role bindings.
This role does not allow viewing Secrets, since reading
the contents of Secrets enables access to ServiceAccount credentials
in the namespace, which would allow API access as any ServiceAccount
in the namespace (a form of privilege escalation).
Core component roles
system:kube-scheduler (default binding: system:kube-scheduler user)
Allows access to the resources required by the scheduler component.
system:volume-scheduler (default binding: system:kube-scheduler user)
Allows access to the volume resources required by the kube-scheduler component.
system:kube-controller-manager (default binding: system:kube-controller-manager user)
Allows access to the resources required by the controller manager component.
The permissions required by individual controllers are detailed in the controller roles.
system:node (default binding: none)
Allows access to resources required by the kubelet, including read access to all secrets, and write access to all pod status objects.
You should use the Node authorizer and NodeRestriction admission plugin instead of the system:node role, and allow granting API access to kubelets based on the Pods scheduled to run on them.
The system:node role only exists for compatibility with Kubernetes clusters upgraded from versions prior to v1.8.
system:node-proxier (default binding: system:kube-proxy user)
Allows access to the resources required by the kube-proxy component.
Other component roles
system:auth-delegator (default binding: none)
Allows delegated authentication and authorization checks.
This is commonly used by add-on API servers for unified authentication and authorization.
system:monitoring (default binding: system:monitoring group)
Allows read access to control-plane monitoring endpoints (that is, the kube-apiserver liveness and readiness endpoints (/healthz, /livez, /readyz), the individual health-check endpoints (/healthz/*, /livez/*, /readyz/*), and /metrics). Note that individual health check endpoints and the metric endpoint may expose sensitive information.
Roles for built-in controllers
The Kubernetes controller manager runs
controllers that are built in to the Kubernetes
control plane.
When invoked with --use-service-account-credentials, kube-controller-manager starts each controller
using a separate service account.
Corresponding roles exist for each built-in controller, prefixed with system:controller:.
If the controller manager is not started with --use-service-account-credentials, it runs all control loops
using its own credential, which must be granted all the relevant roles.
These roles follow the naming pattern system:controller:<controller-name>; for example, system:controller:cronjob-controller.
Privilege escalation prevention and bootstrapping
The RBAC API prevents users from escalating privileges by editing roles or role bindings.
Because this is enforced at the API level, it applies even when the RBAC authorizer is not in use.
Restrictions on role creation or update
You can only create/update a role if at least one of the following things is true:
You already have all the permissions contained in the role, at the same scope as the object being modified
(cluster-wide for a ClusterRole, within the same namespace or cluster-wide for a Role).
You are granted explicit permission to perform the escalate verb on the roles or
clusterroles resource in the rbac.authorization.k8s.io API group.
For example, if user-1 does not have the ability to list Secrets cluster-wide, they cannot create a ClusterRole
containing that permission. To allow a user to create/update roles:
Grant them a role that allows them to create/update Role or ClusterRole objects, as desired.
Grant them permission to include specific permissions in the roles they create/update:
implicitly, by giving them those permissions (if they attempt to create or modify a Role or
ClusterRole with permissions they themselves have not been granted, the API request will be forbidden)
or explicitly allow specifying any permission in a Role or ClusterRole by giving them
permission to perform the escalate verb on roles or clusterroles resources in the
rbac.authorization.k8s.io API group
Restrictions on role binding creation or update
You can only create/update a role binding if you already have all the permissions contained in the referenced role
(at the same scope as the role binding) or if you have been authorized to perform the bind verb on the referenced role.
For example, if user-1 does not have the ability to list Secrets cluster-wide, they cannot create a ClusterRoleBinding
to a role that grants that permission. To allow a user to create/update role bindings:
Grant them a role that allows them to create/update RoleBinding or ClusterRoleBinding objects, as desired.
Grant them permissions needed to bind a particular role:
implicitly, by giving them the permissions contained in the role.
explicitly, by giving them permission to perform the bind verb on the particular Role (or ClusterRole).
For example, this ClusterRole and RoleBinding would allow user-1 to grant other users the admin, edit, and view roles in the namespace user-1-namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: role-grantor
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["rolebindings"]
  verbs: ["create"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["bind"]
  # omit resourceNames to allow binding any ClusterRole
  resourceNames: ["admin", "edit", "view"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: role-grantor-binding
  namespace: user-1-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: role-grantor
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: user-1
When bootstrapping the first roles and role bindings, it is necessary for the initial user to grant permissions they do not yet have.
To bootstrap initial roles and role bindings:
Use a credential with the "system:masters" group, which is bound to the "cluster-admin" super-user role by the default bindings.
Command-line utilities
kubectl create role
Creates a Role object defining permissions within a single namespace. Examples:
Create a Role named "pod-reader" that allows users to perform get, watch and list on pods:
kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods
Create a Role named "pod-reader" with resourceNames specified:
kubectl create role pod-reader --verb=get --resource=pods --resource-name=readablepod --resource-name=anotherpod
Create a Role named "foo" with apiGroups specified:
kubectl create role foo --verb=get,list,watch --resource=replicasets.apps
Create a Role named "foo" with subresource permissions:
kubectl create role foo --verb=get,list,watch --resource=pods,pods/status
Create a Role named "my-component-lease-holder" with permissions to get/update a resource with a specific name:
kubectl create role my-component-lease-holder --verb=get,list,watch,update --resource=lease --resource-name=my-component
kubectl create clusterrole
Creates a ClusterRole. Examples:
Create a ClusterRole named "pod-reader" that allows a user to perform get, watch and list on pods:
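For example, following the same pattern as the kubectl create role examples above:
kubectl create clusterrole pod-reader --verb=get,list,watch --resource=pods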
Service account permissions
Default RBAC policies grant scoped permissions to control-plane components, nodes,
and controllers, but grant no permissions to service accounts outside the kube-system namespace
(beyond the permissions given by API discovery roles).
This allows you to grant particular roles to particular ServiceAccounts as needed.
Fine-grained role bindings provide greater security, but require more effort to administrate.
Broader grants can give unnecessary (and potentially escalating) API access to
ServiceAccounts, but are easier to administrate.
In order from most secure to least secure, the approaches are:
Grant a role to an application-specific service account (best practice)
This requires the application to specify a serviceAccountName in its pod spec,
and for the service account to be created (via the API, application manifest, kubectl create serviceaccount, etc.).
For example, grant read-only permission within "my-namespace" to the "my-sa" service account:
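A RoleBinding along these lines grants that access; the binding name my-sa-view is illustrative:
kubectl create rolebinding my-sa-view --clusterrole=view --serviceaccount=my-namespace:my-sa --namespace=my-namespace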
Many add-ons run as the
"default" service account in the kube-system namespace.
To allow those add-ons to run with super-user access, grant cluster-admin
permissions to the "default" service account in the kube-system namespace.
Caution:
Enabling this means the kube-system namespace contains Secrets
that grant super-user access to your cluster's API.
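If you decide to grant that access anyway, a binding along these lines does it (the binding name add-on-cluster-admin is illustrative):
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default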
Grant a role to all service accounts in a namespace
If you want all applications in a namespace to have a role, no matter what service account they use,
you can grant a role to the service account group for that namespace.
For example, grant read-only permission within "my-namespace" to all service accounts in that namespace:
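For example (the binding name serviceaccounts-view is illustrative):
kubectl create rolebinding serviceaccounts-view --clusterrole=view --group=system:serviceaccounts:my-namespace --namespace=my-namespace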
Grant super-user access to all service accounts cluster-wide (strongly discouraged)
If you don't care about partitioning permissions at all, you can grant super-user access to all service accounts.
Warning:
This allows any application full access to your cluster, and also grants
any user with read access to Secrets (or the ability to create any pod)
full access to your cluster.
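If you accept those risks, a binding along these lines grants that access (the binding name is illustrative):
kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts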
Write access for EndpointSlices and Endpoints
Kubernetes clusters created before Kubernetes v1.22 include write access to Endpoints
(part of the now-deprecated Endpoints API) in the aggregated "edit" and "admin" roles.
As a mitigation for CVE-2021-25740,
this access is not part of the aggregated roles in clusters that you create using
Kubernetes v1.22 or later.
Existing clusters that have been upgraded to Kubernetes v1.22 will not be
subject to this change. The CVE
announcement includes
guidance for restricting this access in existing clusters.
If you want new clusters to retain this level of access in the aggregated roles,
you can create the following ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubernetes.io/description: |-
      Add endpoints write permissions to the edit and admin roles. This was
      removed by default in 1.22 because of CVE-2021-25740. See
      https://issue.k8s.io/103675. This can allow writers to direct LoadBalancer
      or Ingress implementations to expose backend IPs that would not otherwise
      be accessible, and can circumvent network policies or security controls
      intended to prevent/isolate access to those backends.
      EndpointSlices were never included in the edit or admin roles, so there
      is nothing to restore for the EndpointSlice API.
  labels:
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
  name: custom:aggregate-to-edit:endpoints # you can change this if you wish
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["create", "delete", "deletecollection", "patch", "update"]
Upgrading from ABAC
Clusters that originally ran older Kubernetes versions often used
permissive ABAC policies, including granting full API access to all
service accounts.
Default RBAC policies grant scoped permissions to control-plane components, nodes,
and controllers, but grant no permissions to service accounts outside the kube-system namespace
(beyond the permissions given by API discovery roles).
While far more secure, this can be disruptive to existing workloads expecting to automatically receive API permissions.
Here are two approaches for managing this transition:
Parallel authorizers
Run both the RBAC and ABAC authorizers, and specify a policy file that contains
the legacy ABAC policy:
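For example, kube-apiserver flags along these lines (the policy filename mypolicy.json is illustrative):
--authorization-mode=Node,RBAC,ABAC --authorization-policy-file=mypolicy.json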
To explain that first command line option in detail: if earlier authorizers, such as Node,
deny a request, then the RBAC authorizer attempts to authorize the API request. If RBAC
also denies that API request, the ABAC authorizer is then run. This means that any request
allowed by either the RBAC or ABAC policies is allowed.
When the kube-apiserver is run with a log level of 5 or higher for the RBAC component
(--vmodule=rbac*=5 or --v=5), you can see RBAC denials in the API server log
(prefixed with RBAC).
You can use that information to determine which roles need to be granted to which users, groups, or service accounts.
Once you have granted roles to service accounts and workloads
are running with no RBAC denial messages in the server logs, you can remove the ABAC authorizer.
Permissive RBAC permissions
You can replicate a permissive ABAC policy using RBAC role bindings.
Warning:
The following policy allows ALL service accounts to act as cluster administrators.
Any application running in a container receives service account credentials automatically,
and could perform any action against the API, including viewing secrets and modifying permissions.
This is not a recommended policy.
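If you still want to replicate the permissive policy, a binding along these lines does it (the user names shown are illustrative):
kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts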
After you have transitioned to use RBAC, you should adjust the access controls
for your cluster to ensure that these meet your information security needs.
3.5 - Using Node Authorization
Node authorization is a special-purpose authorization mode that specifically
authorizes API requests made by kubelets.
Overview
The Node authorizer allows a kubelet to perform API operations. This includes:
Read operations:
services
endpoints
nodes
pods
secrets, configmaps, persistent volume claims and persistent volumes related
to pods bound to the kubelet's node
FEATURE STATE: Kubernetes v1.32 [beta] (enabled by default: true)
When the AuthorizeNodeWithSelectors feature is enabled
(along with the pre-requisite AuthorizeWithSelectors feature),
kubelets are only allowed to read their own Node objects,
and are only allowed to read pods bound to their node.
Write operations:
nodes and node status (enable the NodeRestriction admission plugin to limit
a kubelet to modify its own node)
pods and pod status (enable the NodeRestriction admission plugin to limit a
kubelet to modify pods bound to itself)
the ability to create TokenReviews and SubjectAccessReviews for delegated
authentication/authorization checks
In future releases, the node authorizer may add or remove permissions to ensure
kubelets have the minimal set of permissions required to operate correctly.
In order to be authorized by the Node authorizer, kubelets must use a credential
that identifies them as being in the system:nodes group, with a username of
system:node:<nodeName>.
This group and user name format match the identity created for each kubelet as part of
kubelet TLS bootstrapping.
The value of <nodeName> must match precisely the name of the node as
registered by the kubelet. By default, this is the host name as provided by
hostname, or overridden via the --hostname-override kubelet
option. However, when using the --cloud-provider kubelet
option, the specific hostname may be determined by the cloud provider, ignoring
the local hostname and the --hostname-override option.
For specifics about how the kubelet determines the hostname, see the
kubelet options reference.
To enable the Node authorizer, start the API server
with the --authorization-config flag set to a file that includes the Node authorizer; for example:
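A minimal sketch of such a file; this assumes a release where AuthorizationConfiguration is served from apiserver.config.k8s.io/v1, and the authorizer name "node" is arbitrary:
apiVersion: apiserver.config.k8s.io/v1
kind: AuthorizationConfiguration
authorizers:
- type: Node
  name: node
# other authorizers, such as RBAC, typically follow
Alternatively, the Node authorizer can be enabled with the older flag style, for example --authorization-mode=Node,RBAC.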
To limit the API objects kubelets are able to write, enable the
NodeRestriction
admission plugin by starting the apiserver with
--enable-admission-plugins=...,NodeRestriction,...
Migration considerations
Kubelets outside the system:nodes group
Kubelets outside the system:nodes group would not be authorized by the Node
authorization mode, and would need to continue to be authorized via whatever
mechanism currently authorizes them.
The node admission plugin would not restrict requests from these kubelets.
Kubelets with undifferentiated usernames
In some deployments, kubelets have credentials that place them in the system:nodes group,
but do not identify the particular node they are associated with,
because they do not have a username in the system:node:... format.
These kubelets would not be authorized by the Node authorization mode,
and would need to continue to be authorized via whatever mechanism currently authorizes them.
The NodeRestriction admission plugin would ignore requests from these kubelets,
since the default node identifier implementation would not consider that a node identity.
3.6 - Webhook Mode
A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event-notification via HTTP POST. A web application implementing WebHooks will POST a message to a URL when certain things happen.
When specified, mode Webhook causes Kubernetes to query an outside REST
service when determining user privileges.
Configuration File Format
Mode Webhook requires a file for HTTP configuration, specified by the
--authorization-webhook-config-file=SOME_FILENAME flag.
The configuration file uses the kubeconfig
file format. Within the file "users" refers to the API Server webhook and
"clusters" refers to the remote service.
A configuration example which uses HTTPS client auth:
# Kubernetes API version
apiVersion: v1
# kind of the API object
kind: Config
# clusters refers to the remote service.
clusters:
- name: name-of-remote-authz-service
  cluster:
    # CA for verifying the remote service.
    certificate-authority: /path/to/ca.pem
    # URL of remote service to query. Must use 'https'. May not include parameters.
    server: https://authz.example.com/authorize
# users refers to the API Server's webhook configuration.
users:
- name: name-of-api-server
  user:
    client-certificate: /path/to/cert.pem # cert for the webhook plugin to use
    client-key: /path/to/key.pem          # key matching the cert
# kubeconfig files require a context. Provide one for the API Server.
current-context: webhook
contexts:
- context:
    cluster: name-of-remote-authz-service
    user: name-of-api-server
  name: webhook
Request Payloads
When faced with an authorization decision, the API Server POSTs a JSON-serialized
authorization.k8s.io/v1beta1 SubjectAccessReview object describing the
action. This object contains fields describing the user attempting to make the
request, and either details about the resource being accessed or request
attributes.
Note that webhook API objects are subject to the same versioning compatibility rules
as other Kubernetes API objects. Implementers should be aware of looser
compatibility promises for beta objects and check the "apiVersion" field of the
request to ensure correct deserialization. Additionally, the API Server must
enable the authorization.k8s.io/v1beta1 API extensions group (--runtime-config=authorization.k8s.io/v1beta1=true).
The remote service is expected to fill the status field of
the request and respond to either allow or disallow access. The response body's
spec field is ignored and may be omitted. A permissive response would return:
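For example, the following minimal status indicates the request is allowed:
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "status": {
    "allowed": true
  }
}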
To disallow access, there are two methods. The first method is preferred in most
cases, and indicates the authorization webhook does not allow, or has "no opinion"
about, the request; but if other authorizers are configured, they are given a chance to allow the request.
If there are no other authorizers, or none of them allow the request, the
request is forbidden. The webhook would return:
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "status": {
    "allowed": false,
    "reason": "user does not have read access to the namespace"
  }
}
The second method denies immediately, short-circuiting evaluation by other
configured authorizers. This should only be used by webhooks that have
detailed knowledge of the full authorizer configuration of the cluster.
The webhook would return:
{
  "apiVersion": "authorization.k8s.io/v1beta1",
  "kind": "SubjectAccessReview",
  "status": {
    "allowed": false,
    "denied": true,
    "reason": "user does not have read access to the namespace"
  }
}
FEATURE STATE: Kubernetes v1.32 [beta] (enabled by default: true)
With the AuthorizeWithSelectors feature enabled, field and label selectors in the request
are passed to the authorization webhook. The webhook can make authorization decisions
informed by the scoped field and label selectors, if it wishes.
The SubjectAccessReview API documentation
gives guidelines for how these fields should be interpreted and handled by authorization webhooks,
specifically using the parsed requirements rather than the raw selector strings,
and how to handle unrecognized operators safely.
Non-resource paths include: /api, /apis, /metrics,
/logs, /debug, /healthz, /livez, /openapi/v2, /readyz, and
/version. Clients require access to /api, /api/*, /apis, /apis/*,
and /version to discover what resources and versions are present on the server.
Access to other non-resource paths can be disallowed without restricting access
to the REST API.
3.7 - Using ABAC Authorization
Attribute-based access control (ABAC) defines an access control paradigm whereby access rights are granted
to users through the use of policies which combine attributes together.
Policy File Format
To enable ABAC mode, specify --authorization-policy-file=SOME_FILENAME and --authorization-mode=ABAC
on startup.
The file format is one JSON object per line. There
should be no enclosing list or map, only one map per line.
Each line is a "policy object", where each such object is a map with the following
properties:
Versioning properties:
apiVersion, type string; valid values are "abac.authorization.kubernetes.io/v1beta1". Allows versioning
and conversion of the policy format.
kind, type string: valid values are "Policy". Allows versioning and conversion of the policy format.
spec property set to a map with the following properties:
Subject-matching properties:
user, type string; the user-string from --token-auth-file. If you specify user, it must match the
username of the authenticated user.
group, type string; if you specify group, it must match one of the groups of the authenticated user.
system:authenticated matches all authenticated requests. system:unauthenticated matches all
unauthenticated requests.
Resource-matching properties:
apiGroup, type string; an API group.
Ex: apps, networking.k8s.io
Wildcard: * matches all API groups.
namespace, type string; a namespace.
Ex: kube-system
Wildcard: * matches all resource requests.
resource, type string; a resource type
Ex: pods, deployments
Wildcard: * matches all resource requests.
Non-resource-matching properties:
nonResourcePath, type string; non-resource request paths.
Ex: /version or /apis
Wildcard:
* matches all non-resource requests.
/foo/* matches all subpaths of /foo/.
readonly, type boolean; when true, the Resource-matching policy only applies to get, list,
and watch operations, and the Non-resource-matching policy only applies to the get operation.
Note:
An unset property is the same as a property set to the zero value for its type
(e.g. empty string, 0, false). However, unset should be preferred for
readability.
In the future, policies may be expressed in a JSON format, and managed via a
REST interface.
Authorization Algorithm
A request has attributes which correspond to the properties of a policy object.
When a request is received, the attributes are determined. Unknown attributes
are set to the zero value of its type (e.g. empty string, 0, false).
A property set to "*" will match any value of the corresponding attribute.
The tuple of attributes is checked for a match against every policy in the
policy file. If at least one line matches the request attributes, then the
request is authorized (but may fail later validation).
To permit any authenticated user to do something, write a policy with the
group property set to "system:authenticated".
To permit any unauthenticated user to do something, write a policy with the
group property set to "system:unauthenticated".
To permit a user to do anything, write a policy with the apiGroup, namespace,
resource, and nonResourcePath properties set to "*".
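For example, a policy line that lets any authenticated user read any resource might look like the following; this is a sketch of the format described above, not a recommended policy:
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"group": "system:authenticated", "namespace": "*", "resource": "*", "apiGroup": "*", "readonly": true}}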
Kubectl
Kubectl uses the /api and /apis endpoints of apiserver to discover
served resource types, and validates objects sent to the API by create/update
operations using schema information located at /openapi/v2.
When using ABAC authorization, those special resources have to be explicitly
exposed via the nonResourcePath property in a policy (see examples below):
/api, /api/*, /apis, and /apis/* for API version negotiation.
/version for retrieving the server version via kubectl version.
/swaggerapi/* for create/update operations.
To inspect the HTTP calls involved in a specific kubectl operation you can turn
up the verbosity:
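For example:
kubectl --v=8 version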
Creating a new namespace leads to the creation of a new service account in the following format:
system:serviceaccount:<namespace>:default
For example, if you wanted to grant the default service account (in the kube-system namespace) full
privilege to the API using ABAC, you would add this line to your policy file:
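For example, a policy line along these lines (a single JSON object on one line):
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "system:serviceaccount:kube-system:default", "namespace": "*", "resource": "*", "apiGroup": "*"}}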
The apiserver will need to be restarted to pick up the new policy lines.
3.8 - Admission Control in Kubernetes
This page provides an overview of admission controllers.
An admission controller is a piece of code that intercepts requests to the
Kubernetes API server prior to persistence of the resource, but after the request
is authenticated and authorized.
Several important features of Kubernetes require an admission controller to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission controllers is an incomplete server that will not
support all the features you expect.
What are they?
Admission controllers are code within the Kubernetes
API server that check the
data arriving in a request to modify a resource.
Admission controllers apply to requests that create, delete, or modify objects.
Admission controllers can also block custom verbs, such as a request to connect to a
pod via an API server proxy. Admission controllers do not (and cannot) block requests
to read (get, watch or list) objects, because reads bypass the admission
control layer.
Admission control mechanisms may be validating, mutating, or both. Mutating
controllers may modify the data for the resource being modified; validating controllers may not.
The admission controllers in Kubernetes 1.33 consist of the
list below, are compiled into the
kube-apiserver binary, and may only be configured by the cluster
administrator.
Admission control extension points
Within the full list, there are three
special controllers:
MutatingAdmissionWebhook,
ValidatingAdmissionWebhook, and
ValidatingAdmissionPolicy.
The two webhook controllers execute the mutating and validating (respectively)
admission control webhooks
which are configured in the API. ValidatingAdmissionPolicy provides a way to embed
declarative validation code within the API, without relying on any external HTTP
callouts.
You can use these three admission controllers to customize cluster behavior at
admission time.
Admission control phases
The admission control process proceeds in two phases. In the first phase,
mutating admission controllers are run. In the second phase, validating
admission controllers are run. Note again that some of the controllers are
both.
If any of the controllers in either phase reject the request, the entire
request is rejected immediately and an error is returned to the end-user.
Finally, in addition to sometimes mutating the object in question, admission
controllers may sometimes have side effects, that is, mutate related
resources as part of request processing. Incrementing quota usage is the
canonical example of why this is necessary. Any such side-effect needs a
corresponding reclamation or reconciliation process, as a given admission
controller does not know for sure that a given request will pass all of the
other admission controllers.
The ordering of these calls can be seen below.
Why do I need them?
Several important features of Kubernetes require an admission controller to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission controllers is an incomplete server and will not
support all the features you expect.
How do I turn on an admission controller?
The Kubernetes API server flag enable-admission-plugins takes a comma-delimited list of admission control plugins to invoke prior to modifying objects in the cluster.
For example, the following command line enables the NamespaceLifecycle and the LimitRanger
admission control plugins:
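For example (other kube-apiserver flags omitted):
kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger ...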
Depending on the way your Kubernetes cluster is deployed and how the API server is
started, you may need to apply the settings in different ways. For example, you may
have to modify the systemd unit file if the API server is deployed as a systemd
service, you may modify the manifest file for the API server if Kubernetes is deployed
in a self-hosted way.
How do I turn off an admission controller?
The Kubernetes API server flag disable-admission-plugins takes a comma-delimited list of admission control plugins to be disabled, even if they are in the list of plugins enabled by default.
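For example, to disable two plugins that would otherwise run (the plugin names here are only illustrative):
kube-apiserver --disable-admission-plugins=PodNodeSelector,AlwaysDeny ...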
AlwaysAdmit
This admission controller allows all pods into the cluster. It is deprecated because
its behavior is the same as if there were no admission controller at all.
AlwaysDeny
FEATURE STATE: Kubernetes v1.13 [deprecated]
Type: Validating.
Rejects all requests. AlwaysDeny is deprecated as it has no real meaning.
AlwaysPullImages
Type: Mutating and Validating.
This admission controller modifies every new Pod to force the image pull policy to Always. This is useful in a
multitenant cluster so that users can be assured that their private images can only be used by those
who have the credentials to pull them. Without this admission controller, once an image has been pulled to a
node, any pod from any user can use it by knowing the image's name (assuming the Pod is
scheduled onto the right node), without any authorization check against the image. When this admission controller
is enabled, images are always pulled prior to starting containers, which means valid credentials are
required.
CertificateApproval
Type: Validating.
This admission controller observes requests to approve CertificateSigningRequest resources and performs additional
authorization checks to ensure the approving user has permission to approve certificate requests with the
spec.signerName requested on the CertificateSigningRequest resource.
See Certificate Signing Requests for more
information on the permissions required to perform different actions on CertificateSigningRequest resources.
CertificateSigning
Type: Validating.
This admission controller observes updates to the status.certificate field of CertificateSigningRequest resources
and performs an additional authorization checks to ensure the signing user has permission to sign certificate
requests with the spec.signerName requested on the CertificateSigningRequest resource.
See Certificate Signing Requests for more
information on the permissions required to perform different actions on CertificateSigningRequest resources.
CertificateSubjectRestriction
Type: Validating.
This admission controller observes creation of CertificateSigningRequest resources that have a spec.signerName
of kubernetes.io/kube-apiserver-client. It rejects any request that specifies a 'group' (or 'organization attribute')
of system:masters.
DefaultIngressClass
Type: Mutating.
This admission controller observes creation of Ingress objects that do not request any specific
ingress class and automatically adds a default ingress class to them. This way, users that do not
request any special ingress class do not need to care about them at all and they will get the
default one.
This admission controller does not do anything when no default ingress class is configured. When more than one ingress
class is marked as default, it rejects any creation of Ingress with an error and an administrator
must revisit their IngressClass objects and mark only one as default (with the annotation
"ingressclass.kubernetes.io/is-default-class"). This admission controller ignores any Ingress
updates; it acts only on creation.
See the Ingress documentation for more about ingress
classes and how to mark one as default.
DefaultStorageClass
Type: Mutating.
This admission controller observes creation of PersistentVolumeClaim objects that do not request any specific storage class
and automatically adds a default storage class to them.
This way, users that do not request any special storage class do not need to care about them at all and they
will get the default one.
This admission controller does nothing when no default StorageClass exists. When more than one storage
class is marked as default, and you then create a PersistentVolumeClaim with no storageClassName set,
Kubernetes uses the most recently created default StorageClass.
When a PersistentVolumeClaim is created with a specified volumeName, it remains in a pending state
if the static volume's storageClassName does not match the storageClassName on the PersistentVolumeClaim
after any default StorageClass is applied to it.
This admission controller ignores any PersistentVolumeClaim updates; it acts only on creation.
See persistent volume documentation about persistent volume claims and
storage classes and how to mark a storage class as default.
DefaultTolerationSeconds
Type: Mutating.
This admission controller sets the default forgiveness toleration for pods to tolerate
the taints notready:NoExecute and unreachable:NoExecute based on the k8s-apiserver input parameters
default-not-ready-toleration-seconds and default-unreachable-toleration-seconds if the pods don't already
have toleration for taints node.kubernetes.io/not-ready:NoExecute or
node.kubernetes.io/unreachable:NoExecute.
The default value for default-not-ready-toleration-seconds and default-unreachable-toleration-seconds is 5 minutes.
DenyServiceExternalIPs
Type: Validating.
This admission controller rejects all net-new usage of the Service field externalIPs. This
feature is very powerful (allows network traffic interception) and not well
controlled by policy. When enabled, users of the cluster may not create new
Services which use externalIPs and may not add new values to externalIPs on
existing Service objects. Existing uses of externalIPs are not affected,
and users may remove values from externalIPs on existing Service objects.
Most users do not need this feature at all, and cluster admins should consider disabling it.
Clusters that do need to use this feature should consider using some custom policy to manage usage
of it.
This admission controller is disabled by default.
EventRateLimit
FEATURE STATE: Kubernetes v1.13 [alpha]
Type: Validating.
This admission controller mitigates the problem where the API server gets flooded by
requests to store new Events. The cluster admin can specify event rate limits by:
Enabling the EventRateLimit admission controller;
Referencing an EventRateLimit configuration file from the file provided to the API
server's command line flag --admission-control-config-file:
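A sketch of such a file, assuming the AdmissionConfiguration kind from apiserver.config.k8s.io/v1 and an illustrative eventconfig.yaml path:
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: EventRateLimit
  path: eventconfig.yaml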
ExtendedResourceToleration
This plugin facilitates creation of dedicated nodes with extended resources.
If operators want to create dedicated nodes with extended resources (like GPUs, FPGAs etc.), they are expected to
taint the node with the extended resource
name as the key. This admission controller, if enabled, automatically
adds tolerations for such taints to pods requesting extended resources, so users don't have to manually
add these tolerations.
This admission controller is disabled by default.
ImagePolicyWebhook
Type: Validating.
The ImagePolicyWebhook admission controller allows a backend webhook to make admission decisions.
This admission controller is disabled by default.
Configuration file format
ImagePolicyWebhook uses a configuration file to set options for the behavior of the backend.
This file may be json or yaml and has the following format:
imagePolicy:
  kubeConfigFile: /path/to/kubeconfig/for/backend
  # time in s to cache approval
  allowTTL: 50
  # time in s to cache denial
  denyTTL: 50
  # time in ms to wait between retries
  retryBackoff: 500
  # determines behavior if the webhook backend fails
  defaultAllow: true
Reference the ImagePolicyWebhook configuration file from the file provided to the API server's command line flag --admission-control-config-file:
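A sketch, again using the AdmissionConfiguration kind and an illustrative path:
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ImagePolicyWebhook
  path: imagepolicyconfig.yaml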
The ImagePolicyWebhook config file must reference a
kubeconfig
formatted file which sets up the connection to the backend.
It is required that the backend communicate over TLS.
The kubeconfig file's cluster field must point to the remote service, and the user field
must contain the returned authorizer.
# clusters refers to the remote service.
clusters:
- name: name-of-remote-imagepolicy-service
  cluster:
    certificate-authority: /path/to/ca.pem     # CA for verifying the remote service.
    server: https://images.example.com/policy  # URL of remote service to query. Must use 'https'.
# users refers to the API server's webhook configuration.
users:
- name: name-of-api-server
  user:
    client-certificate: /path/to/cert.pem # cert for the webhook admission controller to use
    client-key: /path/to/key.pem          # key matching the cert
For additional HTTP configuration, refer to the
kubeconfig documentation.
Request payloads
When faced with an admission decision, the API Server POSTs a JSON-serialized
imagepolicy.k8s.io/v1alpha1 ImageReview object describing the action.
This object contains fields describing the containers being admitted, as well as
any pod annotations that match *.image-policy.k8s.io/*.
Note:
The webhook API objects are subject to the same versioning compatibility rules
as other Kubernetes API objects. Implementers should be aware of looser compatibility
promises for alpha objects and check the apiVersion field of the request to
ensure correct deserialization.
Additionally, the API Server must enable the imagepolicy.k8s.io/v1alpha1 API extensions
group (--runtime-config=imagepolicy.k8s.io/v1alpha1=true).
The remote service is expected to fill the status field of the request and
respond to either allow or disallow access. The response body's spec field is ignored, and
may be omitted. A permissive response would return:
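For example, the following minimal status indicates the images are allowed:
{
  "apiVersion": "imagepolicy.k8s.io/v1alpha1",
  "kind": "ImageReview",
  "status": {
    "allowed": true
  }
}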
All annotations on a Pod that match *.image-policy.k8s.io/* are sent to the webhook.
Sending annotations allows users who are aware of the image policy backend to
send extra information to it, and for different backends implementations to
accept different information.
Examples of information you might put here are:
a request to "break glass" to override a policy, in case of emergency;
a ticket number from a ticket system that documents the break-glass request;
a hint to the policy server about the imageID of the image being provided, to save it a lookup.
In any case, the annotations are provided by the user and are not validated by Kubernetes in any way.
LimitPodHardAntiAffinityTopology
Type: Validating.
This admission controller denies any pod that defines AntiAffinity topology key other than
kubernetes.io/hostname in requiredDuringSchedulingRequiredDuringExecution.
This admission controller is disabled by default.
LimitRanger
Type: Mutating and Validating.
This admission controller will observe the incoming request and ensure that it does not violate
any of the constraints enumerated in the LimitRange object in a Namespace. If you are using
LimitRange objects in your Kubernetes deployment, you MUST use this admission controller to
enforce those constraints. LimitRanger can also be used to apply default resource requests to Pods
that don't specify any; currently, the default LimitRanger applies a 0.1 CPU requirement to all
Pods in the default namespace.
MutatingAdmissionWebhook
Type: Mutating.
This admission controller calls any mutating webhooks which match the request. Matching
webhooks are called in serial; each one may modify the object if it desires.
This admission controller (as implied by the name) only runs in the mutating phase.
If a webhook called by this has side effects (for example, decrementing quota) it
must have a reconciliation system, as it is not guaranteed that subsequent
webhooks or validating admission controllers will permit the request to finish.
If you disable the MutatingAdmissionWebhook, you must also disable the
MutatingWebhookConfiguration object in the admissionregistration.k8s.io/v1
group/version via the --runtime-config flag; both are on by default.
Use caution when authoring and installing mutating webhooks
Users may be confused when the objects they try to create are different from
what they get back.
Built in control loops may break when the objects they try to create are
different when read back.
Setting originally unset fields is less likely to cause problems than
overwriting fields set in the original request. Avoid doing the latter.
Future changes to control loops for built-in resources or third-party resources
may break webhooks that work well today. Even when the webhook installation API
is finalized, not all possible webhook behaviors will be guaranteed to be supported
indefinitely.
NamespaceAutoProvision
Type: Mutating.
This admission controller examines all incoming requests on namespaced resources and checks
if the referenced namespace does exist.
It creates a namespace if it cannot be found.
This admission controller is useful in deployments that do not want to restrict creation of
a namespace prior to its usage.
NamespaceExists
Type: Validating.
This admission controller checks all requests on namespaced resources other than Namespace itself.
If the namespace referenced from a request doesn't exist, the request is rejected.
NamespaceLifecycle
Type: Validating.
This admission controller enforces that a Namespace that is undergoing termination cannot have
new objects created in it, and ensures that requests in a non-existent Namespace are rejected.
This admission controller also prevents deletion of three system-reserved namespaces: default,
kube-system, and kube-public.
A Namespace deletion kicks off a sequence of operations that remove all objects (pods, services,
etc.) in that namespace. In order to enforce integrity of that process, we strongly recommend
running this admission controller.
NodeRestriction
Type: Validating.
This admission controller limits the Node and Pod objects a kubelet can modify. In order to be limited by this admission controller,
kubelets must use credentials in the system:nodes group, with a username in the form system:node:<nodeName>.
Such kubelets will only be allowed to modify their own Node API object, and only modify Pod API objects that are bound to their node.
kubelets are not allowed to update or remove taints from their Node API object.
The NodeRestriction admission plugin prevents kubelets from deleting their Node API object,
and enforces kubelet modification of labels under the kubernetes.io/ or k8s.io/ prefixes as follows:
Prevents kubelets from adding/removing/updating labels with a node-restriction.kubernetes.io/ prefix.
This label prefix is reserved for administrators to label their Node objects for workload isolation purposes,
and kubelets will not be allowed to modify labels with that prefix.
Allows kubelets to add/remove/update these labels and label prefixes:
Use of any other labels under the kubernetes.io or k8s.io prefixes by kubelets is reserved,
and may be disallowed or allowed by the NodeRestriction admission plugin in the future.
Future versions may add additional restrictions to ensure kubelets have the minimal set of
permissions required to operate correctly.
OwnerReferencesPermissionEnforcement
Type: Validating.
This admission controller protects the access to the metadata.ownerReferences of an object
so that only users with delete permission to the object can change it.
This admission controller also protects the access to metadata.ownerReferences[x].blockOwnerDeletion
of an object, so that only users with update permission to the finalizers
subresource of the referenced owner can change it.
PersistentVolumeClaimResize
FEATURE STATE: Kubernetes v1.24 [stable]
Type: Validating.
This admission controller implements additional validations for checking incoming
PersistentVolumeClaim resize requests.
Enabling the PersistentVolumeClaimResize admission controller is recommended.
This admission controller prevents resizing of all claims by default unless a claim's StorageClass
explicitly enables resizing by setting allowVolumeExpansion to true.
For example: all PersistentVolumeClaims created from the following StorageClass support volume expansion:
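A sketch of such a StorageClass; the name and provisioner below are illustrative placeholders:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: resizable-example
provisioner: csi.example.com # hypothetical CSI driver
allowVolumeExpansion: true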
PodNodeSelector
This admission controller defaults and limits what node selectors may be used within a namespace
by reading a namespace annotation and a global configuration.
This admission controller is disabled by default.
Configuration file format
PodNodeSelector uses a configuration file to set options for the behavior of the backend.
Note that the configuration file format will move to a versioned file in a future release.
This file may be json or yaml and has the following format:
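A sketch of the expected shape, with placeholder namespace names and selector values:
podNodeSelectorPluginConfig:
  clusterDefaultNodeSelector: name-of-node-selector
  namespace1: name-of-node-selector
  namespace2: name-of-node-selector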
This admission controller has the following behavior:
If the Namespace has an annotation with a key scheduler.alpha.kubernetes.io/node-selector,
use its value as the node selector.
If the namespace lacks such an annotation, use the clusterDefaultNodeSelector defined in the
PodNodeSelector plugin configuration file as the node selector.
Evaluate the pod's node selector against the namespace node selector for conflicts. Conflicts
result in rejection.
Evaluate the pod's node selector against the namespace-specific allowed selector defined in the
plugin configuration file. Conflicts result in rejection.
Note:
PodNodeSelector allows forcing pods to run on specifically labeled nodes. Also see the PodTolerationRestriction
admission plugin, which allows preventing pods from running on specifically tainted nodes.
PodSecurity
FEATURE STATE: Kubernetes v1.25 [stable]
Type: Validating.
The PodSecurity admission controller checks new Pods before they are admitted, and determines
whether each Pod should be admitted based on its requested security context and the restrictions
imposed by the permitted Pod Security Standards
for the namespace that the Pod would be in.
PodSecurity replaced an older admission controller named PodSecurityPolicy.
PodTolerationRestriction
FEATURE STATE: Kubernetes v1.7 [alpha]
Type: Mutating and Validating.
The PodTolerationRestriction admission controller verifies any conflict between tolerations of a
pod and the tolerations of its namespace.
It rejects the pod request if there is a conflict.
It then merges the tolerations annotated on the namespace into the tolerations of the pod.
The resulting tolerations are checked against a list of allowed tolerations annotated to the namespace.
If the check succeeds, the pod request is admitted; otherwise it is rejected.
If the namespace of the pod does not have any associated default tolerations or allowed
tolerations annotated, the cluster-level default tolerations or cluster-level list of allowed tolerations are used
instead if they are specified.
Tolerations to a namespace are assigned via the scheduler.alpha.kubernetes.io/defaultTolerations annotation key.
The list of allowed tolerations can be added via the scheduler.alpha.kubernetes.io/tolerationsWhitelist annotation key.
PodTopologyLabels
The PodTopologyLabels admission controller mutates the pods/binding subresource
for all pods bound to a Node, adding topology labels matching those of the bound Node.
This allows Node topology labels to be available as pod labels,
which can be surfaced to running containers using the
Downward API.
The labels available as a result of this controller are the
topology.kubernetes.io/region and
topology.kubernetes.io/zone labels.
Note:
If any mutating admission webhook adds or modifies labels of the pods/binding subresource,
these changes will propagate to pod labels as a result of this controller,
overwriting labels with conflicting keys.
This admission controller is enabled when the PodTopologyLabelsAdmission feature gate is enabled.
Priority
Type: Mutating and Validating.
The priority admission controller uses the priorityClassName field and populates the integer
value of the priority.
If the priority class is not found, the Pod is rejected.
ResourceQuota
Type: Validating.
This admission controller will observe the incoming request and ensure that it does not violate
any of the constraints enumerated in the ResourceQuota object in a Namespace. If you are
using ResourceQuota objects in your Kubernetes deployment, you MUST use this admission
controller to enforce quota constraints.
RuntimeClass
If you define a RuntimeClass with Pod overhead
configured, this admission controller checks incoming Pods.
When enabled, this admission controller rejects any Pod create requests
that have the overhead already set.
For Pods that have a RuntimeClass configured and selected in their .spec,
this admission controller sets .spec.overhead in the Pod based on the value
defined in the corresponding RuntimeClass.
ServiceAccount
This admission controller implements automation for
serviceAccounts.
The Kubernetes project strongly recommends enabling this admission controller.
You should enable this admission controller if you intend to make any use of Kubernetes
ServiceAccount objects.
To enhance the security measures around Secrets, use separate namespaces to isolate access to mounted secrets.
StorageObjectInUseProtection
Type: Mutating.
The StorageObjectInUseProtection plugin adds the kubernetes.io/pvc-protection or kubernetes.io/pv-protection
finalizers to newly created Persistent Volume Claims (PVCs) or Persistent Volumes (PV).
If a user deletes a PVC or PV, the object is not removed until the finalizer is removed
from it by the PVC or PV protection controller.
Refer to the
Storage Object in Use Protection
for more detailed information.
TaintNodesByCondition
Type: Mutating.
This admission controller taints newly created
Nodes as NotReady and NoSchedule. That tainting avoids a race condition that could cause Pods
to be scheduled on new Nodes before their taints were updated to accurately reflect their reported
conditions.
ValidatingAdmissionPolicy
Type: Validating.
This admission controller implements the CEL validation for incoming matched requests.
It is enabled when both the ValidatingAdmissionPolicy feature gate and the admissionregistration.k8s.io/v1alpha1 API group/version are enabled.
If any of the ValidatingAdmissionPolicy fails, the request fails.
ValidatingAdmissionWebhook
Type: Validating.
This admission controller calls any validating webhooks which match the request. Matching
webhooks are called in parallel; if any of them rejects the request, the request
fails. This admission controller only runs in the validation phase; the webhooks it calls may not
mutate the object, as opposed to the webhooks called by the MutatingAdmissionWebhook admission controller.
If a webhook called by this has side effects (for example, decrementing quota) it
must have a reconciliation system, as it is not guaranteed that subsequent
webhooks or other validating admission controllers will permit the request to finish.
If you disable the ValidatingAdmissionWebhook, you must also disable the
ValidatingWebhookConfiguration object in the admissionregistration.k8s.io/v1
group/version via the --runtime-config flag.
Is there a recommended set of admission controllers to use?
Yes. The recommended admission controllers are enabled by default
(shown here),
so you do not need to explicitly specify them.
You can enable additional admission controllers beyond the default set using the
--enable-admission-plugins flag (order doesn't matter).
3.9 - Dynamic Admission Control
In addition to compiled-in admission plugins,
admission plugins can be developed as extensions and run as webhooks configured at runtime.
This page describes how to build, configure, use, and monitor admission webhooks.
What are admission webhooks?
Admission webhooks are HTTP callbacks that receive admission requests and do
something with them. You can define two types of admission webhooks,
validating admission webhook
and
mutating admission webhook.
Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults.
After all object modifications are complete, and after the incoming object is validated by the API server,
validating admission webhooks are invoked and can reject requests to enforce custom policies.
Note:
Admission webhooks that need to guarantee they see the final state of the object in order to enforce policy
should use a validating admission webhook, since objects can be modified after being seen by mutating webhooks.
Experimenting with admission webhooks
Admission webhooks are essentially part of the cluster control-plane. You should
write and deploy them with great caution. Please read the
user guides
for instructions if you intend to write/deploy production-grade admission webhooks.
In the following, we describe how to quickly experiment with admission webhooks.
Prerequisites
Ensure that MutatingAdmissionWebhook and ValidatingAdmissionWebhook
admission controllers are enabled.
Here
is a recommended set of admission controllers to enable in general.
Ensure that the admissionregistration.k8s.io/v1 API is enabled.
Write an admission webhook server
Please refer to the implementation of the admission webhook server
that is validated in a Kubernetes e2e test. The webhook handles the
AdmissionReview request sent by the API servers, and sends back its decision
as an AdmissionReview object in the same version it received.
See the webhook request section for details on the data sent to webhooks.
See the webhook response section for the data expected from webhooks.
The example admission webhook server leaves the ClientAuth field
empty,
which defaults to NoClientCert. This means that the webhook server does not
authenticate the identity of the clients, supposedly API servers. If you need
mutual TLS or other ways to authenticate the clients, see
how to authenticate API servers.
Deploy the admission webhook service
The webhook server in the e2e test is deployed in the Kubernetes cluster, via
the deployment API.
The test also creates a service
as the front-end of the webhook server. See
code.
You may also deploy your webhooks outside of the cluster. You will need to update
your webhook configurations accordingly.
The following is an example ValidatingWebhookConfiguration, a mutating webhook configuration is similar.
See the webhook configuration section for details about each config field.
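A minimal sketch of such a configuration (the webhook name, namespace, and service name are illustrative):
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: "pod-policy.example.com"
webhooks:
- name: "pod-policy.example.com"
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: "example-namespace"
      name: "example-service"
    caBundle: <CA_BUNDLE>
  admissionReviewVersions: ["v1"]
  sideEffects: None
  timeoutSeconds: 5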
You must replace the <CA_BUNDLE> in the above example with a valid CA bundle,
which is a PEM-encoded (field value is Base64 encoded) CA bundle for validating the webhook's server certificate.
The scope field specifies if only cluster-scoped resources ("Cluster") or namespace-scoped
resources ("Namespaced") will match this rule. "*" means that there are no scope restrictions.
Note:
When using clientConfig.service, the server cert must be valid for
<svc_name>.<svc_namespace>.svc.
Note:
The default timeout for a webhook call is 10 seconds.
You can set the timeout, and it is encouraged to use a short timeout for webhooks.
If the webhook call times out, the request is handled according to the webhook's
failure policy.
When an API server receives a request that matches one of the rules, the
API server sends an admissionReview request to the webhook as specified in the
clientConfig.
After you create the webhook configuration, the system will take a few seconds
to honor the new configuration.
Authenticate API servers
If your admission webhooks require authentication, you can configure the
API servers to use basic auth, bearer token, or a cert to authenticate itself to
the webhooks. There are three steps to complete the configuration.
When starting the API server, specify the location of the admission control
configuration file via the --admission-control-config-file flag.
In the admission control configuration file, specify where the
MutatingAdmissionWebhook controller and ValidatingAdmissionWebhook controller
should read the credentials. The credentials are stored in kubeConfig files
(yes, the same schema that's used by kubectl), so the field name is
kubeConfigFile. Here is an example admission control configuration file:
# Deprecated in v1.17 in favor of apiserver.config.k8s.io/v1
apiVersion: apiserver.k8s.io/v1alpha1
kind: AdmissionConfiguration
plugins:
- name: ValidatingAdmissionWebhook
  configuration:
    # Deprecated in v1.17 in favor of apiserver.config.k8s.io/v1, kind=WebhookAdmissionConfiguration
    apiVersion: apiserver.config.k8s.io/v1alpha1
    kind: WebhookAdmission
    kubeConfigFile: "<path-to-kubeconfig-file>"
- name: MutatingAdmissionWebhook
  configuration:
    # Deprecated in v1.17 in favor of apiserver.config.k8s.io/v1, kind=WebhookAdmissionConfiguration
    apiVersion: apiserver.config.k8s.io/v1alpha1
    kind: WebhookAdmission
    kubeConfigFile: "<path-to-kubeconfig-file>"
apiVersion: v1
kind: Config
users:
# name should be set to the DNS name of the service or the host (including port) of the URL the webhook is configured to speak to.
# If a non-443 port is used for services, it must be included in the name when configuring 1.16+ API servers.
#
# For a webhook configured to speak to a service on the default port (443), specify the DNS name of the service:
# - name: webhook1.ns1.svc
#   user: ...
#
# For a webhook configured to speak to a service on non-default port (e.g. 8443), specify the DNS name and port of the service in 1.16+:
# - name: webhook1.ns1.svc:8443
#   user: ...
# and optionally create a second stanza using only the DNS name of the service for compatibility with 1.15 API servers:
# - name: webhook1.ns1.svc
#   user: ...
#
# For webhooks configured to speak to a URL, match the host (and port) specified in the webhook's URL. Examples:
# A webhook with `url: https://www.example.com`:
# - name: www.example.com
#   user: ...
#
# A webhook with `url: https://www.example.com:443`:
# - name: www.example.com:443
#   user: ...
#
# A webhook with `url: https://www.example.com:8443`:
# - name: www.example.com:8443
#   user: ...
#
- name: 'webhook1.ns1.svc'
  user:
    client-certificate-data: "<pem encoded certificate>"
    client-key-data: "<pem encoded key>"
# The `name` supports using * to wildcard-match prefixing segments.
- name: '*.webhook-company.org'
  user:
    password: "<password>"
    username: "<name>"
# '*' is the default match.
- name: '*'
  user:
    token: "<token>"
Of course you need to set up the webhook server to handle these authentication requests.
Webhook request and response
Request
Webhooks are sent as POST requests, with Content-Type: application/json,
with an AdmissionReview API object in the admission.k8s.io API group
serialized to JSON as the body.
Webhooks can specify what versions of AdmissionReview objects they accept
with the admissionReviewVersions field in their configuration:
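For example (a sketch showing only the relevant field; the webhook name is illustrative):
webhooks:
- name: my-webhook.example.com
  admissionReviewVersions: ["v1", "v1beta1"]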
admissionReviewVersions is a required field when creating webhook configurations.
Webhooks are required to support at least one AdmissionReview
version understood by the current and previous API server.
API servers send the first AdmissionReview version in the admissionReviewVersions list they support.
If none of the versions in the list are supported by the API server, the configuration will not be allowed to be created.
If an API server encounters a webhook configuration that was previously created and does not support any of the AdmissionReview
versions the API server knows how to send, attempts to call to the webhook will fail and be subject to the failure policy.
This example shows the data contained in an AdmissionReview object
for a request to update the scale subresource of an apps/v1 Deployment:
{"apiVersion": "admission.k8s.io/v1","kind": "AdmissionReview","request": {# Random uid uniquely identifying this admission call"uid": "705ab4f5-6393-11e8-b7cc-42010a800002",# Fully-qualified group/version/kind of the incoming object"kind": {"group": "autoscaling","version": "v1","kind": "Scale"},# Fully-qualified group/version/kind of the resource being modified"resource": {"group": "apps","version": "v1","resource": "deployments"},# Subresource, if the request is to a subresource"subResource": "scale",# Fully-qualified group/version/kind of the incoming object in the original request to the API server# This only differs from `kind` if the webhook specified `matchPolicy: Equivalent` and the original# request to the API server was converted to a version the webhook registered for"requestKind": {"group": "autoscaling","version": "v1","kind": "Scale"},# Fully-qualified group/version/kind of the resource being modified in the original request to the API server# This only differs from `resource` if the webhook specified `matchPolicy: Equivalent` and the original# request to the API server was converted to a version the webhook registered for"requestResource": {"group": "apps","version": "v1","resource": "deployments"},# Subresource, if the request is to a subresource# This only differs from `subResource` if the webhook specified `matchPolicy: Equivalent` and the original# request to the API server was converted to a version the webhook registered for"requestSubResource": "scale",# Name of the resource being modified"name": "my-deployment",# Namespace of the resource being modified, if the resource is namespaced (or is a Namespace object)"namespace": "my-namespace",# operation can be CREATE, UPDATE, DELETE, or CONNECT"operation": "UPDATE","userInfo": {# Username of the authenticated user making the request to the API server"username": "admin",# UID of the authenticated user making the request to the API server"uid": "014fbff9a07c",# Group memberships of the authenticated user making the request to the API server"groups": ["system:authenticated","my-admin-group"],# Arbitrary extra info associated with the user making the request to the API server# This is populated by the API server authentication layer"extra": {"some-key": ["some-value1","some-value2"]}},# object is the new object being admitted. It is null for DELETE operations"object": {"apiVersion": "autoscaling/v1","kind": "Scale"},# oldObject is the existing object. It is null for CREATE and CONNECT operations"oldObject": {"apiVersion": "autoscaling/v1","kind": "Scale"},# options contain the options for the operation being admitted, like meta.k8s.io/v1 CreateOptions,# UpdateOptions, or DeleteOptions. It is null for CONNECT operations"options": {"apiVersion": "meta.k8s.io/v1","kind": "UpdateOptions"},# dryRun indicates the API request is running in dry run mode and will not be persisted# Webhooks with side effects should avoid actuating those side effects when dryRun is true"dryRun": false}}
Response
Webhooks respond with a 200 HTTP status code, Content-Type: application/json,
and a body containing an AdmissionReview object (in the same version they were sent),
with the response stanza populated, serialized to JSON.
At a minimum, the response stanza must contain the following fields:
uid, copied from the request.uid sent to the webhook
allowed, either set to true or false
Example of a minimal response from a webhook to allow a request:
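{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<value from request.uid>",
    "allowed": true
  }
}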
When rejecting a request, the webhook can customize the HTTP status code and message returned to the user
using the status field. The specified status object is returned to the user.
See the API documentation
for details about the status type.
Example of a response to forbid a request, customizing the HTTP status code and message presented to the user:
{
"apiVersion": "admission.k8s.io/v1",
"kind": "AdmissionReview",
"response": {
"uid": "<value from request.uid>",
"allowed": false,
"status": {
"code": 403,
"message": "You cannot do this because it is Tuesday and your name starts with A" }
}
}
When allowing a request, a mutating admission webhook may optionally modify the incoming object as well.
This is done using the patch and patchType fields in the response.
The only currently supported patchType is JSONPatch.
See JSON patch documentation for more details.
For patchType: JSONPatch, the patch field contains a base64-encoded array of JSON patch operations.
As an example, a single patch operation that would set spec.replicas would be
[{"op": "add", "path": "/spec/replicas", "value": 3}]
Base64-encoded, this would be W3sib3AiOiAiYWRkIiwgInBhdGgiOiAiL3NwZWMvcmVwbGljYXMiLCAidmFsdWUiOiAzfV0=
Admission webhooks can optionally return warning messages that are returned to the requesting client
in HTTP Warning headers with a warning code of 299. Warnings can be sent with allowed or rejected admission responses.
If you're implementing a webhook that returns a warning:
Don't include a "Warning:" prefix in the message
Use warning messages to describe problems the client making the API request should correct or be aware of
Limit warnings to 120 characters if possible
Caution:
Individual warning messages over 256 characters may be truncated by the API server before being returned to clients.
If more than 4096 characters of warning messages are added (from all sources), additional warning messages are ignored.
{
"apiVersion": "admission.k8s.io/v1",
"kind": "AdmissionReview",
"response": {
"uid": "<value from request.uid>",
"allowed": true,
"warnings": [
"duplicate envvar entries specified with name MY_ENV",
"memory request less than 4MB specified for container mycontainer, which will not start successfully" ]
}
}
Webhook configuration
To register admission webhooks, create MutatingWebhookConfiguration or ValidatingWebhookConfiguration API objects.
The name of a MutatingWebhookConfiguration or a ValidatingWebhookConfiguration object must be a valid
DNS subdomain name.
Each configuration can contain one or more webhooks.
If multiple webhooks are specified in a single configuration, each must be given a unique name.
This is required in order to make resulting audit logs and metrics easier to match up to active
configurations.
Each webhook defines the following things.
Matching requests: rules
Each webhook must specify a list of rules used to determine if a request to the API server should be sent to the webhook.
Each rule specifies one or more operations, apiGroups, apiVersions, and resources, and a resource scope:
operations lists one or more operations to match. Can be "CREATE", "UPDATE", "DELETE", "CONNECT",
or "*" to match all.
apiGroups lists one or more API groups to match. "" is the core API group. "*" matches all API groups.
apiVersions lists one or more API versions to match. "*" matches all API versions.
resources lists one or more resources to match.
"*" matches all resources, but not subresources.
"*/*" matches all resources and subresources.
"pods/*" matches all subresources of pods.
"*/status" matches all status subresources.
scope specifies a scope to match. Valid values are "Cluster", "Namespaced", and "*".
Subresources match the scope of their parent resource. Default is "*".
"Cluster" means that only cluster-scoped resources will match this rule (Namespace API objects are cluster-scoped).
"Namespaced" means that only namespaced resources will match this rule.
"*" means that there are no scope restrictions.
If an incoming request matches one of the specified operations, groups, versions,
resources, and scope for any of a webhook's rules, the request is sent to the webhook.
Here are other examples of rules that could be used to specify which resources should be intercepted.
Match CREATE or UPDATE requests to apps/v1 and apps/v1beta1 deployments and replicasets:
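A sketch of the corresponding rule, shown as part of a webhook entry (the webhook name is illustrative):
webhooks:
- name: my-webhook.example.com
  rules:
  - operations: ["CREATE", "UPDATE"]
    apiGroups: ["apps"]
    apiVersions: ["v1", "v1beta1"]
    resources: ["deployments", "replicasets"]
    scope: "Namespaced"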
Webhooks may optionally limit which requests are intercepted based on the labels of the
objects they would be sent, by specifying an objectSelector. If specified, the objectSelector
is evaluated against both the object and oldObject that would be sent to the webhook,
and is considered to match if either object matches the selector.
A null object (oldObject in the case of create, or newObject in the case of delete),
or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object)
is not considered to match.
Use the object selector only if the webhook is opt-in, because end users may skip
the admission webhook by setting the labels.
This example shows a mutating webhook that would match a CREATE of any resource (but not subresources) with the label foo: bar:
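A sketch of the relevant configuration (the webhook name is illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  objectSelector:
    matchLabels:
      foo: bar
  rules:
  - operations: ["CREATE"]
    apiGroups: ["*"]
    apiVersions: ["*"]
    resources: ["*"]
    scope: "*"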
See labels concept
for more examples of label selectors.
Matching requests: namespaceSelector
Webhooks may optionally limit which requests for namespaced resources are intercepted,
based on the labels of the containing namespace, by specifying a namespaceSelector.
The namespaceSelector decides whether to run the webhook on a request for a namespaced resource
(or a Namespace object), based on whether the namespace's labels match the selector.
If the object itself is a namespace, the matching is performed on object.metadata.labels.
If the object is a cluster scoped resource other than a Namespace, namespaceSelector has no effect.
This example shows a mutating webhook that matches a CREATE of any namespaced resource inside a namespace
that does not have a "runlevel" label of "0" or "1":
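A sketch of the relevant selector:
namespaceSelector:
  matchExpressions:
  - key: runlevel
    operator: NotIn
    values: ["0", "1"]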
This example shows a validating webhook that matches a CREATE of any namespaced resource inside
a namespace that is associated with the "environment" of "prod" or "staging":
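A sketch of the relevant selector:
namespaceSelector:
  matchExpressions:
  - key: environment
    operator: In
    values: ["prod", "staging"]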
See labels concept
for more examples of label selectors.
Matching requests: matchPolicy
API servers can make objects available via multiple API groups or versions.
For example, if a webhook only specified a rule for some API groups/versions
(like apiGroups:["apps"], apiVersions:["v1","v1beta1"]),
and a request was made to modify the resource via another API group/version (like extensions/v1beta1),
the request would not be sent to the webhook.
The matchPolicy lets a webhook define how its rules are used to match incoming requests.
Allowed values are Exact or Equivalent.
Exact means a request should be intercepted only if it exactly matches a specified rule.
Equivalent means a request should be intercepted if it modifies a resource listed in rules,
even via another API group or version.
In the example given above, the webhook that only registered for apps/v1 could use matchPolicy:
matchPolicy: Exact would mean the extensions/v1beta1 request would not be sent to the webhook
matchPolicy: Equivalent means the extensions/v1beta1 request would be sent to the webhook
(with the objects converted to a version the webhook had specified: apps/v1)
Specifying Equivalent is recommended, and ensures that webhooks continue to intercept the
resources they expect when upgrades enable new versions of the resource in the API server.
When a resource stops being served by the API server, it is no longer considered equivalent to
other versions of that resource that are still served.
For example, extensions/v1beta1 deployments were first deprecated and then removed (in Kubernetes v1.16).
Since that removal, a webhook with an apiGroups:["extensions"], apiVersions:["v1beta1"], resources:["deployments"] rule
does not intercept deployments created via apps/v1 APIs. For that reason, webhooks should prefer registering
for stable versions of resources.
This example shows a validating webhook that intercepts modifications to deployments (no matter the API group or version),
and is always sent an apps/v1 Deployment object:
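A sketch of the relevant configuration (the webhook name is illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  matchPolicy: Equivalent
  rules:
  - operations: ["CREATE", "UPDATE"]
    apiGroups: ["apps"]
    apiVersions: ["v1"]
    resources: ["deployments"]
    scope: "Namespaced"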
The matchPolicy for an admission webhook defaults to Equivalent.
Matching requests: matchConditions
FEATURE STATE:Kubernetes v1.30 [stable] (enabled by default: true)
You can define match conditions for webhooks if you need fine-grained request filtering. These
conditions are useful if you find that match rules, objectSelectors, and namespaceSelectors still
don't provide the filtering you want over when to call out over HTTP. Match conditions are
CEL expressions. All match conditions must evaluate to true for the
webhook to be called.
Here is an example illustrating a few different uses for match conditions:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
webhooks:
  - name: my-webhook.example.com
    matchPolicy: Equivalent
    rules:
      - operations: ['CREATE', 'UPDATE']
        apiGroups: ['*']
        apiVersions: ['*']
        resources: ['*']
    failurePolicy: 'Ignore' # Fail-open (optional)
    sideEffects: None
    clientConfig:
      service:
        namespace: my-namespace
        name: my-webhook
      caBundle: '<omitted>'
    # You can have up to 64 matchConditions per webhook
    matchConditions:
      - name: 'exclude-leases' # Each match condition must have a unique name
        expression: '!(request.resource.group == "coordination.k8s.io" && request.resource.resource == "leases")' # Match non-lease resources.
      - name: 'exclude-kubelet-requests'
        expression: '!("system:nodes" in request.userInfo.groups)' # Match requests made by non-node users.
      - name: 'rbac' # Skip RBAC requests, which are handled by the second webhook.
        expression: 'request.resource.group != "rbac.authorization.k8s.io"'

  # This example illustrates the use of the 'authorizer'. The authorization check is more expensive
  # than a simple expression, so in this example it is scoped to only RBAC requests by using a second
  # webhook. Both webhooks can be served by the same endpoint.
  - name: rbac.my-webhook.example.com
    matchPolicy: Equivalent
    rules:
      - operations: ['CREATE', 'UPDATE']
        apiGroups: ['rbac.authorization.k8s.io']
        apiVersions: ['*']
        resources: ['*']
    failurePolicy: 'Fail' # Fail-closed (the default)
    sideEffects: None
    clientConfig:
      service:
        namespace: my-namespace
        name: my-webhook
      caBundle: '<omitted>'
    # You can have up to 64 matchConditions per webhook
    matchConditions:
      - name: 'breakglass'
        # Skip requests made by users authorized to 'breakglass' on this webhook.
        # The 'breakglass' API verb does not need to exist outside this check.
        expression: '!authorizer.group("admissionregistration.k8s.io").resource("validatingwebhookconfigurations").name("my-webhook.example.com").check("breakglass").allowed()'
Note:
You can define up to 64 elements in the matchConditions field per webhook.
Match conditions have access to the following CEL variables:
object - The object from the incoming request. The value is null for DELETE requests. The object
version may be converted based on the matchPolicy.
oldObject - The existing object. The value is null for CREATE requests.
request - The request portion of the AdmissionReview, excluding object and oldObject.
authorizer - A CEL Authorizer. May be used to perform authorization checks for the principal
(authenticated user) of the request. See
Authz in the Kubernetes CEL library
documentation for more details.
authorizer.requestResource - A shortcut for an authorization check configured with the request
resource (group, resource, (subresource), namespace, name).
Once the API server has determined a request should be sent to a webhook,
it needs to know how to contact the webhook. This is specified in the clientConfig
stanza of the webhook configuration.
Webhooks can either be called via a URL or a service reference,
and can optionally include a custom CA bundle to use to verify the TLS connection.
URL
url gives the location of the webhook, in standard URL form
(scheme://host:port/path).
The host should not refer to a service running in the cluster; use
a service reference by specifying the service field instead.
The host might be resolved via external DNS in some API servers
(e.g., kube-apiserver cannot resolve in-cluster DNS as that would
be a layering violation). host may also be an IP address.
Please note that using localhost or 127.0.0.1 as a host is
risky unless you take great care to run this webhook on all hosts
which run an API server which might need to make calls to this
webhook. Such installations are likely to be non-portable or not readily
run in a new cluster.
The scheme must be "https"; the URL must begin with "https://".
Attempting to use a user or basic auth (for example user:password@) is not allowed.
Fragments (#...) and query parameters (?...) are also not allowed.
Here is an example of a mutating webhook configured to call a URL
(and expects the TLS certificate to be verified using system trust roots, so does not specify a caBundle):
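A sketch of such a configuration (the webhook name and URL are illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  clientConfig:
    url: "https://my-webhook.example.com:9443/my-webhook-path"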
The service stanza inside clientConfig is a reference to the service for this webhook.
If the webhook is running within the cluster, then you should use service instead of url.
The service namespace and name are required. The port is optional and defaults to 443.
The path is optional and defaults to "/".
Here is an example of a mutating webhook configured to call a service on port "1234"
at the subpath "/my-path", and to verify the TLS connection against the ServerName
my-service-name.my-service-namespace.svc using a custom CA bundle:
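A sketch of such a configuration (only the clientConfig portion is shown; the service and namespace names follow the text above):
webhooks:
- name: my-webhook.example.com
  clientConfig:
    caBundle: <CA_BUNDLE>
    service:
      namespace: my-service-namespace
      name: my-service-name
      path: /my-path
      port: 1234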
You must replace the <CA_BUNDLE> in the above example by a valid CA bundle
which is a PEM-encoded CA bundle for validating the webhook's server certificate.
Side effects
Webhooks typically operate only on the content of the AdmissionReview sent to them.
Some webhooks, however, make out-of-band changes as part of processing admission requests.
Webhooks that make out-of-band changes ("side effects") must also have a reconciliation mechanism
(like a controller) that periodically determines the actual state of the world, and adjusts
the out-of-band data modified by the admission webhook to reflect reality.
This is because a call to an admission webhook does not guarantee the admitted object will be persisted as is, or at all.
Later webhooks can modify the content of the object, a conflict could be encountered while writing to storage,
or the server could power off before persisting the object.
Additionally, webhooks with side effects must skip those side-effects when dryRun: true admission requests are handled.
A webhook must explicitly indicate that it will not have side-effects when run with dryRun,
or the dry-run request will not be sent to the webhook and the API request will fail instead.
Webhooks indicate whether they have side effects using the sideEffects field in the webhook configuration:
None: calling the webhook will have no side effects.
NoneOnDryRun: calling the webhook will possibly have side effects, but if a request with
dryRun: true is sent to the webhook, the webhook will suppress the side effects (the webhook
is dryRun-aware).
Here is an example of a validating webhook indicating it has no side effects on dryRun: true requests:
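A sketch showing only the relevant field (the webhook name is illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  sideEffects: NoneOnDryRun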
Because webhooks add to API request latency, they should evaluate as quickly as possible.
timeoutSeconds allows configuring how long the API server should wait for a webhook to respond
before treating the call as a failure.
If the timeout expires before the webhook responds, the webhook call will be ignored or
the API call will be rejected based on the failure policy.
The timeout value must be between 1 and 30 seconds.
Here is an example of a validating webhook with a custom timeout of 2 seconds:
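A sketch showing only the relevant field (the webhook name is illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  timeoutSeconds: 2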
The timeout for an admission webhook defaults to 10 seconds.
Reinvocation policy
A single ordering of mutating admissions plugins (including webhooks) does not work for all cases
(see https://issue.k8s.io/64333 as an example). A mutating webhook can add a new sub-structure
to the object (like adding a container to a pod), and other mutating plugins which have already
run may have opinions on those new structures (like setting an imagePullPolicy on all containers).
To allow mutating admission plugins to observe changes made by other plugins,
built-in mutating admission plugins are re-run if a mutating webhook modifies an object,
and mutating webhooks can specify a reinvocationPolicy to control whether they are reinvoked as well.
reinvocationPolicy may be set to Never or IfNeeded. It defaults to Never.
Never: the webhook must not be called more than once in a single admission evaluation.
IfNeeded: the webhook may be called again as part of the admission evaluation if the object
being admitted is modified by other admission plugins after the initial webhook call.
The important elements to note are:
The number of additional invocations is not guaranteed to be exactly one.
If additional invocations result in further modifications to the object, webhooks are not
guaranteed to be invoked again.
Webhooks that use this option may be reordered to minimize the number of additional invocations.
To validate an object after all mutations are guaranteed complete, use a validating admission
webhook instead (recommended for webhooks with side-effects).
Here is an example of a mutating webhook opting into being re-invoked if later admission plugins
modify the object:
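A sketch showing only the relevant field (the webhook name is illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  reinvocationPolicy: IfNeeded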
Mutating webhooks must be idempotent, able to successfully process an object they have already admitted
and potentially modified. This is true for all mutating admission webhooks, since any change they can make
in an object could already exist in the user-provided object, but it is essential for webhooks that opt into reinvocation.
Failure policy
failurePolicy defines how unrecognized errors and timeout errors from the admission webhook
are handled. Allowed values are Ignore or Fail.
Ignore means that an error calling the webhook is ignored and the API request is allowed to continue.
Fail means that an error calling the webhook causes the admission to fail and the API request to be rejected.
Here is a mutating webhook configured to reject an API request if errors are encountered calling the admission webhook:
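A sketch showing only the relevant field (the webhook name is illustrative; other required fields are omitted):
webhooks:
- name: my-webhook.example.com
  failurePolicy: Fail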
The default failurePolicy for an admission webhook is Fail.
Monitoring admission webhooks
The API server provides ways to monitor admission webhook behaviors. These
monitoring mechanisms help cluster admins to answer questions like:
Which mutating webhook mutated the object in an API request?
What change did the mutating webhook apply to the object?
Which webhooks are frequently rejecting API requests? What's the reason for a rejection?
Mutating webhook auditing annotations
Sometimes it's useful to know which mutating webhook mutated the object in an API request, and what change the
webhook applied.
The Kubernetes API server performs auditing on each
mutating webhook invocation. Each invocation generates an auditing annotation
capturing if a request object is mutated by the invocation, and optionally generates an annotation
capturing the applied patch from the webhook admission response. The annotations are set in the
audit event for given request on given stage of its execution, which is then pre-processed
according to a certain policy and written to a backend.
The audit level of an event determines which annotations get recorded:
At Metadata audit level or higher, an annotation with key
mutation.webhook.admission.k8s.io/round_{round idx}_index_{order idx} gets logged with JSON
payload indicating a webhook gets invoked for a given request and whether it mutated the object or not.
For example, the following annotation gets recorded for a webhook being reinvoked. The webhook is
ordered third in the mutating webhook chain, and didn't mutate the request object during the
invocation.
# the audit event recorded
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "annotations": {
    "mutation.webhook.admission.k8s.io/round_1_index_2": "{\"configuration\":\"my-mutating-webhook-configuration.example.com\",\"webhook\":\"my-webhook.example.com\",\"mutated\": false}"
    # other annotations
    ...
  }
  # other fields
  ...
}
# the annotation value deserialized
{
  "configuration": "my-mutating-webhook-configuration.example.com",
  "webhook": "my-webhook.example.com",
  "mutated": false
}
The following annotation gets recorded for a webhook being invoked in the first round. The webhook
is ordered first in the mutating webhook chain, and mutated the request object during the
invocation.
# the audit event recorded
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "annotations": {
    "mutation.webhook.admission.k8s.io/round_0_index_0": "{\"configuration\":\"my-mutating-webhook-configuration.example.com\",\"webhook\":\"my-webhook-always-mutate.example.com\",\"mutated\": true}"
    # other annotations
    ...
  }
  # other fields
  ...
}
# the annotation value deserialized
{
  "configuration": "my-mutating-webhook-configuration.example.com",
  "webhook": "my-webhook-always-mutate.example.com",
  "mutated": true
}
At Request audit level or higher, an annotation with key
patch.webhook.admission.k8s.io/round_{round idx}_index_{order idx} gets logged with JSON payload indicating
a webhook gets invoked for a given request and what patch gets applied to the request object.
For example, the following annotation gets recorded for a webhook being reinvoked. The webhook is ordered fourth in the
mutating webhook chain, and responded with a JSON patch which got applied to the request object.
# the audit event recorded
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "annotations": {
    "patch.webhook.admission.k8s.io/round_1_index_3": "{\"configuration\":\"my-other-mutating-webhook-configuration.example.com\",\"webhook\":\"my-webhook-always-mutate.example.com\",\"patch\":[{\"op\":\"add\",\"path\":\"/data/mutation-stage\",\"value\":\"yes\"}],\"patchType\":\"JSONPatch\"}"
    # other annotations
    ...
  }
  # other fields
  ...
}
# the annotation value deserialized
{
  "configuration": "my-other-mutating-webhook-configuration.example.com",
  "webhook": "my-webhook-always-mutate.example.com",
  "patchType": "JSONPatch",
  "patch": [
    {
      "op": "add",
      "path": "/data/mutation-stage",
      "value": "yes"
    }
  ]
}
Admission webhook metrics
The API server exposes Prometheus metrics from the /metrics endpoint, which can be used for monitoring and
diagnosing API server status. The following metrics record status related to admission webhooks.
API server admission webhook rejection count
Sometimes it's useful to know which admission webhooks are frequently rejecting API requests, and the
reason for a rejection.
The API server exposes a Prometheus counter metric recording admission webhook rejections. The
metrics are labelled to identify the causes of webhook rejection(s):
name: the name of the webhook that rejected a request.
operation: the operation type of the request, can be one of CREATE,
UPDATE, DELETE and CONNECT.
type: the admission webhook type, can be one of admit and validating.
error_type: identifies if an error occurred during the webhook invocation
that caused the rejection. Its value can be one of:
calling_webhook_error: unrecognized errors or timeout errors from the admission webhook happened and the
webhook's Failure policy is set to Fail.
no_error: no error occurred. The webhook rejected the request with allowed: false in the admission
response. The metrics label rejection_code records the .status.code set in the admission response.
apiserver_internal_error: an API server internal error happened.
rejection_code: the HTTP status code set in the admission response when a
webhook rejected a request.
Example of the rejection count metrics:
# HELP apiserver_admission_webhook_rejection_count [ALPHA] Admission webhook rejection count, identified by name and broken out for each admission type (validating or admit) and operation. Additional labels specify an error type (calling_webhook_error or apiserver_internal_error if an error occurred; no_error otherwise) and optionally a non-zero rejection code if the webhook rejects the request with an HTTP status code (honored by the apiserver when the code is greater or equal to 400). Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded.
# TYPE apiserver_admission_webhook_rejection_count counter
apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",name="always-timeout-webhook.example.com",operation="CREATE",rejection_code="0",type="validating"} 1
apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",name="invalid-admission-response-webhook.example.com",operation="CREATE",rejection_code="0",type="validating"} 1
apiserver_admission_webhook_rejection_count{error_type="no_error",name="deny-unwanted-configmap-data.example.com",operation="CREATE",rejection_code="400",type="validating"} 13
This task guide explains some of the concepts behind ServiceAccounts. The
guide also explains how to obtain or revoke tokens that represent
ServiceAccounts, and how to (optionally) bind a ServiceAccount's validity to
the lifetime of an API object.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must
be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a
cluster, you can create one by using
minikube
or you can use one of these Kubernetes playgrounds:
To be able to follow these steps exactly, ensure you have a namespace named
examplens.
If you don't, create one by running:
kubectl create namespace examplens
User accounts versus service accounts
Kubernetes distinguishes between the concept of a user account and a service account
for a number of reasons:
User accounts are for humans. Service accounts are for application processes,
which (for Kubernetes) run in containers that are part of pods.
User accounts are intended to be global: names must be unique across all
namespaces of a cluster. No matter what namespace you look at, a particular
username that represents a user represents the same user.
In Kubernetes, service accounts are namespaced: two different namespaces can
contain ServiceAccounts that have identical names.
Typically, a cluster's user accounts might be synchronised from a corporate
database, where new user account creation requires special privileges and is
tied to complex business processes. By contrast, service account creation is
intended to be more lightweight, allowing cluster users to create service accounts
for specific tasks on demand. Separating ServiceAccount creation from the steps to
onboard human users makes it easier for workloads to follow the principle of
least privilege.
Auditing considerations for humans and service accounts may differ; the separation
makes that easier to achieve.
A configuration bundle for a complex system may include definition of various service
accounts for components of that system. Because service accounts can be created
without many constraints and have namespaced names, such configuration is
usually portable.
Bound service account tokens
ServiceAccount tokens can be bound to API objects that exist in the kube-apiserver.
This can be used to tie the validity of a token to the existence of another API object.
Supported object types are as follows:
Pod (used for projected volume mounts, see below)
Secret (can be used to allow revoking a token by deleting the Secret)
Node (can be used to auto-revoke a token when its Node is deleted; creating new node-bound tokens is GA in v1.33+)
When a token is bound to an object, the object's metadata.name and metadata.uid are
stored as extra 'private claims' in the issued JWT.
When a bound token is presented to the kube-apiserver, the service account authenticator
will extract and verify these claims.
If the referenced object or the ServiceAccount is pending deletion (for example, due to finalizers),
then for any instant that is 60 seconds (or more) after the .metadata.deletionTimestamp date,
authentication with that token would fail.
If the referenced object no longer exists (or its metadata.uid does not match),
the request will not be authenticated.
Additional metadata in Pod bound tokens
FEATURE STATE:Kubernetes v1.32 [stable] (enabled by default: true)
When a service account token is bound to a Pod object, additional metadata is also
embedded into the token that indicates the value of the bound pod's spec.nodeName field,
and the uid of that Node, if available.
This node information is not verified by the kube-apiserver when the token is used for authentication.
It is included so integrators do not have to fetch Pod or Node API objects to check the associated Node name
and uid when inspecting a JWT.
Verifying and inspecting private claims
The TokenReview API can be used to verify and extract private claims from a token:
First, assume you have a pod named test-pod and a service account named my-sa.
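As a sketch of how to exercise this (the Pod and ServiceAccount names follow the text above; the file name is illustrative), you could request a token bound to the Pod and then submit it in a TokenReview:
# Request a token for my-sa, bound to the Pod test-pod
kubectl create token my-sa --bound-object-kind Pod --bound-object-name test-pod

# tokenreview.yaml - paste the token from the previous command into spec.token
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: <token from the previous command>

# Submit the TokenReview and print the result, including the extracted claims in its status
kubectl create -o yaml -f tokenreview.yaml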
Despite using kubectl create -f to create this resource, and defining it similar to
other resource types in Kubernetes, TokenReview is a special type and the kube-apiserver
does not actually persist the TokenReview object into etcd.
Hence kubectl get tokenreview is not a valid command.
Schema for service account private claims
The schema for the Kubernetes-specific claims within JWT tokens is not currently documented,
however the relevant code area can be found in
the serviceaccount package
in the Kubernetes codebase.
You can inspect a JWT using a standard JWT decoding tool. Below is an example of a JWT for the
my-serviceaccount ServiceAccount, bound to a Pod object named my-pod which is scheduled
to the Node my-node, in the my-namespace namespace:
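The decoded claims of such a token look roughly like the following sketch (the timestamps, UIDs, and the issuer and audience values are illustrative placeholders):
{
  "aud": ["https://kubernetes.default.svc"],
  "exp": 1731613413,
  "iat": 1700077413,
  "iss": "https://kubernetes.default.svc",
  "kubernetes.io": {
    "namespace": "my-namespace",
    "node": {
      "name": "my-node",
      "uid": "<node uid>"
    },
    "pod": {
      "name": "my-pod",
      "uid": "<pod uid>"
    },
    "serviceaccount": {
      "name": "my-serviceaccount",
      "uid": "<serviceaccount uid>"
    }
  },
  "nbf": 1700077413,
  "sub": "system:serviceaccount:my-namespace:my-serviceaccount"
}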
The aud and iss fields in this JWT may differ between different Kubernetes clusters depending
on your configuration.
The presence of both the pod and node claim implies that this token is bound
to a Pod object. When verifying Pod bound ServiceAccount tokens, the API server does not
verify the existence of the referenced Node object.
Services that run outside of Kubernetes and want to perform offline validation of JWTs may
use this schema, along with a compliant JWT validator configured with OpenID Discovery information
from the API server, to verify presented JWTs without requiring use of the TokenReview API.
Services that verify JWTs in this way do not verify the claims embedded in the JWT token to be
current and still valid.
This means if the token is bound to an object, and that object no longer exists, the token will still
be considered valid (until the configured token expires).
Clients that require assurance that a token's bound claims are still valid MUST use the TokenReview
API to present the token to the kube-apiserver for it to verify and expand the embedded claims, using
similar steps to the Verifying and inspecting private claims
section above, but with a supported client library.
For more information on JWTs and their structure, see the JSON Web Token RFC.
Bound service account token volume mechanism
FEATURE STATE:Kubernetes v1.22 [stable] (enabled by default: true)
Here's an example of how that looks for a launched Pod:
...
  - name: kube-api-access-<random-suffix>
    projected:
      sources:
        - serviceAccountToken:
            path: token # must match the path the app expects
        - configMap:
            items:
              - key: ca.crt
                path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
That manifest snippet defines a projected volume that consists of three sources. In this case,
each source also represents a single path within that volume. The three sources are:
A serviceAccountToken source, that contains a token that the kubelet acquires from kube-apiserver.
The kubelet fetches time-bound tokens using the TokenRequest API. A token served for a TokenRequest expires
either when the pod is deleted or after a defined lifespan (by default, that is 1 hour).
The kubelet also refreshes that token before the token expires.
The token is bound to the specific Pod and has the kube-apiserver as its audience.
This mechanism superseded an earlier mechanism that added a volume based on a Secret,
where the Secret represented the ServiceAccount for the Pod, but did not expire.
A configMap source. The ConfigMap contains a bundle of certificate authority data. Pods can use these
certificates to make sure that they are connecting to your cluster's kube-apiserver (and not to middlebox
or an accidentally misconfigured peer).
A downwardAPI source that looks up the name of the namespace containing the Pod, and makes
that name information available to application code running inside the Pod.
Any container within the Pod that mounts this particular volume can access the above information.
Note:
There is no specific mechanism to invalidate a token issued via TokenRequest. If you no longer
trust a bound service account token for a Pod, you can delete that Pod. Deleting a Pod expires
its bound service account tokens.
Manual Secret management for ServiceAccounts
Versions of Kubernetes before v1.22 automatically created credentials for accessing
the Kubernetes API. This older mechanism was based on creating token Secrets that
could then be mounted into running Pods.
In more recent versions, including Kubernetes v1.33, API credentials
are obtained directly using the
TokenRequest API,
and are mounted into Pods using a projected volume.
The tokens obtained using this method have bounded lifetimes, and are automatically
invalidated when the Pod they are mounted into is deleted.
You can still manually create
a Secret to hold a service account token; for example, if you need a token that never expires.
Once you manually create a Secret and link it to a ServiceAccount,
the Kubernetes control plane automatically populates the token into that Secret.
Note:
Although the manual mechanism for creating a long-lived ServiceAccount token exists,
using TokenRequest
to obtain short-lived API access tokens is recommended instead.
Auto-generated legacy ServiceAccount token clean up
Before version 1.24, Kubernetes automatically generated Secret-based tokens for
ServiceAccounts. To distinguish between automatically generated tokens and
manually created ones, Kubernetes checks for a reference from the
ServiceAccount's secrets field. If the Secret is referenced in the secrets
field, it is considered an auto-generated legacy token. Otherwise, it is
considered a manually created legacy token. For example:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot
  namespace: default
secrets:
  - name: build-robot-secret # usually NOT present for a manually generated token
Beginning from version 1.29, legacy ServiceAccount tokens that were generated
automatically will be marked as invalid if they remain unused for a certain
period of time (set to default at one year). Tokens that continue to be unused
for this defined period (again, by default, one year) will subsequently be
purged by the control plane.
If users use an invalidated auto-generated token, the token validator will
add an audit annotation for the key-value pair
authentication.k8s.io/legacy-token-invalidated: <secret name>/<namespace>,
increment the invalid_legacy_auto_token_uses_total metric count,
update the Secret label kubernetes.io/legacy-token-last-used with the new
date,
return an error indicating that the token has been invalidated.
When receiving this validation error, users can update the Secret to remove the
kubernetes.io/legacy-token-invalid-since label to temporarily allow use of
this token.
Here's an example of an auto-generated legacy token that has been marked with the
kubernetes.io/legacy-token-last-used and kubernetes.io/legacy-token-invalid-since
labels:
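A sketch of what such a Secret might look like (the dates are illustrative):
apiVersion: v1
kind: Secret
metadata:
  name: build-robot-secret
  namespace: default
  labels:
    kubernetes.io/legacy-token-last-used: "2023-10-24"
    kubernetes.io/legacy-token-invalid-since: "2024-10-25"
  annotations:
    kubernetes.io/service-account.name: build-robot
type: kubernetes.io/service-account-token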
A ServiceAccount controller manages the ServiceAccounts inside namespaces, and
ensures a ServiceAccount named "default" exists in every active namespace.
Token controller
The service account token controller runs as part of kube-controller-manager.
This controller acts asynchronously. It:
watches for ServiceAccount deletion and deletes all corresponding ServiceAccount
token Secrets.
watches for ServiceAccount token Secret addition, and ensures the referenced
ServiceAccount exists, and adds a token to the Secret if needed.
watches for Secret deletion and removes a reference from the corresponding
ServiceAccount if needed.
You must pass a service account private key file to the token controller in
the kube-controller-manager using the --service-account-private-key-file
flag. The private key is used to sign generated service account tokens.
Similarly, you must pass the corresponding public key to the kube-apiserver
using the --service-account-key-file flag. The public key will be used to
verify the tokens during authentication.
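For example (a sketch; the key file paths are illustrative and depend on how your cluster was set up):
kube-controller-manager --service-account-private-key-file=/etc/kubernetes/pki/sa.key ...
kube-apiserver --service-account-key-file=/etc/kubernetes/pki/sa.pub ...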
FEATURE STATE:Kubernetes v1.32 [alpha] (enabled by default: false)
An alternate setup to setting --service-account-private-key-file and --service-account-key-file flags is
to configure an external JWT signer for external ServiceAccount token signing and key management.
Note that these setups are mutually exclusive and cannot be configured together.
ServiceAccount admission controller
The modification of pods is implemented via a plugin
called an Admission Controller.
It is part of the API server.
This admission controller acts synchronously to modify pods as they are created.
When this plugin is active (and it is by default on most distributions), then
it does the following when a Pod is created:
If the pod does not have a .spec.serviceAccountName set, the admission controller sets the name of the
ServiceAccount for this incoming Pod to default.
The admission controller ensures that the ServiceAccount referenced by the incoming Pod exists. If there
is no ServiceAccount with a matching name, the admission controller rejects the incoming Pod. That check
applies even for the default ServiceAccount.
Provided that neither the ServiceAccount's automountServiceAccountToken field nor the
Pod's automountServiceAccountToken field is set to false:
the admission controller mutates the incoming Pod, adding an extra
volume that contains
a token for API access.
the admission controller adds a volumeMount to each container in the Pod,
skipping any containers that already have a volume mount defined for the path
/var/run/secrets/kubernetes.io/serviceaccount.
For Linux containers, that volume is mounted at /var/run/secrets/kubernetes.io/serviceaccount;
on Windows nodes, the mount is at the equivalent path.
If the spec of the incoming Pod doesn't already contain any imagePullSecrets, then the
admission controller adds imagePullSecrets, copying them from the ServiceAccount.
Legacy ServiceAccount token tracking controller
FEATURE STATE:Kubernetes v1.28 [stable] (enabled by default: true)
This controller generates a ConfigMap called
kube-system/kube-apiserver-legacy-service-account-token-tracking in the
kube-system namespace. The ConfigMap records the timestamp when legacy service
account tokens began to be monitored by the system.
Legacy ServiceAccount token cleaner
FEATURE STATE:Kubernetes v1.30 [stable] (enabled by default: true)
The legacy ServiceAccount token cleaner runs as part of the
kube-controller-manager and checks every 24 hours to see if any auto-generated
legacy ServiceAccount token has not been used in a specified amount of time.
If so, the cleaner marks those tokens as invalid.
The cleaner works by first checking the ConfigMap created by the control plane
(provided that LegacyServiceAccountTokenTracking is enabled). If the current
time is a specified amount of time after the date in the ConfigMap, the
cleaner then loops through the list of Secrets in the cluster and evaluates each
Secret that has the type kubernetes.io/service-account-token.
If a Secret meets all of the following conditions, the cleaner marks it as
invalid:
The Secret is auto-generated, meaning that it is bi-directionally referenced
by a ServiceAccount.
The Secret is not currently mounted by any pods.
The Secret has not been used in a specified amount of time since it was
created or since it was last used.
The cleaner marks a Secret invalid by adding a label called
kubernetes.io/legacy-token-invalid-since to the Secret, with the current date
as the value. If an invalid Secret is not used in a specified amount of time,
the cleaner will delete it.
Note:
All the specified amount of time above defaults to one year. The cluster
administrator can configure this value through the
--legacy-service-account-token-clean-up-period command line argument for the
kube-controller-manager component.
TokenRequest API
FEATURE STATE:Kubernetes v1.22 [stable]
You use the TokenRequest
subresource of a ServiceAccount to obtain a time-bound token for that ServiceAccount.
You don't need to call this to obtain an API token for use within a container, since
the kubelet sets this up for you using a projected volume.
The Kubernetes control plane (specifically, the ServiceAccount admission controller)
adds a projected volume to Pods, and the kubelet ensures that this volume contains a token
that lets containers authenticate as the right ServiceAccount.
(This mechanism superseded an earlier mechanism that added a volume based on a Secret,
where the Secret represented the ServiceAccount for the Pod but did not expire.)
Here's an example of how that looks for a launched Pod:
That manifest snippet defines a projected volume that combines information from three sources:
A serviceAccountToken source, that contains a token that the kubelet acquires from kube-apiserver.
The kubelet fetches time-bound tokens using the TokenRequest API. A token served for a TokenRequest expires
either when the pod is deleted or after a defined lifespan (by default, that is 1 hour).
The token is bound to the specific Pod and has the kube-apiserver as its audience.
A configMap source. The ConfigMap contains a bundle of certificate authority data. Pods can use these
certificates to make sure that they are connecting to your cluster's kube-apiserver (and not to a middlebox
or an accidentally misconfigured peer).
A downwardAPI source. This downwardAPI volume makes the name of the namespace containing the Pod available
to application code running inside the Pod.
Any container within the Pod that mounts this volume can access the above information.
Create additional API tokens
Caution:
Only create long-lived API tokens if the token request mechanism
is not suitable. The token request mechanism provides time-limited tokens; because these
expire, they represent a lower risk to information security.
To create a non-expiring, persisted API token for a ServiceAccount, create a
Secret of type kubernetes.io/service-account-token with an annotation
referencing the ServiceAccount. The control plane then generates a long-lived token and
updates that Secret with that generated token data.
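A sketch of such a Secret (the Secret name is illustrative; the ServiceAccount name and namespace follow the text below):
apiVersion: v1
kind: Secret
metadata:
  name: myserviceaccount-token
  namespace: examplens
  annotations:
    kubernetes.io/service-account.name: myserviceaccount
type: kubernetes.io/service-account-token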
If you launch a new Pod into the examplens namespace, it can use the myserviceaccount
service-account-token Secret that you just created.
Caution:
Do not reference manually created Secrets in the secrets field of a
ServiceAccount. Otherwise, a manually created Secret will be cleaned up if it is not used for a long
time. Please refer to auto-generated legacy ServiceAccount token clean up.
Delete/invalidate a ServiceAccount token
If you know the name of the Secret that contains the token you want to remove:
kubectl delete secret name-of-secret
Otherwise, first find the Secret for the ServiceAccount.
# This assumes that you already have a namespace named 'examplens'
kubectl -n examplens get serviceaccount/example-automated-thing -o yaml
External ServiceAccount token signing and key management
FEATURE STATE:Kubernetes v1.32 [alpha] (enabled by default: false)
The kube-apiserver can be configured to use an external signer for token signing and token verifying key management.
This feature enables kubernetes distributions to integrate with key management solutions of their choice
(for example, HSMs, cloud KMSes) for service account credential signing and verification.
To configure the kube-apiserver to use an external JWT signer, set the --service-account-signing-endpoint flag
to the location of a Unix domain socket (UDS) on the filesystem, or to a name prefixed with an @ symbol that refers
to a UDS in the abstract socket namespace. At the configured UDS there must be an RPC server that implements
ExternalJWTSigner.
The external-jwt-signer must be healthy and be ready to serve supported service account keys for the kube-apiserver to start.
Check out KEP-740
for more details on ExternalJWTSigner.
Note:
The kube-apiserver flags --service-account-key-file and --service-account-signing-key-file will continue
to be used for reading from files unless --service-account-signing-endpoint is set; they are mutually
exclusive ways of supporting JWT signing and authentication.
Clean up
If you created a namespace examplens to experiment with, you can remove it:
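kubectl delete namespace examplens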
3.11 - Certificates and Certificate Signing Requests
Kubernetes certificate and trust bundle APIs enable automation of
X.509 credential provisioning by providing
a programmatic interface for clients of the Kubernetes API to request and obtain
X.509 certificates from a Certificate Authority (CA).
There is also experimental (alpha) support for distributing trust bundles.
Certificate signing requests
FEATURE STATE:Kubernetes v1.19 [stable]
A CertificateSigningRequest
(CSR) resource is used to request that a certificate be signed
by a denoted signer, after which the request may be approved or denied before
finally being signed.
Request signing process
The CertificateSigningRequest resource type allows a client to ask for an X.509 certificate
to be issued, based on a signing request.
The CertificateSigningRequest object includes a PEM-encoded PKCS#10 signing request in
the spec.request field. The CertificateSigningRequest denotes the signer (the
recipient that the request is being made to) using the spec.signerName field.
Note that spec.signerName is a required key after API version certificates.k8s.io/v1.
In Kubernetes v1.22 and later, clients may optionally set the spec.expirationSeconds
field to request a particular lifetime for the issued certificate. The minimum valid
value for this field is 600, i.e. ten minutes.
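For illustration, a CertificateSigningRequest manifest requesting a 24-hour certificate from a hypothetical signer might look like the sketch below (the CSR name, signer name, and base64-encoded request are placeholders):
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: my-csr
spec:
  request: <base64-encoded PKCS#10 certificate signing request>
  signerName: example.com/my-signer-name
  expirationSeconds: 86400  # 24 hours; the minimum allowed is 600
  usages:
  - client auth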
Once created, a CertificateSigningRequest must be approved before it can be signed.
Depending on the signer selected, a CertificateSigningRequest may be automatically approved
by a controller.
Otherwise, a CertificateSigningRequest must be manually approved either via the REST API (or client-go)
or by running kubectl certificate approve. Likewise, a CertificateSigningRequest may also be denied,
which tells the configured signer that it must not sign the request.
For certificates that have been approved, the next step is signing. The relevant signing controller
first validates that the signing conditions are met and then creates a certificate.
The signing controller then updates the CertificateSigningRequest, storing the new certificate into
the status.certificate field of the existing CertificateSigningRequest object. The
status.certificate field is either empty or contains a X.509 certificate, encoded in PEM format.
The CertificateSigningRequest status.certificate field is empty until the signer does this.
Once the status.certificate field has been populated, the request has been completed and clients can now
fetch the signed certificate PEM data from the CertificateSigningRequest resource.
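For example, assuming a CSR named my-svc.my-namespace (a hypothetical name), you could retrieve the issued certificate with:
kubectl get csr my-svc.my-namespace -o jsonpath='{.status.certificate}' | base64 --decode > server.crt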
The signers can instead deny certificate signing if the approval conditions are not met.
In order to reduce the number of old CertificateSigningRequest resources left in a cluster, a garbage collection
controller runs periodically. The garbage collection removes CertificateSigningRequests that have not changed
state for some duration:
Approved requests: automatically deleted after 1 hour
Denied requests: automatically deleted after 1 hour
Failed requests: automatically deleted after 1 hour
Pending requests: automatically deleted after 24 hours
All requests: automatically deleted after the issued certificate has expired
Certificate signing authorization
To allow creating a CertificateSigningRequest and retrieving any CertificateSigningRequest:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: csr-approver
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests/approval
  verbs:
  - update
- apiGroups:
  - certificates.k8s.io
  resources:
  - signers
  resourceNames:
  - example.com/my-signer-name # example.com/* can be used to authorize for all signers in the 'example.com' domain
  verbs:
  - approve
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: csr-signer
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests/status
  verbs:
  - update
- apiGroups:
  - certificates.k8s.io
  resources:
  - signers
  resourceNames:
  - example.com/my-signer-name # example.com/* can be used to authorize for all signers in the 'example.com' domain
  verbs:
  - sign
Signers
Signers abstractly represent the entity or entities that might sign, or have
signed, a security certificate.
Any signer that is made available for use outside a particular cluster should provide information
about how the signer works, so that consumers can understand what that means for CertificateSigningRequests
and (if enabled) ClusterTrustBundles.
This includes:
Trust distribution: how trust anchors (CA certificates or certificate bundles) are distributed.
Permitted subjects: any restrictions on and behavior when a disallowed subject is requested.
Permitted x509 extensions: including IP subjectAltNames, DNS subjectAltNames,
Email subjectAltNames, URI subjectAltNames etc, and behavior when a disallowed extension is requested.
Permitted key usages / extended key usages: any restrictions on and behavior
when usages different than the signer-determined usages are specified in the CSR.
Expiration/certificate lifetime: whether it is fixed by the signer, configurable by the admin, determined by the CSR spec.expirationSeconds field, etc.,
and the behavior when the signer-determined expiration is different from the CSR spec.expirationSeconds field.
CA bit allowed/disallowed: and behavior if a CSR contains a request for a CA certificate when the signer does not permit it.
Commonly, the status.certificate field of a CertificateSigningRequest contains a
single PEM-encoded X.509 certificate once the CSR is approved and the certificate is issued.
Some signers store multiple certificates into the status.certificate field. In
that case, the documentation for the signer should specify the meaning of
additional certificates; for example, this might be the certificate plus
intermediates to be presented during TLS handshakes.
If you want to make the trust anchor (root certificate) available, this should be done
separately from a CertificateSigningRequest and its status.certificate field. For example,
you could use a ClusterTrustBundle.
The PKCS#10 signing request format does not have a standard mechanism to specify a
certificate expiration or lifetime. The expiration or lifetime therefore has to be set
through the spec.expirationSeconds field of the CSR object. The built-in signers
use the ClusterSigningDuration configuration option (the --cluster-signing-duration
command-line flag of the kube-controller-manager), which defaults to 1 year, as the
default when no spec.expirationSeconds is specified. When spec.expirationSeconds is
specified, the minimum of spec.expirationSeconds and ClusterSigningDuration is used.
Note:
The spec.expirationSeconds field was added in Kubernetes v1.22. Earlier versions of Kubernetes do not honor this field.
Kubernetes API servers prior to v1.22 will silently drop this field when the object is created.
Kubernetes signers
Kubernetes provides built-in signers that each have a well-known signerName:
kubernetes.io/kube-apiserver-client: signs certificates that will be honored as client certificates by the API server.
Never auto-approved by kube-controller-manager.
Trust distribution: signed certificates must be honored as client certificates by the API server. The CA bundle is not distributed by any other means.
Permitted subjects - no subject restrictions, but approvers and signers may choose not to approve or sign.
Certain subjects like cluster-admin level users or groups vary between distributions and installations,
but deserve additional scrutiny before approval and signing.
The CertificateSubjectRestriction admission plugin is enabled by default to restrict system:masters,
but it is often not the only cluster-admin subject in a cluster.
Permitted x509 extensions - honors subjectAltName and key usage extensions and discards other extensions.
Permitted key usages - must include ["client auth"]. Must not include key usages beyond ["digital signature", "key encipherment", "client auth"].
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
kubernetes.io/kube-apiserver-client-kubelet: signs client certificates that will be honored as client certificates by the
API server.
May be auto-approved by kube-controller-manager.
Trust distribution: signed certificates must be honored as client certificates by the API server. The CA bundle
is not distributed by any other means.
Permitted subjects - organizations are exactly ["system:nodes"], common name is "system:node:${NODE_NAME}".
Permitted x509 extensions - honors key usage extensions, forbids subjectAltName extensions and drops other extensions.
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
kubernetes.io/kubelet-serving: signs serving certificates that are honored as a valid kubelet serving certificate
by the API server, but has no other guarantees.
Never auto-approved by kube-controller-manager.
Trust distribution: signed certificates must be honored by the API server as valid to terminate connections to a kubelet.
The CA bundle is not distributed by any other means.
Permitted subjects - organizations are exactly ["system:nodes"], common name is "system:node:${NODE_NAME}".
Permitted x509 extensions - honors key usage and DNSName/IPAddress subjectAltName extensions, forbids EmailAddress and
URI subjectAltName extensions, drops other extensions. At least one DNS or IP subjectAltName must be present.
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
kubernetes.io/legacy-unknown: has no guarantees for trust at all. Some third-party distributions of Kubernetes
may honor client certificates signed by it. The stable CertificateSigningRequest API (version certificates.k8s.io/v1 and later)
does not allow the signerName to be set to kubernetes.io/legacy-unknown.
Never auto-approved by kube-controller-manager.
Trust distribution: None. There is no standard trust or distribution for this signer in a Kubernetes cluster.
Permitted subjects - any
Permitted x509 extensions - honors subjectAltName and key usage extensions and discards other extensions.
Permitted key usages - any
Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the --cluster-signing-duration option or, if specified, the spec.expirationSeconds field of the CSR object.
CA bit allowed/disallowed - not allowed.
The kube-controller-manager implements control plane signing for each of the built-in
signers. Failures for all of these are only reported in kube-controller-manager logs.
Note:
The spec.expirationSeconds field was added in Kubernetes v1.22. Earlier versions of Kubernetes do not honor this field.
Kubernetes API servers prior to v1.22 will silently drop this field when the object is created.
Distribution of trust happens out of band for these signers. Any trust outside of those described above is strictly
coincidental. For instance, some distributions may honor kubernetes.io/legacy-unknown as client certificates for the
kube-apiserver, but this is not a standard.
None of these usages are related to ServiceAccount token secrets .data[ca.crt] in any way. That CA bundle is only
guaranteed to verify a connection to the API server using the default service (kubernetes.default.svc).
Custom signers
You can also introduce your own custom signer, which should have a similar prefixed name but using your
own domain name. For example, if you represent an open source project that uses the domain open-fictional.example
then you might use issuer.open-fictional.example/service-mesh as a signer name.
A custom signer uses the Kubernetes API to issue a certificate. See API-based signers.
Signing
Control plane signer
The Kubernetes control plane implements each of the
Kubernetes signers,
as part of the kube-controller-manager.
Note:
Prior to Kubernetes v1.18, the kube-controller-manager would sign any CSRs that
were marked as approved.
Note:
The spec.expirationSeconds field was added in Kubernetes v1.22.
Earlier versions of Kubernetes do not honor this field.
Kubernetes API servers prior to v1.22 will silently drop this field when the object is created.
API-based signers
Users of the REST API can sign CSRs by submitting an UPDATE request to the status
subresource of the CSR to be signed.
As part of this request, the status.certificate field should be set to contain the
signed certificate. This field contains one or more PEM-encoded certificates.
All PEM blocks must have the "CERTIFICATE" label, contain no headers,
and the encoded data must be a BER-encoded ASN.1 Certificate structure
as described in section 4 of RFC5280.
Non-PEM content may appear before or after the CERTIFICATE PEM blocks and is unvalidated,
to allow for explanatory text as described in section 5.2 of RFC7468.
When encoded in JSON or YAML, this field is base-64 encoded.
A CertificateSigningRequest containing the example certificate above would look like this:
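(The value below is a placeholder; in a real object, status.certificate contains the base64-encoded PEM data.)
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
...
status:
  certificate: "<base64-encoded PEM certificate data>"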
Before a signer issues a certificate based on a CertificateSigningRequest,
the signer typically checks that the issuance for that CSR has been approved.
Control plane automated approval
The kube-controller-manager ships with a built-in approver for certificates with
a signerName of kubernetes.io/kube-apiserver-client-kubelet that delegates various
permissions on CSRs for node credentials to authorization.
The kube-controller-manager POSTs SubjectAccessReview resources to the API server
in order to check authorization for certificate approval.
Approval or rejection using kubectl
A Kubernetes administrator (with appropriate permissions) can manually approve
(or deny) CertificateSigningRequests by using the kubectl certificate approve and kubectl certificate deny commands.
Users of the REST API can approve CSRs by submitting an UPDATE request to the approval
subresource of the CSR to be approved. For example, you could write an
operator that watches for a particular
kind of CSR and then sends an UPDATE to approve them.
When you make an approval or rejection request, set either the Approved or Denied
status condition based on the state you determine:
For Approved CSRs:
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
...
status:
  conditions:
  - lastUpdateTime: "2020-02-08T11:37:35Z"
    lastTransitionTime: "2020-02-08T11:37:35Z"
    message: Approved by my custom approver controller
    reason: ApprovedByMyPolicy # You can set this to any string
    type: Approved
For Denied CSRs:
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
...
status:
  conditions:
  - lastUpdateTime: "2020-02-08T11:37:35Z"
    lastTransitionTime: "2020-02-08T11:37:35Z"
    message: Denied by my custom approver controller
    reason: DeniedByMyPolicy # You can set this to any string
    type: Denied
It's usual to set status.conditions.reason to a machine-friendly reason
code using TitleCase; this is a convention but you can set it to anything
you like. If you want to add a note for human consumption, use the
status.conditions.message field.
Cluster trust bundles
FEATURE STATE:Kubernetes v1.33 [beta] (enabled by default: false)
Note:
In Kubernetes 1.33, you must enable the ClusterTrustBundle feature gate and the
certificates.k8s.io/v1alpha1 API group in order to use this API.
A ClusterTrustBundle is a cluster-scoped object for distributing X.509 trust
anchors (root certificates) to workloads within the cluster. They're designed
to work well with the signer concept from CertificateSigningRequests.
All ClusterTrustBundle objects have strong validation on the contents of their
trustBundle field. That field must contain one or more X.509 certificates,
DER-serialized, each wrapped in a PEM CERTIFICATE block. The certificates
must parse as valid X.509 certificates.
Esoteric PEM features like inter-block data and intra-block headers are either
rejected during object validation, or can be ignored by consumers of the object.
Additionally, consumers are allowed to reorder the certificates in
the bundle with their own arbitrary but stable ordering.
ClusterTrustBundle objects should be considered world-readable within the
cluster. If your cluster uses RBAC
authorization, all ServiceAccounts have a default grant that allows them to
get, list, and watch all ClusterTrustBundle objects.
If you use your own authorization mechanism and you have enabled
ClusterTrustBundles in your cluster, you should set up an equivalent rule to
make these objects public within the cluster, so that they work as intended.
If you do not have permission to list cluster trust bundles by default in your
cluster, you can impersonate a service account you have access to in order to
see available ClusterTrustBundles:
kubectl get clustertrustbundles --as='system:serviceaccount:mynamespace:default'
Signer-linked ClusterTrustBundles
Signer-linked ClusterTrustBundles are associated with a signer name, like this:
apiVersion: certificates.k8s.io/v1alpha1
kind: ClusterTrustBundle
metadata:
  name: example.com:mysigner:foo
spec:
  signerName: example.com/mysigner
  trustBundle: "<... PEM data ...>"
These ClusterTrustBundles are intended to be maintained by a signer-specific
controller in the cluster, so they have several security features:
To create or update a signer-linked ClusterTrustBundle, you must be permitted
to attest on the signer (custom authorization verb attest,
API group certificates.k8s.io; resource path signers). You can configure
authorization for the specific resource name
<signerNameDomain>/<signerNamePath> or match a pattern such as
<signerNameDomain>/*.
Signer-linked ClusterTrustBundles must be named with a prefix derived from
their spec.signerName field. Slashes (/) are replaced with colons (:),
and a final colon is appended. This is followed by an arbitrary name. For
example, the signer example.com/mysigner can be linked to a
ClusterTrustBundle example.com:mysigner:<arbitrary-name>.
Signer-linked ClusterTrustBundles will typically be consumed in workloads
by a combination of a
field selector on the signer name, and a separate
label selector.
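For example, a workload or controller that trusts a particular signer could list just that signer's bundles with a field selector (the signer name is illustrative):
kubectl get clustertrustbundles --field-selector='spec.signerName=example.com/mysigner'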
Signer-unlinked ClusterTrustBundles
Signer-unlinked ClusterTrustBundles have an empty spec.signerName field, like this:
apiVersion: certificates.k8s.io/v1alpha1
kind: ClusterTrustBundle
metadata:
  name: foo
spec:
  # no signerName specified, so the field is blank
  trustBundle: "<... PEM data ...>"
They are primarily intended for cluster configuration use cases.
Each signer-unlinked ClusterTrustBundle is an independent object, in contrast to the
customary grouping behavior of signer-linked ClusterTrustBundles.
Signer-unlinked ClusterTrustBundles have no attest verb requirement.
Instead, you control access to them directly using the usual mechanisms,
such as role-based access control.
To distinguish them from signer-linked ClusterTrustBundles, the names of
signer-unlinked ClusterTrustBundles must not contain a colon (:).
Accessing ClusterTrustBundles from pods
FEATURE STATE:Kubernetes v1.33 [beta] (enabled by default: false)
The contents of ClusterTrustBundles can be injected into the container filesystem, similar to ConfigMaps and Secrets.
See the clusterTrustBundle projected volume source for more details.
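A minimal sketch of a Pod that mounts a signer-linked bundle this way; the signer name, label, image, and paths are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: main
    image: registry.example/app:1.0
    volumeMounts:
    - name: trust-anchors
      mountPath: /etc/ssl/custom
      readOnly: true
  volumes:
  - name: trust-anchors
    projected:
      sources:
      - clusterTrustBundle:
          signerName: example.com/mysigner
          labelSelector:
            matchLabels:
              example.com/fetch-me: "true"
          path: ca-bundle.pem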
View the source code for the kube-controller-manager built-in
signer
View the source code for the kube-controller-manager built-in
approver
For details of X.509 itself, refer to RFC 5280 section 3.1
For information on the syntax of PKCS#10 certificate signing requests, refer to RFC 2986
Read about the ClusterTrustBundle API.
3.12 - Mapping PodSecurityPolicies to Pod Security Standards
The tables below enumerate the configuration parameters on
PodSecurityPolicy objects, whether the field mutates
and/or validates pods, and how the configuration values map to the
Pod Security Standards.
For each applicable parameter, the allowed values for the
Baseline and
Restricted profiles are listed.
Anything outside the allowed values for those profiles would fall under the
Privileged profile. "No opinion"
means all values are allowed under all Pod Security Standards.
Baseline: "runtime/default,"(Trailing comma to allow unset)
Restricted: "runtime/default"(No trailing comma)
localhost/* values are also permitted for both Baseline & Restricted.
3.13 - Kubelet authentication/authorization
Overview
A kubelet's HTTPS endpoint exposes APIs which give access to data of varying sensitivity,
and allow you to perform operations with varying levels of power on the node and within containers.
This document describes how to authenticate and authorize access to the kubelet's HTTPS endpoint.
Kubelet authentication
By default, requests to the kubelet's HTTPS endpoint that are not rejected by other configured
authentication methods are treated as anonymous requests, and given a username of system:anonymous
and a group of system:unauthenticated.
To disable anonymous access and send 401 Unauthorized responses to unauthenticated requests:
start the kubelet with the --anonymous-auth=false flag
To enable X509 client certificate authentication to the kubelet's HTTPS endpoint:
start the kubelet with the --client-ca-file flag, providing a CA bundle to verify client certificates with
start the apiserver with --kubelet-client-certificate and --kubelet-client-key flags
To enable API bearer tokens (including service account tokens) to be used to authenticate to the kubelet's HTTPS endpoint:
ensure the authentication.k8s.io/v1beta1 API group is enabled in the API server
start the kubelet with the --authentication-token-webhook and --kubeconfig flags
the kubelet calls the TokenReview API on the configured API server to determine user information from bearer tokens
Kubelet authorization
Any request that is successfully authenticated (including an anonymous request) is then authorized. The default authorization mode is AlwaysAllow, which allows all requests.
There are many possible reasons to subdivide access to the kubelet API:
anonymous auth is enabled, but anonymous users' ability to call the kubelet API should be limited
bearer token auth is enabled, but arbitrary API users' (like service accounts) ability to call the kubelet API should be limited
client certificate auth is enabled, but only some of the client certificates signed by the configured CA should be allowed to use the kubelet API
To subdivide access to the kubelet API, delegate authorization to the API server:
ensure the authorization.k8s.io/v1beta1 API group is enabled in the API server
start the kubelet with the --authorization-mode=Webhook and the --kubeconfig flags
the kubelet calls the SubjectAccessReview API on the configured API server to determine whether each request is authorized
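The same authentication and authorization settings can also be expressed in the kubelet configuration file rather than as command-line flags. A minimal sketch, assuming the v1beta1 KubeletConfiguration API and an illustrative CA file path:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false        # disable anonymous access
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
  webhook:
    enabled: true         # authenticate bearer tokens via the TokenReview API
authorization:
  mode: Webhook           # authorize requests via the SubjectAccessReview API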
The kubelet authorizes API requests using the same request attributes approach as the apiserver.
The verb is determined from the incoming request's HTTP verb:
HTTP verb
request verb
POST
create
GET, HEAD
get
PUT
update
PATCH
patch
DELETE
delete
The resource and subresource is determined from the incoming request's path:
Kubelet API
resource
subresource
/stats/*
nodes
stats
/metrics/*
nodes
metrics
/logs/*
nodes
log
/spec/*
nodes
spec
/checkpoint/*
nodes
checkpoint
all others
nodes
proxy
The namespace and API group attributes are always an empty string, and
the resource name is always the name of the kubelet's Node API object.
When running in this mode, ensure the user identified by the --kubelet-client-certificate and --kubelet-client-key
flags passed to the apiserver is authorized for the following attributes:
verb=*, resource=nodes, subresource=proxy
verb=*, resource=nodes, subresource=stats
verb=*, resource=nodes, subresource=log
verb=*, resource=nodes, subresource=spec
verb=*, resource=nodes, subresource=metrics
Fine-grained authorization
FEATURE STATE:Kubernetes v1.33 [beta] (enabled by default: true)
When the feature gate KubeletFineGrainedAuthz is enabled, the kubelet performs a
fine-grained check before falling back to the proxy subresource for the /pods,
/runningPods, /configz and /healthz endpoints. The resource and subresource
are determined from the incoming request's path:
Kubelet API
resource
subresource
/stats/*
nodes
stats
/metrics/*
nodes
metrics
/logs/*
nodes
log
/pods
nodes
pods, proxy
/runningPods/
nodes
pods, proxy
/healthz
nodes
healthz, proxy
/configz
nodes
configz, proxy
all others
nodes
proxy
When the feature-gate KubeletFineGrainedAuthz is enabled, ensure the user
identified by the --kubelet-client-certificate and --kubelet-client-key
flags passed to the API server is authorized for the following attributes:
verb=*, resource=nodes, subresource=proxy
verb=*, resource=nodes, subresource=stats
verb=*, resource=nodes, subresource=log
verb=*, resource=nodes, subresource=metrics
verb=*, resource=nodes, subresource=configz
verb=*, resource=nodes, subresource=healthz
verb=*, resource=nodes, subresource=pods
If RBAC authorization is used,
enabling this feature gate also ensures that the built-in system:kubelet-api-admin ClusterRole
is updated with permissions to access all of the above-mentioned subresources.
3.14 - TLS bootstrapping
In a Kubernetes cluster, the components on the worker nodes - kubelet and kube-proxy - need
to communicate with Kubernetes control plane components, specifically kube-apiserver.
In order to ensure that communication is kept private, not interfered with, and ensure that
each component of the cluster is talking to another trusted component, we strongly
recommend using client TLS certificates on nodes.
The normal process of bootstrapping these components, especially worker nodes that need certificates
so they can communicate safely with kube-apiserver, can be a challenging process as it is often outside
of the scope of Kubernetes and requires significant additional work.
This in turn, can make it challenging to initialize or scale a cluster.
In order to simplify the process, beginning in version 1.4, Kubernetes introduced a certificate request
and signing API. The proposal can be found here.
This document describes the process of node initialization, how to set up TLS client certificate bootstrapping for
kubelets, and how it works.
Initialization process
When a worker node starts up, the kubelet does the following:
Look for its kubeconfig file
Retrieve the URL of the API server and credentials, normally a TLS key and signed certificate from the kubeconfig file
Attempt to communicate with the API server using the credentials.
Assuming that the kube-apiserver successfully validates the kubelet's credentials,
it will treat the kubelet as a valid node, and begin to assign pods to it.
Note that the above process depends upon:
Existence of a key and certificate on the local host in the kubeconfig
The certificate having been signed by a Certificate Authority (CA) trusted by the kube-apiserver
All of the following are responsibilities of whoever sets up and manages the cluster:
Creating the CA key and certificate
Distributing the CA certificate to the control plane nodes, where kube-apiserver is running
Creating a key and certificate for each kubelet; strongly recommended to have a unique one, with a unique CN, for each kubelet
Signing the kubelet certificate using the CA key
Distributing the kubelet key and signed certificate to the specific node on which the kubelet is running
The TLS Bootstrapping described in this document is intended to simplify, and partially or even
completely automate, steps 3 onwards, as these are the most common when initializing or scaling
a cluster.
Bootstrap initialization
In the bootstrap initialization process, the following occurs:
kubelet begins
kubelet sees that it does not have a kubeconfig file
kubelet searches for and finds a bootstrap-kubeconfig file
kubelet reads its bootstrap file, retrieving the URL of the API server and a limited usage "token"
kubelet connects to the API server, authenticates using the token
kubelet now has limited credentials to create and retrieve a certificate signing request (CSR)
kubelet creates a CSR for itself with the signerName set to kubernetes.io/kube-apiserver-client-kubelet
CSR is approved in one of two ways:
If configured, kube-controller-manager automatically approves the CSR
If configured, an outside process, possibly a person, approves the CSR using the Kubernetes API or via kubectl
Certificate is created for the kubelet
Certificate is issued to the kubelet
kubelet retrieves the certificate
kubelet creates a proper kubeconfig with the key and signed certificate
kubelet begins normal operation
Optional: if configured, kubelet automatically requests renewal of the certificate when it is close to expiry
The renewed certificate is approved and issued, either automatically or manually, depending on configuration.
The rest of this document describes the necessary steps to configure TLS Bootstrapping, and its limitations.
Configuration
To configure for TLS bootstrapping and optional automatic approval, you must configure options on the following components:
kube-apiserver
kube-controller-manager
kubelet
in-cluster resources: ClusterRoleBinding and potentially ClusterRole
In addition, you need your Kubernetes Certificate Authority (CA).
Certificate Authority
As without bootstrapping, you will need a Certificate Authority (CA) key and certificate.
As without bootstrapping, these will be used to sign the kubelet certificate. As before,
it is your responsibility to distribute them to control plane nodes.
For the purposes of this document, we will assume these have been distributed to control
plane nodes at /var/lib/kubernetes/ca.pem (certificate) and /var/lib/kubernetes/ca-key.pem (key).
We will refer to these as "Kubernetes CA certificate and key".
All Kubernetes components that use these certificates - kubelet, kube-apiserver,
kube-controller-manager - assume the key and certificate to be PEM-encoded.
kube-apiserver configuration
The kube-apiserver has several requirements to enable TLS bootstrapping:
Recognizing a CA that signs the client certificates
Authenticating the bootstrapping kubelet to the system:bootstrappers group
Authorizing the bootstrapping kubelet to create a certificate signing request (CSR)
Recognizing client certificates
This is normal for all client certificate authentication.
If not already set, add the --client-ca-file=FILENAME flag to the kube-apiserver command to enable
client certificate authentication, referencing a certificate authority bundle
containing the signing certificate, for example
--client-ca-file=/var/lib/kubernetes/ca.pem.
Initial bootstrap authentication
In order for the bootstrapping kubelet to connect to kube-apiserver and request a certificate,
it must first authenticate to the server. You can use any
authenticator that can authenticate the kubelet.
While any authentication strategy can be used for the kubelet's initial
bootstrap credentials, the following two authenticators are recommended for ease
of provisioning.
Using bootstrap tokens is a simpler and more easily managed method to authenticate kubelets,
and does not require any additional flags when starting kube-apiserver.
Whichever method you choose, the requirement is that the kubelet be able to authenticate as a user with the rights to:
create and retrieve CSRs
be automatically approved to request node client certificates, if automatic approval is enabled.
A kubelet authenticating using bootstrap tokens is authenticated as a user in the group
system:bootstrappers, which is the standard method to use.
As this feature matures, you
should ensure tokens are bound to a Role Based Access Control (RBAC) policy
which limits requests (using the bootstrap token) strictly to client
requests related to certificate provisioning. With RBAC in place, scoping the
tokens to a group allows for great flexibility. For example, you could disable a
particular bootstrap group's access when you are done provisioning the nodes.
Bootstrap tokens
Bootstrap tokens are described in detail here.
These are tokens that are stored as secrets in the Kubernetes cluster, and then issued to the individual kubelet.
You can use a single token for an entire cluster, or issue one per worker node.
The process is two-fold:
Create a Kubernetes secret with the token ID, secret and scope(s).
Issue the token to the kubelet
From the kubelet's perspective, one token is like another and has no special meaning.
From the kube-apiserver's perspective, however, the bootstrap token is special.
Due to its type, namespace and name, kube-apiserver recognizes it as a special token,
and grants anyone authenticating with that token special bootstrap rights, notably treating
them as a member of the system:bootstrappers group. This fulfills a basic requirement
for TLS bootstrapping.
The details for creating the secret are available here.
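As a sketch, a bootstrap token is stored as a Secret of type bootstrap.kubernetes.io/token in the kube-system namespace; the token-id and token-secret values below are illustrative:
apiVersion: v1
kind: Secret
metadata:
  # The name must be of the form "bootstrap-token-<token-id>"
  name: bootstrap-token-07401b
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: "07401b"
  token-secret: "f395accd246ae52d"
  usage-bootstrap-authentication: "true"
  auth-extra-groups: system:bootstrappers:worker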
If you want to use bootstrap tokens, you must enable it on kube-apiserver with the flag:
--enable-bootstrap-token-auth=true
Token authentication file
kube-apiserver has the ability to accept tokens as authentication.
These tokens are arbitrary but should represent at least 128 bits of entropy derived
from a secure random number generator (such as /dev/urandom on most modern Linux
systems). There are multiple ways you can generate a token. For example:
head -c 16 /dev/urandom | od -An -t x | tr -d ' '
This will generate tokens that look like 02b50b05283e98dd0fd71db496ef01e8.
The token file should look like the following example, where the first three
values can be anything and the quoted group name should be as depicted:
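02b50b05283e98dd0fd71db496ef01e8,kubelet-bootstrap,10001,"system:bootstrappers"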
Add the --token-auth-file=FILENAME flag to the kube-apiserver command (in your
systemd unit file perhaps) to enable the token file. See docs
here for
further details.
Authorize kubelet to create CSR
Now that the bootstrapping node is authenticated as part of the
system:bootstrappers group, it needs to be authorized to create a
certificate signing request (CSR) as well as retrieve it when done.
Fortunately, Kubernetes ships with a ClusterRole with precisely these (and
only these) permissions, system:node-bootstrapper.
To do this, you only need to create a ClusterRoleBinding that binds the system:bootstrappers
group to the cluster role system:node-bootstrapper.
# enable bootstrapping nodes to create CSR
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: create-csrs-for-bootstrapping
subjects:
- kind: Group
  name: system:bootstrappers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: system:node-bootstrapper
  apiGroup: rbac.authorization.k8s.io
kube-controller-manager configuration
While the apiserver receives the requests for certificates from the kubelet and authenticates those requests,
the controller-manager is responsible for issuing actual signed certificates.
The controller-manager performs this function via a certificate-issuing control loop.
This takes the form of a
cfssl local signer using
assets on disk. Currently, all certificates issued have one year validity and a
default set of key usages.
In order for the controller-manager to sign certificates, it needs the following:
access to the "Kubernetes CA key and certificate" that you created and distributed
enabling CSR signing
Access to key and certificate
As described earlier, you need to create a Kubernetes CA key and certificate, and distribute it to the control plane nodes.
These will be used by the controller-manager to sign the kubelet certificates.
Since these signed certificates will, in turn, be used by the kubelet to authenticate as a regular kubelet
to kube-apiserver, it is important that the CA provided to the controller-manager at this stage also be
trusted by kube-apiserver for authentication. This is provided to kube-apiserver with the flag --client-ca-file=FILENAME
(for example, --client-ca-file=/var/lib/kubernetes/ca.pem), as described in the kube-apiserver configuration section.
To provide the Kubernetes CA key and certificate to kube-controller-manager, use the following flags:
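--cluster-signing-cert-file="/var/lib/kubernetes/ca.pem" --cluster-signing-key-file="/var/lib/kubernetes/ca-key.pem"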
The validity duration of signed certificates can be configured with the flag:
--cluster-signing-duration
Approval
In order to approve CSRs, you need to tell the controller-manager that it is acceptable to approve them. This is done by granting
RBAC permissions to the correct group.
There are two distinct sets of permissions:
nodeclient: If a node is creating a new certificate for a node, then it does not have a certificate yet.
It is authenticating using one of the tokens listed above, and thus is part of the group system:bootstrappers.
selfnodeclient: If a node is renewing its certificate, then it already has a certificate (by definition),
which it uses continuously to authenticate as part of the group system:nodes.
To enable the kubelet to request and receive a new certificate, create a ClusterRoleBinding that binds
the group in which the bootstrapping node is a member system:bootstrappers to the ClusterRole that
grants it permission, system:certificates.k8s.io:certificatesigningrequests:nodeclient:
# Approve all CSRs for the group "system:bootstrappers"
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auto-approve-csrs-for-group
subjects:
- kind: Group
  name: system:bootstrappers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
  apiGroup: rbac.authorization.k8s.io
To enable the kubelet to renew its own client certificate, create a ClusterRoleBinding that binds
the group in which the fully functioning node is a member system:nodes to the ClusterRole that
grants it permission, system:certificates.k8s.io:certificatesigningrequests:selfnodeclient:
# Approve renewal CSRs for the group "system:nodes"
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auto-approve-renewals-for-nodes
subjects:
- kind: Group
  name: system:nodes
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
  apiGroup: rbac.authorization.k8s.io
The csrapproving controller ships as part of
kube-controller-manager and is enabled
by default. The controller uses the
SubjectAccessReview API to
determine if a given user is authorized to request a CSR, then approves based on
the authorization outcome. To prevent conflicts with other approvers, the
built-in approver doesn't explicitly deny CSRs. It only ignores unauthorized
requests. The controller also prunes expired certificates as part of garbage
collection.
kubelet configuration
Finally, with the control plane nodes properly set up and all of the necessary
authentication and authorization in place, we can configure the kubelet.
The kubelet requires the following configuration to bootstrap:
A path to store the key and certificate it generates (optional, can use default)
A path to a kubeconfig file that does not yet exist; it will place the bootstrapped config file here
A path to a bootstrap kubeconfig file to provide the URL for the server and bootstrap credentials, e.g. a bootstrap token
Optional: instructions to rotate certificates
The bootstrap kubeconfig should be in a path available to the kubelet, for example /var/lib/kubelet/bootstrap-kubeconfig.
Its format is identical to a normal kubeconfig file. A sample file might look as follows:
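apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/lib/kubernetes/ca.pem
    server: https://my.server.example.com:6443   # illustrative API server URL
  name: bootstrap
contexts:
- context:
    cluster: bootstrap
    user: kubelet-bootstrap
  name: bootstrap
current-context: bootstrap
preferences: {}
users:
- name: kubelet-bootstrap
  user:
    token: 07401b.f395accd246ae52d   # illustrative bootstrap token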
certificate-authority: path to a CA file, used to validate the server certificate presented by kube-apiserver
server: URL to kube-apiserver
token: the token to use
The format of the token does not matter, as long as it matches what kube-apiserver expects. In the above example, we used a bootstrap token.
As stated earlier, any valid authentication method can be used, not only tokens.
Because the bootstrap kubeconfig is a standard kubeconfig, you can use kubectl to generate it. To create the above example file:
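kubectl config --kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig set-cluster bootstrap --server='https://my.server.example.com:6443' --certificate-authority=/var/lib/kubernetes/ca.pem
kubectl config --kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig set-credentials kubelet-bootstrap --token=07401b.f395accd246ae52d
kubectl config --kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig set-context bootstrap --user=kubelet-bootstrap --cluster=bootstrap
kubectl config --kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig use-context bootstrap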
When starting the kubelet, if the file specified via --kubeconfig does not
exist, the bootstrap kubeconfig specified via --bootstrap-kubeconfig is used
to request a client certificate from the API server. On approval of the
certificate request and receipt back by the kubelet, a kubeconfig file
referencing the generated key and obtained certificate is written to the path
specified by --kubeconfig. The certificate and key file will be placed in the
directory specified by --cert-dir.
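Putting that together, the kubelet might be started with flags like the following; the paths are illustrative:
kubelet --bootstrap-kubeconfig="/var/lib/kubelet/bootstrap-kubeconfig" \
  --kubeconfig="/var/lib/kubelet/kubeconfig" \
  --cert-dir="/var/lib/kubelet/pki"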
Client and serving certificates
All of the above relate to kubelet client certificates, specifically, the certificates a kubelet
uses to authenticate to kube-apiserver.
A kubelet also can use serving certificates. The kubelet itself exposes an https endpoint for certain features.
To secure these, the kubelet can do one of:
use provided key and certificate, via the --tls-private-key-file and --tls-cert-file flags
create self-signed key and certificate, if a key and certificate are not provided
request serving certificates from the cluster server, via the CSR API
The client certificate provided by TLS bootstrapping is signed, by default, for client auth only, and thus cannot
be used as serving certificates, or server auth.
However, you can enable its server certificate, at least partially, via certificate rotation.
Certificate rotation
The kubelet in Kubernetes v1.8 and later implements features for enabling
rotation of its client and/or serving certificates. Note that rotation of the serving
certificate is a beta feature and requires the RotateKubeletServerCertificate
feature flag on the kubelet (enabled by default).
You can configure the kubelet to rotate its client certificates by creating new CSRs
as its existing credentials expire. To enable this feature, use the rotateCertificates
field of kubelet configuration file
or pass the following command line argument to the kubelet (deprecated):
--rotate-certificates
Enabling RotateKubeletServerCertificate causes the kubelet both to request a serving
certificate after bootstrapping its client credentials and to rotate that
certificate. To enable this behavior, use the field serverTLSBootstrap of
the kubelet configuration file
or pass the following command line argument to the kubelet (deprecated):
--rotate-server-certificates
Note:
The CSR approving controllers implemented in core Kubernetes do not
approve node serving certificates for
security reasons. To use
RotateKubeletServerCertificate, operators need to run a custom approving
controller, or manually approve the serving certificate requests.
A deployment-specific approval process for kubelet serving certificates should typically only approve CSRs which:
are requested by nodes (ensure the spec.username field is of the form
system:node:<nodeName> and spec.groups contains system:nodes)
request usages for a serving certificate (ensure spec.usages contains server auth,
optionally contains digital signature and key encipherment, and contains no other usages)
only have IP and DNS subjectAltNames that belong to the requesting node,
and have no URI and Email subjectAltNames (parse the x509 Certificate Signing Request
in spec.request to verify subjectAltNames)
Other authenticating components
All of TLS bootstrapping described in this document relates to the kubelet. However,
other components may need to communicate directly with kube-apiserver. Notable is kube-proxy, which
is part of the Kubernetes node components and runs on every node, but may also include other components such as monitoring or networking.
Like the kubelet, these other components also require a method of authenticating to kube-apiserver.
You have several options for generating these credentials:
The old way: Create and distribute certificates the same way you did for kubelet before TLS bootstrapping
DaemonSet: Since the kubelet itself is loaded on each node, and is sufficient to start base services,
you can run kube-proxy and other node-specific services not as a standalone process, but rather as a
daemonset in the kube-system namespace. Since it will be in-cluster, you can give it a proper service
account with appropriate permissions to perform its activities. This may be the simplest way to configure
such services.
kubectl approval
CSRs can be approved outside of the approval flows built into the controller
manager.
The signing controller does not immediately sign all certificate requests.
Instead, it waits until they have been flagged with an "Approved" status by an
appropriately-privileged user. This flow is intended to allow for automated
approval handled by an external approval controller or the approval controller
implemented in the core controller-manager. However, cluster administrators can
also manually approve certificate requests using kubectl. An administrator can
list CSRs with kubectl get csr and describe one in detail with
kubectl describe csr <name>. An administrator can approve or deny a CSR with
kubectl certificate approve <name> and kubectl certificate deny <name>.
3.15 - Mutating Admission Policy
FEATURE STATE:Kubernetes v1.32 [alpha]
This page provides an overview of MutatingAdmissionPolicies.
What are MutatingAdmissionPolicies?
Mutating admission policies offer a declarative, in-process alternative to mutating admission webhooks.
Mutating admission policies use the Common Expression Language (CEL) to declare mutations to resources.
Mutations can be defined either with an apply configuration that is merged using the
server side apply merge strategy,
or a JSON patch.
Mutating admission policies are highly configurable, enabling policy authors to define policies
that can be parameterized and scoped to resources as needed by cluster administrators.
What resources make a policy
A policy is generally made up of three resources:
The MutatingAdmissionPolicy describes the abstract logic of a policy
(think: "this policy sets a particular label to a particular value").
A parameter resource provides information to a MutatingAdmissionPolicy to make it a concrete
statement (think "set the owner label to something like company.example.com").
Parameter resources refer to Kubernetes resources, available in the Kubernetes API. They can be built-in types or extensions,
such as a CustomResourceDefinition (CRD). For example, you can use a ConfigMap as a parameter.
A MutatingAdmissionPolicyBinding links the above (MutatingAdmissionPolicy and parameter) resources together and provides scoping.
If you only want to set an owner label for Pods, and not other API kinds, the binding is where you
specify this mutation.
At least a MutatingAdmissionPolicy and a corresponding MutatingAdmissionPolicyBinding
must be defined for a policy to have an effect.
If a MutatingAdmissionPolicy does not need to be configured via parameters, simply leave
spec.paramKind in MutatingAdmissionPolicy not specified.
Getting Started with MutatingAdmissionPolicies
Mutating admission policy is part of the cluster control-plane. You should write
and deploy them with great caution. The following describes how to quickly
experiment with Mutating admission policy.
Create a MutatingAdmissionPolicy
The following is an example of a MutatingAdmissionPolicy. This policy mutates newly created Pods to add a sidecar container if one does not already exist.
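A minimal sketch, assuming the v1alpha1 MutatingAdmissionPolicy schema; the policy name, sidecar name, and image are illustrative:
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: MutatingAdmissionPolicy
metadata:
  name: "sidecar-policy.example.com"
spec:
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE"]
      resources:   ["pods"]
  matchConditions:
  - name: does-not-already-have-sidecar
    expression: '!has(object.spec.initContainers) || !object.spec.initContainers.exists(ic, ic.name == "mesh-proxy")'
  mutations:
  - patchType: "ApplyConfiguration"
    applyConfiguration:
      expression: >
        Object{
          spec: Object.spec{
            initContainers: [
              Object.spec.initContainers{
                name: "mesh-proxy",
                image: "mesh/proxy:v1.0.0",
                restartPolicy: "Always"
              }
            ]
          }
        }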
The .spec.mutations field consists of a list of expressions that evaluate to resource patches.
The emitted patches may be either apply configurations or JSON Patch
patches. You cannot specify an empty list of mutations. After evaluating all the
expressions, the API server applies those changes to the resource that is
passing through admission.
To configure a mutating admission policy for use in a cluster, a binding is
required. The MutatingAdmissionPolicy will only be active if a corresponding
binding exists with the referenced spec.policyName matching the spec.name of
a policy.
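A minimal sketch of a matching binding, again assuming the v1alpha1 schema:
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: MutatingAdmissionPolicyBinding
metadata:
  name: "sidecar-policy-binding.example.com"
spec:
  policyName: "sidecar-policy.example.com"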
Once the binding and policy are created, any resource request that matches the
spec.matchConditions of a policy will trigger the set of mutations defined.
In the example above, creating a Pod will add the mesh-proxy initContainer mutation:
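# Sketch of the resulting Pod; the application container shown here is illustrative
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: mesh-proxy          # injected by the policy
    image: mesh/proxy:v1.0.0
    restartPolicy: Always     # runs as a sidecar-style init container
  containers:
  - name: myapp
    image: registry.example/myapp:1.0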
Parameter resources allow a policy configuration to be separate from its
definition. A policy can define paramKind, which outlines GVK of the parameter
resource, and then a policy binding ties a policy by name (via policyName) to a
particular parameter resource via paramRef.
MutatingAdmissionPolicy expressions are always CEL. Each apply configuration
expression must evaluate to a CEL object (declared using Object()
initialization).
Apply configurations may not modify atomic structs, maps or arrays due to the risk of accidental deletion of
values not included in the apply configuration.
CEL expressions have access to the object types needed to create apply configurations:
Object - CEL type of the resource object.
Object.<fieldName> - CEL type of object field (such as Object.spec)
Object.<fieldName1>.<fieldName2>...<fieldNameN> - CEL type of nested field (such as Object.spec.containers)
CEL expressions have access to the contents of the API request, organized into CEL variables as well as some other useful variables:
object - The object from the incoming request. The value is null for DELETE requests.
oldObject - The existing object. The value is null for CREATE requests.
request - Attributes of the API request.
params - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind.
namespaceObject - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources.
variables - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named foo can be accessed as variables.foo.
authorizer.requestResource - A CEL ResourceCheck constructed from the authorizer and configured with the
request resource.
The apiVersion, kind, metadata.name, metadata.generateName and metadata.labels are always accessible from the root of the object. No other metadata properties are accessible.
JSONPatch
The same mutation can be written as a JSON Patch as follows:
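# Sketch of the policy's spec.mutations using the JSONPatch patch type
# (assuming the v1alpha1 schema; the sidecar name and image are illustrative)
mutations:
- patchType: "JSONPatch"
  jsonPatch:
    expression: >
      [
        JSONPatch{
          op: "add", path: "/spec/initContainers/-",
          value: Object.spec.initContainers{
            name: "mesh-proxy",
            image: "mesh/proxy:v1.0.0",
            restartPolicy: "Always"
          }
        }
      ]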
CEL expressions have access to the types needed to create JSON patches and objects:
JSONPatch - CEL type of JSON Patch operations. JSONPatch has the fields op, from, path and value.
See JSON patch for more details. The value field may be set to any of: string,
integer, array, map or object. If set, the path and from fields must be set to a
JSON pointer string, where the jsonpatch.escapeKey() CEL
function may be used to escape path keys containing / and ~.
Object - CEL type of the resource object.
Object.<fieldName> - CEL type of object field (such as Object.spec)
Object.<fieldName1>.<fieldName2>...<fieldNameN> - CEL type of nested field (such as Object.spec.containers)
CEL expressions have access to the contents of the API request, organized into CEL variables as well as some other useful variables:
object - The object from the incoming request. The value is null for DELETE requests.
oldObject - The existing object. The value is null for CREATE requests.
request - Attributes of the API request.
params - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind.
namespaceObject - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources.
variables - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named foo can be accessed as variables.foo.
jsonpatch.escapeKey - Performs JSONPatch key escaping. ~ and / are escaped as ~0 and ~1 respectively.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible.
API kinds exempt from mutating admission
There are certain API kinds that are exempt from admission-time mutation. For example, you can't create a MutatingAdmissionPolicy that changes a MutatingAdmissionPolicy.
3.16 - Validating Admission Policy
This page provides an overview of Validating Admission Policy.
What is Validating Admission Policy?
Validating admission policies offer a declarative, in-process alternative to validating admission webhooks.
Validating admission policies use the Common Expression Language (CEL) to declare the validation
rules of a policy.
Validation admission policies are highly configurable, enabling policy authors to define policies
that can be parameterized and scoped to resources as needed by cluster administrators.
What Resources Make a Policy
A policy is generally made up of three resources:
The ValidatingAdmissionPolicy describes the abstract logic of a policy
(think: "this policy makes sure a particular label is set to a particular value").
A parameter resource provides information to a ValidatingAdmissionPolicy to make it a concrete
statement (think "the owner label must be set to something that ends in .company.com").
A native type such as ConfigMap or a CRD defines the schema of a parameter resource.
ValidatingAdmissionPolicy objects specify what Kind they are expecting for their parameter resource.
A ValidatingAdmissionPolicyBinding links the above resources together and provides scoping.
If you only want to require an owner label to be set for Pods, the binding is where you would
specify this restriction.
At least a ValidatingAdmissionPolicy and a corresponding ValidatingAdmissionPolicyBinding
must be defined for a policy to have an effect.
If a ValidatingAdmissionPolicy does not need to be configured via parameters, simply leave
spec.paramKind in ValidatingAdmissionPolicy not specified.
Getting Started with Validating Admission Policy
Validating Admission Policy is part of the cluster control-plane. You should write and deploy them
with great caution. The following describes how to quickly experiment with Validating Admission Policy.
Creating a ValidatingAdmissionPolicy
The following is an example of a ValidatingAdmissionPolicy.
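# A minimal sketch; the policy name and replica limit are illustrative
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas <= 5"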
spec.validations contains CEL expressions which use the Common Expression Language (CEL)
to validate the request. If an expression evaluates to false, the validation check is enforced
according to the spec.failurePolicy field.
To configure a validating admission policy for use in a cluster, a binding is required.
The following is an example of a ValidatingAdmissionPolicyBinding:
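# A minimal sketch; the binding name and namespace label are illustrative
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "demo-binding-test.example.com"
spec:
  policyName: "demo-policy.example.com"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test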
The above provides a simple example of using ValidatingAdmissionPolicy without a parameter configured.
Validation actions
Each ValidatingAdmissionPolicyBinding must specify one or more
validationActions to declare how validations of a policy are enforced.
The supported validationActions are:
Deny: Validation failure results in a denied request.
Warn: Validation failure is reported to the request client
as a warning.
Audit: Validation failure is included in the audit event for the API request.
For example, to both warn clients about a validation failure and to audit the
validation failures, use:
validationActions: [Warn, Audit]
Deny and Warn may not be used together since this combination
needlessly duplicates the validation failure both in the
API response body and the HTTP warning headers.
A validation that evaluates to false is always enforced according to these
actions. Failures defined by the failurePolicy are enforced
according to these actions only if the failurePolicy is set to Fail (or not specified),
otherwise the failures are ignored.
Parameter resources allow a policy configuration to be separate from its definition.
A policy can define paramKind, which outlines GVK of the parameter resource,
and then a policy binding ties a policy by name (via policyName) to a particular parameter resource via paramRef.
If parameter configuration is needed, the following is an example of a ValidatingAdmissionPolicy
with parameter configuration.
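# A sketch of a parameterized policy; the paramKind group/version (rules.example.com/v1) is illustrative
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "replicalimit-policy.example.com"
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: rules.example.com/v1
    kind: ReplicaLimit
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas <= params.maxReplicas"
    reason: Invalid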
The spec.paramKind field of the ValidatingAdmissionPolicy specifies the kind of resources used
to parameterize this policy. For this example, it is configured by ReplicaLimit custom resources.
Note in this example how the CEL expression references the parameters via the CEL params variable,
e.g. params.maxReplicas. spec.matchConstraints specifies what resources this policy is
designed to validate. Note that native types such as ConfigMap can also be used as a
parameter reference.
The spec.validations fields contain CEL expressions. If an expression evaluates to false, the
validation check is enforced according to the spec.failurePolicy field.
The validating admission policy author is responsible for providing the ReplicaLimit parameter CRD.
To configure a validating admission policy for use in a cluster, a binding and parameter resource
are created. The following is an example of a ValidatingAdmissionPolicyBinding
that uses a cluster-wide param - the same param will be used to validate
every resource request that matches the binding:
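# A sketch of such a binding; the binding and parameter names are illustrative
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "replicalimit-binding-test.example.com"
spec:
  policyName: "replicalimit-policy.example.com"
  validationActions: [Deny]
  paramRef:
    name: "replica-limit-test.example.com"
    namespace: "default"
    parameterNotFoundAction: Deny
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test
And a matching parameter object, assuming the illustrative ReplicaLimit custom resource from above:
apiVersion: rules.example.com/v1
kind: ReplicaLimit
metadata:
  name: "replica-limit-test.example.com"
maxReplicas: 3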
This policy parameter resource limits deployments to a max of 3 replicas.
An admission policy may have multiple bindings. To bind all other environments
to have a maxReplicas limit of 100, create another ValidatingAdmissionPolicyBinding:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "replicalimit-binding-nontest"
spec:
  policyName: "replicalimit-policy.example.com"
  validationActions: [Deny]
  paramRef:
    name: "replica-limit-prod.example.com"
    namespace: "default"
    parameterNotFoundAction: Deny
  matchResources:
    namespaceSelector:
      matchExpressions:
      - key: environment
        operator: NotIn
        values:
        - test
Notice this binding applies a different parameter to resources which
are not in the test environment.
For each admission request, the API server evaluates CEL expressions of each
(policy, binding, param) combination that match the request. For a request
to be admitted it must pass all evaluations.
If multiple bindings match the request, the policy will be evaluated for each,
and they must all pass evaluation for the policy to be considered passed.
If multiple parameters match a single binding, the policy rules will be evaluated
for each param, and they too must all pass for the binding to be considered passed.
Bindings can have overlapping match criteria. The policy is evaluated for each
matching binding-parameter combination. A policy may even be evaluated multiple
times if multiple bindings match it, or a single binding that matches multiple
parameters.
The params object representing a parameter resource will not be set if a parameter resource has
not been bound, so for policies requiring a parameter resource, it can be useful to add a check to
ensure one has been bound. A parameter resource will not be bound and params will be null
if paramKind of the policy, or paramRef of the binding are not specified.
For use cases that require parameter configuration, we recommend adding a param check in
spec.validations[0].expression:
- expression: "params != null"
  message: "params missing but required to bind to this policy"
Optional parameters
It can be convenient to be able to have optional parameters as part of a parameter resource, and
only validate them if present. CEL provides has(), which checks if the key passed to it exists.
CEL also implements Boolean short-circuiting. If the first half of a logical OR evaluates to true,
it won’t evaluate the other half (since the result of the entire OR will be true regardless).
Combining the two, we can provide a way to validate optional parameters:
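The following is a sketch of such a validation, assuming a numeric optionalNumber field in the parameter resource:
- expression: "!has(params.optionalNumber) || (params.optionalNumber >= 5 && params.optionalNumber <= 10)"
  message: "optionalNumber must be between 5 and 10 if provided"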
Here, we first check that the optional parameter is present with !has(params.optionalNumber).
If optionalNumber hasn’t been defined, then the expression short-circuits since
!has(params.optionalNumber) will evaluate to true.
If optionalNumber has been defined, then the latter half of the CEL expression will be
evaluated, and optionalNumber will be checked to ensure that it contains a value between 5 and
10 inclusive.
Per-namespace Parameters
As the author of a ValidatingAdmissionPolicy and its ValidatingAdmissionPolicyBinding,
you can choose to specify cluster-wide, or per-namespace parameters.
If you specify a namespace for the binding's paramRef, the control plane only
searches for parameters in that namespace.
However, if namespace is not specified in the ValidatingAdmissionPolicyBinding, the
API server can search for relevant parameters in the namespace that a request is against.
For example, if you make a request to modify a ConfigMap in the default namespace and
there is a relevant ValidatingAdmissionPolicyBinding with no namespace set, then the
API server looks for a parameter object in default.
This design enables policy configuration that depends on the namespace
of the resource being manipulated, for more fine-tuned control.
Parameter selector
In addition to specifying a parameter in a binding by name, you may
choose instead to specify a label selector, such that all resources of the
policy's paramKind, and the param's namespace (if applicable) that match the
label selector are selected for evaluation. See selector for more information on how label selectors match resources.
If multiple parameters are found to meet the condition, the policy's rules are
evaluated for each parameter found and the results will be ANDed together.
If namespace is provided, only objects of the paramKind in the provided
namespace are eligible for selection. Otherwise, when namespace is empty and
paramKind is namespace-scoped, the namespace used in the request being
admitted will be used.
Authorization checks
An authorization check is performed for parameter resources.
The user is expected to have read access to the resources referenced by paramKind in
the ValidatingAdmissionPolicy and by paramRef in the ValidatingAdmissionPolicyBinding.
Note that if a resource in paramKind fails to resolve via the restmapper, read access to all
resources in the group is required.
paramRef
The paramRef field specifies the parameter resource used by the policy. It has the following fields:
name: The name of the parameter resource.
namespace: The namespace of the parameter resource.
selector: A label selector to match multiple parameter resources.
parameterNotFoundAction: (Required) Controls the behavior when the specified parameters are not found.
Allowed Values:
Allow: The absence of matched parameters is treated as a successful validation by the binding.
Deny: The absence of matched parameters is subject to the failurePolicy of the policy.
One of name or selector must be set, but not both.
Note:
The parameterNotFoundAction field in paramRef is required. It specifies the action to take when no parameters are found matching the paramRef. If not specified, the policy binding may be considered invalid and will be ignored or could lead to unexpected behavior.
Allow: If set to Allow, and no parameters are found, the binding treats the absence of parameters as a successful validation, and the policy is considered to have passed.
Deny: If set to Deny, and no parameters are found, the binding enforces the failurePolicy of the policy. If the failurePolicy is Fail, the request is rejected.
Make sure to set parameterNotFoundAction according to the desired behavior when parameters are missing.
Handling Missing Parameters with parameterNotFoundAction
When using paramRef with a selector, it's possible that no parameters match the selector. The parameterNotFoundAction field determines how the binding behaves in this scenario.
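For example, a binding's paramRef that selects parameters by label might look like the following sketch; the label key and value are illustrative:
paramRef:
  selector:
    matchLabels:
      environment: test
  namespace: "default"
  parameterNotFoundAction: Deny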
failurePolicy defines how misconfigurations and CEL expressions that evaluate to an error in the
admission policy are handled. Allowed values are Ignore or Fail.
Ignore means that an error calling the ValidatingAdmissionPolicy is ignored and the API
request is allowed to continue.
Fail means that an error calling the ValidatingAdmissionPolicy causes the admission to fail
and the API request to be rejected.
Note that the failurePolicy is defined inside ValidatingAdmissionPolicy:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
spec:
  ...
  failurePolicy: Ignore # The default is "Fail"
  validations:
  - expression: "object.spec.xyz == params.x"
Validation Expression
spec.validations[i].expression represents the expression which will be evaluated by CEL.
To learn more, see the CEL language specification.
CEL expressions have access to the contents of the Admission request/response, organized into CEL
variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests.
'oldObject' - The existing object. The value is null for CREATE requests.
'params' - Parameter resource referred to by the policy binding being evaluated. The value is
null if ParamKind is not specified.
namespaceObject - The namespace, as a Kubernetes resource, that the incoming object belongs to.
The value is null if the incoming object is cluster-scoped.
authorizer - A CEL Authorizer. May be used to perform authorization checks for the principal
(authenticated user) of the request. See
AuthzSelectors and
Authz in the Kubernetes CEL library
documentation for more details.
authorizer.requestResource - A shortcut for an authorization check configured with the request
resource (group, resource, (subresource), namespace, name).
The apiVersion, kind, metadata.name and metadata.generateName are always accessible from
the root of the object. No other metadata properties are accessible.
Equality on arrays with list type of 'set' or 'map' ignores element order, i.e. [1, 2] == [2, 1].
Concatenation on arrays with x-kubernetes-list-type use the semantics of the list type:
'set': X + Y performs a union where the array positions of all elements in X are preserved and
non-intersecting elements in Y are appended, retaining their partial order.
'map': X + Y performs a merge where the array positions of all keys in X are preserved but the values
are overwritten by values in Y when the key sets of X and Y intersect. Elements in Y with
non-intersecting keys are appended, retaining their partial order.
spec.validations[i].reason represents a machine-readable description of why this validation failed.
If this is the first validation in the list to fail, this reason, as well as the corresponding
HTTP response code, are used in the HTTP response to the client.
The currently supported reasons are: Unauthorized, Forbidden, Invalid, RequestEntityTooLarge.
If not set, StatusReasonInvalid is used in the response to the client.
Matching requests: matchConditions
You can define match conditions for a ValidatingAdmissionPolicy if you need fine-grained request filtering. These
conditions are useful if you find that match rules, objectSelectors and namespaceSelectors still
don't provide the filtering you want. Match conditions are
CEL expressions. All match conditions must evaluate to true for the
resource to be evaluated.
Here is an example illustrating a few different uses for match conditions:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["*"]
      apiVersions: ["*"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["*"]
  matchConditions:
  - name: 'exclude-leases' # Each match condition must have a unique name
    expression: '!(request.resource.group == "coordination.k8s.io" && request.resource.resource == "leases")' # Match non-lease resources.
  - name: 'exclude-kubelet-requests'
    expression: '!("system:nodes" in request.userInfo.groups)' # Match requests made by non-node users.
  - name: 'rbac' # Skip RBAC requests.
    expression: 'request.resource.group != "rbac.authorization.k8s.io"'
  validations:
  - expression: "!object.metadata.name.contains('demo') || object.metadata.namespace == 'demo'"
Match conditions have access to the same CEL variables as validation expressions.
In the event of an error evaluating a match condition the policy is not evaluated. Whether to reject
the request is determined as follows:
If any match condition evaluated to false (regardless of other errors), the API server skips the policy.
Otherwise:
for failurePolicy: Fail, reject the request (without evaluating the policy).
for failurePolicy: Ignore, proceed with the request but skip the policy.
Audit annotations
You can use auditAnnotations to include audit annotations in the audit event of the API request.
For example, here is an admission policy with an audit annotation:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas > 50"
    messageExpression: "'Deployment spec.replicas set to ' + string(object.spec.replicas)"
  auditAnnotations:
  - key: "high-replica-count"
    valueExpression: "'Deployment spec.replicas set to ' + string(object.spec.replicas)"
When an API request is validated with this admission policy, the resulting audit event will look like:
# the audit event recorded
{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "annotations": {
        "demo-policy.example.com/high-replica-count": "Deployment spec.replicas set to 128"
        # other annotations
        ...
    }
    # other fields
    ...
}
In this example the annotation will only be included if the spec.replicas of the Deployment is more than
50, otherwise the CEL expression evaluates to null and the annotation will not be included.
Note that audit annotation keys are prefixed by the name of the ValidatingAdmissionPolicy and a /. If
another admission controller, such as an admission webhook, uses the exact same audit annotation key, the
value of the first admission controller to include the audit annotation will be included in the audit
event and all other values will be ignored.
Message expression
To return a more friendly message when the policy rejects a request, we can use a CEL expression
to compose a message with spec.validations[i].messageExpression. Similar to the validation expression,
a message expression has access to object, oldObject, request, params, and namespaceObject.
Unlike validations, message expression must evaluate to a string.
For example, to better inform the user of the reason of denial when the policy refers to a parameter,
we can have the following validation:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "deploy-replica-policy.example.com"
spec:
  paramKind:
    apiVersion: rules.example.com/v1
    kind: ReplicaLimit
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas <= params.maxReplicas"
    messageExpression: "'object.spec.replicas must be no greater than ' + string(params.maxReplicas)"
    reason: Invalid
After creating a params object that limits the replicas to 3 and setting up the binding,
when we try to create a deployment with 5 replicas, we will receive the following message.
$ kubectl create deploy --image=nginx nginx --replicas=5
error: failed to create deployment: deployments.apps "nginx" is forbidden: ValidatingAdmissionPolicy 'deploy-replica-policy.example.com' with binding 'demo-binding-test.example.com' denied request: object.spec.replicas must be no greater than 3
This is more informative than a static message of "too many replicas".
The message expression takes precedence over the static message defined in spec.validations[i].message if both are defined.
However, if the message expression fails to evaluate, the static message will be used instead.
Additionally, if the message expression evaluates to a multi-line string,
the evaluation result will be discarded and the static message will be used if present.
Note that the static message is also validated against multi-line strings.
Type checking
When a policy definition is created or updated, the validation process parses the expressions it contains
and reports any syntax errors, rejecting the definition if any errors are found.
Afterward, the referred variables are checked for type errors, including missing fields and type confusion,
against the matched types of spec.matchConstraints.
The result of type checking can be retrieved from status.typeChecking.
The presence of status.typeChecking indicates the completion of type checking,
and an empty status.typeChecking means that no errors were detected.
For example, given the following policy definition:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "deploy-replica-policy.example.com"
spec:
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.replicas > 1" # should be "object.spec.replicas > 1"
    message: "must be replicated"
    reason: Invalid
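For a policy like the one above, the type checking result could take a shape similar to the following sketch; the warning text is abbreviated and illustrative:
status:
  typeChecking:
    expressionWarnings:
    - fieldRef: spec.validations[0].expression
      warning: "apps/v1, Kind=Deployment: ... undefined field 'replicas'"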
If multiple resources are matched in spec.matchConstraints, all of the matched resources will be checked against.
For example, the following policy definition
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "replica-policy.example.com"
spec:
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments", "replicasets"]
  validations:
  - expression: "object.replicas > 1" # should be "object.spec.replicas > 1"
    message: "must be replicated"
    reason: Invalid
will have multiple types, and the type checking results for each type are included in the warning message.
No wildcard matching. If spec.matchConstraints.resourceRules contains "*" in any of apiGroups, apiVersions or resources,
the types that "*" matches will not be checked.
The number of matched types is limited to 10. This is to prevent a policy that manually specifies too many types
from consuming excessive computing resources. In the order of ascending group, version, and then resource, the 11th combination and beyond are ignored.
Type checking does not affect the policy behavior in any way. Even if the type checking detects errors, the policy will continue
to be evaluated. If errors do occur during evaluation, the failure policy will decide the outcome.
Type checking does not apply to CRDs, including matched CRD types and references via paramKind. Support for CRDs will come in a future release.
Variable composition
If an expression grows too complicated, or part of the expression is reusable and computationally expensive to evaluate,
you can extract some parts of the expression into variables. A variable is a named expression that can be referenced later,
via variables, in other expressions.
A variable is lazily evaluated when it is first referenced. Any error that occurs during the evaluation will be
reported during the evaluation of the referring expression. Both the result and any error are memoized and
count only once towards the runtime cost.
The order of variables is important because a variable can refer to other variables that are defined before it.
This ordering prevents circular references.
The following is a more complex example of enforcing that image repo names match the environment defined in its namespace.
# This policy enforces that all containers of a deployment have an image repo that matches the environment label of its namespace.
# Except for "exempt" deployments, or any containers that do not belong to the "example.com" organization (e.g. common sidecars).
# For example, if the namespace has a label of {"environment": "staging"}, all container images must be either staging.example.com/*
# or not contain "example.com" at all, unless the deployment has the {"exempt": "true"} label.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "image-matches-namespace-environment.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  variables:
  - name: environment
    expression: "'environment' in namespaceObject.metadata.labels ? namespaceObject.metadata.labels['environment'] : 'prod'"
  - name: exempt
    expression: "'exempt' in object.metadata.labels && object.metadata.labels['exempt'] == 'true'"
  - name: containers
    expression: "object.spec.template.spec.containers"
  - name: containersToCheck
    expression: "variables.containers.filter(c, c.image.contains('example.com/'))"
  validations:
  - expression: "variables.exempt || variables.containersToCheck.all(c, c.image.startsWith(variables.environment + '.'))"
    messageExpression: "'only ' + variables.environment + ' images are allowed in namespace ' + namespaceObject.metadata.name"
With the policy bound to the namespace default, which is labeled environment: prod,
the following attempt to create a deployment would be rejected.
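For example, a request like the following would be rejected; the deployment name and image are illustrative:
$ kubectl create deploy invalid --image=dev.example.com/nginx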
error: failed to create deployment: deployments.apps "invalid" is forbidden: ValidatingAdmissionPolicy 'image-matches-namespace-environment.policy.example.com' with binding 'demo-binding-test.example.com' denied request: only prod images are allowed in namespace default
API kinds exempt from admission validation
There are certain API kinds that are exempt from admission-time validation checks. For example, you can't create a ValidatingAdmissionPolicy that prevents changes to ValidatingAdmissionPolicyBindings.
If this annotation is set to true on a FlowSchema or PriorityLevelConfiguration, the spec for that object
is managed by the kube-apiserver. If the API server does not recognize an APF object, and you annotate it
for automatic update, the API server deletes the entire object. Otherwise, the API server does not manage the
object spec.
For more details, read Maintenance of the Mandatory and Suggested Configuration Objects.
Use of this annotation is Alpha.
For Kubernetes version 1.33, you can use this annotation on Secrets,
ConfigMaps, or custom resources if the
CustomResourceDefinition
defining them has the applyset.kubernetes.io/is-parent-type label.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
This annotation is applied to the parent object used to track an ApplySet to extend the scope of
the ApplySet beyond the parent object's own namespace (if any).
The value is a comma-separated list of the names of namespaces other than the parent's namespace
in which objects are found.
Use of this annotation is Alpha.
For Kubernetes version 1.33, you can use this annotation on Secrets, ConfigMaps,
or custom resources if the CustomResourceDefinition
defining them has the applyset.kubernetes.io/is-parent-type label.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
This annotation is applied to the parent object used to track an ApplySet to optimize listing of
ApplySet member objects. It is optional in the ApplySet specification, as tools can perform discovery
or use a different optimization. However, as of Kubernetes version 1.33,
it is required by kubectl. When present, the value of this annotation must be a comma separated list
of the group-kinds, in the fully-qualified name format, i.e. <resource>.<group>.
For Kubernetes version 1.33, you can use this annotation on Secrets, ConfigMaps,
or custom resources if the CustomResourceDefinition
defining them has the applyset.kubernetes.io/is-parent-type label.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
This annotation is applied to the parent object used to track an ApplySet to optimize listing of
ApplySet member objects. It is optional in the ApplySet specification, as tools can perform discovery
or use a different optimization. However, in Kubernetes version 1.33,
it is required by kubectl. When present, the value of this annotation must be a comma separated list
of the group-kinds, in the fully-qualified name format, i.e. <resource>.<group>.
Use of this label is Alpha.
For Kubernetes version 1.33, you can use this label on Secrets, ConfigMaps,
or custom resources if the CustomResourceDefinition
defining them has the applyset.kubernetes.io/is-parent-type label.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
This label is what makes an object an ApplySet parent object.
Its value is the unique ID of the ApplySet, which is derived from the identity of the parent
object itself. This ID must be the base64 encoding (using the URL safe encoding of RFC4648) of
the hash of the group-kind-name-namespace of the object it is on, in the form:
<base64(sha256(<name>.<namespace>.<kind>.<group>))>.
There is no relation between the value of this label and object UID.
Use of this label is Alpha.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
You can set this label on a CustomResourceDefinition (CRD) to identify the custom resource type it
defines (not the CRD itself) as an allowed parent for an ApplySet.
The only permitted value for this label is "true"; if you want to mark a CRD as
not being a valid parent for ApplySets, omit this label.
Use of this label is Alpha.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
This label is what makes an object a member of an ApplySet.
The value of the label must match the value of the applyset.kubernetes.io/id
label on the parent object.
Use of this annotation is Alpha.
For Kubernetes version 1.33, you can use this annotation on Secrets,
ConfigMaps, or custom resources if the CustomResourceDefinition defining them has the
applyset.kubernetes.io/is-parent-type label.
Part of the specification used to implement
ApplySet-based pruning in kubectl.
This annotation is applied to the parent object used to track an ApplySet to indicate which
tooling manages that ApplySet. Tooling should refuse to mutate ApplySets belonging to other tools.
The value must be in the format <toolname>/<semver>.
apps.kubernetes.io/pod-index (beta)
Type: Label
Example: apps.kubernetes.io/pod-index: "0"
Used on: Pod
When a StatefulSet controller creates a Pod for the StatefulSet, it sets this label on that Pod.
The value of the label is the ordinal index of the pod being created.
See Pod Index Label
in the StatefulSet topic for more details.
Note the PodIndexLabel
feature gate must be enabled for this label to be added to pods.
This annotation is assigned to generated ResourceClaims.
Its value corresponds to the name of the resource claim in the .spec of any Pod(s) for which the ResourceClaim was created.
This annotation is an internal implementation detail of dynamic resource allocation.
You should not need to read or modify the value of this annotation.
When this annotation is set to "true", the cluster autoscaler is allowed to evict a Pod
even if other rules would normally prevent that.
The cluster autoscaler never evicts Pods that have this annotation explicitly set to
"false"; you could set that on an important Pod that you want to keep running.
If this annotation is not set then the cluster autoscaler follows its Pod-level behavior.
This annotation is used in manifests to mark an object as local configuration that
should not be submitted to the Kubernetes API.
A value of "true" for this annotation declares that the object is only consumed by
client-side tooling and should not be submitted to the API server.
A value of "false" can be used to declare that the object should be submitted to
the API server even when it would otherwise be assumed to be local.
This annotation is part of the Kubernetes Resource Model (KRM) Functions Specification,
which is used by Kustomize and similar third-party tools.
For example, Kustomize removes objects with this annotation from its final build output.
This annotation allows you to specify the AppArmor security profile for a container within a
Kubernetes pod. As of Kubernetes v1.30, this should be set with the appArmorProfile field instead.
To learn more, see the AppArmor tutorial.
The tutorial illustrates using AppArmor to restrict a container's abilities and access.
The profile specified dictates the set of rules and restrictions that the containerized process must
adhere to. This helps enforce security policies and isolation for your containers.
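A minimal sketch of the field-based equivalent; the Pod name and profile name are hypothetical:
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor                 # hypothetical name
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: k8s-deny-write # hypothetical profile loaded on the node
  containers:
  - name: hello
    image: busybox
    command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]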
internal.config.kubernetes.io/* (reserved prefix)
Type: Annotation
Used on: All objects
This prefix is reserved for internal use by tools that act as orchestrators in accordance
with the Kubernetes Resource Model (KRM) Functions Specification.
Annotations with this prefix are internal to the orchestration process and are not persisted to
the manifests on the filesystem. In other words, the orchestrator tool should set these
annotations when reading files from the local filesystem and remove them when writing the output
of functions back to the filesystem.
A KRM function must not modify annotations with this prefix, unless otherwise specified for a
given annotation. This enables orchestrator tools to add additional internal annotations, without
requiring changes to existing functions.
This annotation records the slash-delimited, OS-agnostic, relative path to the manifest file the
object was loaded from. The path is relative to a fixed location on the filesystem, determined by
the orchestrator tool.
This annotation is part of the Kubernetes Resource Model (KRM) Functions Specification, which is
used by Kustomize and similar third-party tools.
A KRM Function should not modify this annotation on input objects unless it is modifying the
referenced files. A KRM Function may include this annotation on objects it generates.
internal.config.kubernetes.io/index
Type: Annotation
Example: internal.config.kubernetes.io/index: "2"
Used on: All objects
This annotation records the zero-indexed position of the YAML document that contains the object
within the manifest file the object was loaded from. Note that YAML documents are separated by
three dashes (---) and can each contain one object. When this annotation is not specified, a
value of 0 is implied.
This annotation is part of the Kubernetes Resource Model (KRM) Functions Specification,
which is used by Kustomize and similar third-party tools.
A KRM Function should not modify this annotation on input objects unless it is modifying the
referenced files. A KRM Function may include this annotation on objects it generates.
The Kubelet populates this with runtime.GOARCH as defined by Go.
This can be handy if you are mixing ARM and x86 nodes.
kubernetes.io/os
Type: Label
Example: kubernetes.io/os: "linux"
Used on: Node, Pod
For nodes, the kubelet populates this with runtime.GOOS as defined by Go. This can be handy if you are
mixing operating systems in your cluster (for example: mixing Linux and Windows nodes).
You can also set this label on a Pod. Kubernetes allows you to set any value for this label;
if you use this label, you should nevertheless set it to the Go runtime.GOOS string for the operating
system that this Pod actually works with.
When the kubernetes.io/os label value for a Pod does not match the label value on a Node,
the kubelet on the node will not admit the Pod. However, this is not taken into account by
the kube-scheduler. Alternatively, the kubelet refuses to run a Pod where you have specified a Pod OS
that isn't the same as the operating system for the node where that kubelet is running.
See the Pod OS field for more details.
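For example, a Pod that declares it works with Linux could carry the label like this; the Pod name and image are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: linux-workload   # illustrative name
  labels:
    kubernetes.io/os: linux
spec:
  containers:
  - name: app
    image: nginx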
The Kubernetes API server (part of the control plane)
sets this label on all namespaces. The label value is set
to the name of the namespace. You can't change this label's value.
This is useful if you want to target a specific namespace with a label
selector.
kubernetes.io/limit-ranger
Type: Annotation
Example: kubernetes.io/limit-ranger: "LimitRanger plugin set: cpu, memory request for container nginx; cpu, memory limit for container nginx"
Used on: Pod
Kubernetes by default doesn't provide any resource limits; that means unless you explicitly define
limits, your container can consume unlimited CPU and memory.
You can define a default request or default limit for pods. You do this by creating a LimitRange
in the relevant namespace. Pods deployed after you define a LimitRange will have these limits
applied to them.
The annotation kubernetes.io/limit-ranger records that resource defaults were specified for the Pod,
and they were applied successfully.
For more details, read about LimitRanges.
When the kubelet creates a static Pod based on a given manifest, it attaches this annotation
to the static Pod. The value of the annotation is the UID of the Pod.
Note that the kubelet also sets the .spec.nodeName to the current node name as if the Pod
was scheduled to the node.
For a static Pod created by the kubelet on a node, a mirror Pod
is created on the API server. The kubelet adds an annotation to indicate that this Pod is
actually a mirror Pod. The annotation value is copied from the kubernetes.io/config.hash
annotation, which is the UID of the Pod.
When updating a Pod with this annotation set, the annotation cannot be changed or removed.
If a Pod doesn't have this annotation, it cannot be added during a Pod update.
kubernetes.io/config.source
Type: Annotation
Example: kubernetes.io/config.source: "file"
Used on: Pod
This annotation is added by the kubelet to indicate where the Pod comes from.
For static Pods, the annotation value could be one of file or http depending
on where the Pod manifest is located. For a Pod created on the API server and then
scheduled to the current node, the annotation value is api.
To specify how an add-on should be managed, you can use the addonmanager.kubernetes.io/mode label.
This label can have one of three values: Reconcile, EnsureExists, or Ignore.
Reconcile: Addon resources will be periodically reconciled with the expected state.
If there are any differences, the add-on manager will recreate, reconfigure or delete
the resources as needed. This is the default mode if no label is specified.
EnsureExists: Addon resources will be checked for existence only but will not be modified
after creation. The add-on manager will create or re-create the resources when there is
no instance of the resource with that name.
Ignore: Addon resources will be ignored. This mode is useful for add-ons that are not
compatible with the add-on manager or that are managed by another controller.
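For example, an add-on object managed in Reconcile mode could carry the label like this; the object name is hypothetical:
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-addon    # hypothetical add-on object
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data: {}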
The kube-apiserver sets this label on any APIService object that the API server
has created automatically. The label marks how the control plane should manage that
APIService. You should not add, modify, or remove this label by yourself.
Note:
Automanaged APIService objects are deleted by kube-apiserver when it has no built-in
or custom resource API corresponding to the API group/version of the APIService.
There are two possible values:
onstart: The APIService should be reconciled when an API server starts up, but not otherwise.
true: The API server should reconcile this APIService continuously.
This annotation on a Service denotes whether the Endpoints controller should go ahead and create
Endpoints for unready Pods. Endpoints of these Services retain their DNS records and continue
receiving traffic for the Service from the moment the kubelet starts all containers in the Pod
and marks it Running, until the kubelet stops all containers and deletes the Pod from
the API server.
This annotation was used to configure the scaling behavior for a HorizontalPodAutoscaler (HPA) in earlier Kubernetes versions.
It allowed you to specify how the HPA should scale pods up or down, including setting stabilization windows and scaling policies.
Setting this annotation has no effect in any supported release of Kubernetes.
The Kubelet populates this label with the hostname of the node. Note that the hostname
can be changed from the "actual" hostname by passing the --hostname-override flag to
the kubelet.
This label is also used as part of the topology hierarchy.
See topology.kubernetes.io/zone for more information.
kubernetes.io/enforce-mountable-secrets is deprecated since Kubernetes v1.32. Use separate namespaces to isolate access to mounted secrets.
The value for this annotation must be true to take effect.
When you set this annotation to "true", Kubernetes enforces the following rules for
Pods running as this ServiceAccount:
Secrets mounted as volumes must be listed in the ServiceAccount's secrets field.
Secrets referenced in envFrom for containers (including sidecar containers and init containers)
must also be listed in the ServiceAccount's secrets field.
If any container in a Pod references a Secret not listed in the ServiceAccount's secrets field
(and even if the reference is marked as optional), then the Pod will fail to start,
and an error indicating the non-compliant secret reference will be generated.
Secrets referenced in a Pod's imagePullSecrets must be present in the
ServiceAccount's imagePullSecrets field; otherwise, the Pod will fail to start,
and an error indicating the non-compliant image pull secret reference will be generated.
When you create or update a Pod, these rules are checked. If a Pod doesn't follow them, it won't start and you'll see an error message.
If a Pod is already running and you change the kubernetes.io/enforce-mountable-secrets annotation
to true, or you edit the associated ServiceAccount to remove the reference to a Secret
that the Pod is already using, the Pod continues to run.
You can add labels to particular worker nodes to exclude them from the list of backend servers used by external load balancers.
The following command can be used to exclude a worker node from the list of backend servers in a
backend set:
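Assuming the node.kubernetes.io/exclude-from-external-load-balancers label described here, the command could look like:
kubectl label nodes <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true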
This annotation is used to set Pod Deletion Cost
which allows users to influence ReplicaSet downscaling order.
The annotation value parses into an int32 type.
This annotation controls whether a DaemonSet pod should be evicted by a ClusterAutoscaler.
This annotation needs to be specified on DaemonSet pods in a DaemonSet manifest.
When this annotation is set to "true", the ClusterAutoscaler is allowed to evict
a DaemonSet Pod, even if other rules would normally prevent that.
To disallow the ClusterAutoscaler from evicting DaemonSet pods,
you can set this annotation to "false" for important DaemonSet pods.
If this annotation is not set, then the ClusterAutoscaler follows its overall behavior
(i.e. evict DaemonSet Pods based on its configuration).
Note:
This annotation only impacts DaemonSet Pods.
kubernetes.io/ingress-bandwidth
Type: Annotation
Example: kubernetes.io/ingress-bandwidth: 10M
Used on: Pod
You can apply quality-of-service traffic shaping to a pod and effectively limit its available
bandwidth. Ingress traffic to a Pod is handled by shaping queued packets to effectively
handle data. To limit the bandwidth on a Pod, write an object definition JSON file and specify
the data traffic speed using kubernetes.io/ingress-bandwidth annotation. The unit used for
specifying ingress rate is bits per second, as a
Quantity.
For example, 10M means 10 megabits per second.
Note:
Ingress traffic shaping annotation is an experimental feature.
If you want to enable traffic shaping support, you must add the bandwidth plugin to your CNI
configuration file (default /etc/cni/net.d) and ensure that the binary is included in your CNI
bin dir (default /opt/cni/bin).
kubernetes.io/egress-bandwidth
Type: Annotation
Example: kubernetes.io/egress-bandwidth: 10M
Used on: Pod
Egress traffic from a Pod is handled by policing, which simply drops packets in excess of the
configured rate. The limits you place on a Pod do not affect the bandwidth of other Pods.
To limit the bandwidth on a Pod, write an object definition JSON file and specify the data traffic
speed using kubernetes.io/egress-bandwidth annotation. The unit used for specifying egress rate
is bits per second, as a Quantity.
For example, 10M means 10 megabits per second.
Note:
Egress traffic shaping annotation is an experimental feature.
If you want to enable traffic shaping support, you must add the bandwidth plugin to your CNI
configuration file (default /etc/cni/net.d) and ensure that the binary is included in your CNI
bin dir (default /opt/cni/bin).
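A minimal sketch of a Pod that uses both traffic shaping annotations; the Pod name and image are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: traffic-shaped-pod   # illustrative name
  annotations:
    kubernetes.io/ingress-bandwidth: "10M"
    kubernetes.io/egress-bandwidth: "10M"
spec:
  containers:
  - name: app
    image: nginx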
The Kubelet populates this with the instance type as defined by the cloud provider.
This will be set only if you are using a cloud provider. This setting is handy
if you want to target certain workloads to certain instance types, but typically you want
to rely on the Kubernetes scheduler to perform resource-based scheduling.
You should aim to schedule based on properties rather than on instance types
(for example: require a GPU, instead of requiring a g2.2xlarge).
When this annotation is set on a PersistentVolumeClaim (PVC), that indicates that the lifecycle
of the PVC has passed through initial binding setup. When present, that information changes
how the control plane interprets the state of PVC objects.
The value of this annotation does not matter to Kubernetes.
If this annotation is set on a PersistentVolume or PersistentVolumeClaim, it indicates that a
storage binding (PersistentVolume → PersistentVolumeClaim, or PersistentVolumeClaim → PersistentVolume)
was installed by the controller.
If the annotation isn't set, and there is a storage binding in place, the absence of that
annotation means that the binding was done manually.
The value of this annotation does not matter.
This annotation is added to a PersistentVolume (PV) that has been dynamically provisioned by Kubernetes.
Its value is the name of the volume plugin that created the volume. It serves both users (to show where a PV
comes from) and Kubernetes (to recognize dynamically provisioned PVs in its decisions).
It is added to a PersistentVolume (PV) and PersistentVolumeClaim (PVC) that is supposed to be
dynamically provisioned/deleted by its corresponding CSI driver through the CSIMigration feature gate.
When this annotation is set, the Kubernetes components will "stand-down" and the
external-provisioner will act on the objects.
When a StatefulSet controller creates a Pod for the StatefulSet, the control plane
sets this label on that Pod. The value of the label is the name of the Pod being created.
See Pod Name Label
in the StatefulSet topic for more details.
On Node: The kubelet or the external cloud-controller-manager populates this
with the information from the cloud provider. This will be set only if you are using
a cloud provider. However, you can consider setting this on nodes if it makes sense
in your topology.
On PersistentVolume: topology-aware volume provisioners will automatically set
node affinity constraints on a PersistentVolume.
A zone represents a logical failure domain. It is common for Kubernetes clusters to
span multiple zones for increased availability. While the exact definition of a zone
is left to infrastructure implementations, common properties of a zone include
very low network latency within a zone, no-cost network traffic within a zone, and
failure independence from other zones.
For example, nodes within a zone might share a network switch, but nodes in different
zones should not.
A region represents a larger domain, made up of one or more zones.
It is uncommon for Kubernetes clusters to span multiple regions.
While the exact definition of a zone or region is left to infrastructure implementations,
common properties of a region include higher network latency between regions than within them,
non-zero cost for network traffic between regions, and failure independence from other zones or regions.
For example, nodes within a region might share power infrastructure (e.g. a UPS or generator),
but nodes in different regions typically would not.
Kubernetes makes a few assumptions about the structure of zones and regions:
regions and zones are hierarchical: zones are strict subsets of regions and
no zone can be in 2 regions
zone names are unique across regions; for example region "africa-east-1" might be comprised
of zones "africa-east-1a" and "africa-east-1b"
It should be safe to assume that topology labels do not change.
Even though labels are strictly mutable, consumers of them can assume that a given node
is not going to be moved between zones without being destroyed and recreated.
Kubernetes can use this information in various ways.
For example, the scheduler automatically tries to spread the Pods in a ReplicaSet across nodes
in a single-zone cluster (to reduce the impact of node failures, see
kubernetes.io/hostname).
With multiple-zone clusters, this spreading behavior also applies to zones (to reduce the impact of zone failures).
This is achieved via SelectorSpreadPriority.
SelectorSpreadPriority is a best effort placement. If the zones in your cluster are
heterogeneous (for example: different numbers of nodes, different types of nodes, or different pod
resource requirements), this placement might prevent equal spreading of your Pods across zones.
If desired, you can use homogeneous zones (same number and types of nodes) to reduce the probability
of unequal spreading.
The scheduler (through the VolumeZonePredicate predicate) also ensures that Pods
that claim a given volume are only placed into the same zone as that volume.
Volumes cannot be attached across zones.
If PersistentVolumeLabel does not support automatic labeling of your PersistentVolumes,
you should consider adding the labels manually (or adding support for PersistentVolumeLabel).
With PersistentVolumeLabel, the scheduler prevents Pods from mounting volumes in a different zone.
If your infrastructure doesn't have this constraint, you don't need to add the zone labels to the volumes at all.
This annotation can be used for PersistentVolume(PV) or PersistentVolumeClaim(PVC)
to specify the name of StorageClass.
When both the storageClassName attribute and the volume.beta.kubernetes.io/storage-class
annotation are specified, the annotation volume.beta.kubernetes.io/storage-class
takes precedence over the storageClassName attribute.
This annotation has been deprecated. Instead, set the
storageClassName field
for the PersistentVolumeClaim or PersistentVolume.
Example: volume.beta.kubernetes.io/mount-options: "ro,soft"
Used on: PersistentVolume
A Kubernetes administrator can specify additional
mount options
for when a PersistentVolume is mounted on a node.
volume.kubernetes.io/storage-provisioner
Type: Annotation
Used on: PersistentVolumeClaim
This annotation is added to a PVC that is supposed to be dynamically provisioned.
Its value is the name of a volume plugin that is supposed to provision a volume
for this PVC.
volume.kubernetes.io/selected-node
Type: Annotation
Used on: PersistentVolumeClaim
This annotation is added to a PVC that is triggered by a scheduler to be
dynamically provisioned. Its value is the name of the selected node.
If a node has the annotation volumes.kubernetes.io/controller-managed-attach-detach,
its storage attach and detach operations are being managed by the volume attach/detach controller.
This annotation is automatically added for the CSINode object that maps to a node that
installs CSIDriver. This annotation shows the in-tree plugin name of the migrated plugin. Its
value depends on your cluster's in-tree cloud provider storage type.
For example, if the in-tree cloud provider storage type is CSIMigrationvSphere, the CSINodes instance for the node should be updated with:
storage.alpha.kubernetes.io/migrated-plugins: "kubernetes.io/vsphere-volume"
service.kubernetes.io/headless
Type: Label
Example: service.kubernetes.io/headless: ""
Used on: Endpoints
The control plane adds this label to an Endpoints object when the owning Service is headless.
To learn more, read Headless Services.
This annotation was used for enabling topology aware hints on Services. Topology aware
hints have since been renamed: the concept is now called
topology aware routing.
Setting the annotation to Auto, on a Service, configured the Kubernetes control plane to
add topology hints on EndpointSlices associated with that Service. You can also explicitly
set the annotation to Disabled.
If you are running a version of Kubernetes older than 1.33,
check the documentation for that Kubernetes version to see how topology aware routing
works in that release.
There are no other valid values for this annotation. If you don't want topology aware hints
for a Service, don't add this annotation.
service.kubernetes.io/topology-mode
Type: Annotation
Example: service.kubernetes.io/topology-mode: Auto
Used on: Service
This annotation provides a way to define how Services handle network topology;
for example, you can configure a Service so that Kubernetes prefers keeping traffic between
a client and server within a single topology zone.
In some cases this can help reduce costs or improve network performance.
This label records the name of the
Service that the EndpointSlice is backing. All EndpointSlices should have this label set to
the name of their associated Service.
This annotation records the unique ID of the
ServiceAccount that the token (stored in the Secret of type kubernetes.io/service-account-token)
represents.
The control plane only adds this label to Secrets that have the type
kubernetes.io/service-account-token.
The value of this label records the date (ISO 8601 format, UTC time zone) when the control plane
last saw a request where the client authenticated using the service account token.
If a legacy token was last used before the cluster gained the feature (added in Kubernetes v1.26),
then the label isn't set.
The control plane automatically adds this label to auto-generated Secrets that
have the type kubernetes.io/service-account-token. This label marks the
Secret-based token as invalid for authentication. The value of this label
records the date (ISO 8601 format, UTC time zone) when the control plane detects
that the auto-generated Secret has not been used for a specified duration
(defaults to one year).
This label is used internally to mark Endpoints objects that were created by
Kubernetes (as opposed to Endpoints created by users or external controllers).
The label is used to indicate the controller or entity that manages the EndpointSlice. This label
aims to enable different EndpointSlice objects to be managed by different controllers or entities
within the same cluster. The value endpointslice-controller.k8s.io indicates an
EndpointSlice object that was created automatically by Kubernetes for a Service that has a
selector.
The label can be set to "true" on an Endpoints resource to indicate that the
EndpointSliceMirroring controller should not mirror this resource with EndpointSlices.
You can use this annotation to set extra configuration on an Ingress that
uses the NGINX Ingress Controller.
The configuration-snippet annotation is ignored
by default since version 1.9.0 of the ingress controller.
The NGINX ingress controller setting allow-snippet-annotations
has to be explicitly enabled in order to use this annotation.
Enabling the annotation can be dangerous in a multi-tenant cluster, as it can lead to people with otherwise
limited permissions being able to retrieve all Secrets in the cluster.
kubernetes.io/ingress.class (deprecated)
Type: Annotation
Used on: Ingress
Note:
Starting in v1.18, this annotation is deprecated in favor of spec.ingressClassName.
kubernetes.io/cluster-service (deprecated)
Type: Label
Example: kubernetes.io/cluster-service: "true"
Used on: Service
This label indicates that the Service provides a service to the cluster, if the value is set to true.
When you run kubectl cluster-info, the tool queries for Services with this label set to true.
However, setting this label on any Service is deprecated.
When a single StorageClass resource has this annotation set to "true", new PersistentVolumeClaim
resources without a class specified will be assigned this default class.
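A minimal sketch, assuming this is the storageclass.kubernetes.io/is-default-class annotation and using a hypothetical provisioner:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard   # hypothetical name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: example.com/external-provisioner   # hypothetical provisioner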
The kubelet can set this annotation on a Node to denote its configured IPv4 and/or IPv6 address.
When kubelet is started with the --cloud-provider flag set to any value (includes both external
and legacy in-tree cloud providers), it sets this annotation on the Node to denote an IP address
set from the command line flag (--node-ip). This IP is verified with the cloud provider as valid
by the cloud-controller-manager.
This annotation is used to record the original (expected) creation timestamp for a Job,
when that Job is part of a CronJob.
The control plane sets the value to that timestamp in RFC3339 format. If the Job belongs to a CronJob
with a timezone specified, then the timestamp is in that timezone. Otherwise, the timestamp is in controller-manager's local time.
The value of the annotation is the container name that is default for this Pod.
For example, kubectl logs or kubectl exec without -c or --container flag
will use this default container.
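For example, with the annotation below, kubectl logs my-pod defaults to the app container; the names are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod   # illustrative name
  annotations:
    kubectl.kubernetes.io/default-container: app
spec:
  containers:
  - name: app
    image: nginx
  - name: logging-sidecar
    image: busybox
    command: ["sh", "-c", "sleep 1h"]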
The value of the annotation is the container name that is the default logging container for this
Pod. For example, kubectl logs without -c or --container flag will use this default
container.
Note:
This annotation is deprecated. You should use the
kubectl.kubernetes.io/default-container
annotation instead. Kubernetes versions 1.25 and newer ignore this annotation.
Used on: Deployment, ReplicaSet, StatefulSet, DaemonSet, Pod
This annotation contains the latest restart time of a resource (Deployment, ReplicaSet, StatefulSet or DaemonSet),
where kubectl triggered a rollout in order to force creation of new Pods.
The command kubectl rollout restart <RESOURCE> triggers a restart by patching the Pod template
metadata of the resource with this annotation, which forces new Pods to be created. The annotation value
records when the restart was triggered (for example, 21 June 2024 at 17:27:41 UTC).
You should not assume that this annotation represents the date / time of the most recent update;
a separate change could have been made since the last manually triggered rollout.
If you manually set this annotation on a Pod, nothing happens. The restarting side effect comes from
how workload management and Pod templating works.
The control plane adds this annotation to
an Endpoints object if the associated
Service has more than 1000 backing endpoints.
The annotation indicates that the Endpoints object is over capacity and the number of endpoints
has been truncated to 1000.
If the number of backend endpoints falls below 1000, the control plane removes this annotation.
This annotation, set on an Endpoints object, represents the timestamp
of the last change to some Pod or Service object that triggered the change to the Endpoints object.
The timestamp is stored in RFC 3339 date-time string format (for example, '2018-10-22T19:32:52.1Z').
The control plane previously set this annotation on
an Endpoints object. This annotation provided
the following detail:
Who is the current leader.
The time when the current leadership was acquired.
The duration of the lease (of the leadership) in seconds.
The time the current lease (the current leadership) should be renewed.
The number of leadership transitions that happened in the past.
Kubernetes now uses Leases to
manage leader assignment for the Kubernetes control plane.
batch.kubernetes.io/job-tracking (deprecated)
Type: Annotation
Example: batch.kubernetes.io/job-tracking: ""
Used on: Jobs
The presence of this annotation on a Job used to indicate that the control plane is
tracking the Job status using finalizers.
Adding or removing this annotation no longer has an effect (Kubernetes v1.27 and later)
All Jobs are tracked with finalizers.
job-name (deprecated)
Type: Label
Example: job-name: "pi"
Used on: Jobs and Pods controlled by Jobs
Note:
Starting from Kubernetes 1.27, this label is deprecated.
Kubernetes 1.27 and newer ignore this label and use the prefixed job-name label.
controller-uid (deprecated)
Type: Label
Example: controller-uid: "$UID"
Used on: Jobs and Pods controlled by Jobs
Note:
Starting from Kubernetes 1.27, this label is deprecated.
Kubernetes 1.27 and newer ignore this label and use the prefixed controller-uid label.
batch.kubernetes.io/job-name
Type: Label
Example: batch.kubernetes.io/job-name: "pi"
Used on: Jobs and Pods controlled by Jobs
This label is used as a user-friendly way to get Pods corresponding to a Job.
The job-name comes from the name of the Job and allows for an easy way to
get Pods corresponding to the Job.
This label is used as a programmatic way to get all Pods corresponding to a Job. The controller-uid is a unique identifier that gets set in the selector field so the Job
controller can get all the corresponding Pods.
This annotation requires the PodTolerationRestriction
admission controller to be enabled. This annotation key allows assigning tolerations to a
namespace, and any new Pods created in this namespace will get these tolerations added.
This annotation is only useful when the (Alpha)
PodTolerationRestriction
admission controller is enabled. The annotation value is a JSON document that defines a list of
allowed tolerations for the namespace it annotates. When you create a Pod or modify its
tolerations, the API server checks the tolerations to see if they are mentioned in the allow list.
The pod is admitted only if the check succeeds.
The kubelet detects memory pressure based on memory.available and allocatableMemory.available
observed on a Node. The observed values are then compared to the corresponding thresholds that can
be set on the kubelet to determine if the Node condition and taint should be added/removed.
The kubelet detects disk pressure based on imagefs.available, imagefs.inodesFree,
nodefs.available and nodefs.inodesFree (Linux only) observed on a Node.
The observed values are then compared to the corresponding thresholds that can be set on the
kubelet to determine if the Node condition and taint should be added/removed.
This is initially set by the kubelet when the cloud provider used indicates a requirement for
additional network configuration. Only when the route on the cloud is configured properly will the
taint be removed by the cloud provider.
The kubelet checks the D-value (difference) between the size of /proc/sys/kernel/pid_max and the PIDs consumed by
Kubernetes on a node to get the number of available PIDs, referred to as the pid.available
metric. The metric is then compared to the corresponding threshold that can be set on the kubelet
to determine if the node condition and taint should be added/removed.
A user can manually add the taint to a Node marking it out-of-service.
If a Node is marked out-of-service with this taint, the Pods on the node
will be forcefully deleted if they have no matching tolerations, and
volume detach operations for the Pods terminating on the node will happen immediately.
This allows the Pods on the out-of-service node to recover quickly on a different node.
This taint is set on a Node to mark it as unusable when the kubelet is started with the "external"
cloud provider, until a controller from the cloud-controller-manager initializes this Node, and
then removes the taint.
If a Node is in a cloud provider specified shutdown state, the Node gets tainted accordingly
with node.cloudprovider.kubernetes.io/shutdown and the taint effect of NoSchedule.
These labels are used by the Node Feature Discovery (NFD) component to advertise
features on a node. All built-in labels use the feature.node.kubernetes.io label
namespace and have the format feature.node.kubernetes.io/<feature-name>: "true".
NFD has many extension points for creating vendor and application-specific labels.
For details, see the customization guide.
For node(s) where the Node Feature Discovery (NFD) master is scheduled, this annotation records
the version of the NFD master. It is used for informative purposes only.
This annotation records a comma-separated list of node feature labels managed by
Node Feature Discovery (NFD).
NFD uses this for an internal mechanism. You should not edit this annotation yourself.
This annotation records a comma-separated list of
extended resources
managed by Node Feature Discovery (NFD).
NFD uses this for an internal mechanism. You should not edit this annotation yourself.
nfd.node.kubernetes.io/node-name
Type: Label
Example: nfd.node.kubernetes.io/node-name: node-1
Used on: Nodes
It specifies which node the NodeFeature object is targeting.
Creators of NodeFeature objects must set this label and
consumers of the objects are supposed to use the label for
filtering features designated for a certain node.
Note:
These Node Feature Discovery (NFD) labels or annotations only apply to
the nodes where NFD is running. To learn more about NFD and
its components, go to its official documentation.
The cloud controller manager integration with AWS elastic load balancing configures
the load balancer for a Service based on this annotation. The value determines
how often, in minutes, the load balancer writes log entries. For example, if you set the value
to 5, the log writes occur 5 minutes apart.
The cloud controller manager integration with AWS elastic load balancing configures
the load balancer for a Service based on this annotation. Access logging is enabled
if you set the annotation to "true".
Example: service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
Used on: Service
The cloud controller manager integration with AWS elastic load balancing configures
the load balancer for a Service based on this annotation. The load balancer
writes logs to an S3 bucket with the name you specify.
The cloud controller manager integration with AWS elastic load balancing configures
the load balancer for a Service based on this annotation. The load balancer
writes log objects with the prefix that you specify.
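A sketch of a Service that enables ELB access logging. Only the -access-log-enabled key appears literally above; the emit-interval, s3-bucket-name, and s3-bucket-prefix keys follow the same service.beta.kubernetes.io/aws-load-balancer-access-log-* naming and are assumed here, with placeholder bucket, prefix, selector, and port values.
apiVersion: v1
kind: Service
metadata:
  name: web                            # hypothetical Service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "my-elb-logs"   # bucket must already exist
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "web/prod"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080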
The cloud controller manager integration with AWS elastic load balancing configures
tags (an AWS concept) for a load balancer based on the comma-separated key/value
pairs in the value of this annotation.
The cloud controller manager integration with AWS elastic load balancing configures
the load balancer based on this annotation. The load balancer's connection draining
setting depends on the value you set.
If you configure connection draining
for a Service of type: LoadBalancer, and you use the AWS cloud, the integration configures
the draining period based on this annotation. The value you set determines the draining
timeout in seconds.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The load balancer has a configured idle
timeout period (in seconds) that applies to its connections. If no data has been
sent or received by the time that the idle timeout period elapses, the load balancer
closes the connection.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. If you set this annotation to "true",
each load balancer node distributes requests evenly across the registered targets
in all enabled availability zones.
If you disable cross-zone load balancing, each load balancer node distributes requests
evenly across the registered targets in its availability zone only.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The value is a comma-separated list
of elastic IP address allocation IDs.
This annotation is only relevant for Services of type: LoadBalancer, where
the load balancer is an AWS Network Load Balancer.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value is a comma-separated
list of extra AWS VPC security groups to configure for the load balancer.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value specifies the number of
successive successful health checks required for a backend to be considered healthy
for traffic.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value specifies the interval,
in seconds, between health check probes made by the load balancer.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value determines the
path part of the URL that is used for HTTP health checks.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value determines which
port the load balancer connects to when performing health checks.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value determines how the
load balancer checks the health of backend targets.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value specifies the number
of seconds before a probe that hasn't yet succeeded is automatically treated as
having failed.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. The annotation value specifies the number of
successive unsuccessful health checks required for a backend to be considered unhealthy
for traffic.
The cloud controller manager integration with AWS elastic load balancing configures
a load balancer based on this annotation. When you set this annotation to "true",
the integration configures an internal load balancer.
If you set this annotation on a Service, and you also annotate that Service with
service.beta.kubernetes.io/aws-load-balancer-type: "external", and you use the
AWS load balancer controller
in your cluster, then the AWS load balancer controller sets the name of that load
balancer to the value you set for this annotation.
See annotations
in the AWS load balancer controller documentation.
The official Kubernetes integration with AWS elastic load balancing configures
a load balancer based on this annotation. The only permitted value is "*",
which indicates that the load balancer should wrap TCP connections to the backend
Pod with the PROXY protocol.
The AWS load balancer controller uses this annotation to specify a comma-separated list
of security groups you want to attach to an AWS load balancer. Both names and IDs of security
groups are supported, where a name matches a Name tag, not the groupName attribute.
When this annotation is added to a Service, the load-balancer controller attaches the security groups
referenced by the annotation to the load balancer. If you omit this annotation, the AWS load balancer
controller automatically creates a new security group and attaches it to the load balancer.
Note:
Kubernetes v1.27 and later do not directly set or read this annotation. However, the AWS
load balancer controller (part of the Kubernetes project) does still use the
service.beta.kubernetes.io/aws-load-balancer-security-groups annotation.
The official integration with AWS elastic load balancing configures TLS for a Service of
type: LoadBalancer based on this annotation. The value of the annotation is the
AWS Resource Name (ARN) of the X.509 certificate that the load balancer listener should
use.
(The TLS protocol is based on an older technology that abbreviates to SSL.)
The official integration with AWS elastic load balancing configures TLS for a Service of
type: LoadBalancer based on this annotation. The value of the annotation is the name
of an AWS policy for negotiating TLS with a client peer.
The official integration with AWS elastic load balancing configures TLS for a Service of
type: LoadBalancer based on this annotation. The value of the annotation is either "*",
which means that all the load balancer's ports should use TLS, or it is a comma separated
list of port numbers.
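A sketch of a Service terminating TLS at the load balancer. The certificate ARN is a placeholder, and the service.beta.kubernetes.io/aws-load-balancer-ssl-cert and -ssl-ports keys are assumed from the standard naming since they are not spelled out literally above.
apiVersion: v1
kind: Service
metadata:
  name: secure-web                     # hypothetical Service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:123456789012:certificate/example"  # placeholder ARN
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - name: https
    port: 443
    targetPort: 8443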
Kubernetes' official integration with AWS uses this annotation to configure a
load balancer and determine in which AWS availability zones to deploy the managed
load balancing service. The value is either a comma separated list of subnet names, or a
comma separated list of subnet IDs.
Kubernetes' official integration with AWS uses this annotation to determine which
nodes in your cluster should be considered as valid targets for the load balancer.
Kubernetes' official integrations with AWS use this annotation to determine
whether the AWS cloud provider integration should manage a Service of
type: LoadBalancer.
There are two permitted values:
nlb
the cloud controller manager configures a Network Load Balancer
external
the cloud controller manager does not configure any load balancer
If you deploy a Service of type: LoadBalancer on AWS, and you don't set any
service.beta.kubernetes.io/aws-load-balancer-type annotation,
the AWS integration deploys a classic Elastic Load Balancer. This behavior,
with no annotation present, is the default unless you specify otherwise.
When you set this annotation to external on a Service of type: LoadBalancer,
and your cluster has a working deployment of the AWS Load Balancer controller,
then the AWS Load Balancer controller attempts to deploy a load balancer based
on the Service specification.
Caution:
Do not modify or add the service.beta.kubernetes.io/aws-load-balancer-type annotation
on an existing Service object. See the AWS documentation on this topic for more
details.
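A sketch of a newly created Service (not an existing one, per the caution above) that requests a Network Load Balancer through this annotation; the Service name, selector, and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: ingress-gateway                # hypothetical Service
  annotations:
    # "nlb" asks the cloud controller manager for a Network Load Balancer;
    # omitting the annotation gives a classic Elastic Load Balancer instead.
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: gateway
  ports:
  - port: 80
    targetPort: 8080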
This annotation only works for Services backed by an Azure standard load balancer.
This annotation is used on the Service to specify whether the load balancer
should disable or enable TCP reset on idle timeout. If enabled, it helps
applications to behave more predictably, to detect the termination of a connection,
remove expired connections and initiate new connections.
You can set the value to be either true or false.
Value must be one of privileged, baseline, or restricted which correspond to
Pod Security Standard levels.
Specifically, the enforce label prohibits the creation of any Pod in the labeled
Namespace which does not meet the requirements outlined in the indicated level.
Value must be latest or a valid Kubernetes version in the format v<major>.<minor>.
This determines the version of the
Pod Security Standard
policies to apply when validating a Pod.
Value must be one of privileged, baseline, or restricted which correspond to
Pod Security Standard levels.
Specifically, the audit label does not prevent the creation of a Pod in the labeled
Namespace which does not meet the requirements outlined in the indicated level,
but adds an audit annotation to the Pod.
Value must be latest or a valid Kubernetes version in the format v<major>.<minor>.
This determines the version of the
Pod Security Standard
policies to apply when validating a Pod.
Value must be one of privileged, baseline, or restricted which correspond to
Pod Security Standard levels.
Specifically, the warn label does not prevent the creation of a Pod in the labeled
Namespace which does not meet the requirements outlined in the indicated level,
but returns a warning to the user after doing so.
Note that warnings are also displayed when creating or updating objects that contain
Pod templates, such as Deployments, Jobs, StatefulSets, etc.
Value must be latest or a valid Kubernetes version in the format v<major>.<minor>.
This determines the version of the Pod Security Standard
policies to apply when validating a submitted Pod.
Note that warnings are also displayed when creating or updating objects that contain
Pod templates, such as Deployments, Jobs, StatefulSets, etc.
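A sketch of a Namespace that applies these levels and versions, assuming the pod-security.kubernetes.io/enforce, warn, and audit label keys (and their -version counterparts) that the descriptions above refer to; the namespace name and the chosen levels are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: payments                       # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest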
Used on: ClusterRole, ClusterRoleBinding, Role, RoleBinding
When this annotation is set to "true" on default RBAC objects created by the API server,
they are automatically updated at server start to add missing permissions and subjects
(extra permissions and subjects are left in place).
To prevent autoupdating a particular role or rolebinding, set this annotation to "false".
If you create your own RBAC objects and set this annotation to "false", kubectl auth reconcile
(which allows reconciling arbitrary RBAC objects in a manifest)
respects this annotation and does not automatically add missing permissions and subjects.
kubernetes.io/psp (deprecated)
Type: Annotation
Example: kubernetes.io/psp: restricted
Used on: Pod
This annotation was only relevant if you were using
PodSecurityPolicy objects.
Kubernetes v1.33 does not support the PodSecurityPolicy API.
When the PodSecurityPolicy admission controller admitted a Pod, the admission controller
modified the Pod to have this annotation.
The value of the annotation was the name of the PodSecurityPolicy that was used for validation.
Kubernetes before v1.25 allowed you to configure seccomp behavior using this annotation.
See Restrict a Container's Syscalls with seccomp to
learn the supported way to specify seccomp restrictions for a Pod.
Kubernetes before v1.25 allowed you to configure seccomp behavior using this annotation.
See Restrict a Container's Syscalls with seccomp to
learn the supported way to specify seccomp restrictions for a Pod.
Value can either be true or false. This determines whether a user can modify
the mode of the source volume when a PersistentVolumeClaim is being created from
a VolumeSnapshot.
This label/annotation is used to store the name of the JobSet that a Job or Pod belongs to.
JobSet is an extension API that you can deploy into your Kubernetes cluster.
This label or annotation stores the name of the replicated job that this Job or Pod is part of.
jobset.sigs.k8s.io/job-index
Type: Label, Annotation
Example: jobset.sigs.k8s.io/job-index: "0"
Used on: Jobs, Pods
This label/annotation is set by the JobSet controller on child Jobs and Pods. It contains the index of the Job replica within its parent ReplicatedJob.
The JobSet controller sets this label (and also an annotation with the same key) on child Jobs and
Pods of a JobSet. The value is the SHA256 hash of the namespaced Job name.
You can set this label/annotation on a JobSet to ensure exclusive Job
placement per topology group. You can also define this label or annotation on a replicated job
template. Read the documentation for JobSet to learn more.
This label/annotation can be applied to a JobSet. When it's set, the JobSet controller modifies the Jobs and their corresponding Pods by adding node selectors and tolerations. This ensures exclusive job placement per topology domain, restricting the scheduling of these Pods to specific nodes based on the strategy.
This label is either set manually or automatically (for example, by a cluster autoscaler) on the nodes. When alpha.jobset.sigs.k8s.io/node-selector is set to "true", the JobSet controller adds a node selector for this node label (along with a toleration for the taint alpha.jobset.sigs.k8s.io/no-schedule, discussed next).
This taint is either set manually or automatically (for example, by a cluster autoscaler) on the nodes. When alpha.jobset.sigs.k8s.io/node-selector is set to "true", the JobSet controller adds a toleration for this node taint (along with a node selector for the label alpha.jobset.sigs.k8s.io/namespaced-job, discussed previously).
This annotation/label is used on Jobs and Pods to store a stable network endpoint where the coordinator
pod can be reached if the JobSet spec defines the .spec.coordinator field.
Annotation that kubeadm uses to preserve the CRI socket information given to kubeadm at
init/join time for later use. kubeadm annotates the Node object with this information.
The annotation remains "alpha", since ideally this should be a field in KubeletConfiguration
instead.
Annotation that kubeadm places on locally managed etcd Pods to keep track of
a list of URLs where etcd clients should connect to.
This is used mainly for etcd cluster health check purposes.
Annotation that kubeadm places on locally managed kube-apiserver Pods to keep track
of the exposed advertise address/port endpoint for that API server instance.
Annotation that kubeadm places on ConfigMaps that it manages for configuring components.
It contains a hash (SHA-256) used to determine if the user has applied settings different
from the kubeadm defaults for a particular component.
node-role.kubernetes.io/control-plane
Type: Label
Used on: Node
A marker label to indicate that the node is used to run control plane components.
The kubeadm tool applies this label to the control plane nodes that it manages.
Other cluster management tools typically also set this label.
You can label control plane nodes with this label to make it easier to schedule Pods
only onto these nodes, or to avoid running Pods on the control plane.
If this label is set, the EndpointSlice controller
ignores that node while calculating Topology Aware Hints.
node-role.kubernetes.io/*
Type: Label
Example: node-role.kubernetes.io/gpu: gpu
Used on: Node
This optional label is applied to a node when you want to mark a node role.
The node role (text following / in the label key) can be set, as long as the overall key follows the
syntax rules for
object labels.
Taint that kubeadm applies on control plane nodes to restrict Pod placement and
allow only specific Pods to schedule on them.
If this taint is applied, control plane nodes allow only critical workloads to
be scheduled onto them. You can manually remove this taint from a specific node with
kubectl taint nodes <node-name> node-role.kubernetes.io/control-plane:NoSchedule- (the trailing hyphen removes the taint).
Taint that kubeadm previously applied on control plane nodes to allow only critical
workloads to schedule on them. Replaced by the
node-role.kubernetes.io/control-plane
taint. kubeadm no longer sets or uses this deprecated taint.
Used to grant administrative access to certain resource.k8s.io API types within
a namespace. When this label is set on a namespace with the value "true"
(case-sensitive), it allows the use of adminAccess: true in any namespaced
resource.k8s.io API types. Currently, this permission applies to
ResourceClaim and ResourceClaimTemplate objects.
This page serves as a reference for the audit annotations of the kubernetes.io
namespace. These annotations apply to Event objects from the API group
audit.k8s.io.
Note:
The following annotations are not used within the Kubernetes API. When you
enable auditing in your cluster,
audit event data is written using Event from API group audit.k8s.io.
The annotations apply to audit events. Audit events are different from objects in the
Event API (API group
events.k8s.io).
k8s.io/deprecated
Example: k8s.io/deprecated: "true"
Value must be "true" or "false". The value "true" indicates that the
request used a deprecated API version.
k8s.io/removed-release
Example: k8s.io/removed-release: "1.22"
Value must be in the format "<MAJOR>.<MINOR>". It is set to the target removal release
on requests made to deprecated API versions that have a target removal release.
Value must be one of user, namespace, or runtimeClass which correspond to
Pod Security Exemption
dimensions. This annotation indicates which dimension the exemption from
PodSecurity enforcement was based on.
Value must be privileged:<version>, baseline:<version>,
restricted:<version> which correspond to Pod Security
Standard levels accompanied by
a version which must be latest or a valid Kubernetes version in the format
v<MAJOR>.<MINOR>. This annotation records the enforcement level that
allowed or denied the pod during PodSecurity admission.
Example: pod-security.kubernetes.io/audit-violations: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "example" must set securityContext.allowPrivilegeEscalation=false), ...
The value details an audit policy violation; it contains the
Pod Security Standard level
that was transgressed as well as the specific policies on the fields that were
violated from the PodSecurity enforcement.
This annotation indicates the latency incurred inside the storage layer;
it accounts for the time it takes to send data to etcd and get the complete response back.
The value of this audit annotation does not include the time incurred in admission or validation.
Example: missing-san.invalid-cert.kubernetes.io/example-svc.example-namespace.svc: "relies on a legacy Common Name field instead of the SAN extension for subject validation"
Used by Kubernetes version v1.24 and later
This annotation indicates a webhook or aggregated API server
is using an invalid certificate that is missing subjectAltNames.
Support for these certificates was disabled by default in Kubernetes 1.19,
and removed in Kubernetes 1.23.
Requests to endpoints using these certificates will fail.
Services using these certificates should replace them as soon as possible
to avoid disruption when running in Kubernetes 1.23+ environments.
Example: insecure-sha1.invalid-cert.kubernetes.io/example-svc.example-namespace.svc: "uses an insecure SHA-1 signature"
Used by Kubernetes version v1.24 and later
This annotation indicates a webhook or aggregated API server
is using an insecure certificate signed with a SHA-1 hash.
Support for these insecure certificates is disabled by default in Kubernetes 1.24,
and will be removed in a future release.
Services using these certificates should replace them as soon as possible,
to ensure connections are secured properly and to avoid disruption in future releases.
This annotation indicates that an admission policy validation evaluated to false
for an API request, or that the validation resulted in an error while the policy
was configured with failurePolicy: Fail.
The value of the annotation is a JSON object. The message in the JSON
provides the message about the validation failure.
The policy, binding, and expressionIndex fields in the JSON identify the
name of the ValidatingAdmissionPolicy, the name of the
ValidatingAdmissionPolicyBinding, and the index in the policy validations of
the CEL expressions that failed, respectively.
The validationActions shows what actions were taken for this validation failure.
See Validating Admission Policy
for more details about validationActions.
5 - Kubernetes API
Kubernetes' API is the application that serves Kubernetes functionality through a RESTful interface and stores the state of the cluster.
Kubernetes resources and "records of intent" are all stored as API objects, and modified via RESTful calls to the API. The API allows configuration to be managed in a declarative way. Users can interact with the Kubernetes API directly, or via tools like kubectl. The core Kubernetes API is flexible and can also be extended to support custom resources.
5.1 - Workload Resources
5.1.1 - Pod
Pod is a collection of containers that can run on a host.
apiVersion: v1
import "k8s.io/api/core/v1"
Pod
Pod is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.
Map: unique values on key name will be kept during a merge
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
Map: unique values on key name will be kept during a merge
List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Map: unique values on key name will be kept during a merge
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing pod to perform user-initiated actions such as debugging. This list cannot be specified when creating a pod, and it cannot be modified by updating the pod spec. In order to add an ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
EnableServiceLinks indicates whether information about services should be injected into pod's environment variables, matching the syntax of Docker links. Optional: Defaults to true.
os (PodOS)
Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset: - securityContext.windowsOptions
If the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.hostUsers - spec.securityContext.appArmorProfile - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.securityContext.supplementalGroupsPolicy - spec.containers[].securityContext.appArmorProfile - spec.containers[].securityContext.seLinuxOptions - spec.containers[].securityContext.seccompProfile - spec.containers[].securityContext.capabilities - spec.containers[].securityContext.readOnlyRootFilesystem - spec.containers[].securityContext.privileged - spec.containers[].securityContext.allowPrivilegeEscalation - spec.containers[].securityContext.procMount - spec.containers[].securityContext.runAsUser - spec.containers[].securityContext.runAsGroup
NodeName indicates in which node this pod is scheduled. If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName. Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod. This field should not be used to express a desire for the pod to be scheduled on a specific node. https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod in the same node, zone, etc. as some other pod(s)).
tolerations ([]Toleration)
Atomic: will be replaced during a merge
If specified, the pod's tolerations.
The pod this Toleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator.
tolerations.key (string)
Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys.
tolerations.operator (string)
Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of a particular category.
tolerations.value (string)
Value is the taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise just a regular string.
tolerations.effect (string)
Effect indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
tolerations.tolerationSeconds (int64)
TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system.
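A minimal Pod sketch that puts these toleration fields together; the taint key "dedicated", its value, and the container image are placeholders, while node.kubernetes.io/not-ready is a standard node taint.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                        # hypothetical Pod
spec:
  tolerations:
  - key: "dedicated"                   # hypothetical taint key
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300             # tolerate the taint for 5 minutes before eviction
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image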
schedulerName (string)
If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler.
runtimeClassName (string)
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
priorityClassName (string)
If specified, indicates the pod's priority. "system-node-critical" and "system-cluster-critical" are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default.
priority (int32)
The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority.
preemptionPolicy (string)
PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset.
Map: unique values on keys topologyKey, whenUnsatisfiable will be kept during a merge
TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed.
TopologySpreadConstraint specifies how to spread matching pods among the given topology.
MaxSkew describes the degree to which pods may be unevenly distributed. When whenUnsatisfiable=DoNotSchedule, it is the maximum permitted difference between the number of matching pods in the target topology and the global minimum. The global minimum is the minimum number of matching pods in an eligible domain or zero if the number of eligible domains is less than MinDomains. For example, in a 3-zone cluster, MaxSkew is set to 1, and pods with the same labelSelector spread as 2/2/1: In this case, the global minimum is 1. | zone1 | zone2 | zone3 | | P P | P P | P | - if MaxSkew is 1, incoming pod can only be scheduled to zone3 to become 2/2/2; scheduling it onto zone1(zone2) would make the ActualSkew(3-1) on zone1(zone2) violate MaxSkew(1). - if MaxSkew is 2, incoming pod can be scheduled onto any zone. When whenUnsatisfiable=ScheduleAnyway, it is used to give higher precedence to topologies that satisfy it. It's a required field. Default value is 1 and 0 is not allowed.
TopologyKey is the key of node labels. Nodes that have a label with this key and identical values are considered to be in the same topology. We consider each <key, value> as a "bucket", and try to put balanced number of pods into each bucket. We define a domain as a particular instance of a topology. Also, we define an eligible domain as a domain whose nodes meet the requirements of nodeAffinityPolicy and nodeTaintsPolicy. e.g. If TopologyKey is "kubernetes.io/hostname", each Node is a domain of that topology. And, if TopologyKey is "topology.kubernetes.io/zone", each zone is a domain of that topology. It's a required field.
WhenUnsatisfiable indicates how to deal with a pod if it doesn't satisfy the spread constraint. - DoNotSchedule (default) tells the scheduler not to schedule it. - ScheduleAnyway tells the scheduler to schedule the pod in any location,
but giving higher precedence to topologies that would help reduce the
skew.
A constraint is considered "Unsatisfiable" for an incoming pod if and only if every possible node assignment for that pod would violate "MaxSkew" on some topology. For example, in a 3-zone cluster, MaxSkew is set to 1, and pods with the same labelSelector spread as 3/1/1: | zone1 | zone2 | zone3 | | P P P | P | P | If WhenUnsatisfiable is set to DoNotSchedule, incoming pod can only be scheduled to zone2(zone3) to become 3/2/1(3/1/2) as ActualSkew(2-1) on zone2(zone3) satisfies MaxSkew(1). In other words, the cluster can still be imbalanced, but scheduler won't make it more imbalanced. It's a required field.
LabelSelector is used to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain.
MatchLabelKeys is a set of pod label keys to select the pods over which spreading will be calculated. The keys are used to lookup values from the incoming pod labels, those key-value labels are ANDed with labelSelector to select the group of existing pods over which spreading will be calculated for the incoming pod. The same key is forbidden to exist in both MatchLabelKeys and LabelSelector. MatchLabelKeys cannot be set when LabelSelector isn't set. Keys that don't exist in the incoming pod labels will be ignored. A null or empty list means only match against labelSelector.
This is a beta field and requires the MatchLabelKeysInPodTopologySpread feature gate to be enabled (enabled by default).
topologySpreadConstraints.minDomains (int32)
MinDomains indicates a minimum number of eligible domains. When the number of eligible domains with matching topology keys is less than minDomains, Pod Topology Spread treats "global minimum" as 0, and then the calculation of Skew is performed. And when the number of eligible domains with matching topology keys equals or greater than minDomains, this value has no effect on scheduling. As a result, when the number of eligible domains is less than minDomains, scheduler won't schedule more than maxSkew Pods to those domains. If value is nil, the constraint behaves as if MinDomains is equal to 1. Valid values are integers greater than 0. When value is not nil, WhenUnsatisfiable must be DoNotSchedule.
For example, in a 3-zone cluster, MaxSkew is set to 2, MinDomains is set to 5 and pods with the same labelSelector spread as 2/2/2: | zone1 | zone2 | zone3 | | P P | P P | P P | The number of domains is less than 5(MinDomains), so "global minimum" is treated as 0. In this situation, new pod with the same labelSelector cannot be scheduled, because computed skew will be 3(3 - 0) if new Pod is scheduled to any of the three zones, it will violate MaxSkew.
NodeAffinityPolicy indicates how we will treat Pod's nodeAffinity/nodeSelector when calculating pod topology spread skew. Options are: - Honor: only nodes matching nodeAffinity/nodeSelector are included in the calculations. - Ignore: nodeAffinity/nodeSelector are ignored. All nodes are included in the calculations.
If this value is nil, the behavior is equivalent to the Honor policy.
NodeTaintsPolicy indicates how we will treat node taints when calculating pod topology spread skew. Options are: - Honor: nodes without taints, along with tainted nodes for which the incoming pod has a toleration, are included. - Ignore: node taints are ignored. All nodes are included.
If this value is nil, the behavior is equivalent to the Ignore policy.
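A minimal sketch of a Pod using these topology spread fields; the app label, image, and chosen values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: web-replica                    # hypothetical Pod
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
    minDomains: 3                      # only valid together with DoNotSchedule
    nodeAffinityPolicy: Honor
    nodeTaintsPolicy: Ignore
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image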
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass. This field will be autopopulated at admission time by the RuntimeClass admission controller. If the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests. The RuntimeClass admission controller will reject Pod create requests which have the overhead already set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero. More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.
activeDeadlineSeconds (int64)
Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.
PodReadinessGate contains the reference to a pod condition
readinessGates.conditionType (string), required
ConditionType refers to a condition in the pod's condition list with matching type.
Hostname and Name resolution
hostname (string)
Specifies the hostname of the Pod. If not specified, the pod's hostname will be set to a system-defined value.
setHostnameAsFQDN (boolean)
If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default). In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname). In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN. If a pod does not have FQDN, this has no effect. Default to false.
subdomain (string)
If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>". If not specified, the pod will not have a domainname at all.
hostAliases ([]HostAlias)
Patch strategy: merge on key ip
Map: unique values on key ip will be kept during a merge
HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts file if specified.
HostAlias holds the mapping between IP and hostnames that will be injected as an entry in the pod's hosts file.
hostAliases.ip (string), required
IP address of the host file entry.
hostAliases.hostnames ([]string)
Atomic: will be replaced during a merge
Hostnames for the above IP address.
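A minimal sketch showing hostAliases entries on a Pod; the IP addresses, hostnames, and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod                # hypothetical Pod
spec:
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  - ip: "10.1.2.3"                     # illustrative address
    hostnames:
    - "db.remote"
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image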
dnsConfig (PodDNSConfig)
Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy.
PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
dnsConfig.nameservers ([]string)
Atomic: will be replaced during a merge
A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
dnsConfig.options ([]PodDNSConfigOption)
Atomic: will be replaced during a merge
A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
PodDNSConfigOption defines DNS resolver options of a pod.
dnsConfig.options.name (string)
Name is this DNS resolver option's name. Required.
dnsConfig.options.value (string)
Value is this DNS resolver option's value.
dnsConfig.searches ([]string)
Atomic: will be replaced during a merge
A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
dnsPolicy (string)
Set DNS policy for the pod. Defaults to "ClusterFirst". Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to 'ClusterFirstWithHostNet'.
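A minimal sketch of a Pod that sets dnsPolicy to None and supplies its own dnsConfig; the resolver address, search domain, and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns                     # hypothetical Pod
spec:
  dnsPolicy: "None"                    # ignore cluster DNS settings; use dnsConfig only
  dnsConfig:
    nameservers:
    - 192.0.2.1                        # illustrative resolver address
    searches:
    - ns1.svc.cluster-domain.example
    options:
    - name: ndots
      value: "2"
    - name: edns0                      # resolver option without a value
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image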
Hosts namespaces
hostNetwork (boolean)
Host networking requested for this pod. Use the host's network namespace. If this option is set, the ports that will be used must be specified. Default to false.
hostPID (boolean)
Use the host's pid namespace. Optional: Default to false.
hostIPC (boolean)
Use the host's ipc namespace. Optional: Default to false.
shareProcessNamespace (boolean)
Share a single process namespace between all of the containers in a pod. When this is set containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Default to false.
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
Security context
securityContext (PodSecurityContext)
SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field.
PodSecurityContext holds pod-level security attributes and common container settings. Some fields are also present in container.securityContext. Field values of container.securityContext take precedence over field values of PodSecurityContext.
securityContext.appArmorProfile (AppArmorProfile)
appArmorProfile is the AppArmor options to use by the containers in this pod. Note that this field cannot be set when spec.os.name is windows.
AppArmorProfile defines a pod or container's AppArmor settings.
type indicates which kind of AppArmor profile will be applied. Valid options are:
Localhost - a profile pre-loaded on the node.
RuntimeDefault - the container runtime's default profile.
Unconfined - no AppArmor enforcement.
localhostProfile indicates a profile loaded on the node that should be used. The profile must be preconfigured on the node to work. Must match the loaded name of the profile. Must be set if and only if type is "Localhost".
securityContext.fsGroup (int64)
A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:
1. The owning GID will be the FSGroup
2. The setgid bit is set (new files created in the volume will be owned by FSGroup)
3. The permission bits are OR'd with rw-rw----
If unset, the Kubelet will not modify the ownership and permissions of any volume. Note that this field cannot be set when spec.os.name is windows.
securityContext.fsGroupChangePolicy (string)
fsGroupChangePolicy defines behavior of changing ownership and permission of the volume before being exposed inside Pod. This field will only apply to volume types which support fsGroup based ownership(and permissions). It will have no effect on ephemeral volume types such as: secret, configmaps and emptydir. Valid values are "OnRootMismatch" and "Always". If not specified, "Always" is used. Note that this field cannot be set when spec.os.name is windows.
securityContext.runAsUser (int64)
The UID to run the entrypoint of the container process. Defaults to user specified in image metadata if unspecified. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence for that container. Note that this field cannot be set when spec.os.name is windows.
securityContext.runAsNonRoot (boolean)
Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
securityContext.runAsGroup (int64)
The GID to run the entrypoint of the container process. Uses runtime default if unset. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence for that container. Note that this field cannot be set when spec.os.name is windows.
securityContext.seccompProfile (SeccompProfile)
The seccomp options to use by the containers in this pod. Note that this field cannot be set when spec.os.name is windows.
SeccompProfile defines a pod/container's seccomp profile settings. Only one profile source may be set.
type indicates which kind of seccomp profile will be applied. Valid options are:
Localhost - a profile defined in a file on the node should be used.
RuntimeDefault - the container runtime default profile should be used.
Unconfined - no profile should be applied.
localhostProfile indicates a profile defined in a file on the node should be used. The profile must be preconfigured on the node to work. Must be a descending path, relative to the kubelet's configured seccomp profile location. Must be set if type is "Localhost". Must NOT be set for any other type.
securityContext.seLinuxChangePolicy (string)
seLinuxChangePolicy defines how the container's SELinux label is applied to all volumes used by the Pod. It has no effect on nodes that do not support SELinux, or on volumes that do not support SELinux. Valid values are "MountOption" and "Recursive".
"Recursive" means relabeling of all files on all Pod volumes by the container runtime. This may be slow for large volumes, but allows mixing privileged and unprivileged Pods sharing the same volume on the same node.
"MountOption" mounts all eligible Pod volumes with -o context mount option. This requires all Pods that share the same volume to use the same SELinux label. It is not possible to share the same volume among privileged and unprivileged Pods. Eligible volumes are in-tree FibreChannel and iSCSI volumes, and all CSI volumes whose CSI driver announces SELinux support by setting spec.seLinuxMount: true in their CSIDriver instance. Other volumes are always re-labelled recursively. "MountOption" value is allowed only when SELinuxMount feature gate is enabled.
If not specified and SELinuxMount feature gate is enabled, "MountOption" is used. If not specified and SELinuxMount feature gate is disabled, "MountOption" is used for ReadWriteOncePod volumes and "Recursive" for all other volumes.
This field affects only Pods that have SELinux label set, either in PodSecurityContext or in SecurityContext of all containers.
All Pods that use the same volume should use the same seLinuxChangePolicy, otherwise some pods can get stuck in ContainerCreating state. Note that this field cannot be set when spec.os.name is windows.
securityContext.seLinuxOptions (SELinuxOptions)
The SELinux context to be applied to all containers. If unspecified, the container runtime will allocate a random SELinux context for each container. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence for that container. Note that this field cannot be set when spec.os.name is windows.
SELinuxOptions are the labels to be applied to the container
securityContext.seLinuxOptions.level (string)
Level is SELinux level label that applies to the container.
securityContext.seLinuxOptions.role (string)
Role is a SELinux role label that applies to the container.
securityContext.seLinuxOptions.type (string)
Type is a SELinux type label that applies to the container.
securityContext.seLinuxOptions.user (string)
User is a SELinux user label that applies to the container.
securityContext.supplementalGroups ([]int64)
Atomic: will be replaced during a merge
A list of groups applied to the first process run in each container, in addition to the container's primary GID and fsGroup (if specified). If the SupplementalGroupsPolicy feature is enabled, the supplementalGroupsPolicy field determines whether these are in addition to or instead of any group memberships defined in the container image. If unspecified, no additional groups are added, though group memberships defined in the container image may still be used, depending on the supplementalGroupsPolicy field. Note that this field cannot be set when spec.os.name is windows.
securityContext.supplementalGroupsPolicy (string)
Defines how supplemental groups of the first container processes are calculated. Valid values are "Merge" and "Strict". If not specified, "Merge" is used. (Alpha) Using the field requires the SupplementalGroupsPolicy feature gate to be enabled and the container runtime must implement support for this feature. Note that this field cannot be set when spec.os.name is windows.
securityContext.sysctls ([]Sysctl)
Atomic: will be replaced during a merge
Sysctls hold a list of namespaced sysctls used for the pod. Pods with unsupported sysctls (by the container runtime) might fail to launch. Note that this field cannot be set when spec.os.name is windows.
The Windows specific settings applied to all containers. If unspecified, the options within a container's SecurityContext will be used. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is linux.
WindowsSecurityContextOptions contain Windows-specific options and credentials.
GMSACredentialSpec is where the GMSA admission webhook (https://github.com/kubernetes-sigs/windows-gmsa) inlines the contents of the GMSA credential spec named by the GMSACredentialSpecName field.
HostProcess determines if a container should be run as a 'Host Process' container. All of a Pod's containers must have the same effective HostProcess value (it is not allowed to have a mix of HostProcess containers and non-HostProcess containers). In addition, if HostProcess is true then HostNetwork must also be set to true.
The UserName in Windows to run the entrypoint of the container process. Defaults to the user specified in image metadata if unspecified. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
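A minimal sketch of a pod-level securityContext that combines several of the fields described above (runAsNonRoot, runAsUser, runAsGroup, fsGroup, fsGroupChangePolicy, seccompProfile, appArmorProfile); the numeric IDs, Pod name, and image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod                   # hypothetical Pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"
    seccompProfile:
      type: RuntimeDefault
    appArmorProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image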
Alpha level
hostUsers (boolean)
Use the host's user namespace. Optional: Default to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
resources (ResourceRequirements)
Resources is the total amount of CPU and Memory resources required by all containers in the pod. It supports specifying Requests and Limits for "cpu" and "memory" resource names only. ResourceClaims are not supported.
This field enables fine-grained control over resource allocation for the entire pod, allowing resource sharing among containers in a pod.
This is an alpha field and requires enabling the PodLevelResources feature gate.
ResourceRequirements describes the compute resource requirements.
resources.claims ([]ResourceClaim)
Map: unique values on key name will be kept during a merge
Claims lists the names of resources, defined in spec.resourceClaims, that are used by this container.
This is an alpha field and requires enabling the DynamicResourceAllocation feature gate.
This field is immutable. It can only be set for containers.
ResourceClaim references one entry in PodSpec.ResourceClaims.
resources.claims.name (string), required
Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container.
resources.claims.request (string)
Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request.
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
resourceClaims ([]PodResourceClaim)
Patch strategies: retainKeys, merge on key name
Map: unique values on key name will be kept during a merge
ResourceClaims defines which ResourceClaims must be allocated and reserved before the Pod is allowed to start. The resources will be made available to those containers which consume them by name.
This is an alpha field and requires enabling the DynamicResourceAllocation feature gate.
This field is immutable.
PodResourceClaim references exactly one ResourceClaim, either directly or by naming a ResourceClaimTemplate which is then turned into a ResourceClaim for the pod.
It adds a name to it that uniquely identifies the ResourceClaim inside the Pod. Containers that need access to the ResourceClaim reference it with this name.
resourceClaims.name (string), required
Name uniquely identifies this resource claim inside the pod. This must be a DNS_LABEL.
resourceClaims.resourceClaimName (string)
ResourceClaimName is the name of a ResourceClaim object in the same namespace as this pod.
Exactly one of ResourceClaimName and ResourceClaimTemplateName must be set.
resourceClaims.resourceClaimTemplateName (string)
ResourceClaimTemplateName is the name of a ResourceClaimTemplate object in the same namespace as this pod.
The template will be used to create a new ResourceClaim, which will be bound to this pod. When this pod is deleted, the ResourceClaim will also be deleted. The pod name and resource name, along with a generated component, will be used to form a unique name for the ResourceClaim, which will be recorded in pod.status.resourceClaimStatuses.
This field is immutable and no changes will be made to the corresponding ResourceClaim by the control plane after creating the ResourceClaim.
Exactly one of ResourceClaimName and ResourceClaimTemplateName must be set.
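A minimal sketch of a Pod wiring a ResourceClaimTemplate into a container through these fields; the claim name, template name, and image are hypothetical, and the DynamicResourceAllocation feature gate must be enabled as noted above.
apiVersion: v1
kind: Pod
metadata:
  name: claim-consumer                 # hypothetical Pod
spec:
  resourceClaims:
  - name: gpu                          # referenced by name from the container below
    resourceClaimTemplateName: gpu-claim-template   # hypothetical ResourceClaimTemplate
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image
    resources:
      claims:
      - name: gpu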
schedulingGates ([]PodSchedulingGate)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
SchedulingGates is an opaque list of values that if specified will block scheduling the pod. If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the scheduler will not attempt to schedule the pod.
SchedulingGates can only be set at pod creation time, and be removed only afterwards.
PodSchedulingGate is associated to a Pod to guard its scheduling.
schedulingGates.name (string), required
Name of the scheduling gate. Each scheduling gate must have a unique name field.
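A minimal sketch of a gated Pod; the gate name and image are hypothetical. The Pod stays in the SchedulingGated state until a controller removes the gate entry.
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                      # hypothetical Pod
spec:
  schedulingGates:
  - name: example.com/quota-check      # hypothetical gate name
  containers:
  - name: app
    image: registry.example/app:latest # placeholder image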
Deprecated
serviceAccount (string)
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead.
Container
A single application container that you want to run within a pod.
name (string), required
Name of the container specified as a DNS_LABEL. Each container in a pod must have a unique name (DNS_LABEL). Cannot be updated.
Image
image (string)
Container image name. More info: https://kubernetes.io/docs/concepts/containers/images This field is optional to allow higher level config management to default or override container images in workload controllers like Deployments and StatefulSets.
Entrypoint array. Not executed within a shell. The container image's ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. Double $$ are reduced to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped references will never be expanded, regardless of whether the variable exists or not. Cannot be updated. More info: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell
args ([]string)
Atomic: will be replaced during a merge
Arguments to the entrypoint. The container image's CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. Double $$ are reduced to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped references will never be expanded, regardless of whether the variable exists or not. Cannot be updated. More info: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell
workingDir (string)
Container's working directory. If not specified, the container runtime's default will be used, which might be configured in the container image. Cannot be updated.
Ports
ports ([]ContainerPort)
Patch strategy: merge on key containerPort
Map: unique values on keys containerPort, protocol will be kept during a merge
List of ports to expose from the container. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Modifying this array with strategic merge patch may corrupt the data. For more information See https://github.com/kubernetes/kubernetes/issues/108255. Cannot be updated.
ContainerPort represents a network port in a single container.
ports.containerPort (int32), required
Number of port to expose on the pod's IP address. This must be a valid port number, 0 < x < 65536.
ports.hostIP (string)
What host IP to bind the external port to.
ports.hostPort (int32)
Number of port to expose on the host. If specified, this must be a valid port number, 0 < x < 65536. If HostNetwork is specified, this must match ContainerPort. Most containers do not need this.
ports.name (string)
If specified, this must be an IANA_SVC_NAME and unique within the pod. Each named port in a pod must have a unique name. Name for the port that can be referred to by services.
ports.protocol (string)
Protocol for port. Must be UDP, TCP, or SCTP. Defaults to "TCP".
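A short sketch of the ports list; the container name, image, and port number are hypothetical:

```yaml
containers:
- name: web
  image: registry.example/web:latest
  ports:
  - name: http              # optional IANA_SVC_NAME; a Service targetPort can refer to it by name
    containerPort: 8080
    protocol: TCP           # default
```

As noted above, the list is primarily informational: omitting a port here does not prevent it from being exposed.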
Environment variables
env ([]EnvVar)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
List of environment variables to set in the container. Cannot be updated.
EnvVar represents an environment variable present in a Container.
env.name (string), required
Name of the environment variable. Must be a C_IDENTIFIER.
env.value (string)
Variable references $(VAR_NAME) are expanded using the previously defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. Double $$ are reduced to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".
env.valueFrom (EnvVarSource)
Source for the environment variable's value. Cannot be used if value is not empty.
EnvVarSource represents a source for the value of an EnvVar.
env.valueFrom.fieldRef (ObjectFieldSelector)
Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels['<KEY>'], metadata.annotations['<KEY>'], spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.
env.valueFrom.resourceFieldRef (ResourceFieldSelector)
Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.
env.valueFrom.secretKeyRef (SecretKeySelector)
Selects a key of a secret in the pod's namespace
SecretKeySelector selects a key of a Secret.
env.valueFrom.secretKeyRef.key (string), required
The key of the secret to select from. Must be a valid secret key.
env.valueFrom.secretKeyRef.optional (boolean)
Specify whether the Secret or its key must be defined.
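A sketch combining a literal value with the valueFrom sources described above; the Secret name `db-credentials` and key `password` are hypothetical:

```yaml
containers:
- name: app
  image: registry.example/app:latest
  env:
  - name: LOG_LEVEL
    value: info                        # plain literal value
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name       # a supported pod field
  - name: CPU_LIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.cpu           # a supported container resource
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials           # hypothetical Secret in the pod's namespace
        key: password
        optional: false                # fail if the Secret or key is missing
```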
envFrom ([]EnvFromSource)
Atomic: will be replaced during a merge
List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source will take precedence. Values defined by an Env with a duplicate key will take precedence. Cannot be updated.
EnvFromSource represents the source of a set of ConfigMaps or Secrets
envFrom.configMapRef (ConfigMapEnvSource)
The ConfigMap to select from
ConfigMapEnvSource selects a ConfigMap to populate the environment variables with.
The contents of the target ConfigMap's Data field will represent the key-value pairs as environment variables.
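A sketch of envFrom, assuming a ConfigMap named `app-config` and a Secret named `db-credentials` exist in the pod's namespace (both names are hypothetical):

```yaml
containers:
- name: app
  image: registry.example/app:latest
  envFrom:
  - configMapRef:
      name: app-config          # every key in .data becomes an environment variable
  - prefix: DB_                 # keys from this source are exported with the DB_ prefix
    secretRef:
      name: db-credentials
```

If the same key appears in several sources, the last source wins; an explicit env entry with the same name wins over all of them.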
volumeMounts ([]VolumeMount)
Patch strategy: merge on key mountPath
Map: unique values on key mountPath will be kept during a merge
Pod volumes to mount into the container's filesystem. Cannot be updated.
VolumeMount describes a mounting of a Volume within a container.
volumeMounts.mountPath (string), required
Path within the container at which the volume should be mounted. Must not contain ':'.
volumeMounts.name (string), required
This must match the Name of a Volume.
volumeMounts.mountPropagation (string)
mountPropagation determines how mounts are propagated from the host to container and the other way around. When not set, MountPropagationNone is used. This field is beta in 1.10. When RecursiveReadOnly is set to IfPossible or to Enabled, MountPropagation must be None or unspecified (which defaults to None).
volumeMounts.readOnly (boolean)
Mounted read-only if true, read-write otherwise (false or unspecified). Defaults to false.
volumeMounts.recursiveReadOnly (string)
RecursiveReadOnly specifies whether read-only mounts should be handled recursively.
If ReadOnly is false, this field has no meaning and must be unspecified.
If ReadOnly is true, and this field is set to Disabled, the mount is not made recursively read-only. If this field is set to IfPossible, the mount is made recursively read-only if it is supported by the container runtime. If this field is set to Enabled, the mount is made recursively read-only if it is supported by the container runtime; otherwise the pod will not be started and an error will be generated to indicate the reason.
If this field is set to IfPossible or Enabled, MountPropagation must be set to None (or be unspecified, which defaults to None).
If this field is not specified, it is treated as an equivalent of Disabled.
volumeMounts.subPath (string)
Path within the volume from which the container's volume should be mounted. Defaults to "" (volume's root).
volumeMounts.subPathExpr (string)
Expanded path within the volume from which the container's volume should be mounted. Behaves similarly to SubPath but environment variable references $(VAR_NAME) are expanded using the container's environment. Defaults to "" (volume's root). SubPathExpr and SubPath are mutually exclusive.
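A sketch of a volume mount; the ConfigMap name, mount path, and subPath key are hypothetical:

```yaml
spec:
  volumes:
  - name: config                     # must match volumeMounts[].name below
    configMap:
      name: app-config               # hypothetical ConfigMap
  containers:
  - name: app
    image: registry.example/app:latest
    volumeMounts:
    - name: config
      mountPath: /etc/app/app.yaml   # must not contain ':'
      subPath: app.yaml              # mount a single key rather than the whole volume
      readOnly: true
```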
volumeDevices ([]VolumeDevice)
Patch strategy: merge on key devicePath
Map: unique values on key devicePath will be kept during a merge
volumeDevices is the list of block devices to be used by the container.
volumeDevice describes a mapping of a raw block device within a container.
volumeDevices.devicePath (string), required
devicePath is the path inside of the container that the device will be mapped to.
volumeDevices.name (string), required
name must match the name of a persistentVolumeClaim in the pod
Resources
resources (ResourceRequirements)
Compute resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
ResourceRequirements describes the compute resource requirements.
resources.claims ([]ResourceClaim)
Map: unique values on key name will be kept during a merge
Claims lists the names of resources, defined in spec.resourceClaims, that are used by this container.
This is an alpha field and requires enabling the DynamicResourceAllocation feature gate.
This field is immutable. It can only be set for containers.
ResourceClaim references one entry in PodSpec.ResourceClaims.
resources.claims.name (string), required
Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container.
resources.claims.request (string)
Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request.
resources.requests (map[string]Quantity)
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
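A sketch of requests and limits on a container; the values are arbitrary examples:

```yaml
containers:
- name: app
  image: registry.example/app:latest
  resources:
    requests:
      cpu: 250m         # the scheduler reserves this much for the container
      memory: 256Mi
    limits:
      cpu: 500m         # requests may not exceed limits
      memory: 256Mi
```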
resizePolicy ([]ContainerResizePolicy)
Atomic: will be replaced during a merge
Resources resize policy for the container.
ContainerResizePolicy represents resource resize policy for the container.
resizePolicy.resourceName (string), required
Name of the resource to which this resource resize policy applies. Supported values: cpu, memory.
resizePolicy.restartPolicy (string), required
Restart policy to apply when specified resource is resized. If not specified, it defaults to NotRequired.
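A sketch of resize policies, assuming the cluster supports in-place pod resize (the InPlacePodVerticalScaling feature); values are illustrative:

```yaml
containers:
- name: app
  image: registry.example/app:latest
  resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired       # default: resize CPU in place without restarting
  - resourceName: memory
    restartPolicy: RestartContainer  # restart the container when its memory allocation changes
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 512Mi
```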
Lifecycle
lifecycle (Lifecycle)
Actions that the management system should take in response to container lifecycle events. Cannot be updated.
Lifecycle describes actions that the management system should take in response to container lifecycle events. For the PostStart and PreStop lifecycle handlers, management of the container blocks until the action is complete, unless the container process fails, in which case the handler is aborted.
lifecycle.preStop (LifecycleHandler)
PreStop is called immediately before a container is terminated due to an API request or management event such as liveness/startup probe failure, preemption, resource contention, etc. The handler is not called if the container crashes or exits. The Pod's termination grace period countdown begins before the PreStop hook is executed. Regardless of the outcome of the handler, the container will eventually terminate within the Pod's termination grace period (unless delayed by finalizers). Other management of the container blocks until the hook completes or until the termination grace period is reached. More info: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
lifecycle.stopSignal (string)
StopSignal defines which signal will be sent to a container when it is being stopped. If not specified, the default is defined by the container runtime in use. StopSignal can only be set for Pods with a non-empty .spec.os.name
terminationMessagePath (string)
Optional: Path at which the file to which the container's termination message will be written is mounted into the container's filesystem. Message written is intended to be brief final status, such as an assertion failure message. Will be truncated by the node if greater than 4096 bytes. The total message length across all containers will be limited to 12kb. Defaults to /dev/termination-log. Cannot be updated.
terminationMessagePolicy (string)
Indicate how the termination message should be populated. File will use the contents of terminationMessagePath to populate the container status message on both success and failure. FallbackToLogsOnError will use the last chunk of container log output if the termination message file is empty and the container exited with an error. The log output is limited to 2048 bytes or 80 lines, whichever is smaller. Defaults to File. Cannot be updated.
startupProbe (Probe)
StartupProbe indicates that the Pod has successfully initialized. If specified, no other probes are executed until this completes successfully. If this probe fails, the Pod will be restarted, just as if the livenessProbe failed. This can be used to provide different probe parameters at the beginning of a Pod's lifecycle, when it might take a long time to load data or warm a cache, than during steady-state operation. This cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
restartPolicy (string)
RestartPolicy defines the restart behavior of individual containers in a pod. This field may only be set for init containers, and the only allowed value is "Always". For non-init containers or when this field is not specified, the restart behavior is defined by the Pod's restart policy and the container type. Setting the RestartPolicy as "Always" for the init container will have the following effect: this init container will be continually restarted on exit until all regular containers have terminated. Once all regular containers have completed, all init containers with restartPolicy "Always" will be shut down. This lifecycle differs from normal init containers and is often referred to as a "sidecar" container. Although this init container still starts in the init container sequence, it does not wait for the container to complete before proceeding to the next init container. Instead, the next init container starts immediately after this init container is started, or after any startupProbe has successfully completed.
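A sketch of the sidecar pattern described above; the container names and images are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-example
spec:
  initContainers:
  - name: log-shipper
    image: registry.example/log-shipper:latest
    restartPolicy: Always        # the only allowed value; keeps this init container running as a sidecar
  containers:
  - name: app
    image: registry.example/app:latest
```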
Security context
securityContext (SecurityContext)
SecurityContext defines the security options the container should be run with. If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext. More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
SecurityContext holds security configuration that will be applied to a container. Some fields are present in both SecurityContext and PodSecurityContext. When both are set, the values in SecurityContext take precedence.
securityContext.allowPrivilegeEscalation (boolean)
AllowPrivilegeEscalation controls whether a process can gain more privileges than its parent process. This bool directly controls if the no_new_privs flag will be set on the container process. AllowPrivilegeEscalation is always true when the container is: 1) run as Privileged, or 2) has CAP_SYS_ADMIN. Note that this field cannot be set when spec.os.name is windows.
securityContext.appArmorProfile (AppArmorProfile)
appArmorProfile is the AppArmor options to use by this container. If set, this profile overrides the pod's appArmorProfile. Note that this field cannot be set when spec.os.name is windows.
AppArmorProfile defines a pod or container's AppArmor settings.
type indicates which kind of AppArmor profile will be applied. Valid options are:
Localhost - a profile pre-loaded on the node.
RuntimeDefault - the container runtime's default profile.
Unconfined - no AppArmor enforcement.
localhostProfile indicates a profile loaded on the node that should be used. The profile must be preconfigured on the node to work. Must match the loaded name of the profile. Must be set if and only if type is "Localhost".
securityContext.capabilities (Capabilities)
The capabilities to add/drop when running containers. Defaults to the default set of capabilities granted by the container runtime. Note that this field cannot be set when spec.os.name is windows.
Adds and removes POSIX capabilities from running containers.
securityContext.capabilities.add ([]string)
Atomic: will be replaced during a merge
Added capabilities
securityContext.capabilities.drop ([]string)
Atomic: will be replaced during a merge
Removed capabilities
securityContext.procMount (string)
procMount denotes the type of proc mount to use for the containers. The default value is Default which uses the container runtime defaults for readonly paths and masked paths. This requires the ProcMountType feature flag to be enabled. Note that this field cannot be set when spec.os.name is windows.
securityContext.privileged (boolean)
Run container in privileged mode. Processes in privileged containers are essentially equivalent to root on the host. Defaults to false. Note that this field cannot be set when spec.os.name is windows.
securityContext.readOnlyRootFilesystem (boolean)
Whether this container has a read-only root filesystem. Default is false. Note that this field cannot be set when spec.os.name is windows.
securityContext.runAsUser (int64)
The UID to run the entrypoint of the container process. Defaults to user specified in image metadata if unspecified. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
securityContext.runAsNonRoot (boolean)
Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
securityContext.runAsGroup (int64)
The GID to run the entrypoint of the container process. Uses runtime default if unset. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
securityContext.seLinuxOptions (SELinuxOptions)
The SELinux context to be applied to the container. If unspecified, the container runtime will allocate a random SELinux context for each container. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
SELinuxOptions are the labels to be applied to the container
securityContext.seLinuxOptions.level (string)
Level is SELinux level label that applies to the container.
securityContext.seLinuxOptions.role (string)
Role is a SELinux role label that applies to the container.
securityContext.seLinuxOptions.type (string)
Type is a SELinux type label that applies to the container.
securityContext.seLinuxOptions.user (string)
User is a SELinux user label that applies to the container.
securityContext.seccompProfile (SeccompProfile)
The seccomp options to use by this container. If seccomp options are provided at both the pod & container level, the container options override the pod options. Note that this field cannot be set when spec.os.name is windows.
SeccompProfile defines a pod/container's seccomp profile settings. Only one profile source may be set.
type indicates which kind of seccomp profile will be applied. Valid options are:
Localhost - a profile defined in a file on the node should be used. RuntimeDefault - the container runtime default profile should be used. Unconfined - no profile should be applied.
localhostProfile indicates a profile defined in a file on the node should be used. The profile must be preconfigured on the node to work. Must be a descending path, relative to the kubelet's configured seccomp profile location. Must be set if type is "Localhost". Must NOT be set for any other type.
securityContext.windowsOptions (WindowsSecurityContextOptions)
The Windows-specific settings applied to all containers. If unspecified, the options from the PodSecurityContext will be used. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is linux.
WindowsSecurityContextOptions contain Windows-specific options and credentials.
GMSACredentialSpec is where the GMSA admission webhook (https://github.com/kubernetes-sigs/windows-gmsa) inlines the contents of the GMSA credential spec named by the GMSACredentialSpecName field.
HostProcess determines if a container should be run as a 'Host Process' container. All of a Pod's containers must have the same effective HostProcess value (it is not allowed to have a mix of HostProcess containers and non-HostProcess containers). In addition, if HostProcess is true then HostNetwork must also be set to true.
The UserName in Windows to run the entrypoint of the container process. Defaults to the user specified in image metadata if unspecified. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
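A sketch of a restrictive container securityContext combining several of the fields above; the numeric IDs are arbitrary:

```yaml
containers:
- name: app
  image: registry.example/app:latest
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["ALL"]              # drop every capability granted by the runtime default
    seccompProfile:
      type: RuntimeDefault
```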
Debugging
stdin (boolean)
Whether this container should allocate a buffer for stdin in the container runtime. If this is not set, reads from stdin in the container will always result in EOF. Default is false.
stdinOnce (boolean)
Whether the container runtime should close the stdin channel after it has been opened by a single attach. When stdin is true the stdin stream will remain open across multiple attach sessions. If stdinOnce is set to true, stdin is opened on container start, is empty until the first client attaches to stdin, and then remains open and accepts data until the client disconnects, at which time stdin is closed and remains closed until the container is restarted. If this flag is false, a container process that reads from stdin will never receive an EOF. Default is false.
tty (boolean)
Whether this container should allocate a TTY for itself, also requires 'stdin' to be true. Default is false.
EphemeralContainer
An EphemeralContainer is a temporary container that you may add to an existing Pod for user-initiated activities such as debugging. Ephemeral containers have no resource or scheduling guarantees, and they will not be restarted when they exit or when a Pod is removed or restarted. The kubelet may evict a Pod if an ephemeral container causes the Pod to exceed its resource allocation.
To add an ephemeral container, use the ephemeralcontainers subresource of an existing Pod. Ephemeral containers may not be removed or restarted.
name (string), required
Name of the ephemeral container specified as a DNS_LABEL. This name must be unique among all containers, init containers and ephemeral containers.
targetContainerName (string)
If set, the name of the container from PodSpec that this ephemeral container targets. The ephemeral container will be run in the namespaces (IPC, PID, etc) of this container. If not set then the ephemeral container uses the namespaces configured in the Pod spec.
The container runtime must implement support for this feature. If the runtime does not support namespace targeting then the result of setting this field is undefined.
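As an illustrative sketch of the shape such an entry takes once it has been added through the pod's ephemeralcontainers subresource (for example by kubectl debug); the names are hypothetical:

```yaml
ephemeralContainers:
- name: debugger
  image: busybox:1.36
  command: ["sh"]
  stdin: true
  tty: true
  targetContainerName: app   # hypothetical; run in this container's namespaces if the runtime supports targeting
```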
command ([]string)
Atomic: will be replaced during a merge
Entrypoint array. Not executed within a shell. The image's ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. Double $$ are reduced to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped references will never be expanded, regardless of whether the variable exists or not. Cannot be updated. More info: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell
args ([]string)
Atomic: will be replaced during a merge
Arguments to the entrypoint. The image's CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. Double $$ are reduced to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped references will never be expanded, regardless of whether the variable exists or not. Cannot be updated. More info: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell
workingDir (string)
Container's working directory. If not specified, the container runtime's default will be used, which might be configured in the container image. Cannot be updated.
Environment variables
env ([]EnvVar)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
List of environment variables to set in the container. Cannot be updated.
EnvVar represents an environment variable present in a Container.
env.name (string), required
Name of the environment variable. Must be a C_IDENTIFIER.
env.value (string)
Variable references $(VAR_NAME) are expanded using the previously defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. Double $$ are reduced to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e. "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)". Escaped references will never be expanded, regardless of whether the variable exists or not. Defaults to "".
env.valueFrom (EnvVarSource)
Source for the environment variable's value. Cannot be used if value is not empty.
EnvVarSource represents a source for the value of an EnvVar.
env.valueFrom.fieldRef (ObjectFieldSelector)
Selects a field of the pod: supports metadata.name, metadata.namespace, metadata.labels['<KEY>'], metadata.annotations['<KEY>'], spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.
env.valueFrom.resourceFieldRef (ResourceFieldSelector)
Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.
env.valueFrom.secretKeyRef (SecretKeySelector)
Selects a key of a secret in the pod's namespace
SecretKeySelector selects a key of a Secret.
env.valueFrom.secretKeyRef.key (string), required
The key of the secret to select from. Must be a valid secret key.
env.valueFrom.secretKeyRef.optional (boolean)
Specify whether the Secret or its key must be defined.
envFrom ([]EnvFromSource)
Atomic: will be replaced during a merge
List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source will take precedence. Values defined by an Env with a duplicate key will take precedence. Cannot be updated.
EnvFromSource represents the source of a set of ConfigMaps or Secrets
envFrom.configMapRef (ConfigMapEnvSource)
The ConfigMap to select from
ConfigMapEnvSource selects a ConfigMap to populate the environment variables with.
The contents of the target ConfigMap's Data field will represent the key-value pairs as environment variables.
volumeMounts ([]VolumeMount)
Patch strategy: merge on key mountPath
Map: unique values on key mountPath will be kept during a merge
Pod volumes to mount into the container's filesystem. Subpath mounts are not allowed for ephemeral containers. Cannot be updated.
VolumeMount describes a mounting of a Volume within a container.
volumeMounts.mountPath (string), required
Path within the container at which the volume should be mounted. Must not contain ':'.
volumeMounts.name (string), required
This must match the Name of a Volume.
volumeMounts.mountPropagation (string)
mountPropagation determines how mounts are propagated from the host to container and the other way around. When not set, MountPropagationNone is used. This field is beta in 1.10. When RecursiveReadOnly is set to IfPossible or to Enabled, MountPropagation must be None or unspecified (which defaults to None).
volumeMounts.readOnly (boolean)
Mounted read-only if true, read-write otherwise (false or unspecified). Defaults to false.
volumeMounts.recursiveReadOnly (string)
RecursiveReadOnly specifies whether read-only mounts should be handled recursively.
If ReadOnly is false, this field has no meaning and must be unspecified.
If ReadOnly is true, and this field is set to Disabled, the mount is not made recursively read-only. If this field is set to IfPossible, the mount is made recursively read-only if it is supported by the container runtime. If this field is set to Enabled, the mount is made recursively read-only if it is supported by the container runtime; otherwise the pod will not be started and an error will be generated to indicate the reason.
If this field is set to IfPossible or Enabled, MountPropagation must be set to None (or be unspecified, which defaults to None).
If this field is not specified, it is treated as an equivalent of Disabled.
volumeMounts.subPath (string)
Path within the volume from which the container's volume should be mounted. Defaults to "" (volume's root).
volumeMounts.subPathExpr (string)
Expanded path within the volume from which the container's volume should be mounted. Behaves similarly to SubPath but environment variable references $(VAR_NAME) are expanded using the container's environment. Defaults to "" (volume's root). SubPathExpr and SubPath are mutually exclusive.
volumeDevices ([]VolumeDevice)
Patch strategy: merge on key devicePath
Map: unique values on key devicePath will be kept during a merge
volumeDevices is the list of block devices to be used by the container.
volumeDevice describes a mapping of a raw block device within a container.
volumeDevices.devicePath (string), required
devicePath is the path inside of the container that the device will be mapped to.
volumeDevices.name (string), required
name must match the name of a persistentVolumeClaim in the pod
Resources
resizePolicy ([]ContainerResizePolicy)
Atomic: will be replaced during a merge
Resources resize policy for the container.
ContainerResizePolicy represents resource resize policy for the container.
resizePolicy.resourceName (string), required
Name of the resource to which this resource resize policy applies. Supported values: cpu, memory.
resizePolicy.restartPolicy (string), required
Restart policy to apply when specified resource is resized. If not specified, it defaults to NotRequired.
Lifecycle
terminationMessagePath (string)
Optional: Path at which the file to which the container's termination message will be written is mounted into the container's filesystem. Message written is intended to be brief final status, such as an assertion failure message. Will be truncated by the node if greater than 4096 bytes. The total message length across all containers will be limited to 12kb. Defaults to /dev/termination-log. Cannot be updated.
terminationMessagePolicy (string)
Indicate how the termination message should be populated. File will use the contents of terminationMessagePath to populate the container status message on both success and failure. FallbackToLogsOnError will use the last chunk of container log output if the termination message file is empty and the container exited with an error. The log output is limited to 2048 bytes or 80 lines, whichever is smaller. Defaults to File. Cannot be updated.
restartPolicy (string)
Restart policy for the container to manage the restart behavior of each container within a pod. This may only be set for init containers. You cannot set this field on ephemeral containers.
Debugging
stdin (boolean)
Whether this container should allocate a buffer for stdin in the container runtime. If this is not set, reads from stdin in the container will always result in EOF. Default is false.
stdinOnce (boolean)
Whether the container runtime should close the stdin channel after it has been opened by a single attach. When stdin is true the stdin stream will remain open across multiple attach sessions. If stdinOnce is set to true, stdin is opened on container start, is empty until the first client attaches to stdin, and then remains open and accepts data until the client disconnects, at which time stdin is closed and remains closed until the container is restarted. If this flag is false, a container process that reads from stdin will never receive an EOF. Default is false.
tty (boolean)
Whether this container should allocate a TTY for itself, also requires 'stdin' to be true. Default is false.
Security context
securityContext (SecurityContext)
Optional: SecurityContext defines the security options the ephemeral container should be run with. If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext.
SecurityContext holds security configuration that will be applied to a container. Some fields are present in both SecurityContext and PodSecurityContext. When both are set, the values in SecurityContext take precedence.
securityContext.allowPrivilegeEscalation (boolean)
AllowPrivilegeEscalation controls whether a process can gain more privileges than its parent process. This bool directly controls if the no_new_privs flag will be set on the container process. AllowPrivilegeEscalation is always true when the container is: 1) run as Privileged, or 2) has CAP_SYS_ADMIN. Note that this field cannot be set when spec.os.name is windows.
securityContext.appArmorProfile (AppArmorProfile)
appArmorProfile is the AppArmor options to use by this container. If set, this profile overrides the pod's appArmorProfile. Note that this field cannot be set when spec.os.name is windows.
AppArmorProfile defines a pod or container's AppArmor settings.
type indicates which kind of AppArmor profile will be applied. Valid options are:
Localhost - a profile pre-loaded on the node.
RuntimeDefault - the container runtime's default profile.
Unconfined - no AppArmor enforcement.
localhostProfile indicates a profile loaded on the node that should be used. The profile must be preconfigured on the node to work. Must match the loaded name of the profile. Must be set if and only if type is "Localhost".
securityContext.capabilities (Capabilities)
The capabilities to add/drop when running containers. Defaults to the default set of capabilities granted by the container runtime. Note that this field cannot be set when spec.os.name is windows.
Adds and removes POSIX capabilities from running containers.
securityContext.capabilities.add ([]string)
Atomic: will be replaced during a merge
Added capabilities
securityContext.capabilities.drop ([]string)
Atomic: will be replaced during a merge
Removed capabilities
securityContext.procMount (string)
procMount denotes the type of proc mount to use for the containers. The default value is Default which uses the container runtime defaults for readonly paths and masked paths. This requires the ProcMountType feature flag to be enabled. Note that this field cannot be set when spec.os.name is windows.
securityContext.privileged (boolean)
Run container in privileged mode. Processes in privileged containers are essentially equivalent to root on the host. Defaults to false. Note that this field cannot be set when spec.os.name is windows.
securityContext.readOnlyRootFilesystem (boolean)
Whether this container has a read-only root filesystem. Default is false. Note that this field cannot be set when spec.os.name is windows.
securityContext.runAsUser (int64)
The UID to run the entrypoint of the container process. Defaults to user specified in image metadata if unspecified. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
securityContext.runAsNonRoot (boolean)
Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
securityContext.runAsGroup (int64)
The GID to run the entrypoint of the container process. Uses runtime default if unset. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
securityContext.seLinuxOptions (SELinuxOptions)
The SELinux context to be applied to the container. If unspecified, the container runtime will allocate a random SELinux context for each container. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is windows.
SELinuxOptions are the labels to be applied to the container
securityContext.seLinuxOptions.level (string)
Level is SELinux level label that applies to the container.
securityContext.seLinuxOptions.role (string)
Role is a SELinux role label that applies to the container.
securityContext.seLinuxOptions.type (string)
Type is a SELinux type label that applies to the container.
securityContext.seLinuxOptions.user (string)
User is a SELinux user label that applies to the container.
securityContext.seccompProfile (SeccompProfile)
The seccomp options to use by this container. If seccomp options are provided at both the pod & container level, the container options override the pod options. Note that this field cannot be set when spec.os.name is windows.
SeccompProfile defines a pod/container's seccomp profile settings. Only one profile source may be set.
type indicates which kind of seccomp profile will be applied. Valid options are:
Localhost - a profile defined in a file on the node should be used. RuntimeDefault - the container runtime default profile should be used. Unconfined - no profile should be applied.
localhostProfile indicates a profile defined in a file on the node should be used. The profile must be preconfigured on the node to work. Must be a descending path, relative to the kubelet's configured seccomp profile location. Must be set if type is "Localhost". Must NOT be set for any other type.
securityContext.windowsOptions (WindowsSecurityContextOptions)
The Windows-specific settings applied to all containers. If unspecified, the options from the PodSecurityContext will be used. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. Note that this field cannot be set when spec.os.name is linux.
WindowsSecurityContextOptions contain Windows-specific options and credentials.
GMSACredentialSpec is where the GMSA admission webhook (https://github.com/kubernetes-sigs/windows-gmsa) inlines the contents of the GMSA credential spec named by the GMSACredentialSpecName field.
HostProcess determines if a container should be run as a 'Host Process' container. All of a Pod's containers must have the same effective HostProcess value (it is not allowed to have a mix of HostProcess containers and non-HostProcess containers). In addition, if HostProcess is true then HostNetwork must also be set to true.
The UserName in Windows to run the entrypoint of the container process. Defaults to the user specified in image metadata if unspecified. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
Not allowed
ports ([]ContainerPort)
Patch strategy: merge on key containerPort
Map: unique values on keys containerPort, protocol will be kept during a merge
Ports are not allowed for ephemeral containers.
ContainerPort represents a network port in a single container.
ports.containerPort (int32), required
Number of port to expose on the pod's IP address. This must be a valid port number, 0 < x < 65536.
ports.hostIP (string)
What host IP to bind the external port to.
ports.hostPort (int32)
Number of port to expose on the host. If specified, this must be a valid port number, 0 < x < 65536. If HostNetwork is specified, this must match ContainerPort. Most containers do not need this.
ports.name (string)
If specified, this must be an IANA_SVC_NAME and unique within the pod. Each named port in a pod must have a unique name. Name for the port that can be referred to by services.
ports.protocol (string)
Protocol for port. Must be UDP, TCP, or SCTP. Defaults to "TCP".
resources (ResourceRequirements)
Resources are not allowed for ephemeral containers. Ephemeral containers use spare resources already allocated to the pod.
ResourceRequirements describes the compute resource requirements.
resources.claims ([]ResourceClaim)
Map: unique values on key name will be kept during a merge
Claims lists the names of resources, defined in spec.resourceClaims, that are used by this container.
This is an alpha field and requires enabling the DynamicResourceAllocation feature gate.
This field is immutable. It can only be set for containers.
ResourceClaim references one entry in PodSpec.ResourceClaims.
resources.claims.name (string), required
Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container.
resources.claims.request (string)
Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request.
resources.requests (map[string]Quantity)
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
lifecycle (Lifecycle)
Lifecycle is not allowed for ephemeral containers.
Lifecycle describes actions that the management system should take in response to container lifecycle events. For the PostStart and PreStop lifecycle handlers, management of the container blocks until the action is complete, unless the container process fails, in which case the handler is aborted.
lifecycle.preStop (LifecycleHandler)
PreStop is called immediately before a container is terminated due to an API request or management event such as liveness/startup probe failure, preemption, resource contention, etc. The handler is not called if the container crashes or exits. The Pod's termination grace period countdown begins before the PreStop hook is executed. Regardless of the outcome of the handler, the container will eventually terminate within the Pod's termination grace period (unless delayed by finalizers). Other management of the container blocks until the hook completes or until the termination grace period is reached. More info: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
lifecycle.stopSignal (string)
StopSignal defines which signal will be sent to a container when it is being stopped. If not specified, the default is defined by the container runtime in use. StopSignal can only be set for Pods with a non-empty .spec.os.name
LifecycleHandler defines a specific action that should be taken in a lifecycle hook. Exactly one of the fields, other than the deprecated TCPSocket, must be specified.
exec (ExecAction)
Exec specifies a command to execute in the container.
ExecAction describes a "run in container" action.
exec.command ([]string)
Atomic: will be replaced during a merge
Command is the command line to execute inside the container, the working directory for the command is root ('/') in the container's filesystem. The command is simply exec'd, it is not run inside a shell, so traditional shell instructions ('|', etc) won't work. To use a shell, you need to explicitly call out to that shell. Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
httpGet (HTTPGetAction)
HTTPGet specifies an HTTP GET request to perform.
HTTPGetAction describes an action based on HTTP Get requests.
httpGet.port (IntOrString), required
Name or number of the port to access on the container. Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
httpGet.host (string)
Host name to connect to, defaults to the pod IP. You probably want to set "Host" in httpHeaders instead.
httpGet.httpHeaders ([]HTTPHeader)
Atomic: will be replaced during a merge
Custom headers to set in the request. HTTP allows repeated headers.
HTTPHeader describes a custom header to be used in HTTP probes
httpGet.httpHeaders.name (string), required
The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header.
httpGet.httpHeaders.value (string), required
The header field value
httpGet.path (string)
Path to access on the HTTP server.
httpGet.scheme (string)
Scheme to use for connecting to the host. Defaults to HTTP.
sleep (SleepAction)
Sleep represents a duration that the container should sleep.
SleepAction describes a "sleep" action.
sleep.seconds (int64), required
Seconds is the number of seconds to sleep.
tcpSocket (TCPSocketAction)
Deprecated. TCPSocket is NOT supported as a LifecycleHandler and kept for backward compatibility. There is no validation of this field and lifecycle hooks will fail at runtime when it is specified.
TCPSocketAction describes an action based on opening a socket
tcpSocket.port (IntOrString), required
Number or name of the port to access on the container. Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
tcpSocket.host (string)
Optional: Host name to connect to, defaults to the pod IP.
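Lifecycle hooks are rejected on ephemeral containers, but on a regular container the handler shapes described above can be combined as in this sketch (the endpoint path and port are hypothetical, and the sleep action assumes a cluster where it is enabled):

```yaml
containers:
- name: app
  image: registry.example/app:latest
  lifecycle:
    postStart:
      httpGet:
        path: /warmup          # hypothetical endpoint
        port: 8080
        scheme: HTTP
    preStop:
      sleep:
        seconds: 10            # give load balancers time to drain before SIGTERM is sent
```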
NodeAffinity
Node affinity is a group of node affinity scheduling rules.
The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred.
An empty preferred scheduling term matches all objects with implicit weight 0 (i.e. it's a no-op). A null preferred scheduling term matches no objects (i.e. is also a no-op).
A node selector term, associated with the corresponding weight.
A null or empty node selector term matches no objects. The requirements within a term are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node.
A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms.
Required. A list of node selector terms. The terms are ORed.
A null or empty node selector term matches no objects. The requirements within a term are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
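A sketch of both node affinity variants; the zone value is hypothetical, while the label keys are well-known node labels:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:                   # terms are ORed
        - matchExpressions:                  # expressions within a term are ANDed
          - key: kubernetes.io/arch
            operator: In
            values: ["amd64", "arm64"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["zone-a"]               # hypothetical zone
  containers:
  - name: app
    image: registry.example/app:latest
```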
PodAffinity
Pod affinity is a group of inter pod affinity scheduling rules.
The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node has pods which matches the corresponding podAffinityTerm; the node(s) with the highest sum are the most preferred.
The weights of all of the matched WeightedPodAffinityTerm fields are added per-node to find the most preferred node(s)
Required. A pod affinity term, associated with the corresponding weight.
Defines a set of pods (namely those matching the labelSelector relative to the given namespace(s)) that this pod should be co-located (affinity) or not co-located (anti-affinity) with, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which a pod of the set of pods is running
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.
MatchLabelKeys is a set of pod label keys to select which pods will be taken into consideration. The keys are used to lookup values from the incoming pod labels, those key-value labels are merged with labelSelector as key in (value) to select the group of existing pods which pods will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod labels will be ignored. The default value is empty. The same key is forbidden to exist in both matchLabelKeys and labelSelector. Also, matchLabelKeys cannot be set when labelSelector isn't set.
MismatchLabelKeys is a set of pod label keys to select which pods will be taken into consideration. The keys are used to lookup values from the incoming pod labels, those key-value labels are merged with labelSelector as key notin (value) to select the group of existing pods which pods will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod labels will be ignored. The default value is empty. The same key is forbidden to exist in both mismatchLabelKeys and labelSelector. Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means "this pod's namespace". An empty selector ({}) matches all namespaces.
namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means "this pod's namespace".
If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to a pod label update), the system may or may not try to eventually evict the pod from its node. When there are multiple elements, the lists of nodes corresponding to each podAffinityTerm are intersected, i.e. all terms must be satisfied.
Defines a set of pods (namely those matching the labelSelector relative to the given namespace(s)) that this pod should be co-located (affinity) or not co-located (anti-affinity) with, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which a pod of the set of pods is running
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.
MatchLabelKeys is a set of pod label keys to select which pods will be taken into consideration. The keys are used to lookup values from the incoming pod labels, those key-value labels are merged with labelSelector as key in (value) to select the group of existing pods which pods will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod labels will be ignored. The default value is empty. The same key is forbidden to exist in both matchLabelKeys and labelSelector. Also, matchLabelKeys cannot be set when labelSelector isn't set.
MismatchLabelKeys is a set of pod label keys to select which pods will be taken into consideration. The keys are used to lookup values from the incoming pod labels, those key-value labels are merged with labelSelector as key notin (value) to select the group of existing pods which pods will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod labels will be ignored. The default value is empty. The same key is forbidden to exist in both mismatchLabelKeys and labelSelector. Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means "this pod's namespace". An empty selector ({}) matches all namespaces.
namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means "this pod's namespace".
PodAntiAffinity
Pod anti affinity is a group of inter pod anti affinity scheduling rules.
The scheduler will prefer to schedule pods to nodes that satisfy the anti-affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling anti-affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node has pods which matches the corresponding podAffinityTerm; the node(s) with the highest sum are the most preferred.
The weights of all of the matched WeightedPodAffinityTerm fields are added per-node to find the most preferred node(s)
Required. A pod affinity term, associated with the corresponding weight.
Defines a set of pods (namely those matching the labelSelector relative to the given namespace(s)) that this pod should be co-located (affinity) or not co-located (anti-affinity) with, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which a pod of the set of pods is running
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.
MatchLabelKeys is a set of pod label keys to select which pods will be taken into consideration. The keys are used to lookup values from the incoming pod labels, those key-value labels are merged with labelSelector as key in (value) to select the group of existing pods which pods will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod labels will be ignored. The default value is empty. The same key is forbidden to exist in both matchLabelKeys and labelSelector. Also, matchLabelKeys cannot be set when labelSelector isn't set.
MismatchLabelKeys is a set of pod label keys to select which pods will be taken into consideration. The keys are used to lookup values from the incoming pod labels, those key-value labels are merged with labelSelector as key notin (value) to select the group of existing pods which pods will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod labels will be ignored. The default value is empty. The same key is forbidden to exist in both mismatchLabelKeys and labelSelector. Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means "this pod's namespace". An empty selector ({}) matches all namespaces.
namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means "this pod's namespace".
If the anti-affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the anti-affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to a pod label update), the system may or may not try to eventually evict the pod from its node. When there are multiple elements, the lists of nodes corresponding to each podAffinityTerm are intersected, i.e. all terms must be satisfied.
Defines a set of pods (namely those matching the labelSelector relative to the given namespace(s)) that this pod should be co-located (affinity) or not co-located (anti-affinity) with, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which a pod of the set of pods is running
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.
MatchLabelKeys is a set of pod label keys used to select which pods will be taken into consideration. The keys are used to look up values from the incoming pod's labels; those key-value labels are merged with labelSelector as key in (value) to select the group of existing pods that will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod's labels are ignored. The default value is empty. The same key is forbidden to exist in both matchLabelKeys and labelSelector. Also, matchLabelKeys cannot be set when labelSelector isn't set.
MismatchLabelKeys is a set of pod label keys used to select which pods will be taken into consideration. The keys are used to look up values from the incoming pod's labels; those key-value labels are merged with labelSelector as key notin (value) to select the group of existing pods that will be taken into consideration for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming pod's labels are ignored. The default value is empty. The same key is forbidden to exist in both mismatchLabelKeys and labelSelector. Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means "this pod's namespace". An empty selector ({}) matches all namespaces.
namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means "this pod's namespace".
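To make the interaction between labelSelector, topologyKey, and matchLabelKeys concrete, here is a minimal sketch in Go using the k8s.io/api/core/v1 types. The label keys, topology key, and values are illustrative only, and matchLabelKeys assumes a release in which the MatchLabelKeysInPodAffinity feature is available.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Required anti-affinity: never co-locate two app=web pods on the same
	// node (topologyKey kubernetes.io/hostname). matchLabelKeys additionally
	// restricts the selection to pods that share the incoming pod's value of
	// "pod-template-hash", i.e. pods from the same ReplicaSet revision.
	antiAffinity := &corev1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{
			{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"app": "web"},
				},
				TopologyKey:    "kubernetes.io/hostname",
				MatchLabelKeys: []string{"pod-template-hash"},
				// Namespaces and NamespaceSelector are left empty, meaning
				// "this pod's namespace".
			},
		},
	}
	fmt.Println(antiAffinity.RequiredDuringSchedulingIgnoredDuringExecution[0].TopologyKey)
}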
Probe
Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.
exec (ExecAction)
Exec specifies a command to execute in the container.
ExecAction describes a "run in container" action.
exec.command ([]string)
Atomic: will be replaced during a merge
Command is the command line to execute inside the container, the working directory for the command is root ('/') in the container's filesystem. The command is simply exec'd, it is not run inside a shell, so traditional shell instructions ('|', etc) won't work. To use a shell, you need to explicitly call out to that shell. Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
httpGet (HTTPGetAction)
HTTPGet specifies an HTTP GET request to perform.
HTTPGetAction describes an action based on HTTP Get requests.
httpGet.port (IntOrString), required
Name or number of the port to access on the container. Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
httpGet.host (string)
Host name to connect to, defaults to the pod IP. You probably want to set "Host" in httpHeaders instead.
httpGet.httpHeaders ([]HTTPHeader)
Atomic: will be replaced during a merge
Custom headers to set in the request. HTTP allows repeated headers.
HTTPHeader describes a custom header to be used in HTTP probes
httpGet.httpHeaders.name (string), required
The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header.
httpGet.httpHeaders.value (string), required
The header field value
httpGet.path (string)
Path to access on the HTTP server.
httpGet.scheme (string)
Scheme to use for connecting to the host. Defaults to HTTP.
tcpSocket (TCPSocketAction)
TCPSocket specifies a connection to a TCP port.
TCPSocketAction describes an action based on opening a socket
tcpSocket.port (IntOrString), required
Number or name of the port to access on the container. Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
tcpSocket.host (string)
Optional: Host name to connect to, defaults to the pod IP.
terminationGracePeriodSeconds (int64)
Optional duration in seconds the pod needs to terminate gracefully upon probe failure. The grace period is the duration in seconds between the time the processes running in the pod are sent a termination signal and the time the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. If this value is nil, the pod's terminationGracePeriodSeconds will be used. Otherwise, this value overrides the value provided by the pod spec. The value must be a non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). This is a beta field and requires enabling the ProbeTerminationGracePeriod feature gate. Minimum value is 1. spec.terminationGracePeriodSeconds is used if unset.
periodSeconds (int32)
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
failureThreshold (int32)
Minimum consecutive failures for the probe to be considered failed after having succeeded. Defaults to 3. Minimum value is 1.
successThreshold (int32)
Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup. Minimum value is 1.
grpc (GRPCAction)
GRPC specifies a GRPC HealthCheckRequest.
GRPCAction specifies an action involving a GRPC service.
grpc.port (int32), required
Port number of the gRPC service. Number must be in the range 1 to 65535.
grpc.service (string)
Service is the name of the service to place in the gRPC HealthCheckRequest. If this is not specified, the default behavior is defined by gRPC.
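As a sketch of how the probe handlers and thresholds fit together, the following Go snippet builds a readiness and a liveness probe with the k8s.io/api/core/v1 types. The path, port name, and command are placeholders, and the snippet assumes a client library version in which Probe embeds ProbeHandler.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Readiness probe: HTTP GET against a named container port.
	readiness := corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path:   "/healthz",
				Port:   intstr.FromString("http"), // IntOrString: port name or number
				Scheme: corev1.URISchemeHTTP,
			},
		},
		PeriodSeconds:    10, // default
		FailureThreshold: 3,  // default
		SuccessThreshold: 1,  // must be 1 for liveness and startup probes
	}

	// Liveness probe: the command is exec'd directly (no shell);
	// exit status 0 is treated as healthy.
	liveness := corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			Exec: &corev1.ExecAction{Command: []string{"cat", "/tmp/healthy"}},
		},
	}

	fmt.Println(readiness.HTTPGet.Path, liveness.Exec.Command)
}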
PodStatus
PodStatus represents information about the status of a pod. Status may trail the actual state of a system, especially if the node that hosts the pod cannot contact the control plane.
nominatedNodeName (string)
nominatedNodeName is set only when this pod preempts other pods on the node, but it cannot be scheduled right away as preemption victims receive their graceful termination periods. This field does not guarantee that the pod will be scheduled on this node. Scheduler may decide to place the pod elsewhere if other nodes become available sooner. Scheduler may also decide to give the resources on this node to a higher priority pod that is created after preemption. As a result, this field may be different than PodSpec.nodeName when the pod is scheduled.
hostIP (string)
hostIP holds the IP address of the host to which the pod is assigned. Empty if the pod has not started yet. A pod can be assigned to a node whose kubelet has a problem, which in turn means that hostIP will not be updated even though a node is assigned to the pod.
hostIPs ([]HostIP)
Patch strategy: merge on key ip
Atomic: will be replaced during a merge
hostIPs holds the IP addresses allocated to the host. If this field is specified, the first entry must match the hostIP field. This list is empty if the pod has not started yet. A pod can be assigned to a node whose kubelet has a problem, which in turn means that hostIPs will not be updated even though a node is assigned to this pod.
HostIP represents a single IP address allocated to the host.
hostIPs.ip (string), required
IP is the IP address assigned to the host
startTime (Time)
RFC 3339 date and time at which the object was acknowledged by the Kubelet. This is before the Kubelet pulled the container image(s) for the pod.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
phase (string)
The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. The conditions array, the reason and message fields, and the individual container status arrays contain more detail about the pod's status. There are five possible phase values:
Pending: The pod has been accepted by the Kubernetes system, but one or more of the container images has not been created. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while. Running: The pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting. Succeeded: All containers in the pod have terminated in success, and will not be restarted. Failed: All containers in the pod have terminated, and at least one container has terminated in failure. The container either exited with non-zero status or was terminated by the system. Unknown: For some reason the state of the pod could not be obtained, typically due to an error in communicating with the host of the pod.
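The sketch below, using the k8s.io/api/core/v1 types, shows how a client might read the phase together with the per-container status arrays; the pod values are fabricated for illustration.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// A fabricated status, roughly as the kubelet might report it.
	pod := corev1.Pod{
		Status: corev1.PodStatus{
			Phase: corev1.PodRunning,
			ContainerStatuses: []corev1.ContainerStatus{
				{Name: "web", Ready: true, RestartCount: 0},
			},
		},
	}

	// Phase is only a high-level summary; the container status arrays and
	// conditions carry the detail.
	switch pod.Status.Phase {
	case corev1.PodPending, corev1.PodRunning:
		for _, cs := range pod.Status.ContainerStatuses {
			fmt.Printf("container %s ready=%v restarts=%d\n", cs.Name, cs.Ready, cs.RestartCount)
		}
	case corev1.PodSucceeded, corev1.PodFailed, corev1.PodUnknown:
		fmt.Println("terminal or unknown phase:", pod.Status.Phase)
	}
}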
A human readable message indicating details about why the pod is in this condition.
reason (string)
A brief CamelCase message indicating details about why the pod is in this state. e.g. 'Evicted'
podIP (string)
podIP address allocated to the pod. Routable at least within the cluster. Empty if not yet allocated.
podIPs ([]PodIP)
Patch strategy: merge on key ip
Map: unique values on key ip will be kept during a merge
podIPs holds the IP addresses allocated to the pod. If this field is specified, the 0th entry must match the podIP field. Pods may be allocated at most 1 value for each of IPv4 and IPv6. This list is empty if no IPs have been allocated yet.
PodIP represents a single IP address allocated to the pod.
podIPs.ip (string), required
IP is the IP address assigned to the pod
conditions ([]PodCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
Human-readable message indicating details about last transition.
conditions.observedGeneration (int64)
If set, this represents the .metadata.generation that the pod condition was set based upon. This is an alpha field. Enable PodObservedGenerationTracking to be able to use this field.
conditions.reason (string)
Unique, one-word, CamelCase reason for the condition's last transition.
initContainerStatuses ([]ContainerStatus)
Statuses of init containers in this pod. The most recent successful non-restartable init container will have ready = true, and the most recently started container will have startTime set. Each init container in the pod should have at most one status in this list, and all statuses should be for containers in the pod. However this is not enforced. If a status for a non-existent container is present in the list, or the list has duplicate names, the behavior of various Kubernetes components is not defined and those statuses might be ignored. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-and-container-status
ContainerStatus contains details for the current status of this container.
AllocatedResources represents the compute resources allocated for this container by the node. Kubelet sets this value to Container.Resources.Requests upon successful pod admission and after successfully admitting desired pod resize.
Name of the resource. Must be unique within the pod and in case of non-DRA resource, match one of the resources from the pod spec. For DRA resources, the value must be "claim:<claim_name>/<request>". When this status is reported about a container, the "claim_name" and "request" must match one of the claims of this container.
Map: unique values on key resourceID will be kept during a merge
List of unique resources health. Each element in the list contains a unique resource ID and its health. At a minimum, for the lifetime of a Pod, the resource ID must uniquely identify the resource allocated to the Pod on the Node. If another Pod on the same Node reports the status with the same resource ID, it must be the same resource they share. See the ResourceID type definition for the specific format it has in various use cases.
ResourceHealth represents the health of a resource. It has the latest device health information. This is a part of KEP https://kep.k8s.io/4680.
Unhealthy: reported unhealthy. We consider this a temporary health issue since we do not have a mechanism today to distinguish temporary and permanent issues.
Unknown: The status cannot be determined. For example, Device Plugin got unregistered and hasn't been re-registered since.
In future we may want to introduce the PermanentlyUnhealthy Status.
initContainerStatuses.containerID (string)
ContainerID is the ID of the container in the format '<type>://<container_id>', where type is a container runtime identifier returned from the Version call of the CRI API (for example "containerd").
initContainerStatuses.image (string), required
Image is the name of container image that the container is running. The container image may not match the image used in the PodSpec, as it may have been resolved by the runtime. More info: https://kubernetes.io/docs/concepts/containers/images.
initContainerStatuses.imageID (string), required
ImageID is the image ID of the container's image. The image ID may not match the image ID of the image used in the PodSpec, as it may have been resolved by the runtime.
initContainerStatuses.lastState (ContainerState)
LastTerminationState holds the last termination state of the container to help debug container crashes and restarts. This field is not populated if the container is still running and RestartCount is 0.
ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time at which previous execution of the container started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Name is a DNS_LABEL representing the unique name of the container. Each container in a pod must have a unique name across all container types. Cannot be updated.
initContainerStatuses.ready (boolean), required
Ready specifies whether the container is currently passing its readiness check. The value will change as readiness probes keep executing. If no readiness probes are specified, this field defaults to true once the container is fully started (see Started field).
The value is typically used to determine whether a container is ready to accept traffic.
Resources represents the compute resource requests and limits that have been successfully enacted on the running container after it has been started or has been successfully resized.
ResourceRequirements describes the compute resource requirements.
Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container.
Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request.
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
RestartCount holds the number of times the container has been restarted. Kubelet makes an effort to always increment the value, but there are cases when the state may be lost due to node restarts and then the value may be reset to 0. The value is never negative.
initContainerStatuses.started (boolean)
Started indicates whether the container has finished its postStart lifecycle hook and passed its startup probe. Initialized as false, becomes true after startupProbe is considered successful. Resets to false when the container is restarted, or if kubelet loses state temporarily. In both cases, startup probes will run again. Is always true when no startupProbe is defined and container is running and has passed the postStart lifecycle hook. The null value must be treated the same as false.
initContainerStatuses.state (ContainerState)
State holds details about the container's current condition.
ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time at which previous execution of the container started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Linux holds user identity information initially attached to the first process of the containers in Linux. Note that the actual running identity can be changed if the process has enough privilege to do so.
LinuxContainerUser represents user identity information in Linux containers
RecursiveReadOnly must be set to Disabled, Enabled, or unspecified (for non-readonly mounts). An IfPossible value in the original VolumeMount must be translated to Disabled or Enabled, depending on the mount result.
containerStatuses ([]ContainerStatus)
Atomic: will be replaced during a merge
Statuses of containers in this pod. Each container in the pod should have at most one status in this list, and all statuses should be for containers in the pod. However this is not enforced. If a status for a non-existent container is present in the list, or the list has duplicate names, the behavior of various Kubernetes components is not defined and those statuses might be ignored. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-and-container-status
ContainerStatus contains details for the current status of this container.
AllocatedResources represents the compute resources allocated for this container by the node. Kubelet sets this value to Container.Resources.Requests upon successful pod admission and after successfully admitting desired pod resize.
Name of the resource. Must be unique within the pod and in case of non-DRA resource, match one of the resources from the pod spec. For DRA resources, the value must be "claim:<claim_name>/<request>". When this status is reported about a container, the "claim_name" and "request" must match one of the claims of this container.
Map: unique values on key resourceID will be kept during a merge
List of unique resources health. Each element in the list contains a unique resource ID and its health. At a minimum, for the lifetime of a Pod, the resource ID must uniquely identify the resource allocated to the Pod on the Node. If another Pod on the same Node reports the status with the same resource ID, it must be the same resource they share. See the ResourceID type definition for the specific format it has in various use cases.
ResourceHealth represents the health of a resource. It has the latest device health information. This is a part of KEP https://kep.k8s.io/4680.
Unhealthy: reported unhealthy. We consider this a temporary health issue since we do not have a mechanism today to distinguish temporary and permanent issues.
Unknown: The status cannot be determined. For example, Device Plugin got unregistered and hasn't been re-registered since.
In future we may want to introduce the PermanentlyUnhealthy Status.
containerStatuses.containerID (string)
ContainerID is the ID of the container in the format '<type>://<container_id>', where type is a container runtime identifier returned from the Version call of the CRI API (for example "containerd").
containerStatuses.image (string), required
Image is the name of container image that the container is running. The container image may not match the image used in the PodSpec, as it may have been resolved by the runtime. More info: https://kubernetes.io/docs/concepts/containers/images.
containerStatuses.imageID (string), required
ImageID is the image ID of the container's image. The image ID may not match the image ID of the image used in the PodSpec, as it may have been resolved by the runtime.
containerStatuses.lastState (ContainerState)
LastTerminationState holds the last termination state of the container to help debug container crashes and restarts. This field is not populated if the container is still running and RestartCount is 0.
ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time at which previous execution of the container started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Name is a DNS_LABEL representing the unique name of the container. Each container in a pod must have a unique name across all container types. Cannot be updated.
containerStatuses.ready (boolean), required
Ready specifies whether the container is currently passing its readiness check. The value will change as readiness probes keep executing. If no readiness probes are specified, this field defaults to true once the container is fully started (see Started field).
The value is typically used to determine whether a container is ready to accept traffic.
Resources represents the compute resource requests and limits that have been successfully enacted on the running container after it has been started or has been successfully resized.
ResourceRequirements describes the compute resource requirements.
Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container.
Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request.
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
containerStatuses.restartCount (int32), required
RestartCount holds the number of times the container has been restarted. Kubelet makes an effort to always increment the value, but there are cases when the state may be lost due to node restarts and then the value may be reset to 0. The value is never negative.
containerStatuses.started (boolean)
Started indicates whether the container has finished its postStart lifecycle hook and passed its startup probe. Initialized as false, becomes true after startupProbe is considered successful. Resets to false when the container is restarted, or if kubelet loses state temporarily. In both cases, startup probes will run again. Is always true when no startupProbe is defined and container is running and has passed the postStart lifecycle hook. The null value must be treated the same as false.
containerStatuses.state (ContainerState)
State holds details about the container's current condition.
ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
ContainerStateRunning is a running state of a container.
containerStatuses.state.running.startedAt (Time)
Time at which the container was last (re-)started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time at which previous execution of the container started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
ContainerStateWaiting is a waiting state of a container.
containerStatuses.state.waiting.message (string)
Message regarding why the container is not yet running.
containerStatuses.state.waiting.reason (string)
(brief) reason the container is not yet running.
containerStatuses.stopSignal (string)
StopSignal reports the effective stop signal for this container
containerStatuses.user (ContainerUser)
User represents user identity information initially attached to the first process of the container
ContainerUser represents user identity information
containerStatuses.user.linux (LinuxContainerUser)
Linux holds user identity information initially attached to the first process of the containers in Linux. Note that the actual running identity can be changed if the process has enough privilege to do so.
LinuxContainerUser represents user identity information in Linux containers
RecursiveReadOnly must be set to Disabled, Enabled, or unspecified (for non-readonly mounts). An IfPossible value in the original VolumeMount must be translated to Disabled or Enabled, depending on the mount result.
ephemeralContainerStatuses ([]ContainerStatus)
Atomic: will be replaced during a merge
Statuses for any ephemeral containers that have run in this pod. Each ephemeral container in the pod should have at most one status in this list, and all statuses should be for containers in the pod. However this is not enforced. If a status for a non-existent container is present in the list, or the list has duplicate names, the behavior of various Kubernetes components is not defined and those statuses might be ignored. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-and-container-status
ContainerStatus contains details for the current status of this container.
AllocatedResources represents the compute resources allocated for this container by the node. Kubelet sets this value to Container.Resources.Requests upon successful pod admission and after successfully admitting desired pod resize.
Name of the resource. Must be unique within the pod and in case of non-DRA resource, match one of the resources from the pod spec. For DRA resources, the value must be "claim:<claim_name>/<request>". When this status is reported about a container, the "claim_name" and "request" must match one of the claims of this container.
Map: unique values on key resourceID will be kept during a merge
List of unique resources health. Each element in the list contains a unique resource ID and its health. At a minimum, for the lifetime of a Pod, the resource ID must uniquely identify the resource allocated to the Pod on the Node. If another Pod on the same Node reports the status with the same resource ID, it must be the same resource they share. See the ResourceID type definition for the specific format it has in various use cases.
ResourceHealth represents the health of a resource. It has the latest device health information. This is a part of KEP https://kep.k8s.io/4680.
Unhealthy: reported unhealthy. We consider this a temporary health issue since we do not have a mechanism today to distinguish temporary and permanent issues.
Unknown: The status cannot be determined. For example, Device Plugin got unregistered and hasn't been re-registered since.
In future we may want to introduce the PermanentlyUnhealthy Status.
ephemeralContainerStatuses.containerID (string)
ContainerID is the ID of the container in the format '<type>://<container_id>', where type is a container runtime identifier returned from the Version call of the CRI API (for example "containerd").
Image is the name of container image that the container is running. The container image may not match the image used in the PodSpec, as it may have been resolved by the runtime. More info: https://kubernetes.io/docs/concepts/containers/images.
ImageID is the image ID of the container's image. The image ID may not match the image ID of the image used in the PodSpec, as it may have been resolved by the runtime.
LastTerminationState holds the last termination state of the container to help debug container crashes and restarts. This field is not populated if the container is still running and RestartCount is 0.
ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time at which previous execution of the container started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Name is a DNS_LABEL representing the unique name of the container. Each container in a pod must have a unique name across all container types. Cannot be updated.
Ready specifies whether the container is currently passing its readiness check. The value will change as readiness probes keep executing. If no readiness probes are specified, this field defaults to true once the container is fully started (see Started field).
The value is typically used to determine whether a container is ready to accept traffic.
Resources represents the compute resource requests and limits that have been successfully enacted on the running container after it has been started or has been successfully resized.
ResourceRequirements describes the compute resource requirements.
Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container.
Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request.
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
RestartCount holds the number of times the container has been restarted. Kubelet makes an effort to always increment the value, but there are cases when the state may be lost due to node restarts and then the value may be reset to 0. The value is never negative.
ephemeralContainerStatuses.started (boolean)
Started indicates whether the container has finished its postStart lifecycle hook and passed its startup probe. Initialized as false, becomes true after startupProbe is considered successful. Resets to false when the container is restarted, or if kubelet loses state temporarily. In both cases, startup probes will run again. Is always true when no startupProbe is defined and container is running and has passed the postStart lifecycle hook. The null value must be treated the same as false.
ephemeralContainerStatuses.state (ContainerState)
State holds details about the container's current condition.
ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time at which previous execution of the container started
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
Linux holds user identity information initially attached to the first process of the containers in Linux. Note that the actual running identity can be changed if the process has enough privilege to do so.
LinuxContainerUser represents user identity information in Linux containers
RecursiveReadOnly must be set to Disabled, Enabled, or unspecified (for non-readonly mounts). An IfPossible value in the original VolumeMount must be translated to Disabled or Enabled, depending on the mount result.
resourceClaimStatuses ([]PodResourceClaimStatus)
Patch strategies: retainKeys, merge on key name
Map: unique values on key name will be kept during a merge
Status of resource claims.
PodResourceClaimStatus is stored in the PodStatus for each PodResourceClaim which references a ResourceClaimTemplate. It stores the generated name for the corresponding ResourceClaim.
resourceClaimStatuses.name (string), required
Name uniquely identifies this resource claim inside the pod. This must match the name of an entry in pod.spec.resourceClaims, which implies that the string must be a DNS_LABEL.
resourceClaimStatuses.resourceClaimName (string)
ResourceClaimName is the name of the ResourceClaim that was generated for the Pod in the namespace of the Pod. If this is unset, then generating a ResourceClaim was not necessary. The pod.spec.resourceClaims entry can be ignored in this case.
resize (string)
Status of the resources resize desired for the pod's containers. It is empty if no resources resize is pending. Any changes to container resources will automatically set this to "Proposed". Deprecated: Resize status is moved to two pod conditions, PodResizePending and PodResizeInProgress. PodResizePending will track states where the spec has been resized, but the Kubelet has not yet allocated the resources. PodResizeInProgress will track in-progress resizes, and should be present whenever allocated resources != acknowledged resources.
observedGeneration (int64)
If set, this represents the .metadata.generation that the pod status was set based upon. This is an alpha field. Enable PodObservedGenerationTracking to be able to use this field.
container (in query): string
The container for which to stream logs. Defaults to the only container if there is one container in the pod.
follow (in query): boolean
Follow the log stream of the pod. Defaults to false.
insecureSkipTLSVerifyBackend (in query): boolean
insecureSkipTLSVerifyBackend indicates that the apiserver should not confirm the validity of the serving certificate of the backend it is connecting to. This will make the HTTPS connection between the apiserver and the backend insecure. This means the apiserver cannot verify the log data it is receiving came from the real kubelet. If the kubelet is configured to verify the apiserver's TLS credentials, it does not mean the connection to the real kubelet is vulnerable to a man in the middle attack (e.g. an attacker could not intercept the actual log data coming from the real kubelet).
limitBytes (in query): integer
If set, the number of bytes to read from the server before terminating the log output. This may not display a complete final line of logging, and may return slightly more or slightly less than the specified limit.
previous (in query): boolean
Return previous terminated container logs. Defaults to false.
sinceSeconds (in query): integer
A relative time in seconds before the current time from which to show logs. If this value precedes the time a pod was started, only logs since the pod start will be returned. If this value is in the future, no logs will be returned. Only one of sinceSeconds or sinceTime may be specified.
stream (in query): string
Specify which container log stream to return to the client. Acceptable values are "All", "Stdout" and "Stderr". If not specified, "All" is used, and both stdout and stderr are returned interleaved. Note that when "TailLines" is specified, "Stream" can only be set to nil or "All".
tailLines (in query): integer
If set, the number of lines from the end of the logs to show. If not specified, logs are shown from the creation of the container or sinceSeconds or sinceTime. Note that when "TailLines" is specified, "Stream" can only be set to nil or "All".
timestamps (in query): boolean
If true, add an RFC3339 or RFC3339Nano timestamp at the beginning of every line of log output. Defaults to false.
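These query parameters correspond to the fields of the PodLogOptions object. A minimal sketch in Go with the k8s.io/api/core/v1 types follows; the container name and the numeric values are placeholders.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	tailLines := int64(100)
	sinceSeconds := int64(3600)

	// Roughly equivalent to the query string:
	//   ?container=web&follow=true&tailLines=100&sinceSeconds=3600&timestamps=true
	opts := corev1.PodLogOptions{
		Container:    "web", // placeholder container name
		Follow:       true,
		Previous:     false,
		TailLines:    &tailLines,
		SinceSeconds: &sinceSeconds,
		Timestamps:   true,
	}
	fmt.Printf("%+v\n", opts)
}

With client-go, an options value like this is typically passed to the pod log subresource request (for example via the clientset's Pods(namespace).GetLogs helper), which issues the GET described above.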
Response
200 (string): OK
401: Unauthorized
get read resize of the specified Pod
HTTP Request
GET /api/v1/namespaces/{namespace}/pods/{name}/resize
ReplicationControllerSpec is the specification of a replication controller.
selector (map[string]string)
Selector is a label query over pods that should match the Replicas count. If Selector is empty, it is defaulted to the labels present on the Pod template. These are the label keys and values that must match in order for a pod to be controlled by this replication controller. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
Minimum number of seconds for which a newly created pod should be ready without any of its containers crashing, for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
ReplicationControllerStatus
ReplicationControllerStatus represents the current status of a replication controller.
The number of available replicas (ready for at least minReadySeconds) for this replication controller.
readyReplicas (int32)
The number of ready replicas for this replication controller.
fullyLabeledReplicas (int32)
The number of pods that have labels matching the labels of the pod template of the replication controller.
conditions ([]ReplicationControllerCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Represents the latest available observations of a replication controller's current state.
ReplicationControllerCondition describes the state of a replication controller at a certain point.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of replication controller condition.
conditions.lastTransitionTime (Time)
The last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
A human readable message indicating details about the transition.
conditions.reason (string)
The reason for the condition's last transition.
observedGeneration (int64)
ObservedGeneration reflects the generation of the most recently observed replication controller.
ReplicationControllerList
ReplicationControllerList is a collection of replication controllers.
Minimum number of seconds for which a newly created pod should be ready without any of its containers crashing, for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
ReplicaSetStatus
ReplicaSetStatus represents the current status of a ReplicaSet.
The number of available non-terminating pods (ready for at least minReadySeconds) for this replica set.
readyReplicas (int32)
The number of non-terminating pods targeted by this ReplicaSet with a Ready Condition.
terminatingReplicas (int32)
The number of terminating pods for this replica set. Terminating pods have a non-null .metadata.deletionTimestamp and have not yet reached the Failed or Succeeded .status.phase.
This is an alpha field. Enable DeploymentReplicaSetTerminatingReplicas to be able to use this field.
fullyLabeledReplicas (int32)
The number of non-terminating pods that have labels matching the labels of the pod template of the replicaset.
conditions ([]ReplicaSetCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Represents the latest available observations of a replica set's current state.
ReplicaSetCondition describes the state of a replica set at a certain point.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of replica set condition.
conditions.lastTransitionTime (Time)
The last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
A human readable message indicating details about the transition.
conditions.reason (string)
The reason for the condition's last transition.
observedGeneration (int64)
ObservedGeneration reflects the generation of the most recently observed ReplicaSet.
Label selector for pods. Existing ReplicaSets whose pods are selected by this will be the ones affected by this deployment. It must match the pod template's labels.
Template describes the pods that will be created. The only allowed template.spec.restartPolicy value is "Always".
replicas (int32)
Number of desired pods. This is a pointer to distinguish between explicit zero and not specified. Defaults to 1.
minReadySeconds (int32)
Minimum number of seconds for which a newly created pod should be ready without any of its containers crashing, for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
strategy (DeploymentStrategy)
Patch strategy: retainKeys
The deployment strategy to use to replace existing pods with new ones.
DeploymentStrategy describes how to replace existing pods with new ones.
strategy.type (string)
Type of deployment. Can be "Recreate" or "RollingUpdate". Default is RollingUpdate.
strategy.rollingUpdate (RollingUpdateDeployment)
Rolling update config params. Present only if DeploymentStrategyType = RollingUpdate.
Spec to control the desired behavior of rolling update.
strategy.rollingUpdate.maxSurge (IntOrString)
The maximum number of pods that can be scheduled above the desired number of pods. Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%). This can not be 0 if MaxUnavailable is 0. Absolute number is calculated from percentage by rounding up. Defaults to 25%. Example: when this is set to 30%, the new ReplicaSet can be scaled up immediately when the rolling update starts, such that the total number of old and new pods does not exceed 130% of desired pods. Once old pods have been killed, the new ReplicaSet can be scaled up further, ensuring that the total number of pods running at any time during the update is at most 130% of desired pods.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
strategy.rollingUpdate.maxUnavailable (IntOrString)
The maximum number of pods that can be unavailable during the update. Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%). Absolute number is calculated from percentage by rounding down. This can not be 0 if MaxSurge is 0. Defaults to 25%. Example: when this is set to 30%, the old ReplicaSet can be scaled down to 70% of desired pods immediately when the rolling update starts. Once new pods are ready, the old ReplicaSet can be scaled down further, followed by scaling up the new ReplicaSet, ensuring that the total number of pods available at all times during the update is at least 70% of desired pods.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
revisionHistoryLimit (int32)
The number of old ReplicaSets to retain to allow rollback. This is a pointer to distinguish between explicit zero and not specified. Defaults to 10.
progressDeadlineSeconds (int32)
The maximum time in seconds for a deployment to make progress before it is considered to be failed. The deployment controller will continue to process failed deployments and a condition with a ProgressDeadlineExceeded reason will be surfaced in the deployment status. Note that progress will not be estimated during the time a deployment is paused. Defaults to 600s.
paused (boolean)
Indicates that the deployment is paused.
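As an illustration of how these spec fields combine, here is a small Go sketch using the k8s.io/api/apps/v1 types. The 30% values mirror the examples above and are not defaults; Selector and Template are omitted even though a real Deployment requires them.

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	maxSurge := intstr.FromString("30%")
	maxUnavailable := intstr.FromString("30%")
	revisionHistoryLimit := int32(10)
	progressDeadlineSeconds := int32(600)

	spec := appsv1.DeploymentSpec{
		Strategy: appsv1.DeploymentStrategy{
			Type: appsv1.RollingUpdateDeploymentStrategyType,
			RollingUpdate: &appsv1.RollingUpdateDeployment{
				MaxSurge:       &maxSurge,       // may scale up to 130% of desired pods
				MaxUnavailable: &maxUnavailable, // never drop below 70% available
			},
		},
		RevisionHistoryLimit:    &revisionHistoryLimit,    // default 10
		ProgressDeadlineSeconds: &progressDeadlineSeconds, // default 600s
		Paused:                  false,
	}
	fmt.Println(spec.Strategy.Type)
}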
DeploymentStatus
DeploymentStatus is the most recently observed status of the Deployment.
replicas (int32)
Total number of non-terminating pods targeted by this deployment (their labels match the selector).
availableReplicas (int32)
Total number of available non-terminating pods (ready for at least minReadySeconds) targeted by this deployment.
readyReplicas (int32)
Total number of non-terminating pods targeted by this Deployment with a Ready Condition.
unavailableReplicas (int32)
Total number of unavailable pods targeted by this deployment. This is the total number of pods that are still required for the deployment to have 100% available capacity. They may either be pods that are running but not yet available or pods that still have not been created.
updatedReplicas (int32)
Total number of non-terminating pods targeted by this deployment that have the desired template spec.
terminatingReplicas (int32)
Total number of terminating pods targeted by this deployment. Terminating pods have a non-null .metadata.deletionTimestamp and have not yet reached the Failed or Succeeded .status.phase.
This is an alpha field. Enable DeploymentReplicaSetTerminatingReplicas to be able to use this field.
collisionCount (int32)
Count of hash collisions for the Deployment. The Deployment controller uses this field as a collision avoidance mechanism when it needs to create the name for the newest ReplicaSet.
conditions ([]DeploymentCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Represents the latest available observations of a deployment's current state.
DeploymentCondition describes the state of a deployment at a certain point.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of deployment condition.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.lastUpdateTime (Time)
The last time this condition was updated.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
A human readable message indicating details about the transition.
conditions.reason (string)
The reason for the condition's last transition.
observedGeneration (int64)
The generation observed by the deployment controller.
Status is the current status of Pods in this StatefulSet. This data may be out of date by some window of time.
StatefulSetSpec
A StatefulSetSpec is the specification of a StatefulSet.
serviceName (string)
serviceName is the name of the service that governs this StatefulSet. This service must exist before the StatefulSet, and is responsible for the network identity of the set. Pods get DNS/hostnames that follow the pattern: pod-specific-string.serviceName.default.svc.cluster.local where "pod-specific-string" is managed by the StatefulSet controller.
template is the object that describes the pod that will be created if insufficient replicas are detected. Each pod stamped out by the StatefulSet will fulfill this Template, but have a unique identity from the rest of the StatefulSet. Each pod will be named with the format <statefulsetname>-<podindex>. For example, a pod in a StatefulSet named "web" with index number "3" would be named "web-3". The only allowed template.spec.restartPolicy value is "Always".
replicas (int32)
replicas is the desired number of replicas of the given Template. These are replicas in the sense that they are instantiations of the same Template, but individual replicas also have a consistent identity. If unspecified, defaults to 1.
updateStrategy (StatefulSetUpdateStrategy)
updateStrategy indicates the StatefulSetUpdateStrategy that will be employed to update Pods in the StatefulSet when a revision is made to Template.
StatefulSetUpdateStrategy indicates the strategy that the StatefulSet controller will use to perform updates. It includes any additional parameters necessary to perform the update for the indicated strategy.
updateStrategy.type (string)
Type indicates the type of the StatefulSetUpdateStrategy. Default is RollingUpdate.
updateStrategy.rollingUpdate.maxUnavailable (IntOrString)
The maximum number of pods that can be unavailable during the update. Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%). Absolute number is calculated from percentage by rounding up. This can not be 0. Defaults to 1. This field is alpha-level and is only honored by servers that enable the MaxUnavailableStatefulSet feature. The field applies to all pods in the range 0 to Replicas-1. That means if there is any unavailable pod in the range 0 to Replicas-1, it will be counted towards MaxUnavailable.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
updateStrategy.rollingUpdate.partition (int32)
Partition indicates the ordinal at which the StatefulSet should be partitioned for updates. During a rolling update, all pods from ordinal Replicas-1 to Partition are updated. All pods from ordinal Partition-1 to 0 remain untouched. This is helpful in being able to do a canary based deployment. The default value is 0.
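A sketch of a partitioned rolling update using the k8s.io/api/apps/v1 types follows. The partition value of 3 is arbitrary, and maxUnavailable is omitted because it requires the alpha MaxUnavailableStatefulSet feature gate.

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

func main() {
	partition := int32(3)

	// Only pods with ordinal >= 3 are updated when the template changes;
	// ordinals 0 through 2 stay on the old revision (a simple canary).
	strategy := appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.RollingUpdateStatefulSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateStatefulSetStrategy{
			Partition: &partition,
		},
	}
	fmt.Println(strategy.Type, *strategy.RollingUpdate.Partition)
}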
podManagementPolicy (string)
podManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down. The default policy is OrderedReady, where pods are created in increasing order (pod-0, then pod-1, etc) and the controller will wait until each pod is ready before continuing. When scaling down, the pods are removed in the opposite order. The alternative policy is Parallel which will create pods in parallel to match the desired scale without waiting, and on scale down will delete all pods at once.
revisionHistoryLimit (int32)
revisionHistoryLimit is the maximum number of revisions that will be maintained in the StatefulSet's revision history. The revision history consists of all revisions not represented by a currently applied StatefulSetSpec version. The default value is 10.
volumeClaimTemplates is a list of claims that pods are allowed to reference. The StatefulSet controller is responsible for mapping network identities to claims in a way that maintains the identity of a pod. Every claim in this list must have at least one matching (by name) volumeMount in one container in the template. A claim in this list takes precedence over any volumes in the template, with the same name.
minReadySeconds (int32)
Minimum number of seconds for which a newly created pod should be ready without any of its containers crashing, for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
persistentVolumeClaimRetentionPolicy describes the lifecycle of persistent volume claims created from volumeClaimTemplates. By default, all persistent volume claims are created as needed and retained until manually deleted. This policy allows the lifecycle to be altered, for example by deleting persistent volume claims when their stateful set is deleted, or when their pod is scaled down.
StatefulSetPersistentVolumeClaimRetentionPolicy describes the policy used for PVCs created from the StatefulSet VolumeClaimTemplates.
WhenDeleted specifies what happens to PVCs created from StatefulSet VolumeClaimTemplates when the StatefulSet is deleted. The default policy of Retain causes PVCs to not be affected by StatefulSet deletion. The Delete policy causes those PVCs to be deleted.
WhenScaled specifies what happens to PVCs created from StatefulSet VolumeClaimTemplates when the StatefulSet is scaled down. The default policy of Retain causes PVCs to not be affected by a scaledown. The Delete policy causes the associated PVCs for any excess pods above the replica count to be deleted.
ordinals (StatefulSetOrdinals)
ordinals controls the numbering of replica indices in a StatefulSet. The default ordinals behavior assigns a "0" index to the first replica and increments the index by one for each additional replica requested.
StatefulSetOrdinals describes the policy used for replica ordinal assignment in this StatefulSet.
ordinals.start (int32)
start is the number representing the first replica's index. It may be used to number replicas from an alternate index (eg: 1-indexed) over the default 0-indexed names, or to orchestrate progressive movement of replicas from one StatefulSet to another. If set, replica indices will be in the range:
[.spec.ordinals.start, .spec.ordinals.start + .spec.replicas).
If unset, defaults to 0. Replica indices will be in the range:
[0, .spec.replicas).
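For example, a StatefulSet that deletes its PVCs when the set is deleted, retains them on scale-down, and numbers replicas from 1 could set these fields as sketched below with the k8s.io/api/apps/v1 types, assuming a release in which both fields are available; the policy choice is illustrative, and ServiceName, Selector, and Template are omitted even though a real StatefulSet requires them.

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

func main() {
	spec := appsv1.StatefulSetSpec{
		PersistentVolumeClaimRetentionPolicy: &appsv1.StatefulSetPersistentVolumeClaimRetentionPolicy{
			WhenDeleted: appsv1.DeletePersistentVolumeClaimRetentionPolicyType, // delete PVCs with the StatefulSet
			WhenScaled:  appsv1.RetainPersistentVolumeClaimRetentionPolicyType, // keep PVCs of scaled-down pods
		},
		// 1-indexed replicas: pods are named <name>-1, <name>-2, ...
		Ordinals: &appsv1.StatefulSetOrdinals{Start: 1},
	}
	fmt.Println(spec.PersistentVolumeClaimRetentionPolicy.WhenDeleted, spec.Ordinals.Start)
}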
StatefulSetStatus
StatefulSetStatus represents the current state of a StatefulSet.
replicas (int32), required
replicas is the number of Pods created by the StatefulSet controller.
readyReplicas (int32)
readyReplicas is the number of pods created for this StatefulSet with a Ready Condition.
currentReplicas (int32)
currentReplicas is the number of Pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision.
updatedReplicas (int32)
updatedReplicas is the number of Pods created by the StatefulSet controller from the StatefulSet version indicated by updateRevision.
availableReplicas (int32)
Total number of available pods (ready for at least minReadySeconds) targeted by this statefulset.
collisionCount (int32)
collisionCount is the count of hash collisions for the StatefulSet. The StatefulSet controller uses this field as a collision avoidance mechanism when it needs to create the name for the newest ControllerRevision.
conditions ([]StatefulSetCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Represents the latest available observations of a statefulset's current state.
StatefulSetCondition describes the state of a statefulset at a certain point.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of statefulset condition.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
A human readable message indicating details about the transition.
conditions.reason (string)
The reason for the condition's last transition.
currentRevision (string)
currentRevision, if not empty, indicates the version of the StatefulSet used to generate Pods in the sequence [0,currentReplicas).
updateRevision (string)
updateRevision, if not empty, indicates the version of the StatefulSet used to generate Pods in the sequence [replicas-updatedReplicas,replicas)
observedGeneration (int64)
observedGeneration is the most recent generation observed for this StatefulSet. It corresponds to the StatefulSet's generation, which is updated on mutation by the API Server.
ControllerRevision implements an immutable snapshot of state data.
apiVersion: apps/v1
import "k8s.io/api/apps/v1"
ControllerRevision
ControllerRevision implements an immutable snapshot of state data. Clients are responsible for serializing and deserializing the objects that contain their internal state. Once a ControllerRevision has been successfully created, it can not be updated. The API Server will fail validation of all requests that attempt to mutate the Data field. ControllerRevisions may, however, be deleted. Note that, due to its use by both the DaemonSet and StatefulSet controllers for update and rollback, this object is beta. However, it may be subject to name and representation changes in future releases, and clients should not depend on its stability. It is primarily for internal use by controllers.
revision (int64), required
Revision indicates the revision of the state represented by Data.
data (RawExtension)
Data is the serialized representation of the state.
*RawExtension is used to hold extensions in external versions.
To use this, make a field which has RawExtension as its type in your external, versioned struct, and Object in your internal struct. You also need to register your various plugin types.
// Internal package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.Object `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// External package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.RawExtension `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// On the wire, the JSON will look something like this:
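// (A sketch of the wire format, assuming a PluginA value whose aOption
// field was set to "foo"; exact field ordering may differ.)
// {
//   "kind": "MyAPIObject",
//   "apiVersion": "v1",
//   "myPlugin": {
//     "kind": "PluginA",
//     "aOption": "foo"
//   }
// }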
So what happens? Decode first uses json or yaml to unmarshal the serialized data into your external MyAPIObject. That causes the raw JSON to be stored, but not unpacked. The next step is to copy (using pkg/conversion) into the internal struct. The runtime package's DefaultScheme has conversion functions installed which will unpack the JSON stored in RawExtension, turning it into the correct object type, and storing it in the Object. (TODO: In the case where the object is of an unknown type, a runtime.Unknown object will be created and stored.)*
ControllerRevisionList
ControllerRevisionList is a resource containing a list of ControllerRevision objects.
template (PodTemplateSpec), required
An object that describes the pod that will be created. The DaemonSet will create exactly one copy of this pod on every node that matches the template's node selector (or on every node if no node selector is specified). The only allowed template.spec.restartPolicy value is "Always". More info: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template
minReadySeconds (int32)
The minimum number of seconds for which a newly created DaemonSet pod should be ready without any of its containers crashing, for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
updateStrategy (DaemonSetUpdateStrategy)
An update strategy to replace existing DaemonSet pods with new pods.
DaemonSetUpdateStrategy is a struct used to control the update strategy for a DaemonSet.
updateStrategy.type (string)
Type of daemon set update. Can be "RollingUpdate" or "OnDelete". Default is RollingUpdate.
updateStrategy.rollingUpdate.maxSurge (IntOrString)
The maximum number of nodes with an existing available DaemonSet pod that can have an updated DaemonSet pod during an update. Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%). This cannot be 0 if MaxUnavailable is 0. Absolute number is calculated from percentage by rounding up to a minimum of 1. Default value is 0. Example: when this is set to 30%, at most 30% of the total number of nodes that should be running the daemon pod (i.e. status.desiredNumberScheduled) can have a new pod created before the old pod is marked as deleted. The update starts by launching new pods on 30% of nodes. Once an updated pod is available (Ready for at least minReadySeconds) the old DaemonSet pod on that node is marked deleted. If the old pod becomes unavailable for any reason (Ready transitions to false, is evicted, or is drained) an updated pod is immediately created on that node without considering surge limits. Allowing surge implies the possibility that the resources consumed by the daemonset on any given node can double if the readiness check fails, and so resource intensive daemonsets should take into account that they may cause evictions during disruption.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
updateStrategy.rollingUpdate.maxUnavailable (IntOrString)
The maximum number of DaemonSet pods that can be unavailable during the update. Value can be an absolute number (ex: 5) or a percentage of total number of DaemonSet pods at the start of the update (ex: 10%). Absolute number is calculated from percentage by rounding up. This cannot be 0 if MaxSurge is 0. Default value is 1. Example: when this is set to 30%, at most 30% of the total number of nodes that should be running the daemon pod (i.e. status.desiredNumberScheduled) can have their pods stopped for an update at any given time. The update starts by stopping at most 30% of those DaemonSet pods and then brings up new DaemonSet pods in their place. Once the new pods are available, it then proceeds onto other DaemonSet pods, thus ensuring that at least 70% of original number of DaemonSet pods are available at all times during the update.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
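As a concrete illustration of how maxSurge and maxUnavailable fit into the update strategy, here is a minimal Go sketch (using k8s.io/api/apps/v1 and the apimachinery intstr package; the 10% figure is a placeholder) that surges onto at most 10% of nodes while keeping every existing pod available:
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Allow new pods on at most 10% of eligible nodes at a time, and never
	// take an old DaemonSet pod down before its replacement is available.
	surge := intstr.FromString("10%")
	unavailable := intstr.FromInt32(0)

	strategy := appsv1.DaemonSetUpdateStrategy{
		Type: appsv1.RollingUpdateDaemonSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDaemonSet{
			MaxSurge:       &surge,
			MaxUnavailable: &unavailable,
		},
	}
	_ = strategy
}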
revisionHistoryLimit (int32)
The number of old ControllerRevision history entries to retain to allow rollback. This is a pointer to distinguish between explicit zero and not specified. Defaults to 10.
DaemonSetStatus
DaemonSetStatus represents the current status of a daemon set.
numberReady (int32), required
numberReady is the number of nodes that should be running the daemon pod and have one or more of the daemon pod running with a Ready Condition.
numberAvailable (int32)
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available (ready for at least spec.minReadySeconds)
numberUnavailable (int32)
The number of nodes that should be running the daemon pod and have none of the daemon pod running and available (ready for at least spec.minReadySeconds)
updatedNumberScheduled (int32)
The total number of nodes that are running an updated daemon pod.
collisionCount (int32)
Count of hash collisions for the DaemonSet. The DaemonSet controller uses this field as a collision avoidance mechanism when it needs to create the name for the newest ControllerRevision.
conditions ([]DaemonSetCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Represents the latest available observations of a DaemonSet's current state.
DaemonSetCondition describes the state of a DaemonSet at a certain point.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of DaemonSet condition.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
A human readable message indicating details about the transition.
conditions.reason (string)
The reason for the condition's last transition.
observedGeneration (int64)
The most recent generation observed by the daemon set controller.
parallelism (int32)
Specifies the maximum desired number of pods the job should run at any given time. The actual number of pods running in steady state will be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism), i.e. when the work left to do is less than max parallelism. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
Lifecycle
completions (int32)
Specifies the desired number of successfully finished pods the job should be run with. Setting to null means that the success of any pod signals the success of all pods, and allows parallelism to have any positive value. Setting to 1 means that parallelism is limited to 1 and the success of that pod signals the success of the job. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
completionMode (string)
completionMode specifies how Pod completions are tracked. It can be NonIndexed (default) or Indexed.
NonIndexed means that the Job is considered complete when there have been .spec.completions successfully completed Pods. Each Pod completion is homologous to each other.
Indexed means that the Pods of a Job get an associated completion index from 0 to (.spec.completions - 1), available in the annotation batch.kubernetes.io/job-completion-index. The Job is considered complete when there is one successfully completed Pod for each index. When value is Indexed, .spec.completions must be specified and .spec.parallelism must be less than or equal to 10^5. In addition, The Pod name takes the form $(job-name)-$(index)-$(random-string), the Pod hostname takes the form $(job-name)-$(index).
More completion modes can be added in the future. If the Job controller observes a mode that it doesn't recognize, which is possible during upgrades due to version skew, the controller skips updates for the Job.
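To show how parallelism, completions and completionMode combine, here is a minimal Go sketch (using k8s.io/api/batch/v1 and k8s.io/utils/ptr; the image and command are placeholders) of an Indexed Job that runs five completions, at most two Pods at a time:
package main

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

func main() {
	mode := batchv1.IndexedCompletion
	job := batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "indexed-demo"},
		Spec: batchv1.JobSpec{
			Completions:    ptr.To[int32](5), // one successful Pod per index 0..4
			Parallelism:    ptr.To[int32](2), // at most two Pods run concurrently
			CompletionMode: &mode,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "worker",
						Image:   "busybox",
						Command: []string{"sh", "-c", "echo $JOB_COMPLETION_INDEX"},
					}},
				},
			},
		},
	}
	_ = job
}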
backoffLimit (int32)
Specifies the number of retries before marking this job failed. Defaults to 6.
activeDeadlineSeconds (int64)
Specifies the duration in seconds relative to the startTime that the job may be continuously active before the system tries to terminate it; value must be positive integer. If a Job is suspended (at creation or through an update), this timer will effectively be stopped and reset when the Job is resumed again.
ttlSecondsAfterFinished (int32)
ttlSecondsAfterFinished limits the lifetime of a Job that has finished execution (either Complete or Failed). If this field is set, ttlSecondsAfterFinished after the Job finishes, it is eligible to be automatically deleted. When the Job is being deleted, its lifecycle guarantees (e.g. finalizers) will be honored. If this field is unset, the Job won't be automatically deleted. If this field is set to zero, the Job becomes eligible to be deleted immediately after it finishes.
suspend (boolean)
suspend specifies whether the Job controller should create Pods or not. If a Job is created with suspend set to true, no Pods are created by the Job controller. If a Job is suspended after creation (i.e. the flag goes from false to true), the Job controller will delete all active Pods associated with this Job. Users must design their workload to gracefully handle this. Suspending a Job will reset the StartTime field of the Job, effectively resetting the ActiveDeadlineSeconds timer too. Defaults to false.
manualSelector (boolean)
manualSelector controls generation of pod labels and pod selectors. Leave manualSelector unset unless you are certain what you are doing. When false or unset, the system picks labels unique to this job and appends those labels to the pod template. When true, the user is responsible for picking unique labels and specifying the selector. Failure to pick a unique label may cause this and other jobs to not function correctly. However, you may see manualSelector=true in jobs that were created with the old extensions/v1beta1 API. More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector
Beta level
podFailurePolicy (PodFailurePolicy)
Specifies the policy of handling failed pods. In particular, it allows specifying the set of actions and conditions which need to be satisfied to take the associated action. If empty, the default behaviour applies - the counter of failed pods, represented by the job's .status.failed field, is incremented and it is checked against the backoffLimit. This field cannot be used in combination with restartPolicy=OnFailure.
PodFailurePolicy describes how failed pods influence the backoffLimit.
podFailurePolicy.rules ([]PodFailurePolicyRule), required
A list of pod failure policy rules. The rules are evaluated in order. Once a rule matches a Pod failure, the remaining rules are ignored. When no rule matches the Pod failure, the default handling applies - the counter of pod failures is incremented and it is checked against the backoffLimit. At most 20 elements are allowed.
PodFailurePolicyRule describes how a pod failure is handled when the requirements are met. One of onExitCodes and onPodConditions, but not both, can be used in each rule.
podFailurePolicy.rules.action (string), required
Specifies the action taken on a pod failure when the requirements are satisfied. Possible values are:
FailJob: indicates that the pod's job is marked as Failed and all
running pods are terminated.
FailIndex: indicates that the pod's index is marked as Failed and will
not be restarted.
Ignore: indicates that the counter towards the .backoffLimit is not
incremented and a replacement pod is created.
Count: indicates that the pod is handled in the default way - the
counter towards the .backoffLimit is incremented.
Additional values may be added in the future. Clients should react to an unknown action by skipping the rule.
podFailurePolicy.rules.onExitCodes (PodFailurePolicyOnExitCodesRequirement)
Represents the requirement on the container exit codes.
PodFailurePolicyOnExitCodesRequirement describes the requirement for handling a failed pod based on its container exit codes. In particular, it looks up the .state.terminated.exitCode for each app container and init container status, represented by the .status.containerStatuses and .status.initContainerStatuses fields in the Pod status, respectively. Containers completed with success (exit code 0) are excluded from the requirement check.
podFailurePolicy.rules.onExitCodes.operator (string), required
Represents the relationship between the container exit code(s) and the specified values. Containers completed with success (exit code 0) are excluded from the requirement check. Possible values are:
In: the requirement is satisfied if at least one container exit code
(might be multiple if there are multiple containers not restricted
by the 'containerName' field) is in the set of specified values.
NotIn: the requirement is satisfied if at least one container exit code
(might be multiple if there are multiple containers not restricted
by the 'containerName' field) is not in the set of specified values.
Additional values may be added in the future. Clients should react to an unknown operator by assuming the requirement is not satisfied.
podFailurePolicy.rules.onExitCodes.values ([]int32), required
Specifies the set of values. Each returned container exit code (might be multiple in case of multiple containers) is checked against this set of values with respect to the operator. The list of values must be ordered and must not contain duplicates. Value '0' cannot be used for the In operator. At least one element is required. At most 255 elements are allowed.
podFailurePolicy.rules.onExitCodes.containerName (string)
Restricts the check for exit codes to the container with the specified name. When null, the rule applies to all containers. When specified, it should match one of the container or initContainer names in the pod template.
podFailurePolicy.rules.onPodConditions ([]PodFailurePolicyOnPodConditionsPattern)
Represents the requirement on the pod conditions. The requirement is represented as a list of pod condition patterns. The requirement is satisfied if at least one pattern matches an actual pod condition. At most 20 elements are allowed.
PodFailurePolicyOnPodConditionsPattern describes a pattern for matching an actual pod condition type.
podFailurePolicy.rules.onPodConditions.status (string), required
Specifies the required Pod condition status. To match a pod condition it is required that the specified status equals the pod condition status. Defaults to True.
podFailurePolicy.rules.onPodConditions.type (string), required
Specifies the required Pod condition type. To match a pod condition it is required that the specified type equals the pod condition type.
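Putting the rule fields above together, here is a minimal Go sketch of a podFailurePolicy (using k8s.io/api/batch/v1 and k8s.io/utils/ptr; the container name "main" and exit code 42 are placeholders, and the policy assumes a pod template with restartPolicy: Never) that fails the Job on a specific exit code but ignores Pod disruptions:
package main

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/utils/ptr"
)

func main() {
	policy := batchv1.PodFailurePolicy{
		Rules: []batchv1.PodFailurePolicyRule{
			{
				// Fail the whole Job if the "main" container exits with code 42.
				Action: batchv1.PodFailurePolicyActionFailJob,
				OnExitCodes: &batchv1.PodFailurePolicyOnExitCodesRequirement{
					ContainerName: ptr.To("main"),
					Operator:      batchv1.PodFailurePolicyOnExitCodesOpIn,
					Values:        []int32{42},
				},
			},
			{
				// Don't count Pods that failed because of a disruption
				// (eviction, preemption, ...) against the backoffLimit.
				Action: batchv1.PodFailurePolicyActionIgnore,
				OnPodConditions: []batchv1.PodFailurePolicyOnPodConditionsPattern{
					{Type: corev1.DisruptionTarget, Status: corev1.ConditionTrue},
				},
			},
		},
	}
	_ = policy
}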
successPolicy (SuccessPolicy)
successPolicy specifies the policy when the Job can be declared as succeeded. If empty, the default behavior applies - the Job is declared as succeeded only when the number of succeeded pods equals the completions. When the field is specified, it must be immutable and works only for Indexed Jobs. Once the Job meets the SuccessPolicy, the lingering pods are terminated.
SuccessPolicy describes when a Job can be declared as succeeded based on the success of some indexes.
successPolicy.rules ([]SuccessPolicyRule), required
rules represents the list of alternative rules for declaring the Job as successful before .status.succeeded >= .spec.completions. Once any of the rules are met, the "SucceededCriteriaMet" condition is added, and the lingering pods are removed. The terminal state for such a Job has the "Complete" condition. Additionally, these rules are evaluated in order; once the Job meets one of the rules, other rules are ignored. At most 20 elements are allowed.
SuccessPolicyRule describes rule for declaring a Job as succeeded. Each rule must have at least one of the "succeededIndexes" or "succeededCount" specified.
successPolicy.rules.succeededCount (int32)
succeededCount specifies the minimal required size of the actual set of the succeeded indexes for the Job. When succeededCount is used along with succeededIndexes, the check is constrained only to the set of indexes specified by succeededIndexes. For example, given that succeededIndexes is "1-4", succeededCount is "3", and completed indexes are "1", "3", and "5", the Job isn't declared as succeeded because only the "1" and "3" indexes are considered in that rule. When this field is null, this doesn't default to any value and is never evaluated at any time. When specified it needs to be a positive integer.
successPolicy.rules.succeededIndexes (string)
succeededIndexes specifies the set of indexes which need to be contained in the actual set of the succeeded indexes for the Job. The list of indexes must be within 0 to ".spec.completions-1" and must not contain duplicates. At least one element is required. The indexes are represented as intervals separated by commas. An interval can be a single decimal integer or a pair of decimal integers separated by a hyphen, representing the first and last element of the series. For example, if the completed indexes are 1, 3, 4, 5 and 7, they are represented as "1,3-5,7". When this field is null, this field doesn't default to any value and is never evaluated at any time.
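For example, a minimal Go sketch of a successPolicy for an Indexed Job in which index 0 plays a leader role whose success is sufficient (the index choice is a placeholder; uses k8s.io/api/batch/v1 and k8s.io/utils/ptr):
package main

import (
	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/utils/ptr"
)

func main() {
	// Declare the Job succeeded as soon as index 0 (the "leader") succeeds,
	// even if the other indexes are still running; lingering Pods are then
	// terminated by the Job controller.
	sp := batchv1.SuccessPolicy{
		Rules: []batchv1.SuccessPolicyRule{
			{SucceededIndexes: ptr.To("0")},
		},
	}
	_ = sp
}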
Alpha level
backoffLimitPerIndex (int32)
Specifies the limit for the number of retries within an index before marking this index as failed. When enabled the number of failures per index is kept in the pod's batch.kubernetes.io/job-index-failure-count annotation. It can only be set when Job's completionMode=Indexed, and the Pod's restart policy is Never. The field is immutable.
managedBy (string)
ManagedBy field indicates the controller that manages a Job. The k8s Job controller reconciles jobs which don't have this field at all or whose field value is the reserved string kubernetes.io/job-controller, but skips reconciling Jobs with a custom value for this field. The value must be a valid domain-prefixed path (e.g. acme.io/foo) - all characters before the first "/" must be a valid subdomain as defined by RFC 1123. All characters trailing the first "/" must be valid HTTP Path characters as defined by RFC 3986. The value cannot exceed 63 characters. This field is immutable.
This field is beta-level. The job controller accepts setting the field when the feature gate JobManagedBy is enabled (enabled by default).
maxFailedIndexes (int32)
Specifies the maximal number of failed indexes before marking the Job as failed, when backoffLimitPerIndex is set. Once the number of failed indexes exceeds this number the entire Job is marked as Failed and its execution is terminated. When left as null the job continues execution of all of its indexes and is marked with the Complete Job condition. It can only be specified when backoffLimitPerIndex is set. It can be null or up to completions. It is required and must be less than or equal to 10^4 when completions is greater than 10^5.
podReplacementPolicy (string)
podReplacementPolicy specifies when to create replacement Pods. Possible values are:
TerminatingOrFailed means that we recreate pods when they are terminating (have a metadata.deletionTimestamp) or failed.
Failed means to wait until a previously created Pod is fully terminated (has phase Failed or Succeeded) before creating a replacement Pod.
When using podFailurePolicy, Failed is the only allowed value. TerminatingOrFailed and Failed are allowed values when podFailurePolicy is not in use. This is a beta field. To use this, enable the JobPodReplacementPolicy feature gate. This is on by default.
JobStatus
JobStatus represents the current state of a Job.
startTime (Time)
Represents time when the job controller started processing a job. When a Job is created in the suspended state, this field is not set until the first time it is resumed. This field is reset every time a Job is resumed from suspension. It is represented in RFC3339 form and is in UTC.
Once set, the field can only be removed when the job is suspended. The field cannot be modified while the job is unsuspended or finished.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
completionTime (Time)
Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. The completion time is set when the job finishes successfully, and only then. The value cannot be updated or removed. The value indicates the same or later point in time as the startTime field.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
active (int32)
The number of pending and running pods which are not terminating (without a deletionTimestamp). The value is zero for finished jobs.
failed (int32)
The number of pods which reached phase Failed. The value increases monotonically.
succeeded (int32)
The number of pods which reached phase Succeeded. The value increases monotonically for a given spec. However, it may decrease in reaction to scale down of elastic indexed jobs.
completedIndexes (string)
completedIndexes holds the completed indexes when .spec.completionMode = "Indexed" in a text format. The indexes are represented as decimal integers separated by commas. The numbers are listed in increasing order. Three or more consecutive numbers are compressed and represented by the first and last element of the series, separated by a hyphen. For example, if the completed indexes are 1, 3, 4, 5 and 7, they are represented as "1,3-5,7".
conditions ([]JobCondition)
Patch strategy: merge on key type
Atomic: will be replaced during a merge
The latest available observations of an object's current state. When a Job fails, one of the conditions will have type "Failed" and status true. When a Job is suspended, one of the conditions will have type "Suspended" and status true; when the Job is resumed, the status of this condition will become false. When a Job is completed, one of the conditions will have type "Complete" and status true.
A job is considered finished when it is in a terminal condition, either "Complete" or "Failed". A Job cannot have both the "Complete" and "Failed" conditions. Additionally, it cannot be in the "Complete" and "FailureTarget" conditions. The "Complete", "Failed" and "FailureTarget" conditions cannot be disabled.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of job condition, Complete or Failed.
conditions.lastProbeTime (Time)
Last time the condition was checked.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
Human readable message indicating details about last transition.
conditions.reason (string)
(brief) reason for the condition's last transition.
uncountedTerminatedPods (UncountedTerminatedPods)
uncountedTerminatedPods holds the UIDs of Pods that have terminated but the job controller hasn't yet accounted for in the status counters.
The job controller creates pods with a finalizer. When a pod terminates (succeeded or failed), the controller does three steps to account for it in the job status:
1. Add the pod UID to the arrays in this field.
2. Remove the pod finalizer.
3. Remove the pod UID from the arrays while increasing the corresponding counter.
Old jobs might not be tracked using this field, in which case the field remains null. The structure is empty for finished jobs.
UncountedTerminatedPods holds UIDs of Pods that have terminated but haven't been accounted in Job status counters.
uncountedTerminatedPods.failed ([]string)
Set: unique values will be kept during a merge
failed holds UIDs of failed Pods.
uncountedTerminatedPods.succeeded ([]string)
Set: unique values will be kept during a merge
succeeded holds UIDs of succeeded Pods.
Beta level
ready (int32)
The number of active pods which have a Ready condition and are not terminating (without a deletionTimestamp).
Alpha level
failedIndexes (string)
FailedIndexes holds the failed indexes when spec.backoffLimitPerIndex is set. The indexes are represented in a text format analogous to the completedIndexes field, i.e. they are kept as decimal integers separated by commas. The numbers are listed in increasing order. Three or more consecutive numbers are compressed and represented by the first and last element of the series, separated by a hyphen. For example, if the failed indexes are 1, 3, 4, 5 and 7, they are represented as "1,3-5,7". The set of failed indexes cannot overlap with the set of completed indexes.
terminating (int32)
The number of pods which are terminating (in phase Pending or Running and have a deletionTimestamp).
This field is beta-level. The job controller populates the field when the feature gate JobPodReplacementPolicy is enabled (enabled by default).
timeZone (string)
The time zone name for the given schedule, see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones. If not specified, this will default to the time zone of the kube-controller-manager process. The set of valid time zone names and the time zone offset is loaded from the system-wide time zone database by the API server during CronJob validation and the controller manager during execution. If no system-wide time zone database can be found a bundled version of the database is used instead. If the time zone name becomes invalid during the lifetime of a CronJob or due to a change in host configuration, the controller will stop creating new Jobs and will create a system event with the reason UnknownTimeZone. More information can be found in https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#time-zones
concurrencyPolicy (string)
Specifies how to treat concurrent executions of a Job. Valid values are:
"Allow" (default): allows CronJobs to run concurrently; - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet; - "Replace": cancels currently running job and replaces it with a new one
startingDeadlineSeconds (int64)
Optional deadline in seconds for starting the job if it misses its scheduled time for any reason. Missed job executions will be counted as failed ones.
suspend (boolean)
This flag tells the controller to suspend subsequent executions, it does not apply to already started executions. Defaults to false.
successfulJobsHistoryLimit (int32)
The number of successful finished jobs to retain. Value must be non-negative integer. Defaults to 3.
failedJobsHistoryLimit (int32)
The number of failed finished jobs to retain. Value must be non-negative integer. Defaults to 1.
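A minimal Go sketch tying the CronJob fields above together (using k8s.io/api/batch/v1 and k8s.io/utils/ptr; the schedule, time zone, image and history limits are placeholders):
package main

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

func main() {
	cj := batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "nightly-report"},
		Spec: batchv1.CronJobSpec{
			Schedule:                   "0 3 * * *",
			TimeZone:                   ptr.To("Etc/UTC"),
			ConcurrencyPolicy:          batchv1.ForbidConcurrent, // skip a run if the previous one is still active
			StartingDeadlineSeconds:    ptr.To[int64](300),
			SuccessfulJobsHistoryLimit: ptr.To[int32](3),
			FailedJobsHistoryLimit:     ptr.To[int32](1),
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyOnFailure,
							Containers:    []corev1.Container{{Name: "report", Image: "busybox"}},
						},
					},
				},
			},
		},
	}
	_ = cj
}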
CronJobStatus
CronJobStatus represents the current state of a cron job.
lastScheduleTime (Time)
Information about when the job was last successfully scheduled.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
lastSuccessfulTime (Time)
Information about when the job last successfully completed.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
scaleTargetRef (CrossVersionObjectReference), required
Reference to scaled resource; horizontal pod autoscaler will learn the current resource consumption and will set the desired number of pods by using its Scale subresource.
CrossVersionObjectReference contains enough information to let you identify the referred resource.
minReplicas (int32)
minReplicas is the lower limit for the number of replicas to which the autoscaler can scale down. It defaults to 1 pod. minReplicas is allowed to be 0 if the alpha feature gate HPAScaleToZero is enabled and at least one Object or External metric is configured. Scaling is active as long as at least one metric value is available.
targetCPUUtilizationPercentage (int32)
targetCPUUtilizationPercentage is the target average CPU utilization (represented as a percentage of requested CPU) over all the pods; if not specified the default autoscaling policy will be used.
HorizontalPodAutoscalerStatus
current status of a horizontal pod autoscaler
currentReplicas (int32), required
currentReplicas is the current number of replicas of pods managed by this autoscaler.
desiredReplicas (int32), required
desiredReplicas is the desired number of replicas of pods managed by this autoscaler.
currentCPUUtilizationPercentage (int32)
currentCPUUtilizationPercentage is the current average CPU utilization over all pods, represented as a percentage of requested CPU, e.g. 70 means that an average pod is using now 70% of its requested CPU.
lastScaleTime (Time)
lastScaleTime is the last time the HorizontalPodAutoscaler scaled the number of pods; used by the autoscaler to control how often the number of pods is changed.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
observedGeneration (int64)
observedGeneration is the most recent generation observed by this autoscaler.
HorizontalPodAutoscaler is the configuration for a horizontal pod autoscaler, which automatically manages the replica count of any resource implementing the scale subresource based on the metrics specified.
apiVersion: autoscaling/v2
import "k8s.io/api/autoscaling/v2"
HorizontalPodAutoscaler
HorizontalPodAutoscaler is the configuration for a horizontal pod autoscaler, which automatically manages the replica count of any resource implementing the scale subresource based on the metrics specified.
scaleTargetRef (CrossVersionObjectReference), required
scaleTargetRef points to the target resource to scale, and is used to identify the pods for which metrics should be collected, as well as to actually change the replica count.
CrossVersionObjectReference contains enough information to let you identify the referred resource.
minReplicas (int32)
minReplicas is the lower limit for the number of replicas to which the autoscaler can scale down. It defaults to 1 pod. minReplicas is allowed to be 0 if the alpha feature gate HPAScaleToZero is enabled and at least one Object or External metric is configured. Scaling is active as long as at least one metric value is available.
behavior (HorizontalPodAutoscalerBehavior)
behavior configures the scaling behavior of the target in both Up and Down directions (scaleUp and scaleDown fields respectively). If not set, the default HPAScalingRules for scale up and scale down are used.
HorizontalPodAutoscalerBehavior configures the scaling behavior of the target in both Up and Down directions (scaleUp and scaleDown fields respectively).
behavior.scaleDown (HPAScalingRules)
scaleDown is the scaling policy for scaling down. If not set, the default value is to allow scaling down to minReplicas pods, with a 300 second stabilization window (i.e., the highest recommendation for the last 300sec is used).
*HPAScalingRules configures the scaling behavior for one direction via scaling Policy Rules and a configurable metric tolerance.
Scaling Policy Rules are applied after calculating DesiredReplicas from metrics for the HPA. They can limit the scaling velocity by specifying scaling policies. They can prevent flapping by specifying the stabilization window, so that the number of replicas is not set instantly; instead, the safest value from the stabilization window is chosen.
The tolerance is applied to the metric values and prevents scaling too eagerly for small metric variations. (Note that setting a tolerance requires enabling the alpha HPAConfigurableTolerance feature gate.)*
behavior.scaleDown.policies ([]HPAScalingPolicy)
Atomic: will be replaced during a merge
policies is a list of potential scaling policies which can be used during scaling. If not set, use the default values: - For scale up: allow doubling the number of pods, or an absolute change of 4 pods in a 15s window. - For scale down: allow all pods to be removed in a 15s window.
HPAScalingPolicy is a single policy which must hold true for a specified past interval.
behavior.scaleDown.policies.periodSeconds (int32), required
periodSeconds specifies the window of time for which the policy should hold true. PeriodSeconds must be greater than zero and less than or equal to 1800 (30 min).
behavior.scaleDown.selectPolicy (string)
selectPolicy is used to specify which policy should be used. If not set, the default value Max is used.
behavior.scaleDown.stabilizationWindowSeconds (int32)
stabilizationWindowSeconds is the number of seconds for which past recommendations should be considered while scaling up or scaling down. StabilizationWindowSeconds must be greater than or equal to zero and less than or equal to 3600 (one hour). If not set, use the default values: - For scale up: 0 (i.e. no stabilization is done). - For scale down: 300 (i.e. the stabilization window is 300 seconds long).
behavior.scaleDown.tolerance (Quantity)
tolerance is the tolerance on the ratio between the current and desired metric value under which no updates are made to the desired number of replicas (e.g. 0.01 for 1%). Must be greater than or equal to zero. If not set, the default cluster-wide tolerance is applied (by default 10%).
For example, if autoscaling is configured with a memory consumption target of 100Mi, and scale-down and scale-up tolerances of 5% and 1% respectively, scaling will be triggered when the actual consumption falls below 95Mi or exceeds 101Mi.
This is an alpha field and requires enabling the HPAConfigurableTolerance feature gate.
behavior.scaleUp (HPAScalingRules)
scaleUp is scaling policy for scaling Up. If not set, the default value is the higher of:
increase no more than 4 pods per 60 seconds
double the number of pods per 60 seconds
No stabilization is used.
*HPAScalingRules configures the scaling behavior for one direction via scaling Policy Rules and a configurable metric tolerance.
Scaling Policy Rules are applied after calculating DesiredReplicas from metrics for the HPA. They can limit the scaling velocity by specifying scaling policies. They can prevent flapping by specifying the stabilization window, so that the number of replicas is not set instantly; instead, the safest value from the stabilization window is chosen.
The tolerance is applied to the metric values and prevents scaling too eagerly for small metric variations. (Note that setting a tolerance requires enabling the alpha HPAConfigurableTolerance feature gate.)*
behavior.scaleUp.policies ([]HPAScalingPolicy)
Atomic: will be replaced during a merge
policies is a list of potential scaling policies which can be used during scaling. If not set, use the default values: - For scale up: allow doubling the number of pods, or an absolute change of 4 pods in a 15s window. - For scale down: allow all pods to be removed in a 15s window.
HPAScalingPolicy is a single policy which must hold true for a specified past interval.
behavior.scaleUp.policies.type (string), required
type is used to specify the scaling policy.
behavior.scaleUp.policies.value (int32), required
value contains the amount of change which is permitted by the policy. It must be greater than zero
behavior.scaleUp.policies.periodSeconds (int32), required
periodSeconds specifies the window of time for which the policy should hold true. PeriodSeconds must be greater than zero and less than or equal to 1800 (30 min).
behavior.scaleUp.selectPolicy (string)
selectPolicy is used to specify which policy should be used. If not set, the default value Max is used.
behavior.scaleUp.stabilizationWindowSeconds (int32)
stabilizationWindowSeconds is the number of seconds for which past recommendations should be considered while scaling up or scaling down. StabilizationWindowSeconds must be greater than or equal to zero and less than or equal to 3600 (one hour). If not set, use the default values: - For scale up: 0 (i.e. no stabilization is done). - For scale down: 300 (i.e. the stabilization window is 300 seconds long).
behavior.scaleUp.tolerance (Quantity)
tolerance is the tolerance on the ratio between the current and desired metric value under which no updates are made to the desired number of replicas (e.g. 0.01 for 1%). Must be greater than or equal to zero. If not set, the default cluster-wide tolerance is applied (by default 10%).
For example, if autoscaling is configured with a memory consumption target of 100Mi, and scale-down and scale-up tolerances of 5% and 1% respectively, scaling will be triggered when the actual consumption falls below 95Mi or exceeds 101Mi.
This is an alpha field and requires enabling the HPAConfigurableTolerance feature gate.
metrics ([]MetricSpec)
Atomic: will be replaced during a merge
metrics contains the specifications used to calculate the desired replica count (the maximum replica count across all metrics will be used). The desired replica count is calculated by multiplying the ratio between the target value and the current value by the current number of pods. Ergo, metrics used must decrease as the pod count is increased, and vice-versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the default metric will be set to 80% average CPU utilization.
MetricSpec specifies how to scale based on a single metric (only type and one other matching field should be set at once).
metrics.type (string), required
type is the type of metric source. It should be one of "ContainerResource", "External", "Object", "Pods" or "Resource", each mapping to a matching field in the object.
metrics.containerResource (ContainerResourceMetricSource)
containerResource refers to a resource metric (such as those specified in requests and limits) known to Kubernetes describing a single container in each pod of the current scale target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source.
ContainerResourceMetricSource indicates how to scale on a resource metric known to Kubernetes, as specified in requests and limits, describing each pod in the current scale target (e.g. CPU or memory). The values will be averaged together before being compared to the target. Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source. Only one "target" type should be set.
averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type
value is the target value of the metric (as a quantity).
metrics.external (ExternalMetricSource)
external refers to a global metric that is not associated with any Kubernetes object. It allows autoscaling based on information coming from components running outside of cluster (for example length of queue in cloud messaging service, or QPS from loadbalancer running outside of cluster).
ExternalMetricSource indicates how to scale on a metric not associated with any Kubernetes object (for example length of queue in cloud messaging service, or QPS from loadbalancer running outside of cluster).
selector is the string-encoded form of a standard kubernetes label selector for the given metric. When set, it is passed as an additional parameter to the metrics server for more specific metrics scoping. When unset, just the metricName will be used to gather metrics.
metrics.external.target (MetricTarget), required
target specifies the target value for the given metric
MetricTarget defines the target value, average value, or average utilization of a specific metric
metrics.external.target.type (string), required
type represents whether the metric type is Utilization, Value, or AverageValue
averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type
selector is the string-encoded form of a standard kubernetes label selector for the given metric. When set, it is passed as an additional parameter to the metrics server for more specific metrics scoping. When unset, just the metricName will be used to gather metrics.
metrics.object.target (MetricTarget), required
target specifies the target value for the given metric
MetricTarget defines the target value, average value, or average utilization of a specific metric
metrics.object.target.type (string), required
type represents whether the metric type is Utilization, Value, or AverageValue
metrics.object.target.averageUtilization (int32)
averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type
value is the target value of the metric (as a quantity).
metrics.pods (PodsMetricSource)
pods refers to a metric describing each pod in the current scale target (for example, transactions-processed-per-second). The values will be averaged together before being compared to the target value.
PodsMetricSource indicates how to scale on a metric describing each pod in the current scale target (for example, transactions-processed-per-second). The values will be averaged together before being compared to the target value.
metrics.pods.metric (MetricIdentifier), required
metric identifies the target metric by name and selector
MetricIdentifier defines the name and optionally selector for a metric
selector is the string-encoded form of a standard kubernetes label selector for the given metric. When set, it is passed as an additional parameter to the metrics server for more specific metrics scoping. When unset, just the metricName will be used to gather metrics.
metrics.pods.target (MetricTarget), required
target specifies the target value for the given metric
MetricTarget defines the target value, average value, or average utilization of a specific metric
metrics.pods.target.type (string), required
type represents whether the metric type is Utilization, Value, or AverageValue
metrics.pods.target.averageUtilization (int32)
averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type
value is the target value of the metric (as a quantity).
metrics.resource (ResourceMetricSource)
resource refers to a resource metric (such as those specified in requests and limits) known to Kubernetes describing each pod in the current scale target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source.
ResourceMetricSource indicates how to scale on a resource metric known to Kubernetes, as specified in requests and limits, describing each pod in the current scale target (e.g. CPU or memory). The values will be averaged together before being compared to the target. Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source. Only one "target" type should be set.
metrics.resource.name (string), required
name is the name of the resource in question.
metrics.resource.target (MetricTarget), required
target specifies the target value for the given metric
MetricTarget defines the target value, average value, or average utilization of a specific metric
metrics.resource.target.type (string), required
type represents whether the metric type is Utilization, Value, or AverageValue
averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type
value is the target value of the metric (as a quantity).
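To see how scaleTargetRef, metrics and behavior combine in a single object, here is a minimal Go sketch (using k8s.io/api/autoscaling/v2 and k8s.io/utils/ptr; the target Deployment name and all numbers are placeholders) of an HPA that targets 80% average CPU utilization and scales down cautiously:
package main

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

func main() {
	hpa := autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "web"},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "web",
			},
			MinReplicas: ptr.To[int32](2),
			MaxReplicas: 10,
			Metrics: []autoscalingv2.MetricSpec{{
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: corev1.ResourceCPU,
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: ptr.To[int32](80),
					},
				},
			}},
			Behavior: &autoscalingv2.HorizontalPodAutoscalerBehavior{
				ScaleDown: &autoscalingv2.HPAScalingRules{
					// Consider the last 5 minutes of recommendations and
					// remove at most one pod per minute.
					StabilizationWindowSeconds: ptr.To[int32](300),
					Policies: []autoscalingv2.HPAScalingPolicy{{
						Type:          autoscalingv2.PodsScalingPolicy,
						Value:         1,
						PeriodSeconds: 60,
					}},
				},
			},
		},
	}
	_ = hpa
}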
HorizontalPodAutoscalerStatus
HorizontalPodAutoscalerStatus describes the current status of a horizontal pod autoscaler.
desiredReplicas (int32), required
desiredReplicas is the desired number of replicas of pods managed by this autoscaler, as last calculated by the autoscaler.
conditions ([]HorizontalPodAutoscalerCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
conditions is the set of conditions required for this autoscaler to scale its target, and indicates whether or not those conditions are met.
HorizontalPodAutoscalerCondition describes the state of a HorizontalPodAutoscaler at a certain point.
conditions.status (string), required
status is the status of the condition (True, False, Unknown)
conditions.type (string), required
type describes the current condition
conditions.lastTransitionTime (Time)
lastTransitionTime is the last time the condition transitioned from one status to another
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
message is a human-readable explanation containing details about the transition
conditions.reason (string)
reason is the reason for the condition's last transition.
currentMetrics ([]MetricStatus)
Atomic: will be replaced during a merge
currentMetrics is the last read state of the metrics used by this autoscaler.
MetricStatus describes the last-read state of a single metric.
currentMetrics.type (string), required
type is the type of metric source. It will be one of "ContainerResource", "External", "Object", "Pods" or "Resource", each corresponds to a matching field in the object.
container resource refers to a resource metric (such as those specified in requests and limits) known to Kubernetes describing a single container in each pod in the current scale target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source.
ContainerResourceMetricStatus indicates the current value of a resource metric known to Kubernetes, as specified in requests and limits, describing a single container in each pod in the current scale target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source.
currentAverageUtilization is the current value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods.
external refers to a global metric that is not associated with any Kubernetes object. It allows autoscaling based on information coming from components running outside of cluster (for example length of queue in cloud messaging service, or QPS from loadbalancer running outside of cluster).
ExternalMetricStatus indicates the current value of a global metric not associated with any Kubernetes object.
currentAverageUtilization is the current value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods.
selector is the string-encoded form of a standard kubernetes label selector for the given metric. When set, it is passed as an additional parameter to the metrics server for more specific metrics scoping. When unset, just the metricName will be used to gather metrics.
currentMetrics.object (ObjectMetricStatus)
object refers to a metric describing a single kubernetes object (for example, hits-per-second on an Ingress object).
ObjectMetricStatus indicates the current value of a metric describing a kubernetes object (for example, hits-per-second on an Ingress object).
currentAverageUtilization is the current value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods.
selector is the string-encoded form of a standard kubernetes label selector for the given metric. When set, it is passed as an additional parameter to the metrics server for more specific metrics scoping. When unset, just the metricName will be used to gather metrics.
currentMetrics.pods (PodsMetricStatus)
pods refers to a metric describing each pod in the current scale target (for example, transactions-processed-per-second). The values will be averaged together before being compared to the target value.
PodsMetricStatus indicates the current value of a metric describing each pod in the current scale target (for example, transactions-processed-per-second).
currentAverageUtilization is the current value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods.
selector is the string-encoded form of a standard kubernetes label selector for the given metric. When set, it is passed as an additional parameter to the metrics server for more specific metrics scoping. When unset, just the metricName will be used to gather metrics.
currentMetrics.resource (ResourceMetricStatus)
resource refers to a resource metric (such as those specified in requests and limits) known to Kubernetes describing each pod in the current scale target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source.
ResourceMetricStatus indicates the current value of a resource metric known to Kubernetes, as specified in requests and limits, describing each pod in the current scale target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and have special scaling options on top of those available to normal per-pod metrics using the "pods" source.
currentAverageUtilization is the current value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods.
value is the current value of the metric (as a quantity).
currentMetrics.resource.name (string), required
name is the name of the resource in question.
currentReplicas (int32)
currentReplicas is current number of replicas of pods managed by this autoscaler, as last seen by the autoscaler.
lastScaleTime (Time)
lastScaleTime is the last time the HorizontalPodAutoscaler scaled the number of pods, used by the autoscaler to control how often the number of pods is changed.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
observedGeneration (int64)
observedGeneration is the most recent generation observed by this autoscaler.
HorizontalPodAutoscalerList
HorizontalPodAutoscalerList is a list of horizontal pod autoscaler objects.
value (int32), required
value represents the integer value of this priority class. This is the actual priority that pods receive when they have the name of this class in their pod spec.
description (string)
description is an arbitrary string that usually provides guidelines on when this priority class should be used.
globalDefault (boolean)
globalDefault specifies whether this PriorityClass should be considered as the default priority for pods that do not have any priority class. Only one PriorityClass can be marked as globalDefault. However, if more than one PriorityClass exists with globalDefault set to true, the smallest value of such global default PriorityClasses will be used as the default priority.
preemptionPolicy (string)
preemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset.
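A minimal Go sketch of a PriorityClass (using k8s.io/api/scheduling/v1; the name, value and description are placeholders):
package main

import (
	corev1 "k8s.io/api/core/v1"
	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	preempt := corev1.PreemptLowerPriority
	pc := schedulingv1.PriorityClass{
		ObjectMeta:       metav1.ObjectMeta{Name: "high-priority"},
		Value:            100000, // pods referencing this class get priority 100000
		GlobalDefault:    false,  // do not apply to pods without a priorityClassName
		Description:      "For latency-critical workloads only.",
		PreemptionPolicy: &preempt,
	}
	_ = pc
}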
PriorityClassList
PriorityClassList is a collection of priority classes.
DeviceTaintRule adds one taint to all devices which match the selector.
apiVersion: resource.k8s.io/v1alpha3
import "k8s.io/api/resource/v1alpha3"
DeviceTaintRule
DeviceTaintRule adds one taint to all devices which match the selector. This has the same effect as if the taint was specified directly in the ResourceSlice by the DRA driver.
Changing the spec automatically increments the metadata.generation number.
DeviceTaintRuleSpec
DeviceTaintRuleSpec specifies the selector and one taint.
taint (DeviceTaint), required
The taint that gets applied to matching devices.
The device this taint is attached to has the "effect" on any claim which does not tolerate the taint and, through the claim, on pods using the claim.
taint.effect (string), required
The effect of the taint on claims that do not tolerate the taint and through such claims on the pods using them. Valid effects are NoSchedule and NoExecute. PreferNoSchedule as used for nodes is not valid here.
taint.key (string), required
The taint key to be applied to a device. Must be a label name.
taint.timeAdded (Time)
TimeAdded represents the time at which the taint was added. Added automatically during create or update if not set.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
taint.value (string)
The taint value corresponding to the taint key. Must be a label value.
deviceSelector (DeviceTaintSelector)
DeviceSelector defines which device(s) the taint is applied to. All selector criteria must be satisfied for a device to match. The empty selector matches all devices. Without a selector, no devices are matched.
DeviceTaintSelector defines which device(s) a DeviceTaintRule applies to. The empty selector matches all devices. Without a selector, no devices are matched.
deviceSelector.device (string)
If device is set, only devices with that name are selected. This field corresponds to slice.spec.devices[].name.
Also setting driver and pool may be needed to avoid ambiguity, but this is not required.
deviceSelector.deviceClassName (string)
If DeviceClassName is set, the selectors defined there must be satisfied by a device to be selected. This field corresponds to class.metadata.name.
deviceSelector.driver (string)
If driver is set, only devices from that driver are selected. This field corresponds to slice.spec.driver.
deviceSelector.pool (string)
If pool is set, only devices in that pool are selected.
Also setting the driver name may be useful to avoid ambiguity when different drivers use the same pool name, but this is not required because selecting pools from different drivers may also be useful, for example when drivers with node-local devices use the node name as their pool name.
deviceSelector.selectors ([]DeviceSelector)
Atomic: will be replaced during a merge
Selectors contains the same selection criteria as a ResourceClaim. Currently, CEL expressions are supported. All of these selectors must be satisfied.
DeviceSelector must have exactly one field set.
deviceSelector.selectors.cel (CELDeviceSelector)
CEL contains a CEL expression for selecting a device.
CELDeviceSelector contains a CEL expression for selecting a device.
deviceSelector.selectors.cel.expression (string), required
Expression is a CEL expression which evaluates a single device. It must evaluate to true when the device under consideration satisfies the desired criteria, and false when it does not. Any other result is an error and causes allocation of devices to abort.
The expression's input is an object named "device", which carries the following properties:
driver (string): the name of the driver which defines this device.
attributes (map[string]object): the device's attributes, grouped by prefix
(e.g. device.attributes["dra.example.com"] evaluates to an object with all
of the attributes which were prefixed by "dra.example.com").
capacity (map[string]object): the device's capacities, grouped by prefix.
Example: Consider a device with driver="dra.example.com", which exposes two attributes named "model" and "ext.example.com/family" and which exposes one capacity named "modules". The input to this expression would have the following fields (a sketch derived from those names):
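device.driver
device.attributes["dra.example.com"].model
device.attributes["ext.example.com"].family
device.capacity["dra.example.com"].modules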
The device.driver field can be used to check for a specific driver, either as a high-level precondition (i.e. you only want to consider devices from this driver) or as part of a multi-clause expression that is meant to consider devices from different drivers.
The value type of each attribute is defined by the device definition, and users who write these expressions must consult the documentation for their specific drivers. The value type of each capacity is Quantity.
If an unknown prefix is used as a lookup in either device.attributes or device.capacity, an empty map will be returned. Any reference to an unknown field will cause an evaluation error and allocation to abort.
A robust expression should check for the existence of attributes before referencing them.
For ease of use, the cel.bind() function is enabled, and can be used to simplify expressions that access multiple attributes with the same domain. For example:
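// A sketch with placeholder attribute names (someBool, anotherAttribute):
cel.bind(dra, device.attributes["dra.example.com"], dra.someBool && dra.anotherAttribute)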
The length of the expression must be smaller or equal to 10 Ki. The cost of evaluating it is also limited based on the estimated number of logical steps.
DeviceTaintRuleList
DeviceTaintRuleList is a collection of DeviceTaintRules.
ResourceClaim describes a request for access to resources in the cluster, for use by workloads.
apiVersion: resource.k8s.io/v1beta2
import "k8s.io/api/resource/v1beta2"
ResourceClaim
ResourceClaim describes a request for access to resources in the cluster, for use by workloads. For example, if a workload needs an accelerator device with specific properties, this is how that request is expressed. The status stanza tracks whether this claim has been satisfied and what specific resources have been allocated.
This is an alpha type and requires enabling the DynamicResourceAllocation feature gate.
Parameters can contain arbitrary data. It is the responsibility of the driver developer to handle validation and versioning. Typically this includes self-identification and a version ("kind" + "apiVersion" for Kubernetes types), with conversion between different versions.
The length of the raw data must be smaller or equal to 10 Ki.
RawExtension is used to hold extensions in external versions.
To use this, make a field which has RawExtension as its type in your external, versioned struct, and Object in your internal struct. You also need to register your various plugin types.
// Internal package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.Object `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// External package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.RawExtension `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// On the wire, the JSON will look something like this:
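// (illustrative sketch; the values, such as "foo", are placeholders)
{
  "kind": "MyAPIObject",
  "apiVersion": "v1",
  "myPlugin": {
    "kind": "PluginA",
    "aOption": "foo"
  }
}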
So what happens? Decode first uses json or yaml to unmarshal the serialized data into your external MyAPIObject. That causes the raw JSON to be stored, but not unpacked. The next step is to copy (using pkg/conversion) into the internal struct. The runtime package's DefaultScheme has conversion functions installed which will unpack the JSON stored in RawExtension, turning it into the correct object type, and storing it in the Object. (TODO: In the case where the object is of an unknown type, a runtime.Unknown object will be created and stored.)
devices.config.requests ([]string)
Atomic: will be replaced during a merge
Requests lists the names of requests where the configuration applies. If empty, it applies to all requests.
References to subrequests must include the name of the main request and may include the subrequest using the format <main request>[/<subrequest>]. If just the main request is given, the configuration applies to all subrequests.
devices.constraints ([]DeviceConstraint)
Atomic: will be replaced during a merge
These constraints must be satisfied by the set of devices that get allocated for the claim.
DeviceConstraint must have exactly one field set besides Requests.
devices.constraints.matchAttribute (string)
MatchAttribute requires that all devices in question have this attribute and that its type and value are the same across those devices.
For example, if you specified "dra.example.com/numa" (a hypothetical example!), then only devices in the same NUMA node will be chosen. A device which does not have that attribute will not be chosen. All devices should use a value of the same type for this attribute because that is part of its specification, but if one device doesn't, then it also will not be chosen.
Must include the domain qualifier.
devices.constraints.requests ([]string)
Atomic: will be replaced during a merge
Requests is a list of the one or more requests in this claim which must co-satisfy this constraint. If a request is fulfilled by multiple devices, then all of the devices must satisfy the constraint. If this is not specified, this constraint applies to all requests in this claim.
References to subrequests must include the name of the main request and may include the subrequest using the format <main request>[/<subrequest>]. If just the main request is given, the constraint applies to all subrequests.
devices.requests ([]DeviceRequest)
Atomic: will be replaced during a merge
Requests represent individual requests for distinct devices which must all be satisfied. If empty, nothing needs to be allocated.
DeviceRequest is a request for devices required for a claim. This is typically a request for a single resource like a device, but can also ask for several identical devices. With FirstAvailable it is also possible to provide a prioritized list of requests.
devices.requests.name (string), required
Name can be used to reference this request in a pod.spec.containers[].resources.claims entry and in a constraint of the claim.
References using the name in the DeviceRequest will uniquely identify a request when the Exactly field is set. When the FirstAvailable field is set, a reference to the name of the DeviceRequest will match whatever subrequest is chosen by the scheduler.
Must be a DNS label.
devices.requests.exactly (ExactDeviceRequest)
Exactly specifies the details for a single request that must be met exactly for the request to be satisfied.
One of Exactly or FirstAvailable must be set.
ExactDeviceRequest is a request for one or more identical devices.
DeviceClassName references a specific DeviceClass, which can define additional configuration and selectors to be inherited by this request.
A DeviceClassName is required.
Administrators may use this to restrict which devices may get requested by only installing classes with selectors for permitted devices. If users are free to request anything without restrictions, then administrators can create an empty DeviceClass for users to reference.
devices.requests.exactly.adminAccess (boolean)
AdminAccess indicates that this is a claim for administrative access to the device(s). Claims with AdminAccess are expected to be used for monitoring or other management services for a device. They ignore all ordinary claims to the device with respect to access modes and any resource allocations.
This is an alpha field and requires enabling the DRAAdminAccess feature gate. Admin access is disabled if this field is unset or set to false, otherwise it is enabled.
devices.requests.exactly.allocationMode (string)
AllocationMode and its related fields define how devices are allocated to satisfy this request. Supported values are:
ExactCount: This request is for a specific number of devices. This is the default. The exact number is provided in the count field.
All: This request is for all of the matching devices in a pool. At least one device must exist on the node for the allocation to succeed. Allocation will fail if some devices are already allocated, unless adminAccess is requested.
If AllocationMode is not specified, the default mode is ExactCount. If the mode is ExactCount and count is not specified, the default count is one. Any other requests must specify this field.
More modes may get added in the future. Clients must refuse to handle requests with unknown modes.
devices.requests.exactly.count (int64)
Count is used only when the count mode is "ExactCount". Must be greater than zero. If AllocationMode is ExactCount and this field is not specified, the default is one.
Selectors define criteria which must be satisfied by a specific device in order for that device to be considered for this request. All selectors must be satisfied for a device to be considered.
Expression is a CEL expression which evaluates a single device. It must evaluate to true when the device under consideration satisfies the desired criteria, and false when it does not. Any other result is an error and causes allocation of devices to abort.
The expression's input is an object named "device", which carries the following properties:
driver (string): the name of the driver which defines this device.
attributes (map[string]object): the device's attributes, grouped by prefix (e.g. device.attributes["dra.example.com"] evaluates to an object with all of the attributes which were prefixed by "dra.example.com").
capacity (map[string]object): the device's capacities, grouped by prefix.
Example: Consider a device with driver="dra.example.com", which exposes two attributes named "model" and "ext.example.com/family" and which exposes one capacity named "modules". The input to this expression would have the following fields:
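// "model" carries no domain prefix, so it appears under the driver's own domain.
device.driver
device.attributes["dra.example.com"].model
device.attributes["ext.example.com"].family
device.capacity["dra.example.com"].modules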
The device.driver field can be used to check for a specific driver, either as a high-level precondition (i.e. you only want to consider devices from this driver) or as part of a multi-clause expression that is meant to consider devices from different drivers.
The value type of each attribute is defined by the device definition, and users who write these expressions must consult the documentation for their specific drivers. The value type of each capacity is Quantity.
If an unknown prefix is used as a lookup in either device.attributes or device.capacity, an empty map will be returned. Any reference to an unknown field will cause an evaluation error and allocation to abort.
A robust expression should check for the existence of attributes before referencing them.
For ease of use, the cel.bind() function is enabled, and can be used to simplify expressions that access multiple attributes with the same domain. For example:
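// A sketch with placeholder attribute names (someBool, anotherAttribute):
cel.bind(dra, device.attributes["dra.example.com"], dra.someBool && dra.anotherAttribute)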
The length of the expression must be smaller or equal to 10 Ki. The cost of evaluating it is also limited based on the estimated number of logical steps.
Tolerations for NoSchedule are required to allocate a device which has a taint with that effect. The same applies to NoExecute.
In addition, should any of the allocated devices get tainted with NoExecute after allocation and that effect is not tolerated, then all pods consuming the ResourceClaim get deleted to evict them. The scheduler will not let new pods reserve the claim while it has these tainted devices. Once all pods are evicted, the claim will get deallocated.
The maximum number of tolerations is 16.
This is an alpha field and requires enabling the DRADeviceTaints feature gate.
The ResourceClaim this DeviceToleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.
Effect indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are NoSchedule and NoExecute.
devices.requests.exactly.tolerations.key (string)
Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys. Must be a label name.
Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a ResourceClaim can tolerate all taints of a particular category.
TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system. If larger than zero, the time when the pod needs to be evicted is calculated as <time when taint was added> + <toleration seconds>.
Value is the taint value the toleration matches to. If the operator is Exists, the value must be empty, otherwise just a regular string. Must be a label value.
FirstAvailable contains subrequests, of which exactly one will be selected by the scheduler. It tries to satisfy them in the order in which they are listed here. So if there are two entries in the list, the scheduler will only check the second one if it determines that the first one can not be used.
DRA does not yet implement scoring, so the scheduler will select the first set of devices that satisfies all the requests in the claim. And if the requirements can be satisfied on more than one node, other scheduling features will determine which node is chosen. This means that the set of devices allocated to a claim might not be the optimal set available to the cluster. Scoring will be implemented later.
DeviceSubRequest describes a request for device provided in the claim.spec.devices.requests[].firstAvailable array. Each is typically a request for a single resource like a device, but can also ask for several identical devices.
DeviceSubRequest is similar to ExactDeviceRequest, but doesn't expose the AdminAccess field as that one is only supported when requesting a specific device.
DeviceClassName references a specific DeviceClass, which can define additional configuration and selectors to be inherited by this subrequest.
A class is required. Which classes are available depends on the cluster.
Administrators may use this to restrict which devices may get requested by only installing classes with selectors for permitted devices. If users are free to request anything without restrictions, then administrators can create an empty DeviceClass for users to reference.
Name can be used to reference this subrequest in the list of constraints or the list of configurations for the claim. References must use the format <main request>/<subrequest>.
AllocationMode and its related fields define how devices are allocated to satisfy this subrequest. Supported values are:
ExactCount: This request is for a specific number of devices. This is the default. The exact number is provided in the count field.
All: This subrequest is for all of the matching devices in a pool. Allocation will fail if some devices are already allocated, unless adminAccess is requested.
If AllocationMode is not specified, the default mode is ExactCount. If the mode is ExactCount and count is not specified, the default count is one. Any other subrequests must specify this field.
More modes may get added in the future. Clients must refuse to handle requests with unknown modes.
devices.requests.firstAvailable.count (int64)
Count is used only when the count mode is "ExactCount". Must be greater than zero. If AllocationMode is ExactCount and this field is not specified, the default is one.
Selectors define criteria which must be satisfied by a specific device in order for that device to be considered for this subrequest. All selectors must be satisfied for a device to be considered.
Expression is a CEL expression which evaluates a single device. It must evaluate to true when the device under consideration satisfies the desired criteria, and false when it does not. Any other result is an error and causes allocation of devices to abort.
The expression's input is an object named "device", which carries the following properties:
driver (string): the name of the driver which defines this device.
attributes (map[string]object): the device's attributes, grouped by prefix (e.g. device.attributes["dra.example.com"] evaluates to an object with all of the attributes which were prefixed by "dra.example.com").
capacity (map[string]object): the device's capacities, grouped by prefix.
Example: Consider a device with driver="dra.example.com", which exposes two attributes named "model" and "ext.example.com/family" and which exposes one capacity named "modules". The input to this expression would have the following fields:
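// "model" carries no domain prefix, so it appears under the driver's own domain.
device.driver
device.attributes["dra.example.com"].model
device.attributes["ext.example.com"].family
device.capacity["dra.example.com"].modules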
The device.driver field can be used to check for a specific driver, either as a high-level precondition (i.e. you only want to consider devices from this driver) or as part of a multi-clause expression that is meant to consider devices from different drivers.
The value type of each attribute is defined by the device definition, and users who write these expressions must consult the documentation for their specific drivers. The value type of each capacity is Quantity.
If an unknown prefix is used as a lookup in either device.attributes or device.capacity, an empty map will be returned. Any reference to an unknown field will cause an evaluation error and allocation to abort.
A robust expression should check for the existence of attributes before referencing them.
For ease of use, the cel.bind() function is enabled, and can be used to simplify expressions that access multiple attributes with the same domain. For example:
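// A sketch with placeholder attribute names (someBool, anotherAttribute):
cel.bind(dra, device.attributes["dra.example.com"], dra.someBool && dra.anotherAttribute)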
The length of the expression must be smaller or equal to 10 Ki. The cost of evaluating it is also limited based on the estimated number of logical steps.
Tolerations for NoSchedule are required to allocate a device which has a taint with that effect. The same applies to NoExecute.
In addition, should any of the allocated devices get tainted with NoExecute after allocation and that effect is not tolerated, then all pods consuming the ResourceClaim get deleted to evict them. The scheduler will not let new pods reserve the claim while it has these tainted devices. Once all pods are evicted, the claim will get deallocated.
The maximum number of tolerations is 16.
This is an alpha field and requires enabling the DRADeviceTaints feature gate.
The ResourceClaim this DeviceToleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.
Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys. Must be a label name.
Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a ResourceClaim can tolerate all taints of a particular category.
TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system. If larger than zero, the time when the pod needs to be evicted is calculated as <time when taint was added> + <toleration seconds>.
Value is the taint value the toleration matches to. If the operator is Exists, the value must be empty, otherwise just a regular string. Must be a label value.
ResourceClaimStatus
ResourceClaimStatus tracks whether the resource has been allocated and what the result of that was.
allocation (AllocationResult)
Allocation is set once the claim has been allocated successfully.
AllocationResult contains attributes of an allocated resource.
allocation.devices (DeviceAllocationResult)
Devices is the result of allocating devices.
DeviceAllocationResult is the result of allocating devices.
This field is a combination of all the claim and class configuration parameters. Drivers can distinguish between those based on a flag.
This includes configuration parameters for drivers which have no allocated devices in the result because it is up to the drivers which configuration parameters they support. They can silently ignore unknown configuration parameters.
DeviceAllocationConfiguration gets embedded in an AllocationResult.
Parameters can contain arbitrary data. It is the responsibility of the driver developer to handle validation and versioning. Typically this includes self-identification and a version ("kind" + "apiVersion" for Kubernetes types), with conversion between different versions.
The length of the raw data must be smaller or equal to 10 Ki.
RawExtension is used to hold extensions in external versions.
To use this, make a field which has RawExtension as its type in your external, versioned struct, and Object in your internal struct. You also need to register your various plugin types.
// Internal package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.Object `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// External package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.RawExtension `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// On the wire, the JSON will look something like this:
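// (illustrative sketch; the values, such as "foo", are placeholders)
{
  "kind": "MyAPIObject",
  "apiVersion": "v1",
  "myPlugin": {
    "kind": "PluginA",
    "aOption": "foo"
  }
}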
So what happens? Decode first uses json or yaml to unmarshal the serialized data into your external MyAPIObject. That causes the raw JSON to be stored, but not unpacked. The next step is to copy (using pkg/conversion) into the internal struct. The runtime package's DefaultScheme has conversion functions installed which will unpack the JSON stored in RawExtension, turning it into the correct object type, and storing it in the Object. (TODO: In the case where the object is of an unknown type, a runtime.Unknown object will be created and stored.)
allocation.devices.config.requests ([]string)
Atomic: will be replaced during a merge
Requests lists the names of requests where the configuration applies. If empty, it applies to all requests.
References to subrequests must include the name of the main request and may include the subrequest using the format <main request>[/<subrequest>]. If just the main request is given, the configuration applies to all subrequests.
Request is the name of the request in the claim which caused this device to be allocated. If it references a subrequest in the firstAvailable list on a DeviceRequest, this field must include both the name of the main request and the subrequest using the format <main request>/<subrequest>.
Multiple devices may have been allocated per request.
allocation.devices.results.adminAccess (boolean)
AdminAccess indicates that this device was allocated for administrative access. See the corresponding request field for a definition of mode.
This is an alpha field and requires enabling the DRAAdminAccess feature gate. Admin access is disabled if this field is unset or set to false, otherwise it is enabled.
Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys. Must be a label name.
Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a ResourceClaim can tolerate all taints of a particular category.
TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system. If larger than zero, the time when the pod needs to be evicted is calculated as <time when taint was added> + <toleration seconds>.
Value is the taint value the toleration matches to. If the operator is Exists, the value must be empty, otherwise just a regular string. Must be a label value.
allocation.nodeSelector (NodeSelector)
NodeSelector defines where the allocated resources are available. If unset, they are available everywhere.
A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms.
Required. A list of node selector terms. The terms are ORed.
A null or empty node selector term matches no objects. The requirements of them are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
A list of node selector requirements by node's fields.
devices ([]AllocatedDeviceStatus)
Map: unique values on keys driver, device, pool will be kept during a merge
Devices contains the status of each device allocated for this claim, as reported by the driver. This can include driver-specific information. Entries are owned by their respective drivers.
AllocatedDeviceStatus contains the status of an allocated device, if the driver chooses to report it. This may include driver-specific information.
devices.device (string), required
Device references one device instance via its name in the driver's resource pool. It must be a DNS label.
devices.driver (string), required
Driver specifies the name of the DRA driver whose kubelet plugin should be invoked to process the allocation once the claim is needed on a node.
Must be a DNS subdomain and should end with a DNS domain owned by the vendor of the driver.
devices.pool (string), required
This name together with the driver name and the device name field identify which device was allocated (<driver name>/<pool name>/<device name>).
Must not be longer than 253 characters and may contain one or more DNS sub-domains separated by slashes.
devices.conditions ([]Condition)
Map: unique values on key type will be kept during a merge
Conditions contains the latest observation of the device's state. If the device has been configured according to the class and claim config references, the Ready condition should be True.
Must not contain more than 8 entries.
Condition contains details for one aspect of the current state of this API Resource.
lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
devices.conditions.message (string), required
message is a human readable message indicating details about the transition. This may be an empty string.
devices.conditions.reason (string), required
reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
devices.conditions.status (string), required
status of the condition, one of True, False, Unknown.
devices.conditions.type (string), required
type of condition in CamelCase or in foo.example.com/CamelCase.
devices.conditions.observedGeneration (int64)
observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
devices.data (RawExtension)
Data contains arbitrary driver-specific data.
The length of the raw data must be smaller or equal to 10 Ki.
RawExtension is used to hold extensions in external versions.
To use this, make a field which has RawExtension as its type in your external, versioned struct, and Object in your internal struct. You also need to register your various plugin types.
// Internal package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.Object `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// External package:
type MyAPIObject struct {
runtime.TypeMeta `json:",inline"`
MyPlugin runtime.RawExtension `json:"myPlugin"`
}
type PluginA struct {
AOption string `json:"aOption"`
}
// On the wire, the JSON will look something like this:
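// (illustrative sketch; the values, such as "foo", are placeholders)
{
  "kind": "MyAPIObject",
  "apiVersion": "v1",
  "myPlugin": {
    "kind": "PluginA",
    "aOption": "foo"
  }
}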
So what happens? Decode first uses json or yaml to unmarshal the serialized data into your external MyAPIObject. That causes the raw JSON to be stored, but not unpacked. The next step is to copy (using pkg/conversion) into the internal struct. The runtime package's DefaultScheme has conversion functions installed which will unpack the JSON stored in RawExtension, turning it into the correct object type, and storing it in the Object. (TODO: In the case where the object is of an unknown type, a runtime.Unknown object will be created and stored.)
devices.networkData (NetworkDeviceData)
NetworkData contains network-related information specific to the device.
NetworkDeviceData provides network-related details for the allocated device. This information may be filled by drivers or other components to configure or identify the device within a network context.
devices.networkData.hardwareAddress (string)
HardwareAddress represents the hardware address (e.g. MAC Address) of the device's network interface.
Must not be longer than 128 characters.
devices.networkData.interfaceName (string)
InterfaceName specifies the name of the network interface associated with the allocated device. This might be the name of a physical or virtual network interface being configured in the pod.
Must not be longer than 256 characters.
devices.networkData.ips ([]string)
Atomic: will be replaced during a merge
IPs lists the network addresses assigned to the device's network interface. This can include both IPv4 and IPv6 addresses. The IPs are in the CIDR notation, which includes both the address and the associated subnet mask. e.g.: "192.0.2.5/24" for IPv4 and "2001:db8::5/64" for IPv6.
reservedFor ([]ResourceClaimConsumerReference)
Patch strategy: merge on key uid
Map: unique values on key uid will be kept during a merge
ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.
In a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.
Both schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.
There can be at most 256 such reservations. This may get increased in the future, but not reduced.
ResourceClaimConsumerReference contains enough information to let you locate the consumer of a ResourceClaim. The user must be a resource in the same namespace as the ResourceClaim.
reservedFor.name (string), required
Name is the name of resource being referenced.
reservedFor.resource (string), required
Resource is the type of resource being referenced, for example "pods".
reservedFor.uid (string), required
UID identifies exactly one incarnation of the resource.
reservedFor.apiGroup (string)
APIGroup is the group for the resource being referenced. It is empty for the core API. This matches the group in the APIVersion that is used when creating the resources.
Spec for the ResourceClaim. The entire content is copied unchanged into the ResourceClaim that gets created from this template. The same fields as in a ResourceClaim are also valid here.
ObjectMeta may contain labels and annotations that will be copied into the ResourceClaim when creating it. No other fields are allowed and will be rejected during validation.
ResourceClaimTemplateList
ResourceClaimTemplateList is a collection of claim templates.
ResourceSlice represents one or more resources in a pool of similar resources, managed by a common driver.
apiVersion: resource.k8s.io/v1beta2
import "k8s.io/api/resource/v1beta2"
ResourceSlice
ResourceSlice represents one or more resources in a pool of similar resources, managed by a common driver. A pool may span more than one ResourceSlice, and exactly how many ResourceSlices comprise a pool is determined by the driver.
At the moment, the only supported resources are devices with attributes and capacities. Each device in a given pool, regardless of how many ResourceSlices, must have a unique name. The ResourceSlice in which a device gets published may change over time. The unique identifier for a device is the tuple <driver name>, <pool name>, <device name>.
Whenever a driver needs to update a pool, it increments the pool.Spec.Pool.Generation number and updates all ResourceSlices with that new number and new resource definitions. A consumer must only use ResourceSlices with the highest generation number and ignore all others.
When allocating all resources in a pool matching certain criteria or when looking for the best solution among several different alternatives, a consumer should check the number of ResourceSlices in a pool (included in each ResourceSlice) to determine whether its view of a pool is complete and if not, should wait until the driver has completed updating the pool.
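A consumer-side check can be sketched roughly as follows; the poolSlice struct is a stand-in that merely mirrors the pool name, generation, and ResourceSliceCount fields described here, not an actual API type:

package example

// poolSlice is a stand-in for the pool-related fields of a ResourceSlice spec.
type poolSlice struct {
    PoolName           string
    Generation         int64
    ResourceSliceCount int64
}

// poolComplete reports whether the observed slices of one pool form a complete
// view: every ResourceSlice of the highest generation has been seen.
func poolComplete(slices []poolSlice) bool {
    var highest int64
    for _, s := range slices {
        if s.Generation > highest {
            highest = s.Generation
        }
    }
    var seen, want int64
    for _, s := range slices {
        if s.Generation != highest {
            continue // stale data from an older generation is ignored
        }
        seen++
        want = s.ResourceSliceCount
    }
    return want > 0 && seen == want
}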
For resources that are not local to a node, the node name is not set. Instead, the driver may use a node selector to specify where the devices are available.
This is an alpha type and requires enabling the DynamicResourceAllocation feature gate.
Changing the spec automatically increments the metadata.generation number.
ResourceSliceSpec
ResourceSliceSpec contains the information published by the driver in one ResourceSlice.
driver (string), required
Driver identifies the DRA driver providing the capacity information. A field selector can be used to list only ResourceSlice objects with a certain driver name.
Must be a DNS subdomain and should end with a DNS domain owned by the vendor of the driver. This field is immutable.
pool (ResourcePool), required
Pool describes the pool that this ResourceSlice belongs to.
ResourcePool describes the pool that ResourceSlices belong to.
pool.generation (int64), required
Generation tracks the change in a pool over time. Whenever a driver changes something about one or more of the resources in a pool, it must change the generation in all ResourceSlices which are part of that pool. Consumers of ResourceSlices should only consider resources from the pool with the highest generation number. The generation may be reset by drivers, which should be fine for consumers, assuming that all ResourceSlices in a pool are updated to match or deleted.
Combined with ResourceSliceCount, this mechanism enables consumers to detect pools which are comprised of multiple ResourceSlices and are in an incomplete state.
pool.name (string), required
Name is used to identify the pool. For node-local devices, this is often the node name, but this is not required.
It must not be longer than 253 characters and must consist of one or more DNS sub-domains separated by slashes. This field is immutable.
pool.resourceSliceCount (int64), required
ResourceSliceCount is the total number of ResourceSlices in the pool at this generation number. Must be greater than zero.
Consumers can use this to check whether they have seen all ResourceSlices belonging to the same pool.
allNodes (boolean)
AllNodes indicates that all nodes have access to the resources in the pool.
Exactly one of NodeName, NodeSelector, AllNodes, and PerDeviceNodeSelection must be set.
devices ([]Device)
Atomic: will be replaced during a merge
Devices lists some or all of the devices in this pool.
Must not have more than 128 entries.
Device represents one individual hardware instance that can be selected based on its attributes. Besides the name, exactly one field must be set.
devices.name (string), required
Name is unique identifier among all devices managed by the driver in the pool. It must be a DNS label.
devices.allNodes (boolean)
AllNodes indicates that all nodes have access to the device.
Must only be set if Spec.PerDeviceNodeSelection is set to true. At most one of NodeName, NodeSelector and AllNodes can be set.
devices.attributes (map[string]DeviceAttribute)
Attributes defines the set of attributes for this device. The name of each attribute must be unique in that set.
The maximum number of attributes and capacities combined is 32.
DeviceAttribute must have exactly one field set.
devices.attributes.bool (boolean)
BoolValue is a true/false value.
devices.attributes.int (int64)
IntValue is a number.
devices.attributes.string (string)
StringValue is a string. Must not be longer than 64 characters.
devices.attributes.version (string)
VersionValue is a semantic version according to semver.org spec 2.0.0. Must not be longer than 64 characters.
devices.capacity (map[string]DeviceCapacity)
Capacity defines the set of capacities for this device. The name of each capacity must be unique in that set.
The maximum number of attributes and capacities combined is 32.
DeviceCapacity describes a quantity associated with a device.
ConsumesCounters defines a list of references to sharedCounters and the set of counters that the device will consume from those counter sets.
There can only be a single entry per counterSet.
The total number of device counter consumption entries must be <= 32. In addition, the total number in the entire ResourceSlice must be <= 1024 (for example, 64 devices with 16 counters each).
DeviceCounterConsumption defines a set of counters that a device will consume from a CounterSet.
Counters defines the counters that will be consumed by the device.
The maximum number of counters in a device is 32. In addition, the maximum number of all counters in all devices is 1024 (for example, 64 devices with 16 counters each).
Counter describes a quantity associated with a device.
Value defines how much of a certain device counter is available.
devices.nodeName (string)
NodeName identifies the node where the device is available.
Must only be set if Spec.PerDeviceNodeSelection is set to true. At most one of NodeName, NodeSelector and AllNodes can be set.
devices.nodeSelector (NodeSelector)
NodeSelector defines the nodes where the device is available.
Must use exactly one term.
Must only be set if Spec.PerDeviceNodeSelection is set to true. At most one of NodeName, NodeSelector and AllNodes can be set.
A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms.
Required. A list of node selector terms. The terms are ORed.
A null or empty node selector term matches no objects. The requirements of them are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
A list of node selector requirements by node's fields.
devices.taints ([]DeviceTaint)
Atomic: will be replaced during a merge
If specified, these are the driver-defined taints.
The maximum number of taints is 4.
This is an alpha field and requires enabling the DRADeviceTaints feature gate.
The device this taint is attached to has the "effect" on any claim which does not tolerate the taint and, through the claim, to pods using the claim.
devices.taints.effect (string), required
The effect of the taint on claims that do not tolerate the taint and, through such claims, on the pods using them. Valid effects are NoSchedule and NoExecute. PreferNoSchedule as used for nodes is not valid here.
devices.taints.key (string), required
The taint key to be applied to a device. Must be a label name.
devices.taints.timeAdded (Time)
TimeAdded represents the time at which the taint was added. Added automatically during create or update if not set.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
devices.taints.value (string)
The taint value corresponding to the taint key. Must be a label value.
nodeName (string)
NodeName identifies the node which provides the resources in this pool. A field selector can be used to list only ResourceSlice objects belonging to a certain node.
This field can be used to limit access from nodes to ResourceSlices with the same node name. It also indicates to autoscalers that adding new nodes of the same type as some old node might also make new resources available.
Exactly one of NodeName, NodeSelector, AllNodes, and PerDeviceNodeSelection must be set. This field is immutable.
nodeSelector (NodeSelector)
NodeSelector defines which nodes have access to the resources in the pool, when that pool is not limited to a single node.
Must use exactly one term.
Exactly one of NodeName, NodeSelector, AllNodes, and PerDeviceNodeSelection must be set.
A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms.
Required. A list of node selector terms. The terms are ORed.
A null or empty node selector term matches no objects. The requirements of them are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
A list of node selector requirements by node's fields.
perDeviceNodeSelection (boolean)
PerDeviceNodeSelection defines whether the access from nodes to resources in the pool is set on the ResourceSlice level or on each device. If it is set to true, every device defined in the ResourceSlice must specify this individually.
Exactly one of NodeName, NodeSelector, AllNodes, and PerDeviceNodeSelection must be set.
sharedCounters ([]CounterSet)
Atomic: will be replaced during a merge
SharedCounters defines a list of counter sets, each of which has a name and a list of counters available.
The names of the SharedCounters must be unique in the ResourceSlice.
The maximum number of counters in all sets is 32.
CounterSet defines a named set of counters that are available to be used by devices defined in the ResourceSlice.
The counters are not allocatable by themselves, but can be referenced by devices. When a device is allocated, the portion of counters it uses will no longer be available for use by other devices.
Service is a named abstraction of software service (for example, mysql) consisting of local port (for example 3306) that the proxy listens on, and the selector that determines which pods will answer requests sent through the proxy.
apiVersion: v1
import "k8s.io/api/core/v1"
Service
Service is a named abstraction of software service (for example, mysql) consisting of local port (for example 3306) that the proxy listens on, and the selector that determines which pods will answer requests sent through the proxy.
ServiceSpec describes the attributes that a user creates on a service.
selector (map[string]string)
Route service traffic to pods with label keys and values matching this selector. If empty or not present, the service is assumed to have an external process managing its endpoints, which Kubernetes will not modify. Only applies to types ClusterIP, NodePort, and LoadBalancer. Ignored if type is ExternalName. More info: https://kubernetes.io/docs/concepts/services-networking/service/
ports ([]ServicePort)
Patch strategy: merge on key port
Map: unique values on keys port, protocol will be kept during a merge
ServicePort contains information on service's port.
ports.port (int32), required
The port that will be exposed by this service.
ports.targetPort (IntOrString)
Number or name of the port to access on the pods targeted by the service. Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME. If this is a string, it will be looked up as a named port in the target Pod's container ports. If this is not specified, the value of the 'port' field is used (an identity map). This field is ignored for services with clusterIP=None, and should be omitted or set equal to the 'port' field. More info: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
ports.protocol (string)
The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is TCP.
ports.name (string)
The name of this port within the service. This must be a DNS_LABEL. All ports within a ServiceSpec must have unique names. When considering the endpoints for a Service, this must match the 'name' field in the EndpointPort. Optional if only one ServicePort is defined on this service.
ports.nodePort (int32)
The port on each node on which this service is exposed when type is NodePort or LoadBalancer. Usually assigned by the system. If a value is specified, in-range, and not in use it will be used, otherwise the operation will fail. If not specified, a port will be allocated if this Service requires one. If this field is specified when creating a Service which does not need it, creation will fail. This field will be wiped when updating a Service to no longer need it (e.g. changing type from NodePort to ClusterIP). More info: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
ports.appProtocol (string)
The application protocol for this port. This is used as a hint for implementations to offer richer behavior for protocols that they understand. This field follows standard Kubernetes label syntax. Valid values are either un-prefixed protocol names, reserved for IANA standard service names (as per RFC-6335), or Kubernetes-defined prefixed names such as kubernetes.io/h2c, kubernetes.io/ws, and kubernetes.io/wss.
Other protocols should use implementation-defined prefixed names such as mycompany.com/my-custom-protocol.
type (string)
type determines how the Service is exposed. Defaults to ClusterIP. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer. "ClusterIP" allocates a cluster-internal IP address for load-balancing to endpoints. Endpoints are determined by the selector or if that is not specified, by manual construction of an Endpoints object or EndpointSlice objects. If clusterIP is "None", no virtual IP is allocated and the endpoints are published as a set of endpoints rather than a virtual IP. "NodePort" builds on ClusterIP and allocates a port on every node which routes to the same endpoints as the clusterIP. "LoadBalancer" builds on NodePort and creates an external load-balancer (if supported in the current cloud) which routes to the same endpoints as the clusterIP. "ExternalName" aliases this service to the specified externalName. Several other fields do not apply to ExternalName services. More info: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
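As a rough illustration of how the type, selector, and ports fields combine, the sketch below constructs a NodePort Service using the Go types from k8s.io/api/core/v1; the name "web", the app=web selector, and port 80 are placeholder values:

package example

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// newWebService returns a NodePort Service that forwards port 80 to the
// container port named "http" on pods labeled app=web. NodePort is left
// unset so that the system allocates one from its configured range.
func newWebService() *corev1.Service {
    return &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: "web"},
        Spec: corev1.ServiceSpec{
            Type:     corev1.ServiceTypeNodePort,
            Selector: map[string]string{"app": "web"},
            Ports: []corev1.ServicePort{{
                Name:       "http",
                Protocol:   corev1.ProtocolTCP,
                Port:       80,
                TargetPort: intstr.FromString("http"),
            }},
        },
    }
}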
ipFamilies ([]string)
Atomic: will be replaced during a merge
IPFamilies is a list of IP families (e.g. IPv4, IPv6) assigned to this service. This field is usually assigned automatically based on cluster configuration and the ipFamilyPolicy field. If this field is specified manually, the requested family is available in the cluster, and ipFamilyPolicy allows it, it will be used; otherwise creation of the service will fail. This field is conditionally mutable: it allows for adding or removing a secondary IP family, but it does not allow changing the primary IP family of the Service. Valid values are "IPv4" and "IPv6". This field only applies to Services of types ClusterIP, NodePort, and LoadBalancer, and does apply to "headless" services. This field will be wiped when updating a Service to type ExternalName.
This field may hold a maximum of two entries (dual-stack families, in either order). These families must correspond to the values of the clusterIPs field, if specified. Both clusterIPs and ipFamilies are governed by the ipFamilyPolicy field.
ipFamilyPolicy (string)
IPFamilyPolicy represents the dual-stack-ness requested or required by this Service. If there is no value provided, then this field will be set to SingleStack. Services can be "SingleStack" (a single IP family), "PreferDualStack" (two IP families on dual-stack configured clusters or a single IP family on single-stack clusters), or "RequireDualStack" (two IP families on dual-stack configured clusters, otherwise fail). The ipFamilies and clusterIPs fields depend on the value of this field. This field will be wiped when updating a service to type ExternalName.
clusterIP (string)
clusterIP is the IP address of the service and is usually assigned randomly. If an address is specified manually, is in-range (as per system configuration), and is not in use, it will be allocated to the service; otherwise creation of the service will fail. This field may not be changed through updates unless the type field is also being changed to ExternalName (which requires this field to be blank) or the type field is being changed from ExternalName (in which case this field may optionally be specified, as described above). Valid values are "None", empty string (""), or a valid IP address. Setting this to "None" makes a "headless service" (no virtual IP), which is useful when direct endpoint connections are preferred and proxying is not required. Only applies to types ClusterIP, NodePort, and LoadBalancer. If this field is specified when creating a Service of type ExternalName, creation will fail. This field will be wiped when updating a Service to type ExternalName. More info: https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies
clusterIPs ([]string)
Atomic: will be replaced during a merge
ClusterIPs is a list of IP addresses assigned to this service, and are usually assigned randomly. If an address is specified manually, is in-range (as per system configuration), and is not in use, it will be allocated to the service; otherwise creation of the service will fail. This field may not be changed through updates unless the type field is also being changed to ExternalName (which requires this field to be empty) or the type field is being changed from ExternalName (in which case this field may optionally be specified, as described above). Valid values are "None", empty string (""), or a valid IP address. Setting this to "None" makes a "headless service" (no virtual IP), which is useful when direct endpoint connections are preferred and proxying is not required. Only applies to types ClusterIP, NodePort, and LoadBalancer. If this field is specified when creating a Service of type ExternalName, creation will fail. This field will be wiped when updating a Service to type ExternalName. If this field is not specified, it will be initialized from the clusterIP field. If this field is specified, clients must ensure that clusterIPs[0] and clusterIP have the same value.
externalIPs is a list of IP addresses for which nodes in the cluster will also accept traffic for this service. These IPs are not managed by Kubernetes. The user is responsible for ensuring that traffic arrives at a node with this IP. A common example is external load-balancers that are not part of the Kubernetes system.
Only applies to Service Type: LoadBalancer. This feature depends on whether the underlying cloud-provider supports specifying the loadBalancerIP when a load balancer is created. This field will be ignored if the cloud-provider does not support the feature. Deprecated: This field was under-specified and its meaning varies across implementations. Using it is non-portable and it may not support dual-stack. Users are encouraged to use implementation-specific annotations when available.
loadBalancerClass is the class of the load balancer implementation this Service belongs to. If specified, the value of this field must be a label-style identifier, with an optional prefix, e.g. "internal-vip" or "example.com/internal-vip". Unprefixed names are reserved for end-users. This field can only be set when the Service type is 'LoadBalancer'. If not set, the default load balancer implementation is used, today this is typically done through the cloud provider integration, but should apply for any default implementation. If set, it is assumed that a load balancer implementation is watching for Services with a matching class. Any default load balancer implementation (e.g. cloud providers) should ignore Services that set this field. This field can only be set when creating or updating a Service to type 'LoadBalancer'. Once set, it can not be changed. This field will be wiped when a service is updated to a non 'LoadBalancer' type.
externalName (string)
externalName is the external reference that discovery mechanisms will return as an alias for this service (e.g. a DNS CNAME record). No proxying will be involved. Must be a lowercase RFC-1123 hostname (https://tools.ietf.org/html/rfc1123) and requires type to be "ExternalName".
externalTrafficPolicy (string)
externalTrafficPolicy describes how nodes distribute service traffic they receive on one of the Service's "externally-facing" addresses (NodePorts, ExternalIPs, and LoadBalancer IPs). If set to "Local", the proxy will configure the service in a way that assumes that external load balancers will take care of balancing the service traffic between nodes, and so each node will deliver traffic only to the node-local endpoints of the service, without masquerading the client source IP. (Traffic mistakenly sent to a node with no endpoints will be dropped.) The default value, "Cluster", uses the standard behavior of routing to all endpoints evenly (possibly modified by topology and other features). Note that traffic sent to an External IP or LoadBalancer IP from within the cluster will always get "Cluster" semantics, but clients sending to a NodePort from within the cluster may need to take traffic policy into account when picking a node.
internalTrafficPolicy (string)
InternalTrafficPolicy describes how nodes distribute service traffic they receive on the ClusterIP. If set to "Local", the proxy will assume that pods only want to talk to endpoints of the service on the same node as the pod, dropping the traffic if there are no local endpoints. The default value, "Cluster", uses the standard behavior of routing to all endpoints evenly (possibly modified by topology and other features).
healthCheckNodePort (int32)
healthCheckNodePort specifies the healthcheck nodePort for the service. This only applies when type is set to LoadBalancer and externalTrafficPolicy is set to Local. If a value is specified, is in-range, and is not in use, it will be used. If not specified, a value will be automatically allocated. External systems (e.g. load-balancers) can use this port to determine if a given node holds endpoints for this service or not. If this field is specified when creating a Service which does not need it, creation will fail. This field will be wiped when updating a Service to no longer need it (e.g. changing type). This field cannot be updated once set.
publishNotReadyAddresses (boolean)
publishNotReadyAddresses indicates that any agent which deals with endpoints for this Service should disregard any indications of ready/not-ready. The primary use case for setting this field is for a StatefulSet's Headless Service to propagate SRV DNS records for its Pods for the purpose of peer discovery. The Kubernetes controllers that generate Endpoints and EndpointSlice resources for Services interpret this to mean that all endpoints are considered "ready" even if the Pods themselves are not. Agents which consume only Kubernetes generated endpoints through the Endpoints or EndpointSlice resources can safely assume this behavior.
sessionAffinityConfig (SessionAffinityConfig)
sessionAffinityConfig contains the configurations of session affinity.
SessionAffinityConfig represents the configurations of session affinity.
sessionAffinityConfig.clientIP (ClientIPConfig)
clientIP contains the configurations of Client IP based session affinity.
ClientIPConfig represents the configurations of Client IP based session affinity.
timeoutSeconds specifies the seconds of ClientIP type session sticky time. The value must be >0 && <=86400 (for 1 day) if ServiceAffinity == "ClientIP". Default value is 10800 (for 3 hours).
allocateLoadBalancerNodePorts (boolean)
allocateLoadBalancerNodePorts defines if NodePorts will be automatically allocated for services with type LoadBalancer. Default is "true". It may be set to "false" if the cluster load-balancer does not rely on NodePorts. If the caller requests specific NodePorts (by specifying a value), those requests will be respected, regardless of this field. This field may only be set for services with type LoadBalancer and will be cleared if the type is changed to any other type.
trafficDistribution (string)
TrafficDistribution offers a way to express preferences for how traffic is distributed to Service endpoints. Implementations can use this field as a hint, but are not required to guarantee strict adherence. If the field is not set, the implementation will apply its default routing strategy. If set to "PreferClose", implementations should prioritize endpoints that are in the same zone.
ServiceStatus
ServiceStatus represents the current status of a service.
conditions ([]Condition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Current service state
Condition contains details for one aspect of the current state of this API Resource.
conditions.lastTransitionTime (Time), required
lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string), required
message is a human readable message indicating details about the transition. This may be an empty string.
conditions.reason (string), required
reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
conditions.status (string), required
status of the condition, one of True, False, Unknown.
conditions.type (string), required
type of condition in CamelCase or in foo.example.com/CamelCase.
conditions.observedGeneration (int64)
observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
loadBalancer (LoadBalancerStatus)
LoadBalancer contains the current status of the load-balancer, if one is present.
LoadBalancerStatus represents the status of a load-balancer.
loadBalancer.ingress ([]LoadBalancerIngress)
Atomic: will be replaced during a merge
Ingress is a list containing ingress points for the load-balancer. Traffic intended for the service should be sent to these ingress points.
LoadBalancerIngress represents the status of a load-balancer ingress point: traffic intended for the service should be sent to an ingress point.
loadBalancer.ingress.hostname (string)
Hostname is set for load-balancer ingress points that are DNS based (typically AWS load-balancers)
loadBalancer.ingress.ip (string)
IP is set for load-balancer ingress points that are IP based (typically GCE or OpenStack load-balancers)
loadBalancer.ingress.ipMode (string)
IPMode specifies how the load-balancer IP behaves, and may only be specified when the ip field is specified. Setting this to "VIP" indicates that traffic is delivered to the node with the destination set to the load-balancer's IP and port. Setting this to "Proxy" indicates that traffic is delivered to the node or pod with the destination set to the node's IP and node port or the pod's IP and port. Service implementations may use this information to adjust traffic routing.
loadBalancer.ingress.ports ([]PortStatus)
Atomic: will be replaced during a merge
Ports is a list of records of service ports. If used, every port defined in the service should have an entry in it.
PortStatus represents the error condition of a service port
loadBalancer.ingress.ports.port (int32), required
Port is the port number of the service port of which status is recorded here.
loadBalancer.ingress.ports.protocol (string), required
Protocol is the protocol of the service port of which status is recorded here. The supported values are: "TCP", "UDP", "SCTP"
loadBalancer.ingress.ports.error (string)
Error is to record the problem with the service port. The format of the error shall comply with the following rules:
- built-in error values shall be specified in this file and those shall use CamelCase names
- cloud provider specific error values must have names that comply with the format foo.example.com/CamelCase.
Endpoints is a legacy API and does not contain information about all Service features. Use discoveryv1.EndpointSlice for complete information about Service endpoints.
Deprecated: This API is deprecated in v1.33+. Use discoveryv1.EndpointSlice.
The set of all endpoints is the union of all subsets. Addresses are placed into subsets according to the IPs they share. A single address with multiple ports, some of which are ready and some of which are not (because they come from different containers) will result in the address being displayed in different subsets for the different ports. No address will appear in both Addresses and NotReadyAddresses in the same subset. Sets of addresses and ports that comprise a service.
EndpointSubset is a group of addresses with a common set of ports. The expanded set of endpoints is the Cartesian product of Addresses x Ports; a sketch follows below.
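For illustration only (the name, IPs, and ports are placeholders), a subset with two addresses and two ports expands to four endpoints:

apiVersion: v1
kind: Endpoints
metadata:
  name: example-svc                  # illustrative; must match the Service name
subsets:
- addresses:
  - ip: 10.10.1.1                    # placeholder IPs
  - ip: 10.10.2.2
  ports:
  - name: a
    port: 8675
  - name: b
    port: 309
# Expanded endpoints: 10.10.1.1:8675, 10.10.2.2:8675, 10.10.1.1:309, 10.10.2.2:309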
IP addresses which offer the related ports that are marked as ready. These endpoints should be considered safe for load balancers and clients to utilize.
EndpointAddress is a tuple that describes a single IP address. Deprecated: This API is deprecated in v1.33+.
subsets.addresses.ip (string), required
The IP of this endpoint. May not be loopback (127.0.0.0/8 or ::1), link-local (169.254.0.0/16 or fe80::/10), or link-local multicast (224.0.0.0/24 or ff02::/16).
subsets.addresses.hostname (string)
The Hostname of this endpoint
subsets.addresses.nodeName (string)
Optional: Node hosting this endpoint. This can be used to determine endpoints local to a node.
IP addresses which offer the related ports but are not currently marked as ready because they have not yet finished starting, have recently failed a readiness check, or have recently failed a liveness check.
EndpointAddress is a tuple that describes a single IP address. Deprecated: This API is deprecated in v1.33+.
subsets.notReadyAddresses.ip (string), required
The IP of this endpoint. May not be loopback (127.0.0.0/8 or ::1), link-local (169.254.0.0/16 or fe80::/10), or link-local multicast (224.0.0.0/24 or ff02::/16).
subsets.notReadyAddresses.hostname (string)
The Hostname of this endpoint
subsets.notReadyAddresses.nodeName (string)
Optional: Node hosting this endpoint. This can be used to determine endpoints local to a node.
Port numbers available on the related IP addresses.
EndpointPort is a tuple that describes a single port. Deprecated: This API is deprecated in v1.33+.
subsets.ports.port (int32), required
The port number of the endpoint.
subsets.ports.protocol (string)
The IP protocol for this port. Must be UDP, TCP, or SCTP. Default is TCP.
subsets.ports.name (string)
The name of this port. This must match the 'name' field in the corresponding ServicePort. Must be a DNS_LABEL. Optional only if one port is defined.
subsets.ports.appProtocol (string)
The application protocol for this port. This is used as a hint for implementations to offer richer behavior for protocols that they understand. This field follows standard Kubernetes label syntax. Valid values are either:
- Un-prefixed protocol names - reserved for IANA standard service names (as per RFC-6335 and https://www.iana.org/assignments/service-names).
- Kubernetes-defined prefixed names, such as 'kubernetes.io/h2c', 'kubernetes.io/ws', and 'kubernetes.io/wss'.
- Other protocols should use implementation-defined prefixed names such as mycompany.com/my-custom-protocol.
EndpointSlice represents a set of service endpoints.
apiVersion: discovery.k8s.io/v1
import "k8s.io/api/discovery/v1"
EndpointSlice
EndpointSlice represents a set of service endpoints. Most EndpointSlices are created by the EndpointSlice controller to represent the Pods selected by Service objects. For a given service there may be multiple EndpointSlice objects which must be joined to produce the full set of endpoints; you can find all of the slices for a given service by listing EndpointSlices in the service's namespace whose kubernetes.io/service-name label contains the service's name.
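For orientation, a hedged sketch of what a controller-generated slice for such a Service might look like; the Service name, IP, node, and zone are placeholders:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: example-svc-abc12            # illustrative; the controller generates these names
  labels:
    kubernetes.io/service-name: example-svc
addressType: IPv4
endpoints:
- addresses:
  - 10.1.2.3                         # placeholder pod IP
  conditions:
    ready: true
  nodeName: node-a                   # placeholder node
  zone: us-east1-b                   # placeholder zone
ports:
- name: http
  port: 8080
  protocol: TCP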
addressType specifies the type of address carried by this EndpointSlice. All addresses in this slice must be the same type. This field is immutable after creation. The following address types are currently supported: * IPv4: Represents an IPv4 Address. * IPv6: Represents an IPv6 Address. * FQDN: Represents a Fully Qualified Domain Name. (Deprecated) The EndpointSlice controller only generates, and kube-proxy only processes, slices of addressType "IPv4" and "IPv6". No semantics are defined for the "FQDN" type.
endpoints ([]Endpoint), required
Atomic: will be replaced during a merge
endpoints is a list of unique endpoints in this slice. Each slice may include a maximum of 1000 endpoints.
Endpoint represents a single logical "backend" implementing a service.
endpoints.addresses ([]string), required
Set: unique values will be kept during a merge
addresses of this endpoint. For EndpointSlices of addressType "IPv4" or "IPv6", the values are IP addresses in canonical form. The syntax and semantics of other addressType values are not defined. This must contain at least one address but no more than 100. EndpointSlices generated by the EndpointSlice controller will always have exactly 1 address. No semantics are defined for additional addresses beyond the first, and kube-proxy does not look at them.
endpoints.conditions (EndpointConditions)
conditions contains information about the current status of the endpoint.
EndpointConditions represents the current condition of an endpoint.
endpoints.conditions.ready (boolean)
ready indicates that this endpoint is ready to receive traffic, according to whatever system is managing the endpoint. A nil value should be interpreted as "true". In general, an endpoint should be marked ready if it is serving and not terminating, though this can be overridden in some cases, such as when the associated Service has set the publishNotReadyAddresses flag.
endpoints.conditions.serving (boolean)
serving indicates that this endpoint is able to receive traffic, according to whatever system is managing the endpoint. For endpoints backed by pods, the EndpointSlice controller will mark the endpoint as serving if the pod's Ready condition is True. A nil value should be interpreted as "true".
endpoints.conditions.terminating (boolean)
terminating indicates that this endpoint is terminating. A nil value should be interpreted as "false".
endpoints.deprecatedTopology (map[string]string)
deprecatedTopology contains topology information part of the v1beta1 API. This field is deprecated, and will be removed when the v1beta1 API is removed (no sooner than kubernetes v1.24). While this field can hold values, it is not writable through the v1 API, and any attempts to write to it will be silently ignored. Topology information can be found in the zone and nodeName fields instead.
endpoints.hints (EndpointHints)
hints contains information associated with how an endpoint should be consumed.
EndpointHints provides hints describing how an endpoint should be consumed.
endpoints.hints.forNodes ([]ForNode)
Atomic: will be replaced during a merge
forNodes indicates the node(s) this endpoint should be consumed by when using topology aware routing. May contain a maximum of 8 entries. This is an Alpha feature and is only used when the PreferSameTrafficDistribution feature gate is enabled.
ForNode provides information about which nodes should consume this endpoint.
endpoints.hints.forNodes.name (string), required
name represents the name of the node.
endpoints.hints.forZones ([]ForZone)
Atomic: will be replaced during a merge
forZones indicates the zone(s) this endpoint should be consumed by when using topology aware routing. May contain a maximum of 8 entries.
ForZone provides information about which zones should consume this endpoint.
endpoints.hints.forZones.name (string), required
name represents the name of the zone.
endpoints.hostname (string)
hostname of this endpoint. This field may be used by consumers of endpoints to distinguish endpoints from each other (e.g. in DNS names). Multiple endpoints which use the same hostname should be considered fungible (e.g. multiple A values in DNS). Must be lowercase and pass DNS Label (RFC 1123) validation.
endpoints.nodeName (string)
nodeName represents the name of the Node hosting this endpoint. This can be used to determine endpoints local to a Node.
targetRef is a reference to a Kubernetes object that represents this endpoint.
endpoints.zone (string)
zone is the name of the Zone this endpoint exists in.
ports ([]EndpointPort)
Atomic: will be replaced during a merge
ports specifies the list of network ports exposed by each endpoint in this slice. Each port must have a unique name. Each slice may include a maximum of 100 ports. Services always have at least 1 port, so EndpointSlices generated by the EndpointSlice controller will likewise always have at least 1 port. EndpointSlices used for other purposes may have an empty ports list.
EndpointPort represents a Port used by an EndpointSlice
ports.port (int32)
port represents the port number of the endpoint. If the EndpointSlice is derived from a Kubernetes service, this must be set to the service's target port. EndpointSlices used for other purposes may have a nil port.
ports.protocol (string)
protocol represents the IP protocol for this port. Must be UDP, TCP, or SCTP. Default is TCP.
ports.name (string)
name represents the name of this port. All ports in an EndpointSlice must have a unique name. If the EndpointSlice is derived from a Kubernetes service, this corresponds to the Service.ports[].name. Name must either be an empty string or pass DNS_LABEL validation: * must be no more than 63 characters long. * must consist of lower case alphanumeric characters or '-'. * must start and end with an alphanumeric character. Default is empty string.
ports.appProtocol (string)
The application protocol for this port. This is used as a hint for implementations to offer richer behavior for protocols that they understand. This field follows standard Kubernetes label syntax. Valid values are either:
- Un-prefixed protocol names - reserved for IANA standard service names (as per RFC-6335 and https://www.iana.org/assignments/service-names).
- Kubernetes-defined prefixed names, such as 'kubernetes.io/h2c', 'kubernetes.io/ws', and 'kubernetes.io/wss'.
- Other protocols should use implementation-defined prefixed names such as mycompany.com/my-custom-protocol.
Ingress is a collection of rules that allow inbound connections to reach the endpoints defined by a backend.
apiVersion: networking.k8s.io/v1
import "k8s.io/api/networking/v1"
Ingress
Ingress is a collection of rules that allow inbound connections to reach the endpoints defined by a backend. An Ingress can be configured to give services externally-reachable urls, load balance traffic, terminate SSL, offer name based virtual hosting etc.
defaultBackend is the backend that should handle requests that don't match any rule. If Rules are not specified, DefaultBackend must be specified. If DefaultBackend is not set, the handling of requests that do not match any of the rules will be up to the Ingress controller.
ingressClassName (string)
ingressClassName is the name of an IngressClass cluster resource. Ingress controller implementations use this field to know whether they should be serving this Ingress resource, by a transitive connection (controller -> IngressClass -> Ingress resource). Although the kubernetes.io/ingress.class annotation (simple constant name) was never formally defined, it was widely supported by Ingress controllers to create a direct binding between Ingress controller and Ingress resources. Newly created Ingress resources should prefer using the field. However, even though the annotation is officially deprecated, for backwards compatibility reasons, ingress controllers should still honor that annotation if present.
rules ([]IngressRule)
Atomic: will be replaced during a merge
rules is a list of host rules used to configure the Ingress. If unspecified, or no rule matches, all traffic is sent to the default backend.
IngressRule represents the rules mapping the paths under a specified host to the related backend services. Incoming requests are first evaluated for a host match, then routed to the backend associated with the matching IngressRuleValue.
rules.host (string)
host is the fully qualified domain name of a network host, as defined by RFC 3986. Note the following deviations from the "host" part of the URI as defined in RFC 3986:
1. IPs are not allowed. Currently an IngressRuleValue can only apply to the IP in the Spec of the parent Ingress.
2. The ':' delimiter is not respected because ports are not allowed. Currently the port of an Ingress is implicitly :80 for http and :443 for https.
Both these may change in the future. Incoming requests are matched against the host before the IngressRuleValue. If the host is unspecified, the Ingress routes all traffic based on the specified IngressRuleValue.
host can be "precise" which is a domain name without the terminating dot of a network host (e.g. "foo.bar.com") or "wildcard", which is a domain name prefixed with a single wildcard label (e.g. ".foo.com"). The wildcard character '' must appear by itself as the first DNS label and matches only a single label. You cannot have a wildcard label by itself (e.g. Host == "*"). Requests will be matched against the Host field in the following way: 1. If host is precise, the request matches this rule if the http host header is equal to Host. 2. If host is a wildcard, then the request matches this rule if the http host header is to equal to the suffix (removing the first label) of the wildcard rule.
rules.http (HTTPIngressRuleValue)
HTTPIngressRuleValue is a list of http selectors pointing to backends. In the example http://<host>/<path>?<searchpart> -> backend, where parts of the url correspond to RFC 3986, this resource will be used to match against everything after the last '/' and before the first '?' or '#'.
rules.http.paths ([]HTTPIngressPath), required
Atomic: will be replaced during a merge
paths is a collection of paths that map requests to backends.
HTTPIngressPath associates a path with a backend. Incoming urls matching the path are forwarded to the backend.
backend defines the referenced service endpoint to which the traffic will be forwarded to.
rules.http.paths.pathType (string), required
pathType determines the interpretation of the path matching. PathType can be one of the following values:
- Exact: Matches the URL path exactly.
- Prefix: Matches based on a URL path prefix split by '/'. Matching is done on a path element by element basis; a path element refers to the list of labels in the path split by the '/' separator. A request matches path p if p is an element-wise prefix of the request path. Note that if the last element of the path is a substring of the last element in the request path, it is not a match (e.g. /foo/bar matches /foo/bar/baz, but does not match /foo/barbaz).
- ImplementationSpecific: Interpretation of the Path matching is up to the IngressClass. Implementations can treat this as a separate PathType or treat it identically to Prefix or Exact path types.
Implementations are required to support all path types.
rules.http.paths.path (string)
path is matched against the path of an incoming request. Currently it can contain characters disallowed from the conventional "path" part of a URL as defined by RFC 3986. Paths must begin with a '/' and must be present when using PathType with value "Exact" or "Prefix".
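To make the matching rules concrete, here is a hedged sketch with placeholder host, class, and service names; the Prefix path matches /prefix and /prefix/anything but not /prefixed:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-paths                # illustrative name
spec:
  ingressClassName: example-class    # illustrative IngressClass
  rules:
  - host: foo.example.com            # placeholder host
    http:
      paths:
      - path: /exact
        pathType: Exact              # matches only /exact
        backend:
          service:
            name: svc-exact          # placeholder Service
            port:
              number: 80
      - path: /prefix
        pathType: Prefix             # matches /prefix and /prefix/..., not /prefixed
        backend:
          service:
            name: svc-prefix         # placeholder Service
            port:
              number: 80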
tls ([]IngressTLS)
Atomic: will be replaced during a merge
tls represents the TLS configuration. Currently the Ingress only supports a single TLS port, 443. If multiple members of this list specify different hosts, they will be multiplexed on the same port according to the hostname specified through the SNI TLS extension, if the ingress controller fulfilling the ingress supports SNI.
IngressTLS describes the transport layer security associated with an ingress.
tls.hosts ([]string)
Atomic: will be replaced during a merge
hosts is a list of hosts included in the TLS certificate. The values in this list must match the name/s used in the tlsSecret. Defaults to the wildcard host setting for the loadbalancer controller fulfilling this Ingress, if left unspecified.
tls.secretName (string)
secretName is the name of the secret used to terminate TLS traffic on port 443. Field is left optional to allow TLS routing based on SNI hostname alone. If the SNI host in a listener conflicts with the "Host" header field used by an IngressRule, the SNI host is used for termination and value of the "Host" header is used for routing.
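TLS termination is requested by listing the hosts covered by a certificate Secret, roughly as below; the Secret, host, and Service names are illustrative:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-tls                  # illustrative name
spec:
  tls:
  - hosts:
    - foo.example.com                # must match a name in the certificate
    secretName: foo-example-tls      # illustrative Secret holding tls.crt and tls.key
  rules:
  - host: foo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-svc        # placeholder Service
            port:
              number: 8443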
IngressBackend
IngressBackend describes all endpoints for a given service and port.
resource is an ObjectRef to another Kubernetes resource in the namespace of the Ingress object. If resource is specified, a service.Name and service.Port must not be specified. This is a mutually exclusive setting with "Service".
service (IngressServiceBackend)
service references a service as a backend. This is a mutually exclusive setting with "Resource".
IngressServiceBackend references a Kubernetes Service as a Backend.
service.name (string), required
name is the referenced service. The service must exist in the same namespace as the Ingress object.
service.port (ServiceBackendPort)
port of the referenced service. A port name or port number is required for a IngressServiceBackend.
ServiceBackendPort is the service port being referenced.
service.port.name (string)
name is the name of the port on the Service. This is a mutually exclusive setting with "Number".
service.port.number (int32)
number is the numerical port number (e.g. 80) on the Service. This is a mutually exclusive setting with "Name".
IngressStatus
IngressStatus describe the current state of the Ingress.
loadBalancer (IngressLoadBalancerStatus)
loadBalancer contains the current status of the load-balancer.
IngressLoadBalancerStatus represents the status of a load-balancer.
protocol is the protocol of the ingress port. The supported values are: "TCP", "UDP", "SCTP"
loadBalancer.ingress.ports.error (string)
error is to record the problem with the service port. The format of the error shall comply with the following rules:
- built-in error values shall be specified in this file and those shall use CamelCase names
- cloud provider specific error values must have names that comply with the format foo.example.com/CamelCase.
IngressClass represents the class of the Ingress, referenced by the Ingress Spec.
apiVersion: networking.k8s.io/v1
import "k8s.io/api/networking/v1"
IngressClass
IngressClass represents the class of the Ingress, referenced by the Ingress Spec. The ingressclass.kubernetes.io/is-default-class annotation can be used to indicate that an IngressClass should be considered default. When a single IngressClass resource has this annotation set to true, new Ingress resources without a class specified will be assigned this default class.
IngressClassSpec provides information about the class of an Ingress.
controller (string)
controller refers to the name of the controller that should handle this class. This allows for different "flavors" that are controlled by the same controller. For example, you may have different parameters for the same implementing controller. This should be specified as a domain-prefixed path no more than 250 characters in length, e.g. "acme.io/ingress-controller". This field is immutable.
parameters (IngressClassParametersReference)
parameters is a link to a custom resource containing additional configuration for the controller. This is optional if the controller does not require extra parameters.
IngressClassParametersReference identifies an API object. This can be used to specify a cluster or namespace-scoped resource.
parameters.kind (string), required
kind is the type of resource being referenced.
parameters.name (string), required
name is the name of resource being referenced.
parameters.apiGroup (string)
apiGroup is the group for the resource being referenced. If APIGroup is not specified, the specified Kind must be in the core API group. For any other third-party types, APIGroup is required.
parameters.namespace (string)
namespace is the namespace of the resource being referenced. This field is required when scope is set to "Namespace" and must be unset when scope is set to "Cluster".
parameters.scope (string)
scope represents if this refers to a cluster or namespace scoped resource. This may be set to "Cluster" (default) or "Namespace".
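A hedged sketch of an IngressClass that points at namespaced parameters; the controller name and the parameters kind/group are placeholders standing in for whatever the installed controller defines:

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: example-class                          # illustrative name
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: example.com/ingress-controller   # placeholder domain-prefixed path
  parameters:
    apiGroup: example.com                      # placeholder API group
    kind: IngressParameters                    # placeholder custom resource kind
    name: external-config                      # placeholder object name
    scope: Namespace
    namespace: ingress-system                  # required when scope is "Namespace"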
IngressClassList
IngressClassList is a collection of IngressClasses.
BinaryData contains the binary data. Each key must consist of alphanumeric characters, '-', '_' or '.'. BinaryData can contain byte sequences that are not in the UTF-8 range. The keys stored in BinaryData must not overlap with the ones in the Data field, this is enforced during validation process. Using this field will require 1.10+ apiserver and kubelet.
data (map[string]string)
Data contains the configuration data. Each key must consist of alphanumeric characters, '-', '_' or '.'. Values with non-UTF-8 byte sequences must use the BinaryData field. The keys stored in Data must not overlap with the keys in the BinaryData field, this is enforced during validation process.
immutable (boolean)
Immutable, if set to true, ensures that data stored in the ConfigMap cannot be updated (only object metadata can be modified). If not set to true, the field can be modified at any time. Defaulted to nil.
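For example, a ConfigMap mixing data and binaryData and marked immutable; the keys and values are illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config               # illustrative name
data:                                # plain UTF-8 values
  log.level: info
  app.mode: production
binaryData:                          # base64-encoded bytes (placeholder value)
  favicon.ico: AAEAAAEAEBA=
immutable: true                      # data and binaryData can no longer be updated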
ConfigMapList
ConfigMapList is a resource containing a list of ConfigMap objects.
Data contains the secret data. Each key must consist of alphanumeric characters, '-', '_' or '.'. The serialized form of the secret data is a base64 encoded string, representing the arbitrary (possibly non-string) data value here. Described in https://tools.ietf.org/html/rfc4648#section-4
immutable (boolean)
Immutable, if set to true, ensures that data stored in the Secret cannot be updated (only object metadata can be modified). If not set to true, the field can be modified at any time. Defaulted to nil.
stringData (map[string]string)
stringData allows specifying non-binary secret data in string form. It is provided as a write-only input field for convenience. All keys and values are merged into the data field on write, overwriting any existing values. The stringData field is never output when reading from the API.
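A hedged sketch of a Secret written via stringData (the keys and values are placeholders); on read, the values come back base64-encoded under data:

apiVersion: v1
kind: Secret
metadata:
  name: example-credentials          # illustrative name
type: Opaque
stringData:
  username: demo-user                # written as plain text ...
  password: not-a-real-password      # ... merged into data and returned base64-encoded
immutable: false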
CSIDriver captures information about a Container Storage Interface (CSI) volume driver deployed on the cluster.
apiVersion: storage.k8s.io/v1
import "k8s.io/api/storage/v1"
CSIDriver
CSIDriver captures information about a Container Storage Interface (CSI) volume driver deployed on the cluster. Kubernetes attach detach controller uses this object to determine whether attach is required. Kubelet uses this object to determine whether pod information needs to be passed on mount. CSIDriver objects are non-namespaced.
Standard object metadata. metadata.Name indicates the name of the CSI driver that this object refers to; it MUST be the same name returned by the CSI GetPluginName() call for that driver. The driver name must be 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), dots (.), and alphanumerics between. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
spec represents the specification of the CSI Driver.
CSIDriverSpec
CSIDriverSpec is the specification of a CSIDriver.
attachRequired (boolean)
attachRequired indicates this CSI volume driver requires an attach operation (because it implements the CSI ControllerPublishVolume() method), and that the Kubernetes attach detach controller should call the attach volume interface which checks the volumeattachment status and waits until the volume is attached before proceeding to mounting. The CSI external-attacher coordinates with CSI volume driver and updates the volumeattachment status when the attach operation is complete. If the CSIDriverRegistry feature gate is enabled and the value is specified to false, the attach operation will be skipped. Otherwise the attach operation will be called.
This field is immutable.
fsGroupPolicy (string)
fsGroupPolicy defines if the underlying volume supports changing ownership and permission of the volume before being mounted. Refer to the specific FSGroupPolicy values for additional details.
This field was immutable in Kubernetes < 1.29 and now is mutable.
Defaults to ReadWriteOnceWithFSType, which will examine each volume to determine if Kubernetes should modify ownership and permissions of the volume. With the default policy the defined fsGroup will only be applied if a fstype is defined and the volume's access mode contains ReadWriteOnce.
nodeAllocatableUpdatePeriodSeconds (int64)
nodeAllocatableUpdatePeriodSeconds specifies the interval between periodic updates of the CSINode allocatable capacity for this driver. When set, both periodic updates and updates triggered by capacity-related failures are enabled. If not set, no updates occur (neither periodic nor upon detecting capacity-related failures), and the allocatable.count remains static. The minimum allowed value for this field is 10 seconds.
This is an alpha feature and requires the MutableCSINodeAllocatableCount feature gate to be enabled.
This field is mutable.
podInfoOnMount (boolean)
podInfoOnMount indicates this CSI volume driver requires additional pod information (like podName, podUID, etc.) during mount operations, if set to true. If set to false, pod information will not be passed on mount. Default is false.
The CSI driver specifies podInfoOnMount as part of driver deployment. If true, Kubelet will pass pod information as VolumeContext in the CSI NodePublishVolume() calls. The CSI driver is responsible for parsing and validating the information passed in as VolumeContext.
The following VolumeContext will be passed if podInfoOnMount is set to true. This list might grow, but the prefix will be used.
- "csi.storage.k8s.io/pod.name": pod.Name
- "csi.storage.k8s.io/pod.namespace": pod.Namespace
- "csi.storage.k8s.io/pod.uid": string(pod.UID)
- "csi.storage.k8s.io/ephemeral": "true" if the volume is an ephemeral inline volume defined by a CSIVolumeSource, otherwise "false"
"csi.storage.k8s.io/ephemeral" is a new feature in Kubernetes 1.16. It is only required for drivers which support both the "Persistent" and "Ephemeral" VolumeLifecycleMode. Other drivers can leave pod info disabled and/or ignore this field. As Kubernetes 1.15 doesn't support this field, drivers can only support one mode when deployed on such a cluster and the deployment determines which mode that is, for example via a command line parameter of the driver.
This field was immutable in Kubernetes < 1.29 and now is mutable.
requiresRepublish (boolean)
requiresRepublish indicates the CSI driver wants NodePublishVolume being periodically called to reflect any possible change in the mounted volume. This field defaults to false.
Note: After a successful initial NodePublishVolume call, subsequent calls to NodePublishVolume should only update the contents of the volume. New mount points will not be seen by a running container.
seLinuxMount (boolean)
seLinuxMount specifies if the CSI driver supports "-o context" mount option.
When "true", the CSI driver must ensure that all volumes provided by this CSI driver can be mounted separately with different -o context options. This is typical for storage backends that provide volumes as filesystems on block devices or as independent shared volumes. Kubernetes will call NodeStage / NodePublish with "-o context=xyz" mount option when mounting a ReadWriteOncePod volume used in Pod that has explicitly set SELinux context. In the future, it may be expanded to other volume AccessModes. In any case, Kubernetes will ensure that the volume is mounted only with a single SELinux context.
When "false", Kubernetes won't pass any special SELinux mount options to the driver. This is typical for volumes that represent subdirectories of a bigger shared filesystem.
Default is "false".
storageCapacity (boolean)
storageCapacity indicates that the CSI volume driver wants pod scheduling to consider the storage capacity that the driver deployment will report by creating CSIStorageCapacity objects with capacity information, if set to true.
The check can be enabled immediately when deploying a driver. In that case, provisioning new volumes with late binding will pause until the driver deployment has published some suitable CSIStorageCapacity object.
Alternatively, the driver can be deployed with the field unset or false and it can be flipped later when storage capacity information has been published.
This field was immutable in Kubernetes <= 1.22 and now is mutable.
tokenRequests ([]TokenRequest)
Atomic: will be replaced during a merge
tokenRequests indicates the CSI driver needs pods' service account tokens it is mounting volume for to do necessary authentication. Kubelet will pass the tokens in VolumeContext in the CSI NodePublishVolume calls. The CSI driver should parse and validate the following VolumeContext:
"csi.storage.k8s.io/serviceAccount.tokens": {
  "<audience>": {
    "token": <token>,
    "expirationTimestamp": <expiration timestamp in RFC3339>,
  },
  ...
}
Note: Audience in each TokenRequest should be different and at most one token is empty string. To receive a new token after expiry, RequiresRepublish can be used to trigger NodePublishVolume periodically.
TokenRequest contains parameters of a service account token.
tokenRequests.audience (string), required
audience is the intended audience of the token in "TokenRequestSpec". It will default to the audiences of kube apiserver.
tokenRequests.expirationSeconds (int64)
expirationSeconds is the duration of validity of the token in "TokenRequestSpec". It has the same default value of "ExpirationSeconds" in "TokenRequestSpec".
volumeLifecycleModes ([]string)
Set: unique values will be kept during a merge
volumeLifecycleModes defines what kind of volumes this CSI volume driver supports. The default if the list is empty is "Persistent", which is the usage defined by the CSI specification and implemented in Kubernetes via the usual PV/PVC mechanism.
The other mode is "Ephemeral". In this mode, volumes are defined inline inside the pod spec with CSIVolumeSource and their lifecycle is tied to the lifecycle of that pod. A driver has to be aware of this because it is only going to get a NodePublishVolume call for such a volume.
CSINode holds information about all CSI drivers installed on a node.
apiVersion: storage.k8s.io/v1
import "k8s.io/api/storage/v1"
CSINode
CSINode holds information about all CSI drivers installed on a node. CSI drivers do not need to create the CSINode object directly. As long as they use the node-driver-registrar sidecar container, the kubelet will automatically populate the CSINode object for the CSI driver as part of kubelet plugin registration. CSINode has the same name as a node. If the object is missing, it means either there are no CSI Drivers available on the node, or the Kubelet version is low enough that it doesn't create this object. CSINode has an OwnerReference that points to the corresponding node object.
CSINodeSpec holds information about the specification of all CSI drivers installed on a node
drivers ([]CSINodeDriver), required
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
drivers is a list of information of all CSI Drivers existing on a node. If all drivers in the list are uninstalled, this can become empty.
CSINodeDriver holds information about the specification of one CSI driver installed on a node
drivers.name (string), required
name represents the name of the CSI driver that this object refers to. This MUST be the same name returned by the CSI GetPluginName() call for that driver.
drivers.nodeID (string), required
nodeID of the node from the driver point of view. This field enables Kubernetes to communicate with storage systems that do not share the same nomenclature for nodes. For example, Kubernetes may refer to a given node as "node1", but the storage system may refer to the same node as "nodeA". When Kubernetes issues a command to the storage system to attach a volume to a specific node, it can use this field to refer to the node name using the ID that the storage system will understand, e.g. "nodeA" instead of "node1". This field is required.
drivers.allocatable (VolumeNodeResources)
allocatable represents the volume resources of a node that are available for scheduling. This field is beta.
VolumeNodeResources is a set of resource limits for scheduling of volumes.
drivers.allocatable.count (int32)
count indicates the maximum number of unique volumes managed by the CSI driver that can be used on a node. A volume that is both attached and mounted on a node is considered to be used once, not twice. The same rule applies for a unique volume that is shared among multiple pods on the same node. If this field is not specified, then the supported number of volumes on this node is unbounded.
drivers.topologyKeys ([]string)
Atomic: will be replaced during a merge
topologyKeys is the list of keys supported by the driver. When a driver is initialized on a cluster, it provides a set of topology keys that it understands (e.g. "company.com/zone", "company.com/region"). When a driver is initialized on a node, it provides the same topology keys along with values. Kubelet will expose these topology keys as labels on its own node object. When Kubernetes does topology aware provisioning, it can use this list to determine which labels it should retrieve from the node object and pass back to the driver. It is possible for different nodes to use different topology keys. This can be empty if driver does not support topology.
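For reference, a kubelet-populated CSINode object has roughly the shape below; the driver name, node ID, and topology key are placeholders:

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: node-a                       # same name as the Node object
spec:
  drivers:
  - name: csi.example.com            # placeholder driver name
    nodeID: nodeA                    # the storage system's name for this node
    topologyKeys:
    - topology.example.com/zone      # placeholder topology key
    allocatable:
      count: 32                      # max unique volumes this driver can use on the node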
CSIStorageCapacity stores the result of one CSI GetCapacity call.
apiVersion: storage.k8s.io/v1
import "k8s.io/api/storage/v1"
CSIStorageCapacity
CSIStorageCapacity stores the result of one CSI GetCapacity call. For a given StorageClass, this describes the available capacity in a particular topology segment. This can be used when considering where to instantiate new PersistentVolumes.
For example this can express things like:
- StorageClass "standard" has "1234 GiB" available in "topology.kubernetes.io/zone=us-east1"
- StorageClass "localssd" has "10 GiB" available in "kubernetes.io/hostname=knode-abc123"
The following three cases all imply that no capacity is available for a certain combination:
- no object exists with suitable topology and storage class name
- such an object exists, but the capacity is unset
- such an object exists, but the capacity is zero
The producer of these objects can decide which approach is more suitable.
They are consumed by the kube-scheduler when a CSI driver opts into capacity-aware scheduling with CSIDriverSpec.StorageCapacity. The scheduler compares the MaximumVolumeSize against the requested size of pending volumes to filter out unsuitable nodes. If MaximumVolumeSize is unset, it falls back to a comparison against the less precise Capacity. If that is also unset, the scheduler assumes that capacity is insufficient and tries some other node.
Standard object's metadata. The name has no particular meaning. It must be a DNS subdomain (dots allowed, 253 characters). To ensure that there are no conflicts with other CSI drivers on the cluster, the recommendation is to use csisc-<uuid>, a generated name, or a reverse-domain name which ends with the unique CSI driver name.
storageClassName represents the name of the StorageClass that the reported capacity applies to. It must meet the same requirements as the name of a StorageClass object (non-empty, DNS subdomain). If that object no longer exists, the CSIStorageCapacity object is obsolete and should be removed by its creator. This field is immutable.
capacity is the value reported by the CSI driver in its GetCapacityResponse for a GetCapacityRequest with topology and parameters that match the previous fields.
The semantic is currently (CSI spec 1.2) defined as: The available capacity, in bytes, of the storage that can be used to provision volumes. If not set, that information is currently unavailable.
maximumVolumeSize is the value reported by the CSI driver in its GetCapacityResponse for a GetCapacityRequest with topology and parameters that match the previous fields.
This is defined since CSI spec 1.4.0 as the largest size that may be used in a CreateVolumeRequest.capacity_range.required_bytes field to create a volume with the same parameters as those in GetCapacityRequest. The corresponding value in the Kubernetes API is ResourceRequirements.Requests in a volume claim.
nodeTopology defines which nodes have access to the storage for which capacity was reported. If not set, the storage is not accessible from any node in the cluster. If empty, the storage is accessible from all nodes. This field is immutable.
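A hedged sketch of one such capacity object, with placeholder class, zone, and sizes:

apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: csisc-example-1234                    # the name has no particular meaning
storageClassName: standard                    # placeholder StorageClass
nodeTopology:
  matchLabels:
    topology.kubernetes.io/zone: us-east1     # placeholder zone label
capacity: 1234Gi                              # capacity reported by GetCapacity
maximumVolumeSize: 512Gi                      # largest single volume that can be created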
CSIStorageCapacityList
CSIStorageCapacityList is a collection of CSIStorageCapacity objects.
selector is a label query over volumes to consider for binding.
resources (VolumeResourceRequirements)
resources represents the minimum resources the volume should have. If RecoverVolumeExpansionFailure feature is enabled users are allowed to specify resource requirements that are lower than previous value but must still be higher than capacity recorded in the status field of the claim. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#resources
VolumeResourceRequirements describes the storage resource requirements for a volume.
Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
volumeName (string)
volumeName is the binding reference to the PersistentVolume backing this claim.
dataSource field can be used to specify either: * An existing VolumeSnapshot object (snapshot.storage.k8s.io/VolumeSnapshot) * An existing PVC (PersistentVolumeClaim) If the provisioner or an external controller can support the specified data source, it will create a new volume based on the contents of the specified data source. When the AnyVolumeDataSource feature gate is enabled, dataSource contents will be copied to dataSourceRef, and dataSourceRef contents will be copied to dataSource when dataSourceRef.namespace is not specified. If the namespace is specified, then dataSourceRef will not be copied to dataSource.
dataSourceRef (TypedObjectReference)
dataSourceRef specifies the object from which to populate the volume with data, if a non-empty volume is desired. This may be any object from a non-empty API group (non core object) or a PersistentVolumeClaim object. When this field is specified, volume binding will only succeed if the type of the specified object matches some installed volume populator or dynamic provisioner. This field will replace the functionality of the dataSource field and as such if both fields are non-empty, they must have the same value. For backwards compatibility, when namespace isn't specified in dataSourceRef, both fields (dataSource and dataSourceRef) will be set to the same value automatically if one of them is empty and the other is non-empty. When namespace is specified in dataSourceRef, dataSource isn't set to the same value and must be empty. There are three important differences between dataSource and dataSourceRef:
- While dataSource only allows two specific types of objects, dataSourceRef allows any non-core object, as well as PersistentVolumeClaim objects.
- While dataSource ignores disallowed values (dropping them), dataSourceRef preserves all values, and generates an error if a disallowed value is specified.
- While dataSource only allows local objects, dataSourceRef allows objects in any namespaces.
(Beta) Using this field requires the AnyVolumeDataSource feature gate to be enabled. (Alpha) Using the namespace field of dataSourceRef requires the CrossNamespaceVolumeDataSource feature gate to be enabled.
TypedObjectReference contains enough information to let you locate the typed referenced object
dataSourceRef.kind (string), required
Kind is the type of resource being referenced
dataSourceRef.name (string), required
Name is the name of resource being referenced
dataSourceRef.apiGroup (string)
APIGroup is the group for the resource being referenced. If APIGroup is not specified, the specified Kind must be in the core API group. For any other third-party types, APIGroup is required.
dataSourceRef.namespace (string)
Namespace is the namespace of resource being referenced Note that when a namespace is specified, a gateway.networking.k8s.io/ReferenceGrant object is required in the referent namespace to allow that namespace's owner to accept the reference. See the ReferenceGrant documentation for details. (Alpha) This field requires the CrossNamespaceVolumeDataSource feature gate to be enabled.
volumeAttributesClassName (string)
volumeAttributesClassName may be used to set the VolumeAttributesClass used by this claim. If specified, the CSI driver will create or update the volume with the attributes defined in the corresponding VolumeAttributesClass. This has a different purpose than storageClassName: it can be changed after the claim is created. An empty string value means that no VolumeAttributesClass will be applied to the claim, but it's not allowed to reset this field to empty string once it is set. If unspecified and the PersistentVolumeClaim is unbound, the default VolumeAttributesClass will be set by the persistentvolume controller if it exists. If the resource referred to by volumeAttributesClass does not exist, this PersistentVolumeClaim will be set to a Pending state, as reflected by the modifyVolumeStatus field, until such a resource exists. More info: https://kubernetes.io/docs/concepts/storage/volume-attributes-classes/ (Beta) Using this field requires the VolumeAttributesClass feature gate to be enabled (off by default).
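Tying these spec fields together, a hedged PVC sketch; the class names, snapshot name, and size are placeholders:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim                # illustrative name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard         # placeholder StorageClass
  volumeAttributesClassName: gold    # placeholder VolumeAttributesClass (beta feature)
  resources:
    requests:
      storage: 10Gi
  dataSourceRef:                     # optionally populate the volume from another object
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: example-snapshot           # placeholder snapshot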
PersistentVolumeClaimStatus
PersistentVolumeClaimStatus is the current status of a persistent volume claim.
allocatedResourceStatuses stores status of resource being resized for the given PVC. Key names follow standard Kubernetes label syntax. Valid values are either:
* Un-prefixed keys:
  - storage - the capacity of the volume.
* Custom resources must use implementation-defined prefixed names such as "example.com/my-custom-resource"
Apart from above values - keys that are unprefixed or have kubernetes.io prefix are considered reserved and hence may not be used.
ClaimResourceStatus can be in any of following states:
- ControllerResizeInProgress: State set when resize controller starts resizing the volume in control-plane.
- ControllerResizeFailed: State set when resize has failed in resize controller with a terminal error.
- NodeResizePending: State set when resize controller has finished resizing the volume but further resizing of volume is needed on the node.
- NodeResizeInProgress: State set when kubelet starts resizing the volume.
- NodeResizeFailed: State set when resizing has failed in kubelet with a terminal error. Transient errors don't set NodeResizeFailed.
For example: if expanding a PVC for more capacity - this field can be one of the following states:
- pvc.status.allocatedResourceStatus['storage'] = "ControllerResizeInProgress"
- pvc.status.allocatedResourceStatus['storage'] = "ControllerResizeFailed"
- pvc.status.allocatedResourceStatus['storage'] = "NodeResizePending"
- pvc.status.allocatedResourceStatus['storage'] = "NodeResizeInProgress"
- pvc.status.allocatedResourceStatus['storage'] = "NodeResizeFailed"
When this field is not set, it means that no resize operation is in progress for the given PVC.
A controller that receives PVC update with previously unknown resourceName or ClaimResourceStatus should ignore the update for the purpose it was designed. For example - a controller that only is responsible for resizing capacity of the volume, should ignore PVC updates that change other valid resources associated with PVC.
This is an alpha field and requires enabling RecoverVolumeExpansionFailure feature.
allocatedResources tracks the resources allocated to a PVC including its capacity. Key names follow standard Kubernetes label syntax. Valid values are either:
* Un-prefixed keys:
  - storage - the capacity of the volume.
* Custom resources must use implementation-defined prefixed names such as "example.com/my-custom-resource"
Apart from above values - keys that are unprefixed or have kubernetes.io prefix are considered reserved and hence may not be used.
Capacity reported here may be larger than the actual capacity when a volume expansion operation is requested. For storage quota, the larger value from allocatedResources and PVC.spec.resources is used. If allocatedResources is not set, PVC.spec.resources alone is used for quota calculation. If a volume expansion capacity request is lowered, allocatedResources is only lowered if there are no expansion operations in progress and if the actual volume capacity is equal or lower than the requested capacity.
A controller that receives PVC update with previously unknown resourceName should ignore the update for the purpose it was designed. For example - a controller that only is responsible for resizing capacity of the volume, should ignore PVC updates that change other valid resources associated with PVC.
This is an alpha field and requires enabling RecoverVolumeExpansionFailure feature.
capacity represents the actual resources of the underlying volume.
conditions ([]PersistentVolumeClaimCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
conditions is the current Condition of persistent volume claim. If underlying persistent volume is being resized then the Condition will be set to 'Resizing'.
PersistentVolumeClaimCondition contains details about state of pvc
conditions.lastProbeTime (Time)
lastProbeTime is the time we probed the condition.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.lastTransitionTime (Time)
lastTransitionTime is the time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
message is the human-readable message indicating details about last transition.
conditions.reason (string)
reason is a unique, short, machine-understandable string that gives the reason for the condition's last transition. If it reports "Resizing" that means the underlying persistent volume is being resized.
currentVolumeAttributesClassName (string)
currentVolumeAttributesClassName is the current name of the VolumeAttributesClass the PVC is using. When unset, there is no VolumeAttributesClass applied to this PersistentVolumeClaim. This is a beta field and requires enabling VolumeAttributesClass feature (off by default).
modifyVolumeStatus (ModifyVolumeStatus)
ModifyVolumeStatus represents the status object of ControllerModifyVolume operation. When this is unset, there is no ModifyVolume operation being attempted. This is a beta field and requires enabling VolumeAttributesClass feature (off by default).
ModifyVolumeStatus represents the status object of ControllerModifyVolume operation
modifyVolumeStatus.status (string), required
status is the status of the ControllerModifyVolume operation. It can be in any of following states:
- Pending: indicates that the PersistentVolumeClaim cannot be modified due to unmet requirements, such as the specified VolumeAttributesClass not existing.
- InProgress: indicates that the volume is being modified.
- Infeasible: indicates that the request has been rejected as invalid by the CSI driver. To resolve the error, a valid VolumeAttributesClass needs to be specified.
Note: New statuses can be added in the future. Consumers should check for unknown statuses and fail appropriately.
nodeAffinity defines constraints that limit what nodes this volume can be accessed from. This field influences the scheduling of pods that use this volume.
VolumeNodeAffinity defines constraints that limit what nodes this volume can be accessed from.
nodeAffinity.required (NodeSelector)
required specifies hard node constraints that must be met.
A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms.
Required. A list of node selector terms. The terms are ORed.
A null or empty node selector term matches no objects. Their requirements are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
A list of node selector requirements by node's fields.
persistentVolumeReclaimPolicy (string)
persistentVolumeReclaimPolicy defines what happens to a persistent volume when released from its claim. Valid options are Retain (default for manually created PersistentVolumes), Delete (default for dynamically provisioned PersistentVolumes), and Recycle (deprecated). Recycle must be supported by the volume plugin underlying this PersistentVolume. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#reclaiming
storageClassName (string)
storageClassName is the name of StorageClass to which this persistent volume belongs. Empty value means that this volume does not belong to any StorageClass.
volumeAttributesClassName (string)
Name of VolumeAttributesClass to which this persistent volume belongs. Empty value is not allowed. When this field is not set, it indicates that this volume does not belong to any VolumeAttributesClass. This field is mutable and can be changed by the CSI driver after a volume has been updated successfully to a new class. For an unbound PersistentVolume, the volumeAttributesClassName will be matched with unbound PersistentVolumeClaims during the binding process. This is a beta field and requires enabling VolumeAttributesClass feature (off by default).
volumeMode (string)
volumeMode defines if a volume is intended to be used with a formatted filesystem or to remain in raw block state. Value of Filesystem is implied when not included in spec.
Local
hostPath (HostPathVolumeSource)
hostPath represents a directory on the host. Provisioned by a developer or tester. This is useful for single-node development and testing only! On-host storage is not supported in any way and WILL NOT WORK in a multi-node cluster. More info: https://kubernetes.io/docs/concepts/storage/volumes#hostpath
Represents a host path mapped into a pod. Host path volumes do not support ownership management or SELinux relabeling.
local represents directly-attached storage with node affinity
Local represents directly-attached storage with node affinity
local.path (string), required
path is the full path to the volume on the node. It can be either a directory or block device (disk, partition, ...).
local.fsType (string)
fsType is the filesystem type to mount. It applies only when the Path is a block device. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". The default value is to auto-select a filesystem if unspecified.
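As an illustration of a local volume with the node affinity it requires; the path, node name, class, and size are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv             # illustrative name
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage    # placeholder StorageClass
  volumeMode: Filesystem
  local:
    path: /mnt/disks/ssd1            # placeholder directory or block device on the node
  nodeAffinity:                      # required so pods are scheduled onto the right node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-a                   # placeholder node name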
awsElasticBlockStore represents an AWS Disk resource that is attached to a kubelet's host machine and then exposed to the pod. Deprecated: AWSElasticBlockStore is deprecated. All operations for the in-tree awsElasticBlockStore type are redirected to the ebs.csi.aws.com CSI driver. More info: https://kubernetes.io/docs/concepts/storage/volumes#awselasticblockstore
Represents a Persistent Disk resource in AWS.
An AWS EBS disk must exist before mounting to a container. The disk must also be in the same AWS zone as the kubelet. An AWS EBS disk can only be mounted as read/write once. AWS EBS volumes support ownership management and SELinux relabeling.
fsType is the filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#awselasticblockstore
awsElasticBlockStore.partition (int32)
partition is the partition in the volume that you want to mount. If omitted, the default is to mount by volume name. Examples: For volume /dev/sda1, you specify the partition as "1". Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty).
azureDisk represents an Azure Data Disk mount on the host and bind mount to the pod. Deprecated: AzureDisk is deprecated. All operations for the in-tree azureDisk type are redirected to the disk.csi.azure.com CSI driver.
AzureDisk represents an Azure Data Disk mount on the host and bind mount to the pod.
azureDisk.diskName (string), required
diskName is the Name of the data disk in the blob storage
azureDisk.diskURI (string), required
diskURI is the URI of data disk in the blob storage
azureDisk.cachingMode (string)
cachingMode is the Host Caching mode: None, Read Only, Read Write.
azureDisk.fsType (string)
fsType is Filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
azureDisk.kind (string)
kind expected values are: Shared (multiple blob disks per storage account), Dedicated (single blob disk per storage account), Managed (Azure managed data disk, only in a managed availability set). Defaults to Shared.
azureDisk.readOnly (boolean)
readOnly Defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
azureFile (AzureFilePersistentVolumeSource)
azureFile represents an Azure File Service mount on the host and bind mount to the pod. Deprecated: AzureFile is deprecated. All operations for the in-tree azureFile type are redirected to the file.csi.azure.com CSI driver.
AzureFile represents an Azure File Service mount on the host and bind mount to the pod.
azureFile.secretName (string), required
secretName is the name of secret that contains Azure Storage Account Name and Key
azureFile.shareName (string), required
shareName is the azure Share Name
azureFile.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
azureFile.secretNamespace (string)
secretNamespace is the namespace of the secret that contains Azure Storage Account Name and Key. Default is the same as the Pod's namespace.
cephfs (CephFSPersistentVolumeSource)
cephFS represents a Ceph FS mount on the host that shares a pod's lifetime. Deprecated: CephFS is deprecated and the in-tree cephfs type is no longer supported.
Represents a Ceph Filesystem mount that lasts the lifetime of a pod Cephfs volumes do not support ownership management or SELinux relabeling.
cinder represents a cinder volume attached and mounted on kubelets host machine. Deprecated: Cinder is deprecated. All operations for the in-tree cinder type are redirected to the cinder.csi.openstack.org CSI driver. More info: https://examples.k8s.io/mysql-cinder-pd/README.md
Represents a cinder volume resource in Openstack. A Cinder volume must exist before mounting to a container. The volume must also be in the same region as the kubelet. Cinder volumes support ownership management and SELinux relabeling.
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://examples.k8s.io/mysql-cinder-pd/README.md
secretRef is Optional: points to a secret object containing parameters used to connect to OpenStack.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
cinder.secretRef.name (string)
name is unique within a namespace to reference a secret resource.
cinder.secretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
csi (CSIPersistentVolumeSource)
csi represents storage that is handled by an external CSI driver.
Represents storage that is managed by an external CSI volume driver
csi.driver (string), required
driver is the name of the driver to use for this volume. Required.
csi.volumeHandle (string), required
volumeHandle is the unique volume name returned by the CSI volume plugin’s CreateVolume to refer to the volume on all subsequent calls. Required.
csi.controllerExpandSecretRef (SecretReference)
controllerExpandSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI ControllerExpandVolume call. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secrets are passed.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
csi.controllerExpandSecretRef.name (string)
name is unique within a namespace to reference a secret resource.
csi.controllerExpandSecretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
csi.controllerPublishSecretRef (SecretReference)
controllerPublishSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI ControllerPublishVolume and ControllerUnpublishVolume calls. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secrets are passed.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
csi.controllerPublishSecretRef.name (string)
name is unique within a namespace to reference a secret resource.
csi.controllerPublishSecretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
csi.fsType (string)
fsType to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs".
csi.nodeExpandSecretRef (SecretReference)
nodeExpandSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodeExpandVolume call. This field is optional, may be omitted if no secret is required. If the secret object contains more than one secret, all secrets are passed.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
csi.nodeExpandSecretRef.name (string)
name is unique within a namespace to reference a secret resource.
csi.nodeExpandSecretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
csi.nodePublishSecretRef (SecretReference)
nodePublishSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodePublishVolume and NodeUnpublishVolume calls. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secrets are passed.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
csi.nodePublishSecretRef.name (string)
name is unique within a namespace to reference a secret resource.
csi.nodePublishSecretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
csi.nodeStageSecretRef (SecretReference)
nodeStageSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodeStageVolume and NodeUnstageVolume calls. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secrets are passed.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
csi.nodeStageSecretRef.name (string)
name is unique within a namespace to reference a secret resource.
csi.nodeStageSecretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
csi.readOnly (boolean)
readOnly value to pass to ControllerPublishVolumeRequest. Defaults to false (read/write).
csi.volumeAttributes (map[string]string)
volumeAttributes of the volume to publish.
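As a rough sketch of the CSI persistent volume fields above, a PersistentVolume backed by an external CSI driver might look like the following; the driver name, volume handle, attributes, and secret are placeholders for whatever your driver actually requires.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-csi-pv              # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: example.csi.vendor.com  # placeholder driver name
    volumeHandle: vol-0123456789    # ID returned by the driver's CreateVolume call
    fsType: ext4
    readOnly: false
    volumeAttributes:               # driver-specific, opaque to Kubernetes
      type: ssd
    nodeStageSecretRef:             # optional; only needed if the driver requires a secret
      name: csi-node-secret
      namespace: kube-system
```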
fc (FCVolumeSource)
fc represents a Fibre Channel resource that is attached to a kubelet's host machine and then exposed to the pod.
Represents a Fibre Channel volume. Fibre Channel volumes can only be mounted as read/write once. Fibre Channel volumes support ownership management and SELinux relabeling.
fc.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
fc.lun (int32)
lun is Optional: FC target lun number
fc.readOnly (boolean)
readOnly is Optional: Defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
fc.targetWWNs ([]string)
Atomic: will be replaced during a merge
targetWWNs is Optional: FC target worldwide names (WWNs)
fc.wwids ([]string)
Atomic: will be replaced during a merge
wwids is Optional: FC volume world wide identifiers (wwids). Either wwids or a combination of targetWWNs and lun must be set, but not both simultaneously.
flexVolume (FlexPersistentVolumeSource)
flexVolume represents a generic volume resource that is provisioned/attached using an exec based plugin. Deprecated: FlexVolume is deprecated. Consider using a CSIDriver instead.
FlexPersistentVolumeSource represents a generic persistent volume resource that is provisioned/attached using an exec based plugin.
flexVolume.driver (string), required
driver is the name of the driver to use for this volume.
flexVolume.fsType (string)
fsType is the Filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". The default filesystem depends on FlexVolume script.
flexVolume.options (map[string]string)
options is Optional: this field holds extra command options if any.
flexVolume.readOnly (boolean)
readOnly is Optional: defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
flexVolume.secretRef (SecretReference)
secretRef is Optional: a reference to the secret object containing sensitive information to pass to the plugin scripts. This may be empty if no secret object is specified. If the secret object contains more than one secret, all secrets are passed to the plugin scripts.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
flexVolume.secretRef.name (string)
name is unique within a namespace to reference a secret resource.
flexVolume.secretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
flocker (FlockerVolumeSource)
flocker represents a Flocker volume attached to a kubelet's host machine and exposed to the pod for its usage. This depends on the Flocker control service being running. Deprecated: Flocker is deprecated and the in-tree flocker type is no longer supported.
Represents a Flocker volume mounted by the Flocker agent. One and only one of datasetName and datasetUUID should be set. Flocker volumes do not support ownership management or SELinux relabeling.
flocker.datasetName (string)
datasetName is the name of the dataset stored as metadata -> name on the dataset for Flocker; it should be considered deprecated.
flocker.datasetUUID (string)
datasetUUID is the UUID of the dataset. This is unique identifier of a Flocker dataset
gcePersistentDisk (GCEPersistentDiskVolumeSource)
gcePersistentDisk represents a GCE Disk resource that is attached to a kubelet's host machine and then exposed to the pod. Provisioned by an admin. Deprecated: GCEPersistentDisk is deprecated. All operations for the in-tree gcePersistentDisk type are redirected to the pd.csi.storage.gke.io CSI driver. More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
*Represents a Persistent Disk resource in Google Compute Engine.
A GCE PD must exist before mounting to a container. The disk must also be in the same GCE project and zone as the kubelet. A GCE PD can only be mounted as read/write once or read-only many times. GCE PDs support ownership management and SELinux relabeling.*
fsType is filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
gcePersistentDisk.partition (int32)
partition is the partition in the volume that you want to mount. If omitted, the default is to mount by volume name. Examples: For volume /dev/sda1, you specify the partition as "1". Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty). More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
glusterfs represents a Glusterfs volume that is attached to a host and exposed to the pod. Provisioned by an admin. Deprecated: Glusterfs is deprecated and the in-tree glusterfs type is no longer supported. More info: https://examples.k8s.io/volumes/glusterfs/README.md
Represents a Glusterfs mount that lasts the lifetime of a pod. Glusterfs volumes do not support ownership management or SELinux relabeling.
iscsi represents an ISCSI Disk resource that is attached to a kubelet's host machine and then exposed to the pod. Provisioned by an admin.
ISCSIPersistentVolumeSource represents an ISCSI disk. ISCSI volumes can only be mounted as read/write once. ISCSI volumes support ownership management and SELinux relabeling.
iscsi.iqn (string), required
iqn is Target iSCSI Qualified Name.
iscsi.lun (int32), required
lun is iSCSI Target Lun number.
iscsi.targetPortal (string), required
targetPortal is iSCSI Target Portal. The Portal is either an IP or ip_addr:port if the port is other than default (typically TCP ports 860 and 3260).
iscsi.chapAuthDiscovery (boolean)
chapAuthDiscovery defines whether to support iSCSI Discovery CHAP authentication
iscsi.chapAuthSession (boolean)
chapAuthSession defines whether to support iSCSI Session CHAP authentication
iscsi.fsType (string)
fsType is the filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#iscsi
iscsi.initiatorName (string)
initiatorName is the custom iSCSI Initiator Name. If initiatorName is specified with iscsiInterface simultaneously, new iSCSI interface <target portal>:<volume name> will be created for the connection.
iscsi.iscsiInterface (string)
iscsiInterface is the interface Name that uses an iSCSI transport. Defaults to 'default' (tcp).
iscsi.portals ([]string)
Atomic: will be replaced during a merge
portals is the iSCSI Target Portal List. The Portal is either an IP or ip_addr:port if the port is other than default (typically TCP ports 860 and 3260).
iscsi.readOnly (boolean)
readOnly here will force the ReadOnly setting in VolumeMounts. Defaults to false.
iscsi.secretRef (SecretReference)
secretRef is the CHAP Secret for iSCSI target and initiator authentication
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
iscsi.secretRef.name (string)
name is unique within a namespace to reference a secret resource.
iscsi.secretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
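The iSCSI fields above combine into a persistent volume definition along these lines; the portal, IQN, and CHAP secret names are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-iscsi-pv            # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.0.0.10:3260    # IP or ip_addr:port
    iqn: iqn.2001-04.com.example:storage.disk1   # placeholder IQN
    lun: 0
    fsType: ext4
    readOnly: false
    chapAuthSession: true           # optional CHAP authentication
    secretRef:
      name: iscsi-chap-secret       # hypothetical Secret holding CHAP credentials
      namespace: default
```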
photonPersistentDisk represents a PhotonController persistent disk attached and mounted on kubelets host machine. Deprecated: PhotonPersistentDisk is deprecated and the in-tree photonPersistentDisk type is no longer supported.
Represents a Photon Controller persistent disk resource.
photonPersistentDisk.pdID (string), required
pdID is the ID that identifies Photon Controller persistent disk
photonPersistentDisk.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
portworxVolume (PortworxVolumeSource)
portworxVolume represents a portworx volume attached and mounted on kubelets host machine. Deprecated: PortworxVolume is deprecated. All operations for the in-tree portworxVolume type are redirected to the pxd.portworx.com CSI driver when the CSIMigrationPortworx feature-gate is on.
PortworxVolumeSource represents a Portworx volume resource.
portworxVolume.volumeID (string), required
volumeID uniquely identifies a Portworx volume
portworxVolume.fsType (string)
fsType represents the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs". Implicitly inferred to be "ext4" if unspecified.
portworxVolume.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
quobyte (QuobyteVolumeSource)
quobyte represents a Quobyte mount on the host that shares a pod's lifetime. Deprecated: Quobyte is deprecated and the in-tree quobyte type is no longer supported.
Represents a Quobyte mount that lasts the lifetime of a pod. Quobyte volumes do not support ownership management or SELinux relabeling.
quobyte.registry (string), required
registry represents a single or multiple Quobyte Registry services specified as a string as host:port pair (multiple entries are separated with commas) which acts as the central registry for volumes
quobyte.volume (string), required
volume is a string that references an already created Quobyte volume by name.
quobyte.group (string)
group to map volume access to. Default is no group.
quobyte.readOnly (boolean)
readOnly here will force the Quobyte volume to be mounted with read-only permissions. Defaults to false.
quobyte.tenant (string)
tenant owning the given Quobyte volume in the backend. Used with dynamically provisioned Quobyte volumes; the value is set by the plugin.
quobyte.user (string)
user to map volume access to. Defaults to the serviceaccount user.
rbd (RBDPersistentVolumeSource)
rbd represents a Rados Block Device mount on the host that shares a pod's lifetime. Deprecated: RBD is deprecated and the in-tree rbd type is no longer supported. More info: https://examples.k8s.io/volumes/rbd/README.md
Represents a Rados Block Device mount that lasts the lifetime of a pod. RBD volumes support ownership management and SELinux relabeling.
fsType is the filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#rbd
scaleIO represents a ScaleIO persistent volume attached and mounted on Kubernetes nodes. Deprecated: ScaleIO is deprecated and the in-tree scaleIO type is no longer supported.
ScaleIOPersistentVolumeSource represents a persistent ScaleIO volume
scaleIO.gateway (string), required
gateway is the host address of the ScaleIO API Gateway.
scaleIO.secretRef (SecretReference), required
secretRef references to the secret for ScaleIO user and other sensitive information. If this is not provided, Login operation will fail.
SecretReference represents a Secret Reference. It has enough information to retrieve secret in any namespace
scaleIO.secretRef.name (string)
name is unique within a namespace to reference a secret resource.
scaleIO.secretRef.namespace (string)
namespace defines the space within which the secret name must be unique.
scaleIO.system (string), required
system is the name of the storage system as configured in ScaleIO.
scaleIO.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Default is "xfs"
scaleIO.protectionDomain (string)
protectionDomain is the name of the ScaleIO Protection Domain for the configured storage.
scaleIO.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
scaleIO.sslEnabled (boolean)
sslEnabled is the flag to enable/disable SSL communication with Gateway, default false
scaleIO.storageMode (string)
storageMode indicates whether the storage for a volume should be ThickProvisioned or ThinProvisioned. Default is ThinProvisioned.
scaleIO.storagePool (string)
storagePool is the ScaleIO Storage Pool associated with the protection domain.
scaleIO.volumeName (string)
volumeName is the name of a volume already created in the ScaleIO system that is associated with this volume source.
storageos (StorageOSPersistentVolumeSource)
storageOS represents a StorageOS volume that is attached to the kubelet's host machine and mounted into the pod. Deprecated: StorageOS is deprecated and the in-tree storageos type is no longer supported. More info: https://examples.k8s.io/volumes/storageos/README.md
Represents a StorageOS persistent volume resource.
storageos.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
storageos.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
secretRef specifies the secret to use for obtaining the StorageOS API credentials. If not specified, default values will be attempted.
storageos.volumeName (string)
volumeName is the human-readable name of the StorageOS volume. Volume names are only unique within a namespace.
storageos.volumeNamespace (string)
volumeNamespace specifies the scope of the volume within StorageOS. If no namespace is specified then the Pod's namespace will be used. This allows the Kubernetes name scoping to be mirrored within StorageOS for tighter integration. Set VolumeName to any name to override the default behaviour. Set to "default" if you are not using namespaces within StorageOS. Namespaces that do not pre-exist within StorageOS will be created.
vsphereVolume (VsphereVirtualDiskVolumeSource)
vsphereVolume represents a vSphere volume attached and mounted on kubelets host machine. Deprecated: VsphereVolume is deprecated. All operations for the in-tree vsphereVolume type are redirected to the csi.vsphere.vmware.com CSI driver.
Represents a vSphere volume resource.
vsphereVolume.volumePath (string), required
volumePath is the path that identifies vSphere volume vmdk
vsphereVolume.fsType (string)
fsType is filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
vsphereVolume.storagePolicyID (string)
storagePolicyID is the storage Policy Based Management (SPBM) profile ID associated with the StoragePolicyName.
vsphereVolume.storagePolicyName (string)
storagePolicyName is the storage Policy Based Management (SPBM) profile name.
PersistentVolumeStatus
PersistentVolumeStatus is the current status of a persistent volume.
lastPhaseTransitionTime (Time)
lastPhaseTransitionTime is the time the phase transitioned from one to another and automatically resets to the current time every time a volume phase transitions.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
message (string)
message is a human-readable message indicating details about why the volume is in this state.
provisioner indicates the type of the provisioner.
allowVolumeExpansion (boolean)
allowVolumeExpansion shows whether the storage class allows volume expansion.
allowedTopologies ([]TopologySelectorTerm)
Atomic: will be replaced during a merge
allowedTopologies restrict the node topologies where volumes can be dynamically provisioned. Each volume plugin defines its own supported topology specifications. An empty TopologySelectorTerm list means there is no topology restriction. This field is only honored by servers that enable the VolumeScheduling feature.
A topology selector term represents the result of label queries. A null or empty topology selector term matches no objects. The requirements of them are ANDed. It provides a subset of functionality as NodeSelectorTerm. This is an alpha feature and may change in the future.
An array of string values. One value must match the label to be selected. Each entry in Values is ORed.
mountOptions ([]string)
Atomic: will be replaced during a merge
mountOptions controls the mountOptions for dynamically provisioned PersistentVolumes of this storage class. e.g. ["ro", "soft"]. Not validated - mount of the PVs will simply fail if one is invalid.
parameters (map[string]string)
parameters holds the parameters for the provisioner that should create volumes of this storage class.
reclaimPolicy (string)
reclaimPolicy controls the reclaimPolicy for dynamically provisioned PersistentVolumes of this storage class. Defaults to Delete.
volumeBindingMode (string)
volumeBindingMode indicates how PersistentVolumeClaims should be provisioned and bound. When unset, VolumeBindingImmediate is used. This field is only honored by servers that enable the VolumeScheduling feature.
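Putting the StorageClass fields above together, a minimal sketch might look like the following; the provisioner, parameters, mount options, and topology values are placeholders for whatever your storage backend actually supports.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-fast                     # hypothetical name
provisioner: example.csi.vendor.com      # placeholder provisioner
parameters:                              # provisioner-specific
  type: ssd
reclaimPolicy: Delete                    # the default
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # instead of the default VolumeBindingImmediate
mountOptions:
  - ro
  - soft
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - zone-a                       # hypothetical zone label value
```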
StorageClassList
StorageClassList is a collection of storage classes.
The token used in the list options to get the next chunk of objects to migrate. When the .status.conditions indicates the migration is "Running", users can use this token to check the progress of the migration.
resource (GroupVersionResource), required
The resource that is being migrated. The migrator sends requests to the endpoint serving the resource. Immutable.
The names of the group, the version, and the resource.
resource.group (string)
The name of the group.
resource.resource (string)
The name of the resource.
resource.version (string)
The name of the version.
StorageVersionMigrationStatus
Status of the storage version migration.
conditions ([]MigrationCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
The latest available observations of the migration's current state.
Describes the state of a migration at a certain point.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of the condition.
conditions.lastUpdateTime (Time)
The last time this condition was updated.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
A human readable message indicating details about the transition.
conditions.reason (string)
The reason for the condition's last transition.
resourceVersion (string)
ResourceVersion to compare with the GC cache for performing the migration. This is the current resource version of given group, version and resource when kube-controller-manager first observes this StorageVersionMigration resource.
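As a sketch of how the migration fields above are used, a StorageVersionMigration object for the core secrets resource might look like the following; the object name is a placeholder, and storagemigration.k8s.io/v1alpha1 is the alpha version of this API at the time of writing.

```yaml
apiVersion: storagemigration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: example-secrets-migration   # hypothetical name
spec:
  resource:
    group: ""                       # empty string selects the core API group
    version: v1
    resource: secrets
```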
StorageVersionMigrationList
StorageVersionMigrationList is a collection of storage version migrations.
PersistentVolumeClaimVolumeSource references the user's PVC in the same namespace. This volume finds the bound PV and mounts that volume for the pod. A PersistentVolumeClaimVolumeSource is, essentially, a wrapper around another type of volume that is owned by someone else (the system).
readOnly will force the ReadOnly setting in VolumeMounts. Default is false.
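A minimal sketch of a Pod consuming a claim through this volume source; the Pod, image, and claim names are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                       # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example/app:1.0     # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-claim          # PVC in the same namespace as the Pod
        readOnly: false
```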
Projections
configMap (ConfigMapVolumeSource)
configMap represents a configMap that should populate this volume
*Adapts a ConfigMap into a volume.
The contents of the target ConfigMap's Data field will be presented in a volume as files using the keys in the Data field as the file names, unless the items element is populated with specific mappings of keys to paths. ConfigMap volumes support ownership management and SELinux relabeling.*
optional specifies whether the ConfigMap or its keys must be defined
configMap.defaultMode (int32)
defaultMode is optional: mode bits used to set permissions on created files by default. Must be an octal value between 0000 and 0777 or a decimal value between 0 and 511. YAML accepts both octal and decimal values, JSON requires decimal values for mode bits. Defaults to 0644. Directories within the path are not affected by this setting. This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.
items if unspecified, each key-value pair in the Data field of the referenced ConfigMap will be projected into the volume as a file whose name is the key and content is the value. If specified, the listed keys will be projected into the specified paths, and unlisted keys will not be present. If a key is specified which is not present in the ConfigMap, the volume setup will error unless it is marked optional. Paths must be relative and may not contain the '..' path or start with '..'.
The contents of the target Secret's Data field will be presented in a volume as files using the keys in the Data field as the file names. Secret volumes support ownership management and SELinux relabeling.*
optional field specifies whether the Secret or its keys must be defined
secret.defaultMode (int32)
defaultMode is Optional: mode bits used to set permissions on created files by default. Must be an octal value between 0000 and 0777 or a decimal value between 0 and 511. YAML accepts both octal and decimal values, JSON requires decimal values for mode bits. Defaults to 0644. Directories within the path are not affected by this setting. This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.
items If unspecified, each key-value pair in the Data field of the referenced Secret will be projected into the volume as a file whose name is the key and content is the value. If specified, the listed keys will be projected into the specified paths, and unlisted keys will not be present. If a key is specified which is not present in the Secret, the volume setup will error unless it is marked optional. Paths must be relative and may not contain the '..' path or start with '..'.
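The configMap and secret volume fields above can be combined in a Pod's .spec.volumes list roughly as follows (a fragment, not a complete manifest); the ConfigMap and Secret names, keys, and paths are placeholders, and the secret volume's secretName field comes from the full SecretVolumeSource rather than the subset of fields shown here.

```yaml
volumes:
  - name: config
    configMap:
      name: app-config              # hypothetical ConfigMap
      optional: false
      defaultMode: 0444             # octal; use 292 (decimal) in JSON
      items:
        - key: settings.ini
          path: settings/settings.ini
  - name: creds
    secret:
      secretName: app-secret        # hypothetical Secret
      defaultMode: 0400
      items:
        - key: token
          path: token
```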
downwardAPI (DownwardAPIVolumeSource)
downwardAPI represents downward API about the pod that should populate this volume
DownwardAPIVolumeSource represents a volume containing downward API info. Downward API volumes support ownership management and SELinux relabeling.
downwardAPI.defaultMode (int32)
Optional: mode bits used to set permissions on created files by default. Must be an octal value between 0000 and 0777 or a decimal value between 0 and 511. YAML accepts both octal and decimal values, JSON requires decimal values for mode bits. Defaults to 0644. Directories within the path are not affected by this setting. This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.
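A sketch of a downward API volume (again a .spec.volumes fragment); the items and their fieldRef/resourceFieldRef selectors come from the full DownwardAPIVolumeSource rather than just the defaultMode field listed above, and the container name is a placeholder.

```yaml
volumes:
  - name: podinfo
    downwardAPI:
      defaultMode: 0644
      items:
        - path: labels
          fieldRef:
            fieldPath: metadata.labels
        - path: cpu_limit
          resourceFieldRef:
            containerName: app      # hypothetical container name
            resource: limits.cpu
```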
projected items for all-in-one resources: secrets, configmaps, and downward API
Represents a projected volume source
projected.defaultMode (int32)
defaultMode are the mode bits used to set permissions on created files by default. Must be an octal value between 0000 and 0777 or a decimal value between 0 and 511. YAML accepts both octal and decimal values, JSON requires decimal values for mode bits. Directories within the path are not affected by this setting. This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.
projected.sources ([]VolumeProjection)
Atomic: will be replaced during a merge
sources is the list of volume projections. Each entry in this list handles one source.
Projection that may be projected along with other supported volume types. Exactly one of these fields must be set.
ClusterTrustBundle allows a pod to access the .spec.trustBundle field of ClusterTrustBundle objects in an auto-updating file.
Alpha, gated by the ClusterTrustBundleProjection feature gate.
ClusterTrustBundle objects can either be selected by name, or by the combination of signer name and a label selector.
Kubelet performs aggressive normalization of the PEM contents written into the pod filesystem. Esoteric PEM features such as inter-block comments and block headers are stripped. Certificates are deduplicated. The ordering of certificates within the file is arbitrary, and Kubelet may change the order over time.
ClusterTrustBundleProjection describes how to select a set of ClusterTrustBundle objects and project their contents into the pod filesystem.
Select all ClusterTrustBundles that match this label selector. Only has effect if signerName is set. Mutually-exclusive with name. If unset, interpreted as "match nothing". If set but empty, interpreted as "match everything".
If true, don't block pod startup if the referenced ClusterTrustBundle(s) aren't available. If using name, then the named ClusterTrustBundle is allowed not to exist. If using signerName, then the combination of signerName and labelSelector is allowed to match zero ClusterTrustBundles.
Select all ClusterTrustBundles that match this signer name. Mutually-exclusive with name. The contents of all selected ClusterTrustBundles will be unified and deduplicated.
projected.sources.configMap (ConfigMapProjection)
configMap information about the configMap data to project
*Adapts a ConfigMap into a projected volume.
The contents of the target ConfigMap's Data field will be presented in a projected volume as files using the keys in the Data field as the file names, unless the items element is populated with specific mappings of keys to paths. Note that this is identical to a configmap volume source without the default mode.*
items if unspecified, each key-value pair in the Data field of the referenced ConfigMap will be projected into the volume as a file whose name is the key and content is the value. If specified, the listed keys will be projected into the specified paths, and unlisted keys will not be present. If a key is specified which is not present in the ConfigMap, the volume setup will error unless it is marked optional. Paths must be relative and may not contain the '..' path or start with '..'.
downwardAPI information about the downwardAPI data to project
Represents downward API info for projecting into a projected volume. Note that this is identical to a downwardAPI volume source without the default mode.
secret information about the secret data to project
*Adapts a secret into a projected volume.
The contents of the target Secret's Data field will be presented in a projected volume as files using the keys in the Data field as the file names. Note that this is identical to a secret volume source without the default mode.*
items if unspecified, each key-value pair in the Data field of the referenced Secret will be projected into the volume as a file whose name is the key and content is the value. If specified, the listed keys will be projected into the specified paths, and unlisted keys will not be present. If a key is specified which is not present in the Secret, the volume setup will error unless it is marked optional. Paths must be relative and may not contain the '..' path or start with '..'.
serviceAccountToken is information about the serviceAccountToken data to project
ServiceAccountTokenProjection represents a projected service account token volume. This projection can be used to insert a service account token into the pods runtime filesystem for use against APIs (Kubernetes API Server or otherwise).
audience is the intended audience of the token. A recipient of a token must identify itself with an identifier specified in the audience of the token, and otherwise should reject the token. The audience defaults to the identifier of the apiserver.
expirationSeconds is the requested duration of validity of the service account token. As the token approaches expiration, the kubelet volume plugin will proactively rotate the service account token. The kubelet will start trying to rotate the token if the token is older than 80 percent of its time to live or if the token is older than 24 hours. Defaults to 1 hour and must be at least 10 minutes.
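Taken together, the projection sources above can be combined into a single projected volume roughly as follows (a .spec.volumes fragment); all names, the audience, and the ClusterTrustBundle selection are placeholders, and the clusterTrustBundle source only works with the alpha feature gate enabled.

```yaml
volumes:
  - name: combined
    projected:
      defaultMode: 0444
      sources:
        - serviceAccountToken:
            path: token
            audience: https://api.example.com   # placeholder audience
            expirationSeconds: 3600
        - configMap:
            name: app-config                    # hypothetical ConfigMap
            items:
              - key: ca.crt
                path: ca.crt
        - secret:
            name: app-secret                    # hypothetical Secret
        - downwardAPI:
            items:
              - path: namespace
                fieldRef:
                  fieldPath: metadata.namespace
        - clusterTrustBundle:                   # alpha; requires ClusterTrustBundleProjection
            name: example-bundle                # or select by signerName + labelSelector
            path: trust-bundle.pem
            optional: true
```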
Represents an empty directory for a pod. Empty directory volumes support ownership management and SELinux relabeling.
emptyDir.medium (string)
medium represents what type of storage medium should back this directory. The default is "" which means to use the node's default medium. Must be an empty string (default) or Memory. More info: https://kubernetes.io/docs/concepts/storage/volumes#emptydir
sizeLimit is the total amount of local storage required for this EmptyDir volume. The size limit is also applicable for memory medium. The maximum usage on memory medium EmptyDir would be the minimum value between the SizeLimit specified here and the sum of memory limits of all containers in a pod. The default is nil which means that the limit is undefined. More info: https://kubernetes.io/docs/concepts/storage/volumes#emptydir
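For example, a memory-backed scratch volume with a size limit could be declared like this (a .spec.volumes fragment):

```yaml
volumes:
  - name: scratch
    emptyDir:
      medium: Memory    # tmpfs-backed; "" (the default) uses the node's default medium
      sizeLimit: 1Gi
```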
hostPath (HostPathVolumeSource)
hostPath represents a pre-existing file or directory on the host machine that is directly exposed to the container. This is generally used for system agents or other privileged things that are allowed to see the host machine. Most containers will NOT need this. More info: https://kubernetes.io/docs/concepts/storage/volumes#hostpath
Represents a host path mapped into a pod. Host path volumes do not support ownership management or SELinux relabeling.
awsElasticBlockStore represents an AWS Disk resource that is attached to a kubelet's host machine and then exposed to the pod. Deprecated: AWSElasticBlockStore is deprecated. All operations for the in-tree awsElasticBlockStore type are redirected to the ebs.csi.aws.com CSI driver. More info: https://kubernetes.io/docs/concepts/storage/volumes#awselasticblockstore
*Represents a Persistent Disk resource in AWS.
An AWS EBS disk must exist before mounting to a container. The disk must also be in the same AWS zone as the kubelet. An AWS EBS disk can only be mounted as read/write once. AWS EBS volumes support ownership management and SELinux relabeling.*
fsType is the filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#awselasticblockstore
awsElasticBlockStore.partition (int32)
partition is the partition in the volume that you want to mount. If omitted, the default is to mount by volume name. Examples: For volume /dev/sda1, you specify the partition as "1". Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty).
azureDisk represents an Azure Data Disk mount on the host and bind mount to the pod. Deprecated: AzureDisk is deprecated. All operations for the in-tree azureDisk type are redirected to the disk.csi.azure.com CSI driver.
AzureDisk represents an Azure Data Disk mount on the host and bind mount to the pod.
azureDisk.diskName (string), required
diskName is the Name of the data disk in the blob storage
azureDisk.diskURI (string), required
diskURI is the URI of data disk in the blob storage
azureDisk.cachingMode (string)
cachingMode is the Host Caching mode: None, Read Only, Read Write.
azureDisk.fsType (string)
fsType is Filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
azureDisk.kind (string)
kind expected values are Shared: multiple blob disks per storage account Dedicated: single blob disk per storage account Managed: azure managed data disk (only in managed availability set). defaults to shared
azureDisk.readOnly (boolean)
readOnly Defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
azureFile (AzureFileVolumeSource)
azureFile represents an Azure File Service mount on the host and bind mount to the pod. Deprecated: AzureFile is deprecated. All operations for the in-tree azureFile type are redirected to the file.csi.azure.com CSI driver.
AzureFile represents an Azure File Service mount on the host and bind mount to the pod.
azureFile.secretName (string), required
secretName is the name of secret that contains Azure Storage Account Name and Key
azureFile.shareName (string), required
shareName is the azure share Name
azureFile.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
cephfs (CephFSVolumeSource)
cephFS represents a Ceph FS mount on the host that shares a pod's lifetime. Deprecated: CephFS is deprecated and the in-tree cephfs type is no longer supported.
Represents a Ceph Filesystem mount that lasts the lifetime of a pod Cephfs volumes do not support ownership management or SELinux relabeling.
cinder represents a cinder volume attached and mounted on kubelets host machine. Deprecated: Cinder is deprecated. All operations for the in-tree cinder type are redirected to the cinder.csi.openstack.org CSI driver. More info: https://examples.k8s.io/mysql-cinder-pd/README.md
Represents a cinder volume resource in Openstack. A Cinder volume must exist before mounting to a container. The volume must also be in the same region as the kubelet. Cinder volumes support ownership management and SELinux relabeling.
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://examples.k8s.io/mysql-cinder-pd/README.md
secretRef is optional: points to a secret object containing parameters used to connect to OpenStack.
csi (CSIVolumeSource)
csi (Container Storage Interface) represents ephemeral storage that is handled by certain external CSI drivers.
Represents a source location of a volume to mount, managed by an external CSI driver
csi.driver (string), required
driver is the name of the CSI driver that handles this volume. Consult with your admin for the correct name as registered in the cluster.
csi.fsType (string)
fsType to mount. Ex. "ext4", "xfs", "ntfs". If not provided, the empty value is passed to the associated CSI driver which will determine the default filesystem to apply.
nodePublishSecretRef is a reference to the secret object containing sensitive information to pass to the CSI driver to complete the CSI NodePublishVolume and NodeUnpublishVolume calls. This field is optional, and may be empty if no secret is required. If the secret object contains more than one secret, all secret references are passed.
csi.readOnly (boolean)
readOnly specifies a read-only configuration for the volume. Defaults to false (read/write).
csi.volumeAttributes (map[string]string)
volumeAttributes stores driver-specific properties that are passed to the CSI driver. Consult your driver's documentation for supported values.
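A sketch of an inline CSI volume using the fields above (a .spec.volumes fragment); the driver name, attribute, and secret are placeholders, and the named driver must actually support inline ephemeral use.

```yaml
volumes:
  - name: inline-csi
    csi:
      driver: inline.csi.example.com   # placeholder driver name
      fsType: ext4
      readOnly: true
      volumeAttributes:                # driver-specific
        share: data                    # hypothetical attribute
      nodePublishSecretRef:
        name: publish-secret           # hypothetical Secret in the Pod's namespace
```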
ephemeral (EphemeralVolumeSource)
ephemeral represents a volume that is handled by a cluster storage driver. The volume's lifecycle is tied to the pod that defines it - it will be created before the pod starts, and deleted when the pod is removed.
Use this if: a) the volume is only needed while the pod runs, b) features of normal volumes like restoring from snapshot or capacity tracking are needed, c) the storage driver is specified through a storage class, and d) the storage driver supports dynamic volume provisioning through a PersistentVolumeClaim (see EphemeralVolumeSource for more information on the connection between this volume type and PersistentVolumeClaim).
Use PersistentVolumeClaim or one of the vendor-specific APIs for volumes that persist for longer than the lifecycle of an individual pod.
Use CSI for light-weight local ephemeral volumes if the CSI driver is meant to be used that way - see the documentation of the driver for more information.
A pod can use both types of ephemeral volumes and persistent volumes at the same time.
Represents an ephemeral volume that is handled by a normal storage driver.
Will be used to create a stand-alone PVC to provision the volume. The pod in which this EphemeralVolumeSource is embedded will be the owner of the PVC, i.e. the PVC will be deleted together with the pod. The name of the PVC will be <pod name>-<volume name> where <volume name> is the name from the PodSpec.Volumes array entry. Pod validation will reject the pod if the concatenated name is not valid for a PVC (for example, too long).
An existing PVC with that name that is not owned by the pod will not be used for the pod to avoid using an unrelated volume by mistake. Starting the pod is then blocked until the unrelated PVC is removed. If such a pre-created PVC is meant to be used by the pod, the PVC has to be updated with an owner reference to the pod once the pod exists. Normally this should not be necessary, but it may be useful when manually reconstructing a broken cluster.
This field is read-only and no changes will be made by Kubernetes to the PVC after it has been created.
Required, must not be nil.
PersistentVolumeClaimTemplate is used to produce PersistentVolumeClaim objects as part of an EphemeralVolumeSource.
The specification for the PersistentVolumeClaim. The entire content is copied unchanged into the PVC that gets created from this template. The same fields as in a PersistentVolumeClaim are also valid here.
May contain labels and annotations that will be copied into the PVC when creating it. No other fields are allowed and will be rejected during validation.
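A sketch of a generic ephemeral volume (a .spec.volumes fragment); the StorageClass, size, and labels are placeholders. The embedded spec is an ordinary PersistentVolumeClaim spec, copied unchanged into the PVC that gets created.

```yaml
volumes:
  - name: scratch-data
    ephemeral:
      volumeClaimTemplate:
        metadata:
          labels:
            app: example                   # copied into the generated PVC
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: example-fast   # hypothetical StorageClass
          resources:
            requests:
              storage: 1Gi
```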
fc (FCVolumeSource)
fc represents a Fibre Channel resource that is attached to a kubelet's host machine and then exposed to the pod.
Represents a Fibre Channel volume. Fibre Channel volumes can only be mounted as read/write once. Fibre Channel volumes support ownership management and SELinux relabeling.
fc.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
fc.lun (int32)
lun is Optional: FC target lun number
fc.readOnly (boolean)
readOnly is Optional: Defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
fc.targetWWNs ([]string)
Atomic: will be replaced during a merge
targetWWNs is Optional: FC target worldwide names (WWNs)
fc.wwids ([]string)
Atomic: will be replaced during a merge
wwids is Optional: FC volume world wide identifiers (wwids). Either wwids or a combination of targetWWNs and lun must be set, but not both simultaneously.
flexVolume (FlexVolumeSource)
flexVolume represents a generic volume resource that is provisioned/attached using an exec based plugin. Deprecated: FlexVolume is deprecated. Consider using a CSIDriver instead.
FlexVolume represents a generic volume resource that is provisioned/attached using an exec based plugin.
flexVolume.driver (string), required
driver is the name of the driver to use for this volume.
flexVolume.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". The default filesystem depends on FlexVolume script.
flexVolume.options (map[string]string)
options is Optional: this field holds extra command options if any.
flexVolume.readOnly (boolean)
readOnly is Optional: defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
secretRef is Optional: a reference to the secret object containing sensitive information to pass to the plugin scripts. This may be empty if no secret object is specified. If the secret object contains more than one secret, all secrets are passed to the plugin scripts.
flocker (FlockerVolumeSource)
flocker represents a Flocker volume attached to a kubelet's host machine. This depends on the Flocker control service being running. Deprecated: Flocker is deprecated and the in-tree flocker type is no longer supported.
Represents a Flocker volume mounted by the Flocker agent. One and only one of datasetName and datasetUUID should be set. Flocker volumes do not support ownership management or SELinux relabeling.
flocker.datasetName (string)
datasetName is the name of the dataset stored as metadata -> name on the dataset for Flocker; it should be considered deprecated.
flocker.datasetUUID (string)
datasetUUID is the UUID of the dataset. This is unique identifier of a Flocker dataset
gcePersistentDisk (GCEPersistentDiskVolumeSource)
gcePersistentDisk represents a GCE Disk resource that is attached to a kubelet's host machine and then exposed to the pod. Deprecated: GCEPersistentDisk is deprecated. All operations for the in-tree gcePersistentDisk type are redirected to the pd.csi.storage.gke.io CSI driver. More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
*Represents a Persistent Disk resource in Google Compute Engine.
A GCE PD must exist before mounting to a container. The disk must also be in the same GCE project and zone as the kubelet. A GCE PD can only be mounted as read/write once or read-only many times. GCE PDs support ownership management and SELinux relabeling.*
fsType is filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
gcePersistentDisk.partition (int32)
partition is the partition in the volume that you want to mount. If omitted, the default is to mount by volume name. Examples: For volume /dev/sda1, you specify the partition as "1". Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty). More info: https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk
glusterfs represents a Glusterfs mount on the host that shares a pod's lifetime. Deprecated: Glusterfs is deprecated and the in-tree glusterfs type is no longer supported. More info: https://examples.k8s.io/volumes/glusterfs/README.md
Represents a Glusterfs mount that lasts the lifetime of a pod. Glusterfs volumes do not support ownership management or SELinux relabeling.
Represents an ISCSI disk. ISCSI volumes can only be mounted as read/write once. ISCSI volumes support ownership management and SELinux relabeling.
iscsi.iqn (string), required
iqn is the target iSCSI Qualified Name.
iscsi.lun (int32), required
lun represents iSCSI Target Lun number.
iscsi.targetPortal (string), required
targetPortal is iSCSI Target Portal. The Portal is either an IP or ip_addr:port if the port is other than default (typically TCP ports 860 and 3260).
iscsi.chapAuthDiscovery (boolean)
chapAuthDiscovery defines whether to support iSCSI Discovery CHAP authentication
iscsi.chapAuthSession (boolean)
chapAuthSession defines whether to support iSCSI Session CHAP authentication
iscsi.fsType (string)
fsType is the filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#iscsi
iscsi.initiatorName (string)
initiatorName is the custom iSCSI Initiator Name. If initiatorName is specified with iscsiInterface simultaneously, new iSCSI interface <target portal>:<volume name> will be created for the connection.
iscsi.iscsiInterface (string)
iscsiInterface is the interface Name that uses an iSCSI transport. Defaults to 'default' (tcp).
iscsi.portals ([]string)
Atomic: will be replaced during a merge
portals is the iSCSI Target Portal List. The portal is either an IP or ip_addr:port if the port is other than default (typically TCP ports 860 and 3260).
iscsi.readOnly (boolean)
readOnly here will force the ReadOnly setting in VolumeMounts. Defaults to false.
secretRef is the CHAP Secret for iSCSI target and initiator authentication
image (ImageVolumeSource)
image represents an OCI object (a container image or artifact) pulled and mounted on the kubelet's host machine. The volume is resolved at pod startup depending on which PullPolicy value is provided:
- Always: the kubelet always attempts to pull the reference. Container creation will fail if the pull fails.
- Never: the kubelet never pulls the reference and only uses a local image or artifact. Container creation will fail if the reference isn't present.
- IfNotPresent: the kubelet pulls if the reference isn't already present on disk. Container creation will fail if the reference isn't present and the pull fails.
The volume gets re-resolved if the pod gets deleted and recreated, which means that new remote content will become available on pod recreation. A failure to resolve or pull the image during pod startup will block containers from starting and may add significant latency. Failures will be retried using normal volume backoff and will be reported on the pod reason and message. The types of objects that may be mounted by this volume are defined by the container runtime implementation on a host machine and at minimum must include all valid types supported by the container image field. The OCI object gets mounted in a single directory (spec.containers[].volumeMounts.mountPath) by merging the manifest layers in the same way as for container images. The volume will be mounted read-only (ro) and with non-executable files (noexec). Sub path mounts for containers are not supported (spec.containers[].volumeMounts.subpath) before 1.33. The field spec.securityContext.fsGroupChangePolicy has no effect on this volume type.
ImageVolumeSource represents an image volume resource.
image.pullPolicy (string)
Policy for pulling OCI objects. Possible values are: Always: the kubelet always attempts to pull the reference. Container creation will fail if the pull fails. Never: the kubelet never pulls the reference and only uses a local image or artifact. Container creation will fail if the reference isn't present. IfNotPresent: the kubelet pulls if the reference isn't already present on disk. Container creation will fail if the reference isn't present and the pull fails. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise.
image.reference (string)
Required: Image or artifact reference to be used. Behaves in the same way as pod.spec.containers[*].image. Pull secrets will be assembled in the same way as for the container image by looking up node credentials, SA image pull secrets, and pod spec image pull secrets. More info: https://kubernetes.io/docs/concepts/containers/images This field is optional to allow higher level config management to default or override container images in workload controllers like Deployments and StatefulSets.
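A sketch of a Pod mounting an OCI artifact through an image volume; the Pod name, image references, and mount path are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-image-volume              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example/app:1.0     # placeholder container image
      volumeMounts:
        - name: artifact
          mountPath: /data                # content is mounted read-only and noexec
  volumes:
    - name: artifact
      image:
        reference: registry.example/data-artifact:1.2   # OCI image or artifact reference
        pullPolicy: IfNotPresent
```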
photonPersistentDisk represents a PhotonController persistent disk attached and mounted on kubelets host machine. Deprecated: PhotonPersistentDisk is deprecated and the in-tree photonPersistentDisk type is no longer supported.
Represents a Photon Controller persistent disk resource.
photonPersistentDisk.pdID (string), required
pdID is the ID that identifies Photon Controller persistent disk
photonPersistentDisk.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
portworxVolume (PortworxVolumeSource)
portworxVolume represents a portworx volume attached and mounted on kubelets host machine. Deprecated: PortworxVolume is deprecated. All operations for the in-tree portworxVolume type are redirected to the pxd.portworx.com CSI driver when the CSIMigrationPortworx feature-gate is on.
PortworxVolumeSource represents a Portworx volume resource.
portworxVolume.volumeID (string), required
volumeID uniquely identifies a Portworx volume
portworxVolume.fsType (string)
fsType represents the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs". Implicitly inferred to be "ext4" if unspecified.
portworxVolume.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
quobyte (QuobyteVolumeSource)
quobyte represents a Quobyte mount on the host that shares a pod's lifetime. Deprecated: Quobyte is deprecated and the in-tree quobyte type is no longer supported.
Represents a Quobyte mount that lasts the lifetime of a pod. Quobyte volumes do not support ownership management or SELinux relabeling.
quobyte.registry (string), required
registry represents a single or multiple Quobyte Registry services specified as a string as host:port pair (multiple entries are separated with commas) which acts as the central registry for volumes
quobyte.volume (string), required
volume is a string that references an already created Quobyte volume by name.
quobyte.group (string)
group to map volume access to. Default is no group.
quobyte.readOnly (boolean)
readOnly here will force the Quobyte volume to be mounted with read-only permissions. Defaults to false.
quobyte.tenant (string)
tenant owning the given Quobyte volume in the backend. Used with dynamically provisioned Quobyte volumes; the value is set by the plugin.
quobyte.user (string)
user to map volume access to. Defaults to the serviceaccount user.
rbd (RBDVolumeSource)
rbd represents a Rados Block Device mount on the host that shares a pod's lifetime. Deprecated: RBD is deprecated and the in-tree rbd type is no longer supported. More info: https://examples.k8s.io/volumes/rbd/README.md
Represents a Rados Block Device mount that lasts the lifetime of a pod. RBD volumes support ownership management and SELinux relabeling.
fsType is the filesystem type of the volume that you want to mount. Tip: Ensure that the filesystem type is supported by the host operating system. Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified. More info: https://kubernetes.io/docs/concepts/storage/volumes#rbd
scaleIO represents a ScaleIO persistent volume attached and mounted on Kubernetes nodes. Deprecated: ScaleIO is deprecated and the in-tree scaleIO type is no longer supported.
ScaleIOVolumeSource represents a persistent ScaleIO volume
scaleIO.gateway (string), required
gateway is the host address of the ScaleIO API Gateway.
secretRef references to the secret for ScaleIO user and other sensitive information. If this is not provided, Login operation will fail.
scaleIO.system (string), required
system is the name of the storage system as configured in ScaleIO.
scaleIO.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Default is "xfs".
scaleIO.protectionDomain (string)
protectionDomain is the name of the ScaleIO Protection Domain for the configured storage.
scaleIO.readOnly (boolean)
readOnly Defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
scaleIO.sslEnabled (boolean)
sslEnabled is a flag to enable/disable SSL communication with the Gateway. Defaults to false.
scaleIO.storageMode (string)
storageMode indicates whether the storage for a volume should be ThickProvisioned or ThinProvisioned. Default is ThinProvisioned.
scaleIO.storagePool (string)
storagePool is the ScaleIO Storage Pool associated with the protection domain.
scaleIO.volumeName (string)
volumeName is the name of a volume already created in the ScaleIO system that is associated with this volume source.
storageos (StorageOSVolumeSource)
storageOS represents a StorageOS volume attached and mounted on Kubernetes nodes. Deprecated: StorageOS is deprecated and the in-tree storageos type is no longer supported.
Represents a StorageOS persistent volume resource.
storageos.fsType (string)
fsType is the filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
storageos.readOnly (boolean)
readOnly defaults to false (read/write). ReadOnly here will force the ReadOnly setting in VolumeMounts.
secretRef specifies the secret to use for obtaining the StorageOS API credentials. If not specified, default values will be attempted.
storageos.volumeName (string)
volumeName is the human-readable name of the StorageOS volume. Volume names are only unique within a namespace.
storageos.volumeNamespace (string)
volumeNamespace specifies the scope of the volume within StorageOS. If no namespace is specified then the Pod's namespace will be used. This allows the Kubernetes name scoping to be mirrored within StorageOS for tighter integration. Set VolumeName to any name to override the default behaviour. Set to "default" if you are not using namespaces within StorageOS. Namespaces that do not pre-exist within StorageOS will be created.
vsphereVolume (VsphereVirtualDiskVolumeSource)
vsphereVolume represents a vSphere volume attached and mounted on kubelets host machine. Deprecated: VsphereVolume is deprecated. All operations for the in-tree vsphereVolume type are redirected to the csi.vsphere.vmware.com CSI driver.
Represents a vSphere volume resource.
vsphereVolume.volumePath (string), required
volumePath is the path that identifies vSphere volume vmdk
vsphereVolume.fsType (string)
fsType is filesystem type to mount. Must be a filesystem type supported by the host operating system. Ex. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
vsphereVolume.storagePolicyID (string)
storagePolicyID is the storage Policy Based Management (SPBM) profile ID associated with the StoragePolicyName.
vsphereVolume.storagePolicyName (string)
storagePolicyName is the storage Policy Based Management (SPBM) profile name.
Deprecated
gitRepo (GitRepoVolumeSource)
gitRepo represents a git repository at a particular revision. Deprecated: GitRepo is deprecated. To provision a container with a git repo, mount an EmptyDir into an InitContainer that clones the repo using git, then mount the EmptyDir into the Pod's container.
Represents a volume that is populated with the contents of a git repository. Git repo volumes do not support ownership management. Git repo volumes support SELinux relabeling.
DEPRECATED: GitRepo is deprecated. To provision a container with a git repo, mount an EmptyDir into an InitContainer that clones the repo using git, then mount the EmptyDir into the Pod's container.
gitRepo.repository (string), required
repository is the URL
gitRepo.directory (string)
directory is the target directory name. Must not contain or start with '..'. If '.' is supplied, the volume directory will be the git repository. Otherwise, if specified, the volume will contain the git repository in the subdirectory with the given name.
gitRepo.revision (string)
revision is the commit hash for the specified revision.
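A minimal sketch of the recommended replacement described above: an init container clones the repository into an emptyDir volume, which the main container then mounts. The pod name, images, and repository URL are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: git-clone-example               # hypothetical name
spec:
  initContainers:
  - name: clone
    image: alpine/git                   # placeholder image that provides git
    command: ["git", "clone", "--single-branch", "https://example.com/repo.git", "/repo"]
    volumeMounts:
    - name: repo
      mountPath: /repo
  containers:
  - name: app
    image: registry.example/app:1.0     # placeholder container image
    volumeMounts:
    - name: repo
      mountPath: /repo
  volumes:
  - name: repo
    emptyDir: {}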
DownwardAPIVolumeFile
DownwardAPIVolumeFile represents information to create the file containing the pod field
path (string), required
Required: Path is the relative path name of the file to be created. Must not be absolute or contain the '..' path. Must be utf-8 encoded. The first item of the relative path must not start with '..'
Required: Selects a field of the pod: only annotations, labels, name, namespace and uid are supported.
mode (int32)
Optional: mode bits used to set permissions on this file, must be an octal value between 0000 and 0777 or a decimal value between 0 and 511. YAML accepts both octal and decimal values, JSON requires decimal values for mode bits. If not specified, the volume defaultMode will be used. This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.
Selects a resource of the container: only resources limits and requests (limits.cpu, limits.memory, requests.cpu and requests.memory) are currently supported.
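A minimal sketch of a downwardAPI volume that uses these fields; the pod name and container image are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: downward-api-example            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0     # placeholder container image
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: "labels"                  # relative path of the projected file
        fieldRef:
          fieldPath: metadata.labels
      - path: "cpu_limit"
        mode: 0444                      # octal mode bits for the file
        resourceFieldRef:
          containerName: app
          resource: limits.cpu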
KeyToPath
Maps a string key to a path within a volume.
key (string), required
key is the key to project.
path (string), required
path is the relative path of the file to map the key to. May not be an absolute path. May not contain the path element '..'. May not start with the string '..'.
mode (int32)
mode is Optional: mode bits used to set permissions on this file. Must be an octal value between 0000 and 0777 or a decimal value between 0 and 511. YAML accepts both octal and decimal values, JSON requires decimal values for mode bits. If not specified, the volume defaultMode will be used. This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.
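A minimal sketch of KeyToPath in use, projecting one key of a ConfigMap to a custom path inside a configMap volume; the pod, ConfigMap, and key names are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: keytopath-example               # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0     # placeholder container image
    volumeMounts:
    - name: config
      mountPath: /etc/config
  volumes:
  - name: config
    configMap:
      name: app-config                  # hypothetical ConfigMap
      items:
      - key: settings.json              # key to project
        path: conf/settings.json        # relative path inside the volume
        mode: 0400                      # octal mode bits for the file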
5.3.11 - VolumeAttachment
VolumeAttachment captures the intent to attach or detach the specified volume to/from the specified node.
apiVersion: storage.k8s.io/v1
import "k8s.io/api/storage/v1"
VolumeAttachment
VolumeAttachment captures the intent to attach or detach the specified volume to/from the specified node.
status represents status of the VolumeAttachment request. Populated by the entity completing the attach or detach operation, i.e. the external-attacher.
VolumeAttachmentSpec
VolumeAttachmentSpec is the specification of a VolumeAttachment request.
attacher (string), required
attacher indicates the name of the volume driver that MUST handle this request. This is the name returned by GetPluginName().
nodeName (string), required
nodeName represents the node that the volume should be attached to.
source (VolumeAttachmentSource), required
source represents the volume that should be attached.
VolumeAttachmentSource represents a volume that should be attached. Right now only PersistentVolumes can be attached via external attacher, in the future we may allow also inline volumes in pods. Exactly one member can be set.
inlineVolumeSpec contains all the information necessary to attach a persistent volume defined by a pod's inline VolumeSource. This field is populated only for the CSIMigration feature. It contains translated fields from a pod's inline VolumeSource to a PersistentVolumeSpec. This field is beta-level and is only honored by servers that enabled the CSIMigration feature.
source.persistentVolumeName (string)
persistentVolumeName represents the name of the persistent volume to attach.
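VolumeAttachment objects are normally created by the attach/detach controller rather than by users, but a minimal sketch of the spec looks like this (the driver, node, and PersistentVolume names are placeholders):
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: example-attachment              # hypothetical name
spec:
  attacher: csi.example.com             # placeholder CSI driver name
  nodeName: node-1                      # placeholder node
  source:
    persistentVolumeName: pv-example    # placeholder PersistentVolume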
VolumeAttachmentStatus
VolumeAttachmentStatus is the status of a VolumeAttachment request.
attached (boolean), required
attached indicates the volume is successfully attached. This field must only be set by the entity completing the attach operation, i.e. the external-attacher.
attachError (VolumeError)
attachError represents the last error encountered during attach operation, if any. This field must only be set by the entity completing the attach operation, i.e. the external-attacher.
VolumeError captures an error encountered during a volume operation.
attachError.errorCode (int32)
errorCode is a numeric gRPC code representing the error encountered during Attach or Detach operations.
This is an optional, alpha field that requires the MutableCSINodeAllocatableCount feature gate to be enabled in order to be set.
attachError.message (string)
message represents the error encountered during Attach or Detach operation. This string may be logged, so it should not contain sensitive information.
attachError.time (Time)
time represents the time the error was encountered.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
attachmentMetadata (map[string]string)
attachmentMetadata is populated with any information returned by the attach operation, upon successful attach, that must be passed into subsequent WaitForAttach or Mount calls. This field must only be set by the entity completing the attach operation, i.e. the external-attacher.
detachError (VolumeError)
detachError represents the last error encountered during detach operation, if any. This field must only be set by the entity completing the detach operation, i.e. the external-attacher.
VolumeError captures an error encountered during a volume operation.
detachError.errorCode (int32)
errorCode is a numeric gRPC code representing the error encountered during Attach or Detach operations.
This is an optional, alpha field that requires the MutableCSINodeAllocatableCount feature gate to be enabled in order to be set.
detachError.message (string)
message represents the error encountered during Attach or Detach operation. This string may be logged, so it should not contain sensitive information.
detachError.time (Time)
time represents the time the error was encountered.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
VolumeAttachmentList
VolumeAttachmentList is a collection of VolumeAttachment objects.
VolumeAttributesClass represents a specification of mutable volume attributes defined by the CSI driver.
apiVersion: storage.k8s.io/v1beta1
import "k8s.io/api/storage/v1beta1"
VolumeAttributesClass
VolumeAttributesClass represents a specification of mutable volume attributes defined by the CSI driver. The class can be specified during dynamic provisioning of PersistentVolumeClaims, and changed in the PersistentVolumeClaim spec after provisioning.
parameters hold volume attributes defined by the CSI driver. These values are opaque to Kubernetes and are passed directly to the CSI driver. The underlying storage provider supports changing these attributes on an existing volume, however the parameters field itself is immutable. To invoke a volume update, a new VolumeAttributesClass should be created with new parameters, and the PersistentVolumeClaim should be updated to reference the new VolumeAttributesClass.
This field is required and must contain at least one key/value pair. The keys cannot be empty, and the maximum number of parameters is 512, with a cumulative max size of 256K. If the CSI driver rejects invalid parameters, the target PersistentVolumeClaim will be set to an "Infeasible" state in the modifyVolumeStatus field.
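A minimal sketch of a VolumeAttributesClass; the class name, driver name, and parameter keys/values are placeholders and must match what the CSI driver actually supports:
apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttributesClass
metadata:
  name: gold                            # hypothetical class name
driverName: csi.example.com             # placeholder CSI driver this class applies to
parameters:
  iops: "4000"                          # driver-specific keys and values, placeholders
  throughput: "125"
A PersistentVolumeClaim references the class through its volumeAttributesClassName field.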
VolumeAttributesClassList
VolumeAttributesClassList is a collection of VolumeAttributesClass objects.
ServiceAccount binds together: * a name, understood by users, and perhaps by peripheral systems, for an identity * a principal that can be authenticated and authorized * a set of secrets.
apiVersion: v1
import "k8s.io/api/core/v1"
ServiceAccount
ServiceAccount binds together: * a name, understood by users, and perhaps by peripheral systems, for an identity * a principal that can be authenticated and authorized * a set of secrets
AutomountServiceAccountToken indicates whether pods running as this service account should have an API token automatically mounted. Can be overridden at the pod level.
ImagePullSecrets is a list of references to secrets in the same namespace to use for pulling any images in pods that reference this ServiceAccount. ImagePullSecrets are distinct from Secrets because Secrets can be mounted in the pod, but ImagePullSecrets are only accessed by the kubelet. More info: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
Map: unique values on key name will be kept during a merge
Secrets is a list of the secrets in the same namespace that pods running using this ServiceAccount are allowed to use. Pods are only limited to this list if this service account has a "kubernetes.io/enforce-mountable-secrets" annotation set to "true". The "kubernetes.io/enforce-mountable-secrets" annotation is deprecated since v1.32. Prefer separate namespaces to isolate access to mounted secrets. This field should not be used to find auto-generated service account token secrets for use outside of pods. Instead, tokens can be requested directly using the TokenRequest API, or service account token secrets can be manually created. More info: https://kubernetes.io/docs/concepts/configuration/secret
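A minimal sketch of a ServiceAccount that uses these fields; the account and Secret names are placeholders:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot                     # hypothetical name
  namespace: default
automountServiceAccountToken: false     # pods using this account get no auto-mounted token
imagePullSecrets:
- name: private-registry-cred           # hypothetical Secret in the same namespace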
ServiceAccountList
ServiceAccountList is a list of ServiceAccount objects
Status is filled in by the server and indicates whether the token can be authenticated.
TokenRequestSpec
TokenRequestSpec contains client provided parameters of a token request.
audiences ([]string), required
Atomic: will be replaced during a merge
Audiences are the intended audiences of the token. A recipient of a token must identify itself with an identifier in the list of audiences of the token, and otherwise should reject the token. A token issued for multiple audiences may be used to authenticate against any of the audiences listed but implies a high degree of trust between the target audiences.
boundObjectRef (BoundObjectReference)
BoundObjectRef is a reference to an object that the token will be bound to. The token will only be valid for as long as the bound object exists. NOTE: The API server's TokenReview endpoint will validate the BoundObjectRef, but other audiences may not. Keep ExpirationSeconds small if you want prompt revocation.
BoundObjectReference is a reference to an object that a token is bound to.
boundObjectRef.apiVersion (string)
API version of the referent.
boundObjectRef.kind (string)
Kind of the referent. Valid kinds are 'Pod' and 'Secret'.
boundObjectRef.name (string)
Name of the referent.
boundObjectRef.uid (string)
UID of the referent.
expirationSeconds (int64)
ExpirationSeconds is the requested duration of validity of the request. The token issuer may return a token with a different validity duration so a client needs to check the 'expiration' field in a response.
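A minimal sketch of a TokenRequest body as it would be POSTed to the token subresource of a ServiceAccount (see the HTTP request under Operations below); the audience and Pod name are placeholders:
apiVersion: authentication.k8s.io/v1
kind: TokenRequest
spec:
  audiences:
  - https://kubernetes.default.svc      # placeholder audience
  expirationSeconds: 3600
  boundObjectRef:
    apiVersion: v1
    kind: Pod
    name: my-pod                        # hypothetical Pod the token is bound to
Recent kubectl releases can issue such a request with kubectl create token <serviceaccount-name>.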
TokenRequestStatus
TokenRequestStatus is the result of a token request.
expirationTimestamp (Time), required
ExpirationTimestamp is the time of expiration of the returned token.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
token (string), required
Token is the opaque bearer token.
Operations
create create token of a ServiceAccount
HTTP Request
POST /api/v1/namespaces/{namespace}/serviceaccounts/{name}/token
TokenReview attempts to authenticate a token to a known user.
apiVersion: authentication.k8s.io/v1
import "k8s.io/api/authentication/v1"
TokenReview
TokenReview attempts to authenticate a token to a known user. Note: TokenReview requests may be cached by the webhook token authenticator plugin in the kube-apiserver.
Status is filled in by the server and indicates whether the request can be authenticated.
TokenReviewSpec
TokenReviewSpec is a description of the token authentication request.
audiences ([]string)
Atomic: will be replaced during a merge
Audiences is a list of the identifiers that the resource server presented with the token identifies as. Audience-aware token authenticators will verify that the token was intended for at least one of the audiences in this list. If no audiences are provided, the audience will default to the audience of the Kubernetes apiserver.
token (string)
Token is the opaque bearer token.
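A minimal sketch of a TokenReview request; the token value and audience are placeholders:
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: <opaque-bearer-token>          # placeholder for the token to be checked
  audiences:
  - https://kubernetes.default.svc      # optional; placeholder audience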
TokenReviewStatus
TokenReviewStatus is the result of the token authentication request.
audiences ([]string)
Atomic: will be replaced during a merge
Audiences are audience identifiers chosen by the authenticator that are compatible with both the TokenReview and token. An identifier is any identifier in the intersection of the TokenReviewSpec audiences and the token's audiences. A client of the TokenReview API that sets the spec.audiences field should validate that a compatible audience identifier is returned in the status.audiences field to ensure that the TokenReview server is audience aware. If a TokenReview returns an empty status.audience field where status.authenticated is "true", the token is valid against the audience of the Kubernetes API server.
authenticated (boolean)
Authenticated indicates that the token was associated with a known user.
error (string)
Error indicates that the token couldn't be checked
user (UserInfo)
User is the UserInfo associated with the provided token.
UserInfo holds the information about the user needed to implement the user.Info interface.
user.extra (map[string][]string)
Any additional information provided by the authenticator.
user.groups ([]string)
Atomic: will be replaced during a merge
The names of groups this user is a part of.
user.uid (string)
A unique value that identifies this user across time. If this user is deleted and another user by the same name is added, they will have different UIDs.
user.username (string)
The name that uniquely identifies this user among all active users.
CertificateSigningRequest objects provide a mechanism to obtain x509 certificates by submitting a certificate signing request, and having it asynchronously approved and issued.
apiVersion: certificates.k8s.io/v1
import "k8s.io/api/certificates/v1"
CertificateSigningRequest
CertificateSigningRequest objects provide a mechanism to obtain x509 certificates by submitting a certificate signing request, and having it asynchronously approved and issued.
Kubelets use this API to obtain:
client certificates to authenticate to kube-apiserver (with the "kubernetes.io/kube-apiserver-client-kubelet" signerName).
serving certificates for TLS endpoints kube-apiserver can connect to securely (with the "kubernetes.io/kubelet-serving" signerName).
This API can be used to request client certificates to authenticate to kube-apiserver (with the "kubernetes.io/kube-apiserver-client" signerName), or to obtain certificates from custom non-Kubernetes signers.
spec contains the certificate request, and is immutable after creation. Only the request, signerName, expirationSeconds, and usages fields can be set on creation. Other fields are derived by Kubernetes and cannot be modified by users.
status contains information about whether the request is approved or denied, and the certificate issued by the signer, or the failure condition indicating signer failure.
CertificateSigningRequestSpec
CertificateSigningRequestSpec contains the certificate request.
request ([]byte), required
Atomic: will be replaced during a merge
request contains an x509 certificate signing request encoded in a "CERTIFICATE REQUEST" PEM block. When serialized as JSON or YAML, the data is additionally base64-encoded.
signerName (string), required
signerName indicates the requested signer, and is a qualified name.
List/watch requests for CertificateSigningRequests can filter on this field using a "spec.signerName=NAME" fieldSelector.
Well-known Kubernetes signers are:
"kubernetes.io/kube-apiserver-client": issues client certificates that can be used to authenticate to kube-apiserver.
Requests for this signer are never auto-approved by kube-controller-manager, and can be issued by the "csrsigning" controller in kube-controller-manager.
"kubernetes.io/kube-apiserver-client-kubelet": issues client certificates that kubelets use to authenticate to kube-apiserver.
Requests for this signer can be auto-approved by the "csrapproving" controller in kube-controller-manager, and can be issued by the "csrsigning" controller in kube-controller-manager.
"kubernetes.io/kubelet-serving" issues serving certificates that kubelets use to serve TLS endpoints, which kube-apiserver can connect to securely.
Requests for this signer are never auto-approved by kube-controller-manager, and can be issued by the "csrsigning" controller in kube-controller-manager.
Custom signerNames can also be specified. The signer defines:
Trust distribution: how trust (CA bundles) are distributed.
Permitted subjects: and behavior when a disallowed subject is requested.
Required, permitted, or forbidden x509 extensions in the request (including whether subjectAltNames are allowed, which types, restrictions on allowed values) and behavior when a disallowed extension is requested.
Required, permitted, or forbidden key usages / extended key usages.
Expiration/certificate lifetime: whether it is fixed by the signer, configurable by the admin.
Whether or not requests for CA certificates are allowed.
expirationSeconds (int32)
expirationSeconds is the requested duration of validity of the issued certificate. The certificate signer may issue a certificate with a different validity duration so a client must check the delta between the notBefore and notAfter fields in the issued certificate to determine the actual duration.
The v1.22+ in-tree implementations of the well-known Kubernetes signers will honor this field as long as the requested duration is not greater than the maximum duration they will honor per the --cluster-signing-duration CLI flag to the Kubernetes controller manager.
Certificate signers may not honor this field for various reasons:
Old signer that is unaware of the field (such as the in-tree implementations prior to v1.22)
Signer whose configured maximum is shorter than the requested duration
Signer whose configured minimum is longer than the requested duration
The minimum valid value for expirationSeconds is 600, i.e. 10 minutes.
extra (map[string][]string)
extra contains extra attributes of the user that created the CertificateSigningRequest. Populated by the API server on creation and immutable.
groups ([]string)
Atomic: will be replaced during a merge
groups contains group membership of the user that created the CertificateSigningRequest. Populated by the API server on creation and immutable.
uid (string)
uid contains the uid of the user that created the CertificateSigningRequest. Populated by the API server on creation and immutable.
usages ([]string)
Atomic: will be replaced during a merge
usages specifies a set of key usages requested in the issued certificate.
Requests for TLS client certificates typically request: "digital signature", "key encipherment", "client auth".
Requests for TLS serving certificates typically request: "key encipherment", "digital signature", "server auth".
username contains the name of the user that created the CertificateSigningRequest. Populated by the API server on creation and immutable.
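A minimal sketch of a CertificateSigningRequest for a client certificate; the object name is a placeholder and the request field must carry an actual base64-encoded "CERTIFICATE REQUEST" PEM block:
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: user-csr-example                # hypothetical name
spec:
  request: <base64-encoded CERTIFICATE REQUEST PEM block>   # placeholder
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400              # one day, subject to the signer's configured maximum
  usages:
  - digital signature
  - key encipherment
  - client auth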
CertificateSigningRequestStatus
CertificateSigningRequestStatus contains conditions used to indicate approved/denied/failed status of the request, and the issued certificate.
certificate ([]byte)
Atomic: will be replaced during a merge
certificate is populated with an issued certificate by the signer after an Approved condition is present. This field is set via the /status subresource. Once populated, this field is immutable.
If the certificate signing request is denied, a condition of type "Denied" is added and this field remains empty. If the signer cannot issue the certificate, a condition of type "Failed" is added and this field remains empty.
Validation requirements:
certificate must contain one or more PEM blocks.
All PEM blocks must have the "CERTIFICATE" label, contain no headers, and the encoded data must be a BER-encoded ASN.1 Certificate structure as described in section 4 of RFC5280.
Non-PEM content may appear before or after the "CERTIFICATE" PEM blocks and is unvalidated, to allow for explanatory text as described in section 5.2 of RFC7468.
If more than one PEM block is present, and the definition of the requested spec.signerName does not indicate otherwise, the first block is the issued certificate, and subsequent blocks should be treated as intermediate certificates and presented in TLS handshakes.
The certificate is encoded in PEM format.
When serialized as JSON or YAML, the data is additionally base64-encoded, so it consists of: base64( -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- )
Map: unique values on key type will be kept during a merge
conditions applied to the request. Known conditions are "Approved", "Denied", and "Failed".
CertificateSigningRequestCondition describes a condition of a CertificateSigningRequest object
conditions.status (string), required
status of the condition, one of True, False, Unknown. Approved, Denied, and Failed conditions may not be "False" or "Unknown".
conditions.type (string), required
type of the condition. Known conditions are "Approved", "Denied", and "Failed".
An "Approved" condition is added via the /approval subresource, indicating the request was approved and should be issued by the signer.
A "Denied" condition is added via the /approval subresource, indicating the request was denied and should not be issued by the signer.
A "Failed" condition is added via the /status subresource, indicating the signer failed to issue the certificate.
Approved and Denied conditions are mutually exclusive. Approved, Denied, and Failed conditions cannot be removed once added.
Only one condition of a given type is allowed.
conditions.lastTransitionTime (Time)
lastTransitionTime is the time the condition last transitioned from one status to another. If unset, when a new condition type is added or an existing condition's status is changed, the server defaults this to the current time.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.lastUpdateTime (Time)
lastUpdateTime is the time of the last update to this condition
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
message contains a human readable message with details about the request state
conditions.reason (string)
reason indicates a brief reason for the request state
CertificateSigningRequestList
CertificateSigningRequestList is a collection of CertificateSigningRequest objects
ClusterTrustBundle is a cluster-scoped container for X.509 trust anchors (root certificates).
apiVersion: certificates.k8s.io/v1beta1
import "k8s.io/api/certificates/v1beta1"
ClusterTrustBundle
ClusterTrustBundle is a cluster-scoped container for X.509 trust anchors (root certificates).
ClusterTrustBundle objects are considered to be readable by any authenticated user in the cluster, because they can be mounted by pods using the clusterTrustBundle projection. All service accounts have read access to ClusterTrustBundles by default. Users who only have namespace-level access to a cluster can read ClusterTrustBundles by impersonating a serviceaccount that they have access to.
It can be optionally associated with a particular signer, in which case it contains one valid set of trust anchors for that signer. Signers may have multiple associated ClusterTrustBundles; each is an independent set of trust anchors for that signer. Admission control is used to enforce that only users with permissions on the signer can create or modify the corresponding bundle.
spec contains the signer (if any) and trust anchors.
ClusterTrustBundleSpec
ClusterTrustBundleSpec contains the signer and trust anchors.
trustBundle (string), required
trustBundle contains the individual X.509 trust anchors for this bundle, as PEM bundle of PEM-wrapped, DER-formatted X.509 certificates.
The data must consist only of PEM certificate blocks that parse as valid X.509 certificates. Each certificate must include a basic constraints extension with the CA bit set. The API server will reject objects that contain duplicate certificates, or that use PEM block headers.
Users of ClusterTrustBundles, including Kubelet, are free to reorder and deduplicate certificate blocks in this file according to their own logic, as well as to drop PEM block headers and inter-block data.
signerName (string)
signerName indicates the associated signer, if any.
In order to create or update a ClusterTrustBundle that sets signerName, you must have the following cluster-scoped permission: group=certificates.k8s.io resource=signers resourceName=<the signer name> verb=attest.
If signerName is not empty, then the ClusterTrustBundle object must be named with the signer name as a prefix (translating slashes to colons). For example, for the signer name example.com/foo, valid ClusterTrustBundle object names include example.com:foo:abc and example.com:foo:v1.
If signerName is empty, then the ClusterTrustBundle object's name must not have such a prefix.
List/watch requests for ClusterTrustBundles can filter on this field using a spec.signerName=NAME field selector.
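A minimal sketch of a ClusterTrustBundle associated with a signer, assuming the ClusterTrustBundle feature is enabled on the cluster; the signer name is a placeholder and the trustBundle must contain real PEM certificate blocks with the CA basic constraint set:
apiVersion: certificates.k8s.io/v1beta1
kind: ClusterTrustBundle
metadata:
  name: example.com:foo:ca-v1           # prefixed with the signer name, slashes translated to colons
spec:
  signerName: example.com/foo           # placeholder signer
  trustBundle: |
    -----BEGIN CERTIFICATE-----
    <PEM body of a CA certificate>
    -----END CERTIFICATE-----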
ClusterTrustBundleList
ClusterTrustBundleList is a collection of ClusterTrustBundle objects
SelfSubjectReview contains the user information that the kube-apiserver has about the user making this request.
apiVersion: authentication.k8s.io/v1
import "k8s.io/api/authentication/v1"
SelfSubjectReview
SelfSubjectReview contains the user information that the kube-apiserver has about the user making this request. When using impersonation, users will receive the user info of the user being impersonated. If impersonation or request header authentication is used, any extra keys will have their case ignored and returned as lowercase.
Status is filled in by the server with the user attributes.
SelfSubjectReviewStatus
SelfSubjectReviewStatus is filled by the kube-apiserver and sent back to a user.
userInfo (UserInfo)
User attributes of the user making this request.
UserInfo holds the information about the user needed to implement the user.Info interface.
userInfo.extra (map[string][]string)
Any additional information provided by the authenticator.
userInfo.groups ([]string)
Atomic: will be replaced during a merge
The names of groups this user is a part of.
userInfo.uid (string)
A unique value that identifies this user across time. If this user is deleted and another user by the same name is added, they will have different UIDs.
userInfo.username (string)
The name that uniquely identifies this user among all active users.
Operations
create create a SelfSubjectReview
HTTP Request
POST /apis/authentication.k8s.io/v1/selfsubjectreviews
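The request body is essentially empty; the server fills in the status. A minimal sketch of the POSTed object:
apiVersion: authentication.k8s.io/v1
kind: SelfSubjectReview
Recent kubectl releases expose this as kubectl auth whoami.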
LocalSubjectAccessReview checks whether or not a user or group can perform an action in a given namespace.
apiVersion: authorization.k8s.io/v1
import "k8s.io/api/authorization/v1"
LocalSubjectAccessReview
LocalSubjectAccessReview checks whether or not a user or group can perform an action in a given namespace. Having a namespace scoped resource makes it much easier to grant namespace scoped policy that includes permissions checking.
Spec holds information about the request being evaluated. spec.namespace must be equal to the namespace you made the request against. If empty, it is defaulted.
SelfSubjectAccessReview checks whether or not the current user can perform an action.
apiVersion: authorization.k8s.io/v1
import "k8s.io/api/authorization/v1"
SelfSubjectAccessReview
SelfSubjectAccessReview checks whether or not the current user can perform an action. Not filling in a spec.namespace means "in all namespaces". Self is a special case, because users should always be able to check whether they can perform an action.
Status is filled in by the server and indicates whether the request is allowed or not
SelfSubjectAccessReviewSpec
SelfSubjectAccessReviewSpec is a description of the access request. Exactly one of ResourceAuthorizationAttributes and NonResourceAuthorizationAttributes must be set
nonResourceAttributes (NonResourceAttributes)
NonResourceAttributes describes information for a non-resource access request
NonResourceAttributes includes the authorization attributes available for non-resource requests to the Authorizer interface
nonResourceAttributes.path (string)
Path is the URL path of the request
nonResourceAttributes.verb (string)
Verb is the standard HTTP verb
resourceAttributes (ResourceAttributes)
ResourceAuthorizationAttributes describes information for a resource access request
ResourceAttributes includes the authorization attributes available for resource requests to the Authorizer interface
fieldSelector describes the limitation on access based on field. It can only limit access, not broaden it.
This field is alpha-level. To use this field, you must enable the AuthorizeWithSelectors feature gate (disabled by default).
FieldSelectorAttributes indicates a field limited access. Webhook authors are encouraged to:
ensure rawSelector and requirements are not both set
consider the requirements field if set
not try to parse or consider the rawSelector field if set. This is to avoid another CVE-2022-2880 (i.e. getting different systems to agree on how exactly to parse a query is not something we want), see https://www.oxeye.io/resources/golang-parameter-smuggling-attack for more details.
For the SubjectAccessReview endpoints of the kube-apiserver:
If rawSelector is empty and requirements are empty, the request is not limited.
If rawSelector is present and requirements are empty, the rawSelector will be parsed and limited if the parsing succeeds.
If rawSelector is empty and requirements are present, the requirements should be honored.
If rawSelector is present and requirements are present, the request is invalid.
rawSelector is the serialization of a field selector that would be included in a query parameter. Webhook implementations are encouraged to ignore rawSelector. The kube-apiserver's *SubjectAccessReview will parse the rawSelector as long as the requirements are not present.
requirements is the parsed interpretation of a field selector. All requirements must be met for a resource instance to match the selector. Webhook implementations should handle requirements, but how to handle them is up to the webhook. Since requirements can only limit the request, it is safe to authorize as unlimited request if the requirements are not understood.
FieldSelectorRequirement is a selector that contains values, a key, and an operator that relates the key and values.
operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist. The list of operators may grow in the future.
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty.
resourceAttributes.group (string)
Group is the API Group of the Resource. "*" means all.
labelSelector describes the limitation on access based on labels. It can only limit access, not broaden it.
This field is alpha-level. To use this field, you must enable the AuthorizeWithSelectors feature gate (disabled by default).
LabelSelectorAttributes indicates a label limited access. Webhook authors are encouraged to:
ensure rawSelector and requirements are not both set
consider the requirements field if set
not try to parse or consider the rawSelector field if set. This is to avoid another CVE-2022-2880 (i.e. getting different systems to agree on how exactly to parse a query is not something we want), see https://www.oxeye.io/resources/golang-parameter-smuggling-attack for more details.
For the SubjectAccessReview endpoints of the kube-apiserver:
If rawSelector is empty and requirements are empty, the request is not limited.
If rawSelector is present and requirements are empty, the rawSelector will be parsed and limited if the parsing succeeds.
If rawSelector is empty and requirements are present, the requirements should be honored.
If rawSelector is present and requirements are present, the request is invalid.
rawSelector is the serialization of a field selector that would be included in a query parameter. Webhook implementations are encouraged to ignore rawSelector. The kube-apiserver's *SubjectAccessReview will parse the rawSelector as long as the requirements are not present.
requirements is the parsed interpretation of a label selector. All requirements must be met for a resource instance to match the selector. Webhook implementations should handle requirements, but how to handle them is up to the webhook. Since requirements can only limit the request, it is safe to authorize as unlimited request if the requirements are not understood.
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
resourceAttributes.name (string)
Name is the name of the resource being requested for a "get" or deleted for a "delete". "" (empty) means all.
resourceAttributes.namespace (string)
Namespace is the namespace of the action being requested. Currently, there is no distinction between no namespace and all namespaces. "" (empty) is defaulted for LocalSubjectAccessReviews; "" (empty) is empty for cluster-scoped resources; "" (empty) means "all" for namespace scoped resources from a SubjectAccessReview or SelfSubjectAccessReview.
resourceAttributes.resource (string)
Resource is one of the existing resource types. "*" means all.
resourceAttributes.subresource (string)
Subresource is one of the existing resource types. "" means none.
resourceAttributes.verb (string)
Verb is a kubernetes resource API verb, like: get, list, watch, create, update, delete, proxy. "*" means all.
resourceAttributes.version (string)
Version is the API Version of the Resource. "*" means all.
Operations
create create a SelfSubjectAccessReview
HTTP Request
POST /apis/authorization.k8s.io/v1/selfsubjectaccessreviews
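A minimal sketch of a SelfSubjectAccessReview asking whether the current user can create Deployments in a namespace; the namespace is a placeholder:
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    namespace: default                  # placeholder namespace
    verb: create
    group: apps
    resource: deployments
kubectl auth can-i create deployments --namespace default issues an equivalent request.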
SelfSubjectRulesReview enumerates the set of actions the current user can perform within a namespace.
apiVersion: authorization.k8s.io/v1
import "k8s.io/api/authorization/v1"
SelfSubjectRulesReview
SelfSubjectRulesReview enumerates the set of actions the current user can perform within a namespace. The returned list of actions may be incomplete depending on the server's authorization mode, and any errors experienced during the evaluation. SelfSubjectRulesReview should be used by UIs to show/hide actions, or to quickly let an end user reason about their permissions. It should NOT be used by external systems to drive authorization decisions as this raises confused deputy, cache lifetime/revocation, and correctness concerns. SubjectAccessReview and LocalSubjectAccessReview are the correct way to defer authorization decisions to the API server.
Spec holds information about the request being evaluated.
status (SubjectRulesReviewStatus)
Status is filled in by the server and indicates the set of actions a user can perform.
SubjectRulesReviewStatus contains the result of a rules check. This check can be incomplete depending on the set of authorizers the server is configured with and any errors experienced during evaluation. Because authorization rules are additive, if a rule appears in a list it's safe to assume the subject has that permission, even if that list is incomplete.
status.incomplete (boolean), required
Incomplete is true when the rules returned by this call are incomplete. This is most commonly encountered when an authorizer, such as an external authorizer, doesn't support rules evaluation.
NonResourceRules is the list of actions the subject is allowed to perform on non-resources. The list ordering isn't significant, may contain duplicates, and possibly be incomplete.
NonResourceRule holds information that describes a rule for the non-resource
NonResourceURLs is a set of partial urls that a user should have access to. *s are allowed, but only as the full, final step in the path. "*" means all.
status.resourceRules ([]ResourceRule), required
Atomic: will be replaced during a merge
ResourceRules is the list of actions the subject is allowed to perform on resources. The list ordering isn't significant, may contain duplicates, and possibly be incomplete.
ResourceRule is the list of actions the subject is allowed to perform on resources. The list ordering isn't significant, may contain duplicates, and possibly be incomplete.
status.resourceRules.verbs ([]string), required
Atomic: will be replaced during a merge
Verb is a list of kubernetes resource API verbs, like: get, list, watch, create, update, delete, proxy. "*" means all.
status.resourceRules.apiGroups ([]string)
Atomic: will be replaced during a merge
APIGroups is the name of the APIGroup that contains the resources. If multiple API groups are specified, any action requested against one of the enumerated resources in any API group will be allowed. "*" means all.
status.resourceRules.resourceNames ([]string)
Atomic: will be replaced during a merge
ResourceNames is an optional white list of names that the rule applies to. An empty set means that everything is allowed. "*" means all.
status.resourceRules.resources ([]string)
Atomic: will be replaced during a merge
Resources is a list of resources this rule applies to. "*" means all in the specified apiGroups.
"*/foo" represents the subresource 'foo' for all resources in the specified apiGroups.
status.evaluationError (string)
EvaluationError can appear in combination with Rules. It indicates an error occurred during rule evaluation, such as an authorizer that doesn't support rule evaluation, and that ResourceRules and/or NonResourceRules may be incomplete.
SelfSubjectRulesReviewSpec
SelfSubjectRulesReviewSpec defines the specification for SelfSubjectRulesReview.
namespace (string)
Namespace to evaluate rules for. Required.
Operations
create create a SelfSubjectRulesReview
HTTP Request
POST /apis/authorization.k8s.io/v1/selfsubjectrulesreviews
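A minimal sketch of the POSTed object; the namespace is a placeholder:
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectRulesReview
spec:
  namespace: default                    # placeholder namespace to evaluate rules in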
Status is filled in by the server and indicates whether the request is allowed or not
SubjectAccessReviewSpec
SubjectAccessReviewSpec is a description of the access request. Exactly one of ResourceAuthorizationAttributes and NonResourceAuthorizationAttributes must be set
extra (map[string][]string)
Extra corresponds to the user.Info.GetExtra() method from the authenticator. Since that is input to the authorizer it needs a reflection here.
groups ([]string)
Atomic: will be replaced during a merge
Groups is the groups you're testing for.
nonResourceAttributes (NonResourceAttributes)
NonResourceAttributes describes information for a non-resource access request
NonResourceAttributes includes the authorization attributes available for non-resource requests to the Authorizer interface
nonResourceAttributes.path (string)
Path is the URL path of the request
nonResourceAttributes.verb (string)
Verb is the standard HTTP verb
resourceAttributes (ResourceAttributes)
ResourceAuthorizationAttributes describes information for a resource access request
ResourceAttributes includes the authorization attributes available for resource requests to the Authorizer interface
fieldSelector describes the limitation on access based on field. It can only limit access, not broaden it.
This field is alpha-level. To use this field, you must enable the AuthorizeWithSelectors feature gate (disabled by default).
FieldSelectorAttributes indicates a field limited access. Webhook authors are encouraged to:
ensure rawSelector and requirements are not both set
consider the requirements field if set
not try to parse or consider the rawSelector field if set. This is to avoid another CVE-2022-2880 (i.e. getting different systems to agree on how exactly to parse a query is not something we want), see https://www.oxeye.io/resources/golang-parameter-smuggling-attack for more details.
For the SubjectAccessReview endpoints of the kube-apiserver:
If rawSelector is empty and requirements are empty, the request is not limited.
If rawSelector is present and requirements are empty, the rawSelector will be parsed and limited if the parsing succeeds.
If rawSelector is empty and requirements are present, the requirements should be honored.
If rawSelector is present and requirements are present, the request is invalid.
rawSelector is the serialization of a field selector that would be included in a query parameter. Webhook implementations are encouraged to ignore rawSelector. The kube-apiserver's *SubjectAccessReview will parse the rawSelector as long as the requirements are not present.
requirements is the parsed interpretation of a field selector. All requirements must be met for a resource instance to match the selector. Webhook implementations should handle requirements, but how to handle them is up to the webhook. Since requirements can only limit the request, it is safe to authorize as unlimited request if the requirements are not understood.
FieldSelectorRequirement is a selector that contains values, a key, and an operator that relates the key and values.
operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist. The list of operators may grow in the future.
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty.
resourceAttributes.group (string)
Group is the API Group of the Resource. "*" means all.
labelSelector describes the limitation on access based on labels. It can only limit access, not broaden it.
This field is alpha-level. To use this field, you must enable the AuthorizeWithSelectors feature gate (disabled by default).
LabelSelectorAttributes indicates a label limited access. Webhook authors are encouraged to:
ensure rawSelector and requirements are not both set
consider the requirements field if set
not try to parse or consider the rawSelector field if set. This is to avoid another CVE-2022-2880 (i.e. getting different systems to agree on how exactly to parse a query is not something we want), see https://www.oxeye.io/resources/golang-parameter-smuggling-attack for more details.
For the SubjectAccessReview endpoints of the kube-apiserver:
If rawSelector is empty and requirements are empty, the request is not limited.
If rawSelector is present and requirements are empty, the rawSelector will be parsed and limited if the parsing succeeds.
If rawSelector is empty and requirements are present, the requirements should be honored.
If rawSelector is present and requirements are present, the request is invalid.
rawSelector is the serialization of a field selector that would be included in a query parameter. Webhook implementations are encouraged to ignore rawSelector. The kube-apiserver's *SubjectAccessReview will parse the rawSelector as long as the requirements are not present.
requirements is the parsed interpretation of a label selector. All requirements must be met for a resource instance to match the selector. Webhook implementations should handle requirements, but how to handle them is up to the webhook. Since requirements can only limit the request, it is safe to authorize as unlimited request if the requirements are not understood.
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
resourceAttributes.name (string)
Name is the name of the resource being requested for a "get" or deleted for a "delete". "" (empty) means all.
resourceAttributes.namespace (string)
Namespace is the namespace of the action being requested. Currently, there is no distinction between no namespace and all namespaces. "" (empty) is defaulted for LocalSubjectAccessReviews; "" (empty) is empty for cluster-scoped resources; "" (empty) means "all" for namespace scoped resources from a SubjectAccessReview or SelfSubjectAccessReview.
resourceAttributes.resource (string)
Resource is one of the existing resource types. "*" means all.
resourceAttributes.subresource (string)
Subresource is one of the existing resource types. "" means none.
resourceAttributes.verb (string)
Verb is a kubernetes resource API verb, like: get, list, watch, create, update, delete, proxy. "*" means all.
resourceAttributes.version (string)
Version is the API Version of the Resource. "*" means all.
uid (string)
UID information about the requesting user.
user (string)
User is the user you're testing for. If you specify "User" but not "Groups", then it is interpreted as "What if User were not a member of any groups?".
SubjectAccessReviewStatus
SubjectAccessReviewStatus
allowed (boolean), required
Allowed is required. True if the action would be allowed, false otherwise.
denied (boolean)
Denied is optional. True if the action would be denied, otherwise false. If both allowed is false and denied is false, then the authorizer has no opinion on whether to authorize the action. Denied may not be true if Allowed is true.
evaluationError (string)
EvaluationError is an indication that some error occurred during the authorization check. It is entirely possible to get an error and still be able to continue to determine the authorization status in spite of it. For instance, RBAC can be missing a role, but enough roles are still present and bound to reason about the request.
reason (string)
Reason is optional. It indicates why a request was allowed or denied.
Operations
create create a SubjectAccessReview
HTTP Request
POST /apis/authorization.k8s.io/v1/subjectaccessreviews
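A minimal sketch of a SubjectAccessReview checking whether a particular user, as a member of a particular group, can get Pods in a namespace; the user, group, and namespace are placeholders:
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: jane@example.com                # placeholder user being tested
  groups:
  - developers                          # placeholder group membership
  resourceAttributes:
    namespace: default                  # placeholder namespace
    verb: get
    group: ""                           # core API group
    resource: pods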
AggregationRule is an optional field that describes how to build the Rules for this ClusterRole. If AggregationRule is set, then the Rules are controller managed and direct changes to Rules will be stomped by the controller.
AggregationRule describes how to locate ClusterRoles to aggregate into the ClusterRole
ClusterRoleSelectors holds a list of selectors which will be used to find ClusterRoles and create the rules. If any of the selectors match, then the ClusterRole's permissions will be added
rules ([]PolicyRule)
Atomic: will be replaced during a merge
Rules holds all the PolicyRules for this ClusterRole
PolicyRule holds information that describes a policy rule, but does not contain information about who the rule applies to or which namespace the rule applies to.
rules.apiGroups ([]string)
Atomic: will be replaced during a merge
APIGroups is the name of the APIGroup that contains the resources. If multiple API groups are specified, any action requested against one of the enumerated resources in any API group will be allowed. "" represents the core API group and "*" represents all API groups.
rules.resources ([]string)
Atomic: will be replaced during a merge
Resources is a list of resources this rule applies to. '*' represents all resources.
rules.verbs ([]string), required
Atomic: will be replaced during a merge
Verbs is a list of Verbs that apply to ALL the ResourceKinds contained in this rule. '*' represents all verbs.
rules.resourceNames ([]string)
Atomic: will be replaced during a merge
ResourceNames is an optional white list of names that the rule applies to. An empty set means that everything is allowed.
rules.nonResourceURLs ([]string)
Atomic: will be replaced during a merge
NonResourceURLs is a set of partial urls that a user should have access to. *s are allowed, but only as the full, final step in the path. Since non-resource URLs are not namespaced, this field is only applicable for ClusterRoles referenced from a ClusterRoleBinding. Rules can either apply to API resources (such as "pods" or "secrets") or non-resource URL paths (such as "/api"), but not both.
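A minimal sketch of a ClusterRole combining a resource rule and a non-resource rule; the role name is a placeholder:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-and-healthz-reader          # hypothetical name
rules:
- apiGroups: [""]                       # "" is the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/healthz", "/healthz/*"]   # only effective via a ClusterRoleBinding
  verbs: ["get"]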
ClusterRoleBinding references a ClusterRole, but does not contain it.
apiVersion: rbac.authorization.k8s.io/v1
import "k8s.io/api/rbac/v1"
ClusterRoleBinding
ClusterRoleBinding references a ClusterRole, but does not contain it. It can reference a ClusterRole in the global namespace, and adds who information via Subject.
RoleRef can only reference a ClusterRole in the global namespace. If the RoleRef cannot be resolved, the Authorizer must return an error. This field is immutable.
RoleRef contains information that points to the role being used
roleRef.apiGroup (string), required
APIGroup is the group for the resource being referenced
roleRef.kind (string), required
Kind is the type of resource being referenced
roleRef.name (string), required
Name is the name of resource being referenced
subjects ([]Subject)
Atomic: will be replaced during a merge
Subjects holds references to the objects the role applies to.
Subject contains a reference to the object or user identities a role binding applies to. This can either hold a direct API object reference, or a value for non-objects such as user and group names.
subjects.kind (string), required
Kind of object being referenced. Values defined by this API group are "User", "Group", and "ServiceAccount". If the Authorizer does not recognize the kind value, the Authorizer should report an error.
subjects.name (string), required
Name of the object being referenced.
subjects.apiGroup (string)
APIGroup holds the API group of the referenced subject. Defaults to "" for ServiceAccount subjects. Defaults to "rbac.authorization.k8s.io" for User and Group subjects.
subjects.namespace (string)
Namespace of the referenced object. If the object kind is non-namespaced, such as "User" or "Group", and this value is not empty, the Authorizer should report an error.
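A minimal sketch of a ClusterRoleBinding granting a ClusterRole to a group and a ServiceAccount; all names are placeholders:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-pods-global                # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-and-healthz-reader          # hypothetical ClusterRole
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: sre-team                        # placeholder group
- kind: ServiceAccount
  name: metrics-reader                  # placeholder ServiceAccount
  namespace: monitoring                 # placeholder namespace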
ClusterRoleBindingList
ClusterRoleBindingList is a collection of ClusterRoleBindings
PolicyRule holds information that describes a policy rule, but does not contain information about who the rule applies to or which namespace the rule applies to.
rules.apiGroups ([]string)
Atomic: will be replaced during a merge
APIGroups is the name of the APIGroup that contains the resources. If multiple API groups are specified, any action requested against one of the enumerated resources in any API group will be allowed. "" represents the core API group and "*" represents all API groups.
rules.resources ([]string)
Atomic: will be replaced during a merge
Resources is a list of resources this rule applies to. '*' represents all resources.
rules.verbs ([]string), required
Atomic: will be replaced during a merge
Verbs is a list of Verbs that apply to ALL the ResourceKinds contained in this rule. '*' represents all verbs.
rules.resourceNames ([]string)
Atomic: will be replaced during a merge
ResourceNames is an optional white list of names that the rule applies to. An empty set means that everything is allowed.
rules.nonResourceURLs ([]string)
Atomic: will be replaced during a merge
NonResourceURLs is a set of partial urls that a user should have access to. *s are allowed, but only as the full, final step in the path. Since non-resource URLs are not namespaced, this field is only applicable for ClusterRoles referenced from a ClusterRoleBinding. Rules can either apply to API resources (such as "pods" or "secrets") or non-resource URL paths (such as "/api"), but not both.
RoleBinding references a role, but does not contain it.
apiVersion: rbac.authorization.k8s.io/v1
import "k8s.io/api/rbac/v1"
RoleBinding
RoleBinding references a role, but does not contain it. It can reference a Role in the same namespace or a ClusterRole in the global namespace. It adds who information via Subjects and namespace information by which namespace it exists in. RoleBindings in a given namespace only have effect in that namespace.
RoleRef can reference a Role in the current namespace or a ClusterRole in the global namespace. If the RoleRef cannot be resolved, the Authorizer must return an error. This field is immutable.
RoleRef contains information that points to the role being used
roleRef.apiGroup (string), required
APIGroup is the group for the resource being referenced
roleRef.kind (string), required
Kind is the type of resource being referenced
roleRef.name (string), required
Name is the name of resource being referenced
subjects ([]Subject)
Atomic: will be replaced during a merge
Subjects holds references to the objects the role applies to.
Subject contains a reference to the object or user identities a role binding applies to. This can either hold a direct API object reference, or a value for non-objects such as user and group names.
subjects.kind (string), required
Kind of object being referenced. Values defined by this API group are "User", "Group", and "ServiceAccount". If the Authorizer does not recognize the kind value, the Authorizer should report an error.
subjects.name (string), required
Name of the object being referenced.
subjects.apiGroup (string)
APIGroup holds the API group of the referenced subject. Defaults to "" for ServiceAccount subjects. Defaults to "rbac.authorization.k8s.io" for User and Group subjects.
subjects.namespace (string)
Namespace of the referenced object. If the object kind is non-namespaced, such as "User" or "Group", and this value is not empty, the Authorizer should report an error.
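As a sketch, a RoleBinding that grants a hypothetical "pod-reader" Role to a ServiceAccount in the same namespace might look like (all names are placeholders):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods                    # hypothetical binding name
  namespace: development             # the binding only has effect in this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role                         # a ClusterRole in the global namespace could also be referenced
  name: pod-reader                   # hypothetical Role
subjects:
- kind: ServiceAccount               # apiGroup defaults to "" for ServiceAccount subjects
  name: ci-bot                       # hypothetical ServiceAccount
  namespace: development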
FlowSchema defines the schema of a group of flows.
apiVersion: flowcontrol.apiserver.k8s.io/v1
import "k8s.io/api/flowcontrol/v1"
FlowSchema
FlowSchema defines the schema of a group of flows. Note that a flow is made up of a set of inbound API requests with similar attributes and is identified by a pair of strings: the name of the FlowSchema and a "flow distinguisher".
FlowSchemaSpec describes how the FlowSchema's specification looks.
distinguisherMethod (FlowDistinguisherMethod)
distinguisherMethod defines how to compute the flow distinguisher for requests that match this schema. nil specifies that the distinguisher is disabled and thus will always be the empty string.
FlowDistinguisherMethod specifies the method of a flow distinguisher.
distinguisherMethod.type (string), required
type is the type of flow distinguisher method. The supported types are "ByUser" and "ByNamespace". Required.
matchingPrecedence (int32)
matchingPrecedence is used to choose among the FlowSchemas that match a given request. The chosen FlowSchema is among those with the numerically lowest (which we take to be logically highest) MatchingPrecedence. Each MatchingPrecedence value must be in the range [1,10000]. Note that if the precedence is not specified, it will be set to 1000 as default.
priorityLevelConfiguration should reference a PriorityLevelConfiguration in the cluster. If the reference cannot be resolved, the FlowSchema will be ignored and marked as invalid in its status. Required.
PriorityLevelConfigurationReference contains information that points to the "request-priority" being used.
name is the name of the priority level configuration being referenced. Required.
rules ([]PolicyRulesWithSubjects)
Atomic: will be replaced during a merge
rules describes which requests will match this flow schema. This FlowSchema matches a request if and only if at least one member of rules matches the request. If it is an empty slice, there will be no requests matching the FlowSchema.
PolicyRulesWithSubjects prescribes a test that applies to a request to an apiserver. The test considers the subject making the request, the verb being requested, and the resource to be acted upon. This PolicyRulesWithSubjects matches a request if and only if both (a) at least one member of subjects matches the request and (b) at least one member of resourceRules or nonResourceRules matches the request.
rules.subjects ([]Subject), required
Atomic: will be replaced during a merge
subjects is the list of normal users, ServiceAccounts, or groups that this rule cares about. There must be at least one member in this slice. A slice that includes both the system:authenticated and system:unauthenticated user groups matches every request. Required.
Subject matches the originator of a request, as identified by the request authentication system. There are three ways of matching an originator; by user, group, or service account.
rules.subjects.kind (string), required
kind indicates which one of the other fields is non-empty. Required
rules.subjects.group (GroupSubject)
group matches based on user group name.
GroupSubject holds detailed information for group-kind subject.
rules.subjects.serviceAccount (ServiceAccountSubject)
serviceAccount matches based on ServiceAccount namespace and name.
ServiceAccountSubject holds detailed information for service-account-kind subject.
namespace is the namespace of matching ServiceAccount objects. Required.
rules.subjects.user (UserSubject)
user matches based on username.
UserSubject holds detailed information for user-kind subject.
rules.subjects.user.name (string), required
name is the username that matches, or "*" to match all usernames. Required.
rules.nonResourceRules ([]NonResourcePolicyRule)
Atomic: will be replaced during a merge
nonResourceRules is a list of NonResourcePolicyRules that identify matching requests according to their verb and the target non-resource URL.
NonResourcePolicyRule is a predicate that matches non-resource requests according to their verb and the target non-resource URL. A NonResourcePolicyRule matches a request if and only if both (a) at least one member of verbs matches the request and (b) at least one member of nonResourceURLs matches the request.
nonResourceURLs is a set of url prefixes that a user should have access to and may not be empty. For example:
"/healthz" is legal
"/hea*" is illegal
"/hea" is legal but matches nothing
"/hea/*" also matches nothing
"/healthz/*" matches all per-component health checks.
"*" matches all non-resource urls. If it is present, it must be the only entry. Required.
rules.nonResourceRules.verbs ([]string), required
Set: unique values will be kept during a merge
verbs is a list of matching verbs and may not be empty. "*" matches all verbs. If it is present, it must be the only entry. Required.
rules.resourceRules ([]ResourcePolicyRule)
Atomic: will be replaced during a merge
resourceRules is a slice of ResourcePolicyRules that identify matching requests according to their verb and the target resource. At least one of resourceRules and nonResourceRules has to be non-empty.
ResourcePolicyRule is a predicate that matches some resource requests, testing the request's verb and the target resource. A ResourcePolicyRule matches a resource request if and only if: (a) at least one member of verbs matches the request, (b) at least one member of apiGroups matches the request, (c) at least one member of resources matches the request, and (d) either (d1) the request does not specify a namespace (i.e., Namespace=="") and clusterScope is true or (d2) the request specifies a namespace and at least one member of namespaces matches the request's namespace.
resources is a list of matching resources (i.e., lowercase and plural) with, if desired, subresource. For example, [ "services", "nodes/status" ]. This list may not be empty. "*" matches all resources and, if present, must be the only entry. Required.
rules.resourceRules.verbs ([]string), required
Set: unique values will be kept during a merge
verbs is a list of matching verbs and may not be empty. "*" matches all verbs and, if present, must be the only entry. Required.
rules.resourceRules.clusterScope (boolean)
clusterScope indicates whether to match requests that do not specify a namespace (which happens either because the resource is not namespaced or the request targets all namespaces). If this field is omitted or false then the namespaces field must contain a non-empty list.
rules.resourceRules.namespaces ([]string)
Set: unique values will be kept during a merge
namespaces is a list of target namespaces that restricts matches. A request that specifies a target namespace matches only if either (a) this list contains that target namespace or (b) this list contains "*". Note that "*" matches any specified namespace but does not match a request that does not specify a namespace (see the clusterScope field for that). This list may be empty, but only if clusterScope is true.
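To illustrate how these fields fit together, a hypothetical FlowSchema that routes health-check requests from authenticated users to an assumed "exempt" priority level could be written as (names are placeholders):

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: example-health-checks       # hypothetical name
spec:
  matchingPrecedence: 1000
  distinguisherMethod:
    type: ByUser
  priorityLevelConfiguration:
    name: exempt                     # assumes a PriorityLevelConfiguration with this name exists
  rules:
  - subjects:
    - kind: Group
      group:
        name: system:authenticated
    nonResourceRules:
    - nonResourceURLs: ["/healthz", "/readyz", "/livez"]
      verbs: ["get"]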
FlowSchemaStatus
FlowSchemaStatus represents the current state of a FlowSchema.
conditions ([]FlowSchemaCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
conditions is a list of the current states of FlowSchema.
FlowSchemaCondition describes conditions for a FlowSchema.
conditions.lastTransitionTime (Time)
lastTransitionTime is the last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
message is a human-readable message indicating details about last transition.
conditions.reason (string)
reason is a unique, one-word, CamelCase reason for the condition's last transition.
conditions.status (string)
status is the status of the condition. Can be True, False, Unknown. Required.
MaxLimitRequestRatio if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value; this represents the max burst for the named resource.
scopeSelector is also a collection of filters like scopes that must match each object tracked by a quota but expressed using ScopeSelectorOperator in combination with possible values. For a resource to match, both scopes AND scopeSelector (if specified in spec), must be matched.
A scope selector represents the AND of the selectors represented by the scoped-resource selector requirements.
The name of the scope that the selector applies to.
scopeSelector.matchExpressions.values ([]string)
Atomic: will be replaced during a merge
An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
scopes ([]string)
Atomic: will be replaced during a merge
A collection of filters that must match each object tracked by a quota. If not specified, the quota matches all objects.
ResourceQuotaStatus
ResourceQuotaStatus defines the enforced hard limits and observed use.
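As a sketch of how scopeSelector is used, a hypothetical ResourceQuota that only tracks pods of an assumed "high" PriorityClass might look like:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-high                    # hypothetical name
  namespace: default
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high"]               # hypothetical PriorityClass name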
podSelector selects the pods to which this NetworkPolicy object applies. The array of ingress rules is applied to any pods selected by this field. Multiple network policies can select the same set of pods. In this case, the ingress rules for each are combined additively. This field is NOT optional and follows standard label selector semantics. An empty podSelector matches all pods in this namespace.
policyTypes ([]string)
Atomic: will be replaced during a merge
policyTypes is a list of rule types that the NetworkPolicy relates to. Valid options are ["Ingress"], ["Egress"], or ["Ingress", "Egress"]. If this field is not specified, it will default based on the existence of ingress or egress rules; policies that contain an egress section are assumed to affect egress, and all policies (whether or not they contain an ingress section) are assumed to affect ingress. If you want to write an egress-only policy, you must explicitly specify policyTypes [ "Egress" ]. Likewise, if you want to write a policy that specifies that no egress is allowed, you must specify a policyTypes value that includes "Egress" (since such a policy would not include an egress section and would otherwise default to just [ "Ingress" ]). This field is beta-level in 1.8
ingress ([]NetworkPolicyIngressRule)
Atomic: will be replaced during a merge
ingress is a list of ingress rules to be applied to the selected pods. Traffic is allowed to a pod if there are no NetworkPolicies selecting the pod (and cluster policy otherwise allows the traffic), OR if the traffic source is the pod's local node, OR if the traffic matches at least one ingress rule across all of the NetworkPolicy objects whose podSelector matches the pod. If this field is empty then this NetworkPolicy does not allow any traffic (and serves solely to ensure that the pods it selects are isolated by default)
NetworkPolicyIngressRule describes a particular set of traffic that is allowed to the pods matched by a NetworkPolicySpec's podSelector. The traffic must match both ports and from.
ingress.from ([]NetworkPolicyPeer)
Atomic: will be replaced during a merge
from is a list of sources which should be able to access the pods selected for this rule. Items in this list are combined using a logical OR operation. If this field is empty or missing, this rule matches all sources (traffic not restricted by source). If this field is present and contains at least one item, this rule allows traffic only if the traffic matches at least one item in the from list.
NetworkPolicyPeer describes a peer to allow traffic to/from. Only certain combinations of fields are allowed
ingress.from.ipBlock (IPBlock)
ipBlock defines policy on a particular IPBlock. If this field is set then neither of the other fields can be.
IPBlock describes a particular CIDR (Ex. "192.168.1.0/24","2001:db8::/64") that is allowed to the pods matched by a NetworkPolicySpec's podSelector. The except entry describes CIDRs that should not be included within this rule.
ingress.from.ipBlock.cidr (string), required
cidr is a string representing the IPBlock. Valid examples are "192.168.1.0/24" or "2001:db8::/64".
ingress.from.ipBlock.except ([]string)
Atomic: will be replaced during a merge
except is a slice of CIDRs that should not be included within an IPBlock. Valid examples are "192.168.1.0/24" or "2001:db8::/64". Except values will be rejected if they are outside the cidr range.
namespaceSelector selects namespaces using cluster-scoped labels. This field follows standard label selector semantics; if present but empty, it selects all namespaces.
If podSelector is also set, then the NetworkPolicyPeer as a whole selects the pods matching podSelector in the namespaces selected by namespaceSelector. Otherwise it selects all pods in the namespaces selected by namespaceSelector.
podSelector is a label selector which selects pods. This field follows standard label selector semantics; if present but empty, it selects all pods.
If namespaceSelector is also set, then the NetworkPolicyPeer as a whole selects the pods matching podSelector in the Namespaces selected by NamespaceSelector. Otherwise it selects the pods matching podSelector in the policy's own namespace.
ingress.ports ([]NetworkPolicyPort)
Atomic: will be replaced during a merge
ports is a list of ports which should be made accessible on the pods selected for this rule. Each item in this list is combined using a logical OR. If this field is empty or missing, this rule matches all ports (traffic not restricted by port). If this field is present and contains at least one item, then this rule allows traffic only if the traffic matches at least one port in the list.
NetworkPolicyPort describes a port to allow traffic on
ingress.ports.port (IntOrString)
port represents the port on the given protocol. This can either be a numerical or named port on a pod. If this field is not provided, this matches all port names and numbers. If present, only traffic on the specified protocol AND port will be matched.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
ingress.ports.endPort (int32)
endPort indicates that the range of ports from port to endPort, inclusive, should be allowed by the policy if endPort is set. This field cannot be defined if the port field is not defined or if the port field is defined as a named (string) port. The endPort must be equal to or greater than port.
ingress.ports.protocol (string)
protocol represents the protocol (TCP, UDP, or SCTP) which traffic must match. If not specified, this field defaults to TCP.
egress ([]NetworkPolicyEgressRule)
Atomic: will be replaced during a merge
egress is a list of egress rules to be applied to the selected pods. Outgoing traffic is allowed if there are no NetworkPolicies selecting the pod (and cluster policy otherwise allows the traffic), OR if the traffic matches at least one egress rule across all of the NetworkPolicy objects whose podSelector matches the pod. If this field is empty then this NetworkPolicy limits all outgoing traffic (and serves solely to ensure that the pods it selects are isolated by default). This field is beta-level in 1.8
NetworkPolicyEgressRule describes a particular set of traffic that is allowed out of pods matched by a NetworkPolicySpec's podSelector. The traffic must match both ports and to. This type is beta-level in 1.8
egress.to ([]NetworkPolicyPeer)
Atomic: will be replaced during a merge
to is a list of destinations for outgoing traffic of pods selected for this rule. Items in this list are combined using a logical OR operation. If this field is empty or missing, this rule matches all destinations (traffic not restricted by destination). If this field is present and contains at least one item, this rule allows traffic only if the traffic matches at least one item in the to list.
NetworkPolicyPeer describes a peer to allow traffic to/from. Only certain combinations of fields are allowed
egress.to.ipBlock (IPBlock)
ipBlock defines policy on a particular IPBlock. If this field is set then neither of the other fields can be.
IPBlock describes a particular CIDR (Ex. "192.168.1.0/24","2001:db8::/64") that is allowed to the pods matched by a NetworkPolicySpec's podSelector. The except entry describes CIDRs that should not be included within this rule.
egress.to.ipBlock.cidr (string), required
cidr is a string representing the IPBlock. Valid examples are "192.168.1.0/24" or "2001:db8::/64".
egress.to.ipBlock.except ([]string)
Atomic: will be replaced during a merge
except is a slice of CIDRs that should not be included within an IPBlock. Valid examples are "192.168.1.0/24" or "2001:db8::/64". Except values will be rejected if they are outside the cidr range.
namespaceSelector selects namespaces using cluster-scoped labels. This field follows standard label selector semantics; if present but empty, it selects all namespaces.
If podSelector is also set, then the NetworkPolicyPeer as a whole selects the pods matching podSelector in the namespaces selected by namespaceSelector. Otherwise it selects all pods in the namespaces selected by namespaceSelector.
podSelector is a label selector which selects pods. This field follows standard label selector semantics; if present but empty, it selects all pods.
If namespaceSelector is also set, then the NetworkPolicyPeer as a whole selects the pods matching podSelector in the Namespaces selected by NamespaceSelector. Otherwise it selects the pods matching podSelector in the policy's own namespace.
egress.ports ([]NetworkPolicyPort)
Atomic: will be replaced during a merge
ports is a list of destination ports for outgoing traffic. Each item in this list is combined using a logical OR. If this field is empty or missing, this rule matches all ports (traffic not restricted by port). If this field is present and contains at least one item, then this rule allows traffic only if the traffic matches at least one port in the list.
NetworkPolicyPort describes a port to allow traffic on
egress.ports.port (IntOrString)
port represents the port on the given protocol. This can either be a numerical or named port on a pod. If this field is not provided, this matches all port names and numbers. If present, only traffic on the specified protocol AND port will be matched.
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
egress.ports.endPort (int32)
endPort indicates that the range of ports from port to endPort, inclusive, should be allowed by the policy if endPort is set. This field cannot be defined if the port field is not defined or if the port field is defined as a named (string) port. The endPort must be equal to or greater than port.
egress.ports.protocol (string)
protocol represents the protocol (TCP, UDP, or SCTP) which traffic must match. If not specified, this field defaults to TCP.
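The following illustrative sketch shows how podSelector, policyTypes, ingress, and egress combine in one NetworkPolicy; all labels, CIDRs, and ports are hypothetical:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-network-policy      # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db                       # hypothetical label
  policyTypes: ["Ingress", "Egress"]
  ingress:
  - from:                            # the three peer types are combined with a logical OR
    - ipBlock:
        cidr: 172.17.0.0/16
        except: ["172.17.1.0/24"]
    - namespaceSelector:
        matchLabels:
          project: myproject         # hypothetical label
    - podSelector:
        matchLabels:
          role: frontend             # hypothetical label
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978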
NetworkPolicyList
NetworkPolicyList is a list of NetworkPolicy objects.
Most recently observed status of the PodDisruptionBudget.
PodDisruptionBudgetSpec
PodDisruptionBudgetSpec is a description of a PodDisruptionBudget.
maxUnavailable (IntOrString)
An eviction is allowed if at most "maxUnavailable" pods selected by "selector" are unavailable after the eviction, i.e. even in absence of the evicted pod. For example, one can prevent all voluntary evictions by specifying 0. This is a mutually exclusive setting with "minAvailable".
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
minAvailable (IntOrString)
An eviction is allowed if at least "minAvailable" pods selected by "selector" will still be available after the eviction, i.e. even in the absence of the evicted pod. So for example you can prevent all voluntary evictions by specifying "100%".
IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.
Label query over pods whose evictions are managed by the disruption budget. A null selector will match no pods, while an empty ({}) selector will select all pods within the namespace.
unhealthyPodEvictionPolicy (string)
UnhealthyPodEvictionPolicy defines the criteria for when unhealthy pods should be considered for eviction. The current implementation considers healthy pods as pods that have a status.conditions item with type="Ready", status="True".
Valid policies are IfHealthyBudget and AlwaysAllow. If no policy is specified, the default behavior will be used, which corresponds to the IfHealthyBudget policy.
IfHealthyBudget policy means that running pods (status.phase="Running") that are not yet healthy can be evicted only if the guarded application is not disrupted (status.currentHealthy is at least equal to status.desiredHealthy). Healthy pods will be subject to the PDB for eviction.
AlwaysAllow policy means that all running pods (status.phase="Running") that are not yet healthy are considered disrupted and can be evicted regardless of whether the criteria in a PDB are met. This means prospective running pods of a disrupted application might not get a chance to become healthy. Healthy pods will be subject to the PDB for eviction.
Additional policies may be added in the future. Clients making eviction decisions should disallow eviction of unhealthy pods if they encounter an unrecognized policy in this field.
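For illustration, a hypothetical PodDisruptionBudget using minAvailable, a label selector, and an explicit unhealthyPodEvictionPolicy might look like:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb                  # hypothetical name
spec:
  minAvailable: 2                    # mutually exclusive with maxUnavailable; "100%" is also valid
  selector:
    matchLabels:
      app: zookeeper                 # hypothetical label
  unhealthyPodEvictionPolicy: IfHealthyBudget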
PodDisruptionBudgetStatus
PodDisruptionBudgetStatus represents information about the status of a PodDisruptionBudget. Status may trail the actual state of a system.
currentHealthy (int32), required
current number of healthy pods
desiredHealthy (int32), required
minimum desired number of healthy pods
disruptionsAllowed (int32), required
Number of pod disruptions that are currently allowed.
expectedPods (int32), required
total number of pods counted by this disruption budget
conditions ([]Condition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Conditions contain conditions for PDB. The disruption controller sets the DisruptionAllowed condition. The following are known values for the reason field (additional reasons could be added in the future):
SyncFailed: The controller encountered an error and wasn't able to compute the number of allowed disruptions. Therefore no disruptions are allowed and the status of the condition will be False.
InsufficientPods: The number of pods are either at or below the number required by the PodDisruptionBudget. No disruptions are allowed and the status of the condition will be False.
SufficientPods: There are more pods than required by the PodDisruptionBudget. The condition will be True, and the number of allowed disruptions are provided by the disruptionsAllowed property.
Condition contains details for one aspect of the current state of this API Resource.
conditions.lastTransitionTime (Time), required
lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string), required
message is a human readable message indicating details about the transition. This may be an empty string.
conditions.reason (string), required
reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
conditions.status (string), required
status of the condition, one of True, False, Unknown.
conditions.type (string), required
type of condition in CamelCase or in foo.example.com/CamelCase.
conditions.observedGeneration (int64)
observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
disruptedPods (map[string]Time)
DisruptedPods contains information about pods whose eviction was processed by the API server eviction subresource handler but has not yet been observed by the PodDisruptionBudget controller. A pod will be in this map from the time when the API server processed the eviction request to the time when the pod is seen by the PDB controller as having been marked for deletion (or after a timeout). The key in the map is the name of the pod and the value is the time when the API server processed the eviction request. If the deletion didn't occur and a pod is still there it will be removed from the list automatically by the PodDisruptionBudget controller after some time. If everything goes smoothly this map should be empty most of the time. A large number of entries in the map may indicate problems with pod deletions.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
observedGeneration (int64)
Most recent generation observed when updating this PDB status. DisruptionsAllowed and other status information is valid only if observedGeneration equals the PDB's object generation.
PodDisruptionBudgetList
PodDisruptionBudgetList is a collection of PodDisruptionBudgets.
PriorityLevelConfigurationSpec specifies the configuration of a priority level.
exempt (ExemptPriorityLevelConfiguration)
exempt specifies how requests are handled for an exempt priority level. This field MUST be empty if type is "Limited". This field MAY be non-empty if type is "Exempt". If empty and type is "Exempt" then the default values for ExemptPriorityLevelConfiguration apply.
ExemptPriorityLevelConfiguration describes the configurable aspects of the handling of exempt requests. In the mandatory exempt configuration object the values in the fields here can be modified by authorized users, unlike the rest of the spec.
exempt.lendablePercent (int32)
lendablePercent prescribes the fraction of the level's NominalCL that can be borrowed by other priority levels. The value of this field must be between 0 and 100, inclusive, and it defaults to 0. The number of seats that other levels can borrow from this level, known as this level's LendableConcurrencyLimit (LendableCL), is defined as follows.
LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 )
nominalConcurrencyShares (NCS) contributes to the computation of the NominalConcurrencyLimit (NominalCL) of this level. This is the number of execution seats nominally reserved for this priority level. This DOES NOT limit the dispatching from this priority level but affects the other priority levels through the borrowing mechanism. The server's concurrency limit (ServerCL) is divided among all the priority levels in proportion to their NCS values:
NominalCL(i) = ceil( ServerCL * NCS(i) / sum_ncs )
sum_ncs = sum[priority level k] NCS(k)
Bigger numbers mean a larger nominal concurrency limit, at the expense of every other priority level. This field has a default value of zero.
limited (LimitedPriorityLevelConfiguration)
limited specifies how requests are handled for a Limited priority level. This field must be non-empty if and only if type is "Limited".
LimitedPriorityLevelConfiguration specifies how to handle requests that are subject to limits. It addresses two issues:
How are requests for this priority level limited?
What should be done with requests that exceed the limit?
limited.borrowingLimitPercent (int32)
borrowingLimitPercent, if present, configures a limit on how many seats this priority level can borrow from other priority levels. The limit is known as this level's BorrowingConcurrencyLimit (BorrowingCL) and is a limit on the total number of seats that this level may borrow at any one time. This field holds the ratio of that limit to the level's nominal concurrency limit. When this field is non-nil, it must hold a non-negative integer and the limit is calculated as follows.
BorrowingCL(i) = round( NominalCL(i) * borrowingLimitPercent(i)/100.0 )
The value of this field can be more than 100, implying that this priority level can borrow a number of seats that is greater than its own nominal concurrency limit (NominalCL). When this field is left nil, the limit is effectively infinite.
limited.lendablePercent (int32)
lendablePercent prescribes the fraction of the level's NominalCL that can be borrowed by other priority levels. The value of this field must be between 0 and 100, inclusive, and it defaults to 0. The number of seats that other levels can borrow from this level, known as this level's LendableConcurrencyLimit (LendableCL), is defined as follows.
LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 )
limitResponse indicates what to do with requests that can not be executed right now
LimitResponse defines how to handle requests that can not be executed right now.
limited.limitResponse.type (string), required
type is "Queue" or "Reject". "Queue" means that requests that can not be executed upon arrival are held in a queue until they can be executed or a queuing limit is reached. "Reject" means that requests that can not be executed upon arrival are rejected. Required.
queuing holds the configuration parameters for queuing. This field may be non-empty only if type is "Queue".
QueuingConfiguration holds the configuration parameters for queuing
limited.limitResponse.queuing.handSize (int32)
handSize is a small positive number that configures the shuffle sharding of requests into queues. When enqueuing a request at this priority level the request's flow identifier (a string pair) is hashed and the hash value is used to shuffle the list of queues and deal a hand of the size specified here. The request is put into one of the shortest queues in that hand. handSize must be no larger than queues, and should be significantly smaller (so that a few heavy flows do not saturate most of the queues). See the user-facing documentation for more extensive guidance on setting this field. This field has a default value of 8.
queueLengthLimit is the maximum number of requests allowed to be waiting in a given queue of this priority level at a time; excess requests are rejected. This value must be positive. If not specified, it will be defaulted to 50.
limited.limitResponse.queuing.queues (int32)
queues is the number of queues for this priority level. The queues exist independently at each apiserver. The value must be positive. Setting it to 1 effectively precludes shufflesharding and thus makes the distinguisher method of associated flow schemas irrelevant. This field has a default value of 64.
limited.nominalConcurrencyShares (int32)
nominalConcurrencyShares (NCS) contributes to the computation of the NominalConcurrencyLimit (NominalCL) of this level. This is the number of execution seats available at this priority level. This is used both for requests dispatched from this priority level as well as requests dispatched from other priority levels borrowing seats from this level. The server's concurrency limit (ServerCL) is divided among the Limited priority levels in proportion to their NCS values:
NominalCL(i) = ceil( ServerCL * NCS(i) / sum_ncs )
sum_ncs = sum[limited priority level k] NCS(k)
Bigger numbers mean a larger nominal concurrency limit, at the expense of every other priority level.
If not specified, this field defaults to a value of 30.
Setting this field to zero supports the construction of a "jail" for this priority level that is used to hold some request(s).
type (string), required
type indicates whether this priority level is subject to limitation on request execution. A value of "Exempt" means that requests of this priority level are not subject to a limit (and thus are never queued) and do not detract from the capacity made available to other priority levels. A value of "Limited" means that (a) requests of this priority level are subject to limits and (b) some of the server's limited capacity is made available exclusively to this priority level. Required.
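As a sketch of a Limited priority level (all values illustrative), a PriorityLevelConfiguration could be written as follows. Assuming, for example, a server concurrency limit of 600 and Limited NCS values summing to 200, this level's NominalCL would be ceil(600 * 30 / 200) = 90 seats, of which 50% could be lent to other levels.

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: example-limited              # hypothetical name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 30
    lendablePercent: 50
    borrowingLimitPercent: 100
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 8
        queueLengthLimit: 50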
PriorityLevelConfigurationStatus
PriorityLevelConfigurationStatus represents the current state of a "request-priority".
Map: unique values on key type will be kept during a merge
conditions is the current state of "request-priority".
PriorityLevelConfigurationCondition defines the condition of priority level.
conditions.lastTransitionTime (Time)
lastTransitionTime is the last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
message is a human-readable message indicating details about last transition.
conditions.reason (string)
reason is a unique, one-word, CamelCase reason for the condition's last transition.
conditions.status (string)
status is the status of the condition. Can be True, False, Unknown. Required.
conditions.type (string)
type is the type of the condition. Required.
PriorityLevelConfigurationList
PriorityLevelConfigurationList is a list of PriorityLevelConfiguration objects.
Specification of the desired behavior of the ValidatingAdmissionPolicy.
ValidatingAdmissionPolicySpec is the specification of the desired behavior of the AdmissionPolicy.
spec.auditAnnotations ([]AuditAnnotation)
Atomic: will be replaced during a merge
auditAnnotations contains CEL expressions which are used to produce audit annotations for the audit event of the API request. validations and auditAnnotations may not both be empty; at least one of validations or auditAnnotations is required.
AuditAnnotation describes how to produce an audit annotation for an API request.
spec.auditAnnotations.key (string), required
key specifies the audit annotation key. The audit annotation keys of a ValidatingAdmissionPolicy must be unique. The key must be a qualified name ([A-Za-z0-9][-A-Za-z0-9_.]*) no more than 63 bytes in length.
The key is combined with the resource name of the ValidatingAdmissionPolicy to construct an audit annotation key: "{ValidatingAdmissionPolicy name}/{key}".
If an admission webhook uses the same resource name as this ValidatingAdmissionPolicy and the same audit annotation key, the annotation key will be identical. In this case, the first annotation written with the key will be included in the audit event and all subsequent annotations with the same key will be discarded.
valueExpression represents the expression which is evaluated by CEL to produce an audit annotation value. The expression must evaluate to either a string or null value. If the expression evaluates to a string, the audit annotation is included with the string value. If the expression evaluates to null or empty string the audit annotation will be omitted. The valueExpression may be no longer than 5kb in length. If the result of the valueExpression is more than 10kb in length, it will be truncated to 10kb.
If multiple ValidatingAdmissionPolicyBinding resources match an API request, then the valueExpression will be evaluated for each binding. All unique values produced by the valueExpressions will be joined together in a comma-separated list.
Required.
spec.failurePolicy (string)
failurePolicy defines how to handle failures for the admission policy. Failures can occur from CEL expression parse errors, type check errors, runtime errors and invalid or mis-configured policy definitions or bindings.
A policy is invalid if spec.paramKind refers to a non-existent Kind. A binding is invalid if spec.paramRef.name refers to a non-existent resource.
failurePolicy does not define how validations that evaluate to false are handled.
When failurePolicy is set to Fail, ValidatingAdmissionPolicyBinding validationActions define how failures are enforced.
Allowed values are Ignore or Fail. Defaults to Fail.
spec.matchConditions ([]MatchCondition)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
MatchConditions is a list of conditions that must be met for a request to be validated. Match conditions filter requests that have already been matched by the rules, namespaceSelector, and objectSelector. An empty list of matchConditions matches all requests. There are a maximum of 64 match conditions allowed.
If a parameter object is provided, it can be accessed via the params handle in the same manner as validation expressions.
The exact matching logic is (in order):
If ANY matchCondition evaluates to FALSE, the policy is skipped.
If ALL matchConditions evaluate to TRUE, the policy is evaluated.
If any matchCondition evaluates to an error (but none are FALSE):
If failurePolicy=Fail, reject the request
If failurePolicy=Ignore, the policy is skipped
MatchCondition represents a condition which must by fulfilled for a request to be sent to a webhook.
Expression represents the expression which will be evaluated by CEL. Must evaluate to bool. CEL expressions have access to the contents of the AdmissionRequest and Authorizer, organized into CEL variables:
'object' - The object from the incoming request. The value is null for DELETE requests. 'oldObject' - The existing object. The value is null for CREATE requests. 'request' - Attributes of the admission request(/pkg/apis/admission/types.go#AdmissionRequest). 'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request.
See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the request resource.
Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
Required.
spec.matchConditions.name (string), required
Name is an identifier for this match condition, used for strategic merging of MatchConditions, as well as providing an identifier for logging purposes. A good name should be descriptive of the associated expression. Name must be a qualified name consisting of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Required.
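For illustration, the matchConditions fragment of a policy spec might look like the following sketch (expressions and names are hypothetical):

matchConditions:
- name: exclude-leases               # hypothetical name, used for merging and logging
  expression: '!(request.resource.group == "coordination.k8s.io" && request.resource.resource == "leases")'
- name: exclude-kubelet-requests     # hypothetical name
  expression: '!("system:nodes" in request.userInfo.groups)'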
spec.matchConstraints (MatchResources)
MatchConstraints specifies what resources this policy is designed to validate. The AdmissionPolicy cares about a request if it matches all Constraints. However, in order to prevent clusters from being put into an unstable state that cannot be recovered from via the API, ValidatingAdmissionPolicy cannot match ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding. Required.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchConstraints.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would not be sent to the ValidatingAdmissionPolicy.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the ValidatingAdmissionPolicy.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the policy on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the validation based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the cel validation, and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Default to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.paramKind (ParamKind)
ParamKind specifies the kind of resources used to parameterize this policy. If absent, there are no parameters for this policy and the param CEL variable will not be provided to validation expressions. If ParamKind refers to a non-existent kind, this policy definition is mis-configured and the FailurePolicy is applied. If paramKind is specified but paramRef is unset in ValidatingAdmissionPolicyBinding, the params variable will be null.
ParamKind is a tuple of Group Kind and Version.
spec.paramKind.apiVersion (string)
APIVersion is the API group version the resources belong to. In format of "group/version". Required.
spec.paramKind.kind (string)
Kind is the API kind the resources belong to. Required.
spec.validations ([]Validation)
Atomic: will be replaced during a merge
Validations contain CEL expressions which are used to apply the validation. Validations and AuditAnnotations may not both be empty; at least one of Validations or AuditAnnotations is required.
Validation specifies the CEL expression which is used to apply the validation.
spec.validations.expression (string), required
Expression represents the expression which will be evaluated by CEL. ref: https://github.com/google/cel-spec CEL expressions have access to the contents of the API request/response, organized into CEL variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests. - 'oldObject' - The existing object. The value is null for CREATE requests. - 'request' - Attributes of the API request(ref). - 'params' - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind. - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources. - 'variables' - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named 'foo' can be accessed as 'variables.foo'.
'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request. See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the request resource.
The apiVersion, kind, metadata.name and metadata.generateName are always accessible from the root of the object. No other metadata properties are accessible.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Accessible property names are escaped according to the following rules when accessed in the expression: - '__' escapes to '__underscores__' - '.' escapes to '__dot__' - '-' escapes to '__dash__' - '/' escapes to '__slash__' - Property names that exactly match a CEL RESERVED keyword escape to '__{keyword}__'. The keywords are:
"true", "false", "null", "in", "as", "break", "const", "continue", "else", "for", "function", "if",
"import", "let", "loop", "package", "namespace", "return".
Examples:
Expression accessing a property named "namespace": {"Expression": "object.namespace > 0"}
Expression accessing a property named "x-prop": {"Expression": "object.x__dash__prop > 0"}
Expression accessing a property named "redact__d": {"Expression": "object.redact__underscores__d > 0"}
Equality on arrays with list type of 'set' or 'map' ignores element order, i.e. [1, 2] == [2, 1]. Concatenation on arrays with x-kubernetes-list-type use the semantics of the list type:
'set': X + Y performs a union where the array positions of all elements in X are preserved and non-intersecting elements in Y are appended, retaining their partial order.
'map': X + Y performs a merge where the array positions of all keys in X are preserved but the values are overwritten by values in Y when the key sets of X and Y intersect. Elements in Y with non-intersecting keys are appended, retaining their partial order.
Required.
spec.validations.message (string)
Message represents the message displayed when validation fails. If the Expression contains line breaks, Message is required. The message must not contain line breaks. If unset, the message is "failed Expression: {Expression}", e.g. "must be a URL with the host matching spec.host".
spec.validations.messageExpression (string)
messageExpression declares a CEL expression that evaluates to the validation failure message that is returned when this rule fails. Since messageExpression is used as a failure message, it must evaluate to a string. If both message and messageExpression are present on a validation, then messageExpression will be used if validation fails. If messageExpression results in a runtime error, the runtime error is logged, and the validation failure message is produced as if the messageExpression field were unset. If messageExpression evaluates to an empty string, a string with only spaces, or a string that contains line breaks, then the validation failure message will also be produced as if the messageExpression field were unset, and the fact that messageExpression produced an empty string/string with only spaces/string with line breaks will be logged. messageExpression has access to all the same variables as the expression except for 'authorizer' and 'authorizer.requestResource'. Example: "object.x must be less than max ("+string(params.max)+")"
spec.validations.reason (string)
Reason represents a machine-readable description of why this validation failed. If this is the first validation in the list to fail, this reason, as well as the corresponding HTTP response code, are used in the HTTP response to the client. The currently supported reasons are: "Unauthorized", "Forbidden", "Invalid", "RequestEntityTooLarge". If not set, StatusReasonInvalid is used in the response to the client.
spec.variables ([]Variable)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
Variables contain definitions of variables that can be used in composition of other expressions. Each variable is defined as a named CEL expression. The variables defined here will be available under variables in other expressions of the policy except MatchConditions because MatchConditions are evaluated before the rest of the policy.
The expression of a variable can refer to other variables defined earlier in the list but not those after. Thus, Variables must be sorted by the order of first appearance and acyclic.
Variable is the definition of a variable that is used for composition. A variable is defined as a named expression.
spec.variables.expression (string), required
Expression is the expression that will be evaluated as the value of the variable. The CEL expression has access to the same identifiers as the CEL expressions in Validation.
spec.variables.name (string), required
Name is the name of the variable. The name must be a valid CEL identifier and unique among all variables. The variable can be accessed in other expressions through variables. For example, if name is "foo", the variable will be available as variables.foo.
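Putting the spec fields together, a hypothetical ValidatingAdmissionPolicy that limits Deployment replicas using a parameter CRD could be sketched as follows (the group, kind, and field names of the parameter resource are assumptions):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: replica-limit.example.com    # hypothetical name
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: rules.example.com/v1 # hypothetical parameter CRD
    kind: ReplicaLimit
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  variables:
  - name: replicas
    expression: "object.spec.replicas"
  validations:
  - expression: "variables.replicas <= params.maxReplicas"
    messageExpression: "'spec.replicas must be no greater than ' + string(params.maxReplicas)"
    reason: Invalid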
status (ValidatingAdmissionPolicyStatus)
The status of the ValidatingAdmissionPolicy, including warnings that are useful to determine if the policy behaves in the expected way. Populated by the system. Read-only.
ValidatingAdmissionPolicyStatus represents the status of an admission validation policy.
status.conditions ([]Condition)
Map: unique values on key type will be kept during a merge
The conditions represent the latest available observations of a policy's current state.
Condition contains details for one aspect of the current state of this API Resource.
lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
status.conditions.message (string), required
message is a human readable message indicating details about the transition. This may be an empty string.
status.conditions.reason (string), required
reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
status.conditions.status (string), required
status of the condition, one of True, False, Unknown.
status.conditions.type (string), required
type of condition in CamelCase or in foo.example.com/CamelCase.
status.conditions.observedGeneration (int64)
observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
status.observedGeneration (int64)
The generation observed by the controller.
status.typeChecking (TypeChecking)
The results of type checking for each expression. Presence of this field indicates the completion of the type checking.
TypeChecking contains results of type checking the expressions in the ValidatingAdmissionPolicy
The path to the field that refers to the expression. For example, the reference to the expression of the first item of validations is "spec.validations[0].expression"
The content of type checking information in a human-readable form. Each line of the warning contains the type that the expression is checked against, followed by the type check error from the compiler.
ValidatingAdmissionPolicyList
ValidatingAdmissionPolicyList is a list of ValidatingAdmissionPolicy.
ValidatingAdmissionPolicyBinding binds the ValidatingAdmissionPolicy with parameterized resources. ValidatingAdmissionPolicyBinding and parameter CRDs together define how cluster administrators configure policies for clusters.
For a given admission request, each binding will cause its policy to be evaluated N times, where N is 1 for policies/bindings that don't use params, otherwise N is the number of parameters selected by the binding.
The CEL expressions of a policy must have a computed CEL cost below the maximum CEL budget. Each evaluation of the policy is given an independent CEL cost budget. Adding/removing policies, bindings, or params can not affect whether a given (policy, binding, param) combination is within its own CEL budget.
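As an illustrative companion to the field reference that follows, a binding of the hypothetical policy sketched earlier to a hypothetical parameter object, restricted to namespaces labeled environment: test, might look like:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: replica-limit-test.example.com   # hypothetical name
spec:
  policyName: replica-limit.example.com  # references the hypothetical policy sketched earlier
  validationActions: ["Deny"]
  paramRef:
    name: replica-limit-test              # hypothetical parameter object
    parameterNotFoundAction: Deny
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test                 # hypothetical label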
Specification of the desired behavior of the ValidatingAdmissionPolicyBinding.
ValidatingAdmissionPolicyBindingSpec is the specification of the ValidatingAdmissionPolicyBinding.
spec.matchResources (MatchResources)
MatchResources declares what resources match this binding and will be validated by it. Note that this is intersected with the policy's matchConstraints, so only requests that are matched by the policy can be selected by this. If this is unset, all resources matched by the policy are validated by this binding When resourceRules is unset, it does not constrain resource matching. If a resource is matched by the other fields of this object, it will be validated. Note that this is differs from ValidatingAdmissionPolicy matchConstraints, where resourceRules are required.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '' means all resources, but not subresources. 'pods/' means all subresources of pods. '/scale' means all scale subresources. '/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "" "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchResources.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would not be sent to the ValidatingAdmissionPolicy.
Equivalent: match a request if modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the ValidatingAdmissionPolicy.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the policy on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the validation based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the cel validation, and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Default to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '' means all resources, but not subresources. 'pods/' means all subresources of pods. '/scale' means all scale subresources. '/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
spec.matchResources.resourceRules.scope (string)
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "" "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.paramRef (ParamRef)
paramRef specifies the parameter resource used to configure the admission control policy. It should point to a resource of the type specified in ParamKind of the bound ValidatingAdmissionPolicy. If the policy specifies a ParamKind and the resource referred to by ParamRef does not exist, this binding is considered mis-configured and the FailurePolicy of the ValidatingAdmissionPolicy applied. If the policy does not specify a ParamKind then this field is ignored, and the rules are evaluated without a param.
ParamRef describes how to locate the params to be used as input to expressions of rules applied by a policy binding.
spec.paramRef.name (string)
name is the name of the resource being referenced.
One of name or selector must be set, but name and selector are mutually exclusive properties. If one is set, the other must be unset.
A single parameter used for all admission requests can be configured by setting the name field, leaving selector blank, and setting namespace if paramKind is namespace-scoped.
spec.paramRef.namespace (string)
namespace is the namespace of the referenced resource. Allows limiting the search for params to a specific namespace. Applies to both name and selector fields.
A per-namespace parameter may be used by specifying a namespace-scoped paramKind in the policy and leaving this field empty.
If paramKind is cluster-scoped, this field MUST be unset. Setting this field results in a configuration error.
If paramKind is namespace-scoped, the namespace of the object being evaluated for admission will be used when this field is left unset. Take care that if this is left empty the binding must not match any cluster-scoped resources, which will result in an error.
spec.paramRef.parameterNotFoundAction (string)
parameterNotFoundAction controls the behavior of the binding when the resource exists, and name or selector is valid, but there are no parameters matched by the binding. If the value is set to Allow, then no matched parameters will be treated as successful validation by the binding. If set to Deny, then no matched parameters will be subject to the failurePolicy of the policy.
selector can be used to match multiple param objects based on their labels. Supply selector: {} to match all resources of the ParamKind.
If multiple params are found, they are all evaluated with the policy expressions and the results are ANDed together.
One of name or selector must be set, but name and selector are mutually exclusive properties. If one is set, the other must be unset.
spec.policyName (string)
PolicyName references a ValidatingAdmissionPolicy name which the ValidatingAdmissionPolicyBinding binds to. If the referenced resource does not exist, this binding is considered invalid and will be ignored Required.
spec.validationActions ([]string)
Set: unique values will be kept during a merge
validationActions declares how Validations of the referenced ValidatingAdmissionPolicy are enforced. If a validation evaluates to false it is always enforced according to these actions.
Failures defined by the ValidatingAdmissionPolicy's FailurePolicy are enforced according to these actions only if the FailurePolicy is set to Fail, otherwise the failures are ignored. This includes compilation errors, runtime errors and misconfigurations of the policy.
validationActions is declared as a set of action values. Order does not matter. validationActions may not contain duplicates of the same action.
The supported actions values are:
"Deny" specifies that a validation failure results in a denied request.
"Warn" specifies that a validation failure is reported to the request client in HTTP Warning headers, with a warning code of 299. Warnings can be sent both for allowed or denied admission responses.
"Audit" specifies that a validation failure is included in the published audit event for the request. The audit event will contain a validation.policy.admission.k8s.io/validation_failure audit annotation with a value containing the details of the validation failures, formatted as a JSON list of objects, each with the following fields: - message: The validation failure message string - policy: The resource name of the ValidatingAdmissionPolicy - binding: The resource name of the ValidatingAdmissionPolicyBinding - expressionIndex: The index of the failed validations in the ValidatingAdmissionPolicy - validationActions: The enforcement actions enacted for the validation failure Example audit annotation: "validation.policy.admission.k8s.io/validation_failure": "[{\"message\": \"Invalid value\", {\"policy\": \"policy.example.com\", {\"binding\": \"policybinding.example.com\", {\"expressionIndex\": \"1\", {\"validationActions\": [\"Audit\"]}]"
Clients should expect to handle additional values by ignoring any values not recognized.
"Deny" and "Warn" may not be used together since this combination needlessly duplicates the validation failure both in the API response body and the HTTP warning headers.
Required.
Operations
get read the specified ValidatingAdmissionPolicy
HTTP Request
GET /apis/admissionregistration.k8s.io/v1/validatingadmissionpolicies/{name}
ValidatingAdmissionPolicyBinding binds the ValidatingAdmissionPolicy with parameterized resources.
apiVersion: admissionregistration.k8s.io/v1
import "k8s.io/api/admissionregistration/v1"
ValidatingAdmissionPolicyBinding
ValidatingAdmissionPolicyBinding binds the ValidatingAdmissionPolicy with parameterized resources. ValidatingAdmissionPolicyBinding and parameter CRDs together define how cluster administrators configure policies for clusters.
For a given admission request, each binding will cause its policy to be evaluated N times, where N is 1 for policies/bindings that don't use params, otherwise N is the number of parameters selected by the binding.
The CEL expressions of a policy must have a computed CEL cost below the maximum CEL budget. Each evaluation of the policy is given an independent CEL cost budget. Adding/removing policies, bindings, or params can not affect whether a given (policy, binding, param) combination is within its own CEL budget.
Specification of the desired behavior of the ValidatingAdmissionPolicyBinding.
ValidatingAdmissionPolicyBindingSpec is the specification of the ValidatingAdmissionPolicyBinding.
spec.matchResources (MatchResources)
MatchResources declares what resources match this binding and will be validated by it. Note that this is intersected with the policy's matchConstraints, so only requests that are matched by the policy can be selected by this. If this is unset, all resources matched by the policy are validated by this binding. When resourceRules is unset, it does not constrain resource matching. If a resource is matched by the other fields of this object, it will be validated. Note that this differs from ValidatingAdmissionPolicy matchConstraints, where resourceRules are required.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchResources.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would not be sent to the ValidatingAdmissionPolicy.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the ValidatingAdmissionPolicy.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the policy on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the validation based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the cel validation, and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Default to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
spec.matchResources.resourceRules.scope (string)
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.paramRef (ParamRef)
paramRef specifies the parameter resource used to configure the admission control policy. It should point to a resource of the type specified in ParamKind of the bound ValidatingAdmissionPolicy. If the policy specifies a ParamKind and the resource referred to by ParamRef does not exist, this binding is considered mis-configured and the FailurePolicy of the ValidatingAdmissionPolicy is applied. If the policy does not specify a ParamKind then this field is ignored, and the rules are evaluated without a param.
ParamRef describes how to locate the params to be used as input to expressions of rules applied by a policy binding.
spec.paramRef.name (string)
name is the name of the resource being referenced.
One of name or selector must be set, but name and selector are mutually exclusive properties. If one is set, the other must be unset.
A single parameter used for all admission requests can be configured by setting the name field, leaving selector blank, and setting namespace if paramKind is namespace-scoped.
spec.paramRef.namespace (string)
namespace is the namespace of the referenced resource. Allows limiting the search for params to a specific namespace. Applies to both name and selector fields.
A per-namespace parameter may be used by specifying a namespace-scoped paramKind in the policy and leaving this field empty.
If paramKind is cluster-scoped, this field MUST be unset. Setting this field results in a configuration error.
If paramKind is namespace-scoped, the namespace of the object being evaluated for admission will be used when this field is left unset. Take care that if this is left empty the binding must not match any cluster-scoped resources, which will result in an error.
spec.paramRef.parameterNotFoundAction (string)
parameterNotFoundAction controls the behavior of the binding when the resource exists, and name or selector is valid, but there are no parameters matched by the binding. If the value is set to Allow, the absence of matched parameters is treated as a successful validation by the binding. If set to Deny, the absence of matched parameters is subject to the failurePolicy of the policy.
selector can be used to match multiple param objects based on their labels. Supply selector: {} to match all resources of the ParamKind.
If multiple params are found, they are all evaluated with the policy expressions and the results are ANDed together.
One of name or selector must be set, but name and selector are mutually exclusive properties. If one is set, the other must be unset.
spec.policyName (string)
PolicyName references a ValidatingAdmissionPolicy name which the ValidatingAdmissionPolicyBinding binds to. If the referenced resource does not exist, this binding is considered invalid and will be ignored. Required.
spec.validationActions ([]string)
Set: unique values will be kept during a merge
validationActions declares how Validations of the referenced ValidatingAdmissionPolicy are enforced. If a validation evaluates to false it is always enforced according to these actions.
Failures defined by the ValidatingAdmissionPolicy's FailurePolicy are enforced according to these actions only if the FailurePolicy is set to Fail, otherwise the failures are ignored. This includes compilation errors, runtime errors and misconfigurations of the policy.
validationActions is declared as a set of action values. Order does not matter. validationActions may not contain duplicates of the same action.
The supported action values are:
"Deny" specifies that a validation failure results in a denied request.
"Warn" specifies that a validation failure is reported to the request client in HTTP Warning headers, with a warning code of 299. Warnings can be sent both for allowed or denied admission responses.
"Audit" specifies that a validation failure is included in the published audit event for the request. The audit event will contain a validation.policy.admission.k8s.io/validation_failure audit annotation with a value containing the details of the validation failures, formatted as a JSON list of objects, each with the following fields: - message: The validation failure message string - policy: The resource name of the ValidatingAdmissionPolicy - binding: The resource name of the ValidatingAdmissionPolicyBinding - expressionIndex: The index of the failed validations in the ValidatingAdmissionPolicy - validationActions: The enforcement actions enacted for the validation failure Example audit annotation: "validation.policy.admission.k8s.io/validation_failure": "[{\"message\": \"Invalid value\", {\"policy\": \"policy.example.com\", {\"binding\": \"policybinding.example.com\", {\"expressionIndex\": \"1\", {\"validationActions\": [\"Audit\"]}]"
Clients should expect to handle additional values by ignoring any values not recognized.
"Deny" and "Warn" may not be used together since this combination needlessly duplicates the validation failure both in the API response body and the HTTP warning headers.
Required.
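To make the binding fields above concrete, here is a minimal sketch of a ValidatingAdmissionPolicyBinding manifest. All names (the policy, the param object, the label values) are hypothetical and only illustrate how policyName, paramRef, matchResources, and validationActions fit together:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: demo-binding.example.com
spec:
  policyName: demo-policy.example.com
  validationActions: ["Deny", "Audit"]   # "Deny" and "Warn" may not be combined
  paramRef:
    name: demo-params                    # name and selector are mutually exclusive
    namespace: default                   # only valid when paramKind is namespace-scoped
    parameterNotFoundAction: Deny
  matchResources:
    namespaceSelector:
      matchExpressions:
      - key: environment
        operator: In
        values: ["prod", "staging"]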
ValidatingAdmissionPolicy
ValidatingAdmissionPolicy describes the definition of an admission validation policy that accepts or rejects an object without changing it.
Specification of the desired behavior of the ValidatingAdmissionPolicy.
ValidatingAdmissionPolicySpec is the specification of the desired behavior of the AdmissionPolicy.
spec.auditAnnotations ([]AuditAnnotation)
Atomic: will be replaced during a merge
auditAnnotations contains CEL expressions which are used to produce audit annotations for the audit event of the API request. validations and auditAnnotations may not both be empty; at least one of validations or auditAnnotations is required.
AuditAnnotation describes how to produce an audit annotation for an API request.
spec.auditAnnotations.key (string), required
key specifies the audit annotation key. The audit annotation keys of a ValidatingAdmissionPolicy must be unique. The key must be a qualified name ([A-Za-z0-9][-A-Za-z0-9_.]*) no more than 63 bytes in length.
The key is combined with the resource name of the ValidatingAdmissionPolicy to construct an audit annotation key: "{ValidatingAdmissionPolicy name}/{key}".
If an admission webhook uses the same resource name as this ValidatingAdmissionPolicy and the same audit annotation key, the annotation key will be identical. In this case, the first annotation written with the key will be included in the audit event and all subsequent annotations with the same key will be discarded.
valueExpression represents the expression which is evaluated by CEL to produce an audit annotation value. The expression must evaluate to either a string or null value. If the expression evaluates to a string, the audit annotation is included with the string value. If the expression evaluates to null or empty string the audit annotation will be omitted. The valueExpression may be no longer than 5kb in length. If the result of the valueExpression is more than 10kb in length, it will be truncated to 10kb.
If multiple ValidatingAdmissionPolicyBinding resources match an API request, then the valueExpression will be evaluated for each binding. All unique values produced by the valueExpressions will be joined together in a comma-separated list.
Required.
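As a hedged sketch of an auditAnnotations entry, the key "high-replica-count" and the replica threshold below are illustrative only. The published annotation key would be "{policy name}/high-replica-count":
spec:
  auditAnnotations:
  - key: high-replica-count
    # Must evaluate to a string or null; null causes the annotation to be omitted.
    valueExpression: "object.spec.replicas > 50 ? 'Deployment spec.replicas set to ' + string(object.spec.replicas) : null"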
spec.failurePolicy (string)
failurePolicy defines how to handle failures for the admission policy. Failures can occur from CEL expression parse errors, type check errors, runtime errors and invalid or mis-configured policy definitions or bindings.
A policy is invalid if spec.paramKind refers to a non-existent Kind. A binding is invalid if spec.paramRef.name refers to a non-existent resource.
failurePolicy does not define how validations that evaluate to false are handled.
When failurePolicy is set to Fail, ValidatingAdmissionPolicyBinding validationActions define how failures are enforced.
Allowed values are Ignore or Fail. Defaults to Fail.
spec.matchConditions ([]MatchCondition)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
MatchConditions is a list of conditions that must be met for a request to be validated. Match conditions filter requests that have already been matched by the rules, namespaceSelector, and objectSelector. An empty list of matchConditions matches all requests. There are a maximum of 64 match conditions allowed.
If a parameter object is provided, it can be accessed via the params handle in the same manner as validation expressions.
The exact matching logic is (in order):
If ANY matchCondition evaluates to FALSE, the policy is skipped.
If ALL matchConditions evaluate to TRUE, the policy is evaluated.
If any matchCondition evaluates to an error (but none are FALSE):
If failurePolicy=Fail, reject the request
If failurePolicy=Ignore, the policy is skipped
MatchCondition represents a condition which must be fulfilled for a request to be sent to a webhook.
Expression represents the expression which will be evaluated by CEL. Must evaluate to bool. CEL expressions have access to the contents of the AdmissionRequest and Authorizer, organized into CEL variables:
'object' - The object from the incoming request. The value is null for DELETE requests. 'oldObject' - The existing object. The value is null for CREATE requests. 'request' - Attributes of the admission request(/pkg/apis/admission/types.go#AdmissionRequest). 'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request.
See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
Required.
spec.matchConditions.name (string), required
Name is an identifier for this match condition, used for strategic merging of MatchConditions, as well as providing an identifier for logging purposes. A good name should be descriptive of the associated expression. Name must be a qualified name consisting of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Required.
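For illustration, matchConditions entries pair a name with a boolean CEL expression. The condition names and the group checks below are hypothetical, not prescribed values:
spec:
  matchConditions:
  - name: exclude-kubelet-requests
    expression: '!("system:nodes" in request.userInfo.groups)'
  - name: rbac-only
    expression: 'request.resource.group == "rbac.authorization.k8s.io"'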
spec.matchConstraints (MatchResources)
MatchConstraints specifies what resources this policy is designed to validate. The AdmissionPolicy cares about a request if it matches all Constraints. However, in order to prevent clusters from being put into an unstable state that cannot be recovered from via the API, ValidatingAdmissionPolicy cannot match ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding. Required.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchConstraints.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would not be sent to the ValidatingAdmissionPolicy.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the ValidatingAdmissionPolicy.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the policy on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the validation based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the cel validation, and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Default to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the ValidatingAdmissionPolicy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
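A minimal matchConstraints sketch follows; the API groups, resources, and the excluded subresource are illustrative only. It selects CREATE and UPDATE of Deployments while ignoring the scale subresource:
spec:
  matchConstraints:
    matchPolicy: Equivalent
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
      scope:       "Namespaced"
    excludeResourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["*"]
      resources:   ["deployments/scale"]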
spec.paramKind (ParamKind)
ParamKind specifies the kind of resources used to parameterize this policy. If absent, there are no parameters for this policy and the param CEL variable will not be provided to validation expressions. If ParamKind refers to a non-existent kind, this policy definition is mis-configured and the FailurePolicy is applied. If paramKind is specified but paramRef is unset in ValidatingAdmissionPolicyBinding, the params variable will be null.
ParamKind is a tuple of Group Kind and Version.
spec.paramKind.apiVersion (string)
APIVersion is the API group version the resources belong to. In format of "group/version". Required.
spec.paramKind.kind (string)
Kind is the API kind the resources belong to. Required.
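For example, with a hypothetical parameter CRD (the group, version, and kind below are made up), paramKind names the parameter type; a ValidatingAdmissionPolicyBinding's paramRef then selects concrete objects of that kind:
spec:
  paramKind:
    apiVersion: rules.example.com/v1   # "group/version" format
    kind: ReplicaLimit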
spec.validations ([]Validation)
Atomic: will be replaced during a merge
Validations contain CEL expressions which are used to apply the validation. validations and auditAnnotations may not both be empty; at least one of validations or auditAnnotations is required.
Validation specifies the CEL expression which is used to apply the validation.
spec.validations.expression (string), required
Expression represents the expression which will be evaluated by CEL. ref: https://github.com/google/cel-spec CEL expressions have access to the contents of the API request/response, organized into CEL variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests. - 'oldObject' - The existing object. The value is null for CREATE requests. - 'request' - Attributes of the API request(ref). - 'params' - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind. - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources. - 'variables' - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named 'foo' can be accessed as 'variables.foo'.
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
The apiVersion, kind, metadata.name and metadata.generateName are always accessible from the root of the object. No other metadata properties are accessible.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Accessible property names are escaped according to the following rules when accessed in the expression: - '__' escapes to '__underscores__' - '.' escapes to '__dot__' - '-' escapes to '__dash__' - '/' escapes to '__slash__' - Property names that exactly match a CEL RESERVED keyword escape to '__{keyword}__'. The keywords are:
"true", "false", "null", "in", "as", "break", "const", "continue", "else", "for", "function", "if",
"import", "let", "loop", "package", "namespace", "return".
Examples:
Expression accessing a property named "namespace": {"Expression": "object.namespace > 0"}
Expression accessing a property named "x-prop": {"Expression": "object.x__dash__prop > 0"}
Expression accessing a property named "redact__d": {"Expression": "object.redact__underscores__d > 0"}
Equality on arrays with list type of 'set' or 'map' ignores element order, i.e. [1, 2] == [2, 1]. Concatenation on arrays with x-kubernetes-list-type use the semantics of the list type:
'set': X + Y performs a union where the array positions of all elements in X are preserved and
non-intersecting elements in Y are appended, retaining their partial order.
'map': X + Y performs a merge where the array positions of all keys in X are preserved but the values
are overwritten by values in Y when the key sets of X and Y intersect. Elements in Y with
non-intersecting keys are appended, retaining their partial order.
Required.
spec.validations.message (string)
Message represents the message displayed when validation fails. The message is required if the Expression contains line breaks, and the message itself must not contain line breaks. If unset, the message is "failed Expression: {Expression}". e.g. "must be a URL with the host matching spec.host"
spec.validations.messageExpression (string)
messageExpression declares a CEL expression that evaluates to the validation failure message that is returned when this rule fails. Since messageExpression is used as a failure message, it must evaluate to a string. If both message and messageExpression are present on a validation, then messageExpression will be used if validation fails. If messageExpression results in a runtime error, the runtime error is logged, and the validation failure message is produced as if the messageExpression field were unset. If messageExpression evaluates to an empty string, a string with only spaces, or a string that contains line breaks, then the validation failure message will also be produced as if the messageExpression field were unset, and the fact that messageExpression produced an empty string/string with only spaces/string with line breaks will be logged. messageExpression has access to all the same variables as the expression except for 'authorizer' and 'authorizer.requestResource'. Example: "object.x must be less than max ("+string(params.max)+")"
spec.validations.reason (string)
Reason represents a machine-readable description of why this validation failed. If this is the first validation in the list to fail, this reason, as well as the corresponding HTTP response code, are used in the HTTP response to the client. The currently supported reasons are: "Unauthorized", "Forbidden", "Invalid", "RequestEntityTooLarge". If not set, StatusReasonInvalid is used in the response to the client.
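A hedged sketch of a validations entry follows; the replica limit and both messages are illustrative. expression must evaluate to bool, and reason maps to the HTTP response code returned on failure:
spec:
  validations:
  - expression: "object.spec.replicas <= 5"
    message: "replicas must be no more than 5"
    messageExpression: "'replicas is ' + string(object.spec.replicas) + ', must be no more than 5'"
    reason: Invalid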
spec.variables ([]Variable)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
Variables contain definitions of variables that can be used in composition of other expressions. Each variable is defined as a named CEL expression. The variables defined here will be available under variables in other expressions of the policy except MatchConditions because MatchConditions are evaluated before the rest of the policy.
The expression of a variable can refer to other variables defined earlier in the list but not those after. Thus, Variables must be sorted by the order of first appearance and acyclic.
Variable is the definition of a variable that is used for composition. A variable is defined as a named expression.
spec.variables.expression (string), required
Expression is the expression that will be evaluated as the value of the variable. The CEL expression has access to the same identifiers as the CEL expressions in Validation.
spec.variables.name (string), required
Name is the name of the variable. The name must be a valid CEL identifier and unique among all variables. The variable can be accessed in other expressions through variables. For example, if name is "foo", the variable will be available as variables.foo.
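For instance, assuming a policy whose matchConstraints select Deployment-shaped objects (the variable name and the "sidecar" check are illustrative), a variable defined under spec.variables can be reused from a validation expression through the variables map:
spec:
  variables:
  - name: containerNames
    # Lazily evaluated list of container names in the Deployment's pod template.
    expression: "object.spec.template.spec.containers.map(c, c.name)"
  validations:
  - expression: "!('sidecar' in variables.containerNames)"
    message: "a container named 'sidecar' is not allowed"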
status (ValidatingAdmissionPolicyStatus)
The status of the ValidatingAdmissionPolicy, including warnings that are useful to determine if the policy behaves in the expected way. Populated by the system. Read-only.
ValidatingAdmissionPolicyStatus represents the status of an admission validation policy.
status.conditions ([]Condition)
Map: unique values on key type will be kept during a merge
The conditions represent the latest available observations of a policy's current state.
Condition contains details for one aspect of the current state of this API Resource.
lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
status.conditions.message (string), required
message is a human readable message indicating details about the transition. This may be an empty string.
status.conditions.reason (string), required
reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
status.conditions.status (string), required
status of the condition, one of True, False, Unknown.
status.conditions.type (string), required
type of condition in CamelCase or in foo.example.com/CamelCase.
status.conditions.observedGeneration (int64)
observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
status.observedGeneration (int64)
The generation observed by the controller.
status.typeChecking (TypeChecking)
The results of type checking for each expression. Presence of this field indicates the completion of the type checking.
TypeChecking contains results of type checking the expressions in the ValidatingAdmissionPolicy
The path to the field that refers to the expression. For example, the reference to the expression of the first item of validations is "spec.validations[0].expression"
The content of type checking information in a human-readable form. Each line of the warning contains the type that the expression is checked against, followed by the type check error from the compiler.
ValidatingAdmissionPolicyBindingList
ValidatingAdmissionPolicyBindingList is a list of ValidatingAdmissionPolicyBinding.
Specification of the desired behavior of the MutatingAdmissionPolicy.
MutatingAdmissionPolicySpec is the specification of the desired behavior of the admission policy.
spec.failurePolicy (string)
failurePolicy defines how to handle failures for the admission policy. Failures can occur from CEL expression parse errors, type check errors, runtime errors and invalid or mis-configured policy definitions or bindings.
A policy is invalid if paramKind refers to a non-existent Kind. A binding is invalid if paramRef.name refers to a non-existent resource.
failurePolicy does not define how validations that evaluate to false are handled.
Allowed values are Ignore or Fail. Defaults to Fail.
spec.matchConditions ([]MatchCondition)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
matchConditions is a list of conditions that must be met for a request to be validated. Match conditions filter requests that have already been matched by the matchConstraints. An empty list of matchConditions matches all requests. There are a maximum of 64 match conditions allowed.
If a parameter object is provided, it can be accessed via the params handle in the same manner as validation expressions.
The exact matching logic is (in order):
If ANY matchCondition evaluates to FALSE, the policy is skipped.
If ALL matchConditions evaluate to TRUE, the policy is evaluated.
If any matchCondition evaluates to an error (but none are FALSE):
If failurePolicy=Fail, reject the request
If failurePolicy=Ignore, the policy is skipped
Expression represents the expression which will be evaluated by CEL. Must evaluate to bool. CEL expressions have access to the contents of the AdmissionRequest and Authorizer, organized into CEL variables:
'object' - The object from the incoming request. The value is null for DELETE requests. 'oldObject' - The existing object. The value is null for CREATE requests. 'request' - Attributes of the admission request(/pkg/apis/admission/types.go#AdmissionRequest). 'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request.
See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
Required.
spec.matchConditions.name (string), required
Name is an identifier for this match condition, used for strategic merging of MatchConditions, as well as providing an identifier for logging purposes. A good name should be descriptive of the associated expression. Name must be a qualified name consisting of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Required.
spec.matchConstraints (MatchResources)
matchConstraints specifies what resources this policy is designed to validate. The MutatingAdmissionPolicy cares about a request if it matches all Constraints. However, in order to prevent clusters from being put into an unstable state that cannot be recovered from via the API, MutatingAdmissionPolicy cannot match MutatingAdmissionPolicy and MutatingAdmissionPolicyBinding. The CREATE, UPDATE and CONNECT operations are allowed. The DELETE operation may not be matched. '*' matches CREATE, UPDATE and CONNECT. Required.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the policy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchConstraints.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does not consider requests to apps/v1beta1 or extensions/v1beta1 API groups.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does consider requests made to apps/v1beta1 or extensions/v1beta1 API groups. The API server translates the request to a matched resource API if necessary.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the policy on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the policy based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the policy's expression (CEL), and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Default to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the admission policy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.mutations ([]Mutation)
Atomic: will be replaced during a merge
mutations contain operations to perform on matching objects. mutations may not be empty; a minimum of one mutation is required. mutations are evaluated in order, and are reinvoked according to the reinvocationPolicy. The mutations of a policy are invoked for each binding of this policy and reinvocation of mutations occurs on a per binding basis.
Mutation specifies the CEL expression which is used to apply the Mutation.
spec.mutations.patchType (string), required
patchType indicates the patch strategy used. Allowed values are "ApplyConfiguration" and "JSONPatch". Required.
applyConfiguration defines the desired configuration values of an object. The configuration is applied to the admission object using structured merge diff. A CEL expression is used to create apply configuration.
ApplyConfiguration defines the desired configuration values of an object.
Apply configurations are declared in CEL using object initialization. For example, this CEL expression returns an apply configuration to set a single field:
Object{
  spec: Object.spec{
    serviceAccountName: "example"
  }
}
Apply configurations may not modify atomic structs, maps or arrays due to the risk of accidental deletion of values not included in the apply configuration.
CEL expressions have access to the object types needed to create apply configurations:
'Object' - CEL type of the resource object. - 'Object.<fieldName>' - CEL type of object field (such as 'Object.spec') - 'Object.<fieldName1>.<fieldName2>...<fieldNameN>' - CEL type of nested field (such as 'Object.spec.containers')
CEL expressions have access to the contents of the API request, organized into CEL variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests. - 'oldObject' - The existing object. The value is null for CREATE requests. - 'request' - Attributes of the API request(ref). - 'params' - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind. - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources. - 'variables' - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named 'foo' can be accessed as 'variables.foo'.
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
The apiVersion, kind, metadata.name and metadata.generateName are always accessible from the root of the object. No other metadata properties are accessible.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Required.
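As an illustrative sketch (the init container name and image are hypothetical, and the policy is assumed to match Pod CREATE requests), an ApplyConfiguration mutation wraps a CEL object initializer:
spec:
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      # Adds an init container via structured merge diff; existing list entries
      # and all fields not mentioned in the apply configuration are left untouched.
      expression: >-
        Object{
          spec: Object.spec{
            initContainers: [
              Object.spec.initContainers{
                name: "mesh-proxy",
                image: "mesh-proxy:v1.0",
                restartPolicy: "Always"
              }
            ]
          }
        }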
spec.mutations.jsonPatch (JSONPatch)
jsonPatch defines a JSON patch operation to perform a mutation to the object. A CEL expression is used to create the JSON patch.
CEL expressions have access to the types needed to create JSON patches and objects:
'JSONPatch' - CEL type of JSON Patch operations. JSONPatch has the fields 'op', 'from', 'path' and 'value'.
See JSON patch for more details. The 'value' field may be set to any of: string,
integer, array, map or object. If set, the 'path' and 'from' fields must be set to a
JSON pointer string, where the 'jsonpatch.escapeKey()' CEL
function may be used to escape path keys containing '/' and '~'.
'Object' - CEL type of the resource object. - 'Object.<fieldName>' - CEL type of object field (such as 'Object.spec') - 'Object.<fieldName1>.<fieldName2>...<fieldNameN>' - CEL type of nested field (such as 'Object.spec.containers')
CEL expressions have access to the contents of the API request, organized into CEL variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests. - 'oldObject' - The existing object. The value is null for CREATE requests. - 'request' - Attributes of the API request. - 'params' - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind. - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources. - 'variables' - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named 'foo' can be accessed as 'variables.foo'.
'jsonpatch.escapeKey' - Performs JSONPatch key escaping. '~' and '/' are escaped as '~0' and '~1' respectively.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Required.
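As an illustrative sketch (the label key is hypothetical, and the expression assumes metadata.labels already exists on the object), a jsonPatch expression that adds a label whose key contains '/', using jsonpatch.escapeKey to build a valid JSON pointer:
  [
    JSONPatch{
      op: "add",
      path: "/metadata/labels/" + jsonpatch.escapeKey("example.com/injected"),
      value: "true"
    }
  ]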
spec.paramKind (ParamKind)
paramKind specifies the kind of resources used to parameterize this policy. If absent, there are no parameters for this policy and the param CEL variable will not be provided to validation expressions. If paramKind refers to a non-existent kind, this policy definition is mis-configured and the FailurePolicy is applied. If paramKind is specified but paramRef is unset in MutatingAdmissionPolicyBinding, the params variable will be null.
ParamKind is a tuple of Group Kind and Version.
spec.paramKind.apiVersion (string)
APIVersion is the API group version the resources belong to. In format of "group/version". Required.
spec.paramKind.kind (string)
Kind is the API kind the resources belong to. Required.
spec.reinvocationPolicy (string)
reinvocationPolicy indicates whether mutations may be called multiple times per MutatingAdmissionPolicyBinding as part of a single admission evaluation. Allowed values are "Never" and "IfNeeded".
Never: These mutations will not be called more than once per binding in a single admission evaluation.
IfNeeded: These mutations may be invoked more than once per binding for a single admission request and there is no guarantee of order with respect to other admission plugins, admission webhooks, bindings of this policy and admission policies. Mutations are only reinvoked when mutations change the object after this mutation is invoked. Required.
spec.variables ([]Variable)
Atomic: will be replaced during a merge
variables contain definitions of variables that can be used in composition of other expressions. Each variable is defined as a named CEL expression. The variables defined here will be available under variables in other expressions of the policy except matchConditions because matchConditions are evaluated before the rest of the policy.
The expression of a variable can refer to other variables defined earlier in the list but not those after. Thus, variables must be sorted by the order of first appearance and acyclic.
Variable is the definition of a variable that is used for composition.
spec.variables.expression (string), required
Expression is the expression that will be evaluated as the value of the variable. The CEL expression has access to the same identifiers as the CEL expressions in Validation.
spec.variables.name (string), required
Name is the name of the variable. The name must be a valid CEL identifier and unique among all variables. The variable can be accessed in other expressions through variables. For example, if name is "foo", the variable will be available as variables.foo.
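Putting the fields above together, here is a minimal, illustrative MutatingAdmissionPolicy manifest (the names and label value are hypothetical, and the API group version shown may differ depending on the cluster's Kubernetes release):
  apiVersion: admissionregistration.k8s.io/v1alpha1
  kind: MutatingAdmissionPolicy
  metadata:
    name: add-audit-label.example.com
  spec:
    failurePolicy: Fail
    reinvocationPolicy: IfNeeded
    matchConstraints:
      resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: >-
          Object{
            metadata: Object.metadata{
              labels: {"mutated-by": "add-audit-label"}
            }
          }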
MutatingAdmissionPolicyBinding
MutatingAdmissionPolicyBinding binds the MutatingAdmissionPolicy with parametrized resources. MutatingAdmissionPolicyBinding and the optional parameter resource together define how cluster administrators configure policies for clusters.
For a given admission request, each binding will cause its policy to be evaluated N times, where N is 1 for policies/bindings that don't use params, otherwise N is the number of parameters selected by the binding. Each evaluation is constrained by a runtime cost budget.
Adding/removing policies, bindings, or params can not affect whether a given (policy, binding, param) combination is within its own CEL budget.
Specification of the desired behavior of the MutatingAdmissionPolicyBinding.
MutatingAdmissionPolicyBindingSpec is the specification of the MutatingAdmissionPolicyBinding.
spec.matchResources (MatchResources)
matchResources limits what resources match this binding and may be mutated by it. Note that if matchResources matches a resource, the resource must also match a policy's matchConstraints and matchConditions before the resource may be mutated. When matchResources is unset, it does not constrain resource matching, and only the policy's matchConstraints and matchConditions must match for the resource to be mutated. Additionally, matchResources.resourceRules are optional and do not constrain matching when unset. Note that this differs from MutatingAdmissionPolicy matchConstraints, where resourceRules are required. The CREATE, UPDATE and CONNECT operations are allowed. The DELETE operation may not be matched. '*' matches CREATE, UPDATE and CONNECT.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the policy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchResources.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does not consider requests to apps/v1beta1 or extensions/v1beta1 API groups.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does consider requests made to apps/v1beta1 or extensions/v1beta1 API groups. The API server translates the request to a matched resource API if necessary.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the policy on any objects whose namespace is not associated with a "runlevel" of "0" or "1", you would set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to run the policy only on objects whose namespace is associated with an "environment" of "prod" or "staging", you would set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the policy based on whether the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the policy's expression (CEL), and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the policy is opt-in, because end users may skip the policy by setting the labels. Defaults to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the admission policy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
spec.matchResources.resourceRules.scope (string)
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.paramRef (ParamRef)
paramRef specifies the parameter resource used to configure the admission control policy. It should point to a resource of the type specified in spec.ParamKind of the bound MutatingAdmissionPolicy. If the policy specifies a ParamKind and the resource referred to by ParamRef does not exist, this binding is considered mis-configured and the FailurePolicy of the MutatingAdmissionPolicy is applied. If the policy does not specify a ParamKind then this field is ignored, and the rules are evaluated without a param.
ParamRef describes how to locate the params to be used as input to expressions of rules applied by a policy binding.
spec.paramRef.name (string)
name is the name of the resource being referenced.
name and selector are mutually exclusive properties. If one is set, the other must be unset.
spec.paramRef.namespace (string)
namespace is the namespace of the referenced resource. Allows limiting the search for params to a specific namespace. Applies to both name and selector fields.
A per-namespace parameter may be used by specifying a namespace-scoped paramKind in the policy and leaving this field empty.
If paramKind is cluster-scoped, this field MUST be unset. Setting this field results in a configuration error.
If paramKind is namespace-scoped, the namespace of the object being evaluated for admission will be used when this field is left unset. Take care that if this is left empty the binding must not match any cluster-scoped resources, which will result in an error.
spec.paramRef.parameterNotFoundAction (string)
parameterNotFoundAction controls the behavior of the binding when the resource exists, and name or selector is valid, but there are no parameters matched by the binding. If set to Allow, the absence of matched parameters is treated as a successful validation by the binding. If set to Deny, the absence of matched parameters is subject to the failurePolicy of the policy.
selector can be used to match multiple param objects based on their labels. Supply selector: {} to match all resources of the ParamKind.
If multiple params are found, they are all evaluated with the policy expressions and the results are ANDed together.
One of name or selector must be set, but name and selector are mutually exclusive properties. If one is set, the other must be unset.
spec.policyName (string)
policyName references a MutatingAdmissionPolicy name which the MutatingAdmissionPolicyBinding binds to. If the referenced resource does not exist, this binding is considered invalid and will be ignored. Required.
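For illustration, a MutatingAdmissionPolicyBinding that binds the hypothetical policy sketched earlier and selects namespaces labeled environment=prod or environment=staging (all names are placeholders; the API group version may differ by release):
  apiVersion: admissionregistration.k8s.io/v1alpha1
  kind: MutatingAdmissionPolicyBinding
  metadata:
    name: add-audit-label-binding.example.com
  spec:
    policyName: add-audit-label.example.com
    matchResources:
      namespaceSelector:
        matchExpressions:
        - key: environment
          operator: In
          values: ["prod", "staging"]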
MutatingAdmissionPolicyList
MutatingAdmissionPolicyList is a list of MutatingAdmissionPolicy.
MutatingAdmissionPolicyBinding binds the MutatingAdmissionPolicy with parametrized resources. MutatingAdmissionPolicyBinding and the optional parameter resource together define how cluster administrators configure policies for clusters.
For a given admission request, each binding will cause its policy to be evaluated N times, where N is 1 for policies/bindings that don't use params, otherwise N is the number of parameters selected by the binding. Each evaluation is constrained by a runtime cost budget.
Adding/removing policies, bindings, or params can not affect whether a given (policy, binding, param) combination is within its own CEL budget.
Specification of the desired behavior of the MutatingAdmissionPolicyBinding.
MutatingAdmissionPolicyBindingSpec is the specification of the MutatingAdmissionPolicyBinding.
spec.matchResources (MatchResources)
matchResources limits what resources match this binding and may be mutated by it. Note that if matchResources matches a resource, the resource must also match a policy's matchConstraints and matchConditions before the resource may be mutated. When matchResources is unset, it does not constrain resource matching, and only the policy's matchConstraints and matchConditions must match for the resource to be mutated. Additionally, matchResources.resourceRules are optional and do not constrain matching when unset. Note that this differs from MutatingAdmissionPolicy matchConstraints, where resourceRules are required. The CREATE, UPDATE and CONNECT operations are allowed. The DELETE operation may not be matched. '*' matches CREATE, UPDATE and CONNECT.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the policy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchResources.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does not consider requests to apps/v1beta1 or extensions/v1beta1 API groups.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does consider requests made to apps/v1beta1 or extensions/v1beta1 API groups. The API server translates the request to a matched resource API if necessary.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the policy on any objects whose namespace is not associated with a "runlevel" of "0" or "1", you would set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to run the policy only on objects whose namespace is associated with an "environment" of "prod" or "staging", you would set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the policy based on whether the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the policy's expression (CEL), and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the policy is opt-in, because end users may skip the policy by setting the labels. Defaults to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the admission policy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
spec.matchResources.resourceRules.scope (string)
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.paramRef (ParamRef)
paramRef specifies the parameter resource used to configure the admission control policy. It should point to a resource of the type specified in spec.ParamKind of the bound MutatingAdmissionPolicy. If the policy specifies a ParamKind and the resource referred to by ParamRef does not exist, this binding is considered mis-configured and the FailurePolicy of the MutatingAdmissionPolicy is applied. If the policy does not specify a ParamKind then this field is ignored, and the rules are evaluated without a param.
ParamRef describes how to locate the params to be used as input to expressions of rules applied by a policy binding.
spec.paramRef.name (string)
name is the name of the resource being referenced.
name and selector are mutually exclusive properties. If one is set, the other must be unset.
spec.paramRef.namespace (string)
namespace is the namespace of the referenced resource. Allows limiting the search for params to a specific namespace. Applies to both name and selector fields.
A per-namespace parameter may be used by specifying a namespace-scoped paramKind in the policy and leaving this field empty.
If paramKind is cluster-scoped, this field MUST be unset. Setting this field results in a configuration error.
If paramKind is namespace-scoped, the namespace of the object being evaluated for admission will be used when this field is left unset. Take care that if this is left empty the binding must not match any cluster-scoped resources, which will result in an error.
spec.paramRef.parameterNotFoundAction (string)
parameterNotFoundAction controls the behavior of the binding when the resource exists, and name or selector is valid, but there are no parameters matched by the binding. If set to Allow, the absence of matched parameters is treated as a successful validation by the binding. If set to Deny, the absence of matched parameters is subject to the failurePolicy of the policy.
selector can be used to match multiple param objects based on their labels. Supply selector: {} to match all resources of the ParamKind.
If multiple params are found, they are all evaluated with the policy expressions and the results are ANDed together.
One of name or selector must be set, but name and selector are mutually exclusive properties. If one is set, the other must be unset.
spec.policyName (string)
policyName references a MutatingAdmissionPolicy name which the MutatingAdmissionPolicyBinding binds to. If the referenced resource does not exist, this binding is considered invalid and will be ignored. Required.
MutatingAdmissionPolicy
MutatingAdmissionPolicy describes the definition of an admission mutation policy that mutates the object coming into admission chain.
Specification of the desired behavior of the MutatingAdmissionPolicy.
MutatingAdmissionPolicySpec is the specification of the desired behavior of the admission policy.
spec.failurePolicy (string)
failurePolicy defines how to handle failures for the admission policy. Failures can occur from CEL expression parse errors, type check errors, runtime errors and invalid or mis-configured policy definitions or bindings.
A policy is invalid if paramKind refers to a non-existent Kind. A binding is invalid if paramRef.name refers to a non-existent resource.
failurePolicy does not define how validations that evaluate to false are handled.
Allowed values are Ignore or Fail. Defaults to Fail.
spec.matchConditions ([]MatchCondition)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
matchConditions is a list of conditions that must be met for a request to be validated. Match conditions filter requests that have already been matched by the matchConstraints. An empty list of matchConditions matches all requests. There are a maximum of 64 match conditions allowed.
If a parameter object is provided, it can be accessed via the params handle in the same manner as validation expressions.
The exact matching logic is (in order):
If ANY matchCondition evaluates to FALSE, the policy is skipped.
If ALL matchConditions evaluate to TRUE, the policy is evaluated.
If any matchCondition evaluates to an error (but none are FALSE): if failurePolicy=Fail, the request is rejected; if failurePolicy=Ignore, the error is ignored and the policy is skipped.
Expression represents the expression which will be evaluated by CEL. Must evaluate to bool. CEL expressions have access to the contents of the AdmissionRequest and Authorizer, organized into CEL variables:
'object' - The object from the incoming request. The value is null for DELETE requests. 'oldObject' - The existing object. The value is null for CREATE requests. 'request' - Attributes of the admission request(/pkg/apis/admission/types.go#AdmissionRequest). 'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request.
See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
Required.
spec.matchConditions.name (string), required
Name is an identifier for this match condition, used for strategic merging of MatchConditions, as well as providing an identifier for logging purposes. A good name should be descriptive of the associated expression. Name must be a qualified name consisting of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Required.
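As an illustrative sketch of matchConditions (both names and expressions are hypothetical), skipping requests made by node clients and restricting the policy to deployments:
  matchConditions:
  - name: exclude-node-clients
    expression: '!("system:nodes" in request.userInfo.groups)'
  - name: only-deployments
    expression: 'request.resource.resource == "deployments"'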
spec.matchConstraints (MatchResources)
matchConstraints specifies what resources this policy is designed to mutate. The MutatingAdmissionPolicy cares about a request if it matches all Constraints. However, in order to prevent clusters from being put into an unstable state that cannot be recovered from via the API, MutatingAdmissionPolicy cannot match MutatingAdmissionPolicy and MutatingAdmissionPolicyBinding. The CREATE, UPDATE and CONNECT operations are allowed. The DELETE operation may not be matched. '*' matches CREATE, UPDATE and CONNECT. Required.
MatchResources decides whether to run the admission control policy on an object based on whether it meets the match criteria. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
ExcludeResourceRules describes what operations on what resources/subresources the policy should not care about. The exclude rules take precedence over include rules (if a resource matches both, it is excluded)
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.matchConstraints.matchPolicy (string)
matchPolicy defines how the "MatchResources" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does not consider requests to apps/v1beta1 or extensions/v1beta1 API groups.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], the admission policy does consider requests made to apps/v1beta1 or extensions/v1beta1 API groups. The API server translates the request to a matched resource API if necessary.
NamespaceSelector decides whether to run the admission control policy on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the policy.
For example, to run the policy on any objects whose namespace is not associated with a "runlevel" of "0" or "1", you would set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to run the policy only on objects whose namespace is associated with an "environment" of "prod" or "staging", you would set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
ObjectSelector decides whether to run the policy based on whether the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the policy's expression (CEL), and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the policy is opt-in, because end users may skip the policy by setting the labels. Defaults to the empty LabelSelector, which matches everything.
ResourceRules describes what operations on what resources/subresources the admission policy matches. The policy cares about an operation if it matches any Rule.
NamedRuleWithOperations is a tuple of Operations and Resources with ResourceNames.
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
spec.mutations ([]Mutation)
Atomic: will be replaced during a merge
mutations contain operations to perform on matching objects. mutations may not be empty; a minimum of one mutation is required. mutations are evaluated in order, and are reinvoked according to the reinvocationPolicy. The mutations of a policy are invoked for each binding of this policy and reinvocation of mutations occurs on a per binding basis.
Mutation specifies the CEL expression which is used to apply the Mutation.
spec.mutations.patchType (string), required
patchType indicates the patch strategy used. Allowed values are "ApplyConfiguration" and "JSONPatch". Required.
applyConfiguration defines the desired configuration values of an object. The configuration is applied to the admission object using structured merge diff. A CEL expression is used to create apply configuration.
ApplyConfiguration defines the desired configuration values of an object.
Apply configurations are declared in CEL using object initialization. For example, this CEL expression returns an apply configuration to set a single field:
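A minimal illustrative sketch (the resource and fields are only examples, here a Pod's spec.securityContext.runAsNonRoot):
  Object{
    spec: Object.spec{
      securityContext: Object.spec.securityContext{
        runAsNonRoot: true
      }
    }
  }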
Apply configurations may not modify atomic structs, maps or arrays due to the risk of accidental deletion of values not included in the apply configuration.
CEL expressions have access to the object types needed to create apply configurations:
'Object' - CEL type of the resource object. - 'Object.<fieldName>' - CEL type of object field (such as 'Object.spec') - 'Object.<fieldName1>.<fieldName2>...<fieldNameN>' - CEL type of nested field (such as 'Object.spec.containers')
CEL expressions have access to the contents of the API request, organized into CEL variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests. - 'oldObject' - The existing object. The value is null for CREATE requests. - 'request' - Attributes of the API request. - 'params' - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind. - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources. - 'variables' - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named 'foo' can be accessed as 'variables.foo'.
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
The apiVersion, kind, metadata.name and metadata.generateName are always accessible from the root of the object. No other metadata properties are accessible.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Required.
spec.mutations.jsonPatch (JSONPatch)
jsonPatch defines a JSON patch operation to perform a mutation to the object. A CEL expression is used to create the JSON patch.
CEL expressions have access to the types needed to create JSON patches and objects:
'JSONPatch' - CEL type of JSON Patch operations. JSONPatch has the fields 'op', 'from', 'path' and 'value'.
See JSON patch for more details. The 'value' field may be set to any of: string,
integer, array, map or object. If set, the 'path' and 'from' fields must be set to a
JSON pointer string, where the 'jsonpatch.escapeKey()' CEL
function may be used to escape path keys containing '/' and '~'.
'Object' - CEL type of the resource object. - 'Object.<fieldName>' - CEL type of object field (such as 'Object.spec') - 'Object.<fieldName1>.<fieldName2>...<fieldNameN>' - CEL type of nested field (such as 'Object.spec.containers')
CEL expressions have access to the contents of the API request, organized into CEL variables as well as some other useful variables:
'object' - The object from the incoming request. The value is null for DELETE requests. - 'oldObject' - The existing object. The value is null for CREATE requests. - 'request' - Attributes of the API request. - 'params' - Parameter resource referred to by the policy binding being evaluated. Only populated if the policy has a ParamKind. - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources. - 'variables' - Map of composited variables, from its name to its lazily evaluated value.
For example, a variable named 'foo' can be accessed as 'variables.foo'.
'jsonpatch.escapeKey' - Performs JSONPatch key escaping. '~' and '/' are escaped as '~0' and '~1' respectively.
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Required.
spec.paramKind (ParamKind)
paramKind specifies the kind of resources used to parameterize this policy. If absent, there are no parameters for this policy and the param CEL variable will not be provided to validation expressions. If paramKind refers to a non-existent kind, this policy definition is mis-configured and the FailurePolicy is applied. If paramKind is specified but paramRef is unset in MutatingAdmissionPolicyBinding, the params variable will be null.
ParamKind is a tuple of Group Kind and Version.
spec.paramKind.apiVersion (string)
APIVersion is the API group version the resources belong to. In format of "group/version". Required.
spec.paramKind.kind (string)
Kind is the API kind the resources belong to. Required.
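As an illustrative sketch of how a policy's paramKind pairs with a binding's paramRef (all names are hypothetical):
  # In the MutatingAdmissionPolicy:
  spec:
    paramKind:
      apiVersion: v1
      kind: ConfigMap
  # In a MutatingAdmissionPolicyBinding for that policy:
  spec:
    paramRef:
      name: mutation-params
      namespace: policy-config
      parameterNotFoundAction: Deny
With this pairing, each evaluation of the policy receives the referenced ConfigMap as the params variable in its CEL expressions.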
spec.reinvocationPolicy (string)
reinvocationPolicy indicates whether mutations may be called multiple times per MutatingAdmissionPolicyBinding as part of a single admission evaluation. Allowed values are "Never" and "IfNeeded".
Never: These mutations will not be called more than once per binding in a single admission evaluation.
IfNeeded: These mutations may be invoked more than once per binding for a single admission request and there is no guarantee of order with respect to other admission plugins, admission webhooks, bindings of this policy and admission policies. Mutations are only reinvoked when mutations change the object after this mutation is invoked. Required.
spec.variables ([]Variable)
Atomic: will be replaced during a merge
variables contain definitions of variables that can be used in composition of other expressions. Each variable is defined as a named CEL expression. The variables defined here will be available under variables in other expressions of the policy except matchConditions because matchConditions are evaluated before the rest of the policy.
The expression of a variable can refer to other variables defined earlier in the list but not those after. Thus, variables must be sorted by the order of first appearance and acyclic.
Variable is the definition of a variable that is used for composition.
spec.variables.expression (string), required
Expression is the expression that will be evaluated as the value of the variable. The CEL expression has access to the same identifiers as the CEL expressions in Validation.
spec.variables.name (string), required
Name is the name of the variable. The name must be a valid CEL identifier and unique among all variables. The variable can be accessed in other expressions through variables. For example, if name is "foo", the variable will be available as variables.foo.
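As an illustrative sketch of spec.variables (the variable name, label and annotation key are hypothetical), a value computed once and reused in a mutation expression via variables.ownerTeam:
  spec:
    variables:
    - name: ownerTeam
      expression: 'has(object.metadata.labels) && "team" in object.metadata.labels ? object.metadata.labels["team"] : "unassigned"'
    mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: >-
          Object{
            metadata: Object.metadata{
              annotations: {"example.com/owner-team": variables.ownerTeam}
            }
          }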
MutatingAdmissionPolicyBindingList
MutatingAdmissionPolicyBindingList is a list of MutatingAdmissionPolicyBinding.
status indicates the actual state of the CustomResourceDefinition
CustomResourceDefinitionSpec
CustomResourceDefinitionSpec describes how a user wants their resource to appear
group (string), required
group is the API group of the defined custom resource. The custom resources are served under /apis/<group>/.... Must match the name of the CustomResourceDefinition (in the form <names.plural>.<group>).
names (CustomResourceDefinitionNames), required
names specify the resource and kind names for the custom resource.
CustomResourceDefinitionNames indicates the names to serve this CustomResourceDefinition
names.kind (string), required
kind is the serialized kind of the resource. It is normally CamelCase and singular. Custom resource instances will use this value as the kind attribute in API calls.
names.plural (string), required
plural is the plural name of the resource to serve. The custom resources are served under /apis/<group>/<version>/.../<plural>. Must match the name of the CustomResourceDefinition (in the form <names.plural>.<group>). Must be all lowercase.
names.categories ([]string)
Atomic: will be replaced during a merge
categories is a list of grouped resources this custom resource belongs to (e.g. 'all'). This is published in API discovery documents, and used by clients to support invocations like kubectl get all.
names.listKind (string)
listKind is the serialized kind of the list for this resource. Defaults to "kindList".
names.shortNames ([]string)
Atomic: will be replaced during a merge
shortNames are short names for the resource, exposed in API discovery documents, and used by clients to support invocations like kubectl get <shortname>. It must be all lowercase.
names.singular (string)
singular is the singular name of the resource. It must be all lowercase. Defaults to lowercased kind.
scope (string), required
scope indicates whether the defined custom resource is cluster- or namespace-scoped. Allowed values are Cluster and Namespaced.
versions is the list of all API versions of the defined custom resource. Version names are used to compute the order in which served versions are listed in API discovery. If the version string is "kube-like", it will sort above non "kube-like" version strings, which are ordered lexicographically. "Kube-like" versions start with a "v", then are followed by a number (the major version), then optionally the string "alpha" or "beta" and another number (the minor version). These are sorted first by GA > beta > alpha (where GA is a version with no suffix such as beta or alpha), and then by comparing major version, then minor version. An example sorted list of versions: v10, v2, v1, v11beta2, v10beta3, v3beta1, v12alpha1, v11alpha2, foo1, foo10.
CustomResourceDefinitionVersion describes a version for CRD.
versions.name (string), required
name is the version name, e.g. "v1", "v2beta1", etc. The custom resources are served under this version at /apis/<group>/<version>/... if served is true.
versions.served (boolean), required
served is a flag enabling/disabling this version from being served via REST APIs
versions.storage (boolean), required
storage indicates this version should be used when persisting custom resources to storage. There must be exactly one version with storage=true.
priority is an integer defining the relative importance of this column compared to others. Lower numbers are considered higher priority. Columns that may be omitted in limited space scenarios should be given a priority greater than 0.
versions.deprecated (boolean)
deprecated indicates this version of the custom resource API is deprecated. When set to true, API requests to this version receive a warning header in the server response. Defaults to false.
versions.deprecationWarning (string)
deprecationWarning overrides the default warning returned to API clients. May only be set when deprecated is true. The default warning indicates this version is deprecated and recommends use of the newest served version of equal or greater stability, if one exists.
versions.schema (CustomResourceValidation)
schema describes the schema used for validation, pruning, and defaulting of this version of the custom resource.
CustomResourceValidation is a list of validation methods for CustomResources.
jsonPath is a simple JSON path which is evaluated against each custom resource to produce a field selector value. Only JSON paths without the array notation are allowed. Must point to a field of type string, boolean or integer. Types with enum values and strings with formats are allowed. If jsonPath refers to an absent field in a resource, the jsonPath evaluates to an empty string. Must not point to metadata fields. Required.
specReplicasPath defines the JSON path inside of a custom resource that corresponds to Scale spec.replicas. Only JSON paths without the array notation are allowed. Must be a JSON Path under .spec. If there is no value under the given path in the custom resource, the /scale subresource will return an error on GET.
statusReplicasPath defines the JSON path inside of a custom resource that corresponds to Scale status.replicas. Only JSON paths without the array notation are allowed. Must be a JSON Path under .status. If there is no value under the given path in the custom resource, the status.replicas value in the /scale subresource will default to 0.
labelSelectorPath defines the JSON path inside of a custom resource that corresponds to Scale status.selector. Only JSON paths without the array notation are allowed. Must be a JSON Path under .status or .spec. Must be set to work with HorizontalPodAutoscaler. The field pointed by this JSON path must be a string field (not a complex selector struct) which contains a serialized label selector in string form. More info: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions#scale-subresource If there is no value under the given path in the custom resource, the status.selector value in the /scale subresource will default to the empty string.
status indicates the custom resource should serve a /status subresource. When enabled: 1. requests to the custom resource primary endpoint ignore changes to the status stanza of the object. 2. requests to the custom resource /status subresource ignore changes to anything other than the status stanza of the object.
CustomResourceSubresourceStatus defines how to serve the status subresource for CustomResources. Status is represented by the .status JSON path inside of a CustomResource. When set, * exposes a /status subresource for the custom resource * PUT requests to the /status subresource take a custom resource object, and ignore changes to anything except the status stanza * PUT/POST/PATCH requests to the custom resource ignore changes to the status stanza
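For illustration, a versions[*].subresources stanza that enables both subresources for a custom resource (the JSON paths are examples and must correspond to fields in your schema):
  subresources:
    status: {}
    scale:
      specReplicasPath: .spec.replicas
      statusReplicasPath: .status.replicas
      labelSelectorPath: .status.labelSelector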
conversion (CustomResourceConversion)
conversion defines conversion settings for the CRD.
CustomResourceConversion describes how to convert different versions of a CR.
conversion.strategy (string), required
strategy specifies how custom resources are converted between versions. Allowed values are: - "None": The converter only changes the apiVersion and does not touch any other field in the custom resource. - "Webhook": The API server calls an external webhook to do the conversion. Additional information is needed for this option. This requires spec.preserveUnknownFields to be false, and spec.conversion.webhook to be set.
conversion.webhook (WebhookConversion)
webhook describes how to call the conversion webhook. Required when strategy is set to "Webhook".
WebhookConversion describes how to call a conversion webhook
conversionReviewVersions is an ordered list of preferred ConversionReview versions the Webhook expects. The API server will use the first version in the list which it supports. If none of the versions specified in this list are supported by API server, conversion will fail for the custom resource. If a persisted Webhook configuration specifies allowed versions and does not include any versions known to the API Server, calls to the webhook will fail.
clientConfig is the instructions for how to call the webhook if strategy is Webhook.
WebhookClientConfig contains the information to make a TLS connection with the webhook.
conversion.webhook.clientConfig.caBundle ([]byte)
caBundle is a PEM encoded CA bundle which will be used to validate the webhook's server certificate. If unspecified, system trust roots on the apiserver are used.
port is an optional service port at which the webhook will be contacted. port should be a valid port number (1-65535, inclusive). Defaults to 443 for backward compatibility.
conversion.webhook.clientConfig.url (string)
url gives the location of the webhook, in standard URL form (scheme://host:port/path). Exactly one of url or service must be specified.
The host should not refer to a service running in the cluster; use the service field instead. The host might be resolved via external DNS in some apiservers (e.g., kube-apiserver cannot resolve in-cluster DNS as that would be a layering violation). host may also be an IP address.
Please note that using localhost or 127.0.0.1 as a host is risky unless you take great care to run this webhook on all hosts which run an apiserver which might need to make calls to this webhook. Such installs are likely to be non-portable, i.e., not easy to turn up in a new cluster.
The scheme must be "https"; the URL must begin with "https://".
A path is optional, and if present may be any string permissible in a URL. You may use the path to pass an arbitrary string to the webhook, for example, a cluster identifier.
User or basic authentication (for example, "user:password@") is not allowed. Fragments ("#...") and query parameters ("?...") are not allowed, either.
preserveUnknownFields (boolean)
preserveUnknownFields indicates that object fields which are not specified in the OpenAPI schema should be preserved when persisting to storage. apiVersion, kind, metadata and known fields inside metadata are always preserved. This field is deprecated in favor of setting x-preserve-unknown-fields to true in spec.versions[*].schema.openAPIV3Schema. See https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#field-pruning for details.
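Tying the CustomResourceDefinitionSpec fields above together, a minimal illustrative CRD manifest (the group, names and schema are hypothetical):
  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: crontabs.stable.example.com        # must be <names.plural>.<group>
  spec:
    group: stable.example.com
    scope: Namespaced
    names:
      plural: crontabs
      singular: crontab
      kind: CronTab
      listKind: CronTabList
      shortNames: ["ct"]
    conversion:
      strategy: None
      # To use a conversion webhook instead, set strategy: Webhook and add, for example:
      #   webhook:
      #     conversionReviewVersions: ["v1"]
      #     clientConfig:
      #       url: https://conversion.example.com/convert
      #       caBundle: <base64-encoded PEM CA bundle>
    versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                cronSpec:
                  type: string
                replicas:
                  type: integer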
default is a default value for undefined object fields. Defaulting is a beta feature under the CustomResourceDefaulting feature gate. Defaulting requires spec.preserveUnknownFields to be false.
JSON represents any valid JSON value. These types are supported: bool, int64, float64, string, []interface{}, map[string]interface{} and nil.
JSONSchemaPropsOrStringArray represents a JSONSchemaProps or a string array.
description (string)
enum ([]JSON)
Atomic: will be replaced during a merge
JSON represents any valid JSON value. These types are supported: bool, int64, float64, string, []interface{}, map[string]interface{} and nil.
example (JSON)
JSON represents any valid JSON value. These types are supported: bool, int64, float64, string, []interface{}, map[string]interface{} and nil.
exclusiveMaximum (boolean)
exclusiveMinimum (boolean)
externalDocs (ExternalDocumentation)
ExternalDocumentation allows referencing an external resource for extended documentation.
externalDocs.description (string)
externalDocs.url (string)
format (string)
format is an OpenAPI v3 format string. Unknown formats are ignored. The following formats are validated:
- bsonobjectid: a bson object ID, i.e. a 24 characters hex string
- uri: an URI as parsed by Golang net/url.ParseRequestURI
- email: an email address as parsed by Golang net/mail.ParseAddress
- hostname: a valid representation for an Internet host name, as defined by RFC 1034, section 3.1 [RFC1034]
- ipv4: an IPv4 IP as parsed by Golang net.ParseIP
- ipv6: an IPv6 IP as parsed by Golang net.ParseIP
- cidr: a CIDR as parsed by Golang net.ParseCIDR
- mac: a MAC address as parsed by Golang net.ParseMAC
- uuid: an UUID that allows uppercase defined by the regex (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$
- uuid3: an UUID3 that allows uppercase defined by the regex (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?3[0-9a-f]{3}-?[0-9a-f]{4}-?[0-9a-f]{12}$
- uuid4: an UUID4 that allows uppercase defined by the regex (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?4[0-9a-f]{3}-?[89ab][0-9a-f]{3}-?[0-9a-f]{12}$
- uuid5: an UUID5 that allows uppercase defined by the regex (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?5[0-9a-f]{3}-?[89ab][0-9a-f]{3}-?[0-9a-f]{12}$
- isbn: an ISBN10 or ISBN13 number string like "0321751043" or "978-0321751041"
- isbn10: an ISBN10 number string like "0321751043"
- isbn13: an ISBN13 number string like "978-0321751041"
- creditcard: a credit card number defined by the regex ^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$ with any non digit characters mixed in
- ssn: a U.S. social security number following the regex ^\d{3}[- ]?\d{2}[- ]?\d{4}$
- hexcolor: a hexadecimal color code like "#FFFFFF", following the regex ^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$
- rgbcolor: an RGB color code like "rgb(255,255,255)"
- byte: base64 encoded binary data
- password: any kind of string
- date: a date string like "2006-01-02" as defined by full-date in RFC3339
- duration: a duration string like "22 ns" as parsed by Golang time.ParseDuration or compatible with Scala duration format
- datetime: a date time string like "2014-12-15T19:30:20.000Z" as defined by date-time in RFC3339
id (string)
items (JSONSchemaPropsOrArray)
JSONSchemaPropsOrArray represents a value that can either be a JSONSchemaProps or an array of JSONSchemaProps. Mainly here for serialization purposes.
x-kubernetes-embedded-resource defines that the value is an embedded Kubernetes runtime.Object, with TypeMeta and ObjectMeta. The type must be object. It is allowed to further restrict the embedded object. kind, apiVersion and metadata are validated automatically. x-kubernetes-preserve-unknown-fields is allowed to be true, but does not have to be if the object is fully specified (up to kind, apiVersion, metadata).
x-kubernetes-int-or-string (boolean)
x-kubernetes-int-or-string specifies that this value is either an integer or a string. If this is true, an empty type is allowed and type as child of anyOf is permitted if following one of the following patterns:
1) anyOf:
   - type: integer
   - type: string
2) allOf:
   - anyOf:
     - type: integer
     - type: string
   - ... zero or more
x-kubernetes-list-map-keys ([]string)
Atomic: will be replaced during a merge
x-kubernetes-list-map-keys annotates an array with the x-kubernetes-list-type map by specifying the keys used as the index of the map.
This tag MUST only be used on lists that have the "x-kubernetes-list-type" extension set to "map". Also, the values specified for this attribute must be a scalar typed field of the child structure (no nesting is supported).
The properties specified must either be required or have a default value, to ensure those properties are present for all list items.
x-kubernetes-list-type (string)
x-kubernetes-list-type annotates an array to further describe its topology. This extension must only be used on lists and may have 3 possible values:
- atomic: the list is treated as a single entity, like a scalar. Atomic lists will be entirely replaced when updated. This extension may be used on any type of list (struct, scalar, ...).
- set: Sets are lists that must not have multiple items with the same value. Each value must be a scalar, an object with x-kubernetes-map-type atomic, or an array with x-kubernetes-list-type atomic.
- map: These lists are like maps in that their elements have a non-index key used to identify them. Order is preserved upon merge. The map tag must only be used on a list with elements of type object.
Defaults to atomic for arrays.
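For illustration, a property schema that declares a map-typed list keyed by name (the property and key names are examples); the key properties are required, as the x-kubernetes-list-map-keys description above notes:
  containers:
    type: array
    x-kubernetes-list-type: map
    x-kubernetes-list-map-keys: ["name"]
    items:
      type: object
      required: ["name"]
      properties:
        name:
          type: string
        image:
          type: string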
x-kubernetes-map-type (string)
x-kubernetes-map-type annotates an object to further describe its topology. This extension must only be used when type is object and may have 2 possible values:
- granular: These maps are actual maps (key-value pairs) and each field is independent from the others (they can each be manipulated by separate actors). This is the default behaviour for all maps.
- atomic: the map is treated as a single entity, like a scalar. Atomic maps will be entirely replaced when updated.
x-kubernetes-preserve-unknown-fields (boolean)
x-kubernetes-preserve-unknown-fields stops the API server decoding step from pruning fields which are not specified in the validation schema. This affects fields recursively, but switches back to normal pruning behaviour if nested properties or additionalProperties are specified in the schema. This can either be true or undefined. False is forbidden.
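A minimal, illustrative schema fragment using this extension; the field name config is hypothetical:

config:
  type: object
  x-kubernetes-preserve-unknown-fields: true   # unknown fields under config are kept rather than pruned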
x-kubernetes-validations ([]ValidationRule)
Patch strategy: merge on key rule
Map: unique values on key rule will be kept during a merge
x-kubernetes-validations describes a list of validation rules written in the CEL expression language.
ValidationRule describes a validation rule written in the CEL expression language.
x-kubernetes-validations.rule (string), required
Rule represents the expression which will be evaluated by CEL. ref: https://github.com/google/cel-spec The Rule is scoped to the location of the x-kubernetes-validations extension in the schema. The self variable in the CEL expression is bound to the scoped value. Example: - Rule scoped to the root of a resource with a status subresource: {"rule": "self.status.actual <= self.spec.maxDesired"}
If the Rule is scoped to an object with properties, the accessible properties of the object are field selectable via self.field and field presence can be checked via has(self.field). Null valued fields are treated as absent fields in CEL expressions. If the Rule is scoped to an object with additionalProperties (i.e. a map) the value of the map are accessible via self[mapKey], map containment can be checked via mapKey in self and all entries of the map are accessible via CEL macros and functions such as self.all(...). If the Rule is scoped to an array, the elements of the array are accessible via self[i] and also by macros and functions. If the Rule is scoped to a scalar, self is bound to the scalar value. Examples: - Rule scoped to a map of objects: {"rule": "self.components['Widget'].priority < 10"} - Rule scoped to a list of integers: {"rule": "self.values.all(value, value >= 0 && value < 100)"} - Rule scoped to a string value: {"rule": "self.startsWith('kube')"}
The apiVersion, kind, metadata.name and metadata.generateName are always accessible from the root of the object and from any x-kubernetes-embedded-resource annotated objects. No other metadata properties are accessible.
Unknown data preserved in custom resources via x-kubernetes-preserve-unknown-fields is not accessible in CEL expressions. This includes: - Unknown field values that are preserved by object schemas with x-kubernetes-preserve-unknown-fields. - Object properties where the property schema is of an "unknown type". An "unknown type" is recursively defined as:
A schema with no type and x-kubernetes-preserve-unknown-fields set to true
An array where the items schema is of an "unknown type"
An object where the additionalProperties schema is of an "unknown type"
Only property names of the form [a-zA-Z_.-/][a-zA-Z0-9_.-/]* are accessible. Accessible property names are escaped according to the following rules when accessed in the expression: - '__' escapes to '__underscores__' - '.' escapes to '__dot__' - '-' escapes to '__dash__' - '/' escapes to '__slash__' - Property names that exactly match a CEL RESERVED keyword escape to '__{keyword}__'. The keywords are:
"true", "false", "null", "in", "as", "break", "const", "continue", "else", "for", "function", "if",
"import", "let", "loop", "package", "namespace", "return".
Examples:
Rule accessing a property named "namespace": {"rule": "self.namespace > 0"}
Rule accessing a property named "x-prop": {"rule": "self.x__dash__prop > 0"}
Rule accessing a property named "redact__d": {"rule": "self.redact__underscores__d > 0"}
Equality on arrays with x-kubernetes-list-type of 'set' or 'map' ignores element order, i.e. [1, 2] == [2, 1]. Concatenation on arrays with x-kubernetes-list-type use the semantics of the list type:
'set': X + Y performs a union where the array positions of all elements in X are preserved and
non-intersecting elements in Y are appended, retaining their partial order.
'map': X + Y performs a merge where the array positions of all keys in X are preserved but the values
are overwritten by values in Y when the key sets of X and Y intersect. Elements in Y with
non-intersecting keys are appended, retaining their partial order.
If rule makes use of the oldSelf variable it is implicitly a transition rule.
By default, the oldSelf variable is the same type as self. When optionalOldSelf is true, the oldSelf variable is a CEL optional
variable whose value() is the same type as self.
See the documentation for the optionalOldSelf field for details.
Transition rules by default are applied only on UPDATE requests and are skipped if an old value could not be found. You can opt a transition rule into unconditional evaluation by setting optionalOldSelf to true.
x-kubernetes-validations.fieldPath (string)
fieldPath represents the field path returned when the validation fails. It must be a relative JSON path (i.e. with array notation) scoped to the location of this x-kubernetes-validations extension in the schema and refer to an existing field. e.g. when the validation checks a specific attribute foo under a map testMap, the fieldPath could be set to .testMap.foo. If the validation checks that two lists must have unique attributes, the fieldPath could be set to either of the lists, e.g. .testList. Numeric indexes into arrays are not supported; currently only child operations that refer to an existing field are supported. Refer to JSONPath support in Kubernetes for more info. For a field name which contains special characters, use ['specialName'] to refer to the field name, e.g. for an attribute foo.34$ that appears in a list testList, the fieldPath could be set to .testList['foo.34$']
x-kubernetes-validations.message (string)
Message represents the message displayed when validation fails. The message is required if the Rule contains line breaks. The message must not contain line breaks. If unset, the message is "failed rule: {Rule}". e.g. "must be a URL with the host matching spec.host"
x-kubernetes-validations.messageExpression (string)
MessageExpression declares a CEL expression that evaluates to the validation failure message that is returned when this rule fails. Since messageExpression is used as a failure message, it must evaluate to a string. If both message and messageExpression are present on a rule, then messageExpression will be used if validation fails. If messageExpression results in a runtime error, the runtime error is logged, and the validation failure message is produced as if the messageExpression field were unset. If messageExpression evaluates to an empty string, a string with only spaces, or a string that contains line breaks, then the validation failure message will also be produced as if the messageExpression field were unset, and the fact that messageExpression produced an empty string/string with only spaces/string with line breaks will be logged. messageExpression has access to all the same variables as the rule; the only difference is the return type. Example: "x must be less than max ("+string(self.max)+")"
x-kubernetes-validations.optionalOldSelf (boolean)
optionalOldSelf is used to opt a transition rule into evaluation even when the object is first created, or if the old object is missing the value.
When enabled oldSelf will be a CEL optional whose value will be None if there is no old value, or when the object is initially created.
You may check for presence of oldSelf using oldSelf.hasValue() and unwrap it after checking using oldSelf.value(). Check the CEL documentation for Optional types for more information: https://pkg.go.dev/github.com/google/cel-go/cel#OptionalTypes
May not be set unless oldSelf is used in rule.
x-kubernetes-validations.reason (string)
reason provides a machine-readable validation failure reason that is returned to the caller when a request fails this validation rule. The HTTP status code returned to the caller will match the reason of the first failed validation rule. The currently supported reasons are: "FieldValueInvalid", "FieldValueForbidden", "FieldValueRequired", "FieldValueDuplicate". If not set, defaults to "FieldValueInvalid". All future added reasons must be accepted by clients when reading this value and unknown reasons should be treated as FieldValueInvalid.
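Putting the ValidationRule fields together, a hedged sketch of a validations block as it might appear in a CRD schema; the spec fields replicas and maxReplicas are hypothetical:

x-kubernetes-validations:
- rule: "self.replicas <= self.maxReplicas"
  message: "replicas must not exceed maxReplicas"
  fieldPath: ".replicas"
  reason: "FieldValueInvalid"
- rule: "self.maxReplicas >= oldSelf.maxReplicas"   # transition rule: uses oldSelf, applied on UPDATE
  message: "maxReplicas may only be increased"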
CustomResourceDefinitionStatus
CustomResourceDefinitionStatus indicates the state of the CustomResourceDefinition
acceptedNames (CustomResourceDefinitionNames)
acceptedNames are the names that are actually being used to serve discovery. They may be different than the names in spec.
CustomResourceDefinitionNames indicates the names to serve this CustomResourceDefinition
acceptedNames.kind (string), required
kind is the serialized kind of the resource. It is normally CamelCase and singular. Custom resource instances will use this value as the kind attribute in API calls.
acceptedNames.plural (string), required
plural is the plural name of the resource to serve. The custom resources are served under /apis/\<group>/\<version>/.../\<plural>. Must match the name of the CustomResourceDefinition (in the form \<names.plural>.\<group>). Must be all lowercase.
acceptedNames.categories ([]string)
Atomic: will be replaced during a merge
categories is a list of grouped resources this custom resource belongs to (e.g. 'all'). This is published in API discovery documents, and used by clients to support invocations like kubectl get all.
acceptedNames.listKind (string)
listKind is the serialized kind of the list for this resource. Defaults to "<kind>List" (the kind with "List" appended).
acceptedNames.shortNames ([]string)
Atomic: will be replaced during a merge
shortNames are short names for the resource, exposed in API discovery documents, and used by clients to support invocations like kubectl get \<shortname>. It must be all lowercase.
acceptedNames.singular (string)
singular is the singular name of the resource. It must be all lowercase. Defaults to lowercased kind.
conditions ([]CustomResourceDefinitionCondition)
Map: unique values on key type will be kept during a merge
conditions indicate state for particular aspects of a CustomResourceDefinition
CustomResourceDefinitionCondition contains details for the current condition of this CustomResourceDefinition.
conditions.status (string), required
status is the status of the condition. Can be True, False, Unknown.
conditions.type (string), required
type is the type of the condition. Types include Established, NamesAccepted and Terminating.
conditions.lastTransitionTime (Time)
lastTransitionTime last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
message is a human-readable message indicating details about last transition.
conditions.reason (string)
reason is a unique, one-word, CamelCase reason for the condition's last transition.
storedVersions ([]string)
Atomic: will be replaced during a merge
storedVersions lists all versions of CustomResources that were ever persisted. Tracking these versions allows a migration path for stored versions in etcd. The field is mutable so a migration controller can finish a migration to another version (ensuring no old objects are left in storage), and then remove the rest of the versions from this list. Versions may not be removed from spec.versions while they exist in this list.
CustomResourceDefinitionList
CustomResourceDefinitionList is a list of CustomResourceDefinition objects.
DeviceClass is a vendor- or admin-provided resource that contains device configuration and selectors.
apiVersion: resource.k8s.io/v1beta2
import "k8s.io/api/resource/v1beta2"
DeviceClass
DeviceClass is a vendor- or admin-provided resource that contains device configuration and selectors. It can be referenced in the device requests of a claim to apply these presets. Cluster scoped.
This is an alpha type and requires enabling the DynamicResourceAllocation feature gate.
spec (DeviceClassSpec), required
Spec defines what can be allocated and how to configure it.
This is mutable. Consumers have to be prepared for classes changing at any time, either because they get updated or replaced. Claim allocations are done once based on whatever was set in classes at the time of allocation.
Changing the spec automatically increments the metadata.generation number.
DeviceClassSpec
DeviceClassSpec is used in a [DeviceClass] to define what can be allocated and how to configure it.
config ([]DeviceClassConfiguration)
Atomic: will be replaced during a merge
Config defines configuration parameters that apply to each device that is claimed via this class. Some classes may potentially be satisfied by multiple drivers, so each instance of a vendor configuration applies to exactly one driver.
They are passed to the driver, but are not considered while allocating the claim.
OpaqueDeviceConfiguration contains configuration parameters for a driver in a format defined by the driver vendor.
config.opaque.driver (string), required
Driver is used to determine which kubelet plugin needs to be passed these configuration parameters.
An admission policy provided by the driver developer could use this to decide whether it needs to validate them.
Must be a DNS subdomain and should end with a DNS domain owned by the vendor of the driver.
config.opaque.parameters (RawExtension), required
Parameters can contain arbitrary data. It is the responsibility of the driver developer to handle validation and versioning. Typically this includes self-identification and a version ("kind" + "apiVersion" for Kubernetes types), with conversion between different versions.
The length of the raw data must be smaller or equal to 10 Ki.
RawExtension is used to hold extensions in external versions.
To use this, make a field which has RawExtension as its type in your external, versioned struct, and Object in your internal struct. You also need to register your various plugin types.
// Internal package:
type MyAPIObject struct {
	runtime.TypeMeta `json:",inline"`
	MyPlugin         runtime.Object `json:"myPlugin"`
}

type PluginA struct {
	AOption string `json:"aOption"`
}

// External package:
type MyAPIObject struct {
	runtime.TypeMeta `json:",inline"`
	MyPlugin         runtime.RawExtension `json:"myPlugin"`
}

type PluginA struct {
	AOption string `json:"aOption"`
}
// On the wire, the JSON will look something like this:
{
	"kind": "MyAPIObject",
	"apiVersion": "v1",
	"myPlugin": {
		"kind": "PluginA",
		"aOption": "foo"
	}
}
So what happens? Decode first uses json or yaml to unmarshal the serialized data into your external MyAPIObject. That causes the raw JSON to be stored, but not unpacked. The next step is to copy (using pkg/conversion) into the internal struct. The runtime package's DefaultScheme has conversion functions installed which will unpack the JSON stored in RawExtension, turning it into the correct object type, and storing it in the Object. (TODO: In the case where the object is of an unknown type, a runtime.Unknown object will be created and stored.)
selectors ([]DeviceSelector)
Atomic: will be replaced during a merge
Each selector must be satisfied by a device which is claimed via this class.
DeviceSelector must have exactly one field set.
selectors.cel (CELDeviceSelector)
CEL contains a CEL expression for selecting a device.
CELDeviceSelector contains a CEL expression for selecting a device.
selectors.cel.expression (string), required
Expression is a CEL expression which evaluates a single device. It must evaluate to true when the device under consideration satisfies the desired criteria, and false when it does not. Any other result is an error and causes allocation of devices to abort.
The expression's input is an object named "device", which carries the following properties:
driver (string): the name of the driver which defines this device.
attributes (map[string]object): the device's attributes, grouped by prefix
(e.g. device.attributes["dra.example.com"] evaluates to an object with all
of the attributes which were prefixed by "dra.example.com").
capacity (map[string]object): the device's capacities, grouped by prefix.
Example: Consider a device with driver="dra.example.com", which exposes two attributes named "model" and "ext.example.com/family" and which exposes one capacity named "modules". The input to this expression would have the following fields:
device.driver
device.attributes["dra.example.com"].model
device.attributes["ext.example.com"].family
device.capacity["dra.example.com"].modules
The device.driver field can be used to check for a specific driver, either as a high-level precondition (i.e. you only want to consider devices from this driver) or as part of a multi-clause expression that is meant to consider devices from different drivers.
The value type of each attribute is defined by the device definition, and users who write these expressions must consult the documentation for their specific drivers. The value type of each capacity is Quantity.
If an unknown prefix is used as a lookup in either device.attributes or device.capacity, an empty map will be returned. Any reference to an unknown field will cause an evaluation error and allocation to abort.
A robust expression should check for the existence of attributes before referencing them.
For ease of use, the cel.bind() function is enabled, and can be used to simplify expressions that access multiple attributes with the same domain. For example:
cel.bind(dra, device.attributes["dra.example.com"], dra.someBool && dra.anotherBool)
The length of the expression must be smaller or equal to 10 Ki. The cost of evaluating it is also limited based on the estimated number of logical steps.
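A hedged, illustrative DeviceClass manifest combining a CEL selector with an opaque driver configuration; the driver name dra.example.com, the class name and the parameter payload are assumptions, not values from this reference:

apiVersion: resource.k8s.io/v1beta2
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
  - cel:
      expression: device.driver == "dra.example.com"   # hypothetical driver name
  config:
  - opaque:
      driver: dra.example.com
      parameters:                  # arbitrary, driver-defined payload (raw data <= 10 Ki)
        apiVersion: gpu.dra.example.com/v1alpha1
        kind: GpuConfig
        sharing: true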
webhooks.admissionReviewVersions ([]string), required
AdmissionReviewVersions is an ordered list of preferred AdmissionReview versions the Webhook expects. The API server will try to use the first version in the list which it supports. If none of the versions specified in this list are supported by the API server, validation will fail for this object. If a persisted webhook configuration specifies allowed versions and does not include any versions known to the API Server, calls to the webhook will fail and be subject to the failure policy.
webhooks.clientConfig (WebhookClientConfig), required
ClientConfig defines how to communicate with the hook. Required
WebhookClientConfig contains the information to make a TLS connection with the webhook
webhooks.clientConfig.caBundle ([]byte)
caBundle is a PEM encoded CA bundle which will be used to validate the webhook's server certificate. If unspecified, system trust roots on the apiserver are used.
webhooks.clientConfig.service (ServiceReference)
service is a reference to the service for this webhook. Either service or url must be specified.
If the webhook is running within the cluster, then you should use service.
ServiceReference holds a reference to Service.legacy.k8s.io
webhooks.clientConfig.service.namespace (string), required
namespace is the namespace of the service. Required
webhooks.clientConfig.service.path (string)
path is an optional URL path which will be sent in any request to this service.
webhooks.clientConfig.service.port (int32)
If specified, the port on the service that is hosting the webhook. Defaults to 443 for backward compatibility. port should be a valid port number (1-65535, inclusive).
webhooks.clientConfig.url (string)
url gives the location of the webhook, in standard URL form (scheme://host:port/path). Exactly one of url or service must be specified.
The host should not refer to a service running in the cluster; use the service field instead. The host might be resolved via external DNS in some apiservers (e.g., kube-apiserver cannot resolve in-cluster DNS as that would be a layering violation). host may also be an IP address.
Please note that using localhost or 127.0.0.1 as a host is risky unless you take great care to run this webhook on all hosts which run an apiserver which might need to make calls to this webhook. Such installs are likely to be non-portable, i.e., not easy to turn up in a new cluster.
The scheme must be "https"; the URL must begin with "https://".
A path is optional, and if present may be any string permissible in a URL. You may use the path to pass an arbitrary string to the webhook, for example, a cluster identifier.
Attempting to use a user or basic auth e.g. "user:password@" is not allowed. Fragments ("#...") and query parameters ("?...") are not allowed, either.
webhooks.name (string), required
The name of the admission webhook. Name should be fully qualified, e.g., imagepolicy.kubernetes.io, where "imagepolicy" is the name of the webhook, and kubernetes.io is the name of the organization. Required.
webhooks.sideEffects (string), required
SideEffects states whether this webhook has side effects. Acceptable values are: None, NoneOnDryRun (webhooks created via v1beta1 may also specify Some or Unknown). Webhooks with side effects MUST implement a reconciliation system, since a request may be rejected by a future step in the admission chain and the side effects therefore need to be undone. Requests with the dryRun attribute will be auto-rejected if they match a webhook with sideEffects == Unknown or Some.
webhooks.failurePolicy (string)
FailurePolicy defines how unrecognized errors from the admission endpoint are handled - allowed values are Ignore or Fail. Defaults to Fail.
webhooks.matchConditions ([]MatchCondition)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
MatchConditions is a list of conditions that must be met for a request to be sent to this webhook. Match conditions filter requests that have already been matched by the rules, namespaceSelector, and objectSelector. An empty list of matchConditions matches all requests. There are a maximum of 64 match conditions allowed.
The exact matching logic is (in order):
If ANY matchCondition evaluates to FALSE, the webhook is skipped.
If ALL matchConditions evaluate to TRUE, the webhook is called.
If any matchCondition evaluates to an error (but none are FALSE):
If failurePolicy=Fail, reject the request
If failurePolicy=Ignore, the error is ignored and the webhook is skipped
MatchCondition represents a condition which must by fulfilled for a request to be sent to a webhook.
webhooks.matchConditions.expression (string), required
Expression represents the expression which will be evaluated by CEL. Must evaluate to bool. CEL expressions have access to the contents of the AdmissionRequest and Authorizer, organized into CEL variables:
'object' - The object from the incoming request. The value is null for DELETE requests. 'oldObject' - The existing object. The value is null for CREATE requests. 'request' - Attributes of the admission request (/pkg/apis/admission/types.go#AdmissionRequest). 'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request.
See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
Required.
webhooks.matchConditions.name (string), required
Name is an identifier for this match condition, used for strategic merging of MatchConditions, as well as providing an identifier for logging purposes. A good name should be descriptive of the associated expression. Name must be a qualified name consisting of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Required.
webhooks.matchPolicy (string)
matchPolicy defines how the "rules" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would not be sent to the webhook.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the webhook.
webhooks.namespaceSelector (LabelSelector)
NamespaceSelector decides whether to run the webhook on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the webhook.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the webhook on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
webhooks.objectSelector (LabelSelector)
ObjectSelector decides whether to run the webhook based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the webhook, and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Defaults to the empty LabelSelector, which matches everything.
webhooks.reinvocationPolicy (string)
reinvocationPolicy indicates whether this webhook should be called multiple times as part of a single admission evaluation. Allowed values are "Never" and "IfNeeded".
Never: the webhook will not be called more than once in a single admission evaluation.
IfNeeded: the webhook will be called at least one additional time as part of the admission evaluation if the object being admitted is modified by other admission plugins after the initial webhook call. Webhooks that specify this option must be idempotent, able to process objects they previously admitted. Note: * the number of additional invocations is not guaranteed to be exactly one. * if additional invocations result in further modifications to the object, webhooks are not guaranteed to be invoked again. * webhooks that use this option may be reordered to minimize the number of additional invocations. * to validate an object after all mutations are guaranteed complete, use a validating admission webhook instead.
Defaults to "Never".
webhooks.rules ([]RuleWithOperations)
Atomic: will be replaced during a merge
Rules describes what operations on what resources/subresources the webhook cares about. The webhook cares about an operation if it matches any Rule. However, in order to prevent ValidatingAdmissionWebhooks and MutatingAdmissionWebhooks from putting the cluster in a state which cannot be recovered from without completely disabling the plugin, ValidatingAdmissionWebhooks and MutatingAdmissionWebhooks are never called on admission requests for ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects.
RuleWithOperations is a tuple of Operations and Resources. It is recommended to make sure that all the tuple expansions are valid.
webhooks.rules.apiGroups ([]string)
Atomic: will be replaced during a merge
APIGroups is the API groups the resources belong to. '*' is all groups. If '*' is present, the length of the slice must be one. Required.
webhooks.rules.apiVersions ([]string)
Atomic: will be replaced during a merge
APIVersions is the API versions the resources belong to. '*' is all versions. If '*' is present, the length of the slice must be one. Required.
webhooks.rules.operations ([]string)
Atomic: will be replaced during a merge
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
webhooks.rules.resources ([]string)
Atomic: will be replaced during a merge
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
webhooks.rules.scope (string)
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
webhooks.timeoutSeconds (int32)
TimeoutSeconds specifies the timeout for this webhook. After the timeout passes, the webhook call will be ignored or the API call will fail based on the failure policy. The timeout value must be between 1 and 30 seconds. Defaults to 10 seconds.
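Tying the webhook fields above together, a hedged example of a MutatingWebhookConfiguration; the webhook name, namespace, service, labels and CA bundle placeholder are illustrative assumptions:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: example-mutating-webhook
webhooks:
- name: imagepolicy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  timeoutSeconds: 10
  clientConfig:
    service:
      namespace: webhook-system
      name: example-webhook
      path: /mutate
      port: 443
    caBundle: <base64-encoded CA bundle>   # placeholder
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
    scope: "Namespaced"
  namespaceSelector:
    matchExpressions:
    - key: runlevel
      operator: NotIn
      values: ["0", "1"]
  matchConditions:
  - name: exclude-kubelet-requests
    expression: '!("system:nodes" in request.userInfo.groups)'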
MutatingWebhookConfigurationList
MutatingWebhookConfigurationList is a list of MutatingWebhookConfiguration.
webhooks.admissionReviewVersions ([]string), required
AdmissionReviewVersions is an ordered list of preferred AdmissionReview versions the Webhook expects. The API server will try to use the first version in the list which it supports. If none of the versions specified in this list are supported by the API server, validation will fail for this object. If a persisted webhook configuration specifies allowed versions and does not include any versions known to the API Server, calls to the webhook will fail and be subject to the failure policy.
webhooks.clientConfig (WebhookClientConfig), required
ClientConfig defines how to communicate with the hook. Required
WebhookClientConfig contains the information to make a TLS connection with the webhook
webhooks.clientConfig.caBundle ([]byte)
caBundle is a PEM encoded CA bundle which will be used to validate the webhook's server certificate. If unspecified, system trust roots on the apiserver are used.
webhooks.clientConfig.service (ServiceReference)
service is a reference to the service for this webhook. Either service or url must be specified.
If the webhook is running within the cluster, then you should use service.
ServiceReference holds a reference to Service.legacy.k8s.io
webhooks.clientConfig.service.namespace (string), required
namespace is the namespace of the service. Required
webhooks.clientConfig.service.path (string)
path is an optional URL path which will be sent in any request to this service.
webhooks.clientConfig.service.port (int32)
If specified, the port on the service that is hosting the webhook. Defaults to 443 for backward compatibility. port should be a valid port number (1-65535, inclusive).
webhooks.clientConfig.url (string)
url gives the location of the webhook, in standard URL form (scheme://host:port/path). Exactly one of url or service must be specified.
The host should not refer to a service running in the cluster; use the service field instead. The host might be resolved via external DNS in some apiservers (e.g., kube-apiserver cannot resolve in-cluster DNS as that would be a layering violation). host may also be an IP address.
Please note that using localhost or 127.0.0.1 as a host is risky unless you take great care to run this webhook on all hosts which run an apiserver which might need to make calls to this webhook. Such installs are likely to be non-portable, i.e., not easy to turn up in a new cluster.
The scheme must be "https"; the URL must begin with "https://".
A path is optional, and if present may be any string permissible in a URL. You may use the path to pass an arbitrary string to the webhook, for example, a cluster identifier.
Attempting to use a user or basic auth e.g. "user:password@" is not allowed. Fragments ("#...") and query parameters ("?...") are not allowed, either.
webhooks.name (string), required
The name of the admission webhook. Name should be fully qualified, e.g., imagepolicy.kubernetes.io, where "imagepolicy" is the name of the webhook, and kubernetes.io is the name of the organization. Required.
webhooks.sideEffects (string), required
SideEffects states whether this webhook has side effects. Acceptable values are: None, NoneOnDryRun (webhooks created via v1beta1 may also specify Some or Unknown). Webhooks with side effects MUST implement a reconciliation system, since a request may be rejected by a future step in the admission chain and the side effects therefore need to be undone. Requests with the dryRun attribute will be auto-rejected if they match a webhook with sideEffects == Unknown or Some.
webhooks.failurePolicy (string)
FailurePolicy defines how unrecognized errors from the admission endpoint are handled - allowed values are Ignore or Fail. Defaults to Fail.
webhooks.matchConditions ([]MatchCondition)
Patch strategy: merge on key name
Map: unique values on key name will be kept during a merge
MatchConditions is a list of conditions that must be met for a request to be sent to this webhook. Match conditions filter requests that have already been matched by the rules, namespaceSelector, and objectSelector. An empty list of matchConditions matches all requests. There are a maximum of 64 match conditions allowed.
The exact matching logic is (in order):
If ANY matchCondition evaluates to FALSE, the webhook is skipped.
If ALL matchConditions evaluate to TRUE, the webhook is called.
If any matchCondition evaluates to an error (but none are FALSE):
If failurePolicy=Fail, reject the request
If failurePolicy=Ignore, the error is ignored and the webhook is skipped
MatchCondition represents a condition which must by fulfilled for a request to be sent to a webhook.
webhooks.matchConditions.expression (string), required
Expression represents the expression which will be evaluated by CEL. Must evaluate to bool. CEL expressions have access to the contents of the AdmissionRequest and Authorizer, organized into CEL variables:
'object' - The object from the incoming request. The value is null for DELETE requests. 'oldObject' - The existing object. The value is null for CREATE requests. 'request' - Attributes of the admission request (/pkg/apis/admission/types.go#AdmissionRequest). 'authorizer' - A CEL Authorizer. May be used to perform authorization checks for the principal (user or service account) of the request.
See https://pkg.go.dev/k8s.io/apiserver/pkg/cel/library#Authz
'authorizer.requestResource' - A CEL ResourceCheck constructed from the 'authorizer' and configured with the
request resource.
Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
Required.
webhooks.matchConditions.name (string), required
Name is an identifier for this match condition, used for strategic merging of MatchConditions, as well as providing an identifier for logging purposes. A good name should be descriptive of the associated expression. Name must be a qualified name consisting of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Required.
webhooks.matchPolicy (string)
matchPolicy defines how the "rules" list is used to match incoming requests. Allowed values are "Exact" or "Equivalent".
Exact: match a request only if it exactly matches a specified rule. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, but "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would not be sent to the webhook.
Equivalent: match a request if it modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the webhook.
webhooks.namespaceSelector (LabelSelector)
NamespaceSelector decides whether to run the webhook on an object based on whether the namespace for that object matches the selector. If the object itself is a namespace, the matching is performed on object.metadata.labels. If the object is another cluster scoped resource, it never skips the webhook.
For example, to run the webhook on any objects whose namespace is not associated with "runlevel" of "0" or "1"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "runlevel",
"operator": "NotIn",
"values": [
"0",
"1"
]
}
]
}
If instead you want to only run the webhook on any objects whose namespace is associated with the "environment" of "prod" or "staging"; you will set the selector as follows: "namespaceSelector": {
"matchExpressions": [
{
"key": "environment",
"operator": "In",
"values": [
"prod",
"staging"
]
}
]
}
webhooks.objectSelector (LabelSelector)
ObjectSelector decides whether to run the webhook based on if the object has matching labels. objectSelector is evaluated against both the oldObject and newObject that would be sent to the webhook, and is considered to match if either object matches the selector. A null object (oldObject in the case of create, or newObject in the case of delete) or an object that cannot have labels (like a DeploymentRollback or a PodProxyOptions object) is not considered to match. Use the object selector only if the webhook is opt-in, because end users may skip the admission webhook by setting the labels. Defaults to the empty LabelSelector, which matches everything.
webhooks.rules ([]RuleWithOperations)
Atomic: will be replaced during a merge
Rules describes what operations on what resources/subresources the webhook cares about. The webhook cares about an operation if it matches any Rule. However, in order to prevent ValidatingAdmissionWebhooks and MutatingAdmissionWebhooks from putting the cluster in a state which cannot be recovered from without completely disabling the plugin, ValidatingAdmissionWebhooks and MutatingAdmissionWebhooks are never called on admission requests for ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects.
RuleWithOperations is a tuple of Operations and Resources. It is recommended to make sure that all the tuple expansions are valid.
webhooks.rules.apiGroups ([]string)
Atomic: will be replaced during a merge
APIGroups is the API groups the resources belong to. '*' is all groups. If '*' is present, the length of the slice must be one. Required.
webhooks.rules.apiVersions ([]string)
Atomic: will be replaced during a merge
APIVersions is the API versions the resources belong to. '*' is all versions. If '*' is present, the length of the slice must be one. Required.
webhooks.rules.operations ([]string)
Atomic: will be replaced during a merge
Operations is the operations the admission hook cares about - CREATE, UPDATE, DELETE, CONNECT or * for all of those operations and any future admission operations that are added. If '*' is present, the length of the slice must be one. Required.
webhooks.rules.resources ([]string)
Atomic: will be replaced during a merge
Resources is a list of resources this rule applies to.
For example: 'pods' means pods. 'pods/log' means the log subresource of pods. '*' means all resources, but not subresources. 'pods/*' means all subresources of pods. '*/scale' means all scale subresources. '*/*' means all resources and their subresources.
If wildcard is present, the validation rule will ensure resources do not overlap with each other.
Depending on the enclosing object, subresources might not be allowed. Required.
webhooks.rules.scope (string)
scope specifies the scope of this rule. Valid values are "Cluster", "Namespaced", and "*". "Cluster" means that only cluster-scoped resources will match this rule. Namespace API objects are cluster-scoped. "Namespaced" means that only namespaced resources will match this rule. "*" means that there are no scope restrictions. Subresources match the scope of their parent resource. Default is "*".
webhooks.timeoutSeconds (int32)
TimeoutSeconds specifies the timeout for this webhook. After the timeout passes, the webhook call will be ignored or the API call will fail based on the failure policy. The timeout value must be between 1 and 30 seconds. Defaults to 10 seconds.
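For contrast with the service-based example earlier, a hedged ValidatingWebhookConfiguration sketch using a URL-based clientConfig and an objectSelector; the URL, names and labels are illustrative assumptions:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-validating-webhook
webhooks:
- name: podpolicy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore
  matchPolicy: Equivalent
  timeoutSeconds: 5
  clientConfig:
    url: https://webhook.example.com:8443/validate   # must be https; no user info, query or fragment
    caBundle: <base64-encoded CA bundle>             # placeholder
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
  objectSelector:
    matchLabels:
      webhook: enabled                               # opt-in label; see the objectSelector caveat above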
ValidatingWebhookConfigurationList
ValidatingWebhookConfigurationList is a list of ValidatingWebhookConfiguration.
status (APIServiceStatus)
Status contains derived information about an API server
APIServiceSpec
APIServiceSpec contains information for locating and communicating with a server. Only https is supported, though you are able to disable certificate verification.
groupPriorityMinimum (int32), required
GroupPriorityMinimum is the priority this group should have at least. Higher priority means that the group is preferred by clients over lower priority ones. Note that other versions of this group might specify even higher GroupPriorityMinimum values such that the whole group gets a higher priority. The primary sort is based on GroupPriorityMinimum, ordered highest number to lowest (20 before 10). The secondary sort is based on the alphabetical comparison of the name of the object. (v1.bar before v1.foo) We'd recommend something like: *.k8s.io (except extensions) at 18000 and PaaSes (OpenShift, Deis) are recommended to be in the 2000s
versionPriority (int32), required
VersionPriority controls the ordering of this API version inside of its group. Must be greater than zero. The primary sort is based on VersionPriority, ordered highest to lowest (20 before 10). Since it's inside of a group, the number can be small, probably in the 10s. In case of equal version priorities, the version string will be used to compute the order inside a group. If the version string is "kube-like", it will sort above non "kube-like" version strings, which are ordered lexicographically. "Kube-like" versions start with a "v", then are followed by a number (the major version), then optionally the string "alpha" or "beta" and another number (the minor version). These are sorted first by GA > beta > alpha (where GA is a version with no suffix such as beta or alpha), and then by comparing major version, then minor version. An example sorted list of versions: v10, v2, v1, v11beta2, v10beta3, v3beta1, v12alpha1, v11alpha2, foo1, foo10.
caBundle ([]byte)
Atomic: will be replaced during a merge
CABundle is a PEM encoded CA bundle which will be used to validate an API server's serving certificate. If unspecified, system trust roots on the apiserver are used.
group (string)
Group is the API group name this server hosts
insecureSkipTLSVerify (boolean)
InsecureSkipTLSVerify disables TLS certificate verification when communicating with this server. This is strongly discouraged. You should use the CABundle instead.
service (ServiceReference)
Service is a reference to the service for this API server. It must communicate on port 443. If the Service is nil, that means the handling for the API groupversion is handled locally on this server. The call will simply delegate to the normal handler chain to be fulfilled.
ServiceReference holds a reference to Service.legacy.k8s.io
service.name (string)
Name is the name of the service
service.namespace (string)
Namespace is the namespace of the service
service.port (int32)
If specified, the port on the service that is hosting the webhook. Defaults to 443 for backward compatibility. port should be a valid port number (1-65535, inclusive).
version (string)
Version is the API version this server hosts. For example, "v1"
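A hedged example of an APIService registering an extension API with the aggregation layer; the group, namespace and service name are illustrative assumptions, and the apiregistration.k8s.io/v1 API group is assumed since it is not restated in this excerpt:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.example.com   # must be <version>.<group>
spec:
  group: metrics.example.com
  version: v1beta1
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    namespace: example-system
    name: metrics-server
    port: 443
  caBundle: <base64-encoded CA bundle>   # placeholder; prefer this over insecureSkipTLSVerify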
APIServiceStatus
APIServiceStatus contains derived information about an API server
conditions ([]APIServiceCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Current service state of apiService.
APIServiceCondition describes the state of an APIService at a particular point
conditions.status (string), required
Status is the status of the condition. Can be True, False, Unknown.
conditions.type (string), required
Type is the type of the condition.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
Human-readable message indicating details about last transition.
conditions.reason (string)
Unique, one-word, CamelCase reason for the condition's last transition.
Event is a report of an event somewhere in the cluster.
apiVersion: events.k8s.io/v1
import "k8s.io/api/events/v1"
Event
Event is a report of an event somewhere in the cluster. It generally denotes some state change in the system. Events have a limited retention time and triggers and messages may evolve with time. Event consumers should not rely on the timing of an event with a given Reason reflecting a consistent underlying trigger, or the continued existence of events with that Reason. Events should be treated as informative, best-effort, supplemental data.
eventTime (MicroTime), required
eventTime is the time when this Event was first observed. It is required.
MicroTime is a version of Time with microsecond-level precision.
action (string)
action is what action was taken or failed with respect to the regarding object. It is machine-readable. This field cannot be empty for new Events and it can have at most 128 characters.
deprecatedCount (int32)
deprecatedCount is the deprecated field assuring backward compatibility with core.v1 Event type.
deprecatedFirstTimestamp (Time)
deprecatedFirstTimestamp is the deprecated field assuring backward compatibility with core.v1 Event type.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
deprecatedLastTimestamp (Time)
deprecatedLastTimestamp is the deprecated field assuring backward compatibility with core.v1 Event type.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
deprecatedSource (EventSource)
deprecatedSource is the deprecated field assuring backward compatibility with core.v1 Event type.
EventSource contains information for an event.
deprecatedSource.component (string)
Component from which the event is generated.
deprecatedSource.host (string)
Node name on which the event is generated.
note (string)
note is a human-readable description of the status of this operation. Maximal length of the note is 1kB, but libraries should be prepared to handle values up to 64kB.
reason (string)
reason is why the action was taken. It is human-readable. This field cannot be empty for new Events and it can have at most 128 characters.
regarding (ObjectReference)
regarding contains the object this Event is about. In most cases it's the object that the reporting controller implements, e.g. ReplicaSetController implements ReplicaSets and this event is emitted because it acts on some changes in a ReplicaSet object.
related (ObjectReference)
related is the optional secondary object for more complex actions, e.g. when the regarding object triggers a creation or deletion of the related object.
reportingController (string)
reportingController is the name of the controller that emitted this Event, e.g. kubernetes.io/kubelet. This field cannot be empty for new Events.
reportingInstance (string)
reportingInstance is the ID of the controller instance, e.g. kubelet-xyzf. This field cannot be empty for new Events and it can have at most 128 characters.
series (EventSeries)
series is data about the Event series this event represents or nil if it's a singleton Event.
EventSeries contains information on a series of events, i.e. something that was/is happening continuously for some time. How often to update the EventSeries is up to the event reporters. The default event reporter in "k8s.io/client-go/tools/events/event_broadcaster.go" shows how this struct is updated on heartbeats and can guide customized reporter implementations.
series.count (int32), required
count is the number of occurrences in this series up to the last heartbeat time.
series.lastObservedTime (MicroTime), required
lastObservedTime is the time when last Event from the series was seen before last heartbeat.
MicroTime is a version of Time with microsecond-level precision.
type (string)
type is the type of this event (Normal, Warning), new types could be added in the future. It is machine-readable. This field cannot be empty for new Events.
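A hedged sketch of an events.k8s.io/v1 Event as a reporting controller might create it; the object names, timestamps and controller identifiers are illustrative assumptions:

apiVersion: events.k8s.io/v1
kind: Event
metadata:
  name: example-pod.17f3a1b2c4d5e6f7
  namespace: default
eventTime: "2024-01-01T12:00:00.000000Z"   # MicroTime, required
type: Normal
reason: Scheduled
action: Binding
note: Successfully assigned default/example-pod to node-1
reportingController: example.com/sample-scheduler
reportingInstance: sample-scheduler-node-1
regarding:
  apiVersion: v1
  kind: Pod
  name: example-pod
  namespace: default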
IPAddress represents a single IP of a single IP Family.
apiVersion: networking.k8s.io/v1
import "k8s.io/api/networking/v1"
IPAddress
IPAddress represents a single IP of a single IP Family. The object is designed to be used by APIs that operate on IP addresses. The object is used by the Service core API for allocation of IP addresses. An IP address can be represented in different formats; to guarantee the uniqueness of the IP, the name of the object is the IP address in canonical format: four decimal octets separated by dots with leading zeros suppressed for IPv4, and the representation defined by RFC 5952 for IPv6. Valid: 192.168.1.5 or 2001:db8::1 or 2001:db8:aaaa:bbbb:cccc:dddd:eeee:1 Invalid: 10.01.2.3 or 2001:db8:0:0:0::1
acquireTime (MicroTime)
acquireTime is a time when the current lease was acquired.
MicroTime is a version of Time with microsecond-level precision.
holderIdentity (string)
holderIdentity contains the identity of the holder of a current lease. If Coordinated Leader Election is used, the holder identity must be equal to the elected LeaseCandidate.metadata.name field.
leaseDurationSeconds (int32)
leaseDurationSeconds is a duration that candidates for a lease need to wait to force acquire it. This is measured against the time of last observed renewTime.
leaseTransitions (int32)
leaseTransitions is the number of transitions of a lease between holders.
preferredHolder (string)
PreferredHolder signals to a lease holder that the lease has a more optimal holder and should be given up. This field can only be set if Strategy is also set.
renewTime (MicroTime)
renewTime is a time when the current holder of a lease has last updated the lease.
MicroTime is a version of Time with microsecond-level precision.
strategy (string)
Strategy indicates the strategy for picking the leader for coordinated leader election. If the field is not specified, there is no active coordination for this lease. (Alpha) Using this field requires the CoordinatedLeaderElection feature gate to be enabled.
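For context, a hedged example of a coordination.k8s.io/v1 Lease carrying the fields above; the holder identity, names and timestamps are illustrative assumptions, and the Lease kind itself is assumed since this excerpt only lists its spec fields:

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: example-controller-leader
  namespace: kube-system
spec:
  holderIdentity: example-controller-7d4b9c-abcde
  leaseDurationSeconds: 15
  acquireTime: "2024-01-01T12:00:00.000000Z"
  renewTime: "2024-01-01T12:05:30.000000Z"
  leaseTransitions: 3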
LeaseCandidate defines a candidate for a Lease object.
apiVersion: coordination.k8s.io/v1beta1
import "k8s.io/api/coordination/v1beta1"
LeaseCandidate
LeaseCandidate defines a candidate for a Lease object. Candidates are created such that coordinated leader election will pick the best leader from the list of candidates.
binaryVersion (string), required
BinaryVersion is the binary version. It must be in a semver format without leading v. This field is required.
leaseName (string), required
LeaseName is the name of the lease for which this candidate is contending. The limits on this field are the same as on Lease.name. Multiple lease candidates may reference the same Lease.name. This field is immutable.
strategy (string), required
Strategy is the strategy that coordinated leader election will use for picking the leader. If multiple candidates for the same Lease return different strategies, the strategy provided by the candidate with the latest BinaryVersion will be used. If there is still conflict, this is a user error and coordinated leader election will not operate the Lease until resolved.
emulationVersion (string)
EmulationVersion is the emulation version. It must be in a semver format without leading v. EmulationVersion must be less than or equal to BinaryVersion. This field is required when strategy is "OldestEmulationVersion"
pingTime (MicroTime)
PingTime is the last time that the server has requested the LeaseCandidate to renew. It is only done during leader election to check if any LeaseCandidates have become ineligible. When PingTime is updated, the LeaseCandidate will respond by updating RenewTime.
MicroTime is a version of Time with microsecond-level precision.
renewTime (MicroTime)
RenewTime is the time that the LeaseCandidate was last updated. Any time a Lease needs to do leader election, the PingTime field is updated to signal to the LeaseCandidate that they should update the RenewTime. Old LeaseCandidate objects are also garbage collected if it has been hours since the last renew. The PingTime field is updated regularly to prevent garbage collection for still active LeaseCandidates.
MicroTime is a version of Time with microsecond-level precision.
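A hedged LeaseCandidate sketch using the fields above; the names and versions are illustrative assumptions:

apiVersion: coordination.k8s.io/v1beta1
kind: LeaseCandidate
metadata:
  name: example-apiserver-1
  namespace: kube-system
spec:
  leaseName: example-leader-lease
  binaryVersion: "1.33.0"         # semver, no leading v
  emulationVersion: "1.32.0"      # must be <= binaryVersion
  strategy: OldestEmulationVersion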
NamespaceStatus is information about the current status of a Namespace.
conditions ([]NamespaceCondition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
Represents the latest available observations of a namespace's current state.
NamespaceCondition contains details about state of namespace.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of namespace controller condition.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
Human-readable message indicating details about last transition.
conditions.reason (string)
Unique, one-word, CamelCase reason for the condition's last transition.
NodeSpec describes the attributes that a node is created with.
configSource (NodeConfigSource)
Deprecated: Previously used to specify the source of the node's configuration for the DynamicKubeletConfig feature. This feature is removed.
NodeConfigSource specifies a source of node configuration. Exactly one subfield (excluding metadata) must be non-nil. This API is deprecated since 1.22
podCIDR (string)
PodCIDR represents the pod IP range assigned to the node.
podCIDRs ([]string)
Set: unique values will be kept during a merge
podCIDRs represents the IP ranges assigned to the node for usage by Pods on that node. If this field is specified, the 0th entry must match the podCIDR field. It may contain at most 1 value for each of IPv4 and IPv6.
providerID (string)
ID of the node assigned by the cloud provider in the format: <ProviderName>://<ProviderSpecificNodeID>
taints ([]Taint)
Atomic: will be replaced during a merge
If specified, the node's taints.
The node this Taint is attached to has the "effect" on any pod that does not tolerate the Taint.
taints.effect (string), required
Required. The effect of the taint on pods that do not tolerate the taint. Valid effects are NoSchedule, PreferNoSchedule and NoExecute.
taints.key (string), required
Required. The taint key to be applied to a node.
taints.timeAdded (Time)
TimeAdded represents the time at which the taint was added. It is only written for NoExecute taints.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
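A hedged NodeSpec fragment showing podCIDR, providerID and a taint; the node name, CIDR, provider ID and taint key are illustrative assumptions:

apiVersion: v1
kind: Node
metadata:
  name: node-1
spec:
  podCIDR: 10.244.1.0/24
  podCIDRs:
  - 10.244.1.0/24                 # 0th entry must match podCIDR
  providerID: example-cloud://us-east-1/i-0123456789abcdef0
  taints:
  - key: dedicated.example.com/gpu
    effect: NoSchedule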
NodeStatus is information about the current status of a node.
addresses ([]NodeAddress)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
List of addresses reachable to the node. Queried from cloud provider, if available. More info: https://kubernetes.io/docs/reference/node/node-status/#addresses Note: This field is declared as mergeable, but the merge key is not sufficiently unique, which can cause data corruption when it is merged. Callers should instead use a full-replacement patch. See https://pr.k8s.io/79391 for an example. Consumers should assume that addresses can change during the lifetime of a Node. However, there are some exceptions where this may not be possible, such as Pods that inherit a Node's address in its own status or consumers of the downward API (status.hostIP).
NodeAddress contains information for the node's address.
addresses.address (string), required
The node address.
addresses.type (string), required
Node address type, one of Hostname, ExternalIP or InternalIP.
NodeCondition contains condition information for a node.
conditions.status (string), required
Status of the condition, one of True, False, Unknown.
conditions.type (string), required
Type of node condition.
conditions.lastHeartbeatTime (Time)
Last time we got an update on a given condition.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.lastTransitionTime (Time)
Last time the condition transitioned from one status to another.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string)
Human-readable message indicating details about last transition.
conditions.reason (string)
(brief) reason for the condition's last transition.
config (NodeConfigStatus)
Status of the config assigned to the node via the dynamic Kubelet config feature.
NodeConfigStatus describes the status of the config assigned by Node.Spec.ConfigSource.
config.active (NodeConfigSource)
Active reports the checkpointed config the node is actively using. Active will represent either the current version of the Assigned config, or the current LastKnownGood config, depending on whether attempting to use the Assigned config results in an error.
NodeConfigSource specifies a source of node configuration. Exactly one subfield (excluding metadata) must be non-nil. This API is deprecated since 1.22
config.active.configMap.namespace (string), required
Namespace is the metadata.namespace of the referenced ConfigMap. This field is required in all cases.
config.active.configMap.resourceVersion (string)
ResourceVersion is the metadata.ResourceVersion of the referenced ConfigMap. This field is forbidden in Node.Spec, and required in Node.Status.
config.active.configMap.uid (string)
UID is the metadata.UID of the referenced ConfigMap. This field is forbidden in Node.Spec, and required in Node.Status.
config.assigned (NodeConfigSource)
Assigned reports the checkpointed config the node will try to use. When Node.Spec.ConfigSource is updated, the node checkpoints the associated config payload to local disk, along with a record indicating intended config. The node refers to this record to choose its config checkpoint, and reports this record in Assigned. Assigned only updates in the status after the record has been checkpointed to disk. When the Kubelet is restarted, it tries to make the Assigned config the Active config by loading and validating the checkpointed payload identified by Assigned.
NodeConfigSource specifies a source of node configuration. Exactly one subfield (excluding metadata) must be non-nil. This API is deprecated since 1.22
config.assigned.configMap.resourceVersion (string)
ResourceVersion is the metadata.ResourceVersion of the referenced ConfigMap. This field is forbidden in Node.Spec, and required in Node.Status.
config.assigned.configMap.uid (string)
UID is the metadata.UID of the referenced ConfigMap. This field is forbidden in Node.Spec, and required in Node.Status.
config.error (string)
Error describes any problems reconciling the Spec.ConfigSource to the Active config. Errors may occur, for example, attempting to checkpoint Spec.ConfigSource to the local Assigned record, attempting to checkpoint the payload associated with Spec.ConfigSource, attempting to load or validate the Assigned config, etc. Errors may occur at different points while syncing config. Earlier errors (e.g. download or checkpointing errors) will not result in a rollback to LastKnownGood, and may resolve across Kubelet retries. Later errors (e.g. loading or validating a checkpointed config) will result in a rollback to LastKnownGood. In the latter case, it is usually possible to resolve the error by fixing the config assigned in Spec.ConfigSource. You can find additional information for debugging by searching the error message in the Kubelet log. Error is a human-readable description of the error state; machines can check whether or not Error is empty, but should not rely on the stability of the Error text across Kubelet versions.
config.lastKnownGood (NodeConfigSource)
LastKnownGood reports the checkpointed config the node will fall back to when it encounters an error attempting to use the Assigned config. The Assigned config becomes the LastKnownGood config when the node determines that the Assigned config is stable and correct. This is currently implemented as a 10-minute soak period starting when the local record of Assigned config is updated. If the Assigned config is Active at the end of this period, it becomes the LastKnownGood. Note that if Spec.ConfigSource is reset to nil (use local defaults), the LastKnownGood is also immediately reset to nil, because the local default config is always assumed good. You should not make assumptions about the node's method of determining config stability and correctness, as this may change or become configurable in the future.
NodeConfigSource specifies a source of node configuration. Exactly one subfield (excluding metadata) must be non-nil. This API is deprecated since 1.22
Features describes the set of features implemented by the CRI implementation.
NodeFeatures describes the set of features implemented by the CRI implementation. The features contained in the NodeFeatures should depend only on the CRI implementation, independent of runtime handlers.
features.supplementalGroupsPolicy (boolean)
SupplementalGroupsPolicy is set to true if the runtime supports SupplementalGroupsPolicy and ContainerUser.
images ([]ContainerImage)
Atomic: will be replaced during a merge
List of container images on this node
Describe a container image
images.names ([]string)
Atomic: will be replaced during a merge
Names by which this image is known. e.g. ["kubernetes.example/hyperkube:v1.0.7", "cloud-vendor.registry.example/cloud-vendor/hyperkube:v1.0.7"]
RuntimeClass defines a class of container runtime supported in the cluster.
apiVersion: node.k8s.io/v1
import "k8s.io/api/node/v1"
RuntimeClass
RuntimeClass defines a class of container runtime supported in the cluster. The RuntimeClass is used to determine which container runtime is used to run all containers in a pod. RuntimeClasses are manually defined by a user or cluster provisioner, and referenced in the PodSpec. The Kubelet is responsible for resolving the RuntimeClassName reference before running the pod. For more details, see https://kubernetes.io/docs/concepts/containers/runtime-class/
handler (string), required
handler specifies the underlying runtime and configuration that the CRI implementation will use to handle pods of this class. The possible values are specific to the node & CRI configuration. It is assumed that all handlers are available on every node, and handlers of the same name are equivalent on every node. For example, a handler called "runc" might specify that the runc OCI runtime (using native Linux containers) will be used to run the containers in a pod. The Handler must be lowercase, conform to the DNS Label (RFC 1123) requirements, and is immutable.
podFixed represents the fixed resource overhead associated with running a pod.
scheduling (Scheduling)
scheduling holds the scheduling constraints to ensure that pods running with this RuntimeClass are scheduled to nodes that support it. If scheduling is nil, this RuntimeClass is assumed to be supported by all nodes.
Scheduling specifies the scheduling constraints for nodes supporting a RuntimeClass.
scheduling.nodeSelector (map[string]string)
nodeSelector lists labels that must be present on nodes that support this RuntimeClass. Pods using this RuntimeClass can only be scheduled to a node matched by this selector. The RuntimeClass nodeSelector is merged with a pod's existing nodeSelector. Any conflicts will cause the pod to be rejected in admission.
scheduling.tolerations ([]Toleration)
Atomic: will be replaced during a merge
tolerations are appended (excluding duplicates) to pods running with this RuntimeClass during admission, effectively unioning the set of nodes tolerated by the pod and the RuntimeClass.
The pod this Toleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.
scheduling.tolerations.key (string)
Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys.
scheduling.tolerations.operator (string)
Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of a particular category.
scheduling.tolerations.value (string)
Value is the taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise just a regular string.
scheduling.tolerations.effect (string)
Effect indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
scheduling.tolerations.tolerationSeconds (int64)
TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system.
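Putting these fields together, a hedged sketch of a RuntimeClass that points at a hypothetical handler and restricts scheduling to labeled nodes might look like this (the class name, handler name, label key, and taint key are all assumptions, not values defined by Kubernetes):
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sandboxed             # hypothetical RuntimeClass name
handler: myhandler            # hypothetical handler configured in the CRI runtime
scheduling:
  nodeSelector:
    runtime.example/sandboxed: "true"     # hypothetical node label
  tolerations:
  - key: runtime.example/sandboxed        # hypothetical taint key
    operator: Exists
    effect: NoSchedule
A Pod would opt in by setting spec.runtimeClassName to the RuntimeClass name.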
RuntimeClassList
RuntimeClassList is a list of RuntimeClass objects.
ServiceCIDR defines a range of IP addresses using CIDR format (e.g. 192.168.0.0/24 or 2001:db2::/64).
apiVersion: networking.k8s.io/v1
import "k8s.io/api/networking/v1"
ServiceCIDR
ServiceCIDR defines a range of IP addresses using CIDR format (e.g. 192.168.0.0/24 or 2001:db2::/64). This range is used to allocate ClusterIPs to Service objects.
ServiceCIDRSpec define the CIDRs the user wants to use for allocating ClusterIPs for Services.
cidrs ([]string)
Atomic: will be replaced during a merge
CIDRs defines the IP blocks in CIDR notation (e.g. "192.168.0.0/24" or "2001:db8::/64") from which to assign service cluster IPs. Max of two CIDRs is allowed, one of each IP family. This field is immutable.
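As a sketch, a minimal ServiceCIDR manifest adding one range of each IP family could look like this (the object name and the address ranges are illustrative):
apiVersion: networking.k8s.io/v1
kind: ServiceCIDR
metadata:
  name: extra-service-range   # illustrative name
spec:
  cidrs:
  - 10.96.100.0/24            # illustrative IPv4 range
  - 2001:db8:42::/112         # illustrative IPv6 range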
ServiceCIDRStatus
ServiceCIDRStatus describes the current state of the ServiceCIDR.
conditions ([]Condition)
Patch strategy: merge on key type
Map: unique values on key type will be kept during a merge
conditions holds an array of metav1.Condition that describe the state of the ServiceCIDR. Current service state
Condition contains details for one aspect of the current state of this API Resource.
conditions.lastTransitionTime (Time), required
lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
conditions.message (string), required
message is a human readable message indicating details about the transition. This may be an empty string.
conditions.reason (string), required
reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
conditions.status (string), required
status of the condition, one of True, False, Unknown.
conditions.type (string), required
type of condition in CamelCase or in foo.example.com/CamelCase.
conditions.observedGeneration (int64)
observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
ServiceCIDRList
ServiceCIDRList contains a list of ServiceCIDR objects.
When present, indicates that modifications should not be persisted. An invalid or unrecognized dryRun directive will result in an error response and no further processing of the request. Valid values are: - All: all dry run stages will be processed
gracePeriodSeconds (int64)
The duration in seconds before the object should be deleted. Value must be non-negative integer. The value zero indicates delete immediately. If this value is nil, the default grace period for the specified type will be used. Defaults to a per object value if not specified. zero means delete immediately.
if set to true, it will trigger an unsafe deletion of the resource in case the normal deletion flow fails with a corrupt object error. A resource is considered corrupt if it can not be retrieved from the underlying storage successfully because of a) its data can not be transformed e.g. decryption failure, or b) it fails to decode into an object. NOTE: unsafe deletion ignores finalizer constraints, skips precondition checks, and removes the object from the storage. WARNING: This may potentially break the cluster if the workload associated with the resource being unsafe-deleted relies on normal deletion flow. Use only if you REALLY know what you are doing. The default value is false, and the user must opt in to enable it
Deprecated: please use the PropagationPolicy, this field will be deprecated in 1.7. Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object's finalizers list. Either this field or PropagationPolicy may be set, but not both.
preconditions (Preconditions)
Must be fulfilled before a deletion is carried out. If not possible, a 409 Conflict status will be returned.
Preconditions must be fulfilled before an operation (update, delete, etc.) is carried out.
preconditions.resourceVersion (string)
Specifies the target ResourceVersion
preconditions.uid (string)
Specifies the target UID.
propagationPolicy (string)
Whether and how garbage collection will be performed. Either this field or OrphanDependents may be set, but not both. The default policy is decided by the existing finalizer set in the metadata.finalizers and the resource-specific default policy. Acceptable values are: 'Orphan' - orphan the dependents; 'Background' - allow the garbage collector to delete the dependents in the background; 'Foreground' - a cascading policy that deletes all dependents in the foreground.
5.9.2 - LabelSelector
A label selector is a label query over a set of resources.
import "k8s.io/apimachinery/pkg/apis/meta/v1"
A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects.
matchExpressions ([]LabelSelectorRequirement)
Atomic: will be replaced during a merge
matchExpressions is a list of label selector requirements. The requirements are ANDed.
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
matchExpressions.key (string), required
key is the label key that the selector applies to.
matchExpressions.operator (string), required
operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.
matchExpressions.values ([]string)
Atomic: will be replaced during a merge
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
matchLabels (map[string]string)
matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.
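To make the AND semantics concrete, here is a hedged sketch of a selector that combines both forms, as it might appear in a workload's spec.selector (the label keys and values are made up):
matchLabels:
  app: web                    # object must carry app=web
matchExpressions:
- key: tier
  operator: In                # tier must be one of the listed values
  values:
  - frontend
  - cache
- key: legacy
  operator: DoesNotExist      # object must not carry the legacy label
This selector matches only objects that satisfy all three requirements at once.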
5.9.3 - ListMeta
ListMeta describes metadata that synthetic resources must have, including lists and various status objects.
import "k8s.io/apimachinery/pkg/apis/meta/v1"
ListMeta describes metadata that synthetic resources must have, including lists and various status objects. A resource may have only one of {ObjectMeta, ListMeta}.
continue (string)
continue may be set if the user set a limit on the number of items returned, and indicates that the server has more data available. The value is opaque and may be used to issue another request to the endpoint that served this list to retrieve the next set of available objects. Continuing a consistent list may not be possible if the server configuration has changed or more than a few minutes have passed. The resourceVersion field returned when using this continue value will be identical to the value in the first response, unless you have received this token from an error message.
remainingItemCount (int64)
remainingItemCount is the number of subsequent items in the list which are not included in this list response. If the list request contained label or field selectors, then the number of remaining items is unknown and the field will be left unset and omitted during serialization. If the list is complete (either because it is not chunking or because this is the last chunk), then there are no more remaining items and this field will be left unset and omitted during serialization. Servers older than v1.15 do not set this field. The intended use of the remainingItemCount is estimating the size of a collection. Clients should not rely on the remainingItemCount to be set or to be exact.
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
import "k8s.io/api/core/v1"
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
key (string), required
The label key that the selector applies to.
operator (string), required
Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
values ([]string)
Atomic: will be replaced during a merge
An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch.
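As an illustration of the single-value Gt/Lt operators, a node affinity term in a Pod spec might look like this (the custom label key is hypothetical):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: example.com/cpu-cores    # hypothetical node label
          operator: Gt
          values:
          - "8"                         # single element, interpreted as an integer
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
          - arm64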
5.9.6 - ObjectFieldSelector
ObjectFieldSelector selects an APIVersioned field of an object.
import "k8s.io/api/core/v1"
ObjectFieldSelector selects an APIVersioned field of an object.
fieldPath (string), required
Path of the field to select in the specified API version.
apiVersion (string)
Version of the schema the FieldPath is written in terms of, defaults to "v1".
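ObjectFieldSelector is most commonly used with the downward API. A minimal sketch that exposes a Pod's own name and namespace to its container as environment variables:
env:
- name: MY_POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: MY_POD_NAMESPACE
  valueFrom:
    fieldRef:
      apiVersion: v1                    # optional; defaults to "v1"
      fieldPath: metadata.namespace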
5.9.7 - ObjectMeta
ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create.
import "k8s.io/apimachinery/pkg/apis/meta/v1"
ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create.
name (string)
Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names#names
generateName (string)
GenerateName is an optional prefix, used by the server, to generate a unique name ONLY IF the Name field has not been provided. If this field is used, the name returned to the client will be different than the name passed. This value will also be combined with a unique suffix. The provided value has the same validation rules as the Name field, and may be truncated by the length of the suffix required to make the value unique on the server.
If this field is specified and the generated name exists, the server will return a 409.
namespace (string)
Namespace defines the space within which each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty.
finalizers ([]string)
Must be empty before the object is deleted from the registry. Each entry is an identifier for the responsible component that will remove the entry from the list. If the deletionTimestamp of the object is non-nil, entries in this list can only be removed. Finalizers may be processed and removed in any order. Order is NOT enforced because it introduces significant risk of stuck finalizers. finalizers is a shared field, any actor with permission can reorder it. If the finalizer list is processed in order, then this can lead to a situation in which the component responsible for the first finalizer in the list is waiting for a signal (field value, external system, or other) produced by a component responsible for a finalizer later in the list, resulting in a deadlock. Without enforced ordering finalizers are free to order amongst themselves and are not vulnerable to ordering changes in the list.
managedFields ([]ManagedFieldsEntry)
Atomic: will be replaced during a merge
ManagedFields maps workflow-id and version to the set of fields that are managed by that workflow. This is mostly for internal housekeeping, and users typically shouldn't need to set or understand this field. A workflow can be the user's name, a controller's name, or the name of a specific apply path like "ci-cd". The set of fields is always in the version that the workflow used when modifying the object.
ManagedFieldsEntry is a workflow-id, a FieldSet and the group version of the resource that the fieldset applies to.
managedFields.apiVersion (string)
APIVersion defines the version of this resource that this field set applies to. The format is "group/version" just like the top-level APIVersion field. It is necessary to track the version of a field set because it cannot be automatically converted.
managedFields.fieldsType (string)
FieldsType is the discriminator for the different fields format and version. There is currently only one possible value: "FieldsV1"
managedFields.fieldsV1 (FieldsV1)
FieldsV1 holds the first JSON version format as described in the "FieldsV1" type.
FieldsV1 stores a set of fields in a data structure like a Trie, in JSON format.
Each key is either a '.' representing the field itself, and will always map to an empty set, or a string representing a sub-field or item. The string will follow one of these four formats: 'f:<name>', where <name> is the name of a field in a struct, or key in a map; 'v:<value>', where <value> is the exact json formatted value of a list item; 'i:<index>', where <index> is the position of an item in a list; 'k:<keys>', where <keys> is a map of a list item's key fields to their unique values. If a key maps to an empty Fields value, the field that key represents is part of the set.
The exact format is defined in sigs.k8s.io/structured-merge-diff
managedFields.manager (string)
Manager is an identifier of the workflow managing these fields.
managedFields.operation (string)
Operation is the type of operation which led to this ManagedFieldsEntry being created. The only valid values for this field are 'Apply' and 'Update'.
managedFields.subresource (string)
Subresource is the name of the subresource used to update that object, or empty string if the object was updated through the main resource. The value of this field is used to distinguish between managers, even if they share the same name. For example, a status update will be distinct from a regular update using the same manager name. Note that the APIVersion field is not related to the Subresource field and it always corresponds to the version of the main resource.
managedFields.time (Time)
Time is the timestamp of when the ManagedFields entry was added. The timestamp will also be updated if a field is added, the manager changes any of the owned fields value or removes a field. The timestamp does not update when a field is removed from the entry because another manager took it over.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
ownerReferences ([]OwnerReference)
Patch strategy: merge on key uid
Map: unique values on key uid will be kept during a merge
List of objects depended by this object. If ALL objects in the list have been deleted, this object will be garbage collected. If this object is managed by a controller, then an entry in this list will point to this controller, with the controller field set to true. There cannot be more than one managing controller.
OwnerReference contains enough information to let you identify an owning object. An owning object must be in the same namespace as the dependent, or be cluster-scoped, so there is no namespace field.
If true, AND if the owner has the "foregroundDeletion" finalizer, then the owner cannot be deleted from the key-value store until this reference is removed. See https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion for how the garbage collector interacts with this field and enforces the foreground deletion. Defaults to false. To set this field, a user needs "delete" permission of the owner, otherwise 422 (Unprocessable Entity) will be returned.
ownerReferences.controller (boolean)
If true, this reference points to the managing controller.
Read-only
creationTimestamp (Time)
CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
deletionGracePeriodSeconds (int64)
Number of seconds allowed for this object to gracefully terminate before it will be removed from the system. Only set when deletionTimestamp is also set. May only be shortened. Read-only.
deletionTimestamp (Time)
DeletionTimestamp is RFC 3339 date and time at which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user, and is not directly settable by a client. The resource is expected to be deleted (no longer visible from resource lists, and not reachable by name) after the time in this field, once the finalizers list is empty. As long as the finalizers list contains items, deletion is blocked. Once the deletionTimestamp is set, this value may not be unset or be set further into the future, although it may be shortened or the resource may be deleted prior to this time. For example, a user may request that a pod is deleted in 30 seconds. The Kubelet will react by sending a graceful termination signal to the containers in the pod. After that 30 seconds, the Kubelet will send a hard termination signal (SIGKILL) to the container and after cleanup, remove the pod from the API. In the presence of network partitions, this object may still exist after this timestamp, until an administrator or automated process can determine the resource is fully terminated. If not set, graceful deletion of the object has not been requested.
Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
generation (int64)
A sequence number representing a specific generation of the desired state. Populated by the system. Read-only.
resourceVersion (string)
An opaque value that represents the internal version of this object that can be used by clients to determine when objects have changed. May be used for optimistic concurrency, change detection, and the watch operation on a resource or set of resources. Clients must treat these values as opaque and passed unmodified back to the server. They may only be valid for a particular resource or set of resources.
selfLink (string)
Deprecated: selfLink is a legacy read-only field that is no longer populated by the system.
uid (string)
UID is the unique in time and space value for this object. It is typically generated by the server on successful creation of a resource and is not allowed to change on PUT operations.
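Only a few ObjectMeta fields are normally set by clients; the server fills in the read-only ones (uid, resourceVersion, creationTimestamp, managedFields, and so on). A hedged sketch of client-set metadata, with illustrative names, labels, and finalizer:
metadata:
  generateName: report-                 # server appends a unique suffix to form the name
  namespace: batch-workloads            # illustrative namespace
  labels:
    app.kubernetes.io/name: report-generator
  annotations:
    example.com/owner: data-team        # illustrative annotation key
  finalizers:
  - example.com/cleanup-external-resources   # illustrative finalizer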
ObjectReference contains enough information to let you inspect or modify the referred object.
import "k8s.io/api/core/v1"
ObjectReference contains enough information to let you inspect or modify the referred object.
apiVersion (string)
API version of the referent.
fieldPath (string)
If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object.
Patch is provided to give a concrete name and type to the Kubernetes PATCH request body.
import "k8s.io/apimachinery/pkg/apis/meta/v1"
Patch is provided to give a concrete name and type to the Kubernetes PATCH request body.
5.9.10 - Quantity
Quantity is a fixed-point representation of a number.
import "k8s.io/apimachinery/pkg/api/resource"
Quantity is a fixed-point representation of a number. It provides convenient marshaling/unmarshaling in JSON and YAML, in addition to String() and AsInt64() accessors.
The serialization format is:
<quantity>        ::= <signedNumber><suffix>
(Note that <suffix> may be empty, from the "" case in <decimalSI>.)
<digit>           ::= 0 | 1 | ... | 9
<digits>          ::= <digit> | <digit><digits>
<number>          ::= <digits> | <digits>.<digits> | <digits>. | .<digits>
<sign>            ::= "+" | "-"
<signedNumber>    ::= <number> | <sign><number>
<suffix>          ::= <binarySI> | <decimalExponent> | <decimalSI>
<binarySI>        ::= Ki | Mi | Gi | Ti | Pi | Ei
(International System of units; See: http://physics.nist.gov/cuu/Units/binary.html)
<decimalSI>       ::= m | "" | k | M | G | T | P | E
(Note that 1024 = 1Ki but 1000 = 1k; I didn't choose the capitalization.)
<decimalExponent> ::= "e" <signedNumber> | "E" <signedNumber>
No matter which of the three exponent forms is used, no quantity may represent a number greater than 2^63-1 in magnitude, nor may it have more than 3 decimal places. Numbers larger or more precise will be capped or rounded up. (E.g.: 0.1m will be rounded up to 1m.) This may be extended in the future if we require larger or smaller quantities.
When a Quantity is parsed from a string, it will remember the type of suffix it had, and will use the same type again when it is serialized.
Before serializing, Quantity will be put in "canonical form". This means that Exponent/suffix will be adjusted up or down (with a corresponding increase or decrease in Mantissa) such that:
- No precision is lost
- No fractional digits will be emitted
- The exponent (or suffix) is as large as possible
The sign will be omitted unless the number is negative.
Examples:
- 1.5 will be serialized as "1500m"
- 1.5Gi will be serialized as "1536Mi"
Note that the quantity will NEVER be internally represented by a floating point number. That is the whole point of this exercise.
Non-canonical values will still parse as long as they are well formed, but will be re-emitted in their canonical form. (So always use canonical form, or don't diff.)
This format is intended to make it difficult to use these numbers without writing some sort of special handling code in the hopes that that will cause implementors to also use a fixed point implementation.
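In practice, quantities appear most often in resource requests and limits. A short sketch showing the common suffix forms (the values themselves are illustrative):
resources:
  requests:
    cpu: 250m                 # decimal SI suffix: 0.25 of a CPU core
    memory: 64Mi              # binary SI suffix
  limits:
    cpu: "1"                  # plain number, no suffix
    memory: 128Mi
    ephemeral-storage: 1Gi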
5.9.11 - ResourceFieldSelector
ResourceFieldSelector represents container resources (cpu, memory) and their output format.
import "k8s.io/api/core/v1"
ResourceFieldSelector represents container resources (cpu, memory) and their output format
resource (string), required
Required: resource to select
containerName (string)
Container name: required for volumes, optional for env vars
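Like ObjectFieldSelector, this type usually appears in the downward API. A sketch that exposes a container's own CPU limit as an environment variable (the container name is illustrative):
env:
- name: MY_CPU_LIMIT
  valueFrom:
    resourceFieldRef:
      containerName: app      # illustrative; required for volumes, optional for env vars
      resource: limits.cpu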
Suggested HTTP return code for this status, 0 if not set.
details (StatusDetails)
Atomic: will be replaced during a merge
Extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type.
StatusDetails is a set of additional properties that MAY be set by the server to provide additional information about a response. The Reason field of a Status object defines what attributes will be set. Clients must ignore fields that do not match the defined type of each attribute, and should assume that any attribute may be empty, invalid, or under defined.
details.causes ([]StatusCause)
Atomic: will be replaced during a merge
The Causes array includes more details associated with the StatusReason failure. Not all StatusReasons may provide detailed causes.
StatusCause provides more information about an api.Status failure, including cases when multiple errors are encountered.
details.causes.field (string)
The field of the resource that has caused this error, as named by its JSON serialization. May include dot and postfix notation for nested attributes. Arrays are zero-indexed. Fields may appear more than once in an array of causes due to fields having multiple errors. Optional.
Examples:
"name" - the field "name" on the current resource
"items[0].name" - the field "name" on the first array entry in "items"
details.causes.message (string)
A human-readable description of the cause of the error. This field may be presented as-is to a reader.
details.causes.reason (string)
A machine-readable description of the cause of the error. If this value is empty there is no information available.
details.group (string)
The group attribute of the resource associated with the status StatusReason.
details.name (string)
The name attribute of the resource associated with the status StatusReason (when there is a single name which can be described).
details.retryAfterSeconds (int32)
If specified, the time in seconds before the operation should be retried. Some errors may indicate the client must take an alternate action - for those errors this field may indicate how long to wait before taking the alternate action.
A machine-readable description of why this operation is in the "Failure" status. If this value is empty there is no information available. A Reason clarifies an HTTP status code but does not override it.
TypedLocalObjectReference contains enough information to let you locate the typed referenced object inside the same namespace.
import "k8s.io/api/core/v1"
TypedLocalObjectReference contains enough information to let you locate the typed referenced object inside the same namespace.
kind (string), required
Kind is the type of resource being referenced
name (string), required
Name is the name of resource being referenced
apiGroup (string)
APIGroup is the group for the resource being referenced. If APIGroup is not specified, the specified Kind must be in the core API group. For any other third-party types, APIGroup is required.
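One common place this type appears is the dataSource field of a PersistentVolumeClaim. A hedged sketch that restores from a VolumeSnapshot in the same namespace (the claim and snapshot names are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data         # illustrative name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: nightly-snapshot    # illustrative snapshot name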
5.10 - Common Parameters
allowWatchBookmarks
allowWatchBookmarks requests watch events with type "BOOKMARK". Servers that do not implement bookmarks may ignore this flag and bookmarks are sent at the server's discretion. Clients should not assume bookmarks are returned at any specific interval, nor may they assume the server will send any BOOKMARK event during a session. If this is not a watch, this field is ignored.
continue
The continue option should be set when retrieving more results from the server. Since this value is server defined, clients may only use the continue value from a previous query result with identical query parameters (except for the value of continue) and the server may reject a continue value it does not recognize. If the specified continue value is no longer valid whether due to expiration (generally five to fifteen minutes) or a configuration change on the server, the server will respond with a 410 ResourceExpired error together with a continue token. If the client needs a consistent list, it must restart their list without the continue field. Otherwise, the client may send another list request with the token received with the 410 error, the server will respond with a list starting from the next key, but from the latest snapshot, which is inconsistent from the previous list results - objects that are created, modified, or deleted after the first list request will be included in the response, as long as their keys are after the "next key".
This field is not supported when watch is true. Clients may start a watch from the last resourceVersion value returned by the server and not miss any modifications.
dryRun
When present, indicates that modifications should not be persisted. An invalid or unrecognized dryRun directive will result in an error response and no further processing of the request. Valid values are: - All: all dry run stages will be processed
fieldManager
fieldManager is a name associated with the actor or entity that is making these changes. The value must be no more than 128 characters long, and only contain printable characters, as defined by https://golang.org/pkg/unicode/#IsPrint.
fieldSelector
A selector to restrict the list of returned objects by their fields. Defaults to everything.
fieldValidation
fieldValidation instructs the server on how to handle objects in the request (POST/PUT/PATCH) containing unknown or duplicate fields. Valid values are: - Ignore: This will ignore any unknown fields that are silently dropped from the object, and will ignore all but the last duplicate field that the decoder encounters. This is the default behavior prior to v1.23. - Warn: This will send a warning via the standard warning response header for each unknown field that is dropped from the object, and for each duplicate field that is encountered. The request will still succeed if there are no other errors, and will only persist the last of any duplicate fields. This is the default in v1.23+ - Strict: This will fail the request with a BadRequest error if any unknown fields would be dropped from the object, or if any duplicate fields are present. The error returned from the server will contain all unknown and duplicate fields encountered.
force
Force is going to "force" Apply requests. It means user will re-acquire conflicting fields owned by other people. Force flag must be unset for non-apply patch requests.
gracePeriodSeconds
The duration in seconds before the object should be deleted. Value must be non-negative integer. The value zero indicates delete immediately. If this value is nil, the default grace period for the specified type will be used. Defaults to a per object value if not specified. zero means delete immediately.
ignoreStoreReadErrorWithClusterBreakingPotential
if set to true, it will trigger an unsafe deletion of the resource in case the normal deletion flow fails with a corrupt object error. A resource is considered corrupt if it can not be retrieved from the underlying storage successfully because of a) its data can not be transformed e.g. decryption failure, or b) it fails to decode into an object. NOTE: unsafe deletion ignores finalizer constraints, skips precondition checks, and removes the object from the storage. WARNING: This may potentially break the cluster if the workload associated with the resource being unsafe-deleted relies on normal deletion flow. Use only if you REALLY know what you are doing. The default value is false, and the user must opt in to enable it
labelSelector
A selector to restrict the list of returned objects by their labels. Defaults to everything.
limit
limit is a maximum number of responses to return for a list call. If more items exist, the server will set the continue field on the list metadata to a value that can be used with the same initial query to retrieve the next set of results. Setting a limit may return fewer than the requested amount of items (up to zero items) in the event all requested objects are filtered out and clients should only use the presence of the continue field to determine whether more results are available. Servers may choose not to support the limit argument and will return all of the available results. If limit is specified and the continue field is empty, clients may assume that no more results are available. This field is not supported if watch is true.
The server guarantees that the objects returned when using continue will be identical to issuing a single list call without a limit - that is, no objects created, modified, or deleted after the first request is issued will be included in any subsequent continued requests. This is sometimes referred to as a consistent snapshot, and ensures that a client that is using limit to receive smaller chunks of a very large result can ensure they see all possible objects. If objects are updated during a chunked list the version of the object that was present at the time the first list result was calculated is returned.
namespace
object name and auth scope, such as for teams and projects
pretty
If 'true', then the output is pretty printed. Defaults to 'false' unless the user-agent indicates a browser or command-line HTTP tool (curl and wget).
propagationPolicy
Whether and how garbage collection will be performed. Either this field or OrphanDependents may be set, but not both. The default policy is decided by the existing finalizer set in the metadata.finalizers and the resource-specific default policy. Acceptable values are: 'Orphan' - orphan the dependents; 'Background' - allow the garbage collector to delete the dependents in the background; 'Foreground' - a cascading policy that deletes all dependents in the foreground.
sendInitialEvents
sendInitialEvents=true may be set together with watch=true. In that case, the watch stream will begin with synthetic events to produce the current state of objects in the collection. Once all such events have been sent, a synthetic "Bookmark" event will be sent. The bookmark will report the ResourceVersion (RV) corresponding to the set of objects, and be marked with the "k8s.io/initial-events-end": "true" annotation. Afterwards, the watch stream will proceed as usual, sending watch events corresponding to changes (subsequent to the RV) to objects watched.
When the sendInitialEvents option is set, the resourceVersionMatch option must also be set. The semantics of the watch request are as follows:
- resourceVersionMatch = NotOlderThan is interpreted as "data at least as new as the provided resourceVersion", and the bookmark event is sent when the state is synced to a resourceVersion at least as fresh as the one provided by the ListOptions. If resourceVersion is unset, this is interpreted as "consistent read", and the bookmark event is sent when the state is synced at least to the moment when the request started being processed.
- resourceVersionMatch set to any other value or unset: an Invalid error is returned.
Defaults to true if resourceVersion="" or resourceVersion="0" (for backward compatibility reasons) and to false otherwise.
timeoutSeconds
Timeout for the list/watch call. This limits the duration of the call, regardless of any activity or inactivity.
watch
Watch for changes to the described resources and return them as a stream of add, update, and remove notifications. Specify resourceVersion.
6 - Instrumentation
6.1 - Kubernetes Component SLI Metrics
High-level indicators for measuring the reliability and performance of Kubernetes components.
FEATURE STATE: Kubernetes v1.27 [beta] (enabled by default: true)
By default, Kubernetes 1.33 publishes Service Level Indicator (SLI) metrics
for each Kubernetes component binary. This metric endpoint is exposed on the serving
HTTPS port of each component, at the path /metrics/slis. The
ComponentSLIs feature gate
defaults to enabled for each Kubernetes component as of v1.27.
SLI Metrics
With SLI metrics enabled, each Kubernetes component exposes two metrics,
labeled per healthcheck:
a gauge (which represents the current state of the healthcheck)
a counter (which records the cumulative counts observed for each healthcheck state)
You can use the metric information to calculate per-component availability statistics.
For example, the API server checks the health of etcd. You can work out and report how
available or unavailable etcd has been - as reported by its client, the API server.
The prometheus gauge data looks like this:
# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="autoregister-completion",type="healthz"} 1
kubernetes_healthcheck{name="autoregister-completion",type="readyz"} 1
kubernetes_healthcheck{name="etcd",type="healthz"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="etcd-readiness",type="readyz"} 1
kubernetes_healthcheck{name="informer-sync",type="readyz"} 1
kubernetes_healthcheck{name="log",type="healthz"} 1
kubernetes_healthcheck{name="log",type="readyz"} 1
kubernetes_healthcheck{name="ping",type="healthz"} 1
kubernetes_healthcheck{name="ping",type="readyz"} 1
While the counter data looks like this:
# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="etcd-readiness",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="log",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="log",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="readyz"} 15
Using this data
The component SLIs metrics endpoint is intended to be scraped at a high frequency. Scraping
at a high frequency means that you end up with greater granularity of the gauge's signal, which
can then be used to calculate SLOs. The /metrics/slis endpoint provides the raw data necessary
to calculate an availability SLO for the respective Kubernetes component.
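How you scrape the endpoint depends on your monitoring stack. As one hedged example, a Prometheus scrape job aimed at the API server's /metrics/slis path could look like the sketch below; the job name, target address, credential paths, and interval are assumptions for an in-cluster setup, not values prescribed by Kubernetes:
scrape_configs:
- job_name: kube-apiserver-slis         # hypothetical job name
  scheme: https
  metrics_path: /metrics/slis
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  static_configs:
  - targets:
    - kubernetes.default.svc:443        # hypothetical in-cluster API server address
  scrape_interval: 5s                   # frequent scraping gives finer-grained SLO data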
6.2 - CRI Pod & Container Metrics
Collection of Pod & Container metrics via the CRI.
FEATURE STATE: Kubernetes v1.23 [alpha]
The kubelet collects pod and
container metrics via cAdvisor. As an alpha feature,
Kubernetes lets you configure the collection of pod and container
metrics via the Container Runtime Interface (CRI). You
must enable the PodAndContainerStatsFromCRI feature gate and
use a compatible CRI implementation (containerd >= 1.6.0, CRI-O >= 1.23.0) to
use the CRI based collection mechanism.
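Enabling the feature gate happens in the kubelet configuration. A minimal hedged sketch of the relevant KubeletConfiguration fragment:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodAndContainerStatsFromCRI: true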
CRI Pod & Container Metrics
With PodAndContainerStatsFromCRI enabled, the kubelet polls the underlying container
runtime for pod and container stats instead of inspecting the host system directly using cAdvisor.
The benefits of relying on the container runtime for this information as opposed to direct
collection with cAdvisor include:
Potential improved performance if the container runtime already collects this information
during normal operations. In this case, the data can be re-used instead of being aggregated
again by the kubelet.
It further decouples the kubelet and the container runtime allowing collection of metrics for
container runtimes that don't run processes directly on the host with kubelet where they are
observable by cAdvisor (for example: container runtimes that use virtualization).
6.3 - Node metrics data
Mechanisms for accessing metrics at node, volume, pod and container level, as seen by the kubelet.
The kubelet
gathers metric statistics at the node, volume, pod and container level,
and emits this information in the
Summary API.
You can send a proxied request to the stats summary API via the
Kubernetes API server.
Here is an example of a Summary API request for a node named minikube:
kubectl get --raw "/api/v1/nodes/minikube/proxy/stats/summary"
Here is the same API call using curl:
# You need to run "kubectl proxy" first
# Change 8080 to the port that "kubectl proxy" assigns
curl http://localhost:8080/api/v1/nodes/minikube/proxy/stats/summary
Note:
Beginning with metrics-server 0.6.x, metrics-server queries the /metrics/resource
kubelet endpoint, and not /stats/summary.
As an alpha feature, Kubernetes lets you configure kubelet to collect Linux kernel
Pressure Stall Information
(PSI) for CPU, memory and IO usage. The information is collected at node, pod and container level.
See Summary API for detailed schema.
You must enable the KubeletPSI feature gate
to use this feature. The information is also exposed in
Prometheus metrics.
The task pages for Troubleshooting Clusters discuss
how to use a metrics pipeline that relies on these data.
6.4 - Kubernetes z-pages
Provides runtime diagnostics for Kubernetes components, offering insights into component runtime status and configuration flags.
FEATURE STATE: Kubernetes v1.32 [alpha]
Kubernetes core components can expose a suite of z-endpoints to make it easier for users
to debug their cluster and its components. These endpoints are strictly to be used for human
inspection to gain real time debugging information of a component binary.
Avoid automated scraping of data returned by these endpoints; in Kubernetes 1.33
these are an alpha feature and the response format may change in future releases.
z-pages
Kubernetes v1.33 allows you to enable z-pages to help you troubleshoot
problems with its core control plane components. These special debugging endpoints provide internal
information about running components. For Kubernetes 1.33, components
serve the following endpoints (when enabled):
statusz
Enabled using the ComponentStatusz feature gate,
the /statusz endpoint displays high-level information about the component such as its Kubernetes version, emulation version, start time and more.
The /statusz response from the API server is similar to:
kube-apiserver statusz
Warning: This endpoint is not meant to be machine parseable, has no formatting compatibility guarantees and is for debugging purposes only.
Started: Wed Oct 16 21:03:43 UTC 2024
Up: 0 hr 00 min 16 sec
Go version: go1.23.2
Binary version: 1.32.0-alpha.0.1484+5eeac4f21a491b-dirty
Emulation version: 1.32.0-alpha.0.1484
flagz
Enabled using the ComponentFlagz feature gate, the /flagz endpoint shows you the command line arguments that were used to start a component.
The /flagz data for the API server looks something like:
kube-apiserver flags
Warning: This endpoint is not meant to be machine parseable, has no formatting compatibility guarantees and is for debugging purposes only.
advertise-address=192.168.8.2
contention-profiling=false
enable-priority-and-fairness=true
profiling=true
authorization-mode=[Node,RBAC]
authorization-webhook-cache-authorized-ttl=5m0s
authorization-webhook-cache-unauthorized-ttl=30s
authorization-webhook-version=v1beta1
default-watch-cache-size=100
6.5 - Kubernetes Metrics Reference
Details of the metric data that Kubernetes components export.
Metrics (v1.34)
This page details the metrics that different Kubernetes components export. You can query the metrics endpoint for these
components using an HTTP scrape, and fetch the current metrics data in Prometheus format.
List of Stable Kubernetes Metrics
Stable metrics observe strict API contracts and no labels can be added or removed from stable metrics during their lifetime.
Admission webhook latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit).
STABLE
Histogram
name, operation, rejected, type
apiserver_current_inflight_requests
Maximal number of currently used inflight request limit of this apiserver per request kind in last second.
STABLE
Gauge
request_kind
apiserver_longrunning_requests
Gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope and component. Not all requests are tracked this way.
STABLE
Gauge
component, group, resource, scope, subresource, verb, version
apiserver_request_duration_seconds
Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component.
Time between when a cronjob is scheduled to be run, and when the corresponding job is created
STABLE
Histogram
job_controller_job_pods_finished_total
The number of finished Pods that are fully tracked
STABLE
Counter
completion_mode, result
job_controller_job_sync_duration_seconds
The time it took to sync a job
STABLE
Histogram
action, completion_mode, result
job_controller_job_syncs_total
The number of job syncs
STABLE
Counter
action, completion_mode, result
job_controller_jobs_finished_total
The number of finished jobs
STABLE
Counter
completion_mode, reason, result
kube_pod_resource_limit
Resources limit for workloads on the cluster, broken down by pod. This shows the resource usage the scheduler and kubelet expect per pod for resources along with the unit for the resource if any.
STABLE
Custom
namespace, pod, node, scheduler, priority, resource, unit
kube_pod_resource_request
Resources requested by workloads on the cluster, broken down by pod. This shows the resource usage the scheduler and kubelet expect per pod for resources along with the unit for the resource if any.
STABLE
Custom
namespace, pod, node, scheduler, priority, resource, unit
kubernetes_healthcheck
This metric records the result of a single healthcheck.
STABLE
Gauge
name, type
kubernetes_healthchecks_total
This metric records the results of all healthcheck.
STABLE
Counter
name, status, type
node_collector_evictions_total
Number of Node evictions that happened since current instance of NodeController started.
STABLE
Counter
zone
node_cpu_usage_seconds_total
Cumulative cpu time consumed by the node in core-seconds
STABLE
Custom
node_memory_working_set_bytes
Current working set of the node in bytes
STABLE
Custom
pod_cpu_usage_seconds_total
Cumulative cpu time consumed by the pod in core-seconds
STABLE
Custom
pod, namespace
pod_memory_working_set_bytes
Current working set of the pod in bytes
STABLE
Custom
pod, namespace
resource_scrape_error
1 if there was an error while getting container metrics, 0 otherwise
Latency for running all plugins of a specific extension point.
STABLE
Histogram
extension_point, profile, status
scheduler_pending_pods
Number of pending pods, by the queue type. 'active' means number of pods in activeQ; 'backoff' means number of pods in backoffQ; 'unschedulable' means number of pods in unschedulablePods that the scheduler attempted to schedule and failed; 'gated' is the number of unschedulable pods that the scheduler never attempted to schedule because they are gated.
STABLE
Gauge
queue
scheduler_pod_scheduling_attempts
Number of attempts to successfully schedule a pod.
STABLE
Histogram
scheduler_preemption_attempts_total
Total preemption attempts in the cluster till now
STABLE
Counter
scheduler_preemption_victims
Number of selected preemption victims
STABLE
Histogram
scheduler_queue_incoming_pods_total
Number of pods added to scheduling queues by event and queue type.
STABLE
Counter
event, queue
scheduler_schedule_attempts_total
Number of attempts to schedule pods, by the result. 'unschedulable' means a pod could not be scheduled, while 'error' means an internal scheduler problem.
STABLE
Counter
profile, result
scheduler_scheduling_attempt_duration_seconds
Scheduling attempt latency in seconds (scheduling algorithm + binding)
STABLE
Histogram
profile, result
List of Beta Kubernetes Metrics
Beta metrics observe a looser API contract than their stable counterparts. No labels can be removed from beta metrics during their lifetime; however, labels can be added while the metric is in the beta stage. This offers the assurance that beta metrics will honor existing dashboards and alerts, while allowing for amendments in the future.
apiserver_cel_compilation_duration_seconds
CEL compilation time in seconds.
BETA
Histogram
apiserver_cel_evaluation_duration_seconds
CEL evaluation time in seconds.
BETA
Histogram
apiserver_flowcontrol_current_executing_requests
Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem
BETA
Gauge
flow_schema, priority_level
apiserver_flowcontrol_current_executing_seats
Concurrency (number of seats) occupied by the currently executing (initial stage for a WATCH, any stage otherwise) requests in the API Priority and Fairness subsystem
BETA
Gauge
flow_schema, priority_level
apiserver_flowcontrol_current_inqueue_requests
Number of requests currently pending in queues of the API Priority and Fairness subsystem
BETA
Gauge
flow_schema, priority_level
apiserver_flowcontrol_dispatched_requests_total
Number of requests executed by API Priority and Fairness subsystem
BETA
Counter
flow_schema, priority_level
apiserver_flowcontrol_nominal_limit_seats
Nominal number of execution seats configured for each priority level
BETA
Gauge
priority_level
apiserver_flowcontrol_rejected_requests_total
Number of requests rejected by API Priority and Fairness subsystem
Validation admission latency for individual validation expressions in seconds, labeled by policy and further including binding and enforcement action taken.
BETA
Histogram
enforcement_action, error_type, policy, policy_binding
apiserver_validating_admission_policy_check_total
Validation admission policy check total, labeled by policy and further identified by binding and enforcement action taken.
Number of times declarative validation has panicked during validation.
BETA
Counter
disabled_metrics_total
The count of disabled metrics.
BETA
Counter
hidden_metrics_total
The count of hidden metrics.
BETA
Counter
kubernetes_feature_enabled
This metric records the data about the stage and enablement of a k8s feature.
BETA
Gauge
name, stage
prober_probe_total
Cumulative number of a liveness, readiness or startup probe for a container by result.
BETA
Counter
container, namespace, pod, pod_uid, probe_type, result
registered_metrics_total
The count of registered metrics broken by stability level and deprecation version.
BETA
Counter
deprecated_version, stability_level
scheduler_pod_scheduling_sli_duration_seconds
E2e latency for a pod being scheduled, from the time the pod enters the scheduling queue and might involve multiple scheduling attempts.
BETA
Histogram
attempts
List of Alpha Kubernetes Metrics
Alpha metrics do not have any API guarantees. These metrics must be used at your own risk, subsequent versions of Kubernetes may remove these metrics altogether, or mutate the API in such a way that breaks existing dashboards and alerts.
aggregator_discovery_aggregation_count_total
Counter of number of times discovery was aggregated
ALPHA
Counter
aggregator_openapi_v2_regeneration_count
Counter of OpenAPI v2 spec regeneration count broken down by causing APIService name and reason.
ALPHA
Counter
apiservice, reason
aggregator_openapi_v2_regeneration_duration
Gauge of OpenAPI v2 spec regeneration duration in seconds.
ALPHA
Gauge
reason
aggregator_unavailable_apiservice
Gauge of APIServices which are marked as unavailable broken down by APIService name.
ALPHA
Custom
name
aggregator_unavailable_apiservice_total
Counter of APIServices which are marked as unavailable broken down by APIService name and reason.
Admission match condition evaluation errors count, identified by name of resource containing the match condition and broken out for each kind containing matchConditions (webhook or policy), operation and admission type (validate or admit).
Admission match condition evaluation time in seconds, identified by name and broken out for each kind containing matchConditions (webhook or policy), operation and type (validate or admit).
Admission match condition evaluation exclusions count, identified by name of resource containing the match condition and broken out for each kind containing matchConditions (webhook or policy), operation and admission type (validate or admit).
Admission sub-step latency summary in seconds, broken out for each operation and API resource and step type (validate or admit).
ALPHA
Summary
operation, rejected, type
apiserver_admission_webhook_fail_open_count
Admission webhook fail open count, identified by name and broken out for each admission type (validating or admit).
ALPHA
Counter
name, type
apiserver_admission_webhook_rejection_count
Admission webhook rejection count, identified by name and broken out for each admission type (validating or admit) and operation. Additional labels specify an error type (calling_webhook_error or apiserver_internal_error if an error occurred; no_error otherwise) and optionally a non-zero rejection code if the webhook rejects the request with an HTTP status code (honored by the apiserver when the code is greater or equal to 400). Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded.
ALPHA
Counter
error_type, name, operation, rejection_code, type
apiserver_admission_webhook_request_total
Admission webhook request total, identified by name and broken out for each admission type (validating or admit) and operation. Additional labels specify whether the request was rejected or not and an HTTP status code. Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded.
ALPHA
Counter
code, name, operation, rejected, type
apiserver_audit_error_total
Counter of audit events that failed to be audited properly. Plugin identifies the plugin affected by the error.
ALPHA
Counter
plugin
apiserver_audit_event_total
Counter of audit events generated and sent to the audit backend.
ALPHA
Counter
apiserver_audit_level_total
Counter of policy levels for audit events (1 per request).
ALPHA
Counter
level
apiserver_audit_requests_rejected_total
Counter of apiserver requests rejected due to an error in audit logging backend.
Latency of jwt authentication operations in seconds. This is the time spent authenticating a token for cache miss only (i.e. when the token is not found in the cache).
Request latency in seconds. Broken down by status code.
ALPHA
Histogram
code
apiserver_delegated_authz_request_total
Number of HTTP requests partitioned by status code.
ALPHA
Counter
code
apiserver_egress_dialer_dial_duration_seconds
Dial latency histogram in seconds, labeled by the protocol (http-connect or grpc), transport (tcp or uds)
ALPHA
Histogram
protocol, transport
apiserver_egress_dialer_dial_failure_count
Dial failure count, labeled by the protocol (http-connect or grpc), transport (tcp or uds), and stage (connect or proxy). The stage indicates at which stage the dial failed
ALPHA
Counter
protocol, stage, transport
apiserver_egress_dialer_dial_start_total
Dial starts, labeled by the protocol (http-connect or grpc) and transport (tcp or uds).
Number of records in data encryption key (DEK) source cache. On a restart, this value is an approximation of the number of decrypt RPC calls the server will make to the KMS plugin.
Observations, at the end of every nanosecond, of number of requests (as a fraction of the relevant limit) waiting or in any stage of execution (but only initial stage for WATCHes)
Observations, at the end of every nanosecond, of the number of requests (as a fraction of the relevant limit) waiting or in regular stage of execution
ALPHA
TimingRatioHistogram
phase, request_kind
apiserver_flowcontrol_request_concurrency_in_use
Concurrency (number of seats) occupied by the currently executing (initial stage for a WATCH, any stage otherwise) requests in the API Priority and Fairness subsystem
ALPHA
Gauge
flow_schema, priority_level
1.31.0
apiserver_flowcontrol_request_concurrency_limit
Nominal number of execution seats configured for each priority level
Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment)
ALPHA
Counter
apiserver_kube_aggregator_x509_missing_san_total
Counts the number of requests to servers missing the SAN extension in their serving certificate OR the number of connection failures due to the missing x509 certificate SAN extension (either/or, based on the runtime environment)
ALPHA
Counter
apiserver_nodeport_repair_port_errors_total
Number of errors detected on ports by the repair loop broken down by type of error: leak, repair, full, outOfRange, duplicate, unknown
ALPHA
Counter
type
apiserver_nodeport_repair_reconcile_errors_total
Number of reconciliation failures on the nodeport repair reconcile loop
ALPHA
Counter
apiserver_request_aborts_total
Number of requests which apiserver aborted possibly due to a timeout, for each group, version, verb, resource, subresource and scope
ALPHA
Counter
group, resource, scope, subresource, verb, version
apiserver_request_body_size_bytes
Apiserver request body size in bytes broken out by resource and verb.
ALPHA
Histogram
resource, verb
apiserver_request_filter_duration_seconds
Request filter latency distribution in seconds, for each filter type
ALPHA
Histogram
filter
apiserver_request_post_timeout_total
Tracks the activity of the request handlers after the associated requests have been timed out by the apiserver
ALPHA
Counter
source, status
apiserver_request_sli_duration_seconds
Response latency distribution (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component.
ALPHA
Histogram
component, group, resource, scope, subresource, verb, version
apiserver_request_slo_duration_seconds
Response latency distribution (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component.
ALPHA
Histogram
component, group, resource, scope, subresource, verb, version
1.27.0
apiserver_request_terminations_total
Number of requests which apiserver terminated in self-defense.
Total number of cache misses while accessing key decryption key(KEK).
ALPHA
Counter
apiserver_storage_events_received_total
Number of etcd events received split by kind.
ALPHA
Counter
resource
apiserver_storage_list_evaluated_objects_total
Number of objects tested in the course of serving a LIST request from storage
ALPHA
Counter
resource
apiserver_storage_list_fetched_objects_total
Number of objects read from storage in the course of serving a LIST request
ALPHA
Counter
resource
apiserver_storage_list_returned_objects_total
Number of objects returned for a LIST request from storage
ALPHA
Counter
resource
apiserver_storage_list_total
Number of LIST requests served from storage
ALPHA
Counter
resource
apiserver_storage_transformation_duration_seconds
Latencies in seconds of value transformation operations.
ALPHA
Histogram
transformation_type, transformer_prefix
apiserver_storage_transformation_operations_total
Total number of transformations. Successful transformation will have a status 'OK' and a varied status string when the transformation fails. The status, resource, and transformation_type fields can be used for alerting purposes. For example, you can monitor for encryption/decryption failures using the transformation_type (e.g., from_storage for decryption and to_storage for encryption). Additionally, these fields can be used to ensure that the correct transformers are applied to each resource.
Total number of requests that were handled by the StreamTranslatorProxy, which processes streaming RemoteCommand/V5
ALPHA
Counter
code
apiserver_stream_tunnel_requests_total
Total number of requests that were handled by the StreamTunnelProxy, which processes streaming PortForward/V2
ALPHA
Counter
code
apiserver_terminated_watchers_total
Counter of watchers closed due to unresponsiveness broken by resource type.
ALPHA
Counter
resource
apiserver_tls_handshake_errors_total
Number of requests dropped with 'TLS handshake error from' error
ALPHA
Counter
apiserver_watch_cache_consistent_read_total
Counter for consistent reads from cache.
ALPHA
Counter
fallback, resource, success
apiserver_watch_cache_events_dispatched_total
Counter of events dispatched in watch cache broken by resource type.
ALPHA
Counter
resource
apiserver_watch_cache_events_received_total
Counter of events received in watch cache broken by resource type.
ALPHA
Counter
resource
apiserver_watch_cache_initializations_total
Counter of watch cache initializations broken by resource type.
ALPHA
Counter
resource
apiserver_watch_cache_read_wait_seconds
Histogram of time spent waiting for a watch cache to become fresh.
ALPHA
Histogram
resource
apiserver_watch_cache_resource_version
Current resource version of watch cache broken by resource type.
ALPHA
Gauge
resource
apiserver_watch_events_sizes
Watch event size distribution in bytes
ALPHA
Histogram
group, kind, version
apiserver_watch_events_total
Number of events sent in watch clients
ALPHA
Counter
group, kind, version
apiserver_watch_list_duration_seconds
Response latency distribution in seconds for watch list requests broken by group, version, resource and scope.
ALPHA
Histogram
group, resource, scope, version
apiserver_webhooks_x509_insecure_sha1_total
Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment)
ALPHA
Counter
apiserver_webhooks_x509_missing_san_total
Counts the number of requests to servers missing the SAN extension in their serving certificate OR the number of connection failures due to the missing x509 certificate SAN extension (either/or, based on the runtime environment)
Total number of Pods deleted by DeviceTaintEvictionController since its start.
ALPHA
Counter
dra_grpc_operations_duration_seconds
Duration in seconds of the DRA gRPC operations
ALPHA
Histogram
driver_name, grpc_status_code, method_name
dra_operations_duration_seconds
Latency histogram in seconds for the duration of handling all ResourceClaims referenced by a pod when the pod starts or stops. Identified by the name of the operation (PrepareResources or UnprepareResources) and separated by the success of the operation. The number of failed operations is provided through the histogram's overall count.
ALPHA
Histogram
is_error, operation_name
endpoint_slice_controller_changes
Number of EndpointSlice changes
ALPHA
Counter
operation
endpoint_slice_controller_desired_endpoint_slices
Number of EndpointSlices that would exist with perfect endpoint allocation
The number of volumes that failed force cleanup after their reconstruction failed during kubelet startup.
ALPHA
Counter
force_cleaned_failed_volume_operations_total
The number of volumes that were force cleaned after their reconstruction failed during kubelet startup. This includes both successful and failed cleanups.
The time(seconds) that the HPA controller takes to calculate one metric. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. The label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type
Number of metric computations. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type
The time(seconds) that the HPA controller takes to reconcile once. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in `error` label.
Number of reconciliations of HPA controller. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in `error` label.
ALPHA
Counter
action, error
job_controller_job_finished_indexes_total
The number of finished indexes. Possible values for the status label are: "succeeded", "failed". Possible values for the backoffLimit label are: "perIndex" and "global".
ALPHA
Counter
backoffLimit, status
job_controller_job_pods_creation_total
The number of Pods created by the Job controller labelled with a reason for the Pod creation. This metric also distinguishes between Pods created using different PodReplacementPolicy settings. Possible values of the "reason" label are: "new", "recreate_terminating_or_failed", "recreate_failed". Possible values of the "status" label are: "succeeded", "failed".
ALPHA
Counter
reason, status
job_controller_jobs_by_external_controller_total
The number of Jobs managed by an external controller
The number of failed Pods handled by failure policy with respect to the failure policy action applied based on the matched rule. Possible values of the action label correspond to the possible values for the failure policy rule action, which are: "FailJob", "Ignore" and "Count".
The number of terminated pods (phase=Failed|Succeeded) that have the finalizer batch.kubernetes.io/job-tracking. The event label can be "add" or "delete".
ALPHA
Counter
event
kube_apiserver_clusterip_allocator_allocated_ips
Gauge measuring the number of allocated IPs for Services
Total number of requests for pods/logs sliced by usage type: enforce_tls, skip_tls_allowed, skip_tls_denied
ALPHA
Counter
usage
1.27.0
kubelet_active_pods
The number of pods the kubelet considers active and which are being considered when admitting new pods. static is true if the pod is not from the apiserver.
ALPHA
Gauge
static
kubelet_admission_rejections_total
Cumulative number of pod admission rejections by the Kubelet.
Gauge of the TTL (time-to-live) of the Kubelet's client certificate. The value is in seconds until certificate expiry (negative if already expired). If client certificate is invalid or unused, the value will be +INF.
Histogram of the number of seconds the previous certificate lived before being rotated.
ALPHA
Histogram
kubelet_certificate_manager_server_ttl_seconds
Gauge of the shortest TTL (time-to-live) of the Kubelet's serving certificate. The value is in seconds until certificate expiry (negative if already expired). If serving certificate is invalid or unused, the value will be +INF.
ALPHA
Gauge
kubelet_cgroup_manager_duration_seconds
Duration in seconds for cgroup manager operations. Broken down by method.
ALPHA
Histogram
operation_type
kubelet_cgroup_version
cgroup version on the hosts.
ALPHA
Gauge
kubelet_container_aligned_compute_resources_count
Cumulative number of aligned compute resources allocated to containers by alignment type.
Duration in seconds of node startup during registration.
ALPHA
Gauge
kubelet_orphan_pod_cleaned_volumes
The total number of orphaned Pods whose volumes were cleaned in the last periodic sweep.
ALPHA
Gauge
kubelet_orphan_pod_cleaned_volumes_errors
The number of orphaned Pods whose volumes failed to be cleaned in the last periodic sweep.
ALPHA
Gauge
kubelet_orphaned_runtime_pods_total
Number of pods that have been detected in the container runtime without being already known to the pod worker. This typically indicates the kubelet was restarted while a pod was force deleted in the API or in the local configuration, which is unusual.
ALPHA
Counter
kubelet_pleg_discard_events
The number of discard events in PLEG.
ALPHA
Counter
kubelet_pleg_last_seen_seconds
Timestamp in seconds when PLEG was last seen active.
ALPHA
Gauge
kubelet_pleg_relist_duration_seconds
Duration in seconds for relisting pods in PLEG.
ALPHA
Histogram
kubelet_pleg_relist_interval_seconds
Interval in seconds between relisting in PLEG.
ALPHA
Histogram
kubelet_pod_resources_endpoint_errors_get
Number of requests to the PodResource Get endpoint which returned error. Broken down by server api version.
Number of requests to the PodResource GetAllocatableResources endpoint. Broken down by server api version.
ALPHA
Counter
server_api_version
kubelet_pod_resources_endpoint_requests_list
Number of requests to the PodResource List endpoint. Broken down by server api version.
ALPHA
Counter
server_api_version
kubelet_pod_resources_endpoint_requests_total
Cumulative number of requests to the PodResource endpoint. Broken down by server api version.
ALPHA
Counter
server_api_version
kubelet_pod_start_duration_seconds
Duration in seconds from kubelet seeing a pod for the first time to the pod starting to run
ALPHA
Histogram
kubelet_pod_start_sli_duration_seconds
Duration in seconds to start a pod, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch
ALPHA
Histogram
kubelet_pod_start_total_duration_seconds
Duration in seconds to start a pod since creation, including time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch
ALPHA
Histogram
kubelet_pod_status_sync_duration_seconds
Duration in seconds to sync a pod status update. Measures time from detection of a change to pod status until the API is successfully updated for that pod, even if multiple intervening changes to pod status occur.
ALPHA
Histogram
kubelet_pod_worker_duration_seconds
Duration in seconds to sync a single pod. Broken down by operation type: create, update, or sync
ALPHA
Histogram
operation_type
kubelet_pod_worker_start_duration_seconds
Duration in seconds from kubelet seeing a pod to starting a worker.
ALPHA
Histogram
kubelet_preemptions
Cumulative number of pod preemptions by preemption resource
ALPHA
Counter
preemption_signal
kubelet_restarted_pods_total
Number of pods that have been restarted because they were deleted and recreated with the same UID while the kubelet was watching them (common for static pods, extremely uncommon for API pods)
ALPHA
Counter
static
kubelet_run_podsandbox_duration_seconds
Duration in seconds of the run_podsandbox operations. Broken down by RuntimeClass.Handler.
ALPHA
Histogram
runtime_handler
kubelet_run_podsandbox_errors_total
Cumulative number of the run_podsandbox operation errors by RuntimeClass.Handler.
ALPHA
Counter
runtime_handler
kubelet_running_containers
Number of containers currently running
ALPHA
Gauge
container_state
kubelet_running_pods
Number of pods that have a running pod sandbox
ALPHA
Gauge
kubelet_runtime_operations_duration_seconds
Duration in seconds of runtime operations. Broken down by operation type.
ALPHA
Histogram
operation_type
kubelet_runtime_operations_errors_total
Cumulative number of runtime operation errors by operation type.
ALPHA
Counter
operation_type
kubelet_runtime_operations_total
Cumulative number of runtime operations by operation type.
ALPHA
Counter
operation_type
kubelet_server_expiration_renew_errors
Counter of certificate renewal errors.
ALPHA
Counter
kubelet_sleep_action_terminated_early_total
The number of times lifecycle sleep handler got terminated before it finishes
ALPHA
Counter
kubelet_started_containers_errors_total
Cumulative number of errors when starting containers
Cumulative number of errors when starting hostprocess containers. This metric will only be collected on Windows.
ALPHA
Counter
code, container_type
kubelet_started_host_process_containers_total
Cumulative number of hostprocess containers started. This metric will only be collected on Windows.
ALPHA
Counter
container_type
kubelet_started_pods_errors_total
Cumulative number of errors when starting pods
ALPHA
Counter
kubelet_started_pods_total
Cumulative number of pods started
ALPHA
Counter
kubelet_topology_manager_admission_duration_ms
Duration in milliseconds to serve a pod admission request.
ALPHA
Histogram
kubelet_topology_manager_admission_errors_total
The number of admission request failures where resources could not be aligned.
ALPHA
Counter
kubelet_topology_manager_admission_requests_total
The number of admission requests where resources have to be aligned.
ALPHA
Counter
kubelet_volume_metric_collection_duration_seconds
Duration in seconds to calculate volume stats
ALPHA
Histogram
metric_source
kubelet_volume_stats_available_bytes
Number of available bytes in the volume
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_volume_stats_capacity_bytes
Capacity in bytes of the volume
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_volume_stats_health_status_abnormal
Abnormal volume health status. The count is either 1 or 0. 1 indicates the volume is unhealthy, 0 indicates volume is healthy
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_volume_stats_inodes
Maximum number of inodes in the volume
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_volume_stats_inodes_free
Number of free inodes in the volume
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_volume_stats_inodes_used
Number of used inodes in the volume
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_volume_stats_used_bytes
Number of used bytes in the volume
ALPHA
Custom
namespace, persistentvolumeclaim
kubelet_working_pods
Number of pods the kubelet is actually running, broken down by lifecycle phase, whether the pod is desired, orphaned, or runtime only (also orphaned), and whether the pod is static. An orphaned pod has been removed from local configuration or force deleted in the API and consumes resources that are not otherwise visible.
A metric with a constant '1' value labeled by major, minor, git version, git commit, git tree state, build date, Go version, and compiler from which Kubernetes was built, and platform on which it is running.
Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name.
ALPHA
Gauge
name
leader_election_slowpath_total
Total number of slow path exercised in renewing leader leases. 'name' is the string used to identify the lease. Please make sure to group by name.
ALPHA
Counter
name
node_authorizer_graph_actions_duration_seconds
Histogram of duration of graph actions in node authorizer.
ALPHA
Histogram
operation
node_collector_unhealthy_nodes_in_zone
Gauge measuring the number of not-Ready Nodes per zone.
Number of exempt requests, not counting ignored or out of scope requests.
ALPHA
Counter
request_operation, resource, subresource
pod_swap_usage_bytes
Current amount of the pod swap usage in bytes. Reported only on non-windows systems
ALPHA
Custom
pod, namespace
prober_probe_duration_seconds
Duration in seconds for a probe response.
ALPHA
Histogram
container, namespace, pod, probe_type
pv_collector_bound_pv_count
Gauge measuring number of persistent volume currently bound
ALPHA
Custom
storage_class
pv_collector_bound_pvc_count
Gauge measuring number of persistent volume claim currently bound
ALPHA
Custom
namespace, storage_class, volume_attributes_class
pv_collector_total_pv_count
Gauge measuring total number of persistent volumes
ALPHA
Custom
plugin_name, volume_mode
pv_collector_unbound_pv_count
Gauge measuring number of persistent volume currently unbound
ALPHA
Custom
storage_class
pv_collector_unbound_pvc_count
Gauge measuring number of persistent volume claim currently unbound
ALPHA
Custom
namespace, storage_class, volume_attributes_class
reconstruct_volume_operations_errors_total
The number of volumes that failed reconstruction from the operating system during kubelet startup.
ALPHA
Counter
reconstruct_volume_operations_total
The number of volumes that were attempted to be reconstructed from the operating system during kubelet startup. This includes both successful and failed reconstruction.
ALPHA
Counter
replicaset_controller_sorting_deletion_age_ratio
The ratio of chosen deleted pod's ages to the current youngest pod's age (at the time). Should be <2. The intent of this metric is to measure the rough efficacy of the LogarithmicScaleDown feature gate's effect on the sorting (and deletion) of pods when a replicaset scales down. This only considers Ready pods when calculating and reporting.
Number of ResourceClaims creation request failures
ALPHA
Counter
resourceclaim_controller_resource_claims
Number of ResourceClaims
ALPHA
Gauge
rest_client_dns_resolution_duration_seconds
DNS resolver latency in seconds. Broken down by host.
ALPHA
Histogram
host
rest_client_exec_plugin_call_total
Number of calls to an exec plugin, partitioned by the type of event encountered (no_error, plugin_execution_error, plugin_not_found_error, client_internal_error) and an optional exit code. The exit code will be set to 0 if and only if the plugin call was successful.
ALPHA
Counter
call_status, code
rest_client_exec_plugin_certificate_rotation_age
Histogram of the number of seconds the last auth exec plugin client certificate lived before being rotated. If auth exec plugin client certificates are unused, histogram will contain no data.
ALPHA
Histogram
rest_client_exec_plugin_ttl_seconds
Gauge of the shortest TTL (time-to-live) of the client certificate(s) managed by the auth exec plugin. The value is in seconds until certificate expiry (negative if already expired). If auth exec plugins are unused or manage no TLS certificates, the value will be +INF.
ALPHA
Gauge
rest_client_rate_limiter_duration_seconds
Client side rate limiter latency in seconds. Broken down by verb, and host.
ALPHA
Histogram
host, verb
rest_client_request_duration_seconds
Request latency in seconds. Broken down by verb, and host.
ALPHA
Histogram
host, verb
rest_client_request_retries_total
Number of request retries, partitioned by status code, verb, and host.
ALPHA
Counter
code, host, verb
rest_client_request_size_bytes
Request size in bytes. Broken down by verb and host.
ALPHA
Histogram
host, verb
rest_client_requests_total
Number of HTTP requests, partitioned by status code, method, and host.
ALPHA
Counter
code, host, method
rest_client_response_size_bytes
Response size in bytes. Broken down by verb and host.
ALPHA
Histogram
host, verb
rest_client_transport_cache_entries
Number of transport entries in the internal cache.
ALPHA
Gauge
rest_client_transport_create_calls_total
Number of calls to get a new transport, partitioned by the result of the operation (hit: obtained from the cache; miss: created and added to the cache; uncacheable: created and not cached)
ALPHA
Counter
result
retroactive_storageclass_errors_total
Total number of failed retroactive StorageClass assignments to persistent volume claim
ALPHA
Counter
retroactive_storageclass_total
Total number of retroactive StorageClass assignments to persistent volume claim
ALPHA
Counter
root_ca_cert_publisher_sync_duration_seconds
Number of namespace syncs happened in root ca cert publisher.
ALPHA
Histogram
code
root_ca_cert_publisher_sync_total
Number of namespace syncs happened in root ca cert publisher.
ALPHA
Counter
code
running_managed_controllers
Indicates where instances of a controller are currently running
ALPHA
Gauge
manager, name
scheduler_cache_size
Number of nodes, pods, and assumed (bound) pods in the scheduler cache.
ALPHA
Gauge
type
scheduler_event_handling_duration_seconds
Event handling latency in seconds.
ALPHA
Histogram
event
scheduler_goroutines
Number of running goroutines split by the work they do such as binding.
ALPHA
Gauge
operation
scheduler_inflight_events
Number of events currently tracked in the scheduling queue.
ALPHA
Gauge
event
scheduler_permit_wait_duration_seconds
Duration of waiting on permit.
ALPHA
Histogram
result
scheduler_plugin_evaluation_total
Number of attempts to schedule pods by each plugin and the extension point (available only in PreFilter, Filter, PreScore, and Score).
ALPHA
Counter
extension_point, plugin, profile
scheduler_plugin_execution_duration_seconds
Duration for running a plugin at a specific extension point.
ALPHA
Histogram
extension_point, plugin, status
scheduler_preemption_goroutines_duration_seconds
Duration in seconds for running goroutines for the preemption.
Duration for running a queueing hint function of a plugin.
ALPHA
Histogram
event, hint, plugin
scheduler_scheduling_algorithm_duration_seconds
Scheduling algorithm latency in seconds
ALPHA
Histogram
scheduler_unschedulable_pods
The number of unschedulable pods broken down by plugin name. A pod will increment the gauge for all plugins that caused it to not schedule, and so this metric has meaning only when broken down by plugin.
ALPHA
Gauge
plugin, profile
scheduler_volume_binder_cache_requests_total
Total number for request volume binding cache
ALPHA
Counter
operation
scheduler_volume_scheduling_stage_error_total
Volume scheduling stage error count
ALPHA
Counter
operation
scrape_error
1 if there was an error while getting container metrics, 0 otherwise
The time it took to delete the job since it became eligible for deletion
ALPHA
Histogram
volume_manager_selinux_container_errors_total
Number of errors when kubelet cannot compute SELinux context for a container. Kubelet can't start such a Pod then and it will retry, therefore value of this metric may not represent the actual nr. of containers.
ALPHA
Gauge
access_mode
volume_manager_selinux_container_warnings_total
Number of errors when kubelet cannot compute SELinux context for a container that are ignored. They will become real errors when SELinuxMountReadWriteOncePod feature is expanded to all volume access modes.
Number of errors when a Pod defines different SELinux contexts for its containers that use the same volume. Kubelet can't start such a Pod then and it will retry, therefore value of this metric may not represent the actual nr. of Pods.
Number of errors when a Pod defines different SELinux contexts for its containers that use the same volume. They are not errors yet, but they will become real errors when SELinuxMountReadWriteOncePod feature is expanded to all volume access modes.
Number of errors when a Pod uses a volume that is already mounted with a different SELinux context than the Pod needs. Kubelet can't start such a Pod then and it will retry, therefore value of this metric may not represent the actual nr. of Pods.
Number of errors when a Pod uses a volume that is already mounted with a different SELinux context than the Pod needs. They are not errors yet, but they will become real errors when SELinuxMountReadWriteOncePod feature is expanded to all volume access modes.
ALPHA
Gauge
access_mode, volume_plugin
volume_manager_selinux_volumes_admitted_total
Number of volumes whose SELinux context was fine and will be mounted with mount -o context option.
ALPHA
Gauge
access_mode, volume_plugin
volume_manager_total_volumes
Number of volumes in Volume Manager
ALPHA
Custom
plugin_name, state
volume_operation_total_errors
Total volume operation errors
ALPHA
Counter
operation_name, plugin_name
volume_operation_total_seconds
Storage operation end to end duration in seconds
ALPHA
Histogram
operation_name, plugin_name
watch_cache_capacity
Total capacity of watch cache broken by resource type.
ALPHA
Gauge
resource
watch_cache_capacity_decrease_total
Total number of watch cache capacity decrease events broken by resource type.
ALPHA
Counter
resource
watch_cache_capacity_increase_total
Total number of watch cache capacity increase events broken by resource type.
ALPHA
Counter
resource
workqueue_adds_total
Total number of adds handled by workqueue
ALPHA
Counter
name
workqueue_depth
Current depth of workqueue
ALPHA
Gauge
name
workqueue_longest_running_processor_seconds
How many seconds has the longest running processor for workqueue been running.
ALPHA
Gauge
name
workqueue_queue_duration_seconds
How long in seconds an item stays in workqueue before being requested.
ALPHA
Histogram
name
workqueue_retries_total
Total number of retries handled by workqueue
ALPHA
Counter
name
workqueue_unfinished_work_seconds
How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
ALPHA
Gauge
name
workqueue_work_duration_seconds
How long in seconds processing an item from workqueue takes.
We're extremely grateful for security researchers and users that report vulnerabilities to
the Kubernetes Open Source Community. All reports are thoroughly investigated by a set of community volunteers.
To make a report, submit your vulnerability to the Kubernetes bug bounty program.
This allows triage and handling of the vulnerability with standardized response times.
You may encrypt your email to this list using the GPG keys of the
Security Response Committee members.
Encryption using GPG is NOT required to make a disclosure.
When Should I Report a Vulnerability?
You think you discovered a potential security vulnerability in Kubernetes
You are unsure how a vulnerability affects Kubernetes
You think you discovered a vulnerability in another project that Kubernetes depends on
For projects with their own vulnerability reporting and disclosure process, please report it directly there
When Should I NOT Report a Vulnerability?
You need help tuning Kubernetes components for security
You need help applying security related updates
Your issue is not security related
Security Vulnerability Response
Each report is acknowledged and analyzed by Security Response Committee members within 3 working days.
This will set off the Security Release Process.
Any vulnerability information shared with the Security Response Committee stays within the Kubernetes project
and will not be disseminated to other projects unless it is necessary to get the issue fixed.
As the security issue moves from triage, to identified fix, to release planning we will keep the reporter updated.
Public Disclosure Timing
A public disclosure date is negotiated by the Kubernetes Security Response Committee and the bug submitter.
We prefer to fully disclose the bug as soon as possible once a user mitigation is available. It is reasonable
to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested,
or for vendor coordination. The timeframe for disclosure is from immediate (especially if it's already publicly known)
to a few weeks. For a vulnerability with a straightforward mitigation, we expect report date to disclosure date
to be on the order of 7 days. The Kubernetes Security Response Committee holds the final say when setting a disclosure date.
The Kubernetes project publishes a programmatically accessible feed of published
security issues in JSON feed
and RSS feed
formats. You can retrieve the feed with any HTTP client; for example:
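The following is a minimal Go sketch that fetches the JSON feed and prints each entry. The feed URL and the item field names (which follow the JSON Feed format) are assumptions for the example; adjust them to the published feed location if it differs.

// Minimal sketch: fetch the Kubernetes official CVE feed and print each item.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Assumed feed location; see the documentation page for the canonical URL.
	const feedURL = "https://kubernetes.io/docs/reference/issues-security/official-cve-feed/index.json"

	resp, err := http.Get(feedURL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Items carry an id (the CVE number), a summary, and a URL per the JSON Feed format;
	// unknown fields are ignored by the decoder.
	var feed struct {
		Items []struct {
			ID      string `json:"id"`
			Summary string `json:"summary"`
			URL     string `json:"url"`
		} `json:"items"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&feed); err != nil {
		panic(err)
	}
	for _, item := range feed.Items {
		fmt.Printf("%s\t%s\t%s\n", item.ID, item.Summary, item.URL)
	}
}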
This feed is auto-refreshing with a noticeable but small lag (minutes to hours)
from the time a CVE is announced to the time it is accessible in this feed.
The source of truth of this feed is a set of GitHub Issues, filtered by a controlled and
restricted label official-cve-feed. The raw data is stored in a Google Cloud
Bucket which is writable only by a small number of trusted members of the
Community.
8 - Node Reference Information
This section contains the following reference topics about nodes:
FEATURE STATE:Kubernetes v1.30 [beta] (enabled by default: true)
Checkpointing a container is the functionality to create a stateful copy of a
running container. Once you have a stateful copy of a container, you could
move it to a different computer for debugging or similar purposes.
If you move the checkpointed container data to a computer that's able to restore
it, that restored container continues to run at exactly the same
point it was checkpointed. You can also inspect the saved data, provided that you
have suitable tools for doing so.
Creating a checkpoint of a container might have security implications. Typically
a checkpoint contains all memory pages of all processes in the checkpointed
container. This means that everything that used to be in memory is now available
on the local disk. This includes all private data and possibly keys used for
encryption. The underlying CRI implementations (the container runtime on that node)
should create the checkpoint archive to be only accessible by the root user. It
is still important to remember that if the checkpoint archive is transferred to another
system, all memory pages will be readable by the owner of the checkpoint archive.
Operations
post checkpoint the specified container
Tell the kubelet to checkpoint a specific container from the specified Pod.
The kubelet will request a checkpoint from the underlying
CRI implementation. In the checkpoint
request the kubelet will specify the name of the checkpoint archive as
checkpoint-<podFullName>-<containerName>-<timestamp>.tar and also request to
store the checkpoint archive in the checkpoints directory below its root
directory (as defined by --root-dir). This defaults to
/var/lib/kubelet/checkpoints.
The checkpoint archive is in tar format, and could be listed using an implementation of
tar. The contents of the
archive depend on the underlying CRI implementation (the container runtime on that node).
Timeout in seconds to wait until the checkpoint creation is finished.
If zero or no timeout is specified the default CRI timeout value will be used. Checkpoint
creation time depends directly on the used memory of the container.
The more memory a container uses the more time is required to create
the corresponding checkpoint.
Response
200: OK
401: Unauthorized
404: Not Found (if the ContainerCheckpoint feature gate is disabled)
404: Not Found (if the specified namespace, pod or container cannot be found)
500: Internal Server Error (if the CRI implementation encounters an error during checkpointing (see error message for further details))
500: Internal Server Error (if the CRI implementation does not implement the checkpoint CRI API (see error message for further details))
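As a concrete illustration, the sketch below issues the checkpoint request described above directly against a kubelet. It assumes the kubelet API is reachable on its default port (10250) and that the caller presents a bearer token authorized for this endpoint; the node address, namespace, pod and container names, and the token path are placeholders.

// Minimal sketch of calling the kubelet checkpoint endpoint.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func main() {
	node := "192.168.0.10" // kubelet host (placeholder)
	namespace, pod, container := "default", "my-pod", "my-container" // placeholders

	// POST /checkpoint/{namespace}/{pod}/{container} on the kubelet API port.
	url := fmt.Sprintf("https://%s:10250/checkpoint/%s/%s/%s", node, namespace, pod, container)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		panic(err)
	}

	// Authenticate with a bearer token that is authorized for the kubelet API (placeholder path).
	token, err := os.ReadFile("/path/to/token")
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))

	// For brevity this sketch does not verify the kubelet's serving certificate;
	// a real client should verify it rather than skip verification.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}

On success the kubelet responds with 200 OK and writes the archive into the checkpoints directory described above.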
8.2 - Linux Kernel Version Requirements
Note: This section links to third party projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for these projects, which are listed alphabetically. To add a project to this list, read the content guide before submitting a change. More information.
Many features rely on specific kernel functionalities and have minimum kernel version requirements.
However, relying solely on kernel version numbers may not be sufficient
for certain operating system distributions,
as maintainers for distributions such as RHEL, Ubuntu and SUSE often backport selected features
to older kernel releases (retaining the older kernel version).
Pod sysctls
On Linux, the sysctl() system call configures kernel parameters at run time. There is a command
line tool named sysctl that you can use to configure these parameters, and many are exposed via
the proc filesystem.
Some sysctls are only available if you have a modern enough kernel.
The following sysctls have a minimal kernel version requirement,
and are supported in the safe set:
net.ipv4.vs.conn_reuse_mode (used in ipvs proxy mode, needs kernel 4.1+);
kube proxy nftables proxy mode
For Kubernetes 1.33, the
nftables mode of kube-proxy requires
version 1.0.1 or later
of the nft command-line tool, as well as kernel 5.13 or later.
For testing/development purposes, you can use older kernels, as far back as 5.4 if you set the
nftables.skipKernelVersionCheck option in the kube-proxy config.
But this is not recommended in production since it may cause problems with other nftables
users on the system.
Version 2 control groups
Kubernetes cgroup v1 support is in maintained mode starting from Kubernetes v1.31; using cgroup v2
is recommended.
In Linux 5.8, the system-level cpu.stat file was added to the root cgroup for convenience.
According to the runc documentation, kernels older than 5.2 are not recommended due to the lack of freezer support.
Some features may depend on new kernel functionalities and have specific kernel requirements:
Recursive read only mount:
This is implemented by applying the MOUNT_ATTR_RDONLY attribute with the AT_RECURSIVE flag
using mount_setattr(2) added in Linux kernel v5.12.
Pod user namespace support requires minimal kernel version 6.5+, according to
KEP-127.
For node system swap, tmpfs set to noswap
is not supported until kernel 6.3.
Linux kernel long term maintenance
Active kernel releases can be found in kernel.org.
There are usually several long term maintenance kernel releases provided for the purposes of backporting
bug fixes for older kernel trees. Only important bug fixes are applied to such kernels and they don't
usually see very frequent releases, especially for older trees.
See the Linux kernel website for the list of releases
in the Longterm category.
8.3 - Articles on dockershim Removal and on Using CRI-compatible Runtimes
This is a list of articles and other pages that are either
about the Kubernetes' deprecation and removal of dockershim,
or about using CRI-compatible container runtimes,
in connection with that removal.
topology.kubernetes.io/zone
(if known to the kubelet – Kubernetes may not have this information to set the label)
Note:
The value of these labels is cloud provider specific and is not guaranteed to be reliable.
For example, the value of kubernetes.io/hostname may be the same as the node name in some environments
and a different value in other environments.
The kubelet is mostly a stateless
process running on a Kubernetes node.
This document outlines files that kubelet reads and writes.
Note:
This document is for informational purposes and does not describe any guaranteed behaviors or APIs.
It lists resources used by the kubelet, which are implementation details subject to change in any release.
The kubelet typically uses the control plane as
the source of truth on what needs to run on the Node, and the
container runtime to retrieve
the current state of containers. So long as you provide a kubeconfig (API client configuration)
to the kubelet, the kubelet does connect to your control plane; otherwise the node operates in
standalone mode.
On Linux nodes, the kubelet also relies on reading cgroups and various system files to collect metrics.
On Windows nodes, the kubelet collects metrics via a different mechanism that does not rely on
paths.
There are also a few other files that are used by the kubelet as well,
as kubelet communicates using local Unix-domain sockets. Some are sockets that the
kubelet listens on, and for other sockets the kubelet discovers them and then connects
as a client.
Note:
This page lists paths as Linux paths, which map to the Windows paths by adding a root disk
C:\ in place of / (unless specified otherwise).
For example, /var/lib/kubelet/device-plugins maps to C:\var\lib\kubelet\device-plugins.
Configuration
Kubelet configuration files
The path to the kubelet configuration file can be configured
using the command line argument --config. The kubelet also supports
drop-in configuration files
to enhance configuration.
Certificates
Certificates and private keys are typically located at /var/lib/kubelet/pki,
but can be configured using the --cert-dir kubelet command line argument.
Names of certificate files are also configurable.
Manifests
Manifests for static pods are typically located in /etc/kubernetes/manifests.
Location can be configured using the staticPodPath kubelet configuration option.
Systemd unit settings
When the kubelet runs as a systemd unit, some kubelet configuration may be declared
in the systemd unit settings file; typically this includes command line arguments passed to the kubelet.
All resource managers keep the mapping of Pods to allocated resources in state files.
State files are located in the kubelet's base directory, also termed the root directory
(but not the same as /, the node root directory). You can configure the base directory
for the kubelet
using the kubelet command line argument --root-dir.
The device manager creates checkpoints in the same directory as its socket files: /var/lib/kubelet/device-plugins/.
The name of the Device Manager checkpoint file is kubelet_internal_checkpoint.
Pod resource checkpoints
FEATURE STATE:Kubernetes v1.33 [beta] (enabled by default: true)
allocated_pods_state records the resources allocated to each pod running on the node
actuated_pods_state records the resources that have been accepted by the runtime
for each pod running on the node
The files are located within the kubelet base directory
(/var/lib/kubelet by default on Linux; configurable using --root-dir).
Container runtime
Kubelet communicates with the container runtime using socket configured via the
configuration parameters:
containerRuntimeEndpoint for runtime operations
imageServiceEndpoint for image management operations
The actual values of those endpoints depend on the container runtime being used.
Device plugins
The kubelet exposes a socket at the path /var/lib/kubelet/device-plugins/kubelet.sock for
various Device Plugins to register.
When a device plugin registers itself, it provides its socket path for the kubelet to connect.
The device plugin socket should be in the directory device-plugins within the kubelet base
directory. On a typical Linux node, this means /var/lib/kubelet/device-plugins.
Pod resources API
Pod Resources API
will be exposed at the path /var/lib/kubelet/pod-resources.
DRA, CSI, and Device plugins
The kubelet looks for socket files created by device plugins managed via DRA,
device manager, or storage plugins, and then attempts to connect
to these sockets. The directory that the kubelet looks in is plugins_registry within the kubelet base
directory, so on a typical Linux node this means /var/lib/kubelet/plugins_registry.
Note that for device plugins there are two alternative registration mechanisms;
only one should be used for a given plugin.
The types of plugins that can place socket files into that directory are:
CSI plugins
DRA plugins
Device Manager plugins
(typically /var/lib/kubelet/plugins_registry).
Graceful node shutdown
FEATURE STATE:Kubernetes v1.21 [beta] (enabled by default: true)
Graceful node shutdown
stores state locally at /var/lib/kubelet/graceful_node_shutdown_state.
Image Pull Records
FEATURE STATE:Kubernetes v1.33 [alpha] (enabled by default: false)
The kubelet stores records of attempted and successful image pulls, and uses them
to verify that an image was previously successfully pulled with the same credentials.
These records are cached as files in the image_manager directory within
the kubelet base directory. On a typical Linux node, this means /var/lib/kubelet/image_manager.
There are two subdirectories to image_manager:
pulling - stores records about images the Kubelet is attempting to pull.
pulled - stores records about images that were successfully pulled by the Kubelet,
along with metadata about the credentials used for the pulls.
Seccomp profile files referenced from Pods should be placed in /var/lib/kubelet/seccomp.
See the seccomp reference for details.
AppArmor
The kubelet does not load or refer to AppArmor profiles by a Kubernetes-specific path.
AppArmor profiles are loaded via the node operating system rather than referenced by their path.
Locking
FEATURE STATE:Kubernetes v1.2 [alpha]
A lock file for the kubelet; typically /var/run/kubelet.lock. The kubelet uses this to ensure
that two different kubelets don't try to run in conflict with each other.
You can configure the path to the lock file using the --lock-file kubelet command line argument.
If two kubelets on the same node use a different value for the lock file path, they will not be able to
detect a conflict when both are running.
When using the kubelet's --config-dir flag to specify a drop-in directory for
configuration, there is some specific behavior on how different types are
merged.
Here are some examples of how different data types behave during configuration merging:
Structure Fields
There are two types of structure fields in a YAML structure: singular (or a
scalar type) and embedded (structures that contain scalar types).
The configuration merging process handles the overriding of singular and embedded struct fields to create the resulting kubelet configuration.
For instance, you may want a baseline kubelet configuration for all nodes while customizing the address and authorization fields on particular nodes; a drop-in file that sets only those fields overrides just them and leaves the rest of the baseline intact.
You can also override the slice/list values of the kubelet configuration.
However, the entire list gets overridden during the merging process: a drop-in that sets clusterDNS, for example, replaces the baseline clusterDNS list rather than appending to it.
Individual fields in maps, regardless of their value types (boolean, string, etc.), can be selectively overridden.
However, for map[string][]string, the entire list associated with a specific field gets overridden.
These behaviors, including fields such as featureGates and staticPodURLHeader, are illustrated by the sketch below:
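The following Go sketch is illustrative only and is not the kubelet's implementation; the trimmed-down miniKubeletConfig type and the example values are assumptions made for this example (the real KubeletConfiguration has many more fields).

// Illustrative sketch of drop-in configuration merge semantics.
package main

import "fmt"

type miniKubeletConfig struct {
	Address            string              // singular (scalar) field
	ClusterDNS         []string            // list field
	FeatureGates       map[string]bool     // map with scalar values
	StaticPodURLHeader map[string][]string // map with list values
}

// merge applies a drop-in on top of a base configuration, following the
// semantics described above.
func merge(base, dropIn miniKubeletConfig) miniKubeletConfig {
	out := base

	// Singular fields: a value set in the drop-in overrides the base value.
	if dropIn.Address != "" {
		out.Address = dropIn.Address
	}

	// Lists: the entire list from the drop-in replaces the base list.
	if dropIn.ClusterDNS != nil {
		out.ClusterDNS = dropIn.ClusterDNS
	}

	// Maps with scalar values: keys are overridden individually; keys absent
	// from the drop-in keep their base values.
	out.FeatureGates = map[string]bool{}
	for k, v := range base.FeatureGates {
		out.FeatureGates[k] = v
	}
	for k, v := range dropIn.FeatureGates {
		out.FeatureGates[k] = v
	}

	// Maps with list values (map[string][]string): keys are still merged
	// individually, but the whole list for an overridden key is replaced.
	out.StaticPodURLHeader = map[string][]string{}
	for k, v := range base.StaticPodURLHeader {
		out.StaticPodURLHeader[k] = v
	}
	for k, v := range dropIn.StaticPodURLHeader {
		out.StaticPodURLHeader[k] = v
	}
	return out
}

func main() {
	base := miniKubeletConfig{
		Address:            "192.168.0.1",
		ClusterDNS:         []string{"10.96.0.10"},
		FeatureGates:       map[string]bool{"AllAlpha": false, "MemoryQoS": true},
		StaticPodURLHeader: map[string][]string{"header1": {"value1", "value2"}},
	}
	dropIn := miniKubeletConfig{
		Address:            "192.168.0.8",
		ClusterDNS:         []string{"192.168.0.9"},
		FeatureGates:       map[string]bool{"MemoryQoS": false},
		StaticPodURLHeader: map[string][]string{"header1": {"value3"}},
	}
	fmt.Printf("%+v\n", merge(base, dropIn))
	// Address and the whole ClusterDNS list come from the drop-in; AllAlpha keeps
	// its base value, MemoryQoS is overridden, and header1 is replaced wholesale.
}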
This page provides details of version compatibility between the Kubernetes
device plugin API,
and different versions of Kubernetes itself.
Compatibility matrix
                    v1alpha1    v1beta1
Kubernetes 1.21     -           ✓
Kubernetes 1.22     -           ✓
Kubernetes 1.23     -           ✓
Kubernetes 1.24     -           ✓
Kubernetes 1.25     -           ✓
Kubernetes 1.26     -           ✓
Key:
✓ Exactly the same features / API objects in both device plugin API and
the Kubernetes version.
+ The device plugin API has features or API objects that may not be present in the
Kubernetes cluster, either because the device plugin API has added additional new API
calls, or that the server has removed an old API call. However, everything they have in
common (most other APIs) will work. Note that alpha APIs may vanish or
change significantly between one minor release and the next.
- The Kubernetes cluster has features the device plugin API can't use,
either because server has added additional API calls, or that device plugin API has
removed an old API call. However, everything they share in common (most APIs) will work.
8.8 - Kubelet Systemd Watchdog
FEATURE STATE:Kubernetes v1.32 [beta] (enabled by default: true)
On Linux nodes, Kubernetes 1.33 supports integrating with
systemd to allow the operating system supervisor to recover
a failed kubelet. This integration is not enabled by default.
It can be used as an alternative to periodically requesting
the kubelet's /healthz endpoint for health checks. If the kubelet
does not respond to the watchdog within the timeout period, the watchdog
will kill the kubelet.
The systemd watchdog works by requiring the service to periodically send
a keep-alive signal to the systemd process. If the signal is not received
within a specified timeout period, the service is considered unresponsive
and is terminated. The service can then be restarted according to the configuration.
Configuration
Using the systemd watchdog requires configuring the WatchdogSec parameter
in the [Service] section of the kubelet service unit file:
[Service]
WatchdogSec=30s
Setting WatchdogSec=30s indicates a service watchdog timeout of 30 seconds.
Within the kubelet, the sd_notify() function is invoked at intervals of WatchdogSec/2 to send
WATCHDOG=1 (a keep-alive message). If the watchdog is not fed
within the timeout period, the kubelet will be killed. Setting Restart
to "always", "on-failure", "on-watchdog", or "on-abnormal" will ensure that the service
is automatically restarted.
Some details about the systemd configuration:
If you set the systemd value for WatchdogSec to 0, or omit setting it, the systemd watchdog is not
enabled for this unit.
The kubelet supports a minimum watchdog period of 1.0 seconds; this is to prevent the kubelet
from being killed unexpectedly. You can set the value of WatchdogSec in a systemd unit definition
to a period shorter than 1 second, but Kubernetes does not support any shorter interval.
The timeout does not have to be a whole integer number of seconds.
The Kubernetes project suggests setting WatchdogSec to approximately a 15s period.
Periods longer than 10 minutes are supported but explicitly not recommended.
Example Configuration
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/kubelet
# Configures the watchdog timeout
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
The status of a node in Kubernetes is a critical
aspect of managing a Kubernetes cluster. In this article, we'll cover the basics of
monitoring and maintaining node status to ensure a healthy and stable cluster.
Node status fields
A Node's status contains the following information:
You can use kubectl to view a Node's status and other details:
kubectl describe node <insert-node-name-here>
Each section of the output is described below.
Addresses
The usage of these fields varies depending on your cloud provider or bare metal configuration.
HostName: The hostname as reported by the node's kernel. Can be overridden via the kubelet
--hostname-override parameter.
ExternalIP: Typically the IP address of the node that is externally routable (available from
outside the cluster).
InternalIP: Typically the IP address of the node that is routable only within the cluster.
Conditions
The conditions field describes the status of all Running nodes. Examples of conditions include:
Node conditions, and a description of when each condition applies:
Ready: True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 50 seconds).
DiskPressure: True if pressure exists on the disk size (that is, if the disk capacity is low); otherwise False.
MemoryPressure: True if pressure exists on the node memory (that is, if the node memory is low); otherwise False.
PIDPressure: True if pressure exists on the processes (that is, if there are too many processes on the node); otherwise False.
NetworkUnavailable: True if the network for the node is not correctly configured; otherwise False.
Note:
If you use command-line tools to print details of a cordoned Node, the Condition includes
SchedulingDisabled. SchedulingDisabled is not a Condition in the Kubernetes API; instead,
cordoned nodes are marked Unschedulable in their spec.
In the Kubernetes API, a node's condition is represented as part of the .status
of the Node resource. For example, the following JSON structure describes a healthy node:
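(The field values shown here, such as timestamps and messages, are illustrative and vary between clusters.)
"conditions": [
  {
    "type": "Ready",
    "status": "True",
    "reason": "KubeletReady",
    "message": "kubelet is posting ready status",
    "lastHeartbeatTime": "2019-06-05T18:38:35Z",
    "lastTransitionTime": "2019-06-05T11:41:27Z"
  }
]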
When problems occur on nodes, the Kubernetes control plane automatically creates
taints that match the conditions
affecting the node. An example of this is when the status of the Ready condition
remains Unknown or False for longer than the kube-controller-manager's NodeMonitorGracePeriod,
which defaults to 50 seconds. This will cause either a node.kubernetes.io/unreachable taint, for an Unknown status,
or a node.kubernetes.io/not-ready taint, for a False status, to be added to the Node.
These taints affect pending pods as the scheduler takes the Node's taints into consideration when
assigning a pod to a Node. Existing pods scheduled to the node may be evicted due to the application
of NoExecute taints. Pods may also have tolerations that let
them schedule to and continue running on a Node even though it has a specific taint.
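As a sketch, a Pod that should be allowed to keep running for a while on a node that becomes unreachable could declare a toleration like the following; the 6000-second value is only an example:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000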
Capacity and Allocatable
Describes the resources available on the node: CPU, memory, and the maximum
number of pods that can be scheduled onto the node.
The fields in the capacity block indicate the total amount of resources that a
Node has. The allocatable block indicates the amount of resources on a
Node that is available to be consumed by normal Pods.
You may read more about capacity and allocatable resources while learning how
to reserve compute resources
on a Node.
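As an illustration only (the exact resources and quantities depend on the node), the relevant part of a Node's .status might look like:
capacity:
  cpu: "4"
  ephemeral-storage: 100Gi
  memory: 16Gi
  pods: "110"
allocatable:
  cpu: 3800m
  ephemeral-storage: 90Gi
  memory: 14Gi
  pods: "110"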
Info
Describes general information about the node, such as kernel version, Kubernetes
version (kubelet and kube-proxy version), container runtime details, and which
operating system the node uses.
The kubelet gathers this information from the node and publishes it into
the Kubernetes API.
Heartbeats
Heartbeats, sent by Kubernetes nodes, help your cluster determine the
availability of each node, and to take action when failures are detected.
For nodes there are two forms of heartbeats:
updates to the .status of a Node
Lease objects
within the kube-node-lease namespace.
Each Node has an associated Lease object.
Compared to updates to .status of a Node, a Lease is a lightweight resource.
Using Leases for heartbeats reduces the performance impact of these updates
for large clusters.
The kubelet is responsible for creating and updating the .status of Nodes,
and for updating their related Leases.
The kubelet updates the node's .status either when there is a change in status
or if there has been no update for a configured interval. The default interval
for .status updates to Nodes is 5 minutes, which is much longer than the 40
second default timeout for unreachable nodes.
The kubelet creates and then updates its Lease object every 10 seconds
(the default update interval). Lease updates occur independently from
updates to the Node's .status. If the Lease update fails, the kubelet retries,
using exponential backoff that starts at 200 milliseconds and is capped at 7 seconds.
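If you want to observe these heartbeats yourself, you can list the per-node Lease objects, for example:
kubectl get leases --namespace kube-node-lease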
8.10 - Seccomp and Kubernetes
Seccomp stands for secure computing mode and has been a feature of the Linux
kernel since version 2.6.12. It can be used to sandbox the privileges of a
process, restricting the calls it is able to make from userspace into the
kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a
node to your Pods and containers.
Seccomp fields
FEATURE STATE: Kubernetes v1.19 [stable]
There are four ways to specify a seccomp profile for a
Pod: for the whole Pod via spec.securityContext.seccompProfile, for a regular
container via spec.containers[*].securityContext.seccompProfile, for an init
container via spec.initContainers[*].securityContext.seccompProfile, or for an
ephemeral container via spec.ephemeralContainers[*].securityContext.seccompProfile.
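The following manifest is a sketch that matches the description below; the container images and the my-profile.json file name are placeholders, and ephemeral containers are normally added through the ephemeralcontainers subresource rather than at Pod creation:
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  securityContext:
    seccompProfile:
      type: Unconfined
  ephemeralContainers:
  - name: ephemeral-container
    image: debian
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  initContainers:
  - name: init-container
    image: debian
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  containers:
  - name: container
    image: docker.io/library/debian:stable
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: my-profile.json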
The Pod in the example above runs as Unconfined, while the
ephemeral-container and init-container specifically define
RuntimeDefault. If the ephemeral or init container had not set the
securityContext.seccompProfile field explicitly, then the value would be
inherited from the Pod. The same applies to the container, which runs a
Localhost profile my-profile.json.
Generally speaking, fields from (ephemeral) containers have a higher priority
than the Pod level value, while containers which do not set the seccomp field
inherit the profile from the Pod.
Note:
It is not possible to apply a seccomp profile to a Pod or container running with
privileged: true set in the container's securityContext. Privileged
containers always run as Unconfined.
The following values are possible for the seccompProfile.type:
Unconfined
The workload runs without any seccomp restrictions.
RuntimeDefault
A default seccomp profile defined by the
container runtime
is applied. The default profiles aim to provide a strong set of security
defaults while preserving the functionality of the workload. It is possible that
the default profiles differ between container runtimes and their release
versions, for example when comparing those from
CRI-O and
containerd.
Localhost
The localhostProfile will be applied, which has to be available on the node
disk (on Linux it's /var/lib/kubelet/seccomp). The availability of the seccomp
profile is verified by the
container runtime
on container creation. If the profile does not exist, then the container
creation will fail with a CreateContainerError.
Localhost profiles
Seccomp profiles are JSON files following the scheme defined by the
OCI runtime specification.
A profile basically defines actions based on matched syscalls, but also allows
to pass specific values as arguments to syscalls. For example:
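A minimal sketch of such a profile, consistent with the description that follows (the syscall names listed are only examples):
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["exit_group", "futex", "nanosleep", "read", "write"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}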
The defaultAction in the profile above is defined as SCMP_ACT_ERRNO and
acts as the fallback for any syscall that is not matched by an entry in syscalls.
The error returned by that fallback is defined as code 38 via the defaultErrnoRet field.
The following actions are generally possible:
SCMP_ACT_ERRNO
Return the specified error code.
SCMP_ACT_ALLOW
Allow the syscall to be executed.
SCMP_ACT_KILL_PROCESS
Kill the process.
SCMP_ACT_KILL_THREAD and SCMP_ACT_KILL
Kill only the thread.
SCMP_ACT_TRAP
Throw a SIGSYS signal.
SCMP_ACT_NOTIFY and SECCOMP_RET_USER_NOTIF
Notify the user space.
SCMP_ACT_TRACE
Notify a tracing process with the specified value.
SCMP_ACT_LOG
Allow the syscall to be executed after the action has been logged to syslog or
auditd.
Some actions like SCMP_ACT_NOTIFY or SECCOMP_RET_USER_NOTIF may be not
supported depending on the container runtime, OCI runtime or Linux kernel
version being used. There may be also further limitations, for example that
SCMP_ACT_NOTIFY cannot be used as defaultAction or for certain syscalls like
write. All those limitations are defined by either the OCI runtime
(runc,
crun) or
libseccomp.
The syscalls JSON array contains a list of objects referencing syscalls by
their respective names. For example, the action SCMP_ACT_ALLOW can be used
to create an allow list of permitted syscalls, as outlined in the example above. It
would also be possible to define another list using the action SCMP_ACT_ERRNO
but a different return (errnoRet) value.
It is also possible to specify the arguments (args) passed to certain
syscalls. More information about those advanced use cases can be found in the
OCI runtime spec
and the Seccomp Linux kernel documentation.
9.1 - Protocols for Services
There are 3 valid values for the protocol of a port for a Service:
SCTP
FEATURE STATE: Kubernetes v1.20 [stable]
When using a network plugin that supports SCTP traffic, you can use SCTP for
most Services. For type: LoadBalancer Services, SCTP support depends on the cloud
provider offering this facility. (Most do not).
SCTP is not supported on nodes that run Windows.
Support for multihomed SCTP associations
The support of multihomed SCTP associations requires that the CNI plugin can support the assignment of multiple interfaces and IP addresses to a Pod.
NAT for multihomed SCTP associations requires special logic in the corresponding kernel modules.
TCP
You can use TCP for any kind of Service, and it's the default network protocol.
UDP
You can use UDP for most Services. For type: LoadBalancer Services,
UDP support depends on the cloud provider offering this facility.
Special cases
HTTP
If your cloud provider supports it, you can use a Service in LoadBalancer mode to
configure a load balancer outside of your Kubernetes cluster, in a special mode
where your cloud provider's load balancer implements HTTP / HTTPS reverse proxying,
with traffic forwarded to the backend endpoints for that Service.
Typically, you set the protocol for the Service to TCP and add an
annotation
(usually specific to your cloud provider) that configures the load balancer
to handle traffic at the HTTP level.
This configuration might also include serving HTTPS (HTTP over TLS) and
reverse-proxying plain HTTP to your workload.
Note:
You can also use an Ingress to expose
HTTP/HTTPS Services.
You might additionally want to specify that the
application protocol
of the connection is http or https. Use http if the session from the
load balancer to your workload is HTTP without TLS, and use https if the
session from the load balancer to your workload uses TLS encryption.
PROXY protocol
If your cloud provider supports it, you can use a Service set to type: LoadBalancer
to configure a load balancer outside of Kubernetes itself, that will forward connections
wrapped with the
PROXY protocol.
The load balancer then sends an initial series of octets describing the
incoming connection, similar to this example (PROXY protocol v1):
PROXY TCP4 192.0.2.202 10.0.42.7 12345 7\r\n
The data after the proxy protocol preamble are the original
data from the client. When either side closes the connection,
the load balancer also triggers a connection close and sends
any remaining data where feasible.
Typically, you define a Service with the protocol set to TCP.
You also set an annotation, specific to your
cloud provider, that configures the load balancer to wrap each incoming connection in the PROXY protocol.
TLS
If your cloud provider supports it, you can use a Service set to type: LoadBalancer as
a way to set up external reverse proxying, where the connection from client to load
balancer is TLS encrypted and the load balancer is the TLS server peer.
The connection from the load balancer to your workload can also be TLS,
or might be plain text. The exact options available to you depend on your
cloud provider or custom Service implementation.
Typically, you set the protocol to TCP and set an annotation
(usually specific to your cloud provider) that configures the load balancer
to act as a TLS server. You would configure the TLS identity (as server,
and possibly also as a client that connects to your workload) using
mechanisms that are specific to your cloud provider.
9.2 - Ports and Protocols
When running Kubernetes in an environment with strict network boundaries, such
as an on-premises datacenter with physical network firewalls or virtual
networks in a public cloud, it is useful to be aware of the ports and protocols
used by Kubernetes components.
Control plane
Protocol   Direction   Port Range   Purpose                   Used By
TCP        Inbound     6443         Kubernetes API server     All
TCP        Inbound     2379-2380    etcd server client API    kube-apiserver, etcd
TCP        Inbound     10250        Kubelet API               Self, Control plane
TCP        Inbound     10259        kube-scheduler            Self
TCP        Inbound     10257        kube-controller-manager   Self
Although the etcd ports are included in the control plane section, you can also host your own
etcd cluster externally or on custom ports.
All default port numbers can be overridden. When custom ports are used, those
ports need to be open instead of the defaults mentioned here.
One common example is the API server port, which is sometimes switched
to 443. Alternatively, the default port is kept as is and the API server is put
behind a load balancer that listens on 443 and routes requests to the API server
on the default port.
9.3 - Virtual IPs and Service Proxies
Every node in a Kubernetes
cluster runs a
kube-proxy
(unless you have deployed your own alternative component in place of kube-proxy).
The kube-proxy component is responsible for implementing a virtual IP
mechanism for Services
of type other than
ExternalName.
Each instance of kube-proxy watches the Kubernetes
control plane
for the addition and removal of Service and EndpointSlice
objects. For each Service, kube-proxy
calls appropriate APIs (depending on the kube-proxy mode) to configure
the node to capture traffic to the Service's clusterIP and port,
and redirect that traffic to one of the Service's endpoints
(usually a Pod, but possibly an arbitrary user-provided IP address). A control
loop ensures that the rules on each node are reliably synchronized with
the Service and EndpointSlice state as indicated by the API server.
Virtual IP mechanism for Services, using iptables mode
A question that pops up every now and then is why Kubernetes relies on
proxying to forward inbound traffic to backends. What about other
approaches? For example, would it be possible to configure DNS records that
have multiple A values (or AAAA for IPv6), and rely on round-robin name
resolution?
There are a few reasons for using proxying for Services:
There is a long history of DNS implementations not respecting record TTLs,
and caching the results of name lookups after they should have expired.
Some apps do DNS lookups only once and cache the results indefinitely.
Even if apps and libraries did proper re-resolution, the low or zero TTLs
on the DNS records could impose a high load on DNS that then becomes
difficult to manage.
Later in this page you can read about how various kube-proxy implementations work.
Overall, you should note that, when running kube-proxy, kernel level rules may be modified
(for example, iptables rules might get created), which won't get cleaned up, in some
cases until you reboot. Thus, running kube-proxy is something that should only be done
by an administrator who understands the consequences of having a low level, privileged
network proxying service on a computer. Although the kube-proxy executable supports a
cleanup function, this function is not an official feature and thus is only available
to use as-is.
Some of the details in this reference refer to an example: the backend
Pods for a stateless
image-processing workload, running with
three replicas. Those replicas are
fungible—frontends do not care which backend they use. While the actual Pods that
compose the backend set may change, the frontend clients should not need to be aware of that,
nor should they need to keep track of the set of backends themselves.
Proxy modes
The kube-proxy starts up in different modes, which are determined by its configuration.
On Linux nodes, the available modes for kube-proxy are iptables, ipvs, and nftables,
each described in its own section below. On Windows nodes, the only available mode is kernelspace,
a mode where the kube-proxy configures packet forwarding rules in the Windows kernel.
iptables proxy mode
This proxy mode is only available on Linux nodes.
In this mode, kube-proxy configures packet forwarding rules using the
iptables API of the kernel netfilter subsystem. For each endpoint, it
installs iptables rules which, by default, select a backend Pod at
random.
Example
As an example, consider the image processing application described earlier
in the page.
When the backend Service is created, the Kubernetes control plane assigns a virtual
IP address, for example 10.0.0.1. For this example, assume that the
Service port is 1234.
All of the kube-proxy instances in the cluster observe the creation of the new
Service.
When kube-proxy on a node sees a new Service, it installs a series of iptables rules
which redirect from the virtual IP address to more iptables rules, defined per Service.
The per-Service rules link to further rules for each backend endpoint, and the
per-endpoint rules redirect traffic (using destination NAT) to the backends.
When a client connects to the Service's virtual IP address the iptables rule kicks in.
A backend is chosen (either based on session affinity or randomly) and packets are
redirected to the backend without rewriting the client IP address.
This same basic flow executes when traffic comes in through a type: NodePort Service, or
through a load-balancer, though in those cases the client IP address does get altered.
Optimizing iptables mode performance
In iptables mode, kube-proxy creates a few iptables rules for every
Service, and a few iptables rules for each endpoint IP address. In
clusters with tens of thousands of Pods and Services, this means tens
of thousands of iptables rules, and kube-proxy may take a long time to update the rules
in the kernel when Services (or their EndpointSlices) change. You can adjust the syncing
behavior of kube-proxy via options in the
iptables section
of the kube-proxy configuration file
(which you specify via kube-proxy --config <path>):
...
iptables:
  minSyncPeriod: 1s
  syncPeriod: 30s
...
minSyncPeriod
The minSyncPeriod parameter sets the minimum duration between
attempts to resynchronize iptables rules with the kernel. If it is
0s, then kube-proxy will always immediately synchronize the rules
every time any Service or EndpointSlice changes. This works fine in very
small clusters, but it results in a lot of redundant work when lots of
things change in a small time period. For example, if you have a
Service backed by a Deployment
with 100 pods, and you delete the
Deployment, then with minSyncPeriod: 0s, kube-proxy would end up
removing the Service's endpoints from the iptables rules one by one,
resulting in a total of 100 updates. With a larger minSyncPeriod, multiple
Pod deletion events would get aggregated together, so kube-proxy might
instead end up making, say, 5 updates, each removing 20 endpoints,
which will be much more efficient in terms of CPU, and result in the
full set of changes being synchronized faster.
The larger the value of minSyncPeriod, the more work that can be
aggregated, but the downside is that each individual change may end up
waiting up to the full minSyncPeriod before being processed, meaning
that the iptables rules spend more time being out-of-sync with the
current API server state.
The default value of 1s should work well in most clusters, but in very
large clusters it may be necessary to set it to a larger value.
In particular, if kube-proxy's sync_proxy_rules_duration_seconds metric
indicates an average time much larger than 1 second, then bumping up
minSyncPeriod may make updates more efficient.
Updating legacy minSyncPeriod configuration
Older versions of kube-proxy updated all the rules for all Services on
every sync; this led to performance issues (update lag) in large
clusters, and the recommended solution was to set a larger
minSyncPeriod. Since Kubernetes v1.28, the iptables mode of
kube-proxy uses a more minimal approach, only making updates where
Services or EndpointSlices have actually changed.
If you were previously overriding minSyncPeriod, you should try
removing that override and letting kube-proxy use the default value
(1s) or at least a smaller value than you were using before upgrading.
If you are not running kube-proxy from Kubernetes 1.33, check
the behavior and associated advice for the version that you are actually running.
syncPeriod
The syncPeriod parameter controls a handful of synchronization
operations that are not directly related to changes in individual
Services and EndpointSlices. In particular, it controls how quickly
kube-proxy notices if an external component has interfered with
kube-proxy's iptables rules. In large clusters, kube-proxy also only
performs certain cleanup operations once every syncPeriod to avoid
unnecessary work.
For the most part, increasing syncPeriod is not expected to have much
impact on performance, but in the past, it was sometimes useful to set
it to a very large value (e.g., 1h). This is no longer recommended,
and is likely to hurt functionality more than it improves performance.
IPVS proxy mode
This proxy mode is only available on Linux nodes.
In ipvs mode, kube-proxy uses the kernel IPVS and iptables APIs to
create rules to redirect traffic from Service IPs to endpoint IPs.
The IPVS proxy mode is based on netfilter hook function that is similar to
iptables mode, but uses a hash table as the underlying data structure and works
in the kernel space.
That means kube-proxy in IPVS mode redirects traffic with lower latency than
kube-proxy in iptables mode, with much better performance when synchronizing
proxy rules. Compared to the iptables proxy mode, IPVS mode also supports a
higher throughput of network traffic.
IPVS provides more options for balancing traffic to backend Pods;
these are:
rr (Round Robin): Traffic is equally distributed amongst the backing servers.
wrr (Weighted Round Robin): Traffic is routed to the backing servers based on
the weights of the servers. Servers with higher weights receive new connections
and get more requests than servers with lower weights.
lc (Least Connection): More traffic is assigned to servers with fewer active connections.
wlc (Weighted Least Connection): More traffic is routed to servers with fewer connections
relative to their weights, that is, connections divided by weight.
lblc (Locality based Least Connection): Traffic for the same IP address is sent to the
same backing server if the server is not overloaded and available; otherwise the traffic
is sent to a server with fewer connections, which is then kept for future assignments from that address.
lblcr (Locality Based Least Connection with Replication): Traffic for the same IP
address is sent to the server with least connections. If all the backing servers are
overloaded, it picks up one with fewer connections and adds it to the target set.
If the target set has not changed for the specified time, the server with the highest load
is removed from the set, in order to avoid a high degree of replication.
sh (Source Hashing): Traffic is sent to a backing server by looking up a statically
assigned hash table based on the source IP addresses.
dh (Destination Hashing): Traffic is sent to a backing server by looking up a
statically assigned hash table based on their destination addresses.
sed (Shortest Expected Delay): Traffic forwarded to a backing server with the shortest
expected delay. The expected delay is (C + 1) / U if sent to a server, where C is
the number of connections on the server and U is the fixed service rate (weight) of
the server.
nq (Never Queue): Traffic is sent to an idle server if there is one, instead of
waiting for a fast one; if all servers are busy, the algorithm falls back to the sed
behavior.
mh (Maglev Hashing): Assigns incoming jobs based on
Google's Maglev hashing algorithm.
This scheduler has two flags: mh-fallback, which enables fallback to a different
server if the selected server is unavailable, and mh-port, which adds the source port number to
the hash computation. When using mh, kube-proxy always sets the mh-port flag and does not
enable the mh-fallback flag.
In proxy-mode=ipvs mh will work as source-hashing (sh), but with ports.
These scheduling algorithms are configured through the
ipvs.scheduler
field in the kube-proxy configuration.
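As a sketch, a kube-proxy configuration file that selects IPVS mode and the least-connection scheduler might contain the following (only the relevant fields are shown):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: lc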
Note:
To run kube-proxy in IPVS mode, you must make IPVS available on
the node before starting kube-proxy.
When kube-proxy starts in IPVS proxy mode, it verifies whether IPVS
kernel modules are available. If the IPVS kernel modules are not detected, then kube-proxy
exits with an error.
Virtual IP address mechanism for Services, using IPVS mode
nftables proxy mode
FEATURE STATE: Kubernetes v1.33 [stable] (enabled by default: true)
This proxy mode is only available on Linux nodes, and requires kernel
5.13 or later.
In this mode, kube-proxy configures packet forwarding rules using the
nftables API of the kernel netfilter subsystem. For each endpoint, it
installs nftables rules which, by default, select a backend Pod at
random.
The nftables API is the successor to the iptables API and is designed
to provide better performance and scalability than iptables. The
nftables proxy mode is able to process changes to service endpoints
faster and more efficiently than the iptables mode, and is also able
to more efficiently process packets in the kernel (though this only
becomes noticeable in clusters with tens of thousands of services).
As of Kubernetes 1.33, the nftables mode is
still relatively new, and may not be compatible with all network
plugins; consult the documentation for your network plugin.
Migrating from iptables mode to nftables
Users who want to switch from the default iptables mode to the
nftables mode should be aware that some features work slightly
differently in the nftables mode:
NodePort interfaces: In iptables mode, by default,
NodePort services
are reachable on all local IP addresses. This is usually not what
users want, so the nftables mode defaults to
--nodeport-addresses primary, meaning Services using type: NodePort are only
reachable on the node's primary IPv4 and/or IPv6 addresses. You can
override this by specifying an explicit value for that option:
e.g., --nodeport-addresses 0.0.0.0/0 to listen on all (local)
IPv4 IPs.
type: NodePort Services on 127.0.0.1: In iptables mode, if the
--nodeport-addresses range includes 127.0.0.1 (and the option
--iptables-localhost-nodeports false option is not passed), then
Services of type: NodePort are reachable even on "localhost" (127.0.0.1).
In nftables mode (and ipvs mode), this will not work. If you
are not sure if you are depending on this functionality, you can
check kube-proxy's
iptables_localhost_nodeports_accepted_packets_total metric; if it
is non-0, that means that some client has connected to a type: NodePort
Service via localhost/loopback.
NodePort interaction with firewalls: The iptables mode of
kube-proxy tries to be compatible with overly-aggressive firewalls;
for each type: NodePort service, it will add rules to accept inbound
traffic on that port, in case that traffic would otherwise be
blocked by a firewall. This approach will not work with firewalls
based on nftables, so kube-proxy's nftables mode does not do
anything here; if you have a local firewall, you must ensure that
it is properly configured to allow Kubernetes traffic through
(e.g., by allowing inbound traffic on the entire NodePort range).
Conntrack bug workarounds: Linux kernels prior to 6.1 have a
bug that can result in long-lived TCP connections to service IPs
being closed with the error "Connection reset by peer". The
iptables mode of kube-proxy installs a workaround for this bug,
but this workaround was later found to cause other problems in some
clusters. The nftables mode does not install any workaround by
default, but you can check kube-proxy's
iptables_ct_state_invalid_dropped_packets_total metric to see if
your cluster is depending on the workaround, and if so, you can run
kube-proxy with the option --conntrack-tcp-be-liberal to work
around the problem in nftables mode.
kernelspace proxy mode
This proxy mode is only available on Windows nodes.
The kube-proxy configures packet filtering rules in the Windows Virtual Filtering Platform (VFP),
an extension to Windows vSwitch. These rules process encapsulated packets within the node-level
virtual networks, and rewrite packets so that the destination IP address (and layer 2 information)
is correct for getting the packet routed to the correct destination.
The Windows VFP is analogous to tools such as Linux nftables or iptables. The Windows VFP extends
the Hyper-V Switch, which was initially implemented to support virtual machine networking.
When a Pod on a node sends traffic to a virtual IP address, and the kube-proxy selects a Pod on
a different node as the load balancing target, the kernelspace proxy mode rewrites that packet
to be destined to the target backend Pod. The Windows Host Networking Service (HNS) ensures that
packet rewriting rules are configured so that the return traffic appears to come from the virtual
IP address and not the specific backend Pod.
Direct server return for kernelspace mode
FEATURE STATE: Kubernetes v1.14 [alpha]
As an alternative to the basic operation, a node that hosts the backend Pod for a Service can
apply the packet rewriting directly, rather than placing this burden on the node where the client
Pod is running. This is called direct server return.
To use this, you must run kube-proxy with the --enable-dsr command line argument and
enable the WinDSR feature gate.
Direct server return also optimizes the case for Pod return traffic even when both Pods
are running on the same node.
Session affinity
In these proxy models, the traffic bound for the Service's IP:Port is
proxied to an appropriate backend without the clients knowing anything
about Kubernetes or Services or Pods.
If you want to make sure that connections from a particular client
are passed to the same Pod each time, you can select the session affinity based
on the client's IP addresses by setting .spec.sessionAffinity to ClientIP
for a Service (the default is None).
Session stickiness timeout
You can also set the maximum session sticky time by setting
.spec.sessionAffinityConfig.clientIP.timeoutSeconds appropriately for a Service
(the default value is 10800, which works out to be 3 hours).
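As a sketch, a Service that requests ClientIP affinity with an explicit timeout could look like this (the Service name, selector, and ports are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800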
Note:
On Windows, setting the maximum session sticky time for Services is not supported.
IP address assignment to Services
Unlike Pod IP addresses, which actually route to a fixed destination,
Service IPs are not actually answered by a single host. Instead, kube-proxy
uses packet processing logic (such as Linux iptables) to define virtual IP
addresses which are transparently redirected as needed.
When clients connect to the VIP, their traffic is automatically transported to an
appropriate endpoint. The environment variables and DNS for Services are actually
populated in terms of the Service's virtual IP address (and port).
Avoiding collisions
One of the primary philosophies of Kubernetes is that you should not be
exposed to situations that could cause your actions to fail through no fault
of your own. For the design of the Service resource, this means not making
you choose your own IP address if that choice might collide with
someone else's choice. That is an isolation failure.
In order to allow you to choose an IP address for your Services, we must
ensure that no two Services can collide. Kubernetes does that by allocating each
Service its own IP address from within the service-cluster-ip-range
CIDR range that is configured for the API Server.
IP address allocation tracking
To ensure each Service receives a unique IP address, an internal allocator atomically
updates a global allocation map in etcd
prior to creating each Service. The map object must exist in the registry for
Services to get IP address assignments, otherwise creations will
fail with a message indicating an IP address could not be allocated.
In the control plane, a background controller is responsible for creating that
map (needed to support migrating from older versions of Kubernetes that used
in-memory locking). Kubernetes also uses controllers to check for invalid
assignments (for example: due to administrator intervention) and for cleaning up allocated
IP addresses that are no longer used by any Services.
IP address allocation tracking using the Kubernetes API
FEATURE STATE: Kubernetes v1.33 [stable] (enabled by default: true)
The control plane replaces the existing etcd allocator with a revised implementation
that uses IPAddress and ServiceCIDR objects instead of an internal global allocation map.
Each cluster IP address associated to a Service then references an IPAddress object.
Enabling the feature gate also replaces a background controller with an alternative
that handles the IPAddress objects and supports migration from the old allocator model.
Kubernetes 1.33 does not support migrating from IPAddress
objects to the internal allocation map.
One of the main benefits of the revised allocator is that it removes the size limitations
for the IP address range that can be used for the cluster IP address of Services.
With MultiCIDRServiceAllocator enabled, there are no limitations for IPv4, and for IPv6
you can use IP address netmasks that are a /64 or smaller (as opposed to /108 with the
legacy implementation).
Making IP address allocations available via the API means that you as a cluster administrator
can allow users to inspect the IP addresses assigned to their Services.
Kubernetes extensions, such as the Gateway API,
can use the IPAddress API to extend Kubernetes' inherent networking capabilities.
Here is a brief example of a user querying for IP addresses:
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 2001:db8:1:2::1 <none> 443/TCP 3d1h
kubectl get ipaddresses
NAME PARENTREF
2001:db8:1:2::1 services/default/kubernetes
2001:db8:1:2::a services/kube-system/kube-dns
Kubernetes also allows users to dynamically define the available IP ranges for Services using
ServiceCIDR objects. During bootstrap, a default ServiceCIDR object named kubernetes is created
from the value of the --service-cluster-ip-range command line argument to kube-apiserver:
kubectl get servicecidrs
NAME CIDRS AGE
kubernetes 10.96.0.0/28 17m
Users can create or delete new ServiceCIDR objects to manage the available IP ranges for Services:
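For example, the following manifest is a sketch of an additional ServiceCIDR whose name and range match the output shown below; the API group and version shown are those of the stable API, so check which versions your cluster serves:
apiVersion: networking.k8s.io/v1
kind: ServiceCIDR
metadata:
  name: newservicecidr
spec:
  cidrs:
  - 10.96.0.0/24
After applying it, listing the ServiceCIDR objects again shows both ranges:
kubectl get servicecidrs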
NAME CIDRS AGE
kubernetes 10.96.0.0/28 17m
newservicecidr 10.96.0.0/24 7m
Distributions or administrators of Kubernetes clusters may want to ensure that
new Service CIDRs added to the cluster do not overlap with other networks in
the cluster, belong only to a specific range of IPs, or simply retain
the existing behavior of having only one ServiceCIDR per cluster. An example of
a ValidatingAdmissionPolicy to achieve this is:
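The policy below is a sketch; the policy name and the allowed ranges are placeholders, and it relies on the CEL cidr() / containsCIDR() helpers, so verify they are available in your cluster's Kubernetes version. A matching ValidatingAdmissionPolicyBinding is also required to put the policy into effect.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "servicecidrs.default"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["networking.k8s.io"]
      apiVersions: ["v1", "v1beta1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["servicecidrs"]
  matchConditions:
  - name: "exclude-default-servicecidr"
    expression: "object.metadata.name != 'kubernetes'"
  variables:
  - name: allowed
    expression: "['10.96.0.0/16', '2001:db8::/64']"
  validations:
  - expression: "object.spec.cidrs.all(newCIDR, variables.allowed.exists(allowedCIDR, cidr(allowedCIDR).containsCIDR(newCIDR)))"
    message: "the proposed ServiceCIDR is outside the allowed ranges"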
IP address ranges for Service virtual IP addresses
FEATURE STATE: Kubernetes v1.26 [stable]
Kubernetes divides the ClusterIP range into two bands, based on
the size of the configured service-cluster-ip-range by using the following formula
min(max(16, cidrSize / 16), 256). That formula means the result is never less than 16 or
more than 256, with a graduated step function between them.
Kubernetes prefers to allocate dynamic IP addresses to Services by choosing from the upper band,
which means that if you want to assign a specific IP address to a type: ClusterIP
Service, you should manually assign an IP address from the lower band. That approach
reduces the risk of a conflict over allocation.
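For example, with a /24 service-cluster-ip-range (256 addresses), the lower band is min(max(16, 256 / 16), 256) = 16 addresses; with a /16 range (65,536 addresses), 65536 / 16 = 4096 is capped at 256, so the lower band is 256 addresses.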
Traffic policies
You can set the .spec.internalTrafficPolicy and .spec.externalTrafficPolicy fields
to control how Kubernetes routes traffic to healthy (“ready”) backends.
Internal traffic policy
FEATURE STATE: Kubernetes v1.26 [stable]
You can set the .spec.internalTrafficPolicy field to control how traffic from
internal sources is routed. Valid values are Cluster and Local. Set the field to
Cluster to route internal traffic to all ready endpoints and Local to only route
to ready node-local endpoints. If the traffic policy is Local and there are no
node-local endpoints, traffic is dropped by kube-proxy.
External traffic policy
You can set the .spec.externalTrafficPolicy field to control how traffic from
external sources is routed. Valid values are Cluster and Local. Set the field
to Cluster to route external traffic to all ready endpoints and Local to only
route to ready node-local endpoints. If the traffic policy is Local and there
are no node-local endpoints, the kube-proxy does not forward any traffic for the
relevant Service.
If Cluster is specified, all nodes are eligible load balancing targets as long as
the node is not being deleted and kube-proxy is healthy. In this mode: load balancer
health checks are configured to target the service proxy's readiness port and path.
In the case of kube-proxy this evaluates to: ${NODE_IP}:10256/healthz. kube-proxy
will return either an HTTP code 200 or 503. kube-proxy's load balancer health check
endpoint returns 200 if:
kube-proxy is healthy, meaning:
it's able to progress programming the network and isn't timing out while doing
so (the timeout is defined to be: 2 × iptables.syncPeriod); and
the node is not being deleted (there is no deletion timestamp set for the Node).
kube-proxy returns 503 and marks the node as not
eligible when it's being deleted because it supports connection
draining for terminating nodes. A couple of important things occur from the point
of view of a Kubernetes-managed load balancer when a node is being / is deleted.
While deleting:
kube-proxy will start failing its readiness probe and essentially mark the
node as not eligible for load balancer traffic. The load balancer health
check failing causes load balancers which support connection draining to
allow existing connections to terminate, and block new connections from
establishing.
When deleted:
The service controller in the Kubernetes cloud controller manager removes the
node from the referenced set of eligible targets. Removing any instance from
the load balancer's set of backend targets immediately terminates all
connections. This is also the reason kube-proxy first fails the health check
while the node is deleting.
It's important for Kubernetes vendors to note that if the kube-proxy readiness
probe is configured as a liveness probe, kube-proxy will restart continuously
while a node is being deleted, until the node has been fully deleted.
kube-proxy exposes a /livez path which, as opposed to the /healthz one, does
not consider the Node's deleting state and only its progress programming the
network. /livez is therefore the recommended path for anyone looking to define
a livenessProbe for kube-proxy.
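A sketch of what such a probe could look like in a kube-proxy DaemonSet's container spec, assuming the default healthz port of 10256 mentioned above:
livenessProbe:
  httpGet:
    path: /livez
    port: 10256
    host: 127.0.0.1
  initialDelaySeconds: 10
  periodSeconds: 10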
Users deploying kube-proxy can inspect both the readiness / liveness state by
evaluating the metrics: proxy_livez_total / proxy_healthz_total. Both
metrics publish two series, one with the 200 label and one with the 503 one.
For Local Services: kube-proxy will return 200 if
kube-proxy is healthy/ready, and
has a local endpoint on the node in question.
Node deletion does not have an impact on kube-proxy's return
code for what concerns load balancer health checks. The reason for this is:
deleting nodes could end up causing an ingress outage should all endpoints
simultaneously be running on said nodes.
The Kubernetes project recommends that cloud provider integration code
configures load balancer health checks that target the service proxy's healthz
port. If you are using or implementing your own virtual IP implementation
that people can use instead of kube-proxy, you should set up a similar health
checking port with logic that matches the kube-proxy implementation.
Traffic to terminating endpoints
FEATURE STATE: Kubernetes v1.28 [stable]
If the ProxyTerminatingEndpoints feature gate
is enabled in kube-proxy and the traffic policy is Local, that node's
kube-proxy uses a more complicated algorithm to select endpoints for a Service.
With the feature enabled, kube-proxy checks if the node
has local endpoints and whether or not all the local endpoints are marked as terminating.
If there are local endpoints and all of them are terminating, then kube-proxy
will forward traffic to those terminating endpoints. Otherwise, kube-proxy will always
prefer forwarding traffic to endpoints that are not terminating.
This forwarding behavior for terminating endpoints exists to allow NodePort and LoadBalancer
Services to gracefully drain connections when using externalTrafficPolicy: Local.
As a deployment goes through a rolling update, nodes backing a load balancer may transition from
N to 0 replicas of that deployment. In some cases, external load balancers can send traffic to
a node with 0 replicas in between health check probes. Routing traffic to terminating endpoints
ensures that Nodes that are scaling down Pods can gracefully receive and drain traffic to
those terminating Pods. By the time the Pod completes termination, the external load balancer
should have seen the node's health check failing and fully removed the node from the backend
pool.
Traffic Distribution
FEATURE STATE: Kubernetes v1.33 [stable] (enabled by default: true)
The spec.trafficDistribution field within a Kubernetes Service allows you to
express preferences for how traffic should be routed to Service endpoints.
PreferClose
This prioritizes sending traffic to endpoints in the same zone as the client.
The EndpointSlice controller updates EndpointSlices with hints to
communicate this preference, which kube-proxy then uses for routing decisions.
If a client's zone does not have any available endpoints, traffic will be
routed cluster-wide for that client.
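As a sketch, a Service opting into this behavior sets the field directly; the name, selector, and ports here are placeholders:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  trafficDistribution: PreferClose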
FEATURE STATE: Kubernetes v1.33 [alpha] (enabled by default: false)
Two additional values are available when the PreferSameTrafficDistribution feature gate is
enabled:
PreferSameZone
This means the same thing as PreferClose, but is more explicit. (Originally,
the intention was that PreferClose might later include functionality other
than just "prefer same zone", but this is no longer planned. In the future,
PreferSameZone will be the recommended value to use for this functionality,
and PreferClose will be considered a deprecated alias for it.)
PreferSameNode
This prioritizes sending traffic to endpoints on the same node as the client.
As with PreferClose/PreferSameZone, the EndpointSlice controller updates
EndpointSlices with hints indicating that a slice should be used for a
particular node. If a client's node does not have any available endpoints,
then the service proxy will fall back to "same zone" behavior, or cluster-wide
if there are no same-zone endpoints either.
In the absence of any value for trafficDistribution, the default strategy is
to distribute traffic evenly to all endpoints in the cluster.
Comparison with service.kubernetes.io/topology-mode: Auto
The trafficDistribution field with PreferClose/PreferSameZone, and the older "Topology-Aware
Routing" feature using the service.kubernetes.io/topology-mode: Auto
annotation both aim to prioritize same-zone traffic. However, there is a key
difference in their approaches:
service.kubernetes.io/topology-mode: Auto attempts to distribute traffic
proportionally across zones based on allocatable CPU resources. This heuristic
includes safeguards (such as the fallback
behavior
for small numbers of endpoints), sacrificing some predictability in favor of
potentially better load balancing.
trafficDistribution: PreferClose aims to be simpler and more predictable:
"If there are endpoints in the zone, they will receive all traffic for that
zone, if there are no endpoints in a zone, the traffic will be distributed to
other zones". This approach offers more predictability, but it means that you
are responsible for avoiding endpoint
overload.
If the service.kubernetes.io/topology-mode annotation is set to Auto, it
will take precedence over trafficDistribution. The annotation may be deprecated
in the future in favor of the trafficDistribution field.
Interaction with Traffic Policies
When compared to the trafficDistribution field, the traffic policy fields
(externalTrafficPolicy and internalTrafficPolicy) are meant to offer
stricter traffic locality requirements. Here's how trafficDistribution
interacts with them:
Precedence of Traffic Policies: For a given Service, if a traffic policy
(externalTrafficPolicy or internalTrafficPolicy) is set to Local, it
takes precedence over trafficDistribution for the corresponding
traffic type (external or internal, respectively).
trafficDistribution Influence: For a given Service, if a traffic policy
(externalTrafficPolicy or internalTrafficPolicy) is set to Cluster (the
default), or if the fields are not set, then trafficDistribution
guides the routing behavior for the corresponding traffic type
(external or internal, respectively). This means that an attempt will be made
to route traffic to an endpoint that is in the same zone as the client.
Considerations for using traffic distribution control
A Service using trafficDistribution will attempt to route traffic to (healthy)
endpoints within the appropriate topology, even if this means that some
endpoints receive much more traffic than other endpoints. If you do not have a
sufficient number of endpoints within the same topology ("same zone", "same
node", etc.) as the clients, then endpoints may become overloaded. This is
especially likely if incoming traffic is not proportionally distributed across
the topology. To mitigate this, consider the following strategies:
Zone-specific Deployments: If you are using "same zone" traffic
distribution, but expect to see different traffic patterns in
different zones, you can create a separate Deployment for each zone.
This approach allows the separate workloads to scale independently.
There are also workload management addons available from the
ecosystem, outside the Kubernetes project itself, that can help
here.
Kubeadm is a tool built to provide kubeadm init and kubeadm join as best-practice "fast paths" for creating Kubernetes clusters.
kubeadm performs the actions necessary to get a minimum viable cluster up and running. By design, it cares only about bootstrapping, not about provisioning machines. Likewise, installing various nice-to-have addons, like the Kubernetes Dashboard, monitoring solutions, and cloud-specific addons, is not in scope.
Instead, we expect higher-level and more tailored tooling to be built on top of kubeadm, and ideally, using kubeadm as the basis of all deployments will make it easier to create conformant clusters.
Among its subcommands, kubeadm alpha previews a set of features made available for gathering feedback from the community.
10.1.1 - Kubeadm Generated
10.1.1.1 -
kubeadm: easily bootstrap a secure Kubernetes cluster
Synopsis
┌──────────────────────────────────────────────────────────┐
│ KUBEADM │
│ Easily bootstrap a secure Kubernetes cluster │
│ │
│ Please give us feedback at: │
│ https://github.com/kubernetes/kubeadm/issues │
└──────────────────────────────────────────────────────────┘
Example usage:
Create a two-machine cluster with one control-plane node
(which controls the cluster), and one worker node
(where your workloads, like Pods and Deployments run).
┌──────────────────────────────────────────────────────────┐
│ On the first machine: │
├──────────────────────────────────────────────────────────┤
│ control-plane# kubeadm init │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ On the second machine: │
├──────────────────────────────────────────────────────────┤
│ worker# kubeadm join <arguments-returned-from-init> │
└──────────────────────────────────────────────────────────┘
You can then repeat the second step on as many other machines as you like.
Options
-h, --help
help for kubeadm
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2 -
Commands related to handling Kubernetes certificates
Synopsis
Commands related to handling Kubernetes certificates
kubeadm certs [flags]
Options
-h, --help
help for certs
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.1 -
Generate certificate keys
Synopsis
This command will print out a secure randomly-generated certificate key that can be used with
the "init" command.
You can also use "kubeadm init --upload-certs" without specifying a certificate key and it will
generate and print one for you.
kubeadm certs certificate-key [flags]
Options
-h, --help
help for certificate-key
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.2 -
Check certificates expiration for a Kubernetes cluster
Synopsis
Checks expiration for the certificates in the local PKI managed by kubeadm.
kubeadm certs check-expiration [flags]
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
-o, --output string Default: "text"
Output format. One of: text|json|yaml|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.3 -
Generate keys and certificate signing requests
Synopsis
Generates keys and certificate signing requests (CSRs) for all the certificates required to run the control plane. This command also generates partial kubeconfig files with private key data in the "users > user > client-key-data" field, and for each kubeconfig file an accompanying ".csr" file is created.
This command is designed for use in Kubeadm External CA Mode. It generates CSRs which you can then submit to your external certificate authority for signing.
The PEM encoded signed certificates should then be saved alongside the key files, using ".crt" as the file extension, or in the case of kubeconfig files, the PEM encoded signed certificate should be base64 encoded and added to the kubeconfig file in the "users > user > client-certificate-data" field.
kubeadm certs generate-csr [flags]
Examples
# The following command will generate keys and CSRs for all control-plane certificates and kubeconfig files:
kubeadm certs generate-csr --kubeconfig-dir /tmp/etc-k8s --cert-dir /tmp/etc-k8s/pki
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.4 -
Renew certificates for a Kubernetes cluster
Synopsis
Renew certificates for a Kubernetes cluster
kubeadm certs renew [flags]
Options
-h, --help
help for renew
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.5 -
Renew the certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself
Synopsis
Renew the certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR request.
After renewal, in order to make the changes effective, you must restart control-plane components and, if the file is used elsewhere, re-distribute the renewed certificate.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.6 -
Renew all available certificates
Synopsis
Renew all known certificates necessary to run the control plane. Renewals are run unconditionally, regardless of expiration date. Renewals can also be run individually for more control.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.7 -
Renew the certificate the apiserver uses to access etcd
Synopsis
Renew the certificate the apiserver uses to access etcd.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR request.
After renewal, in order to make the changes effective, you must restart control-plane components and, if the file is used elsewhere, re-distribute the renewed certificate.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.8 -
Renew the certificate for the API server to connect to kubelet
Synopsis
Renew the certificate for the API server to connect to kubelet.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR request.
After renewal, in order to make the changes effective, you must restart control-plane components and, if the file is used elsewhere, re-distribute the renewed certificate.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.9 -
Renew the certificate for serving the Kubernetes API
Synopsis
Renew the certificate for serving the Kubernetes API.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR request.
After renewal, in order to make the changes effective, you must restart control-plane components and, if the file is used elsewhere, re-distribute the renewed certificate.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.10 -
Renew the certificate embedded in the kubeconfig file for the controller manager to use
Synopsis
Renew the certificate embedded in the kubeconfig file for the controller manager to use.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR request.
After renewal, in order to make the changes effective, you must restart control-plane components and, if the file is used elsewhere, re-distribute the renewed certificate.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.11 -
Renew the certificate for liveness probes to healthcheck etcd
Synopsis
Renew the certificate for liveness probes to healthcheck etcd.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, to make the changes effective, you must restart the affected control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.12 -
Renew the certificate for etcd nodes to communicate with each other
Synopsis
Renew the certificate for etcd nodes to communicate with each other.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, to make the changes effective, you must restart the affected control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.13 -
Renew the certificate for serving etcd
Synopsis
Renew the certificate for serving etcd.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, to make the changes effective, you must restart the affected control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.14 -
Renew the certificate for the front proxy client
Synopsis
Renew the certificate for the front proxy client.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, to make the changes effective, you must restart the affected control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.15 -
Renew the certificate embedded in the kubeconfig file for the scheduler to use
Synopsis
Renew the certificate embedded in the kubeconfig file for the scheduler to use.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, to make the changes effective, you must restart the affected control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.2.16 -
Renew the certificate embedded in the kubeconfig file for the super-admin
Synopsis
Renew the certificate embedded in the kubeconfig file for the super-admin.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, to make the changes effective, you must restart the affected control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.3 -
Output shell completion code for the specified shell (bash or zsh)
Synopsis
Output shell completion code for the specified shell (bash or zsh).
The shell code must be evaluated to provide interactive
completion of kubeadm commands. This can be done by sourcing it from
the .bash_profile.
Note: this requires the bash-completion framework.
To install it on macOS, use Homebrew:
$ brew install bash-completion
Once installed, bash_completion must be evaluated. This can be done by adding the
following line to the .bash_profile
$ source $(brew --prefix)/etc/bash_completion
If bash-completion is not installed on Linux, please install the 'bash-completion' package
via your distribution's package manager.
Note for zsh users: [1] zsh completions are only supported in versions of zsh >= 5.2
kubeadm completion SHELL [flags]
Examples
# Install bash completion on a Mac using homebrew
brew install bash-completion
printf "\n# Bash completion support\nsource $(brew --prefix)/etc/bash_completion\n" >> $HOME/.bash_profile
source $HOME/.bash_profile
# Load the kubeadm completion code for bash into the current shell
source <(kubeadm completion bash)
# Write bash completion code to a file and source it from .bash_profile
kubeadm completion bash > ~/.kube/kubeadm_completion.bash.inc
printf "\n# Kubeadm shell completion\nsource '$HOME/.kube/kubeadm_completion.bash.inc'\n" >> $HOME/.bash_profile
source $HOME/.bash_profile
# Load the kubeadm completion code for zsh[1] into the current shell
source <(kubeadm completion zsh)
Options
-h, --help
help for completion
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4 -
Manage configuration for a kubeadm cluster persisted in a ConfigMap in the cluster
Synopsis
There is a ConfigMap in the kube-system namespace called "kubeadm-config" that kubeadm uses to store internal configuration about the
cluster. kubeadm CLI v1.8.0+ automatically creates this ConfigMap with the config used with 'kubeadm init', but if you
initialized your cluster using kubeadm v1.7.x or lower, you must use the 'kubeadm init phase upload-config' command to
create this ConfigMap. This is required so that 'kubeadm upgrade' can configure your upgraded cluster correctly.
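To see what kubeadm has stored, you can read the ConfigMap directly; a minimal check (assuming cluster-admin credentials in your kubeconfig) is:
kubectl -n kube-system get configmap kubeadm-config -o yaml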
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.2 -
Print a list of images kubeadm will use. The configuration file is used in case any images or image repositories are customized
Synopsis
Print a list of images kubeadm will use. The configuration file is used in case any images or image repositories are customized
kubeadm config images list [flags]
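For illustration (the version string and file name below are placeholders):
# List the default images for a specific Kubernetes version
kubeadm config images list --kubernetes-version v1.33.0
# List images taking any customizations from a kubeadm configuration file
kubeadm config images list --config kubeadm-config.yaml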
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--config string
Path to a kubeadm configuration file.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.3 -
Pull images used by kubeadm
Synopsis
Pull images used by kubeadm
kubeadm config images pull [flags]
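A hedged example; the containerd socket path below is a common default and may differ on your hosts:
# Pre-pull the control plane images, for instance ahead of a slow-network or air-gapped install
kubeadm config images pull --config kubeadm-config.yaml
# Explicitly select the CRI endpoint if more than one runtime is installed
kubeadm config images pull --cri-socket unix:///var/run/containerd/containerd.sock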
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.4 -
Read an older version of the kubeadm configuration API types from a file, and output the similar config object for the newer version
Synopsis
This command lets you convert configuration objects of older versions to the latest supported version,
locally in the CLI tool without ever touching anything in the cluster.
In this version of kubeadm, the following API versions are supported:
kubeadm.k8s.io/v1beta4
Further, kubeadm can only write out config of version "kubeadm.k8s.io/v1beta4", but read both types.
So regardless of what version you pass to the --old-config parameter here, the API object will be
read, deserialized, defaulted, converted, validated, and re-serialized when written to stdout or
--new-config if specified.
In other words, the output of this command is what kubeadm actually would read internally if you
submitted this file to "kubeadm init"
kubeadm config migrate [flags]
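For example (file names are placeholders):
# Convert an older configuration file to kubeadm.k8s.io/v1beta4 and write the result to a new file
kubeadm config migrate --old-config old-config.yaml --new-config new-config.yaml
# Or print the converted configuration to stdout
kubeadm config migrate --old-config old-config.yaml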
Options
--allow-experimental-api
Allow migration to experimental, unreleased APIs.
-h, --help
help for migrate
--new-config string
Path to the resulting equivalent kubeadm config file using the new API version. Optional, if not specified output will be sent to STDOUT.
--old-config string
Path to the kubeadm config file that is using an old API version and should be converted. This flag is mandatory.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.6 -
Print default init configuration, that can be used for 'kubeadm init'
Synopsis
This command prints objects such as the default init configuration that is used for 'kubeadm init'.
Note that sensitive values like the Bootstrap Token fields are replaced with placeholder values like "abcdef.0123456789abcdef" in order to pass validation,
without performing the real computation needed to create a token.
kubeadm config print init-defaults [flags]
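For example:
# Write a starting point for an init configuration, including the kubelet component config
kubeadm config print init-defaults --component-configs KubeletConfiguration > init-config.yaml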
Options
--component-configs strings
A comma-separated list for component config API objects to print the default values for. Available values: [KubeProxyConfiguration KubeletConfiguration]. If this flag is not set, no component configs will be printed.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.7 -
Print default join configuration, that can be used for 'kubeadm join'
Synopsis
This command prints objects such as the default join configuration that is used for 'kubeadm join'.
Note that sensitive values like the Bootstrap Token fields are replaced with placeholder values like "abcdef.0123456789abcdef" in order to pass validation,
without performing the real computation needed to create a token.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.8 -
Print default reset configuration, that can be used for 'kubeadm reset'
Synopsis
This command prints objects such as the default reset configuration that is used for 'kubeadm reset'.
Note that sensitive values like the Bootstrap Token fields are replaced with placeholder values like "abcdef.0123456789abcdef" in order to pass validation,
without performing the real computation needed to create a token.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.9 -
Print default upgrade configuration, that can be used for 'kubeadm upgrade'
Synopsis
This command prints objects such as the default upgrade configuration that is used for 'kubeadm upgrade'.
Note that sensitive values like the Bootstrap Token fields are replaced with placeholder values like "abcdef.0123456789abcdef" in order to pass validation,
without performing the real computation needed to create a token.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.4.10 -
Read a file containing the kubeadm configuration API and report any validation problems
Synopsis
This command lets you validate a kubeadm configuration API file and report any warnings and errors.
If there are no errors the exit status will be zero, otherwise it will be non-zero.
Any unmarshaling problems such as unknown API fields will trigger errors. Unknown API versions and
fields with invalid values will also trigger errors. Any other errors or warnings may be reported
depending on contents of the input file.
In this version of kubeadm, the following API versions are supported:
kubeadm.k8s.io/v1beta4
kubeadm config validate [flags]
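For example, assuming your configuration lives in kubeadm-config.yaml and is passed with --config:
kubeadm config validate --config kubeadm-config.yaml
# The exit status is zero only if no errors were found
echo $?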
Options
--allow-experimental-api
Allow validation of experimental, unreleased APIs.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5 -
Run this command in order to set up the Kubernetes control plane
Synopsis
Run this command in order to set up the Kubernetes control plane
The "init" command executes the following phases:
preflight Run pre-flight checks
certs Certificate generation
/ca Generate the self-signed Kubernetes CA to provision identities for other Kubernetes components
/apiserver Generate the certificate for serving the Kubernetes API
/apiserver-kubelet-client Generate the certificate for the API server to connect to kubelet
/front-proxy-ca Generate the self-signed CA to provision identities for front proxy
/front-proxy-client Generate the certificate for the front proxy client
/etcd-ca Generate the self-signed CA to provision identities for etcd
/etcd-server Generate the certificate for serving etcd
/etcd-peer Generate the certificate for etcd nodes to communicate with each other
/etcd-healthcheck-client Generate the certificate for liveness probes to healthcheck etcd
/apiserver-etcd-client Generate the certificate the apiserver uses to access etcd
/sa Generate a private key for signing service account tokens along with its public key
kubeconfig Generate all kubeconfig files necessary to establish the control plane and the admin kubeconfig file
/admin Generate a kubeconfig file for the admin to use and for kubeadm itself
/super-admin Generate a kubeconfig file for the super-admin
/kubelet Generate a kubeconfig file for the kubelet to use *only* for cluster bootstrapping purposes
/controller-manager Generate a kubeconfig file for the controller manager to use
/scheduler Generate a kubeconfig file for the scheduler to use
etcd Generate static Pod manifest file for local etcd
/local Generate the static Pod manifest file for a local, single-node local etcd instance
control-plane Generate all static Pod manifest files necessary to establish the control plane
/apiserver Generates the kube-apiserver static Pod manifest
/controller-manager Generates the kube-controller-manager static Pod manifest
/scheduler Generates the kube-scheduler static Pod manifest
kubelet-start Write kubelet settings and (re)start the kubelet
upload-config Upload the kubeadm and kubelet configuration to a ConfigMap
/kubeadm Upload the kubeadm ClusterConfiguration to a ConfigMap
/kubelet Upload the kubelet component config to a ConfigMap
upload-certs Upload certificates to kubeadm-certs
mark-control-plane Mark a node as a control-plane
bootstrap-token Generates bootstrap tokens used to join a node to a cluster
kubelet-finalize Updates settings relevant to the kubelet after TLS bootstrap
/enable-client-cert-rotation Enable kubelet client certificate rotation
addon Install required addons for passing conformance tests
/coredns Install the CoreDNS addon to a Kubernetes cluster
/kube-proxy Install the kube-proxy addon to a Kubernetes cluster
show-join-command Show the join command for control-plane and worker node
kubeadm init [flags]
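As an illustrative sketch only (the endpoint and CIDR below are placeholders, not recommendations):
# Initialize a control plane reachable through a stable endpoint, suitable for adding
# more control-plane nodes later, and upload the certificates to the kubeadm-certs Secret
kubeadm init --control-plane-endpoint cp.example.internal:6443 \
  --pod-network-cidr 10.244.0.0/16 \
  --upload-certs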
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--apiserver-cert-extra-sans strings
Optional extra Subject Alternative Names (SANs) to use for the API Server serving certificate. Can be both IP addresses and DNS names.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--certificate-key string
Key used to encrypt the control-plane certificates in the kubeadm-certs Secret. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
-h, --help
help for init
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
--service-dns-domain string Default: "cluster.local"
Use alternative domain for services, e.g. "myorg.internal".
--skip-certificate-key-print
Don't print the key used to encrypt the control-plane certificates.
--skip-phases strings
List of phases to be skipped
--skip-token-print
Skip printing of the default bootstrap token generated by 'kubeadm init'.
--token string
The token to use for establishing bidirectional trust between nodes and control-plane nodes. The format is [a-z0-9]{6}.[a-z0-9]{16} - e.g. abcdef.0123456789abcdef
--token-ttl duration Default: 24h0m0s
The duration before the token is automatically deleted (e.g. 1s, 2m, 3h). If set to '0', the token will never expire
--upload-certs
Upload control-plane certificates to the kubeadm-certs Secret.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.1 -
Use this command to invoke a single phase of the "init" workflow
Synopsis
Use this command to invoke a single phase of the "init" workflow
kubeadm init phase [flags]
Options
-h, --help
help for phase
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.2 -
Install required addons for passing conformance tests
Synopsis
Install required addons for passing conformance tests
kubeadm init phase addon [flags]
Options
-h, --help
help for addon
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.3 -
Install all the addons
Synopsis
Install all the addons
kubeadm init phase addon all [flags]
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
--service-dns-domain string Default: "cluster.local"
Use alternative domain for services, e.g. "myorg.internal".
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.4 -
Install the CoreDNS addon to a Kubernetes cluster
Synopsis
Install the CoreDNS addon components via the API server. Please note that although the DNS server is deployed, it will not be scheduled until CNI is installed.
kubeadm init phase addon coredns [flags]
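For example:
# Render the CoreDNS manifests without applying them, e.g. for review or a GitOps workflow
kubeadm init phase addon coredns --config kubeadm-config.yaml --print-manifest > coredns.yaml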
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--print-manifest
Print the addon manifests to STDOUT instead of installing them
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--print-manifest
Print the addon manifests to STDOUT instead of installing them
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.6 -
Generates bootstrap tokens used to join a node to a cluster
Synopsis
Bootstrap tokens are used for establishing bidirectional trust between a node joining the cluster and a control-plane node.
This command performs all the configuration required to make bootstrap tokens work, and then creates an initial token.
kubeadm init phase bootstrap-token [flags]
Examples
# Make all the bootstrap token configurations and create an initial token, functionally
# equivalent to what is generated by kubeadm init.
kubeadm init phase bootstrap-token
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--skip-token-print
Skip printing of the default bootstrap token generated by 'kubeadm init'.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.7 -
Certificate generation
Synopsis
Certificate generation
kubeadm init phase certs [flags]
Options
-h, --help
help for certs
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.8 -
Generate all certificates
Synopsis
Generate all certificates
kubeadm init phase certs all [flags]
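For example:
# Generate every certificate in the default directory /etc/kubernetes/pki,
# taking values from a kubeadm configuration file
kubeadm init phase certs all --config kubeadm-config.yaml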
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-cert-extra-sans strings
Optional extra Subject Alternative Names (SANs) to use for the API Server serving certificate. Can be both IP addresses and DNS names.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for all
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for apiserver-etcd-client
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.10 -
Generate the certificate for the API server to connect to kubelet
Synopsis
Generate the certificate for the API server to connect to kubelet, and save them into apiserver-kubelet-client.crt and apiserver-kubelet-client.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
--service-dns-domain string Default: "cluster.local"
Use alternative domain for services, e.g. "myorg.internal".
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.12 -
Generate the self-signed Kubernetes CA to provision identities for other Kubernetes components
Synopsis
Generate the self-signed Kubernetes CA to provision identities for other Kubernetes components, and save them into ca.crt and ca.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
kubeadm init phase certs ca [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for ca
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.13 -
Generate the self-signed CA to provision identities for etcd
Synopsis
Generate the self-signed CA to provision identities for etcd, and save them into etcd/ca.crt and etcd/ca.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
kubeadm init phase certs etcd-ca [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for etcd-ca
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.14 -
Generate the certificate for liveness probes to healthcheck etcd
Synopsis
Generate the certificate for liveness probes to healthcheck etcd, and save them into etcd/healthcheck-client.crt and etcd/healthcheck-client.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for front-proxy-client
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.19 -
Generate a private key for signing service account tokens along with its public key
Synopsis
Generate the private key for signing service account tokens along with its public key, and save them into sa.key and sa.pub files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
kubeadm init phase certs sa [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for sa
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.20 -
Generate all static Pod manifest files necessary to establish the control plane
Synopsis
Generate all static Pod manifest files necessary to establish the control plane
kubeadm init phase control-plane [flags]
Options
-h, --help
help for control-plane
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.21 -
Generate all static Pod manifest files
Synopsis
Generate all static Pod manifest files
kubeadm init phase control-plane all [flags]
Examples
# Generates all static Pod manifest files for control plane components,
# functionally equivalent to what is generated by kubeadm init.
kubeadm init phase control-plane all
# Generates all static Pod manifest files using options read from a configuration file.
kubeadm init phase control-plane all --config config.yaml
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.23 -
Generates the kube-controller-manager static Pod manifest
Synopsis
Generates the kube-controller-manager static Pod manifest
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.25 -
Generate static Pod manifest file for local etcd
Synopsis
Generate static Pod manifest file for local etcd
kubeadm init phase etcd [flags]
Options
-h, --help
help for etcd
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.26 -
Generate the static Pod manifest file for a local, single-node local etcd instance
Synopsis
Generate the static Pod manifest file for a local, single-node local etcd instance
kubeadm init phase etcd local [flags]
Examples
# Generates the static Pod manifest file for etcd, functionally
# equivalent to what is generated by kubeadm init.
kubeadm init phase etcd local
# Generates the static Pod manifest file for etcd using options
# read from a configuration file.
kubeadm init phase etcd local --config config.yaml
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.27 -
Generate all kubeconfig files necessary to establish the control plane and the admin kubeconfig file
Synopsis
Generate all kubeconfig files necessary to establish the control plane and the admin kubeconfig file
kubeadm init phase kubeconfig [flags]
Options
-h, --help
help for kubeconfig
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.28 -
Generate a kubeconfig file for the admin to use and for kubeadm itself
Synopsis
Generate the kubeconfig file for the admin and for kubeadm itself, and save it to admin.conf file.
kubeadm init phase kubeconfig admin [flags]
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.31 -
Generate a kubeconfig file for the kubelet to use only for cluster bootstrapping purposes
Synopsis
Generate the kubeconfig file for the kubelet to use and save it to kubelet.conf file.
Please note that this should only be used for cluster bootstrapping purposes. After your control plane is up, you should request all kubelet credentials from the CSR API.
kubeadm init phase kubeconfig kubelet [flags]
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for enable-client-cert-rotation
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.37 -
Write kubelet settings and (re)start the kubelet
Synopsis
Write a file with KubeletConfiguration and an environment file with node specific kubelet settings, and then (re)start kubelet.
kubeadm init phase kubelet-start [flags]
Examples
# Writes a dynamic environment file with kubelet flags from a InitConfiguration file.
kubeadm init phase kubelet-start --config config.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.38 -
Mark a node as a control-plane
Synopsis
Mark a node as a control-plane
kubeadm init phase mark-control-plane [flags]
Examples
# Applies the control-plane label and taint to the current node, functionally equivalent to what is executed by kubeadm init.
kubeadm init phase mark-control-plane --config config.yaml
# Applies control-plane label and taint to a specific node
kubeadm init phase mark-control-plane --node-name myNode
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for mark-control-plane
--node-name string
Specify the node name.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.39 -
Run pre-flight checks
Synopsis
Run pre-flight checks for kubeadm init.
kubeadm init phase preflight [flags]
Examples
# Run pre-flight checks for kubeadm init using a config file.
kubeadm init phase preflight --config kubeadm-config.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.40 -
Show the join command for control-plane and worker node
Synopsis
Show the join command for control-plane and worker node
kubeadm init phase show-join-command [flags]
Options
-h, --help
help for show-join-command
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.41 -
Upload certificates to kubeadm-certs
Synopsis
Upload control plane certificates to the kubeadm-certs Secret
kubeadm init phase upload-certs [flags]
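For example (the printed certificate key is needed later when joining additional control-plane nodes):
kubeadm init phase upload-certs --upload-certs
# The command prints a certificate key; pass it to 'kubeadm join ... --control-plane --certificate-key <key>'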
Options
--certificate-key string
Key used to encrypt the control-plane certificates in the kubeadm-certs Secret. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--skip-certificate-key-print
Don't print the key used to encrypt the control-plane certificates.
--upload-certs
Upload control-plane certificates to the kubeadm-certs Secret.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.42 -
Upload the kubeadm and kubelet configuration to a ConfigMap
Synopsis
Upload the kubeadm and kubelet configuration to a ConfigMap
kubeadm init phase upload-config [flags]
Options
-h, --help
help for upload-config
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.43 -
Upload all configuration to a config map
Synopsis
Upload all configuration to a config map
kubeadm init phase upload-config all [flags]
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.44 -
Upload the kubeadm ClusterConfiguration to a ConfigMap
Synopsis
Upload the kubeadm ClusterConfiguration to a ConfigMap called kubeadm-config in the kube-system namespace. This enables correct configuration of system components and a seamless user experience when upgrading.
Alternatively, you can use kubeadm config.
kubeadm init phase upload-config kubeadm [flags]
Examples
# upload the configuration of your cluster
kubeadm init phase upload-config kubeadm --config=myConfig.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.5.45 -
Upload the kubelet component config to a ConfigMap
Synopsis
Upload the kubelet configuration extracted from the kubeadm InitConfiguration object to a kubelet-config ConfigMap in the cluster
kubeadm init phase upload-config kubelet [flags]
Examples
# Upload the kubelet configuration from the kubeadm Config file to a ConfigMap in the cluster.
kubeadm init phase upload-config kubelet --config kubeadm.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6 -
Run this on any machine you wish to join an existing cluster
Synopsis
When joining a kubeadm initialized cluster, we need to establish
bidirectional trust. This is split into discovery (having the Node
trust the Kubernetes Control Plane) and TLS bootstrap (having the
Kubernetes Control Plane trust the Node).
There are 2 main schemes for discovery. The first is to use a shared
token along with the IP address of the API server. The second is to
provide a file - a subset of the standard kubeconfig file. The
discovery/kubeconfig file supports token, client-go authentication
plugins ("exec"), "tokenFile", and "authProvider". This file can be a
local file or downloaded via an HTTPS URL. The forms are
kubeadm join --discovery-token abcdef.1234567890abcdef 1.2.3.4:6443,
kubeadm join --discovery-file path/to/file.conf, or kubeadm join
--discovery-file https://url/file.conf. Only one form can be used. If
the discovery information is loaded from a URL, HTTPS must be used.
Also, in that case the host installed CA bundle is used to verify
the connection.
If you use a shared token for discovery, you should also pass the
--discovery-token-ca-cert-hash flag to validate the public key of the
root certificate authority (CA) presented by the Kubernetes Control Plane.
The value of this flag is specified as "<hash-type>:<hex-encoded-value>",
where the supported hash type is "sha256". The hash is calculated over
the bytes of the Subject Public Key Info (SPKI) object (as in RFC7469).
This value is available in the output of "kubeadm init" or can be
calculated using standard tools. The --discovery-token-ca-cert-hash flag
may be repeated multiple times to allow more than one public key.
If you cannot know the CA public key hash ahead of time, you can pass
the --discovery-token-unsafe-skip-ca-verification flag to disable this
verification. This weakens the kubeadm security model since other nodes
can potentially impersonate the Kubernetes Control Plane.
The TLS bootstrap mechanism is also driven via a shared token. This is
used to temporarily authenticate with the Kubernetes Control Plane to submit a
certificate signing request (CSR) for a locally created key pair. By
default, kubeadm will set up the Kubernetes Control Plane to automatically
approve these signing requests. This token is passed in with the
--tls-bootstrap-token abcdef.1234567890abcdef flag.
Often the same token is used for both parts. In this case, the
--token flag can be used instead of specifying each token individually.
The "join [api-server-endpoint]" command executes the following phases:
preflight Run join pre-flight checks
control-plane-prepare Prepare the machine for serving a control plane
/download-certs Download certificates shared among control-plane nodes from the kubeadm-certs Secret
/certs Generate the certificates for the new control plane components
/kubeconfig Generate the kubeconfig for the new control plane components
/control-plane Generate the manifests for the new control plane components
kubelet-start Write kubelet settings, certificates and (re)start the kubelet
control-plane-join Join a machine as a control plane instance
/etcd Add a new local etcd member
/mark-control-plane Mark a node as a control-plane
wait-control-plane EXPERIMENTAL: Wait for the control plane to start
kubeadm join [api-server-endpoint] [flags]
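As a further illustration, an additional control plane node can be joined by combining the discovery flags with --control-plane and the certificate key printed by "kubeadm init phase upload-certs --upload-certs"; all values below are placeholders:
# Join an additional control plane node.
kubeadm join 1.2.3.4:6443 --token abcdef.1234567890abcdef --discovery-token-ca-cert-hash sha256:<hex-encoded-hash> --control-plane --certificate-key <hex-encoded-key>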
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--certificate-key string
Use this key to decrypt the certificate secrets uploaded by init. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--cri-socket string
Path to the CRI socket to connect to. If empty, kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have a non-standard CRI socket.
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for join
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--skip-phases strings
List of phases to be skipped
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.1 -
Use this command to invoke a single phase of the "join" workflow
Synopsis
Use this command to invoke a single phase of the "join" workflow
kubeadm join phase [flags]
Options
-h, --help
help for phase
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.2 -
Join a machine as a control plane instance
Synopsis
Join a machine as a control plane instance
kubeadm join phase control-plane-join [flags]
Examples
# Joins a machine as a control plane instance
kubeadm join phase control-plane-join all
Options
-h, --help
help for control-plane-join
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.3 -
Join a machine as a control plane instance
Synopsis
Join a machine as a control plane instance
kubeadm join phase control-plane-join all [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for all
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.4 -
Add a new local etcd member
Synopsis
Add a new local etcd member
kubeadm join phase control-plane-join etcd [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for etcd
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.5 -
Mark a node as a control-plane
Synopsis
Mark a node as a control-plane
kubeadm join phase control-plane-join mark-control-plane [flags]
Options
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for mark-control-plane
--node-name string
Specify the node name.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.6 -
Prepare the machine for serving a control plane
Synopsis
Prepare the machine for serving a control plane
kubeadm join phase control-plane-prepare [flags]
Examples
# Prepares the machine for serving a control plane
kubeadm join phase control-plane-prepare all
Options
-h, --help
help for control-plane-prepare
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.7 -
Prepare the machine for serving a control plane
Synopsis
Prepare the machine for serving a control plane
kubeadm join phase control-plane-prepare all [api-server-endpoint] [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--certificate-key string
Use this key to decrypt the certificate secrets uploaded by init. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for all
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.8 -
Generate the certificates for the new control plane components
Synopsis
Generate the certificates for the new control plane components
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for certs
--node-name string
Specify the node name.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.9 -
Generate the manifests for the new control plane components
Synopsis
Generate the manifests for the new control plane components
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for control-plane
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.10 -
Download certificates shared among control-plane nodes from the kubeadm-certs Secret
Synopsis
Download certificates shared among control-plane nodes from the kubeadm-certs Secret
10.1.1.6.12 -
Write kubelet settings, certificates and (re)start the kubelet
Synopsis
Write kubelet settings, certificates and (re)start the kubelet
kubeadm join phase kubelet-start [api-server-endpoint] [flags]
Options
--cri-socket string
Path to the CRI socket to connect to. If empty, kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have a non-standard CRI socket.
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for kubelet-start
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.13 -
Run join pre-flight checks
Synopsis
Run join pre-flight checks
kubeadm join phase preflight [api-server-endpoint] [flags]
Examples
# Run join pre-flight checks using a config file.
kubeadm join phase preflight --config kubeadm-config.yaml
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--certificate-key string
Use this key to decrypt the certificate secrets uploaded by init. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--cri-socket string
Path to the CRI socket to connect to. If empty, kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have a non-standard CRI socket.
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--node-name string
Specify the node name.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.6.14 -
EXPERIMENTAL: Wait for the control plane to start
Synopsis
EXPERIMENTAL: Wait for the control plane to start
kubeadm join phase wait-control-plane [flags]
Options
-h, --help
help for wait-control-plane
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.7 -
Kubeconfig file utilities
Synopsis
Kubeconfig file utilities.
Options
-h, --help
help for kubeconfig
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.7.1 -
Output a kubeconfig file for an additional user
Synopsis
Output a kubeconfig file for an additional user.
kubeadm kubeconfig user [flags]
Examples
# Output a kubeconfig file for an additional user named foo
kubeadm kubeconfig user --client-name=foo
# Output a kubeconfig file for an additional user named foo using a kubeadm config file bar
kubeadm kubeconfig user --client-name=foo --config=bar
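A kubeconfig with extra organizations and a shorter certificate lifetime can also be produced; the user and organization names below are illustrative:
# Output a kubeconfig file for user "johndoe" in the "developers" organization, valid for 24 hours
kubeadm kubeconfig user --client-name=johndoe --org=developers --validity-period=24h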
Options
--client-name string
The name of the user. It will be used as the CN if client certificates are created.
--config string
Path to a kubeadm configuration file.
-h, --help
help for user
--org strings
The organizations of the client certificate. It will be used as the O if client certificates are created
--token string
The token that should be used as the authentication mechanism for this kubeconfig, instead of client certificates
--validity-period duration Default: 8760h0m0s
The validity period of the client certificate. It is an offset from the current time.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.8 -
Performs a best-effort revert of changes made to this host by 'kubeadm init' or 'kubeadm join'
Synopsis
Performs a best-effort revert of changes made to this host by 'kubeadm init' or 'kubeadm join'
The "reset" command executes the following phases:
preflight Run reset pre-flight checks
remove-etcd-member Remove a local etcd member.
cleanup-node Run cleanup node.
kubeadm reset [flags]
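For example, a node can be reverted non-interactively; note that this removes the kubeadm-managed state from the node and cannot easily be undone:
# Reset this node without prompting for confirmation
kubeadm reset --force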
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path to the directory where the certificates are stored. If specified, clean this directory.
--cleanup-tmp-dir
Clean up the "/etc/kubernetes/tmp" directory
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect to. If empty, kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have a non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
-f, --force
Reset the node without prompting for confirmation.
-h, --help
help for reset
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--skip-phases strings
List of phases to be skipped
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.8.1 -
Use this command to invoke a single phase of the "reset" workflow
Synopsis
Use this command to invoke a single phase of the "reset" workflow
kubeadm reset phase [flags]
Options
-h, --help
help for phase
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.8.2 -
Run cleanup node.
Synopsis
Run cleanup node.
kubeadm reset phase cleanup-node [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path to the directory where the certificates are stored. If specified, clean this directory.
--cleanup-tmp-dir
Clean up the "/etc/kubernetes/tmp" directory
--cri-socket string
Path to the CRI socket to connect to. If empty, kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have a non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for cleanup-node
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.8.3 -
Run reset pre-flight checks
Synopsis
Run pre-flight checks for kubeadm reset.
kubeadm reset phase preflight [flags]
Options
--dry-run
Don't apply any changes; just output what would be done.
-f, --force
Reset the node without prompting for confirmation.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.8.4 -
Remove a local etcd member.
Synopsis
Remove a local etcd member for a control plane node.
kubeadm reset phase remove-etcd-member [flags]
Options
--dry-run
Don't apply any changes; just output what would be done.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.9 -
Manage bootstrap tokens
Synopsis
This command manages bootstrap tokens. It is optional and needed only for advanced use cases.
In short, bootstrap tokens are used for establishing bidirectional trust between a client and a server.
A bootstrap token can be used when a client (for example a node that is about to join the cluster) needs
to trust the server it is talking to; in that case a bootstrap token with the "signing" usage is used.
Bootstrap tokens can also function as a way to allow short-lived authentication to the API Server
(the token serves as a way for the API Server to trust the client), for example for doing the TLS bootstrap.
What exactly is a bootstrap token?
It is a Secret in the kube-system namespace of type "bootstrap.kubernetes.io/token".
A bootstrap token must be of the form "[a-z0-9]{6}.[a-z0-9]{16}". The former part is the public token ID,
while the latter is the token secret, which must be kept private in all circumstances!
The Secret must be named "bootstrap-token-(token-id)".
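As an illustration, the bootstrap token Secrets that currently exist in a cluster can be listed with kubectl, assuming credentials that can read the kube-system namespace:
# List all Secrets of the bootstrap token type
kubectl -n kube-system get secrets --field-selector type=bootstrap.kubernetes.io/token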
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.9.1 -
Create bootstrap tokens on the server
Synopsis
This command will create a bootstrap token for you.
You can specify the usages for this token, the "time to live", and an optional human-friendly description.
The [token] is the actual token to write.
This should be a securely generated random token of the form "[a-z0-9]{6}.[a-z0-9]{16}".
If no [token] is given, kubeadm will generate a random token instead.
kubeadm token create [token]
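For instance, a token with an explicit description and both usages could be created as shown below; the --print-join-command flag referenced under --certificate-key prints a ready-to-use join command instead of only the token. All values are illustrative:
# Create a token with a description and both usages
kubeadm token create --description "token for joining worker-1" --usages signing,authentication
# Print a full 'kubeadm join' command for a worker node
kubeadm token create --print-join-command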
Options
--certificate-key string
When used together with '--print-join-command', print the full 'kubeadm join' flag needed to join the cluster as a control-plane. To create a new certificate key you must use 'kubeadm init phase upload-certs --upload-certs'.
--config string
Path to a kubeadm configuration file.
--description string
A human-friendly description of how this token is used.
--usages strings
Describes the ways in which this token can be used. You can pass --usages multiple times or provide a comma-separated list of options. Valid options: [signing,authentication]
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.9.2 -
Delete bootstrap tokens on the server
Synopsis
This command will delete a list of bootstrap tokens for you.
The [token-value] is the full Token of the form "[a-z0-9]{6}.[a-z0-9]{16}" or the
Token ID of the form "[a-z0-9]{6}" to delete.
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.9.3 -
Generate and print a bootstrap token, but do not create it on the server
Synopsis
This command will print out a randomly-generated bootstrap token that can be used with
the "init" and "join" commands.
You don't have to use this command in order to generate a token. You can do so
yourself as long as it is in the format "[a-z0-9]{6}.[a-z0-9]{16}". This
command is provided for convenience to generate tokens in the given format.
You can also use "kubeadm init" without specifying a token and it will
generate and print one for you.
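For example, the command simply prints a randomly generated token to standard output without contacting the cluster:
# Generate a token locally
kubeadm token generate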
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.9.4 -
List bootstrap tokens on the server
Synopsis
This command will list all bootstrap tokens for you.
kubeadm token list [flags]
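For example, to inspect the full token objects rather than the default table output:
# List bootstrap tokens as YAML
kubeadm token list -o yaml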
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
-h, --help
help for list
-o, --output string Default: "text"
Output format. One of: text|json|yaml|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10 -
Upgrade your cluster smoothly to a newer version with this command
Synopsis
Upgrade your cluster smoothly to a newer version with this command
kubeadm upgrade [flags]
Options
-h, --help
help for upgrade
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.1 -
Upgrade your Kubernetes cluster to the specified version
Synopsis
Upgrade your Kubernetes cluster to the specified version
The "apply [version]" command executes the following phases:
preflight Run preflight checks before upgrade
control-plane Upgrade the control plane
upload-config Upload the kubeadm and kubelet configurations to ConfigMaps
/kubeadm Upload the kubeadm ClusterConfiguration to a ConfigMap
/kubelet Upload the kubelet configuration to a ConfigMap
kubelet-config Upgrade the kubelet configuration for this node
bootstrap-token Configures bootstrap token and cluster-info RBAC rules
addon Upgrade the default kubeadm addons
/coredns Upgrade the CoreDNS addon
/kube-proxy Upgrade the kube-proxy addon
post-upgrade Run post upgrade tasks
kubeadm upgrade apply [version]
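A typical invocation on the first control plane node targets an explicit version; the version string below is a placeholder for the desired release:
# Upgrade the control plane components on this node to the chosen version
kubeadm upgrade apply <target-version>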
Options
--allow-experimental-upgrades
Show unstable versions of Kubernetes as an upgrade alternative and allow upgrading to alpha/beta/release candidate versions of Kubernetes.
--allow-release-candidate-upgrades
Show release candidate versions of Kubernetes as an upgrade alternative and allow upgrading to release candidate versions of Kubernetes.
--certificate-renewal Default: true
Perform the renewal of certificates used by the components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--etcd-upgrade Default: true
Perform the upgrade of etcd.
-f, --force
Force upgrading although some requirements might not be met. This also implies non-interactive mode.
-h, --help
help for apply
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--print-config
Specifies whether the configuration file that will be used in the upgrade should be printed or not.
--skip-phases strings
List of phases to be skipped
-y, --yes
Perform the upgrade and do not prompt for confirmation (non-interactive mode).
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.2 -
Use this command to invoke a single phase of the "apply" workflow
Synopsis
Use this command to invoke a single phase of the "apply" workflow
kubeadm upgrade apply phase [flags]
Options
-h, --help
help for phase
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.3 -
Upgrade the default kubeadm addons
Synopsis
Upgrade the default kubeadm addons
kubeadm upgrade apply phase addon [flags]
Options
-h, --help
help for addon
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.4 -
Upgrade all the addons
Synopsis
Upgrade all the addons
kubeadm upgrade apply phase addon all [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.5 -
Upgrade the CoreDNS addon
Synopsis
Upgrade the CoreDNS addon
kubeadm upgrade apply phase addon coredns [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.6 -
Upgrade the kube-proxy addon
Synopsis
Upgrade the kube-proxy addon
kubeadm upgrade apply phase addon kube-proxy [flags]
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.7 -
Configures bootstrap token and cluster-info RBAC rules
Synopsis
Configures bootstrap token and cluster-info RBAC rules
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.8 -
Upgrade the control plane
Synopsis
Upgrade the control plane
kubeadm upgrade apply phase control-plane [flags]
Options
--certificate-renewal Default: true
Perform the renewal of certificates used by the components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.9 -
Upgrade the kubelet configuration for this node
Synopsis
Upgrade the kubelet configuration for this node by downloading it from the kubelet-config ConfigMap stored in the cluster
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.10 -
Run post upgrade tasks
Synopsis
Run post upgrade tasks
kubeadm upgrade apply phase post-upgrade [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.11 -
Run preflight checks before upgrade
Synopsis
Run preflight checks before upgrade
kubeadm upgrade apply phase preflight [flags]
Options
--allow-experimental-upgrades
Show unstable versions of Kubernetes as an upgrade alternative and allow upgrading to alpha/beta/release candidate versions of Kubernetes.
--allow-release-candidate-upgrades
Show release candidate versions of Kubernetes as an upgrade alternative and allow upgrading to release candidate versions of Kubernetes.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
-f, --force
Force upgrading although some requirements might not be met. This also implies non-interactive mode.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
-y, --yes
Perform the upgrade and do not prompt for confirmation (non-interactive mode).
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.12 -
Upload the kubeadm and kubelet configurations to ConfigMaps
Synopsis
Upload the kubeadm and kubelet configurations to ConfigMaps
kubeadm upgrade apply phase upload-config [flags]
Options
-h, --help
help for upload-config
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.13 -
Upload all the configurations to ConfigMaps
Synopsis
Upload all the configurations to ConfigMaps
kubeadm upgrade apply phase upload-config all [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.14 -
Upload the kubeadm ClusterConfiguration to a ConfigMap
Synopsis
Upload the kubeadm ClusterConfiguration to a ConfigMap
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.15 -
Upload the kubelet configuration to a ConfigMap
Synopsis
Upload the kubelet configuration to a ConfigMap
kubeadm upgrade apply phase upload-config kubelet [flags]
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.16 -
Show what differences would be applied to existing static pod manifests. See also: kubeadm upgrade apply --dry-run
Synopsis
Show what differences would be applied to existing static pod manifests. See also: kubeadm upgrade apply --dry-run
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.17 -
Upgrade commands for a node in the cluster
Synopsis
Upgrade commands for a node in the cluster
The "node" command executes the following phases:
preflight Run upgrade node pre-flight checks
control-plane Upgrade the control plane instance deployed on this node, if any
kubelet-config Upgrade the kubelet configuration for this node
addon Upgrade the default kubeadm addons
/coredns Upgrade the CoreDNS addon
/kube-proxy Upgrade the kube-proxy addon
post-upgrade Run post upgrade tasks
kubeadm upgrade node [flags]
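After the first control plane node has been upgraded with "kubeadm upgrade apply", the remaining nodes are typically upgraded one at a time, usually by running the command without arguments:
# Upgrade the kubeadm-managed configuration (and the control plane components, if this node hosts any)
kubeadm upgrade node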
Options
--certificate-renewal Default: true
Perform the renewal of certificates used by the components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--etcd-upgrade Default: true
Perform the upgrade of etcd.
-h, --help
help for node
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--skip-phases strings
List of phases to be skipped
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.18 -
Use this command to invoke a single phase of the "node" workflow
Synopsis
Use this command to invoke a single phase of the "node" workflow
kubeadm upgrade node phase [flags]
Options
-h, --help
help for phase
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.19 -
Upgrade the default kubeadm addons
Synopsis
Upgrade the default kubeadm addons
kubeadm upgrade node phase addon [flags]
Options
-h, --help
help for addon
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.20 -
Upgrade all the addons
Synopsis
Upgrade all the addons
kubeadm upgrade node phase addon all [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.21 -
Upgrade the CoreDNS addon
Synopsis
Upgrade the CoreDNS addon
kubeadm upgrade node phase addon coredns [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.22 -
Upgrade the kube-proxy addon
Synopsis
Upgrade the kube-proxy addon
kubeadm upgrade node phase addon kube-proxy [flags]
Options
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.23 -
Upgrade the control plane instance deployed on this node, if any
Synopsis
Upgrade the control plane instance deployed on this node, if any
kubeadm upgrade node phase control-plane [flags]
Options
--certificate-renewal Default: true
Perform the renewal of certificates used by the components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.24 -
Upgrade the kubelet configuration for this node
Synopsis
Upgrade the kubelet configuration for this node by downloading it from the kubelet-config ConfigMap stored in the cluster
kubeadm upgrade node phase kubelet-config [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.25 -
Run post upgrade tasks
Synopsis
Run post upgrade tasks
kubeadm upgrade node phase post-upgrade [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.26 -
Run upgrade node pre-flight checks
Synopsis
Run pre-flight checks for kubeadm upgrade node.
kubeadm upgrade node phase preflight [flags]
Options
--config string
Path to a kubeadm configuration file.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.10.27 -
Check which versions are available to upgrade to and validate whether your current cluster is upgradeable.
Synopsis
Check which versions are available to upgrade to and validate whether your current cluster is upgradeable. This command can only run on the control plane nodes where the kubeconfig file "admin.conf" exists. To skip the internet check, pass in the optional [version] parameter.
kubeadm upgrade plan [version] [flags]
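For example, to check the available upgrade paths, or to obtain the same information in machine-readable form:
# Check upgrade options using the default text output
kubeadm upgrade plan
# Emit the upgrade plan as JSON
kubeadm upgrade plan -o json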
Options
--allow-experimental-upgrades
Show unstable versions of Kubernetes as an upgrade alternative and allow upgrading to alpha/beta/release candidate versions of Kubernetes.
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--allow-release-candidate-upgrades
Show release candidate versions of Kubernetes as an upgrade alternative and allow upgrading to release candidate versions of Kubernetes.
--config string
Path to a kubeadm configuration file.
--etcd-upgrade Default: true
Perform the upgrade of etcd.
-h, --help
help for plan
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
-o, --output string Default: "text"
Output format. One of: text|json|yaml|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file.
--print-config
Specifies whether the configuration file that will be used in the upgrade should be printed or not.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.1.11 -
Print the version of kubeadm
Synopsis
Print the version of kubeadm
kubeadm version [flags]
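For example, to print only a short version string rather than the full version object:
# Print the short version
kubeadm version -o short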
Options
-h, --help
help for version
-o, --output string
Output format; available options are 'yaml', 'json' and 'short'
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.2 - kubeadm init
This command initializes a Kubernetes control plane node.
Run this command in order to set up the Kubernetes control plane
Synopsis
Run this command in order to set up the Kubernetes control plane
The "init" command executes the following phases:
preflight Run pre-flight checks
certs Certificate generation
/ca Generate the self-signed Kubernetes CA to provision identities for other Kubernetes components
/apiserver Generate the certificate for serving the Kubernetes API
/apiserver-kubelet-client Generate the certificate for the API server to connect to kubelet
/front-proxy-ca Generate the self-signed CA to provision identities for front proxy
/front-proxy-client Generate the certificate for the front proxy client
/etcd-ca Generate the self-signed CA to provision identities for etcd
/etcd-server Generate the certificate for serving etcd
/etcd-peer Generate the certificate for etcd nodes to communicate with each other
/etcd-healthcheck-client Generate the certificate for liveness probes to healthcheck etcd
/apiserver-etcd-client Generate the certificate the apiserver uses to access etcd
/sa Generate a private key for signing service account tokens along with its public key
kubeconfig Generate all kubeconfig files necessary to establish the control plane and the admin kubeconfig file
/admin Generate a kubeconfig file for the admin to use and for kubeadm itself
/super-admin Generate a kubeconfig file for the super-admin
/kubelet Generate a kubeconfig file for the kubelet to use *only* for cluster bootstrapping purposes
/controller-manager Generate a kubeconfig file for the controller manager to use
/scheduler Generate a kubeconfig file for the scheduler to use
etcd Generate static Pod manifest file for local etcd
/local Generate the static Pod manifest file for a local, single-node local etcd instance
control-plane Generate all static Pod manifest files necessary to establish the control plane
/apiserver Generates the kube-apiserver static Pod manifest
/controller-manager Generates the kube-controller-manager static Pod manifest
/scheduler Generates the kube-scheduler static Pod manifest
kubelet-start Write kubelet settings and (re)start the kubelet
upload-config Upload the kubeadm and kubelet configuration to a ConfigMap
/kubeadm Upload the kubeadm ClusterConfiguration to a ConfigMap
/kubelet Upload the kubelet component config to a ConfigMap
upload-certs Upload certificates to kubeadm-certs
mark-control-plane Mark a node as a control-plane
bootstrap-token Generates bootstrap tokens used to join a node to a cluster
kubelet-finalize Updates settings relevant to the kubelet after TLS bootstrap
/enable-client-cert-rotation Enable kubelet client certificate rotation
addon Install required addons for passing conformance tests
/coredns Install the CoreDNS addon to a Kubernetes cluster
/kube-proxy Install the kube-proxy addon to a Kubernetes cluster
show-join-command Show the join command for control-plane and worker node
kubeadm init [flags]
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--apiserver-cert-extra-sans strings
Optional extra Subject Alternative Names (SANs) to use for the API Server serving certificate. Can be both IP addresses and DNS names.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--certificate-key string
Key used to encrypt the control-plane certificates in the kubeadm-certs Secret. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
-h, --help
help for init
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--image-repository string Default: "registry.k8s.io"
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
--service-dns-domain string Default: "cluster.local"
Use alternative domain for services, e.g. "myorg.internal".
--skip-certificate-key-print
Don't print the key used to encrypt the control-plane certificates.
--skip-phases strings
List of phases to be skipped
--skip-token-print
Skip printing of the default bootstrap token generated by 'kubeadm init'.
--token string
The token to use for establishing bidirectional trust between nodes and control-plane nodes. The format is [a-z0-9]{6}.[a-z0-9]{16} - e.g. abcdef.0123456789abcdef
--token-ttl duration Default: 24h0m0s
The duration before the token is automatically deleted (e.g. 1s, 2m, 3h). If set to '0', the token will never expire
--upload-certs
Upload control-plane certificates to the kubeadm-certs Secret.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
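As a minimal sketch of a typical invocation that combines several of the flags above (the endpoint, CIDR, and node name are placeholders, not values taken from this reference):
# placeholder endpoint, CIDR and node name; substitute your own
sudo kubeadm init \
  --control-plane-endpoint "k8s-api.example.internal:6443" \
  --pod-network-cidr "10.244.0.0/16" \
  --node-name "cp-node-1" \
  --upload-certs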
Init workflow
kubeadm init bootstraps a Kubernetes control plane node by executing the
following steps:
Runs a series of pre-flight checks to validate the system state
before making changes. Some checks only trigger warnings, others are
considered errors and will cause kubeadm to exit until the problem is corrected or the
user specifies --ignore-preflight-errors=<list-of-errors>.
Generates a self-signed CA to set up identities for each component in the cluster. The user can provide their
own CA cert and/or key by dropping it in the cert directory configured via --cert-dir
(/etc/kubernetes/pki by default).
The API server certs will have additional SAN entries for any --apiserver-cert-extra-sans
arguments, lowercased if necessary.
Writes kubeconfig files in /etc/kubernetes/ for the kubelet, the controller-manager, and the
scheduler to connect to the API server, each with its own identity. Additional
kubeconfig files are also written: one for kubeadm itself as an administrative entity (admin.conf)
and one for a super admin user that can bypass RBAC (super-admin.conf).
Generates static Pod manifests for the API server,
controller-manager and scheduler. In case an external etcd is not provided,
an additional static Pod manifest is generated for etcd.
Static Pod manifests are written to /etc/kubernetes/manifests; the kubelet
watches this directory for Pods to create on startup.
Once control plane Pods are up and running, the kubeadm init sequence can continue.
Applies labels and taints to the control plane node so that no additional workloads will
run there.
Generates the token that additional nodes can use to register
themselves with a control plane in the future. Optionally, the user can provide a
token via --token, as described in the
kubeadm token documents.
Makes all the necessary configurations for allowing node joining with the
Bootstrap Tokens and
TLS Bootstrap
mechanism:
Write a ConfigMap for making available all the information required
for joining, and set up related RBAC access rules.
Installs a DNS server (CoreDNS) and the kube-proxy addon components via the API server.
In Kubernetes version 1.11 and later CoreDNS is the default DNS server.
Please note that although the DNS server is deployed, it will not be scheduled until CNI is installed.
Warning:
kube-dns usage with kubeadm is deprecated as of v1.18 and was removed in v1.21.
Using init phases with kubeadm
kubeadm allows you to create a control plane node in phases using the kubeadm init phase command.
To view the ordered list of phases and sub-phases you can call kubeadm init --help. The list
will be located at the top of the help screen and each phase will have a description next to it.
Note that by calling kubeadm init all of the phases and sub-phases will be executed in this exact order.
Some phases have unique flags, so if you want to have a look at the list of available options add
--help, for example:
sudo kubeadm init phase control-plane controller-manager --help
You can also use --help to see the list of sub-phases for a certain parent phase:
sudo kubeadm init phase control-plane --help
kubeadm init also exposes a flag called --skip-phases that can be used to skip certain phases.
The flag accepts a list of phase names and the names can be taken from the above ordered list.
An example:
sudo kubeadm init phase control-plane all --config=configfile.yaml
sudo kubeadm init phase etcd local --config=configfile.yaml
# you can now modify the control plane and etcd manifest files
sudo kubeadm init --skip-phases=control-plane,etcd --config=configfile.yaml
What this example would do is write the manifest files for the control plane and etcd in
/etc/kubernetes/manifests based on the configuration in configfile.yaml. This allows you to
modify the files and then skip these phases using --skip-phases. By calling the last command you
will create a control plane node with the custom manifest files.
FEATURE STATE: Kubernetes v1.22 [beta]
Alternatively, you can use the skipPhases field under InitConfiguration.
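A minimal sketch of that approach, assuming you want to skip the kube-proxy addon sub-phase (phase names come from the ordered list above):
# init-config.yaml (sketch)
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
skipPhases:
  - addon/kube-proxy
Passing this file with sudo kubeadm init --config init-config.yaml has the same effect as --skip-phases=addon/kube-proxy.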
Using kubeadm init with a configuration file
Caution:
The configuration file is still considered beta and may change in future versions.
It's possible to configure kubeadm init with a configuration file instead of command
line flags, and some more advanced features may only be available as
configuration file options. This file is passed using the --config flag and it must
contain a ClusterConfiguration structure and optionally more structures separated by ---\n.
Mixing --config with other flags may not be allowed in some cases.
The default configuration can be printed out using the
kubeadm config print command.
If your configuration is not using the latest version it is recommended that you migrate using
the kubeadm config migrate command.
For more information on the fields and usage of the configuration you can navigate to our
API reference page.
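For illustration only, a minimal configuration file might look like the following; all field values are placeholders and the version string is an example, not a recommendation:
# kubeadm-config.yaml (sketch)
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "v1.33.0"
controlPlaneEndpoint: "k8s-api.example.internal:6443"
networking:
  podSubnet: "10.244.0.0/16"
It would then be passed as sudo kubeadm init --config kubeadm-config.yaml.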
Using kubeadm init with feature gates
kubeadm supports a set of feature gates that are unique to kubeadm and can only be applied
during cluster creation with kubeadm init. These features can control the behavior
of the cluster. Feature gates are removed after a feature graduates to GA.
To pass a feature gate you can either use the --feature-gates flag for
kubeadm init, or you can add items into the featureGates field when you pass
a configuration file
using --config.
Once a feature gate goes GA its value becomes locked to true by default.
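As an example of both forms, using one of the feature gates described below:
# via the command line
sudo kubeadm init --feature-gates "WaitForAllControlPlaneComponents=true"
# or via a configuration file passed with --config
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
featureGates:
  WaitForAllControlPlaneComponents: true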
Feature gate descriptions:
ControlPlaneKubeletLocalMode
With this feature gate enabled, when joining a new control plane node, kubeadm will configure the kubelet
to connect to the local kube-apiserver. This ensures that there will not be a violation of the version skew
policy during rolling upgrades.
NodeLocalCRISocket
With this feature gate enabled, kubeadm will read/write the CRI socket for each node from/to the file
/var/lib/kubelet/instance-config.yaml instead of reading/writing it from/to the annotation
kubeadm.alpha.kubernetes.io/cri-socket on the Node object. The new file is applied as an instance
configuration patch, before any other user managed patches are applied when the --patches flag
is used. It contains a single field containerRuntimeEndpoint from the
KubeletConfiguration file format. If the feature gate
is enabled during upgrade, but the file /var/lib/kubelet/instance-config.yaml does not exist yet,
kubeadm will attempt to read the CRI socket value from the file /var/lib/kubelet/kubeadm-flags.env.
WaitForAllControlPlaneComponents
With this feature gate enabled, kubeadm will wait for all control plane components (kube-apiserver,
kube-controller-manager, kube-scheduler) on a control plane node to report status 200 on their /livez
or /healthz endpoints. These checks are performed on https://ADDRESS:PORT/ENDPOINT.
PORT is taken from --secure-port of a component.
ADDRESS is --advertise-address for kube-apiserver and --bind-address for the
kube-controller-manager and kube-scheduler.
ENDPOINT is only /healthz for kube-controller-manager until it supports /livez as well.
If you specify custom ADDRESS or PORT in the kubeadm configuration they will be respected.
Without the feature gate enabled, kubeadm will only wait for the kube-apiserver
on a control plane node to become ready. The wait process starts right after the kubelet on the host
is started by kubeadm. You are advised to enable this feature gate in case you wish to observe a ready
state from all control plane components during the kubeadm init or kubeadm join command execution.
List of deprecated feature gates:
kubeadm deprecated feature gates
Feature                 Default   Alpha   Beta   GA    Deprecated
PublicKeysECDSA         false     1.19    -      -     1.31
RootlessControlPlane    false     1.22    -      -     1.31
Feature gate descriptions:
PublicKeysECDSA
Can be used to create a cluster that uses ECDSA certificates instead of the default RSA algorithm.
Renewal of existing ECDSA certificates is also supported using kubeadm certs renew, but you cannot
switch between the RSA and ECDSA algorithms on the fly or during upgrades. Kubernetes versions before v1.31
had a bug where keys in generated kubeconfig files were set to use RSA, even when you had enabled the
PublicKeysECDSA feature gate. This feature gate is deprecated in favor of the encryptionAlgorithm
functionality available in kubeadm v1beta4.
RootlessControlPlane
Setting this flag configures the kubeadm deployed control plane component static Pod containers
for kube-apiserver, kube-controller-manager, kube-scheduler and etcd to run as non-root users.
If the flag is not set, those components run as root. You can change the value of this feature gate before
you upgrade to a newer version of Kubernetes.
List of removed feature gates:
kubeadm removed feature gates
Feature                           Alpha   Beta   GA     Removed
EtcdLearnerMode                   1.27    1.29   1.32   1.33
IPv6DualStack                     1.16    1.21   1.23   1.24
UnversionedKubeletConfigMap       1.22    1.23   1.25   1.26
UpgradeAddonsBeforeControlPlane   1.28    -      -      1.31
Feature gate descriptions:
EtcdLearnerMode
When joining a new control plane node, a new etcd member will be created
as a learner and promoted to a voting member only after the etcd data are fully aligned.
IPv6DualStack
This flag helps to configure components for dual-stack networking when the feature is in progress. For more details on Kubernetes
dual-stack support see Dual-stack support with kubeadm.
UnversionedKubeletConfigMap
This flag controls the name of the ConfigMap where kubeadm stores
kubelet configuration data. With this flag not specified or set to true, the ConfigMap is named kubelet-config.
If you set this flag to false, the name of the ConfigMap includes the major and minor version for Kubernetes
(for example: kubelet-config-1.33). Kubeadm ensures that RBAC rules for reading and writing
that ConfigMap are appropriate for the value you set. When kubeadm writes this ConfigMap (during kubeadm init
or kubeadm upgrade apply), kubeadm respects the value of UnversionedKubeletConfigMap. When reading that ConfigMap
(during kubeadm join, kubeadm reset, kubeadm upgrade...), kubeadm attempts to use the unversioned ConfigMap name first.
If that does not succeed, kubeadm falls back to using the legacy (versioned) name for that ConfigMap.
UpgradeAddonsBeforeControlPlane
This feature gate has been removed. It was introduced in v1.28 as a deprecated feature and then removed in v1.31.
For documentation on older versions, please switch to the corresponding website version.
Adding kube-proxy parameters
For information about kube-proxy parameters in the kubeadm configuration, see the kube-proxy reference documentation.
Running kubeadm without an Internet connection
For running kubeadm without an Internet connection you have to pre-pull the required control plane images.
You can list and pull the images using the kubeadm config images sub-command:
kubeadm config images list
kubeadm config images pull
You can pass --config to the above commands with a kubeadm configuration file
to control the kubernetesVersion and imageRepository fields.
All default registry.k8s.io images that kubeadm requires support multiple architectures.
Using custom images
By default, kubeadm pulls images from registry.k8s.io. If the
requested Kubernetes version is a CI label (such as ci/latest)
gcr.io/k8s-staging-ci-images is used.
You can override this default behavior by using kubeadm with a configuration file. Allowed customizations are:
To provide a kubernetesVersion, which affects the version of the images.
To provide an alternative imageRepository to be used instead of
registry.k8s.io.
To provide a specific imageRepository and imageTag for etcd or CoreDNS.
Image paths between the default registry.k8s.io and a custom repository specified using
imageRepository may differ for backwards compatibility reasons. For example,
one image might have a subpath at registry.k8s.io/subpath/image, but be defaulted
to my.customrepository.io/image when using a custom repository.
To ensure you push the images to your custom repository in paths that kubeadm
can consume, you must:
Pull images from the default paths at registry.k8s.io using kubeadm config images {list|pull}.
Push images to the paths from kubeadm config images list --config=config.yaml,
where config.yaml contains the custom imageRepository, and/or imageTag for etcd and CoreDNS.
Pass the same config.yaml to kubeadm init.
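A sketch of such a config.yaml, assuming a hypothetical custom repository; the image tags shown are placeholders, not authoritative versions:
# config.yaml (sketch)
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
imageRepository: "my.customrepository.io"
etcd:
  local:
    imageRepository: "my.customrepository.io"
    imageTag: "3.5.16-0"   # placeholder tag
dns:
  imageRepository: "my.customrepository.io/coredns"
  imageTag: "v1.11.3"      # placeholder tag
Running kubeadm config images list --config=config.yaml then shows the exact paths kubeadm expects to find in your repository.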
Custom sandbox (pause) images
To set a custom sandbox image, you need to configure your container runtime
to use that image.
Consult the documentation for your container runtime to find out how to change this setting;
for selected container runtimes, you can also find advice within the
Container Runtimes topic.
Uploading control plane certificates to the cluster
By adding the flag --upload-certs to kubeadm init you can temporarily upload
the control plane certificates to a Secret in the cluster. Please note that this Secret
will expire automatically after 2 hours. The certificates are encrypted using
a 32-byte key that can be specified using --certificate-key. The same key can be used
to download the certificates when additional control plane nodes are joining, by passing
--control-plane and --certificate-key to kubeadm join.
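Putting that together, a sketch of the flow (the values in angle brackets are placeholders you generate or obtain from the kubeadm init output):
# on the first control plane node
sudo kubeadm init --upload-certs --certificate-key <32-byte-hex-key>
# on an additional control plane node, within 2 hours
sudo kubeadm join <control-plane-endpoint> --control-plane \
  --certificate-key <32-byte-hex-key> \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>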
The following phase command can be used to re-upload the certificates after expiration:
kubeadm init phase upload-certs --upload-certs
A predefined certificateKey can be provided in InitConfiguration when passing the
configuration file with --config.
If a predefined certificate key is not passed to kubeadm init and
kubeadm init phase upload-certs a new key will be generated automatically.
The following command can be used to generate a new key on demand:
kubeadm certs certificate-key
Certificate management with kubeadm
For detailed information on certificate management with kubeadm see
Certificate Management with kubeadm.
The document includes information about using external CA, custom certificates
and certificate renewal.
Managing the kubeadm drop-in file for the kubelet
The kubeadm package ships with a configuration file for running the kubelet by systemd.
Note that the kubeadm CLI never touches this drop-in file. This drop-in file is part of the kubeadm
DEB/RPM package.
By default, kubeadm attempts to detect your container runtime. For more details on this detection,
see the kubeadm CRI installation guide.
Setting the node name
By default, kubeadm assigns a node name based on a machine's host address.
You can override this setting with the --node-name flag.
The flag passes the appropriate --hostname-override
value to the kubelet.
Automating kubeadm
Rather than copying the token you obtained from kubeadm init to each node, as
in the basic kubeadm tutorial,
you can parallelize the token distribution for easier automation. To implement this automation,
you must know the IP address that the control plane node will have after it is started, or use a
DNS name or an address of a load balancer.
Generate a token. This token must have the form <6 character string>.<16 character string>.
More formally, it must match the regex: [a-z0-9]{6}\.[a-z0-9]{16}.
kubeadm can generate a token for you:
kubeadm token generate
Start both the control plane node and the worker nodes concurrently with this token.
As they come up they should find each other and form the cluster. The same
--token argument can be used on both kubeadm init and kubeadm join.
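A rough sketch of that automation, assuming a known control plane endpoint (k8s-api.example.internal:6443 below is a placeholder) and accepting the relaxed security trade-off described at the end of this section:
# generate the token up front and distribute it to all machines
TOKEN=$(kubeadm token generate)
# on the control plane node
sudo kubeadm init --token "$TOKEN"
# on each worker node, started in parallel (no CA pinning; see below)
sudo kubeadm join k8s-api.example.internal:6443 --token "$TOKEN" \
  --discovery-token-unsafe-skip-ca-verification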
The same can be done for --certificate-key when joining additional control plane
nodes. The key can be generated using:
kubeadm certs certificate-key
Once the cluster is up, you can use the /etc/kubernetes/admin.conf file from
a control plane node to talk to the cluster with administrator credentials, or see
Generating kubeconfig files for additional users.
Note that this style of bootstrap has some relaxed security guarantees because
it does not allow the root CA hash to be validated with
--discovery-token-ca-cert-hash (since it's not generated when the nodes are provisioned).
For details, see kubeadm join.
What's next
kubeadm join to bootstrap a Kubernetes
worker node and join it to the cluster
kubeadm upgrade to upgrade a Kubernetes
cluster to a newer version
kubeadm reset to revert any changes made
to this host by kubeadm init or kubeadm join
10.1.3 - kubeadm join
This command initializes a new Kubernetes node and joins it to the existing cluster.
Run this on any machine you wish to join an existing cluster
Synopsis
When joining a kubeadm initialized cluster, we need to establish
bidirectional trust. This is split into discovery (having the Node
trust the Kubernetes Control Plane) and TLS bootstrap (having the
Kubernetes Control Plane trust the Node).
There are 2 main schemes for discovery. The first is to use a shared
token along with the IP address of the API server. The second is to
provide a file - a subset of the standard kubeconfig file. The
discovery/kubeconfig file supports token, client-go authentication
plugins ("exec"), "tokenFile", and "authProvider". This file can be a
local file or downloaded via an HTTPS URL. The forms are
kubeadm join --discovery-token abcdef.1234567890abcdef 1.2.3.4:6443,
kubeadm join --discovery-file path/to/file.conf, or kubeadm join
--discovery-file https://url/file.conf. Only one form can be used. If
the discovery information is loaded from a URL, HTTPS must be used.
Also, in that case the host installed CA bundle is used to verify
the connection.
If you use a shared token for discovery, you should also pass the
--discovery-token-ca-cert-hash flag to validate the public key of the
root certificate authority (CA) presented by the Kubernetes Control Plane.
The value of this flag is specified as "<hash-type>:<hex-encoded-value>",
where the supported hash type is "sha256". The hash is calculated over
the bytes of the Subject Public Key Info (SPKI) object (as in RFC7469).
This value is available in the output of "kubeadm init" or can be
calculated using standard tools. The --discovery-token-ca-cert-hash flag
may be repeated multiple times to allow more than one public key.
If you cannot know the CA public key hash ahead of time, you can pass
the --discovery-token-unsafe-skip-ca-verification flag to disable this
verification. This weakens the kubeadm security model since other nodes
can potentially impersonate the Kubernetes Control Plane.
The TLS bootstrap mechanism is also driven via a shared token. This is
used to temporarily authenticate with the Kubernetes Control Plane to submit a
certificate signing request (CSR) for a locally created key pair. By
default, kubeadm will set up the Kubernetes Control Plane to automatically
approve these signing requests. This token is passed in with the
--tls-bootstrap-token abcdef.1234567890abcdef flag.
Often times the same token is used for both parts. In this case, the
--token flag can be used instead of specifying each token individually.
The "join [api-server-endpoint]" command executes the following phases:
preflight Run join pre-flight checks
control-plane-prepare Prepare the machine for serving a control plane
/download-certs Download certificates shared among control-plane nodes from the kubeadm-certs Secret
/certs Generate the certificates for the new control plane components
/kubeconfig Generate the kubeconfig for the new control plane components
/control-plane Generate the manifests for the new control plane components
kubelet-start Write kubelet settings, certificates and (re)start the kubelet
control-plane-join Join a machine as a control plane instance
/etcd Add a new local etcd member
/mark-control-plane Mark a node as a control-plane
wait-control-plane EXPERIMENTAL: Wait for the control plane to start
kubeadm join [api-server-endpoint] [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--certificate-key string
Use this key to decrypt the certificate secrets uploaded by init. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for join
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--skip-phases strings
List of phases to be skipped
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
The join workflow
kubeadm join bootstraps a Kubernetes worker node or a control-plane node and adds it to the cluster.
This action consists of the following steps for worker nodes:
kubeadm downloads necessary cluster information from the API server.
By default, it uses the bootstrap token and the CA key hash to verify the
authenticity of that data. The root CA can also be discovered directly via a
file or URL.
Once the cluster information is known, kubelet can start the TLS bootstrapping
process.
The TLS bootstrap uses the shared token to temporarily authenticate
with the Kubernetes API server to submit a certificate signing request (CSR); by
default the control plane signs this CSR request automatically.
Finally, kubeadm configures the local kubelet to connect to the API
server with the definitive identity assigned to the node.
For control-plane nodes additional steps are performed:
Downloading certificates shared among control-plane nodes from the cluster
(if explicitly requested by the user).
Generating control-plane component manifests, certificates and kubeconfig.
Adding new local etcd member.
Using join phases with kubeadm
kubeadm allows you to join a node to the cluster in phases using kubeadm join phase.
To view the ordered list of phases and sub-phases you can call kubeadm join --help. The list will be located
at the top of the help screen and each phase will have a description next to it.
Note that by calling kubeadm join all of the phases and sub-phases will be executed in this exact order.
Some phases have unique flags, so if you want to have a look at the list of available options add --help, for example:
kubeadm join phase kubelet-start --help
Similar to the kubeadm init phase
command, kubeadm join phase allows you to skip a list of phases using the --skip-phases flag.
Alternatively, you can use the skipPhases field in JoinConfiguration.
Discovering what cluster CA to trust
The kubeadm discovery has several options, each with security tradeoffs.
The right method for your environment depends on how you provision nodes and the
security expectations you have about your network and node lifecycles.
Token-based discovery with CA pinning
This is the default mode in kubeadm. In this mode, kubeadm downloads
the cluster configuration (including root CA) and validates it using the token
as well as validating that the root CA public key matches the provided hash and
that the API server certificate is valid under the root CA.
The CA key hash has the format sha256:<hex_encoded_hash>.
By default, the hash value is printed at the end of the kubeadm init command or
in the output from the kubeadm token create --print-join-command command.
It is in a standard format (see RFC7469)
and can also be calculated by 3rd party tools or provisioning systems.
For example, using the OpenSSL CLI:
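A commonly used pipeline, assuming the cluster CA is at its default location /etc/kubernetes/pki/ca.crt:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'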
You can also call join for a control-plane node with --certificate-key to copy certificates to this node,
if the kubeadm init command was called with --upload-certs.
Advantages:
Allows bootstrapping nodes to securely discover a root of trust for the
control-plane node even if other worker nodes or the network are compromised.
Convenient to execute manually since all of the information required fits
into a single kubeadm join command.
Disadvantages:
The CA hash is not normally known until the control-plane node has been provisioned,
which can make it more difficult to build automated provisioning tools that
use kubeadm. By generating your CA beforehand, you can work around this
limitation.
Token-based discovery without CA pinning
This mode relies only on the symmetric token to sign
(HMAC-SHA256) the discovery information that establishes the root of trust for
the control-plane. To use this mode, the joining nodes must skip the hash validation of the
CA public key, using --discovery-token-unsafe-skip-ca-verification. You should consider
using one of the other modes if possible.
Advantages:
Still protects against many network-level attacks.
The token can be generated ahead of time and shared with the control-plane node and
worker nodes, which can then bootstrap in parallel without coordination. This
allows it to be used in many provisioning scenarios.
Disadvantages:
If an attacker is able to steal a bootstrap token via some vulnerability,
they can use that token (along with network-level access) to impersonate the
control-plane node to other bootstrapping nodes. This may or may not be an appropriate
tradeoff in your environment.
File or HTTPS-based discovery
This provides an out-of-band way to establish a root of trust between the control-plane node
and bootstrapping nodes. Consider using this mode if you are building automated provisioning
using kubeadm. The format of the discovery file is a regular Kubernetes
kubeconfig file.
In case the discovery file does not contain credentials, the TLS discovery token will be used.
Advantages:
Allows bootstrapping nodes to securely discover a root of trust for the
control-plane node even if the network or other worker nodes are compromised.
Disadvantages:
Requires that you have some way to carry the discovery information from
the control-plane node to the bootstrapping nodes. If the discovery file contains credentials
you must keep it secret and transfer it over a secure channel. This might be possible with your
cloud provider or provisioning tool.
Use of custom kubelet credentials with kubeadm join
To allow kubeadm join to use predefined kubelet credentials and skip client TLS bootstrap
and CSR approval for a new node:
From a working control plane node in the cluster that has /etc/kubernetes/pki/ca.key
execute kubeadm kubeconfig user --org system:nodes --client-name system:node:$NODE > kubelet.conf.
$NODE must be set to the name of the new node.
Modify the resulting kubelet.conf manually to adjust the cluster name and the server endpoint,
or run kubeadm kubeconfig user --config (it accepts InitConfiguration).
Copy the resulting kubelet.conf to /etc/kubernetes/kubelet.conf on the new node.
Execute kubeadm join with the flag
--ignore-preflight-errors=FileAvailable--etc-kubernetes-kubelet.conf on the new node.
Securing your installation even more
The defaults for kubeadm may not work for everyone. This section documents how to tighten up a kubeadm installation
at the cost of some usability.
Turning off auto-approval of node client certificates
By default, there is a CSR auto-approver enabled that automatically approves any client certificate request
for a kubelet when a Bootstrap Token was used for authentication. If you don't want the cluster to
automatically approve kubelet client certs, you can turn it off by executing this command:
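A sketch of that step, assuming the default ClusterRoleBinding name that kubeadm creates for this auto-approval:
# the binding name below is the default created by kubeadm
kubectl delete clusterrolebinding kubeadm:node-autoapprove-bootstrap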
After that, kubeadm join will block until the admin has manually approved the CSR in flight:
Using kubectl get csr, you can see that the original CSR is in the Pending state.
kubectl get csr
The output is similar to this:
NAME AGE REQUESTOR CONDITION
node-csr-c69HXe7aYcqkS1bKmH4faEnHAWxn6i2bHZ2mD04jZyQ 18s system:bootstrap:878f07 Pending
kubectl certificate approve allows the admin to approve the CSR. This action tells a certificate signing
controller to issue a certificate to the requestor with the attributes requested in the CSR.
This changes the CSR resource to the Approved,Issued condition shown below.
kubectl get csr
The output is similar to this:
NAME AGE REQUESTOR CONDITION
node-csr-c69HXe7aYcqkS1bKmH4faEnHAWxn6i2bHZ2mD04jZyQ 1m system:bootstrap:878f07 Approved,Issued
This enforces a workflow in which kubeadm join will only succeed if kubectl certificate approve has been run.
Turning off public access to the cluster-info ConfigMap
In order to achieve the joining flow using the token as the only piece of validation information, a
ConfigMap with some data needed for validation of the control-plane node's identity is exposed publicly by
default. While there is no private data in this ConfigMap, some users might wish to turn
it off regardless. Doing so will disable the ability to use the --discovery-token flag of the
kubeadm join flow. Here are the steps to do so:
Fetch the cluster-info file from the API Server:
kubectl -n kube-public get cm cluster-info -o jsonpath='{.data.kubeconfig}' | tee cluster-info.yaml
These commands should be run after kubeadm init but before kubeadm join.
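Joining nodes can then receive cluster-info.yaml out of band and use it instead of the public ConfigMap; a sketch, assuming the file has been copied to the new node and a bootstrap token is available for the TLS bootstrap part:
sudo kubeadm join --discovery-file cluster-info.yaml --tls-bootstrap-token <token>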
Using kubeadm join with a configuration file
Caution:
The config file is still considered beta and may change in future versions.
It's possible to configure kubeadm join with a configuration file instead of command
line flags, and some more advanced features may only be available as
configuration file options. This file is passed using the --config flag and it must
contain a JoinConfiguration structure. Mixing --config with other flags may not be
allowed in some cases.
The default configuration can be printed out using the
kubeadm config print command.
If your configuration is not using the latest version it is recommended that you migrate using
the kubeadm config migrate command.
For more information on the fields and usage of the configuration you can navigate to our
API reference.
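For illustration, a JoinConfiguration passed with --config might look like the following; every value is a placeholder:
# join-config.yaml (sketch)
apiVersion: kubeadm.k8s.io/v1beta4
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "k8s-api.example.internal:6443"
    token: "abcdef.0123456789abcdef"
    caCertHashes:
      - "sha256:<hash>"
It would be used as sudo kubeadm join --config join-config.yaml.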
What's next
kubeadm init to bootstrap a Kubernetes control-plane node.
kubeadm reset to revert any changes made to this host by kubeadm init or kubeadm join.
10.1.4 - kubeadm upgrade
kubeadm upgrade is a user-friendly command that wraps complex upgrading logic
behind one command, with support for both planning an upgrade and actually performing it.
kubeadm upgrade guidance
The steps for performing an upgrade using kubeadm are outlined in this document.
For older versions of kubeadm, please refer to older documentation sets of the Kubernetes website.
You can use kubeadm upgrade diff to see the changes that would be applied to static pod manifests.
In Kubernetes v1.15.0 and later, kubeadm upgrade apply and kubeadm upgrade node will also
automatically renew the kubeadm managed certificates on this node, including those stored in kubeconfig files.
To opt-out, it is possible to pass the flag --certificate-renewal=false. For more details about certificate
renewal see the certificate management documentation.
Note:
The commands kubeadm upgrade apply and kubeadm upgrade plan have a legacy --config
flag which makes it possible to reconfigure the cluster, while performing planning or upgrade of that particular
control-plane node. Please be aware that the upgrade workflow was not designed for this scenario and there are
reports of unexpected results.
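As a rough outline of the documented flow (the target version is a placeholder):
# on the first control plane node
kubeadm upgrade plan
sudo kubeadm upgrade apply v1.33.1   # placeholder version
# on every other control plane node and on worker nodes
sudo kubeadm upgrade node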
kubeadm upgrade plan
Check which versions are available to upgrade to and validate whether your current cluster is upgradeable.
Synopsis
Check which versions are available to upgrade to and validate whether your current cluster is upgradeable. This command can only run on the control plane nodes where the kubeconfig file "admin.conf" exists. To skip the internet check, pass in the optional [version] parameter.
kubeadm upgrade plan [version] [flags]
Options
--allow-experimental-upgrades
Show unstable versions of Kubernetes as an upgrade alternative and allow upgrading to alpha/beta/release candidate versions of Kubernetes.
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--allow-release-candidate-upgrades
Show release candidate versions of Kubernetes as an upgrade alternative and allow upgrading to release candidate versions of Kubernetes.
--config string
Path to a kubeadm configuration file.
--etcd-upgrade Default: true
Perform the upgrade of etcd.
-h, --help
help for plan
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
-o, --output string Default: "text"
Output format. One of: text|json|yaml|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file.
--print-config
Specifies whether the configuration file that will be used in the upgrade should be printed or not.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm upgrade apply
Upgrade your Kubernetes cluster to the specified version
Synopsis
Upgrade your Kubernetes cluster to the specified version
The "apply [version]" command executes the following phases:
preflight Run preflight checks before upgrade
control-plane Upgrade the control plane
upload-config Upload the kubeadm and kubelet configurations to ConfigMaps
/kubeadm Upload the kubeadm ClusterConfiguration to a ConfigMap
/kubelet Upload the kubelet configuration to a ConfigMap
kubelet-config Upgrade the kubelet configuration for this node
bootstrap-token Configures bootstrap token and cluster-info RBAC rules
addon Upgrade the default kubeadm addons
/coredns Upgrade the CoreDNS addon
/kube-proxy Upgrade the kube-proxy addon
post-upgrade Run post upgrade tasks
kubeadm upgrade apply [version]
Options
--allow-experimental-upgrades
Show unstable versions of Kubernetes as an upgrade alternative and allow upgrading to alpha/beta/release candidate versions of Kubernetes.
--allow-release-candidate-upgrades
Show release candidate versions of Kubernetes as an upgrade alternative and allow upgrading to release candidate versions of Kubernetes.
--certificate-renewal Default: true
Perform the renewal of certificates used by components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--etcd-upgrade Default: true
Perform the upgrade of etcd.
-f, --force
Force upgrading although some requirements might not be met. This also implies non-interactive mode.
-h, --help
help for apply
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--print-config
Specifies whether the configuration file that will be used in the upgrade should be printed or not.
--skip-phases strings
List of phases to be skipped
-y, --yes
Perform the upgrade and do not prompt for confirmation (non-interactive mode).
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm upgrade diff
Show what differences would be applied to existing static pod manifests. See also: kubeadm upgrade apply --dry-run
Synopsis
Show what differences would be applied to existing static pod manifests. See also: kubeadm upgrade apply --dry-run
Options
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm upgrade node
Upgrade commands for a node in the cluster
Synopsis
Upgrade commands for a node in the cluster
The "node" command executes the following phases:
preflight Run upgrade node pre-flight checks
control-plane Upgrade the control plane instance deployed on this node, if any
kubelet-config Upgrade the kubelet configuration for this node
addon Upgrade the default kubeadm addons
/coredns Upgrade the CoreDNS addon
/kube-proxy Upgrade the kube-proxy addon
post-upgrade Run post upgrade tasks
kubeadm upgrade node [flags]
Options
--certificate-renewal Default: true
Perform the renewal of certificates used by components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--etcd-upgrade Default: true
Perform the upgrade of etcd.
-h, --help
help for node
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--skip-phases strings
List of phases to be skipped
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
What's next
kubeadm config if you initialized your cluster using kubeadm v1.7.x or lower, to configure your cluster for kubeadm upgrade
10.1.5 - kubeadm upgrade phases
kubeadm upgrade apply phase
Using the phases of kubeadm upgrade apply, you can choose to execute the separate steps of the initial upgrade
of a control plane node.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
-y, --yes
Perform the upgrade and do not prompt for confirmation (non-interactive mode).
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upgrade the control plane
Synopsis
Upgrade the control plane
kubeadm upgrade apply phase control-plane [flags]
Options
--certificate-renewal Default: true
Perform the renewal of certificates used by components changed during upgrades.
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upload the kubeadm and kubelet configurations to ConfigMaps
Synopsis
Upload the kubeadm and kubelet configurations to ConfigMaps
kubeadm upgrade apply phase upload-config [flags]
Options
-h, --help
help for upload-config
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upgrade the kubelet configuration for this node
Synopsis
Upgrade the kubelet configuration for this node by downloading it from the kubelet-config ConfigMap stored in the cluster
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Configures bootstrap token and cluster-info RBAC rules
Synopsis
Configures bootstrap token and cluster-info RBAC rules
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upgrade the default kubeadm addons
Synopsis
Upgrade the default kubeadm addons
kubeadm upgrade apply phase addon [flags]
Options
-h, --help
help for addon
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Run post upgrade tasks
Synopsis
Run post upgrade tasks
kubeadm upgrade apply phase post-upgrade [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output what actions would be performed.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm upgrade node phase
Using the phases of kubeadm upgrade node you can choose to execute the separate steps of the upgrade of
secondary control-plane or worker nodes.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upgrade the kubelet configuration for this node
Synopsis
Upgrade the kubelet configuration for this node by downloading it from the kubelet-config ConfigMap stored in the cluster
kubeadm upgrade node phase kubelet-config [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upgrade the default kubeadm addons
Synopsis
Upgrade the default kubeadm addons
kubeadm upgrade node phase addon [flags]
Options
-h, --help
help for addon
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Run post upgrade tasks
Synopsis
Run post upgrade tasks
kubeadm upgrade node phase post-upgrade [flags]
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Do not change any state, just output the actions that would be performed.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
What's next
kubeadm init to bootstrap a Kubernetes control-plane node
10.1.6 - kubeadm config
During kubeadm init, kubeadm uploads the ClusterConfiguration object to your cluster
in a ConfigMap called kubeadm-config in the kube-system namespace. This configuration is then read during
kubeadm join, kubeadm reset and kubeadm upgrade.
You can use kubeadm config print to print the default static configuration that kubeadm
uses for kubeadm init and kubeadm join.
Note:
The output of the command is meant to serve as an example. You must manually edit the output
of this command to adapt to your setup. Remove the fields that you are not certain about and kubeadm
will try to default them at runtime by examining the host.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm config print init-defaults
Print the default init configuration that can be used for 'kubeadm init'
Synopsis
This command prints objects such as the default init configuration that is used for 'kubeadm init'.
Note that sensitive values like the Bootstrap Token fields are replaced with placeholder values like "abcdef.0123456789abcdef" in order to pass validation but
not perform the real computation for creating a token.
kubeadm config print init-defaults [flags]
Options
--component-configs strings
A comma-separated list for component config API objects to print the default values for. Available values: [KubeProxyConfiguration KubeletConfiguration]. If this flag is not set, no component configs will be printed.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm config print join-defaults
Print the default join configuration that can be used for 'kubeadm join'
Synopsis
This command prints objects such as the default join configuration that is used for 'kubeadm join'.
Note that sensitive values like the Bootstrap Token fields are replaced with placeholder values like "abcdef.0123456789abcdef" in order to pass validation but
not perform the real computation for creating a token.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm config migrate
Read an older version of the kubeadm configuration API types from a file, and output the similar config object for the newer version
Synopsis
This command lets you convert configuration objects of older versions to the latest supported version,
locally in the CLI tool without ever touching anything in the cluster.
In this version of kubeadm, the following API versions are supported:
kubeadm.k8s.io/v1beta4
Further, kubeadm can only write out config of version "kubeadm.k8s.io/v1beta4", but it can read configuration files that use older supported API versions.
So regardless of what version you pass to the --old-config parameter here, the API object will be
read, deserialized, defaulted, converted, validated, and re-serialized when written to stdout or
--new-config if specified.
In other words, the output of this command is what kubeadm actually would read internally if you
submitted this file to "kubeadm init".
kubeadm config migrate [flags]
Options
--allow-experimental-api
Allow migration to experimental, unreleased APIs.
-h, --help
help for migrate
--new-config string
Path to the resulting equivalent kubeadm config file using the new API version. Optional, if not specified output will be sent to STDOUT.
--old-config string
Path to the kubeadm config file that is using an old API version and should be converted. This flag is mandatory.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
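For example (both file names are placeholders):
kubeadm config migrate --old-config old-kubeadm-config.yaml --new-config new-kubeadm-config.yaml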
kubeadm config validate
Read a file containing the kubeadm configuration API and report any validation problems
Synopsis
This command lets you validate a kubeadm configuration API file and report any warnings and errors.
If there are no errors the exit status will be zero, otherwise it will be non-zero.
Any unmarshaling problems such as unknown API fields will trigger errors. Unknown API versions and
fields with invalid values will also trigger errors. Any other errors or warnings may be reported
depending on contents of the input file.
In this version of kubeadm, the following API versions are supported:
kubeadm.k8s.io/v1beta4
kubeadm config validate [flags]
Options
--allow-experimental-api
Allow validation of experimental, unreleased APIs.
--kubeconfig string Default: "/etc/kubernetes/admin.conf"
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm config images list
Print a list of images kubeadm will use. The configuration file is used in case any images or image repositories are customized
Synopsis
Print a list of images kubeadm will use. The configuration file is used in case any images or image repositories are customized
kubeadm config images list [flags]
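Examples
# Print the default list of images.
kubeadm config images list
# Print the images defined by a customized configuration file (the file name is a placeholder).
kubeadm config images list --config kubeadm-config.yaml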
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--config string
Path to a kubeadm configuration file.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm config images pull
Pull images used by kubeadm
Synopsis
Pull images used by kubeadm
kubeadm config images pull [flags]
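Examples
# Pull all control plane images ahead of time, so that 'kubeadm init' does not need
# to download them (the configuration file name is a placeholder).
kubeadm config images pull --config kubeadm-config.yaml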
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
What's next
kubeadm upgrade to upgrade a Kubernetes cluster to a newer version
10.1.7 - kubeadm reset
Performs a best effort revert of changes made by kubeadm init or kubeadm join.
Performs a best effort revert of changes made to this host by 'kubeadm init' or 'kubeadm join'
Synopsis
Performs a best effort revert of changes made to this host by 'kubeadm init' or 'kubeadm join'
The "reset" command executes the following phases:
preflight Run reset pre-flight checks
remove-etcd-member Remove a local etcd member.
cleanup-node Run cleanup node.
kubeadm reset [flags]
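Examples
# Revert the changes made to this host by 'kubeadm init' or 'kubeadm join',
# skipping the confirmation prompt.
kubeadm reset --force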
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path to the directory where the certificates are stored. If specified, clean this directory.
--cleanup-tmp-dir
Cleanup the "/etc/kubernetes/tmp" directory
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
-f, --force
Reset the node without prompting for confirmation.
-h, --help
help for reset
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--skip-phases strings
List of phases to be skipped
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Reset workflow
kubeadm reset is responsible for cleaning up the node's local file system, removing files that were
created by the kubeadm init or kubeadm join commands. For control-plane nodes, reset also removes
this node's local stacked etcd member from the etcd cluster.
kubeadm reset phase can be used to execute the separate phases of the above workflow.
To skip a list of phases you can use the --skip-phases flag, which works in a similar way to
the kubeadm join and kubeadm init phase runners.
External etcd cleanup
kubeadm reset will not delete any etcd data if external etcd is used. This means that if you run kubeadm init again using the same etcd endpoints, you will see state from previous clusters.
To wipe the etcd data, it is recommended that you use a client like etcdctl, for example:
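# A sketch only; pass the endpoints and TLS flags that match your external etcd setup.
etcdctl del "" --prefix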
Cleanup of CNI configuration
CNI plugins use the directory /etc/cni/net.d to store their configuration.
The kubeadm reset command does not clean up that directory. Leaving the configuration
of a CNI plugin on a host can be problematic if the same host is later used
as a new Kubernetes node and a different CNI plugin happens to be deployed in that cluster.
It can result in a configuration conflict between CNI plugins.
To clean up the directory, back up its contents if needed and then execute
the following command:
sudo rm -rf /etc/cni/net.d
Cleanup of network traffic rules
The kubeadm reset command does not clean any iptables, nftables or IPVS rules applied
to the host by kube-proxy. A control loop in kube-proxy ensures that the rules on each node
host are synchronized. For additional details please see
Virtual IPs and Service Proxies.
Leaving the rules without cleanup should not cause any issues if the host is
later reused as a Kubernetes node or if it will serve a different purpose.
If you wish to perform this cleanup, you can use the same kube-proxy container
which was used in your cluster and the --cleanup flag of the
kube-proxy binary:
docker run --privileged --rm registry.k8s.io/kube-proxy:v1.33.0 sh -c "kube-proxy --cleanup && echo DONE"
The output of the above command should print DONE at the end.
Instead of Docker, you can use your preferred container runtime to start the container.
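# For example, with containerd you could run the same image via nerdctl (a sketch, assuming
# nerdctl is installed and the image tag matches your cluster's kube-proxy version):
nerdctl run --privileged --rm registry.k8s.io/kube-proxy:v1.33.0 sh -c "kube-proxy --cleanup && echo DONE"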
Cleanup of $HOME/.kube
The $HOME/.kube directory typically contains configuration files and kubectl cache.
Leaving the contents of $HOME/.kube/cache in place is not an issue, but the directory holds one
important file: $HOME/.kube/config, which kubectl uses to authenticate to the Kubernetes API server.
After kubeadm init finishes, the user is instructed to copy the /etc/kubernetes/admin.conf file to
the $HOME/.kube/config location and grant the current user access to it.
The kubeadm reset command does not clean any of the contents of the $HOME/.kube directory.
Leaving the $HOME/.kube/config file in place can be problematic, depending
on who has access to this host after kubeadm reset is called.
If the same cluster continues to exist, it is highly recommended to delete the file,
as the admin credentials stored in it will continue to be valid.
To clean up the directory, examine its contents, back them up if needed, and execute
the following command:
rm -rf $HOME/.kube
Graceful kube-apiserver shutdown
If you have your kube-apiserver configured with the --shutdown-delay-duration flag,
you can run the following commands to attempt a graceful shutdown for the running API server Pod,
before you run kubeadm reset:
yq eval -i '.spec.containers[0].command = []' /etc/kubernetes/manifests/kube-apiserver.yaml
timeout 60 sh -c 'while pgrep kube-apiserver >/dev/null; do sleep 1; done'||true
What's next
kubeadm init to bootstrap a Kubernetes control-plane node
kubeadm join to bootstrap a Kubernetes worker node and join it to the cluster
10.1.8 - kubeadm token
Bootstrap tokens are used for establishing bidirectional trust between a node joining
the cluster and a control-plane node, as described in authenticating with bootstrap tokens.
kubeadm init creates an initial token with a 24-hour TTL. The following commands allow you to manage
such a token and also to create and manage new ones.
kubeadm token create
Create bootstrap tokens on the server
Synopsis
This command will create a bootstrap token for you.
You can specify the usages for this token, the "time to live" and an optional human friendly description.
The [token] is the actual token to write.
This should be a securely generated random token of the form "[a-z0-9]{6}.[a-z0-9]{16}".
If no [token] is given, kubeadm will generate a random token instead.
kubeadm token create [token]
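Examples
# Create a new token and print the full 'kubeadm join' command for worker nodes.
kubeadm token create --print-join-command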
Options
--certificate-key string
When used together with '--print-join-command', print the full 'kubeadm join' flag needed to join the cluster as a control-plane. To create a new certificate key you must use 'kubeadm init phase upload-certs --upload-certs'.
--config string
Path to a kubeadm configuration file.
--description string
A human friendly description of how this token is used.
--usages strings Default: "signing,authentication"
Describes the ways in which this token can be used. You can pass --usages multiple times or provide a comma separated list of options. Valid options: [signing,authentication]
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm token delete
Delete bootstrap tokens on the server
Synopsis
This command will delete a list of bootstrap tokens for you.
The [token-value] is the full Token of the form "[a-z0-9]{6}.[a-z0-9]{16}" or the
Token ID of the form "[a-z0-9]{6}" to delete.
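Examples
# Delete a bootstrap token by its full value or by its token ID (the value shown is a placeholder).
kubeadm token delete abcdef.0123456789abcdef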
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm token generate
Generate and print a bootstrap token, but do not create it on the server
Synopsis
This command will print out a randomly-generated bootstrap token that can be used with
the "init" and "join" commands.
You don't have to use this command in order to generate a token. You can do so
yourself as long as it is in the format "[a-z0-9]{6}.[a-z0-9]{16}". This
command is provided for convenience to generate tokens in the given format.
You can also use "kubeadm init" without specifying a token and it will
generate and print one for you.
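Examples
# Generate a random token locally; it can later be passed to 'kubeadm init',
# 'kubeadm join' or 'kubeadm token create'.
kubeadm token generate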
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm token list
List bootstrap tokens on the server
Synopsis
This command will list all bootstrap tokens for you.
kubeadm token list [flags]
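Examples
# List all bootstrap tokens, printing the output as JSON.
kubeadm token list -o json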
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
-h, --help
help for list
-o, --output string Default: "text"
Output format. One of: text|json|yaml|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
What's next
kubeadm join to bootstrap a Kubernetes worker node and join it to the cluster
10.1.9 - kubeadm version
This command prints the version of kubeadm.
Print the version of kubeadm
Synopsis
Print the version of kubeadm
kubeadm version [flags]
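Examples
# Print only the version string.
kubeadm version -o short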
Options
-h, --help
help for version
-o, --output string
Output format; available options are 'yaml', 'json' and 'short'
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.10 - kubeadm alpha
Caution:
kubeadm alpha provides a preview of a set of features made available for gathering feedback
from the community. Please try it out and give us feedback!
Currently there are no experimental commands under kubeadm alpha.
What's next
kubeadm init to bootstrap a Kubernetes control-plane node
kubeadm reset to revert any changes made to this host by kubeadm init or kubeadm join
10.1.11 - kubeadm certs
kubeadm certs provides utilities for managing certificates.
For more details on how these commands can be used, see
Certificate Management with kubeadm.
kubeadm certs
A collection of operations for operating Kubernetes certificates.
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew all available certificates
Synopsis
Renew all known certificates necessary to run the control plane. Renewals are run unconditionally, regardless of expiration date. Renewals can also be run individually for more control.
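Examples
# Renew all known control plane certificates; restart the control plane components
# afterwards so that they pick up the renewed certificates.
kubeadm certs renew all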
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself
Synopsis
Renew the certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate the apiserver uses to access etcd
Synopsis
Renew the certificate the apiserver uses to access etcd.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate for the API server to connect to kubelet
Synopsis
Renew the certificate for the API server to connect to kubelet.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate for serving the Kubernetes API
Synopsis
Renew the certificate for serving the Kubernetes API.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate embedded in the kubeconfig file for the controller manager to use
Synopsis
Renew the certificate embedded in the kubeconfig file for the controller manager to use.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate for liveness probes to healthcheck etcd
Synopsis
Renew the certificate for liveness probes to healthcheck etcd.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate for etcd nodes to communicate with each other
Synopsis
Renew the certificate for etcd nodes to communicate with each other.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate for serving etcd
Synopsis
Renew the certificate for serving etcd.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate for the front proxy client
Synopsis
Renew the certificate for the front proxy client.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate embedded in the kubeconfig file for the scheduler to use
Synopsis
Renew the certificate embedded in the kubeconfig file for the scheduler to use.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Renew the certificate embedded in the kubeconfig file for the super-admin
Synopsis
Renew the certificate embedded in the kubeconfig file for the super-admin.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs are based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative, it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR.
After renewal, you must restart the control-plane components for the changes to take effect, and redistribute the renewed certificate if the file is used elsewhere.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm certs certificate-key
This command can be used to generate a new control-plane certificate key.
The key can be passed as --certificate-key to kubeadm init
and kubeadm join
to enable the automatic copy of certificates when joining additional control-plane nodes.
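# Print a new, random certificate key that can then be passed via --certificate-key
# to 'kubeadm init' or 'kubeadm join --control-plane'.
kubeadm certs certificate-key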
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
-o, --output string Default: "text"
Output format. One of: text|json|yaml|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm certs generate-csr
This command can be used to generate keys and CSRs for all control-plane certificates and kubeconfig files.
The user can then sign the CSRs with a CA of their choice. To read more information
on how to use the command see
Signing certificate signing requests (CSR) generated by kubeadm.
Generates keys and certificate signing requests (CSRs) for all the certificates required to run the control plane. This command also generates partial kubeconfig files with private key data in the "users > user > client-key-data" field, and for each kubeconfig file an accompanying ".csr" file is created.
This command is designed for use in Kubeadm External CA Mode. It generates CSRs which you can then submit to your external certificate authority for signing.
The PEM encoded signed certificates should then be saved alongside the key files, using ".crt" as the file extension, or in the case of kubeconfig files, the PEM encoded signed certificate should be base64 encoded and added to the kubeconfig file in the "users > user > client-certificate-data" field.
kubeadm certs generate-csr [flags]
Examples
# The following command will generate keys and CSRs for all control-plane certificates and kubeconfig files:
kubeadm certs generate-csr --kubeconfig-dir /tmp/etc-k8s --cert-dir /tmp/etc-k8s/pki
kubeadm reset to revert any changes made to this host by kubeadm init or kubeadm join
10.1.12 - kubeadm init phase
kubeadm init phase enables you to invoke atomic steps of the bootstrap process.
Hence, you can let kubeadm do some of the work and you can fill in the gaps
if you wish to apply customization.
kubeadm init phase is consistent with the kubeadm init workflow,
and behind the scenes both use the same code.
kubeadm init phase preflight
Using this command you can execute preflight checks on a control-plane node.
# Run pre-flight checks for kubeadm init using a config file.
kubeadm init phase preflight --config kubeadm-config.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
Write a file with KubeletConfiguration and an environment file with node specific kubelet settings, and then (re)start kubelet.
kubeadm init phase kubelet-start [flags]
Examples
# Writes a dynamic environment file with kubelet flags from an InitConfiguration file.
kubeadm init phase kubelet-start --config config.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
Choose a container registry to pull control plane images from
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase certs
Can be used to create all required certificates by kubeadm.
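# Create all required certificates, functionally equivalent to what is created by kubeadm init.
kubeadm init phase certs all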
Use alternative domain for services, e.g. "myorg.internal".
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the certificate for the API server to connect to kubelet
Synopsis
Generate the certificate for the API server to connect to kubelet, and save them into apiserver-kubelet-client.crt and apiserver-kubelet-client.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for front-proxy-client
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the self-signed CA to provision identities for etcd
Synopsis
Generate the self-signed CA to provision identities for etcd, and save them into etcd/ca.crt and etcd/ca.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
kubeadm init phase certs etcd-ca [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for etcd-ca
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the certificate for serving etcd
Synopsis
Generate the certificate for serving etcd, and save them into etcd/server.crt and etcd/server.key files.
Default SANs are localhost, 127.0.0.1, ::1
If both files already exist, kubeadm skips the generation step and existing files will be used.
kubeadm init phase certs etcd-server [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for etcd-server
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the certificate for etcd nodes to communicate with each other
Synopsis
Generate the certificate for etcd nodes to communicate with each other, and save them into etcd/peer.crt and etcd/peer.key files.
Default SANs are localhost, 127.0.0.1, ::1
If both files already exist, kubeadm skips the generation step and existing files will be used.
kubeadm init phase certs etcd-peer [flags]
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for etcd-peer
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the certificate for liveness probes to healthcheck etcd
Synopsis
Generate the certificate for liveness probes to healthcheck etcd, and save them into etcd/healthcheck-client.crt and etcd/healthcheck-client.key files.
If both files already exist, kubeadm skips the generation step and existing files will be used.
Choose a specific Kubernetes version for the control plane.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate a kubeconfig file for the kubelet to use only for cluster bootstrapping purposes
Synopsis
Generate the kubeconfig file for the kubelet to use and save it to kubelet.conf file.
Please note that this should only be used for cluster bootstrapping purposes. After your control plane is up, you should request all kubelet credentials from the CSR API.
kubeadm init phase kubeconfig kubelet [flags]
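Examples
# Generate the bootstrap kubeconfig file for the kubelet, functionally equivalent to
# what is generated by kubeadm init (the configuration file name is a placeholder).
kubeadm init phase kubeconfig kubelet --config kubeadm-config.yaml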
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
Generate all static Pod manifest files necessary to establish the control plane
Synopsis
Generate all static Pod manifest files necessary to establish the control plane
kubeadm init phase control-plane [flags]
Options
-h, --help
help for control-plane
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate all static Pod manifest files
Synopsis
Generate all static Pod manifest files
kubeadm init phase control-plane all [flags]
Examples
# Generates all static Pod manifest files for control plane components,
# functionally equivalent to what is generated by kubeadm init.
kubeadm init phase control-plane all
# Generates all static Pod manifest files using options read from a configuration file.
kubeadm init phase control-plane all --config config.yaml
Options
--apiserver-advertise-address string
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
Port for the API Server to bind to.
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--control-plane-endpoint string
Specify a stable IP address or DNS name for the control plane.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generates the kube-controller-manager static Pod manifest
Synopsis
Generates the kube-controller-manager static Pod manifest
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Choose a container registry to pull control plane images from
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase etcd
Use the following phase to create a local etcd instance based on a static Pod file.
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the static Pod manifest file for a local, single-node etcd instance
Synopsis
Generate the static Pod manifest file for a local, single-node etcd instance
kubeadm init phase etcd local [flags]
Examples
# Generates the static Pod manifest file for etcd, functionally
# equivalent to what is generated by kubeadm init.
kubeadm init phase etcd local
# Generates the static Pod manifest file for etcd using options
# read from a configuration file.
kubeadm init phase etcd local --config config.yaml
Options
--cert-dir string Default: "/etc/kubernetes/pki"
The path where to save and store the certificates.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
Choose a container registry to pull control plane images from
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase upload-config
You can use this command to upload the kubeadm configuration to your cluster.
Alternatively, you can use kubeadm config.
Upload the kubeadm and kubelet configuration to a ConfigMap
Synopsis
Upload the kubeadm and kubelet configuration to a ConfigMap
kubeadm init phase upload-config [flags]
Options
-h, --help
help for upload-config
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upload all configuration to a config map
Synopsis
Upload all configuration to a config map
kubeadm init phase upload-config all [flags]
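Examples
# Upload both the kubeadm and kubelet configuration to ConfigMaps in the cluster
# (the configuration file name is a placeholder).
kubeadm init phase upload-config all --config kubeadm-config.yaml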
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upload the kubeadm ClusterConfiguration to a ConfigMap
Synopsis
Upload the kubeadm ClusterConfiguration to a ConfigMap called kubeadm-config in the kube-system namespace. This enables correct configuration of system components and a seamless user experience when upgrading.
Alternatively, you can use kubeadm config.
kubeadm init phase upload-config kubeadm [flags]
Examples
# upload the configuration of your cluster
kubeadm init phase upload-config kubeadm --config=myConfig.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Upload the kubelet component config to a ConfigMap
Synopsis
Upload the kubelet configuration extracted from the kubeadm InitConfiguration object to a kubelet-config ConfigMap in the cluster
kubeadm init phase upload-config kubelet [flags]
Examples
# Upload the kubelet configuration from the kubeadm Config file to a ConfigMap in the cluster.
kubeadm init phase upload-config kubelet --config kubeadm.yaml
Options
--config string
Path to a kubeadm configuration file.
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase upload-certs
Use the following phase to upload control-plane certificates to the cluster.
By default the certs and encryption key expire after two hours.
Upload control plane certificates to the kubeadm-certs Secret
kubeadm init phase upload-certs [flags]
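Examples
# Upload the control-plane certificates to the kubeadm-certs Secret and print the
# certificate key; keep the key secret, since it can decrypt the certificates.
kubeadm init phase upload-certs --upload-certs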
Options
--certificate-key string
Key used to encrypt the control-plane certificates in the kubeadm-certs Secret. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--skip-certificate-key-print
Don't print the key used to encrypt the control-plane certificates.
--upload-certs
Upload control-plane certificates to the kubeadm-certs Secret.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase mark-control-plane
Use the following phase to label and taint the node as a control plane node.
# Applies control-plane label and taint to the current node, functionally equivalent to what is executed by kubeadm init.
kubeadm init phase mark-control-plane --config config.yaml
# Applies control-plane label and taint to a specific node
kubeadm init phase mark-control-plane --node-name myNode
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for mark-control-plane
--node-name string
Specify the node name.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase bootstrap-token
Use the following phase to configure bootstrap tokens.
Generates bootstrap tokens used to join a node to a cluster
Synopsis
Bootstrap tokens are used for establishing bidirectional trust between a node joining the cluster and a control-plane node.
This command makes all the configurations required to make bootstrap tokens work, and then creates an initial token.
kubeadm init phase bootstrap-token [flags]
Examples
# Make all the bootstrap token configurations and create an initial token, functionally
# equivalent to what is generated by kubeadm init.
kubeadm init phase bootstrap-token
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--skip-token-print
Skip printing of the default bootstrap token generated by 'kubeadm init'.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm init phase kubelet-finalize
Use the following phase to update settings relevant to the kubelet after TLS
bootstrap. You can use the all subcommand to run all kubelet-finalize
phases.
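# Run all kubelet-finalize sub-phases (the configuration file name is a placeholder).
kubeadm init phase kubelet-finalize all --config kubeadm-config.yaml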
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
Use alternative domain for services, e.g. "myorg.internal".
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Install the CoreDNS addon to a Kubernetes cluster
Synopsis
Install the CoreDNS addon components via the API server. Please note that although the DNS server is deployed, it will not be scheduled until CNI is installed.
kubeadm init phase addon coredns [flags]
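Examples
# Install the CoreDNS addon (the configuration file name is a placeholder); add
# --print-manifest to only print the manifests instead of applying them.
kubeadm init phase addon coredns --config kubeadm-config.yaml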
Options
--config string
Path to a kubeadm configuration file.
--dry-run
Don't apply any changes; just output what would be done.
--feature-gates string
A set of key=value pairs that describe feature gates for various features. Options are: ControlPlaneKubeletLocalMode=true|false (BETA - default=true) NodeLocalCRISocket=true|false (ALPHA - default=false) PublicKeysECDSA=true|false (DEPRECATED - default=false) RootlessControlPlane=true|false (ALPHA - default=false) WaitForAllControlPlaneComponents=true|false (BETA - default=true)
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--print-manifest
Print the addon manifests to STDOUT instead of installing them
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
--kubernetes-version string Default: "stable-1"
Choose a specific Kubernetes version for the control plane.
--pod-network-cidr string
Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.
--print-manifest
Print the addon manifests to STDOUT instead of installing them
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
For more details on each field in the v1beta4 configuration you can navigate to our
API reference pages.
What's next
kubeadm init to bootstrap a Kubernetes control-plane node
10.1.13 - kubeadm join phase
kubeadm join phase enables you to invoke atomic steps of the join process.
Hence, you can let kubeadm do some of the work and you can fill in the gaps
if you wish to apply customization.
kubeadm join phase is consistent with the kubeadm join workflow,
and behind the scenes both use the same code.
# Run join pre-flight checks using a config file.
kubeadm join phase preflight --config kubeadm-config.yaml
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--certificate-key string
Use this key to decrypt the certificate secrets uploaded by init. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for preflight
--ignore-preflight-errors strings
A list of checks whose errors will be shown as warnings. Example: 'IsPrivilegedUser,Swap'. Value 'all' ignores errors from all checks.
--node-name string
Specify the node name.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm join phase control-plane-prepare
Using this phase you can prepare a node for serving a control-plane.
# Prepares the machine for serving a control plane
kubeadm join phase control-plane-prepare all
Options
-h, --help
help for control-plane-prepare
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Prepare the machine for serving a control plane
Synopsis
Prepare the machine for serving a control plane
kubeadm join phase control-plane-prepare all [api-server-endpoint] [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--certificate-key string
Use this key to decrypt the certificate secrets uploaded by init. The certificate key is a hex encoded string that is an AES key of size 32 bytes.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for all
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Download certificates shared among control-plane nodes from the kubeadm-certs Secret
Synopsis
Download certificates shared among control-plane nodes from the kubeadm-certs Secret
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for certs
--node-name string
Specify the node name.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Generate the kubeconfig for the new control plane components
Synopsis
Generate the kubeconfig for the new control plane components
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--apiserver-bind-port int32 Default: 6443
If the node should host a new control plane instance, the port for the API Server to bind to.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for control-plane
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm join phase kubelet-start
Using this phase you can write the kubelet settings, certificates and (re)start the kubelet.
Options
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--discovery-file string
For file-based discovery, a file or URL from which to load cluster information.
--discovery-token string
For token-based discovery, the token used to validate cluster information fetched from the API server.
--discovery-token-ca-cert-hash strings
For token-based discovery, validate that the root CA public key matches this hash (format: "<type>:<value>").
--discovery-token-unsafe-skip-ca-verification
For token-based discovery, allow joining without --discovery-token-ca-cert-hash pinning.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for kubelet-start
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
--tls-bootstrap-token string
Specify the token used to temporarily authenticate with the Kubernetes Control Plane while joining the node.
--token string
Use this token for both discovery-token and tls-bootstrap-token when those values are not provided.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm join phase control-plane-join
Using this phase you can join a node as a control-plane instance.
# Joins a machine as a control plane instance
kubeadm join phase control-plane-join all
Options
-h, --help
help for control-plane-join
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Join a machine as a control plane instance
Synopsis
Join a machine as a control plane instance
kubeadm join phase control-plane-join all [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for all
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
Add a new local etcd member
Synopsis
Add a new local etcd member
kubeadm join phase control-plane-join etcd [flags]
Options
--apiserver-advertise-address string
If the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
--config string
Path to a kubeadm configuration file.
--control-plane
Create a new control plane instance on this node
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for etcd
--node-name string
Specify the node name.
--patches string
Path to a directory that contains files named "target[suffix][+patchtype].extension". For example, "kube-apiserver0+merge.yaml" or just "etcd.json". "target" can be one of "kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd", "kubeletconfiguration", "corednsdeployment". "patchtype" can be one of "strategic", "merge" or "json" and they match the patch formats supported by kubectl. The default "patchtype" is "strategic". "extension" must be either "json" or "yaml". "suffix" is an optional string that can be used to determine which patches are applied first alpha-numerically.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
# Output a kubeconfig file for an additional user named foo
kubeadm kubeconfig user --client-name=foo
# Output a kubeconfig file for an additional user named foo using a kubeadm config file bar
kubeadm kubeconfig user --client-name=foo --config=bar
Options
--client-name string
The name of user. It will be used as the CN if client certificates are created
--config string
Path to a kubeadm configuration file.
-h, --help
help for user
--org strings
The organizations of the client certificate. It will be used as the O if client certificates are created
--token string
The token that should be used as the authentication mechanism for this kubeconfig, instead of client certificates
--validity-period duration Default: 8760h0m0s
The validity period of the client certificate. It is an offset from the current time.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
10.1.15 - kubeadm reset phase
kubeadm reset phase enables you to invoke atomic steps of the node reset process.
Hence, you can let kubeadm do some of the work and you can fill in the gaps
if you wish to apply customization.
kubeadm reset phase is consistent with the kubeadm reset workflow,
and behind the scenes both use the same code.
--kubeconfig string
The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
kubeadm reset phase cleanup-node
Using this phase you can perform cleanup on this node.
Options
--cert-dir string
The path to the directory where the certificates are stored. If specified, clean this directory.
--cleanup-tmp-dir
Cleanup the "/etc/kubernetes/tmp" directory
--cri-socket string
Path to the CRI socket to connect. If empty kubeadm will try to auto-detect this value; use this option only if you have more than one CRI installed or if you have non-standard CRI socket.
--dry-run
Don't apply any changes; just output what would be done.
-h, --help
help for cleanup-node
Options inherited from parent commands
--rootfs string
The path to the 'real' host root filesystem. This will cause kubeadm to chroot into the provided path.
What's next
kubeadm init to bootstrap a Kubernetes control-plane node
kubeadm init and kubeadm join together provide a nice user experience for creating a
bare Kubernetes cluster from scratch that aligns with best practices.
However, it might not be obvious how kubeadm does that.
This document provides additional details on what happens under the hood, with the aim of sharing
knowledge on the best practices for a Kubernetes cluster.
Core design principles
The cluster that kubeadm init and kubeadm join set up should be:
Secure: It should adopt the latest best practices, such as:
enforcing RBAC
using the Node Authorizer
using secure communication between the control plane components
using secure communication between the API server and the kubelets
locking down the kubelet API
locking down access to the API for system components like the kube-proxy and CoreDNS
locking down what a Bootstrap Token can access
User-friendly: The user should not have to run anything more than a couple of commands:
kubeadm init
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f <network-plugin-of-choice.yaml>
kubeadm join --token <token> <endpoint>:<port>
Extendable:
It should not favor any particular network provider. Configuring the cluster network is out-of-scope
It should provide the possibility to use a config file for customizing various parameters
Constants and well-known values and paths
In order to reduce complexity and to simplify development of higher level tools that build on top of it, kubeadm uses a
limited set of constant values for well-known paths and file names.
The Kubernetes directory /etc/kubernetes is a constant in the application, since it is clearly the given path
in a majority of cases, and the most intuitive location; other constant paths and file names are:
/etc/kubernetes/manifests as the path where the kubelet should look for static Pod manifests.
Names of static Pod manifests are:
etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
/etc/kubernetes/ as the path where kubeconfig files with identities for control plane
components are stored. Names of kubeconfig files are:
kubelet.conf (bootstrap-kubelet.conf during TLS bootstrap)
controller-manager.conf
scheduler.conf
admin.conf for the cluster admin and kubeadm itself
super-admin.conf for the cluster super-admin that can bypass RBAC
Names of certificates and key files:
ca.crt, ca.key for the Kubernetes certificate authority
apiserver.crt, apiserver.key for the API server certificate
apiserver-kubelet-client.crt, apiserver-kubelet-client.key for the client certificate used
by the API server to connect to the kubelets securely
sa.pub, sa.key for the key used by the controller manager when signing ServiceAccount tokens
front-proxy-ca.crt, front-proxy-ca.key for the front proxy certificate authority
front-proxy-client.crt, front-proxy-client.key for the front proxy client
The kubeadm configuration file format
Most kubeadm commands support a --config flag which allows passing a configuration file from
disk. The configuration file format follows the common Kubernetes API apiVersion / kind scheme,
but is considered a component configuration format. Several Kubernetes components, such as the kubelet,
also support file-based configuration.
Different kubeadm subcommands require a different kind of configuration file.
For example, InitConfiguration for kubeadm init, JoinConfiguration for kubeadm join, UpgradeConfiguration for kubeadm upgrade and ResetConfiguration
for kubeadm reset.
The command kubeadm config migrate can be used to migrate an older format configuration
file to a newer (current) configuration format. The kubeadm tool only supports migrating from
deprecated configuration formats to the current format.
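As an illustration, a minimal kubeadm init configuration file might look like the following sketch (it assumes the kubeadm.k8s.io/v1beta3 API version; the CRI socket, Kubernetes version, and Pod subnet are placeholder values to adapt to your environment):
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0
networking:
  podSubnet: 10.244.0.0/16
Such a file is passed with kubeadm init --config <file>; an older file (here named old.yaml for illustration) can be migrated to the current format with:
kubeadm config migrate --old-config old.yaml --new-config new.yaml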
kubeadm init consists of a sequence of atomic work tasks to perform,
as described in the kubeadm init internal workflow.
The kubeadm init phase command allows
users to invoke each task individually, and ultimately offers a reusable and composable
API/toolbox that can be used by other Kubernetes bootstrap tools, by any IT automation tool or by
an advanced user for creating custom clusters.
Preflight checks
Kubeadm executes a set of preflight checks before starting the init, with the aim to verify
preconditions and avoid common cluster startup problems.
The user can skip specific preflight checks or all of them with the --ignore-preflight-errors option.
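For example, to proceed despite specific failed checks (the check names below are the same ones used in the --ignore-preflight-errors flag description earlier in this reference):
# Show the privileged-user and swap check failures as warnings only
kubeadm init --ignore-preflight-errors=IsPrivilegedUser,Swap
# Ignore errors from all checks (not recommended)
kubeadm init --ignore-preflight-errors=all
The checks performed during kubeadm init include: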
[Warning] if the Kubernetes version to use (specified with the --kubernetes-version flag) is
at least one minor version higher than the kubeadm CLI version.
Kubernetes system requirements:
if running on linux:
[Error] if Kernel is older than the minimum required version
[Error] if required cgroups subsystems aren't set up
[Error] if the CRI endpoint does not answer
[Error] if user is not root
[Error] if the machine hostname is not a valid DNS subdomain
[Warning] if the host name cannot be reached via network lookup
[Error] if kubelet version is lower than the minimum kubelet version supported by kubeadm (current minor -1)
[Error] if kubelet version is at least one minor version higher than the required control plane version (unsupported version skew)
[Warning] if kubelet service does not exist or if it is disabled
[Warning] if firewalld is active
[Error] if API server bindPort or ports 10250/10251/10252 are used
[Error] if the /etc/kubernetes/manifests folder already exists and it is not empty
[Error] if swap is on
[Error] if ip, iptables, mount, nsenter commands are not present in the command path
[Warning] if ethtool, tc, touch commands are not present in the command path
[Warning] if extra arg flags for API server, controller manager, scheduler contains some invalid options
[Warning] if connection to https://API.AdvertiseAddress:API.BindPort goes through proxy
[Warning] if connection to services subnet goes through proxy (only first address checked)
[Warning] if connection to Pods subnet goes through proxy (only first address checked)
If external etcd is provided:
[Error] if etcd version is older than the minimum required version
[Error] if etcd certificates or keys are specified, but not provided
If external etcd is NOT provided (and thus local etcd will be installed):
[Error] if port 2379 is used
[Error] if Etcd.DataDir folder already exists and it is not empty
Kubeadm generates certificate and private key pairs for different purposes:
A self signed certificate authority for the Kubernetes cluster saved into ca.crt file and
ca.key private key file
A serving certificate for the API server, generated using ca.crt as the CA, and saved into
apiserver.crt file with its private key apiserver.key. This certificate should contain
the following alternative names:
The Kubernetes service's internal clusterIP (the first address in the services CIDR, e.g.
10.96.0.1 if service subnet is 10.96.0.0/12)
Kubernetes DNS names, e.g. kubernetes.default.svc.cluster.local if --service-dns-domain
flag value is cluster.local, plus default DNS names kubernetes.default.svc,
kubernetes.default, kubernetes
The node-name
The --apiserver-advertise-address
Additional alternative names specified by the user
A client certificate for the API server to connect to the kubelets securely, generated using
ca.crt as the CA and saved into apiserver-kubelet-client.crt file with its private key
apiserver-kubelet-client.key.
This certificate should be in the system:masters organization
A private key for signing ServiceAccount Tokens saved into sa.key file along with its public key sa.pub
A certificate authority for the front proxy saved into front-proxy-ca.crt file with its key
front-proxy-ca.key
A client certificate for the front proxy client, generated using front-proxy-ca.crt as the CA and
saved into front-proxy-client.crt file with its private key front-proxy-client.key
Certificates are stored by default in /etc/kubernetes/pki, but this directory is configurable
using the --cert-dir flag.
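For example, the certificate generation step can be run on its own, optionally pointing at a non-default directory (the custom path below is only an illustration):
# Generate all certificates in the default location
kubeadm init phase certs all
# ...or under a custom directory
kubeadm init phase certs all --cert-dir /custom/pki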
Please note that:
If a given certificate and private key pair both exist, and their content is evaluated to be compliant with the above specs, the existing files will
be used and the generation phase for the given certificate will be skipped. This means the user can, for example, copy an existing CA to
/etc/kubernetes/pki/ca.{crt,key}, and then kubeadm will use those files for signing the rest of the certs.
See also using custom certificates
For the CA, it is possible to provide the ca.crt file but not the ca.key file. If all other certificates and kubeconfig files
are already in place, kubeadm recognizes this condition and activates the ExternalCA mode, which also implies that the csrsigner controller in the
controller-manager won't be started
If kubeadm is running in external CA mode,
all the certificates must be provided by the user, because kubeadm cannot generate them by itself
In case kubeadm is executed in the --dry-run mode, certificate files are written in a temporary folder
Generate kubeconfig files for control plane components
Kubeadm generates kubeconfig files with identities for control plane components:
A kubeconfig file for the kubelet to use during TLS bootstrap -
/etc/kubernetes/bootstrap-kubelet.conf. Inside this file, there is a bootstrap-token or embedded
client certificates for authenticating this node with the cluster.
This client certificate should:
Be in the system:nodes organization, as required by the
Node Authorization module
Have the Common Name (CN) system:node:<hostname-lowercased>
A kubeconfig file for controller-manager, /etc/kubernetes/controller-manager.conf; inside this
file is embedded a client certificate with controller-manager identity. This client certificate should
have the CN system:kube-controller-manager, as defined by default
RBAC core components roles
A kubeconfig file for scheduler, /etc/kubernetes/scheduler.conf; inside this file is embedded
a client certificate with scheduler identity.
This client certificate should have the CN system:kube-scheduler, as defined by default
RBAC core components roles
Additionally, a kubeconfig file for kubeadm as an administrative entity is generated and stored
in /etc/kubernetes/admin.conf. This file includes a certificate with
Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin. kubeadm:cluster-admins
is a group managed by kubeadm. It is bound to the cluster-admin ClusterRole during kubeadm init,
by using the super-admin.conf file, which does not require RBAC.
This admin.conf file must remain on control plane nodes and should not be shared with additional users.
During kubeadm init another kubeconfig file is generated and stored in /etc/kubernetes/super-admin.conf.
This file includes a certificate with Subject: O = system:masters, CN = kubernetes-super-admin.
system:masters is a superuser group that bypasses RBAC and makes super-admin.conf useful in case
of an emergency where a cluster is locked due to RBAC misconfiguration.
The super-admin.conf file must be stored in a safe location and should not be shared with additional users.
The generated configuration files include an embedded authentication key, and you should treat
them as confidential.
Also note that:
ca.crt certificate is embedded in all the kubeconfig files.
If a given kubeconfig file exists, and its content is evaluated as compliant with the above specs,
the existing file will be used and the generation phase for the given kubeconfig will be skipped
If kubeadm is running in ExternalCA mode,
all the required kubeconfig must be provided by the user as well, because kubeadm cannot
generate any of them by itself
In case kubeadm is executed in the --dry-run mode, kubeconfig files are written in a temporary folder
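Like the certificates, the kubeconfig files can also be generated individually through the phase command, for example:
# Generate all kubeconfig files under /etc/kubernetes
kubeadm init phase kubeconfig all
# Generate only the controller-manager kubeconfig
kubeadm init phase kubeconfig controller-manager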
Generate static Pod manifests for control plane components
Kubeadm writes static Pod manifest files for control plane components to
/etc/kubernetes/manifests. The kubelet watches this directory for Pods to be created on startup.
Static Pod manifests share a set of common properties:
All static Pods are deployed in the kube-system namespace
All static Pods get tier:control-plane and component:{component-name} labels
All static Pods use the system-node-critical priority class
hostNetwork: true is set on all static Pods to allow control plane startup before a network is
configured; as a consequence:
The address that the controller-manager and the scheduler use to refer to the API server is 127.0.0.1
If the etcd server is set up locally, the etcd-server address will be set to 127.0.0.1:2379
Leader election is enabled for both the controller-manager and the scheduler
Controller-manager and the scheduler will reference kubeconfig files with their respective, unique identities
API server
The static Pod manifest for the API server is affected by the following parameters provided by the users:
The apiserver-advertise-address and apiserver-bind-port to bind to; if not provided, those
values default to the IP address of the default network interface on the machine and port 6443
The service-cluster-ip-range to use for services
If an external etcd server is specified, the etcd-servers address and related TLS settings
(etcd-cafile, etcd-certfile, etcd-keyfile);
if an external etcd server is not provided, a local etcd will be used (via host network)
If a cloud provider is specified, the corresponding --cloud-provider parameter is configured together
with the --cloud-config path if such file exists (this is experimental, alpha and will be
removed in a future version)
Other API server flags that are set unconditionally are:
--insecure-port=0 to avoid insecure connections to the api server
--enable-bootstrap-token-auth=true to enable the BootstrapTokenAuthenticator authentication module.
See TLS Bootstrapping for more details
--allow-privileged to true (required e.g. by kube proxy)
--requestheader-client-ca-file to front-proxy-ca.crt
PersistentVolumeLabel
attaches region or zone labels to PersistentVolumes as defined by the cloud provider (This
admission controller is deprecated and will be removed in a future version.
It is not deployed by kubeadm by default with v1.9 onwards when not explicitly opting into
using gce or aws as cloud providers)
DefaultStorageClass
to enforce default storage class on PersistentVolumeClaim objects
NodeRestriction
to limit what a kubelet can modify (e.g. only pods on this node)
--kubelet-preferred-address-types to InternalIP,ExternalIP,Hostname; this makes kubectl logs and other API server-kubelet communication work in environments where the hostnames of the
nodes aren't resolvable
Flags for using certificates generated in previous steps:
--client-ca-file to ca.crt
--tls-cert-file to apiserver.crt
--tls-private-key-file to apiserver.key
--kubelet-client-certificate to apiserver-kubelet-client.crt
--kubelet-client-key to apiserver-kubelet-client.key
--service-account-key-file to sa.pub
--requestheader-client-ca-file to front-proxy-ca.crt
--proxy-client-cert-file to front-proxy-client.crt
--proxy-client-key-file to front-proxy-client.key
Other flags for securing the front proxy (API Aggregation) communications:
--requestheader-username-headers=X-Remote-User
--requestheader-group-headers=X-Remote-Group
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-allowed-names=front-proxy-client
Controller manager
The static Pod manifest for the controller manager is affected by the following parameters provided by
the user:
If kubeadm is invoked specifying a --pod-network-cidr, the subnet manager feature required for
some CNI network plugins is enabled by setting:
--allocate-node-cidrs=true
--cluster-cidr and --node-cidr-mask-size flags according to the given CIDR
Other flags that are set unconditionally are:
--controllers enabling all the default controllers plus BootstrapSigner and TokenCleaner
controllers for TLS bootstrap. See TLS Bootstrapping
for more details.
--use-service-account-credentials to true
Flags for using certificates generated in previous steps:
--root-ca-file to ca.crt
--cluster-signing-cert-file to ca.crt, if External CA mode is disabled, otherwise to ""
--cluster-signing-key-file to ca.key, if External CA mode is disabled, otherwise to ""
--service-account-private-key-file to sa.key
Scheduler
The static Pod manifest for the scheduler is not affected by parameters provided by the user.
Generate static Pod manifest for local etcd
If you specified an external etcd, this step will be skipped, otherwise kubeadm generates a
static Pod manifest file for creating a local etcd instance running in a Pod with the following attributes:
listen on localhost:2379 and use HostNetwork=true
make a hostPath mount out from the dataDir to the host's filesystem
Any extra flags specified by the user
Please note that:
The etcd container image will be pulled from registry.k8s.io by default. See
using custom images
for customizing the image repository.
If you run kubeadm in --dry-run mode, the etcd static Pod manifest is written
into a temporary folder.
You can directly invoke static Pod manifest generation for local etcd, using the
kubeadm init phase etcd local
command.
Wait for the control plane to come up
On control plane nodes, kubeadm waits up to 4 minutes for the control plane components
and the kubelet to be available. It does that by performing a health check on the respective
component /healthz or /livez endpoints.
After the control plane is up, kubeadm completes the tasks described in following paragraphs.
Save the kubeadm ClusterConfiguration in a ConfigMap for later reference
kubeadm saves the configuration passed to kubeadm init in a ConfigMap named kubeadm-config
in the kube-system namespace.
This will ensure that kubeadm actions executed in the future (e.g. kubeadm upgrade) will be able to
determine the actual/current cluster state and make new decisions based on that data.
Please note that:
Before saving the ClusterConfiguration, sensitive information like the token is stripped from the configuration
Configure TLS-Bootstrapping for node joining
kubeadm init ensures that everything is properly configured for this process, and this includes
the following steps as well as setting API server and controller flags as already described in
previous paragraphs.
Note:
TLS bootstrapping for nodes can be configured with the command
kubeadm init phase bootstrap-token,
executing all the configuration steps described in following paragraphs;
alternatively, each step can be invoked individually.
Create a bootstrap token
kubeadm init creates a first bootstrap token, either generated automatically or provided by the
user with the --token flag; as documented in the bootstrap token specification, the token should be
saved as a Secret with the name bootstrap-token-<token-id> in the kube-system namespace.
Please note that:
The default token created by kubeadm init will be used to validate temporary users during the TLS
bootstrap process; those users will be members of the
system:bootstrappers:kubeadm:default-node-token group
The token has a limited validity, 24 hours by default (the interval may be changed with the --token-ttl flag)
Additional tokens can be created with the kubeadm token
command, which also provides other useful functions for token management.
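For example, additional tokens can be created and inspected as follows (the TTL value is illustrative):
# Create a token valid for 12 hours and print the full join command for it
kubeadm token create --ttl 12h --print-join-command
# List the bootstrap tokens stored in the cluster
kubeadm token list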
Allow joining nodes to call CSR API
Kubeadm ensures that users in system:bootstrappers:kubeadm:default-node-token group are able to
access the certificate signing API.
This is implemented by creating a ClusterRoleBinding named kubeadm:kubelet-bootstrap between the
group above and the default RBAC role system:node-bootstrapper.
Set up auto approval for new bootstrap tokens
Kubeadm ensures that the Bootstrap Token will get its CSR request automatically approved by the
csrapprover controller.
This is implemented by creating ClusterRoleBinding named kubeadm:node-autoapprove-bootstrap
between the system:bootstrappers:kubeadm:default-node-token group and the default role
system:certificates.k8s.io:certificatesigningrequests:nodeclient.
The role system:certificates.k8s.io:certificatesigningrequests:nodeclient should be created as
well, granting POST permission to
/apis/certificates.k8s.io/certificatesigningrequests/nodeclient.
Set up nodes certificate rotation with auto approval
Kubeadm ensures that certificate rotation is enabled for nodes, and that a new certificate request
for nodes will get its CSR request automatically approved by the csrapprover controller.
This is implemented by creating ClusterRoleBinding named
kubeadm:node-autoapprove-certificate-rotation between the system:nodes group and the default
role system:certificates.k8s.io:certificatesigningrequests:selfnodeclient.
Create the public cluster-info ConfigMap
This phase creates the cluster-info ConfigMap in the kube-public namespace.
Additionally, it creates a Role and a RoleBinding granting access to the ConfigMap for
unauthenticated users (i.e. users in RBAC group system:unauthenticated).
Note:
The access to the cluster-info ConfigMap is not rate-limited. This may or may not be a
problem if you expose your cluster's API server to the internet; worst-case scenario here is a
DoS attack where an attacker uses all the in-flight requests the kube-apiserver can handle to
serve the cluster-info ConfigMap.
Install addons
Kubeadm installs the internal DNS server and the kube-proxy addon components via the API server.
A ServiceAccount for kube-proxy is created in the kube-system namespace; then kube-proxy is
deployed as a DaemonSet:
The credentials (ca.crt and token) to the control plane come from the ServiceAccount
The location (URL) of the API server comes from a ConfigMap
The kube-proxy ServiceAccount is bound to the privileges in the system:node-proxier ClusterRole
DNS
The CoreDNS service is named kube-dns for compatibility reasons with the legacy kube-dns
addon.
A ServiceAccount for CoreDNS is created in the kube-system namespace.
The coredns ServiceAccount is bound to the privileges in the system:coredns ClusterRole
In Kubernetes version 1.21, support for using kube-dns with kubeadm was removed.
You can use CoreDNS with kubeadm even when the related Service is named kube-dns.
kubeadm join phases internal design
Similarly to kubeadm init, the kubeadm join internal workflow also consists of a sequence of
atomic work tasks to perform.
This is split into discovery (having the Node trust the Kubernetes API Server) and TLS bootstrap
(having the Kubernetes API Server trust the Node).
kubeadm executes a set of preflight checks before starting the join, with the aim to verify
preconditions and avoid common cluster startup problems.
Also note that:
kubeadm join preflight checks are basically a subset of kubeadm init preflight checks
If you are joining a Windows node, Linux specific controls are skipped.
In any case the user can skip specific preflight checks (or eventually all preflight checks)
with the --ignore-preflight-errors option.
Discovery cluster-info
There are 2 main schemes for discovery. The first is to use a shared token along with the IP
address of the API server.
The second is to provide a file (that is a subset of the standard kubeconfig file).
Shared token discovery
If kubeadm join is invoked with --discovery-token, token discovery is used; in this case the
node basically retrieves the cluster CA certificates from the cluster-info ConfigMap in the
kube-public namespace.
In order to prevent "man in the middle" attacks, several steps are taken:
First, the CA certificate is retrieved via insecure connection (this is possible because
kubeadm init granted access to the cluster-info ConfigMap for users in the system:unauthenticated group)
Then the CA certificate goes through the following validation steps:
Basic validation: using the token ID against a JWT signature
Pub key validation: using the provided --discovery-token-ca-cert-hash. This value is available
in the output of kubeadm init or can be calculated using standard tools (the hash is
calculated over the bytes of the Subject Public Key Info (SPKI) object as in RFC7469); see the example after this list. The
--discovery-token-ca-cert-hash flag may be repeated multiple times to allow more than one public key.
As an additional validation, the CA certificate is retrieved via secure connection and then
compared with the CA retrieved initially
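One common way to calculate this hash with standard tools, assuming an RSA CA key stored at the default location on a control plane node, is:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
The result is passed to kubeadm join as --discovery-token-ca-cert-hash sha256:<hash>.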
Note:
You can skip CA validation by passing the --discovery-token-unsafe-skip-ca-verification flag on the command line.
This weakens the kubeadm security model since others can potentially impersonate the Kubernetes API server.
File/https discovery
If kubeadm join is invoked with --discovery-file, file discovery is used; this file can be a
local file or downloaded via an HTTPS URL; in case of HTTPS, the host-installed CA bundle is used
to verify the connection.
With file discovery, the cluster CA certificate is provided in the file itself; in fact, the
discovery file is a kubeconfig file with only server and certificate-authority-data attributes
set, as described in the kubeadm join
reference doc; when the connection with the cluster is established, kubeadm tries to access the
cluster-info ConfigMap, and if available, uses it.
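A discovery file is therefore a stripped-down kubeconfig; a sketch with placeholder values might look like this:
apiVersion: v1
kind: Config
clusters:
- name: default
  cluster:
    server: https://192.168.0.100:6443
    certificate-authority-data: <base64-encoded-cluster-CA>
It is then passed to kubeadm join with --discovery-file <path>; a TLS bootstrap token must still be supplied, for example via --tls-bootstrap-token.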
TLS Bootstrap
Once the cluster info is known, the file bootstrap-kubelet.conf is written, thus allowing
kubelet to do TLS Bootstrapping.
The TLS bootstrap mechanism uses the shared token to temporarily authenticate with the Kubernetes
API server to submit a certificate signing request (CSR) for a locally created key pair.
The request is then automatically approved and the operation completes, saving the ca.crt file and the
kubelet.conf file to be used by the kubelet for joining the cluster, while bootstrap-kubelet.conf
is deleted.
Note:
The temporary authentication is validated against the token saved during the kubeadm init
process (or with additional tokens created with kubeadm token command)
The temporary authentication resolves to a user member of
system:bootstrappers:kubeadm:default-node-token group which was granted access to the CSR api
during the kubeadm init process
The automatic CSR approval is managed by the csrapprover controller, according to
the configuration present in the kubeadm init process
kubeadm upgrade workflow internal design
kubeadm upgrade has sub-commands for handling the upgrade of the Kubernetes cluster created by kubeadm.
You must run kubeadm upgrade apply on a control plane node (you can choose which one);
this starts the upgrade process. You then run kubeadm upgrade node on all remaining
nodes (both worker nodes and control plane nodes).
Both kubeadm upgrade apply and kubeadm upgrade node have a phase subcommand which provides access
to the internal phases of the upgrade process.
See kubeadm upgrade phase for more details.
Additional utility upgrade commands are kubeadm upgrade plan and kubeadm upgrade diff.
All upgrade sub-commands support passing a configuration file.
kubeadm upgrade plan
You can optionally run kubeadm upgrade plan before you run kubeadm upgrade apply.
The plan subcommand checks which versions are available to upgrade
to and validates whether your current cluster is upgradeable.
kubeadm upgrade diff
This shows what differences would be applied to existing static pod manifests for control plane nodes.
A more verbose way to do the same thing is running kubeadm upgrade apply --dry-run or
kubeadm upgrade node --dry-run.
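For example (the target version below is a placeholder):
# Check which versions the cluster can be upgraded to
kubeadm upgrade plan
# Show the changes that would be applied to the static Pod manifests for a given target version
kubeadm upgrade diff v1.29.0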
kubeadm upgrade apply
kubeadm upgrade apply prepares the cluster for the upgrade of all nodes, and also
upgrades the control plane node where it's run. The steps it performs are:
Runs preflight checks similarly to kubeadm init and kubeadm join, ensuring container images are downloaded
and the cluster is in a good state to be upgraded.
Upgrades the control plane manifest files on disk in /etc/kubernetes/manifests and waits
for the kubelet to restart the components if the files have changed.
Uploads the updated kubeadm and kubelet configurations to the cluster in the kubeadm-config
and the kubelet-config ConfigMaps (both in the kube-system namespace).
Writes updated kubelet configuration for this node in /var/lib/kubelet/config.yaml.
Configures bootstrap token and the cluster-info ConfigMap for RBAC rules. This is the same as
in the kubeadm init stage and ensures that the cluster continues to support nodes joining with bootstrap tokens.
Upgrades the kube-proxy and CoreDNS addons conditionally if all existing kube-apiservers in the cluster
have already been upgraded to the target version.
Performs any post-upgrade tasks, such as cleaning up deprecated features that are release-specific.
kubeadm upgrade node
kubeadm upgrade node upgrades a single control plane or worker node after the cluster upgrade has
started (by running kubeadm upgrade apply). The command detects if the node is a control plane node by checking
if the file /etc/kubernetes/manifests/kube-apiserver.yaml exists. On finding that file, the kubeadm tool
infers that there is a running kube-apiserver Pod on this node. The command then performs the following steps:
Runs preflight checks similarly to kubeadm upgrade apply.
For control plane nodes, upgrades the control plane manifest files on disk in /etc/kubernetes/manifests
and waits for the kubelet to restart the components if the files have changed.
Writes the updated kubelet configuration for this node in /var/lib/kubelet/config.yaml.
(For control plane nodes) upgrades the kube-proxy and CoreDNS
addons conditionally, provided that all existing
API servers in the cluster have already been upgraded to the target version.
Performs any post-upgrade tasks, such as cleaning up deprecated features which are release specific.
kubeadm reset workflow internal design
You can use the kubeadm reset subcommand on a node where kubeadm commands were previously executed.
This subcommand performs a best-effort cleanup of the node.
If certain actions fail you must intervene and perform manual cleanup.
IPVS, iptables and nftables rules are not cleaned up.
CNI (network plugin) configuration is not cleaned up.
.kube/ in the user's home directory is not cleaned up.
The command has the following stages:
Runs preflight checks on the node to determine if it's healthy.
For control plane nodes, removes any local etcd member data.
Stops the kubelet.
Stops running containers.
Unmounts any mounted directories in /var/lib/kubelet.
Deletes any files and directories managed by kubeadm in /var/lib/kubelet and /etc/kubernetes.
11 - Command line tool (kubectl)
Kubernetes provides a command line tool for communicating with a Kubernetes cluster's
control plane,
using the Kubernetes API.
This tool is named kubectl.
For configuration, kubectl looks for a file named config in the $HOME/.kube directory.
You can specify other kubeconfig
files by setting the KUBECONFIG environment variable or by setting the
--kubeconfig flag.
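For example (the file path below is illustrative):
# Use an alternative kubeconfig for the current shell session
export KUBECONFIG=$HOME/.kube/dev-cluster.conf
kubectl get nodes
# ...or for a single invocation only
kubectl get nodes --kubeconfig=$HOME/.kube/dev-cluster.conf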
This overview covers kubectl syntax, describes the command operations, and provides common examples.
For details about each command, including all the supported flags and subcommands, see the
kubectl reference documentation.
For installation instructions, see Installing kubectl;
for a quick guide, see the cheat sheet.
If you're used to using the docker command-line tool,
kubectl for Docker Users explains some equivalent commands for Kubernetes.
Syntax
Use the following syntax to run kubectl commands from your terminal window:
kubectl [command] [TYPE] [NAME] [flags]
where command, TYPE, NAME, and flags are:
command: Specifies the operation that you want to perform on one or more resources,
for example create, get, describe, delete.
TYPE: Specifies the resource type. Resource types are case-insensitive and
you can specify the singular, plural, or abbreviated forms.
For example, the following commands produce the same output:
kubectl get pod pod1
kubectl get pods pod1
kubectl get po pod1
NAME: Specifies the name of the resource. Names are case-sensitive. If the name is omitted,
details for all resources are displayed, for example kubectl get pods.
When performing an operation on multiple resources, you can specify each resource by
type and name or specify one or more files:
To specify resources by type and name:
To group resources if they are all the same type: TYPE1 name1 name2 name<#>. Example: kubectl get pod example-pod1 example-pod2
To specify multiple resource types individually: TYPE1/name1 TYPE1/name2 TYPE2/name3 TYPE<#>/name<#>. Example: kubectl get pod/example-pod1 replicationcontroller/example-rc1
To specify resources with one or more files: -f file1 -f file2 -f file<#>
Use YAML rather than JSON
since YAML tends to be more user-friendly, especially for configuration files. Example: kubectl get -f ./pod.yaml
flags: Specifies optional flags. For example, you can use the -s or --server flags
to specify the address and port of the Kubernetes API server.
Caution:
Flags that you specify from the command line override default values and any corresponding environment variables.
If you need help, run kubectl help from the terminal window.
In-cluster authentication and namespace overrides
By default kubectl will first determine if it is running within a pod, and thus in a cluster.
It starts by checking for the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment
variables and the existence of a service account token file at /var/run/secrets/kubernetes.io/serviceaccount/token.
If all three are found, in-cluster authentication is assumed.
To maintain backwards compatibility, if the POD_NAMESPACE environment variable is set
during in-cluster authentication it will override the default namespace from the
service account token. Any manifests or tools relying on namespace defaulting will be affected by this.
POD_NAMESPACE environment variable
If the POD_NAMESPACE environment variable is set, cli operations on namespaced resources
will default to the variable value. For example, if the variable is set to seattle,
kubectl get pods would return pods in the seattle namespace. This is because pods are
a namespaced resource, and no namespace was provided in the command. Review the output
of kubectl api-resources to determine if a resource is namespaced.
Explicit use of --namespace <value> overrides this behavior.
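For example, with POD_NAMESPACE=seattle set:
# Lists pods in the "seattle" namespace because of the environment variable
kubectl get pods
# The explicit flag takes precedence and lists pods in "kube-system" instead
kubectl get pods --namespace=kube-system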
How kubectl handles ServiceAccount tokens
If:
there is a Kubernetes service account token file mounted at
/var/run/secrets/kubernetes.io/serviceaccount/token, and
the KUBERNETES_SERVICE_HOST environment variable is set, and
the KUBERNETES_SERVICE_PORT environment variable is set, and
you don't explicitly specify a namespace on the kubectl command line
then kubectl assumes it is running in your cluster. The kubectl tool looks up the
namespace of that ServiceAccount (this is the same as the namespace of the Pod)
and acts against that namespace. This is different from what happens outside of a
cluster; when kubectl runs outside a cluster and you don't specify a namespace,
the kubectl command acts against the namespace set for the current context in your
client configuration. To change the default namespace for your kubectl you can use the
following command:
kubectl config set-context --current --namespace=<namespace>
explain
kubectl explain TYPE [--recursive=false] [flags]
Get documentation of various resources. For instance pods, nodes, services, etc.
expose
kubectl expose (-f FILENAME | TYPE NAME | TYPE/NAME) [--port=port] [--protocol=TCP|UDP] [--target-port=number-or-name] [--name=name] [--external-ip=external-ip-of-service] [--type=type] [flags]
Expose a replication controller, service, or pod as a new Kubernetes service.
get
kubectl get (-f FILENAME | TYPE [NAME | /NAME | -l label]) [--watch] [--sort-by=FIELD] [[-o | --output]=OUTPUT_FORMAT] [flags]
List one or more resources.
kustomize
kubectl kustomize <dir> [flags] [options]
List a set of API resources generated from instructions in a kustomization.yaml file. The argument must be the path to the directory containing the file, or a git repository URL with a path suffix specifying same with respect to the repository root.
label
kubectl label (-f FILENAME | TYPE NAME | TYPE/NAME) KEY_1=VAL_1 ... KEY_N=VAL_N [--overwrite] [--all] [--resource-version=version] [flags]
Add or update the labels of one or more resources.
logs
kubectl logs POD [-c CONTAINER] [--follow] [flags]
Print the logs for a container in a pod.
options
kubectl options
List of global command-line options, which apply to all commands.
patch
kubectl patch (-f FILENAME | TYPE NAME | TYPE/NAME) --patch PATCH [flags]
Update one or more fields of a resource by using the strategic merge patch process.
plugin
kubectl plugin [flags] [options]
Provides utilities for interacting with plugins.
port-forward
kubectl port-forward POD [LOCAL_PORT:]REMOTE_PORT [...[LOCAL_PORT_N:]REMOTE_PORT_N] [flags]
Forward one or more local ports to a pod.
wait
Experimental: Wait for a specific condition on one or many resources.
To learn more about command operations, see the kubectl reference documentation.
Resource types
The following table includes a list of all the supported resource types and their abbreviated aliases.
(This output can be retrieved from kubectl api-resources, and was accurate as of Kubernetes 1.25.0)
NAME                              SHORTNAMES   APIVERSION                               NAMESPACED   KIND
bindings                                       v1                                       true         Binding
componentstatuses                 cs           v1                                       false        ComponentStatus
configmaps                        cm           v1                                       true         ConfigMap
endpoints                         ep           v1                                       true         Endpoints
events                            ev           v1                                       true         Event
limitranges                       limits       v1                                       true         LimitRange
namespaces                        ns           v1                                       false        Namespace
nodes                             no           v1                                       false        Node
persistentvolumeclaims            pvc          v1                                       true         PersistentVolumeClaim
persistentvolumes                 pv           v1                                       false        PersistentVolume
pods                              po           v1                                       true         Pod
podtemplates                                   v1                                       true         PodTemplate
replicationcontrollers            rc           v1                                       true         ReplicationController
resourcequotas                    quota        v1                                       true         ResourceQuota
secrets                                        v1                                       true         Secret
serviceaccounts                   sa           v1                                       true         ServiceAccount
services                          svc          v1                                       true         Service
mutatingwebhookconfigurations                  admissionregistration.k8s.io/v1          false        MutatingWebhookConfiguration
validatingwebhookconfigurations                admissionregistration.k8s.io/v1          false        ValidatingWebhookConfiguration
customresourcedefinitions         crd,crds     apiextensions.k8s.io/v1                  false        CustomResourceDefinition
apiservices                                    apiregistration.k8s.io/v1                false        APIService
controllerrevisions                            apps/v1                                  true         ControllerRevision
daemonsets                        ds           apps/v1                                  true         DaemonSet
deployments                       deploy       apps/v1                                  true         Deployment
replicasets                       rs           apps/v1                                  true         ReplicaSet
statefulsets                      sts          apps/v1                                  true         StatefulSet
tokenreviews                                   authentication.k8s.io/v1                 false        TokenReview
localsubjectaccessreviews                      authorization.k8s.io/v1                  true         LocalSubjectAccessReview
selfsubjectaccessreviews                       authorization.k8s.io/v1                  false        SelfSubjectAccessReview
selfsubjectrulesreviews                        authorization.k8s.io/v1                  false        SelfSubjectRulesReview
subjectaccessreviews                           authorization.k8s.io/v1                  false        SubjectAccessReview
horizontalpodautoscalers          hpa          autoscaling/v2                           true         HorizontalPodAutoscaler
cronjobs                          cj           batch/v1                                 true         CronJob
jobs                                           batch/v1                                 true         Job
certificatesigningrequests        csr          certificates.k8s.io/v1                   false        CertificateSigningRequest
leases                                         coordination.k8s.io/v1                   true         Lease
endpointslices                                 discovery.k8s.io/v1                      true         EndpointSlice
events                            ev           events.k8s.io/v1                         true         Event
flowschemas                                    flowcontrol.apiserver.k8s.io/v1beta2     false        FlowSchema
prioritylevelconfigurations                    flowcontrol.apiserver.k8s.io/v1beta2     false        PriorityLevelConfiguration
ingressclasses                                 networking.k8s.io/v1                     false        IngressClass
ingresses                         ing          networking.k8s.io/v1                     true         Ingress
networkpolicies                   netpol       networking.k8s.io/v1                     true         NetworkPolicy
runtimeclasses                                 node.k8s.io/v1                           false        RuntimeClass
poddisruptionbudgets              pdb          policy/v1                                true         PodDisruptionBudget
podsecuritypolicies               psp          policy/v1beta1                           false        PodSecurityPolicy
clusterrolebindings                            rbac.authorization.k8s.io/v1             false        ClusterRoleBinding
clusterroles                                   rbac.authorization.k8s.io/v1             false        ClusterRole
rolebindings                                   rbac.authorization.k8s.io/v1             true         RoleBinding
roles                                          rbac.authorization.k8s.io/v1             true         Role
priorityclasses                   pc           scheduling.k8s.io/v1                     false        PriorityClass
csidrivers                                     storage.k8s.io/v1                        false        CSIDriver
csinodes                                       storage.k8s.io/v1                        false        CSINode
csistoragecapacities                           storage.k8s.io/v1                        true         CSIStorageCapacity
storageclasses                    sc           storage.k8s.io/v1                        false        StorageClass
volumeattachments                              storage.k8s.io/v1                        false        VolumeAttachment
Output options
Use the following sections for information about how you can format or sort the output
of certain commands. For details about which commands support the various output options,
see the kubectl reference documentation.
Formatting output
The default output format for all kubectl commands is the human readable plain-text format.
To output details to your terminal window in a specific format, you can add either the -o
or --output flags to a supported kubectl command.
Syntax
kubectl [command] [TYPE] [NAME] -o <output_format>
Depending on the kubectl operation, the following output formats are supported:
Output format
Description
-o custom-columns=<spec>
Print a table using a comma separated list of custom columns.
-o custom-columns-file=<filename>
Print a table using the custom columns template in the <filename> file.
-o json
Output a JSON formatted API object.
-o jsonpath=<template>
Print the fields defined in a jsonpath expression.
-o jsonpath-file=<filename>
Print the fields defined by the jsonpath expression in the <filename> file.
-o name
Print only the resource name and nothing else.
-o wide
Output in the plain-text format with any additional information. For pods, the node name is included.
-o yaml
Output a YAML formatted API object.
Example
In this example, the following command outputs the details for a single pod as a YAML formatted object:
kubectl get pod web-pod-13je7 -o yaml
Remember: See the kubectl reference documentation
for details about which output format is supported by each command.
Custom columns
To define custom columns and output only the details that you want into a table, you can use the custom-columns option.
You can choose to define the custom columns inline or use a template file: -o custom-columns=<spec> or -o custom-columns-file=<filename>.
Examples
Inline:
kubectl get pods <pod-name> -o custom-columns=NAME:.metadata.name,RSRC:.metadata.resourceVersion
Template file:
kubectl get pods <pod-name> -o custom-columns-file=template.txt
where the template.txt file contains:
NAME RSRC
metadata.name metadata.resourceVersion
The result of running either command is similar to:
NAME RSRC
submit-queue 610995
Server-side columns
kubectl supports receiving specific column information from the server about objects.
This means that for any given resource, the server will return columns and rows relevant to that resource, for the client to print.
This allows for consistent human-readable output across clients used against the same cluster, by having the server encapsulate the details of printing.
This feature is enabled by default. To disable it, add the
--server-print=false flag to the kubectl get command.
Examples
To print information about the status of a pod, use a command like the following:
kubectl get pods <pod-name> --server-print=false
The output is similar to:
NAME AGE
pod-name 1m
Sorting list objects
To output objects to a sorted list in your terminal window, you can add the --sort-by flag
to a supported kubectl command. Sort your objects by specifying any numeric or string field
with the --sort-by flag. To specify a field, use a jsonpath expression.
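For example, to list Pods sorted by name:
kubectl get pods --sort-by=.metadata.name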
Use the following set of examples to help you familiarize yourself with running the commonly used kubectl operations:
kubectl apply - Apply or Update a resource from a file or stdin.
# Create a service using the definition in example-service.yaml.
kubectl apply -f example-service.yaml
# Create a replication controller using the definition in example-controller.yaml.
kubectl apply -f example-controller.yaml
# Create the objects that are defined in any .yaml, .yml, or .json file within the <directory> directory.
kubectl apply -f <directory>
kubectl get - List one or more resources.
# List all pods in plain-text output format.
kubectl get pods
# List all pods in plain-text output format and include additional information (such as node name).
kubectl get pods -o wide
# List the replication controller with the specified name in plain-text output format. Tip: You can shorten and replace the 'replicationcontroller' resource type with the alias 'rc'.
kubectl get replicationcontroller <rc-name>
# List all replication controllers and services together in plain-text output format.
kubectl get rc,services
# List all daemon sets in plain-text output format.
kubectl get ds
# List all pods running on node server01
kubectl get pods --field-selector=spec.nodeName=server01
kubectl describe - Display detailed state of one or more resources, including the uninitialized ones by default.
# Display the details of the node with name <node-name>.
kubectl describe nodes <node-name>
# Display the details of the pod with name <pod-name>.
kubectl describe pods/<pod-name>
# Display the details of all the pods that are managed by the replication controller named <rc-name>.
# Remember: Any pods that are created by the replication controller get prefixed with the name of the replication controller.
kubectl describe pods <rc-name>
# Describe all pods
kubectl describe pods
Note:
The kubectl get command is usually used for retrieving one or more
resources of the same resource type. It features a rich set of flags that allows
you to customize the output format using the -o or --output flag, for example.
You can specify the -w or --watch flag to start watching updates to a particular
object. The kubectl describe command is more focused on describing the many
related aspects of a specified resource. It may invoke several API calls to the
API server to build a view for the user. For example, the kubectl describe node
command retrieves not only the information about the node, but also a summary of
the pods running on it, the events generated for the node etc.
kubectl delete - Delete resources either from a file, stdin, or specifying label selectors, names, resource selectors, or resources.
# Delete a pod using the type and name specified in the pod.yaml file.
kubectl delete -f pod.yaml
# Delete all the pods and services that have the label '<label-key>=<label-value>'.
kubectl delete pods,services -l <label-key>=<label-value>
# Delete all pods, including uninitialized ones.
kubectl delete pods --all
kubectl exec - Execute a command against a container in a pod.
# Get output from running 'date' from pod <pod-name>. By default, output is from the first container.
kubectl exec <pod-name> -- date

# Get output from running 'date' in container <container-name> of pod <pod-name>.
kubectl exec <pod-name> -c <container-name> -- date

# Get an interactive TTY and run /bin/bash from pod <pod-name>. By default, output is from the first container.
kubectl exec -ti <pod-name> -- /bin/bash
kubectl logs - Print the logs for a container in a pod.
# Return a snapshot of the logs from pod <pod-name>.
kubectl logs <pod-name>

# Start streaming the logs from pod <pod-name>. This is similar to the 'tail -f' Linux command.
kubectl logs -f <pod-name>
kubectl diff - View a diff of the proposed updates to a cluster.
# Diff resources included in "pod.json".
kubectl diff -f pod.json

# Diff file read from stdin.
cat service.yaml | kubectl diff -f -
Examples: Creating and using plugins
Use the following set of examples to help you familiarize yourself with writing and using kubectl plugins:
# create a simple plugin in any language and name the resulting executable file
# so that it begins with the prefix "kubectl-"
cat ./kubectl-hello
#!/bin/sh

# this plugin prints the words "hello world"
echo "hello world"
With a plugin written, let's make it executable:
chmod a+x ./kubectl-hello
# and move it to a location in our PATH
sudo mv ./kubectl-hello /usr/local/bin
sudo chown root:root /usr/local/bin

# You have now created and "installed" a kubectl plugin.
# You can begin using this plugin by invoking it from kubectl as if it were a regular command
kubectl hello
hello world

# You can "uninstall" a plugin, by removing it from the folder in your
# $PATH where you placed it
sudo rm /usr/local/bin/kubectl-hello
In order to view all of the plugins that are available to kubectl, use
the kubectl plugin list subcommand:
kubectl plugin list
The output is similar to:
The following kubectl-compatible plugins are available:
/usr/local/bin/kubectl-hello
/usr/local/bin/kubectl-foo
/usr/local/bin/kubectl-bar
kubectl plugin list also warns you about plugins that are not
executable, or that are shadowed by other plugins; for example:
sudo chmod -x /usr/local/bin/kubectl-foo # remove execute permission
kubectl plugin list
The following kubectl-compatible plugins are available:
/usr/local/bin/kubectl-hello
/usr/local/bin/kubectl-foo
- warning: /usr/local/bin/kubectl-foo identified as a plugin, but it is not executable
/usr/local/bin/kubectl-bar
error: one plugin warning was found
You can think of plugins as a means to build more complex functionality on top
of the existing kubectl commands:
cat ./kubectl-whoami
The next few examples assume that you already made kubectl-whoami have
the following contents:
#!/bin/bash
# this plugin makes use of the `kubectl config` command in order to output
# information about the current user, based on the currently selected context
kubectl config view --template='{{ range .contexts }}{{ if eq .name "'$(kubectl config current-context)'" }}Current user: {{ printf "%s\n" .context.user }}{{ end }}{{ end }}'
Running the above command gives you an output containing the user for the
current context in your KUBECONFIG file:
# make the file executable
sudo chmod +x ./kubectl-whoami

# and move it into your PATH
sudo mv ./kubectl-whoami /usr/local/bin
kubectl whoami
Current user: plugins-user
kubectl is the Kubernetes CLI version of a Swiss Army knife, and can do many things.
While this book is focused on using kubectl to declaratively manage applications in Kubernetes, it
also covers other kubectl functions.
Command Families
Most kubectl commands typically fall into one of a few categories:
Type | Used For | Description
Declarative Resource Management | Deployment and operations (e.g. GitOps) | Declaratively manage Kubernetes workloads using resource configuration
Imperative Resource Management | Development Only | Run commands to manage Kubernetes workloads using Command Line arguments and flags
Printing Workload State | Debugging | Print information about workloads
Interacting with Containers | Debugging | Exec, attach, cp, logs
Cluster Management | Cluster operations | Drain and cordon Nodes
Declarative Application Management
The preferred approach for managing resources is through
declarative files called resource configuration used with the kubectl apply command.
This command reads a local (or remote) file structure and modifies cluster state to
reflect the declared intent.
Apply
Apply is the preferred mechanism for managing resources in a Kubernetes cluster.
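As a minimal sketch of the declarative flow (the directory name is a placeholder), keep manifests in files and repeatedly apply them:
# create or update every resource declared under the directory
kubectl apply -f <directory>/

# preview what would change before applying
kubectl diff -f <directory>/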
Printing State about Workloads
Users will need to view workload state.
Printing summarized state and information about resources
Printing complete state and information about resources
Printing specific fields from resources
Query resources matching labels
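For example (resource and label names are placeholders):
kubectl get deployments                                                          # summarized state
kubectl describe deployment <deployment-name>                                    # complete state and related events
kubectl get deployment <deployment-name> -o jsonpath='{.status.readyReplicas}'   # a specific field
kubectl get pods -l app=<label-value>                                            # resources matching a label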
Debugging Workloads
kubectl supports debugging by providing commands for:
Printing Container logs
Printing cluster events
Exec or attaching to a Container
Copying files from Containers in the cluster to a user's filesystem
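For example (the pod name and file path are placeholders):
kubectl logs <pod-name>                                     # container logs
kubectl get events --sort-by=.metadata.creationTimestamp    # cluster events
kubectl exec -it <pod-name> -- sh                           # open a shell in a container
kubectl cp <pod-name>:/tmp/results.txt ./results.txt        # copy a file from a container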
Cluster Management
On occasion, users may need to perform operations on the Nodes of a cluster. kubectl supports
commands to drain workloads from a Node so that it can be decommissioned or debugged, for example:
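kubectl cordon <node-name>                      # mark the Node unschedulable
kubectl drain <node-name> --ignore-daemonsets   # evict workloads before maintenance
kubectl uncordon <node-name>                    # mark the Node schedulable again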
Porcelain
Users may find using resource configuration overly verbose for development and prefer to work with
the cluster imperatively with a shell-like workflow. kubectl offers porcelain commands for
generating and modifying resources.
Generating + creating resources such as Deployments, StatefulSets, Services, ConfigMaps, etc.
Setting fields on resources
Editing (live) resources in a text editor
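A short sketch of that imperative workflow (the Deployment name "web" is hypothetical):
kubectl create deployment web --image=nginx         # generate and create a Deployment
kubectl set image deployment/web nginx=nginx:1.25   # set a field on the live resource
kubectl edit deployment/web                         # edit the live resource in your editor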
Porcelain for Dev Only
Porcelain commands are time-saving for experimenting with workloads in a dev cluster, but shouldn't
be used for production.
11.2 - kubectl Quick Reference
This page contains a list of commonly used kubectl commands and flags.
Note:
These instructions are for Kubernetes v1.33. To check the version, use the kubectl version command.
Kubectl autocomplete
BASH
source <(kubectl completion bash) # set up autocomplete in bash into the current shell, bash-completion package should be installed first.
echo "source <(kubectl completion bash)" >> ~/.bashrc # add autocomplete permanently to your bash shell.
You can also use a shorthand alias for kubectl that also works with completion:
alias k=kubectl
complete -o default -F __start_kubectl k
ZSH
source <(kubectl completion zsh) # set up autocomplete in zsh into the current shell
echo '[[ $commands[kubectl] ]] && source <(kubectl completion zsh)' >> ~/.zshrc # add autocomplete permanently to your zsh shell
FISH
Note:
Requires kubectl version 1.23 or above.
echo 'kubectl completion fish | source' > ~/.config/fish/completions/kubectl.fish && source ~/.config/fish/completions/kubectl.fish
A note on --all-namespaces
Appending --all-namespaces happens frequently enough that you should be aware of the shorthand for --all-namespaces:
kubectl -A
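For example:
kubectl get pods -A   # list pods in all namespaces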
Kubectl context and configuration
Set which Kubernetes cluster kubectl communicates with, and modify configuration
information. See Authenticating Across Clusters with kubeconfig documentation for
detailed config file information.
kubectl config view # Show Merged kubeconfig settings.

# use multiple kubeconfig files at the same time and view merged config
KUBECONFIG=~/.kube/config:~/.kube/kubconfig2

kubectl config view

# Show merged kubeconfig settings and raw certificate data and exposed secrets
kubectl config view --raw

# get the password for the e2e user
kubectl config view -o jsonpath='{.users[?(@.name == "e2e")].user.password}'

# get the certificate for the e2e user
kubectl config view --raw -o jsonpath='{.users[?(.name == "e2e")].user.client-certificate-data}' | base64 -d

kubectl config view -o jsonpath='{.users[].name}'    # display the first user
kubectl config view -o jsonpath='{.users[*].name}'   # get a list of users
kubectl config get-contexts                          # display list of contexts
kubectl config get-contexts -o name                  # get all context names
kubectl config current-context                       # display the current-context
kubectl config use-context my-cluster-name           # set the default context to my-cluster-name
kubectl config set-cluster my-cluster-name           # set a cluster entry in the kubeconfig

# configure the URL to a proxy server to use for requests made by this client in the kubeconfig
kubectl config set-cluster my-cluster-name --proxy-url=my-proxy-url

# add a new user to your kubeconf that supports basic auth
kubectl config set-credentials kubeuser/foo.kubernetes.com --username=kubeuser --password=kubepassword

# permanently save the namespace for all subsequent kubectl commands in that context.
kubectl config set-context --current --namespace=ggckad-s2

# set a context utilizing a specific username and namespace.
kubectl config set-context gce --user=cluster-admin --namespace=foo \
  && kubectl config use-context gce

kubectl config unset users.foo                       # delete user foo

# short alias to set/show context/namespace (only works for bash and bash-compatible shells, current context to be set before using kn to set namespace)
alias kx='f() { [ "$1" ] && kubectl config use-context $1 || kubectl config current-context ; } ; f'
alias kn='f() { [ "$1" ] && kubectl config set-context --current --namespace $1 || kubectl config view --minify | grep namespace | cut -d" " -f6 ; } ; f'
Kubectl apply
apply manages applications through files defining Kubernetes resources. It creates and updates resources in a cluster through running kubectl apply. This is the recommended way of managing Kubernetes applications on production. See Kubectl Book.
Creating objects
Kubernetes manifests can be defined in YAML or JSON. The file extension .yaml,
.yml, and .json can be used.
kubectl apply -f ./my-manifest.yaml                 # create resource(s)
kubectl apply -f ./my1.yaml -f ./my2.yaml           # create from multiple files
kubectl apply -f ./dir                              # create resource(s) in all manifest files in dir
kubectl apply -f https://example.com/manifest.yaml  # create resource(s) from url (Note: this is an example domain and does not contain a valid manifest)
kubectl create deployment nginx --image=nginx       # start a single instance of nginx

# create a Job which prints "Hello World"
kubectl create job hello --image=busybox:1.28 -- echo "Hello World"

# create a CronJob that prints "Hello World" every minute
kubectl create cronjob hello --image=busybox:1.28 --schedule="*/1 * * * *" -- echo "Hello World"

kubectl explain pods                                # get the documentation for pod manifests

# Create multiple YAML objects from stdin
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: busybox-sleep
spec:
containers:
- name: busybox
image: busybox:1.28
args:
- sleep
- "1000000"
---
apiVersion: v1
kind: Pod
metadata:
name: busybox-sleep-less
spec:
containers:
- name: busybox
image: busybox:1.28
args:
- sleep
- "1000"
EOF

# Create a secret with several keys
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: mysecret
type: Opaque
data:
password: $(echo -n "s33msi4" | base64 -w0)
username: $(echo -n "jane" | base64 -w0)
EOF
Viewing and finding resources
# Get commands with basic output
kubectl get services                          # List all services in the namespace
kubectl get pods --all-namespaces             # List all pods in all namespaces
kubectl get pods -o wide                      # List all pods in the current namespace, with more details
kubectl get deployment my-dep                 # List a particular deployment
kubectl get pods                              # List all pods in the namespace
kubectl get pod my-pod -o yaml                # Get a pod's YAML

# Describe commands with verbose output
kubectl describe nodes my-node
kubectl describe pods my-pod

# List Services Sorted by Name
kubectl get services --sort-by=.metadata.name

# List pods Sorted by Restart Count
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'

# List PersistentVolumes sorted by capacity
kubectl get pv --sort-by=.spec.capacity.storage

# Get the version label of all pods with label app=cassandra
kubectl get pods --selector=app=cassandra -o \
  jsonpath='{.items[*].metadata.labels.version}'

# Retrieve the value of a key with dots, e.g. 'ca.crt'
kubectl get configmap myconfig \
  -o jsonpath='{.data.ca\.crt}'

# Retrieve a base64 encoded value with dashes instead of underscores.
kubectl get secret my-secret --template='{{index .data "key-name-with-dashes"}}'

# Get all worker nodes (use a selector to exclude results that have a label
# named 'node-role.kubernetes.io/control-plane')
kubectl get node --selector='!node-role.kubernetes.io/control-plane'

# Get all running pods in the namespace
kubectl get pods --field-selector=status.phase=Running

# Get ExternalIPs of all nodes
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'

# List Names of Pods that belong to Particular RC
# "jq" command useful for transformations that are too complex for jsonpath, it can be found at https://jqlang.github.io/jq/
sel=${$(kubectl get rc my-rc --output=json | jq -j '.spec.selector | to_entries | .[] | "\(.key)=\(.value),"')%?}
echo $(kubectl get pods --selector=$sel --output=jsonpath={.items..metadata.name})

# Show labels for all pods (or any other Kubernetes object that supports labelling)
kubectl get pods --show-labels

# Check which nodes are ready
JSONPATH='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' \
  && kubectl get nodes -o jsonpath="$JSONPATH" | grep "Ready=True"

# Check which nodes are ready with custom-columns
kubectl get node -o custom-columns='NODE_NAME:.metadata.name,STATUS:.status.conditions[?(@.type=="Ready")].status'

# Output decoded secrets without external tools
kubectl get secret my-secret -o go-template='{{range $k,$v := .data}}{{"### "}}{{$k}}{{"\n"}}{{$v|base64decode}}{{"\n\n"}}{{end}}'

# List all Secrets currently in use by a pod
kubectl get pods -o json | jq '.items[].spec.containers[].env[]?.valueFrom.secretKeyRef.name' | grep -v null | sort | uniq

# List all containerIDs of initContainer of all pods
# Helpful when cleaning up stopped containers, while avoiding removal of initContainers.
kubectl get pods --all-namespaces -o jsonpath='{range .items[*].status.initContainerStatuses[*]}{.containerID}{"\n"}{end}' | cut -d/ -f3

# List Events sorted by timestamp
kubectl get events --sort-by=.metadata.creationTimestamp

# List all warning events
kubectl events --types=Warning

# Compares the current state of the cluster against the state that the cluster would be in if the manifest was applied.
kubectl diff -f ./my-manifest.yaml

# Produce a period-delimited tree of all keys returned for nodes
# Helpful when locating a key within a complex nested JSON structure
kubectl get nodes -o json | jq -c 'paths|join(".")'

# Produce a period-delimited tree of all keys returned for pods, etc
kubectl get pods -o json | jq -c 'paths|join(".")'

# Produce ENV for all pods, assuming you have a default container for the pods, default namespace and the `env` command is supported.
# Helpful when running any supported command across all pods, not just `env`
for pod in $(kubectl get po --output=jsonpath={.items..metadata.name}); do echo $pod && kubectl exec -it $pod -- env; done

# Get a deployment's status subresource
kubectl get deployment nginx-deployment --subresource=status
Updating resources
kubectl set image deployment/frontend www=image:v2            # Rolling update "www" containers of "frontend" deployment, updating the image
kubectl rollout history deployment/frontend                   # Check the history of deployments including the revision
kubectl rollout undo deployment/frontend                      # Rollback to the previous deployment
kubectl rollout undo deployment/frontend --to-revision=2      # Rollback to a specific revision
kubectl rollout status -w deployment/frontend                 # Watch rolling update status of "frontend" deployment until completion
kubectl rollout restart deployment/frontend                   # Rolling restart of the "frontend" deployment

cat pod.json | kubectl replace -f -                           # Replace a pod based on the JSON passed into stdin

# Force replace, delete and then re-create the resource. Will cause a service outage.
kubectl replace --force -f ./pod.json

# Create a service for a replicated nginx, which serves on port 80 and connects to the containers on port 8000
kubectl expose rc nginx --port=80 --target-port=8000

# Update a single-container pod's image version (tag) to v4
kubectl get pod mypod -o yaml | sed 's/\(image: myimage\):.*$/\1:v4/' | kubectl replace -f -

kubectl label pods my-pod new-label=awesome                   # Add a Label
kubectl label pods my-pod new-label-                          # Remove a label
kubectl label pods my-pod new-label=new-value --overwrite     # Overwrite an existing value
kubectl annotate pods my-pod icon-url=http://goo.gl/XXBTWq    # Add an annotation
kubectl annotate pods my-pod icon-url-                        # Remove annotation
kubectl autoscale deployment foo --min=2 --max=10             # Auto scale a deployment "foo"
Patching resources
# Partially update a node
kubectl patch node k8s-node-1 -p '{"spec":{"unschedulable":true}}'

# Update a container's image; spec.containers[*].name is required because it's a merge key
kubectl patch pod valid-pod -p '{"spec":{"containers":[{"name":"kubernetes-serve-hostname","image":"new image"}]}}'

# Update a container's image using a json patch with positional arrays
kubectl patch pod valid-pod --type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]'

# Disable a deployment livenessProbe using a json patch with positional arrays
kubectl patch deployment valid-deployment --type json -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'

# Add a new element to a positional array
kubectl patch sa default --type='json' -p='[{"op": "add", "path": "/secrets/1", "value": {"name": "whatever" } }]'

# Update a deployment's replica count by patching its scale subresource
kubectl patch deployment nginx-deployment --subresource='scale' --type='merge' -p '{"spec":{"replicas":2}}'
Editing resources
Edit any API resource in your preferred editor.
kubectl edit svc/docker-registry                      # Edit the service named docker-registry
KUBE_EDITOR="nano" kubectl edit svc/docker-registry   # Use an alternative editor
Scaling resources
kubectl scale --replicas=3 rs/foo                                 # Scale a replicaset named 'foo' to 3
kubectl scale --replicas=3 -f foo.yaml                            # Scale a resource specified in "foo.yaml" to 3
kubectl scale --current-replicas=2 --replicas=3 deployment/mysql  # If the deployment named mysql's current size is 2, scale mysql to 3
kubectl scale --replicas=5 rc/foo rc/bar rc/baz                   # Scale multiple replication controllers
Deleting resources
kubectl delete -f ./pod.json                      # Delete a pod using the type and name specified in pod.json
kubectl delete pod unwanted --now                 # Delete a pod with no grace period
kubectl delete pod,service baz foo                # Delete pods and services with same names "baz" and "foo"
kubectl delete pods,services -l name=myLabel      # Delete pods and services with label name=myLabel
kubectl -n my-ns delete pod,svc --all             # Delete all pods and services in namespace my-ns

# Delete all pods matching the awk pattern1 or pattern2
kubectl get pods -n mynamespace --no-headers=true | awk '/pattern1|pattern2/{print $1}' | xargs kubectl delete -n mynamespace pod
Interacting with running Pods
kubectl logs my-pod                                 # dump pod logs (stdout)
kubectl logs -l name=myLabel                        # dump pod logs, with label name=myLabel (stdout)
kubectl logs my-pod --previous                      # dump pod logs (stdout) for a previous instantiation of a container
kubectl logs my-pod -c my-container                 # dump pod container logs (stdout, multi-container case)
kubectl logs -l name=myLabel -c my-container        # dump pod container logs, with label name=myLabel (stdout)
kubectl logs my-pod -c my-container --previous      # dump pod container logs (stdout, multi-container case) for a previous instantiation of a container
kubectl logs -f my-pod                              # stream pod logs (stdout)
kubectl logs -f my-pod -c my-container              # stream pod container logs (stdout, multi-container case)
kubectl logs -f -l name=myLabel --all-containers    # stream all pods logs with label name=myLabel (stdout)
kubectl run -i --tty busybox --image=busybox:1.28 -- sh   # Run pod as interactive shell
kubectl run nginx --image=nginx -n mynamespace      # Start a single instance of nginx pod in the namespace of mynamespace

# Generate spec for running pod nginx and write it into a file called pod.yaml
kubectl run nginx --image=nginx --dry-run=client -o yaml > pod.yaml

kubectl attach my-pod -i                            # Attach to Running Container
kubectl port-forward my-pod 5000:6000               # Listen on port 5000 on the local machine and forward to port 6000 on my-pod
kubectl exec my-pod -- ls /                         # Run command in existing pod (1 container case)
kubectl exec --stdin --tty my-pod -- /bin/sh        # Interactive shell access to a running pod (1 container case)
kubectl exec my-pod -c my-container -- ls /         # Run command in existing pod (multi-container case)
kubectl debug my-pod -it --image=busybox:1.28       # Create an interactive debugging session within existing pod and immediately attach to it
kubectl debug node/my-node -it --image=busybox:1.28 # Create an interactive debugging session on a node and immediately attach to it
kubectl top pod                                     # Show metrics for all pods in the default namespace
kubectl top pod POD_NAME --containers               # Show metrics for a given pod and its containers
kubectl top pod POD_NAME --sort-by=cpu              # Show metrics for a given pod and sort it by 'cpu' or 'memory'
Copying files and directories to and from containers
kubectl cp /tmp/foo_dir my-pod:/tmp/bar_dir            # Copy /tmp/foo_dir local directory to /tmp/bar_dir in a remote pod in the current namespace
kubectl cp /tmp/foo my-pod:/tmp/bar -c my-container    # Copy /tmp/foo local file to /tmp/bar in a remote pod in a specific container
kubectl cp /tmp/foo my-namespace/my-pod:/tmp/bar       # Copy /tmp/foo local file to /tmp/bar in a remote pod in namespace my-namespace
kubectl cp my-namespace/my-pod:/tmp/foo /tmp/bar       # Copy /tmp/foo from a remote pod to /tmp/bar locally
Note:
kubectl cp requires that the 'tar' binary is present in your container image. If 'tar' is not present, kubectl cp will fail.
For advanced use cases, such as symlinks, wildcard expansion or file mode preservation consider using kubectl exec.
tar cf - /tmp/foo | kubectl exec -i -n my-namespace my-pod -- tar xf - -C /tmp/bar  # Copy /tmp/foo local file to /tmp/bar in a remote pod in namespace my-namespace
kubectl exec -n my-namespace my-pod -- tar cf - /tmp/foo | tar xf - -C /tmp/bar     # Copy /tmp/foo from a remote pod to /tmp/bar locally
Interacting with Deployments and Services
kubectl logs deploy/my-deployment                         # dump Pod logs for a Deployment (single-container case)
kubectl logs deploy/my-deployment -c my-container         # dump Pod logs for a Deployment (multi-container case)
kubectl port-forward svc/my-service 5000                  # listen on local port 5000 and forward to port 5000 on Service backend
kubectl port-forward svc/my-service 5000:my-service-port  # listen on local port 5000 and forward to Service target port with name <my-service-port>
kubectl port-forward deploy/my-deployment 5000:6000       # listen on local port 5000 and forward to port 6000 on a Pod created by <my-deployment>
kubectl exec deploy/my-deployment -- ls                   # run command in first Pod and first container in Deployment (single- or multi-container cases)
Interacting with Nodes and cluster
kubectl cordon my-node                                                # Mark my-node as unschedulable
kubectl drain my-node                                                 # Drain my-node in preparation for maintenance
kubectl uncordon my-node                                              # Mark my-node as schedulable
kubectl top node                                                      # Show metrics for all nodes
kubectl top node my-node                                              # Show metrics for a given node
kubectl cluster-info                                                  # Display addresses of the control plane and services
kubectl cluster-info dump                                             # Dump current cluster state to stdout
kubectl cluster-info dump --output-directory=/path/to/cluster-state   # Dump current cluster state to /path/to/cluster-state

# View existing taints that exist on current nodes.
kubectl get nodes -o='custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect'

# If a taint with that key and effect already exists, its value is replaced as specified.
kubectl taint nodes foo dedicated=special-user:NoSchedule
Resource types
List all supported resource types along with their shortnames, API group, whether they are namespaced, and kind:
kubectl api-resources
Other operations for exploring API resources:
kubectl api-resources --namespaced=true       # All namespaced resources
kubectl api-resources --namespaced=false      # All non-namespaced resources
kubectl api-resources -o name                 # All resources with simple output (only the resource name)
kubectl api-resources -o wide                 # All resources with expanded (aka "wide") output
kubectl api-resources --verbs=list,get        # All resources that support the "list" and "get" request verbs
kubectl api-resources --api-group=extensions  # All resources in the "extensions" API group
Formatting output
To output details to your terminal window in a specific format, add the -o (or --output) flag to a supported kubectl command.
Output format
Description
-o=custom-columns=<spec>
Print a table using a comma separated list of custom columns
-o=custom-columns-file=<filename>
Print a table using the custom columns template in the <filename> file
-o=jsonpath-file=<filename>
Print the fields defined by the jsonpath expression in the <filename> file
-o=name
Print only the resource name and nothing else
-o=wide
Output in the plain-text format with any additional information, and for pods, the node name is included
-o=yaml
Output a YAML formatted API object
Examples using -o=custom-columns:
# All images running in a cluster
kubectl get pods -A -o=custom-columns='DATA:spec.containers[*].image'

# All images running in namespace: default, grouped by Pod
kubectl get pods --namespace default --output=custom-columns="NAME:.metadata.name,IMAGE:.spec.containers[*].image"

# All images excluding "registry.k8s.io/coredns:1.6.2"
kubectl get pods -A -o=custom-columns='DATA:spec.containers[?(@.image!="registry.k8s.io/coredns:1.6.2")].image'

# All fields under metadata regardless of name
kubectl get pods -A -o=custom-columns='DATA:metadata.*'
Kubectl verbosity is controlled with the -v or --v flags followed by an integer representing the log level. General Kubernetes logging conventions and the associated log levels are described here.
Verbosity
Description
--v=0
Generally useful for this to always be visible to a cluster operator.
--v=1
A reasonable default log level if you don't want verbosity.
--v=2
Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
--v=3
Extended information about changes.
--v=4
Debug level verbosity.
--v=5
Trace level verbosity.
--v=6
Display requested resources.
--v=7
Display HTTP request headers.
--v=8
Display HTTP request contents.
--v=9
Display HTTP request contents without truncation of contents.
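For example, per the table above, verbosity level 6 reveals the API requests behind an ordinary command:
kubectl get pods --v=6   # also prints the requests sent to the API server and their response codes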
--as string
Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group strings
Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string
UID to impersonate for the operation.
--cache-dir string Default: "$HOME/.kube/cache"
Default cache directory
--certificate-authority string
Path to a cert file for the certificate authority
--client-certificate string
Path to a client certificate file for TLS
--client-key string
Path to a client key file for TLS
--cluster string
The name of the kubeconfig cluster to use
--context string
The name of the kubeconfig context to use
--default-not-ready-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
--disable-compression
If true, opt-out of response compression for all requests to the server
-h, --help
help for kubectl
--insecure-skip-tls-verify
If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string
Path to the kubeconfig file to use for CLI requests.
--match-server-version
Require server version to match client version
-n, --namespace string
If present, the namespace scope for this CLI request
--password string
Password for basic authentication to the API server
--profile string Default: "none"
Name of profile to capture. One of (none|cpu|heap|goroutine|threadcreate|block|mutex)
--profile-output string Default: "profile.pprof"
Name of the file to write the profile to
--request-timeout string Default: "0"
The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests.
kubectl version - Print the client and server version information
kubectl wait - Experimental: Wait for a specific condition on one or many resources
11.3.2 - kubectl annotate
Synopsis
Update the annotations on one or more resources.
All Kubernetes objects support the ability to store additional data with the object as annotations. Annotations are key/value pairs that can be larger than labels and include arbitrary string values such as structured JSON. Tools and system extensions may use annotations to store their own data.
Attempting to set an annotation that already exists will fail unless --overwrite is set. If --resource-version is specified and does not match the current resource version on the server the command will fail.
Use "kubectl api-resources" for a complete list of supported resources.
Examples
# Update pod 'foo' with the annotation 'description' and the value 'my frontend'
# If the same annotation is set multiple times, only the last value will be applied
kubectl annotate pods foo description='my frontend'
# Update a pod identified by type and name in "pod.json"
kubectl annotate -f pod.json description='my frontend'
# Update pod 'foo' with the annotation 'description' and the value 'my frontend running nginx', overwriting any existing value
kubectl annotate --overwrite pods foo description='my frontend running nginx'
# Update all pods in the namespace
kubectl annotate pods --all description='my frontend running nginx'
# Update pod 'foo' only if the resource is unchanged from version 1
kubectl annotate pods foo description='my frontend running nginx' --resource-version=1
# Update pod 'foo' by removing an annotation named 'description' if it exists
# Does not require the --overwrite flag
kubectl annotate pods foo description-
Options
--all
Select all resources, in the namespace of the specified resource types.
-A, --all-namespaces
If true, check the specified action in all namespaces.
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--dry-run string[="unchanged"] Default: "none"
Must be "none", "server", or "client". If client strategy, only print the object that would be sent, without sending it. If server strategy, submit server-side request without persisting the resource.
--field-manager string Default: "kubectl-annotate"
Name of the manager used to track field ownership.
--field-selector string
Selector (field query) to filter on, supports '=', '==', and '!='.(e.g. --field-selector key1=value1,key2=value2). The server only supports a limited number of field queries per type.
-f, --filename strings
Filename, directory, or URL to files identifying the resource to update the annotation
-h, --help
help for annotate
-k, --kustomize string
Process the kustomization directory. This flag can't be used together with -f or -R.
--list
If true, display the annotations for a given resource.
--local
If true, annotation will NOT contact api-server but run locally.
--overwrite
If true, allow annotations to be overwritten, otherwise reject annotation updates that overwrite existing annotations.
-R, --recursive
Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.
--resource-version string
If non-empty, the annotation update will only succeed if this is the current resource-version for the object. Only valid when specifying a single resource.
-l, --selector string
Selector (label query) to filter on, supports '=', '==', '!=', 'in', 'notin'.(e.g. -l key1=value1,key2=value2,key3 in (value3)). Matching objects must satisfy all of the specified label constraints.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
--template string
Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].
Parent Options Inherited
--as string
Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group strings
Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string
UID to impersonate for the operation.
--cache-dir string Default: "$HOME/.kube/cache"
Default cache directory
--certificate-authority string
Path to a cert file for the certificate authority
--client-certificate string
Path to a client certificate file for TLS
--client-key string
Path to a client key file for TLS
--cluster string
The name of the kubeconfig cluster to use
--context string
The name of the kubeconfig context to use
--default-not-ready-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
--disable-compression
If true, opt-out of response compression for all requests to the server
--insecure-skip-tls-verify
If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string
Path to the kubeconfig file to use for CLI requests.
--match-server-version
Require server version to match client version
-n, --namespace string
If present, the namespace scope for this CLI request
--password string
Password for basic authentication to the API server
--profile string Default: "none"
Name of profile to capture. One of (none|cpu|heap|goroutine|threadcreate|block|mutex)
--profile-output string Default: "profile.pprof"
Name of the file to write the profile to
--request-timeout string Default: "0"
The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests.
--tls-server-name string
Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
--token string
Bearer token for authentication to the API server
--user string
The name of the kubeconfig user to use
--username string
Username for basic authentication to the API server
--version version[=true]
--version, --version=raw prints version information and quits; --version=vX.Y.Z... sets the reported version
--warnings-as-errors
Treat warnings received from the server as errors and exit with a non-zero exit code
See Also
kubectl - kubectl controls the Kubernetes cluster manager
11.3.3 - kubectl api-resources
Synopsis
Print the supported API resources on the server.
kubectl api-resources [flags]
Examples
# Print the supported API resources
kubectl api-resources
# Print the supported API resources with more information
kubectl api-resources -o wide
# Print the supported API resources sorted by a column
kubectl api-resources --sort-by=name
# Print the supported namespaced resources
kubectl api-resources --namespaced=true
# Print the supported non-namespaced resources
kubectl api-resources --namespaced=false
# Print the supported API resources with a specific APIGroup
kubectl api-resources --api-group=rbac.authorization.k8s.io
Options
--api-group string
Limit to resources in the specified API group.
--cached
Use the cached list of resources if available.
--categories strings
Limit to resources that belong to the specified categories.
-h, --help
help for api-resources
--namespaced Default: true
If false, non-namespaced resources will be returned; otherwise, namespaced resources are returned by default.
--no-headers
When using the default or custom-column output format, don't print headers (default print headers).
-o, --output string
Output format. One of: (wide, name).
--sort-by string
If non-empty, sort list of resources using specified field. The field can be either 'name' or 'kind'.
--verbs strings
Limit to resources that support the specified verbs.
Parent Options Inherited
--as string
Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group strings
Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string
UID to impersonate for the operation.
--cache-dir string Default: "$HOME/.kube/cache"
Default cache directory
--certificate-authority string
Path to a cert file for the certificate authority
--client-certificate string
Path to a client certificate file for TLS
--client-key string
Path to a client key file for TLS
--cluster string
The name of the kubeconfig cluster to use
--context string
The name of the kubeconfig context to use
--default-not-ready-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
--disable-compression
If true, opt-out of response compression for all requests to the server
--insecure-skip-tls-verify
If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string
Path to the kubeconfig file to use for CLI requests.
--match-server-version
Require server version to match client version
-n, --namespace string
If present, the namespace scope for this CLI request
--password string
Password for basic authentication to the API server
--profile string Default: "none"
Name of profile to capture. One of (none|cpu|heap|goroutine|threadcreate|block|mutex)
--profile-output string Default: "profile.pprof"
Name of the file to write the profile to
--request-timeout string Default: "0"
The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests.
--tls-server-name string
Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
--token string
Bearer token for authentication to the API server
--user string
The name of the kubeconfig user to use
--username string
Username for basic authentication to the API server
--version version[=true]
--version, --version=raw prints version information and quits; --version=vX.Y.Z... sets the reported version
--warnings-as-errors
Treat warnings received from the server as errors and exit with a non-zero exit code
See Also
kubectl - kubectl controls the Kubernetes cluster manager
11.3.4 - kubectl api-versions
Synopsis
Print the supported API versions on the server, in the form of "group/version".
kubectl api-versions
Examples
# Print the supported API versions
kubectl api-versions
Options
-h, --help
help for api-versions
Parent Options Inherited
--as string
Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group strings
Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string
UID to impersonate for the operation.
--cache-dir string Default: "$HOME/.kube/cache"
Default cache directory
--certificate-authority string
Path to a cert file for the certificate authority
--client-certificate string
Path to a client certificate file for TLS
--client-key string
Path to a client key file for TLS
--cluster string
The name of the kubeconfig cluster to use
--context string
The name of the kubeconfig context to use
--default-not-ready-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
--disable-compression
If true, opt-out of response compression for all requests to the server
--insecure-skip-tls-verify
If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string
Path to the kubeconfig file to use for CLI requests.
--match-server-version
Require server version to match client version
-n, --namespace string
If present, the namespace scope for this CLI request
--password string
Password for basic authentication to the API server
--profile string Default: "none"
Name of profile to capture. One of (none|cpu|heap|goroutine|threadcreate|block|mutex)
--profile-output string Default: "profile.pprof"
Name of the file to write the profile to
--request-timeout string Default: "0"
The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests.
--tls-server-name string
Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
--token string
Bearer token for authentication to the API server
--user string
The name of the kubeconfig user to use
--username string
Username for basic authentication to the API server
--version version[=true]
--version, --version=raw prints version information and quits; --version=vX.Y.Z... sets the reported version
--warnings-as-errors
Treat warnings received from the server as errors and exit with a non-zero exit code
See Also
kubectl - kubectl controls the Kubernetes cluster manager
11.3.5 - kubectl apply
Synopsis
Apply a configuration to a resource by file name or stdin. The resource name must be specified. This resource will be created if it doesn't exist yet. To use 'apply', always create the resource initially with either 'apply' or 'create --save-config'.
JSON and YAML formats are accepted.
Alpha Disclaimer: the --prune functionality is not yet complete. Do not use unless you are aware of what the current state is. See https://issues.k8s.io/34274.
kubectl apply (-f FILENAME | -k DIRECTORY)
Examples
# Apply the configuration in pod.json to a pod
kubectl apply -f ./pod.json
# Apply resources from a directory containing kustomization.yaml - e.g. dir/kustomization.yaml
kubectl apply -k dir/
# Apply the JSON passed into stdin to a pod
cat pod.json | kubectl apply -f -
# Apply the configuration from all files that end with '.json'
kubectl apply -f '*.json'
# Note: --prune is still in Alpha
# Apply the configuration in manifest.yaml that matches label app=nginx and delete all other resources that are not in the file and match label app=nginx
kubectl apply --prune -f manifest.yaml -l app=nginx
# Apply the configuration in manifest.yaml and delete all the other config maps that are not in the file
kubectl apply --prune -f manifest.yaml --all --prune-allowlist=core/v1/ConfigMap
Options
--all
Select all resources in the namespace of the specified resource types.
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
Must be "background", "orphan", or "foreground". Selects the deletion cascading strategy for the dependents (e.g. Pods created by a ReplicationController). Defaults to background.
--dry-run string[="unchanged"] Default: "none"
Must be "none", "server", or "client". If client strategy, only print the object that would be sent, without sending it. If server strategy, submit server-side request without persisting the resource.
--field-manager string Default: "kubectl-client-side-apply"
Name of the manager used to track field ownership.
-f, --filename strings
The files that contain the configurations to apply.
--force
If true, immediately remove resources from API and bypass graceful deletion. Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.
--force-conflicts
If true, server-side apply will force the changes against conflicts.
--grace-period int Default: -1
Period of time in seconds given to the resource to terminate gracefully. Ignored if negative. Set to 1 for immediate shutdown. Can only be set to 0 when --force is true (force deletion).
-h, --help
help for apply
-k, --kustomize string
Process a kustomization directory. This flag can't be used together with -f or -R.
--openapi-patch Default: true
If true, use openapi to calculate diff when the openapi presents and the resource can be found in the openapi spec. Otherwise, fall back to use baked-in types.
--overwrite Default: true
Automatically resolve conflicts between the modified and live configuration by using values from the modified configuration
--prune
Automatically delete resource objects that do not appear in the configs and are created by either apply or create --save-config. Should be used with either -l or --all.
--prune-allowlist strings
Overwrite the default allowlist with <group/version/kind> for --prune
-R, --recursive
Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.
-l, --selector string
Selector (label query) to filter on, supports '=', '==', '!=', 'in', 'notin'.(e.g. -l key1=value1,key2=value2,key3 in (value3)). Matching objects must satisfy all of the specified label constraints.
--server-side
If true, apply runs in the server instead of the client.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
--subresource string
If specified, apply will operate on the subresource of the requested object. Only allowed when using --server-side.
--template string
Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].
--timeout duration
The length of time to wait before giving up on a delete, zero means determine a timeout from the size of the object
--validate string[="strict"] Default: "strict"
Must be one of: strict (or true), warn, ignore (or false). "true" or "strict" will use a schema to validate the input and fail the request if invalid. It will perform server side validation if ServerSideFieldValidation is enabled on the api-server, but will fall back to less reliable client-side validation if not. "warn" will warn about unknown or duplicate fields without blocking the request if server-side field validation is enabled on the API server, and behave as "ignore" otherwise. "false" or "ignore" will not perform any schema validation, silently dropping any unknown or duplicate fields.
--wait
If true, wait for resources to be gone before returning. This waits for finalizers.
Parent Options Inherited
--as string
Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group strings
Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string
UID to impersonate for the operation.
--cache-dir string Default: "$HOME/.kube/cache"
Default cache directory
--certificate-authority string
Path to a cert file for the certificate authority
--client-certificate string
Path to a client certificate file for TLS
--client-key string
Path to a client key file for TLS
--cluster string
The name of the kubeconfig cluster to use
--context string
The name of the kubeconfig context to use
--default-not-ready-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
--disable-compression
If true, opt-out of response compression for all requests to the server
--insecure-skip-tls-verify
If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string
Path to the kubeconfig file to use for CLI requests.
--match-server-version
Require server version to match client version
-n, --namespace string
If present, the namespace scope for this CLI request
--password string
Password for basic authentication to the API server
--profile string Default: "none"
Name of profile to capture. One of (none|cpu|heap|goroutine|threadcreate|block|mutex)
--profile-output string Default: "profile.pprof"
Name of the file to write the profile to
--request-timeout string Default: "0"
The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests.
11.3.5.1 - kubectl apply edit-last-applied
Synopsis
Edit the latest last-applied-configuration annotations of resources from the default editor.
The edit-last-applied command allows you to directly edit any API resource you can retrieve via the command-line tools. It will open the editor defined by your KUBE_EDITOR, or EDITOR environment variables, or fall back to 'vi' for Linux or 'notepad' for Windows. You can edit multiple objects, although changes are applied one at a time. The command accepts file names as well as command-line arguments, although the files you point to must be previously saved versions of resources.
The default format is YAML. To edit in JSON, specify "-o json".
The flag --windows-line-endings can be used to force Windows line endings, otherwise the default for your operating system will be used.
In the event an error occurs while updating, a temporary file will be created on disk that contains your unapplied changes. The most common error when updating a resource is another editor changing the resource on the server. When this occurs, you will have to apply your changes to the newer version of the resource, or update your temporary saved copy to include the latest resource version.
kubectl apply edit-last-applied (RESOURCE/NAME | -f FILENAME)
Examples
# Edit the last-applied-configuration annotations by type/name in YAML
kubectl apply edit-last-applied deployment/nginx
# Edit the last-applied-configuration annotations by file in JSON
kubectl apply edit-last-applied -f deploy.yaml -o json
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
-R, --recursive
Process the directory used in -f, --filename recursively. Useful when you want to manage related manifests organized within the same directory.
--show-managed-fields
If true, keep the managedFields when printing objects in JSON or YAML format.
--template string
Template string or path to template file to use when -o=go-template, -o=go-template-file. The template format is golang templates [http://golang.org/pkg/text/template/#pkg-overview].
--validate string[="strict"] Default: "strict"
Must be one of: strict (or true), warn, ignore (or false). "true" or "strict" will use a schema to validate the input and fail the request if invalid. It will perform server side validation if ServerSideFieldValidation is enabled on the api-server, but will fall back to less reliable client-side validation if not. "warn" will warn about unknown or duplicate fields without blocking the request if server-side field validation is enabled on the API server, and behave as "ignore" otherwise. "false" or "ignore" will not perform any schema validation, silently dropping any unknown or duplicate fields.
--windows-line-endings
Defaults to the line ending native to your platform.
Parent Options Inherited
--as string
Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group strings
Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string
UID to impersonate for the operation.
--cache-dir string Default: "$HOME/.kube/cache"
Default cache directory
--certificate-authority string
Path to a cert file for the certificate authority
--client-certificate string
Path to a client certificate file for TLS
--client-key string
Path to a client key file for TLS
--cluster string
The name of the kubeconfig cluster to use
--context string
The name of the kubeconfig context to use
--default-not-ready-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.
--disable-compression
If true, opt-out of response compression for all requests to the server
--insecure-skip-tls-verify
If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string
Path to the kubeconfig file to use for CLI requests.
--match-server-version
Require server version to match client version
-n, --namespace string
If present, the namespace scope for this CLI request
--password string
Password for basic authentication to the API server
--profile string Default: "none"
Name of profile to capture. One of (none|cpu|heap|goroutine|threadcreate|block|mutex)
--profile-output string Default: "profile.pprof"
Name of the file to write the profile to
--request-timeout string Default: "0"
The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests.
--tls-server-name string
Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
--token string
Bearer token for authentication to the API server
--user string
The name of the kubeconfig user to use
--username string
Username for basic authentication to the API server
--version version[=true]
--version, --version=raw prints version information and quits; --version=vX.Y.Z... sets the reported version
--warnings-as-errors
Treat warnings received from the server as errors and exit with a non-zero exit code
See Also
kubectl apply - Apply a configuration to a resource by file name or stdin
11.3.5.2 - kubectl apply set-last-applied
Synopsis
Set the latest last-applied-configuration annotations by setting it to match the contents of a file. This results in the last-applied-configuration being updated as though 'kubectl apply -f <file>' was run, without updating any other parts of the object.
kubectl apply set-last-applied -f FILENAME
Examples
# Set the last-applied-configuration of a resource to match the contents of a file
kubectl apply set-last-applied -f deploy.yaml
# Execute set-last-applied against each configuration file in a directory
kubectl apply set-last-applied -f path/
# Set the last-applied-configuration of a resource to match the contents of a file; will create the annotation if it does not already exist
kubectl apply set-last-applied -f deploy.yaml --create-annotation=true
Options
--allow-missing-template-keys Default: true
If true, ignore any errors in templates when a field or map key is missing in the template. Only applies to golang and jsonpath output formats.
--create-annotation
Will create the 'last-applied-configuration' annotation if the current object doesn't have one
--dry-run string[="unchanged"] Default: "none"
Must be "none", "server", or "client". If client strategy, only print the object that would be sent, without sending it. If server strategy, submit server-side request without persisting the resource.
-f, --filename strings
Filename, directory, or URL to files that contains the last-applied-configuration annotations