1 - Limit Ranges
By default, containers run with unbounded compute resources on a Kubernetes cluster. With resource quotas, cluster administrators can restrict resource consumption and creation on a namespace basis. Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all available resources. A LimitRange is a policy to constrain resource allocations (to Pods or Containers) in a namespace.
A LimitRange provides constraints that can:
- Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
- Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
- Enforce a ratio between request and limit for a resource in a namespace.
- Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
LimitRange support has been enabled by default since Kubernetes 1.10.
A LimitRange is enforced in a particular namespace when there is a LimitRange object in that namespace.
The name of a LimitRange object must be a valid DNS subdomain name.
Overview of Limit Range
- The administrator creates one LimitRange in one namespace.
- Users create resources like Pods, Containers, and PersistentVolumeClaims in the namespace.
- The LimitRanger admission controller enforces defaults and limits for all Pods and Containers that do not set compute resource requirements, and tracks usage to ensure it does not exceed the resource minimum, maximum and ratio defined in any LimitRange present in the namespace.
- If creating or updating a resource (Pod, Container, PersistentVolumeClaim) violates a LimitRange constraint, the request to the API server fails with HTTP status code 403 FORBIDDEN and a message explaining the constraint that has been violated.
- If a LimitRange is activated in a namespace for compute resources like cpu and memory, users must specify requests or limits for those values. Otherwise, the system may reject Pod creation.
- LimitRange validation occurs only at the Pod admission stage, not on running Pods.
Examples of policies that could be created using limit range are:
- In a 2 node cluster with a capacity of 8 GiB RAM and 16 cores, constrain Pods in a namespace to request 100m of CPU with a max limit of 500m for CPU and request 200Mi for Memory with a max limit of 600Mi for Memory.
- Define default CPU limit and request to 150m and memory default request to 300Mi for Containers started with no cpu and memory requests in their specs.
In the case where the total limits of the namespace are less than the sum of the limits of the Pods/Containers, there may be contention for resources. In this case, the Containers or Pods will not be created.
Neither contention nor changes to a LimitRange will affect already created resources.
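The two example policies above could be combined into a single LimitRange along the following lines. This is a minimal sketch rather than a recommended configuration: the object name and namespace are hypothetical, and applying the constraints per Container (rather than per Pod) is an assumption made so that defaults can be set in the same object.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-limit-range   # hypothetical name
  namespace: example-ns       # hypothetical namespace
spec:
  limits:
  - type: Container
    # Smallest request a Container may make.
    min:
      cpu: 100m
      memory: 200Mi
    # Largest limit a Container may set.
    max:
      cpu: 500m
      memory: 600Mi
    # Limit injected when a Container sets no CPU limit.
    default:
      cpu: 150m
    # Requests injected when a Container sets no requests.
    defaultRequest:
      cpu: 150m
      memory: 300Mi
```

Because validation happens at admission time only, Pods that were already running before this object was created are not affected.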
Refer to the LimitRanger design document for more information.
For examples on using limits, see:
- how to configure minimum and maximum CPU constraints per namespace.
- how to configure minimum and maximum Memory constraints per namespace.
- how to configure default CPU Requests and Limits per namespace.
- how to configure default Memory Requests and Limits per namespace.
- how to configure minimum and maximum Storage consumption per namespace.
- a detailed example on configuring quota per namespace.
2 - Resource Quotas
When several users or teams share a cluster with a fixed number of nodes, there is a concern that one team could use more than its fair share of resources.
Resource quotas are a tool for administrators to address this concern.
A resource quota, defined by a
ResourceQuota object, provides constraints that limit
aggregate resource consumption per namespace. It can limit the quantity of objects that can
be created in a namespace by type, as well as the total amount of compute resources that may
be consumed by resources in that namespace.
Resource quotas work like this:
- Different teams work in different namespaces. Currently this is voluntary, but support for making this mandatory via ACLs is planned.
- The administrator creates one ResourceQuota for each namespace.
- Users create resources (pods, services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed hard resource limits defined in a ResourceQuota.
- If creating or updating a resource violates a quota constraint, the request will fail with HTTP status code 403 FORBIDDEN and a message explaining the constraint that would have been violated.
- If quota is enabled in a namespace for compute resources like cpu and memory, users must specify requests or limits for those values; otherwise, the quota system may reject pod creation. Hint: Use the LimitRanger admission controller to force defaults for pods that make no compute resource requirements.
See the walkthrough for an example of how to avoid this problem.
The name of a ResourceQuota object must be a valid DNS subdomain name.
Examples of policies that could be created using namespaces and quotas are:
- In a cluster with a capacity of 32 GiB RAM and 16 cores, let team A use 20 GiB and 10 cores, let team B use 10 GiB and 4 cores, and hold 2 GiB and 2 cores in reserve for future allocation.
- Limit the "testing" namespace to using 1 core and 1 GiB RAM. Let the "production" namespace use any amount.
In the case where the total capacity of the cluster is less than the sum of the quotas of the namespaces, there may be contention for resources. This is handled on a first-come-first-served basis.
Neither contention nor changes to quota will affect already created resources.
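As an illustration, the second policy above (capping the "testing" namespace at 1 core and 1 GiB of RAM) might be written as the ResourceQuota below. The object name is hypothetical, and reading "using" as a cap on the sum of limits is an assumption; a quota on requests.cpu and requests.memory would be the alternative reading.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: testing-compute   # hypothetical name
  namespace: testing
spec:
  hard:
    # Sum of CPU limits across all non-terminal pods in the namespace.
    limits.cpu: "1"
    # Sum of memory limits across all non-terminal pods in the namespace.
    limits.memory: 1Gi
```

The "production" namespace needs no object at all: a namespace without a ResourceQuota is not subject to quota enforcement.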
Enabling Resource Quota
Resource Quota support is enabled by default for many Kubernetes distributions. It is enabled when the API server --enable-admission-plugins= flag has ResourceQuota as one of its arguments.
A resource quota is enforced in a particular namespace when there is a ResourceQuota in that namespace.
Compute Resource Quota
You can limit the total sum of compute resources that can be requested in a given namespace.
The following resource types are supported:
|Resource Name|Description|
|limits.cpu|Across all pods in a non-terminal state, the sum of CPU limits cannot exceed this value.|
|limits.memory|Across all pods in a non-terminal state, the sum of memory limits cannot exceed this value.|
|requests.cpu|Across all pods in a non-terminal state, the sum of CPU requests cannot exceed this value.|
|requests.memory|Across all pods in a non-terminal state, the sum of memory requests cannot exceed this value.|
|hugepages-<size>|Across all pods in a non-terminal state, the number of huge page requests of the specified size cannot exceed this value.|
Resource Quota For Extended Resources
In addition to the resources mentioned above, in release 1.10, quota support for extended resources is added.
As overcommit is not allowed for extended resources, it makes no sense to specify both requests and limits for the same extended resource in a quota. So for extended resources, only quota items with the prefix requests. are allowed for now.
Take the GPU resource as an example: if the resource name is nvidia.com/gpu, and you want to limit the total number of GPUs requested in a namespace to 4, you can define a quota as follows:
- requests.nvidia.com/gpu: 4
See Viewing and Setting Quotas for more detailed information.
Storage Resource Quota
You can limit the total sum of storage resources that can be requested in a given namespace.
In addition, you can limit consumption of storage resources based on associated storage-class.
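|Resource Name|Description|
|requests.storage|Across all persistent volume claims, the sum of storage requests cannot exceed this value.|
|persistentvolumeclaims|The total number of PersistentVolumeClaims that can exist in the namespace.|
|<storage-class-name>.storageclass.storage.k8s.io/requests.storage|Across all persistent volume claims associated with the storage-class-name, the sum of storage requests cannot exceed this value.|
|<storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims|Across all persistent volume claims associated with the storage-class-name, the total number of persistent volume claims that can exist in the namespace.|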
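For example, if an operator wants to quota storage with the gold storage class separately from the bronze storage class, the operator can define one quota item per class, using the <storage-class-name>.storageclass.storage.k8s.io/requests.storage resource name.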
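A minimal sketch of such a per-class storage quota is shown here. The object name, the namespace and the sizes are illustrative assumptions, not values taken from this page; only the gold and bronze class names come from the text.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-by-class   # hypothetical name
  namespace: example-ns    # hypothetical namespace
spec:
  hard:
    # Total storage that claims using the "gold" class may request (assumed size).
    gold.storageclass.storage.k8s.io/requests.storage: 500Gi
    # Total storage that claims using the "bronze" class may request (assumed size).
    bronze.storageclass.storage.k8s.io/requests.storage: 100Gi
```

Claims that use any other storage class are not counted by either item; adding a plain requests.storage entry would cap the namespace as a whole.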
In release 1.8, quota support for local ephemeral storage is added as an alpha feature:
|Resource Name|Description|
|requests.ephemeral-storage|Across all pods in the namespace, the sum of local ephemeral storage requests cannot exceed this value.|
|limits.ephemeral-storage|Across all pods in the namespace, the sum of local ephemeral storage limits cannot exceed this value.|
Object Count Quota
You can set quota for the total number of certain resources of all standard, namespaced resource types using the following syntax:
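- count/<resource>.<group> for resources from non-core groups
- count/<resource> for resources from the core group
Here is an example set of resources users may want to put under object count quota:
- count/persistentvolumeclaims
- count/services
- count/secrets
- count/configmaps
- count/replicationcontrollers
- count/deployments.apps
- count/replicasets.apps
- count/statefulsets.apps
- count/jobs.batch
- count/cronjobs.batch
The same syntax can be used for custom resources. For example, to create a quota on a widgets custom resource in the example.com API group, use count/widgets.example.com.
When using count/* resource quota, an object is charged against the quota if it exists in server storage.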
These types of quotas are useful to protect against exhaustion of storage resources. For example, you may
want to limit the number of Secrets in a server given their large size. Too many Secrets in a cluster can
actually prevent servers and controllers from starting. You can set a quota for Jobs to protect against
a poorly configured CronJob. CronJobs that create too many Jobs in a namespace can lead to a denial of service.
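The count/ syntax can be sketched in a manifest as follows. The object name, the namespace and every numeric limit are hypothetical; the widgets entry assumes a custom resource named widgets in the example.com API group is installed in the cluster.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-example   # hypothetical name
  namespace: example-ns        # hypothetical namespace
spec:
  hard:
    # Core-group resource: count/<resource>
    count/secrets: "50"
    # Non-core resources: count/<resource>.<group>
    count/jobs.batch: "20"
    count/deployments.apps: "10"
    # Custom resource, assuming the widgets CRD exists
    count/widgets.example.com: "5"
```

Capping count/jobs.batch in this way is one guard against the runaway CronJob scenario described above.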
It is also possible to do generic object count quota on a limited set of resources. The following types are supported:
|Resource Name|Description|
|configmaps|The total number of ConfigMaps that can exist in the namespace.|
|persistentvolumeclaims|The total number of PersistentVolumeClaims that can exist in the namespace.|
|pods|The total number of Pods in a non-terminal state that can exist in the namespace. A pod is in a terminal state if .status.phase in (Failed, Succeeded) is true.|
|replicationcontrollers|The total number of ReplicationControllers that can exist in the namespace.|
|resourcequotas|The total number of ResourceQuotas that can exist in the namespace.|
|services|The total number of Services that can exist in the namespace.|
|services.loadbalancers|The total number of Services of type LoadBalancer that can exist in the namespace.|
|services.nodeports|The total number of Services of type NodePort that can exist in the namespace.|
|secrets|The total number of Secrets that can exist in the namespace.|
For example, the pods quota counts and enforces a maximum on the number of pods created in a single namespace that are not terminal. You might want to set a pods quota on a namespace to avoid the case where a user creates many small pods and exhausts the cluster's supply of Pod IPs.
Quota Scopes
Each quota can have an associated set of scopes. A quota will only measure usage for a resource if it matches the intersection of enumerated scopes.
When a scope is added to the quota, it limits the number of resources it supports to those that pertain to the scope. Resources specified on the quota outside of the allowed set result in a validation error.
|Scope|Description|
|Terminating|Match pods where .spec.activeDeadlineSeconds >= 0|
|NotTerminating|Match pods where .spec.activeDeadlineSeconds is nil|
|BestEffort|Match pods that have best effort quality of service.|
|NotBestEffort|Match pods that do not have best effort quality of service.|
|PriorityClass|Match pods that reference the specified priority class.|
|CrossNamespacePodAffinity|Match pods that have cross-namespace pod (anti)affinity terms.|
The BestEffort scope restricts a quota to tracking the following resource:
- pods
The Terminating, NotTerminating, NotBestEffort and PriorityClass scopes restrict a quota to tracking the following resources:
- pods
- cpu
- memory
- requests.cpu
- requests.memory
- limits.cpu
- limits.memory
Note that you cannot specify both the Terminating and the NotTerminating scopes in the same quota, and you cannot specify both the BestEffort and NotBestEffort scopes in the same quota either.
The scopeSelector supports the following values in the operator field:
- In
- NotIn
- Exists
- DoesNotExist
When using one of the following values as the scopeName when defining the scopeSelector, the operator must be Exists:
- Terminating
- NotTerminating
- BestEffort
- NotBestEffort
If the operator is In or NotIn, the values field must have at least one value. For example:

    scopeSelector:
      matchExpressions:
        - scopeName: PriorityClass
          operator: In
          values:
            - middle

If the operator is Exists or DoesNotExist, the values field must NOT be specified.
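To make the Exists case concrete, the following sketch scopes a quota to best-effort pods only. The object name, the namespace and the pod count are hypothetical. Because the BestEffort scope only tracks pods, pods is the only entry that can appear under hard here.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: best-effort-pods   # hypothetical name
  namespace: example-ns    # hypothetical namespace
spec:
  hard:
    # Only pods with best effort quality of service count toward this limit.
    pods: "5"
  scopeSelector:
    matchExpressions:
    - scopeName: BestEffort
      # Exists takes no values field.
      operator: Exists
```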
Resource Quota Per PriorityClass
Kubernetes v1.17 [stable]
Pods can be created at a specific priority.
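You can control a pod's consumption of system resources based on a pod's priority, by using the scopeSelector field in the quota spec.
A quota is matched and consumed only if scopeSelector in the quota spec selects the pod.
When a quota is scoped for priority class using the scopeSelector field, the quota object is restricted to tracking only the following resources:
- pods
- cpu
- memory
- ephemeral-storage
- limits.cpu
- limits.memory
- limits.ephemeral-storage
- requests.cpu
- requests.memory
- requests.ephemeral-storage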
This example creates a quota object and matches it with pods at specific priorities. The example works as follows:
- Pods in the cluster have one of the three priority classes, "low", "medium", "high".
- One quota object is created for each priority.
Save the following YAML to a file quota.yml.

    apiVersion: v1
    kind: List
    items:
    - apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: pods-high
      spec:
        hard:
          cpu: "1000"
          memory: 200Gi
          pods: "10"
        scopeSelector:
          matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["high"]
    - apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: pods-medium
      spec:
        hard:
          cpu: "10"
          memory: 20Gi
          pods: "10"
        scopeSelector:
          matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["medium"]
    - apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: pods-low
      spec:
        hard:
          cpu: "5"
          memory: 10Gi
          pods: "10"
        scopeSelector:
          matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["low"]

Apply the YAML using kubectl create.

    kubectl create -f ./quota.yml

    resourcequota/pods-high created
    resourcequota/pods-medium created
    resourcequota/pods-low created

Verify that Used quota is 0 using kubectl describe quota.

    kubectl describe quota

    Name:       pods-high
    Namespace:  default
    Resource    Used  Hard
    --------    ----  ----
    cpu         0     1k
    memory      0     200Gi
    pods        0     10

    Name:       pods-low
    Namespace:  default
    Resource    Used  Hard
    --------    ----  ----
    cpu         0     5
    memory      0     10Gi
    pods        0     10

    Name:       pods-medium
    Namespace:  default
    Resource    Used  Hard
    --------    ----  ----
    cpu         0     10
    memory      0     20Gi
    pods        0     10
Create a pod with priority "high". Save the following YAML to a file high-priority-pod.yml.

    apiVersion: v1
    kind: Pod
    metadata:
      name: high-priority
    spec:
      containers:
      - name: high-priority
        image: ubuntu
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo hello; sleep 10;done"]
        resources:
          requests:
            memory: "10Gi"
            cpu: "500m"
          limits:
            memory: "10Gi"
            cpu: "500m"
      priorityClassName: high

Apply it with kubectl create.

    kubectl create -f ./high-priority-pod.yml

Verify that the "Used" stats for the "high" priority quota, pods-high, have changed and that the other two quotas are unchanged.

    kubectl describe quota

    Name:       pods-high
    Namespace:  default
    Resource    Used  Hard
    --------    ----  ----
    cpu         500m  1k
    memory      10Gi  200Gi
    pods        1     10

    Name:       pods-low
    Namespace:  default
    Resource    Used  Hard
    --------    ----  ----
    cpu         0     5
    memory      0     10Gi
    pods        0     10

    Name:       pods-medium
    Namespace:  default
    Resource    Used  Hard
    --------    ----  ----
    cpu         0     10
    memory      0     20Gi
    pods        0     10
Cross-namespace Pod Affinity Quota
Kubernetes v1.24 [stable]
Operators can use the CrossNamespacePodAffinity quota scope to limit which namespaces are allowed to have pods with affinity terms that cross namespaces. Specifically, it controls which pods are allowed to set namespaces or namespaceSelector fields in pod affinity terms.
Preventing users from using cross-namespace affinity terms might be desired, since a pod with anti-affinity constraints can block pods from all other namespaces from getting scheduled in a failure domain.
Using this scope, operators can prevent certain namespaces (foo-ns in the example below) from having pods that use cross-namespace pod affinity by creating a resource quota object in that namespace with the CrossNamespacePodAffinity scope and a hard limit of 0:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: disable-cross-namespace-affinity
      namespace: foo-ns
    spec:
      hard:
        pods: "0"
      scopeSelector:
        matchExpressions:
        - scopeName: CrossNamespacePodAffinity
          operator: Exists
If operators want to disallow using namespaces and namespaceSelector by default, and only allow it for specific namespaces, they could configure CrossNamespacePodAffinity as a limited resource by setting the kube-apiserver flag --admission-control-config-file to the path of the following configuration file:

    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: "ResourceQuota"
      configuration:
        apiVersion: apiserver.config.k8s.io/v1
        kind: ResourceQuotaConfiguration
        limitedResources:
        - resource: pods
          matchScopes:
          - scopeName: CrossNamespacePodAffinity
            operator: Exists

With the above configuration, pods can use namespaces and namespaceSelector in pod affinity only if the namespace where they are created has a resource quota object with the CrossNamespacePodAffinity scope and a hard limit greater than or equal to the number of pods using those fields.
Requests compared to Limits
When allocating compute resources, each container may specify a request and a limit value for either CPU or memory. The quota can be configured to quota either value.
If the quota has a value specified for requests.cpu or requests.memory, then it requires that every incoming container makes an explicit request for those resources. If the quota has a value specified for limits.cpu or limits.memory, then it requires that every incoming container specifies an explicit limit for those resources.
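As a sketch, a Pod like the one below would pass this check in a namespace whose quota sets both requests.* and limits.* values for CPU and memory, because its only container declares all four explicitly. The Pod name, container name, image and sizes are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: quota-friendly-pod            # hypothetical name
spec:
  containers:
  - name: app                         # hypothetical container name
    image: registry.example/app:1.0   # hypothetical image
    resources:
      # Needed when the quota specifies requests.cpu or requests.memory.
      requests:
        cpu: 250m
        memory: 128Mi
      # Needed when the quota specifies limits.cpu or limits.memory.
      limits:
        cpu: 500m
        memory: 256Mi
```

If some workloads cannot be edited to carry these values, a LimitRange with defaults in the same namespace can inject them at admission time instead.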
Viewing and Setting Quotas
Kubectl supports creating, updating, and viewing quotas:
    kubectl create namespace myspace

    cat <<EOF > compute-resources.yaml
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-resources
    spec:
      hard:
        requests.cpu: "1"
        requests.memory: 1Gi
        limits.cpu: "2"
        limits.memory: 2Gi
        requests.nvidia.com/gpu: 4
    EOF

    kubectl create -f ./compute-resources.yaml --namespace=myspace

    cat <<EOF > object-counts.yaml
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: object-counts
    spec:
      hard:
        configmaps: "10"
        persistentvolumeclaims: "4"
        pods: "4"
        replicationcontrollers: "20"
        secrets: "10"
        services: "10"
        services.loadbalancers: "2"
    EOF

    kubectl create -f ./object-counts.yaml --namespace=myspace

    kubectl get quota --namespace=myspace

    NAME                AGE
    compute-resources   30s
    object-counts       32s

    kubectl describe quota compute-resources --namespace=myspace

    Name:                    compute-resources
    Namespace:               myspace
    Resource                 Used  Hard
    --------                 ----  ----
    limits.cpu               0     2
    limits.memory            0     2Gi
    requests.cpu             0     1
    requests.memory          0     1Gi
    requests.nvidia.com/gpu  0     4

    kubectl describe quota object-counts --namespace=myspace

    Name:                   object-counts
    Namespace:              myspace
    Resource                Used  Hard
    --------                ----  ----
    configmaps              0     10
    persistentvolumeclaims  0     4
    pods                    0     4
    replicationcontrollers  0     20
    secrets                 1     10
    services                0     10
    services.loadbalancers  0     2
Kubectl also supports object count quota for all standard namespaced resources using the syntax count/<resource>.<group>:

    kubectl create namespace myspace

    kubectl create quota test --hard=count/deployments.apps=2,count/replicasets.apps=4,count/pods=3,count/secrets=4 --namespace=myspace

    kubectl create deployment nginx --image=nginx --namespace=myspace --replicas=2

    kubectl describe quota --namespace=myspace

    Name:                   test
    Namespace:              myspace
    Resource                Used  Hard
    --------                ----  ----
    count/deployments.apps  1     2
    count/pods              2     3
    count/replicasets.apps  1     4
    count/secrets           1     4
Quota and Cluster Capacity
ResourceQuotas are independent of the cluster capacity. They are expressed in absolute units. So, if you add nodes to your cluster, this does not automatically give each namespace the ability to consume more resources.
Sometimes more complex policies may be desired, such as:
- Proportionally divide total cluster resources among several teams.
- Allow each tenant to grow resource usage as needed, but have a generous limit to prevent accidental resource exhaustion.
- Detect demand from one namespace, add nodes, and increase quota.
Such policies could be implemented using
ResourceQuotas as building blocks, by
writing a "controller" that watches the quota usage and adjusts the quota
hard limits of each namespace according to other signals.
Note that resource quota divides up aggregate cluster resources, but it creates no restrictions around nodes: pods from several namespaces may run on the same node.
Limit Priority Class consumption by default
It may be desired that pods at a particular priority, for example "cluster-services", should be allowed in a namespace if and only if a matching quota object exists.
With this mechanism, operators are able to restrict usage of certain high priority classes to a limited number of namespaces and not every namespace will be able to consume these priority classes by default.
To enforce this, the kube-apiserver flag --admission-control-config-file should be used to pass the path to the following configuration file:

    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: "ResourceQuota"
      configuration:
        apiVersion: apiserver.config.k8s.io/v1
        kind: ResourceQuotaConfiguration
        limitedResources:
        - resource: pods
          matchScopes:
          - scopeName: PriorityClass
            operator: In
            values: ["cluster-services"]

Then, create a resource quota object in the kube-system namespace:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: pods-cluster-services
    spec:
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["cluster-services"]

kubectl apply -f https://k8s.io/examples/policy/priority-class-resourcequota.yaml -n kube-system
In this case, a pod creation will be allowed if:
- the Pod's priorityClassName is not specified.
- the Pod's priorityClassName is specified to a value other than cluster-services.
- the Pod's priorityClassName is set to cluster-services, it is to be created in the kube-system namespace, and it has passed the resource quota check.
A Pod creation request is rejected if its priorityClassName is set to cluster-services and it is to be created in a namespace other than kube-system.
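For illustration, a Pod such as the following falls into the third allowed case: it names the cluster-services priority class and targets kube-system, where the matching quota object lives. The Pod name, container name and image are hypothetical, and the sketch assumes a PriorityClass called cluster-services already exists in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-cluster-service       # hypothetical name
  # The only namespace with a quota matching this priority class.
  namespace: kube-system
spec:
  # Assumes a PriorityClass with this name has been created.
  priorityClassName: cluster-services
  containers:
  - name: app                         # hypothetical container name
    image: registry.example/app:1.0   # hypothetical image
```

Submitting the same manifest with a different namespace would be rejected, since no quota matching the cluster-services priority class exists there.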
3 - Process ID Limits And Reservations
Kubernetes v1.20 [stable]
Kubernetes allows you to limit the number of process IDs (PIDs) that a Pod can use. You can also reserve a number of allocatable PIDs for each node for use by the operating system and daemons (rather than by Pods).
Process IDs (PIDs) are a fundamental resource on nodes. It is trivial to hit the task limit without hitting any other resource limits, which can then cause instability to a host machine.
Cluster administrators require mechanisms to ensure that Pods running in the cluster cannot induce PID exhaustion that prevents host daemons (such as the kubelet or kube-proxy, and potentially also the container runtime) from running. In addition, it is important to ensure that PIDs are limited among Pods in order to ensure they have limited impact on other workloads on the same node.
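Note: On certain Linux installations, the operating system sets the PIDs limit to a low default, such as 32768. Consider raising the value of /proc/sys/kernel/pid_max.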
You can configure a kubelet to limit the number of PIDs a given Pod can consume.
For example, if your node's host OS is set to use a maximum of 262144 PIDs and you expect to host fewer than 250 Pods, you can give each Pod a budget of 1000 PIDs to prevent using up that node's overall number of available PIDs (250 × 1000 = 250000, which stays below 262144). If the admin wants to overcommit PIDs similar to CPU or memory, they may do so as well, with some additional risks. Either way, a single Pod will not be able to bring the whole machine down. This kind of resource limiting helps to prevent simple fork bombs from affecting operation of an entire cluster.
Per-Pod PID limiting allows administrators to protect one Pod from another, but does not ensure that all Pods scheduled onto that host are unable to impact the node overall. Per-Pod limiting also does not protect the node agents themselves from PID exhaustion.
You can also reserve an amount of PIDs for node overhead, separate from the allocation to Pods. This is similar to how you can reserve CPU, memory, or other resources for use by the operating system and other facilities outside of Pods and their containers.
PID limiting is an important sibling to compute resource requests and limits. However, you specify it in a different way: rather than defining a Pod's resource limit in the .spec for a Pod, you configure the limit as a setting on the kubelet. Pod-defined PID limits are not currently supported.
Node PID limits
Kubernetes allows you to reserve a number of process IDs for system use. To configure the reservation, use the parameter pid=<number> in the --system-reserved and --kube-reserved command line options to the kubelet. The values you specify declare the number of process IDs that will be reserved for the system as a whole and for Kubernetes system daemons, respectively.
Pod PID limits
Kubernetes allows you to limit the number of processes running in a Pod. You
specify this limit at the node level, rather than configuring it as a resource
limit for a particular Pod. Each Node can have a different PID limit.
To configure the limit, you can specify the command line parameter --pod-max-pids to the kubelet, or set PodPidsLimit in the kubelet configuration file.
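The node-level reservation and the per-Pod limit can both be expressed in a kubelet configuration file. The fragment below is a sketch under stated assumptions: all three numbers are illustrative, and a real file would carry the node's other kubelet settings as well.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Maximum number of PIDs any single Pod on this node may use
# (configuration-file counterpart of --pod-max-pids). Assumed value.
podPidsLimit: 1000
# PIDs set aside for operating system daemons
# (counterpart of --system-reserved=pid=<number>). Assumed value.
systemReserved:
  pid: "1000"
# PIDs set aside for Kubernetes system daemons
# (counterpart of --kube-reserved=pid=<number>). Assumed value.
kubeReserved:
  pid: "1000"
```

Because the limit is a kubelet setting, nodes started with different configuration files can enforce different per-Pod budgets.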
PID based eviction
You can configure the kubelet to start terminating a Pod when it is misbehaving and consuming an abnormal amount of resources. This feature is called eviction. You can Configure Out of Resource Handling for various eviction signals. Use the pid.available eviction signal to configure the threshold for the number of PIDs used by a Pod. You can set soft and hard eviction policies. However, even with the hard eviction policy, if the number of PIDs is growing very fast, the node can still get into an unstable state by hitting the node PIDs limit. The eviction signal value is calculated periodically and does NOT enforce the limit.
PID limiting, per Pod and per Node, sets the hard limit. Once the limit is hit, the workload will start experiencing failures when trying to get a new PID. This may or may not lead to rescheduling of a Pod, depending on how the workload reacts to these failures and how liveness and readiness probes are configured for the Pod. However, if limits were set correctly, you can guarantee that other Pods' workloads and system processes will not run out of PIDs when one Pod is misbehaving.
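A sketch of soft and hard eviction thresholds on the pid.available signal in a kubelet configuration file might look like this. The percentages and the grace period are assumptions chosen for illustration, and thresholds for other eviction signals are omitted for brevity.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard threshold: evict without a grace period once fewer than
# 5% of the node's PIDs remain available. Assumed value.
evictionHard:
  pid.available: "5%"
# Soft threshold: evict only if availability stays below 10%
# for the whole grace period. Assumed values.
evictionSoft:
  pid.available: "10%"
evictionSoftGracePeriod:
  pid.available: "1m"
```

As noted above, these thresholds are evaluated periodically, so they complement the per-Pod and node PID limits rather than replacing them.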
- Refer to the PID Limiting enhancement document for more information.
- For historical context, read Process ID Limiting for Stability Improvements in Kubernetes 1.14.
- Read Managing Resources for Containers.
- Learn how to Configure Out of Resource Handling.
4 - Node Resource Managers
In order to support latency-critical and high-throughput workloads, Kubernetes offers a suite of Resource Managers. The managers aim to co-ordinate and optimise node's resources alignment for pods configured with a specific requirement for CPUs, devices, and memory (hugepages) resources.
The main manager, the Topology Manager, is a Kubelet component that co-ordinates the overall resource management process through its policy.
The configuration of individual managers is elaborated in dedicated documents: