PodGroup API

FEATURE STATE: Kubernetes v1.35 [alpha](disabled by default)

A PodGroup is a runtime object that represents a group of Pods scheduled together as a single unit. While the Workload API defines scheduling policy templates, PodGroups are the runtime counterparts that carry both the policy and the scheduling status for a specific instance of that group.

What is a PodGroup?

The PodGroup API resource is part of the scheduling.k8s.io/v1alpha2 API group and your cluster must have that API group enabled, as well as the GenericWorkload feature gate, before you can use this API.
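How you enable these depends on how your cluster is deployed. As a sketch, for a control plane where you manage component flags directly (the scheduler likely needs the feature gate as well; check the documentation for your release and deployment method):

kube-apiserver --runtime-config=scheduling.k8s.io/v1alpha2=true --feature-gates=GenericWorkload=true
kube-scheduler --feature-gates=GenericWorkload=true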

A PodGroup is a self-contained scheduling unit. It defines the group of Pods that should be scheduled together, carries the scheduling policy that governs placement, and records the runtime status of that scheduling decision.

API structure

A PodGroup consists of a spec that defines the desired scheduling behavior and a status that reflects the current scheduling state.

Scheduling policy

Each PodGroup carries a scheduling policy (basic or gang) in spec.schedulingPolicy. When a workload controller creates the PodGroup, this policy is copied from the Workload's PodGroupTemplate at creation time. For standalone PodGroups, you set the policy directly.

spec:
  schedulingPolicy:
    gang:
      minCount: 4
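
If the group does not require all-or-nothing placement, the alternative is the basic policy. The empty-object shape shown here is an assumption, mirroring how the gang variant is expressed above:

spec:
  schedulingPolicy:
    basic: {}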

Template reference

The optional spec.podGroupTemplateRef links the PodGroup back to the PodGroupTemplate in the Workload it was created from. This is useful for observability and tooling.

spec:
  podGroupTemplateRef:
    workload:
      workloadName: training-policy
      podGroupTemplateName: worker

Requesting DRA devices for a PodGroup

FEATURE STATE: Kubernetes v1.36 [alpha](disabled by default)

Devices available through Dynamic Resource Allocation (DRA) can be requested by a PodGroup through its spec.resourceClaims field:

apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-group
  namespace: some-ns
spec:
  ...
  resourceClaims:
  - name: pg-claim
    resourceClaimName: my-pg-claim
  - name: pg-claim-template
    resourceClaimTemplateName: my-pg-template

ResourceClaims associated with a PodGroup can be shared by all Pods belonging to the group. Because the ResourceClaim's status.reservedFor holds a single reference to the PodGroup rather than an entry for each individual Pod, any number of Pods in the group can share the claim. ResourceClaims can also be generated from a ResourceClaimTemplate for each PodGroup, so that the devices allocated to each generated claim are shared by that group's Pods.
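
With PodGroup-level reservation, the claim's status records the group once instead of each consuming Pod. An illustrative sketch (the exact shape of a PodGroup entry in reservedFor is an assumption, modeled on DRA's consumer reference fields):

status:
  reservedFor:
  - apiGroup: scheduling.k8s.io
    resource: podgroups
    name: training-group
    uid: "..."   # UID of the PodGroup object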

For more details and a more complete example, see the DRA documentation.

Status

The scheduler updates status.conditions to report whether the group has been successfully scheduled. The primary condition is PodGroupScheduled, which is True when all required Pods have been placed and False when scheduling fails.
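
For example, the status of a successfully scheduled group might look like this (illustrative; the reason value shown is hypothetical):

status:
  conditions:
  - type: PodGroupScheduled
    status: "True"
    reason: Scheduled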

Note:

The PodGroupScheduled condition reflects the initial scheduling decision only. The scheduler does not update it if Pods later fail or are evicted. See Limitations for details.

See the PodGroup lifecycle page for the full list of conditions and reasons.

Creating a PodGroup

As noted earlier, the PodGroup API resource is part of the scheduling.k8s.io/v1alpha2 API group; your cluster must have that API group enabled, along with the GenericWorkload feature gate, before you can use this API.

The following manifest creates a PodGroup with a gang scheduling policy that requires at least 4 Pods to be schedulable simultaneously:

apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-worker-0
  namespace: default
spec:
  schedulingPolicy:
    gang:
      minCount: 4

You can inspect PodGroups in your cluster:

kubectl get podgroups

To see the full status including scheduling conditions:

kubectl describe podgroup training-worker-0
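
In scripts, you can extract just the condition's status with a JSONPath query (this assumes the PodGroupScheduled condition type described above):

kubectl get podgroup training-worker-0 -o jsonpath='{.status.conditions[?(@.type=="PodGroupScheduled")].status}'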

How it fits together

The relationship between controllers, Workloads, PodGroups, and Pods follows this pattern:

  1. The workload controller creates a Workload that defines PodGroupTemplates with scheduling policies.
  2. For each runtime instance, the controller creates a PodGroup from one of the Workload's PodGroupTemplates.
  3. The controller creates Pods that reference the PodGroup via the spec.schedulingGroup.podGroupName field.

For now, the Job controller is the only built-in workload controller that follows this pattern. Custom controllers can implement the same flow for their own workload types.

apiVersion: scheduling.k8s.io/v1alpha2
kind: Workload
metadata:
  name: training-policy
spec:
  podGroupTemplates:
  - name: worker
    schedulingPolicy:
      gang:
        minCount: 4
---
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-worker-0
spec:
  podGroupTemplateRef:
    workload:
      workloadName: training-policy
      podGroupTemplateName: worker
  schedulingPolicy:
    gang:
      minCount: 4
---
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
spec:
  schedulingGroup:
    podGroupName: training-worker-0
  containers:
  - name: ml-worker
    image: training:v1

The Workload acts as a long-lived policy definition, while PodGroups handle the transient, per-instance runtime state. This separation means that status updates for individual PodGroups do not contend on the shared Workload object.

What's next

PodGroup Lifecycle

FEATURE STATE: Kubernetes v1.35 [alpha](disabled by default)

A PodGroup is scheduled as a unit and protected from premature deletion while its Pods are still running.

Ownership and lifecycle

PodGroups are owned by the workload controller that created them (for example, a Job) via standard ownerReferences. When the owning object is deleted, PodGroups are automatically garbage collected.

PodGroup names must be unique within a namespace and must be valid DNS subdomains.

Creation ordering

Controllers must create objects in this order:

  1. Workload — the scheduling policy template.
  2. PodGroup — the runtime instance.
  3. Pods — with spec.schedulingGroup.podGroupName pointing to the PodGroup.

If a PodGroup includes a podGroupTemplateRef that points to a Workload that does not exist (or is being deleted), the API server rejects the PodGroup creation request. The referenced Workload must exist before the PodGroup can be created.

If a Pod references a PodGroup that does not yet exist, the Pod remains pending. The scheduler automatically queues the Pod for scheduling once the PodGroup is created.

Deletion protection

A PodGroup cannot be fully deleted while any of its Pods are still running. A dedicated finalizer ensures that deletion is blocked until all Pods referencing the PodGroup have reached a terminal phase (Succeeded or Failed).
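
To see which finalizers are currently blocking deletion of a PodGroup (using the example PodGroup from earlier):

kubectl get podgroup training-worker-0 -o jsonpath='{.metadata.finalizers}'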

Controller-managed and user-managed PodGroups

In most cases, workload controllers (for example, Job) create PodGroups automatically (controller-managed). The controller determines the podGroupName for each Pod at creation time, similar to how a DaemonSet sets node affinity per Pod.

If you need more control over naming and lifecycle, you can create PodGroup objects directly and set spec.schedulingGroup.podGroupName in your Pod templates yourself (user-managed).

Limitations

  • All Pods in a PodGroup must use the same .spec.schedulerName. If a mismatch is detected, the scheduler rejects all Pods in the group as unschedulable.
  • The spec.schedulingPolicy.gang.minCount field on a PodGroup is immutable. Once created, you cannot change the minimum number of Pods that must be schedulable for the group to be admitted.
  • The spec.schedulingGroup field on a Pod is immutable. Once set, a Pod cannot move to a different PodGroup.
  • The maximum number of PodGroupTemplates in a single Workload is 8.
  • The PodGroupScheduled condition reflects the outcome of the initial scheduling attempt only. Once the condition is set to True, the scheduler does not update it if Pods later fail, are evicted, or stop running.

What's next