Note: Impatient readers may head straight to Quick Start.
Using previous version of Kubebuilder v1 or v2? Check the legacy documentation for v1, v2 or v3
Who is this for
Users of Kubernetes
Users of Kubernetes will develop a deeper understanding of Kubernetes through learning the fundamental concepts behind how APIs are designed and implemented. This book will teach readers how to develop their own Kubernetes APIs and the principles from which the core Kubernetes APIs are designed.
Including:
- The structure of Kubernetes APIs and Resources
- API versioning semantics
- Self-healing
- Garbage Collection and Finalizers
- Declarative vs Imperative APIs
- Level-Based vs Edge-Base APIs
- Resources vs Subresources
Kubernetes API extension developers
API extension developers will learn the principles and concepts behind implementing canonical Kubernetes APIs, as well as simple tools and libraries for rapid execution. This book covers pitfalls and misconceptions that extension developers commonly encounter.
Including:
- How to batch multiple events into a single reconciliation call
- How to configure periodic reconciliation
- Forthcoming
- When to use the lister cache vs live lookups
- Garbage Collection vs Finalizers
- How to use Declarative vs Webhook Validation
- How to implement API versioning
Why Kubernetes APIs
Kubernetes APIs provide consistent and well defined endpoints for objects adhering to a consistent and rich structure.
This approach has fostered a rich ecosystem of tools and libraries for working with Kubernetes APIs.
Users work with the APIs through declaring objects as yaml or json config, and using common tooling to manage the objects.
Building services as Kubernetes APIs provides many advantages to plain old REST, including:
- Hosted API endpoints, storage, and validation.
- Rich tooling and CLIs such as
kubectl
andkustomize
. - Support for AuthN and granular AuthZ.
- Support for API evolution through API versioning and conversion.
- Facilitation of adaptive / self-healing APIs that continuously respond to changes in the system state without user intervention.
- Kubernetes as a hosting environment
Developers may build and publish their own Kubernetes APIs for installation into running Kubernetes clusters.
Contribution
If you like to contribute to either this book or the code, please be so kind to read our Contribution guidelines first.
Resources
-
Repository: sigs.k8s.io/kubebuilder
-
Slack channel: #kubebuilder
-
Google Group: kubebuilder@googlegroups.com
Architecture Concept Diagram
The following diagram will help you get a better idea over the Kubebuilder concepts and architecture.
Quick Start
This Quick Start guide will cover:
Prerequisites
- go version v1.20.0+
- docker version 17.03+.
- kubectl version v1.11.3+.
- Access to a Kubernetes v1.11.3+ cluster.
Installation
Install kubebuilder:
# download kubebuilder and install locally.
curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/
Create a Project
Create a directory, and then run the init command inside of it to initialize a new project. Follows an example.
mkdir -p ~/projects/guestbook
cd ~/projects/guestbook
kubebuilder init --domain my.domain --repo my.domain/guestbook
Create an API
Run the following command to create a new API (group/version) as webapp/v1
and the new Kind(CRD) Guestbook
on it:
kubebuilder create api --group webapp --version v1 --kind Guestbook
OPTIONAL: Edit the API definition and the reconciliation business logic. For more info see Designing an API and What’s in a Controller.
If you are editing the API definitions, generate the manifests such as Custom Resources (CRs) or Custom Resource Definitions (CRDs) using
make manifests
Click here to see an example. (api/v1/guestbook_types.go)
// GuestbookSpec defines the desired state of Guestbook
type GuestbookSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make" to regenerate code after modifying this file
// Quantity of instances
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10
Size int32 `json:"size"`
// Name of the ConfigMap for GuestbookSpec's configuration
// +kubebuilder:validation:MaxLength=15
// +kubebuilder:validation:MinLength=1
ConfigMapName string `json:"configMapName"`
// +kubebuilder:validation:Enum=Phone;Address;Name
Type string `json:"alias,omitempty"`
}
// GuestbookStatus defines the observed state of Guestbook
type GuestbookStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file
// PodName of the active Guestbook node.
Active string `json:"active"`
// PodNames of the standby Guestbook nodes.
Standby []string `json:"standby"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Cluster
// Guestbook is the Schema for the guestbooks API
type Guestbook struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec GuestbookSpec `json:"spec,omitempty"`
Status GuestbookStatus `json:"status,omitempty"`
}
Test It Out
You’ll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster.
Install the CRDs into the cluster:
make install
For quick feedback and code-level debugging, run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):
make run
Install Instances of Custom Resources
If you pressed y
for Create Resource [y/n] then you created a CR for your CRD in your samples (make sure to edit them first if you’ve changed the
API definition):
kubectl apply -k config/samples/
Run It On the Cluster
When your controller is ready to be packaged and tested in other clusters.
Build and push your image to the location specified by IMG
:
make docker-build docker-push IMG=<some-registry>/<project-name>:tag
Deploy the controller to the cluster with image specified by IMG
:
make deploy IMG=<some-registry>/<project-name>:tag
Uninstall CRDs
To delete your CRDs from the cluster:
make uninstall
Undeploy controller
Undeploy the controller to the cluster:
make undeploy
Next Step
- Now, take a look at the Architecture Concept Diagram for a clearer overview.
- Next, proceed with the Getting Started Guide, which should take no more than 30 minutes and will provide a solid foundation. Afterward, dive into the CronJob Tutorial to deepen your understanding by developing a demo project.
Getting Started
We will create a sample project to let you know how it works. This sample will:
- Reconcile a Memcached CR - which represents an instance of a Memcached deployed/managed on cluster
- Create a Deployment with the Memcached image
- Not allow more instances than the size defined in the CR which will be applied
- Update the Memcached CR status
Create a project
First, create and navigate into a directory for your project. Then, initialize it using kubebuilder
:
mkdir $GOPATH/memcached-operator
cd $GOPATH/memcached-operator
kubebuilder init --domain=example.com
Create the Memcached API (CRD):
Next, we’ll create the API which will be responsible for deploying and managing Memcached(s) instances on the cluster.
kubebuilder create api --group cache --version v1alpha1 --kind Memcached
Understanding APIs
This command’s primary aim is to produce the Custom Resource (CR) and Custom Resource Definition (CRD) for the Memcached Kind.
It creates the API with the group cache.example.com
and version v1alpha1
, uniquely identifying the new CRD of the Memcached Kind.
By leveraging the Kubebuilder tool, we can define our APIs and objects representing our solutions for these platforms.
While we’ve added only one Kind of resource in this example, we can have as many Groups
and Kinds
as necessary.
To make it easier to understand, think of CRDs as the definition of our custom Objects, while CRs are instances of them.
Defining our API
Defining the Specs
Now, we will define the values that each instance of your Memcached resource on the cluster can assume. In this example, we will allow configuring the number of instances with the following:
type MemcachedSpec struct {
...
Size int32 `json:"size,omitempty"`
}
Creating Status definitions
We also want to track the status of our Operations which will be done to manage the Memcached CR(s). This allows us to verify the Custom Resource’s description of our own API and determine if everything occurred successfully or if any errors were encountered, similar to how we do with any resource from the Kubernetes API.
// MemcachedStatus defines the observed state of Memcached
type MemcachedStatus struct {
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"`
}
Markers and validations
Furthermore, we want to validate the values added in our CustomResource
to ensure that those are valid. To do it we are will use refer markers,
such as +kubebuilder:validation:Minimum=1
.
Now, see our example fully completed.
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Imports
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
// MemcachedSpec defines the desired state of Memcached.
type MemcachedSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make" to regenerate code after modifying this file
// Size defines the number of Memcached instances
// The following markers will use OpenAPI v3 schema to validate the value
// More info: https://book.kubebuilder.io/reference/markers/crd-validation.html
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=3
// +kubebuilder:validation:ExclusiveMaximum=false
Size int32 `json:"size,omitempty"`
}
// MemcachedStatus defines the observed state of Memcached.
type MemcachedStatus struct {
// Represents the observations of a Memcached's current state.
// Memcached.status.conditions.type are: "Available", "Progressing", and "Degraded"
// Memcached.status.conditions.status are one of True, False, Unknown.
// Memcached.status.conditions.reason the value should be a CamelCase string and producers of specific
// condition types may define expected values and meanings for this field, and whether the values
// are considered a guaranteed API.
// Memcached.status.conditions.Message is a human readable message indicating details about the transition.
// For further information see: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// Memcached is the Schema for the memcacheds API.
type Memcached struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec MemcachedSpec `json:"spec,omitempty"`
Status MemcachedStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// MemcachedList contains a list of Memcached.
type MemcachedList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []Memcached `json:"items"`
}
func init() {
SchemeBuilder.Register(&Memcached{}, &MemcachedList{})
}
Generating manifests with the specs and validations
To generate the required CRDs we will run make generate
command, which will call controller-gen
to generate the CRD manifest, which is located under the config/crd/bases
directory.
config/crd/bases/cache.example.com_memcacheds.yam
: Our Memcached CRD
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.4
name: memcacheds.cache.example.com
spec:
group: cache.example.com
names:
kind: Memcached
listKind: MemcachedList
plural: memcacheds
singular: memcached
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: Memcached is the Schema for the memcacheds API.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: MemcachedSpec defines the desired state of Memcached.
properties:
size:
description: |-
Size defines the number of Memcached instances
The following markers will use OpenAPI v3 schema to validate the value
More info: https://book.kubebuilder.io/reference/markers/crd-validation.html
format: int32
maximum: 3
minimum: 1
type: integer
type: object
status:
description: MemcachedStatus defines the observed state of Memcached.
properties:
conditions:
items:
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
type: object
type: object
served: true
storage: true
subresources:
status: {}
Sample of Custom Resources
The manifests located under the config/samples
directory serve as examples of Custom Resources that can be applied to the cluster.
In this particular example, by applying the given resource to the cluster, we would generate
a Deployment with a single instance size (see size: 1
).
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
labels:
app.kubernetes.io/name: project
app.kubernetes.io/managed-by: kustomize
name: memcached-sample
spec:
# TODO(user): edit the following value to ensure the number
# of Pods/Instances your Operand must have on cluster
size: 1
Reconciliation Process
In a simplified way, Kubernetes works by allowing us to declare the desired state of our system, and then its controllers continuously observe the cluster and take actions to ensure that the actual state matches the desired state. For our custom APIs and controllers, the process is similar. Remember, we are extending Kubernetes’ behaviors and its APIs to fit our specific needs.
In our controller, we will implement a reconciliation process.
Essentially, the reconciliation process functions as a loop, continuously checking conditions and performing necessary actions until the desired state is achieved. This process will keep running until all conditions in the system align with the desired state defined in our implementation.
Here’s a pseudo-code example to illustrate this:
reconcile App {
// Check if a Deployment for the app exists, if not, create one
// If there's an error, then restart from the beginning of the reconcile
if err != nil {
return reconcile.Result{}, err
}
// Check if a Service for the app exists, if not, create one
// If there's an error, then restart from the beginning of the reconcile
if err != nil {
return reconcile.Result{}, err
}
// Look for Database CR/CRD
// Check the Database Deployment's replicas size
// If deployment.replicas size doesn't match cr.size, then update it
// Then, restart from the beginning of the reconcile. For example, by returning `reconcile.Result{Requeue: true}, nil`.
if err != nil {
return reconcile.Result{Requeue: true}, nil
}
...
// If at the end of the loop:
// Everything was executed successfully, and the reconcile can stop
return reconcile.Result{}, nil
}
In the context of our example
When our sample Custom Resource (CR) is applied to the cluster (i.e. kubectl apply -f config/sample/cache_v1alpha1_memcached.yaml
),
we want to ensure that a Deployment is created for our Memcached image and that it matches the number of replicas defined in the CR.
To achieve this, we need to first implement an operation that checks whether the Deployment for our Memcached instance already exists on the cluster. If it does not, the controller will create the Deployment accordingly. Therefore, our reconciliation process must include an operation to ensure that this desired state is consistently maintained. This operation would involve:
// Check if the deployment already exists, if not create a new one
found := &appsv1.Deployment{}
err = r.Get(ctx, types.NamespacedName{Name: memcached.Name, Namespace: memcached.Namespace}, found)
if err != nil && apierrors.IsNotFound(err) {
// Define a new deployment
dep := r.deploymentForMemcached()
// Create the Deployment on the cluster
if err = r.Create(ctx, dep); err != nil {
log.Error(err, "Failed to create new Deployment",
"Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
return ctrl.Result{}, err
}
...
}
Next, note that the deploymentForMemcached()
function will need to define and return the Deployment that should be
created on the cluster. This function should construct the Deployment object with the necessary
specifications, as demonstrated in the following example:
dep := &appsv1.Deployment{
Spec: appsv1.DeploymentSpec{
Replicas: &replicas,
Template: corev1.PodTemplateSpec{
Spec: corev1.PodSpec{
Containers: []corev1.Container{{
Image: "memcached:1.6.26-alpine3.19",
Name: "memcached",
ImagePullPolicy: corev1.PullIfNotPresent,
Ports: []corev1.ContainerPort{{
ContainerPort: 11211,
Name: "memcached",
}},
Command: []string{"memcached", "--memory-limit=64", "-o", "modern", "-v"},
}},
},
},
},
}
Additionally, we need to implement a mechanism to verify that the number of Memcached replicas on the cluster matches the desired count specified in the Custom Resource (CR). If there is a discrepancy, the reconciliation must update the cluster to ensure consistency. This means that whenever a CR of the Memcached Kind is created or updated on the cluster, the controller will continuously reconcile the state until the actual number of replicas matches the desired count. The following example illustrates this process:
...
size := memcached.Spec.Size
if *found.Spec.Replicas != size {
found.Spec.Replicas = &size
if err = r.Update(ctx, found); err != nil {
log.Error(err, "Failed to update Deployment",
"Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
return ctrl.Result{}, err
}
...
Now, you can review the complete controller responsible for managing Custom Resources of the Memcached Kind. This controller ensures that the desired state is maintained in the cluster, making sure that our Memcached instance continues running with the number of replicas specified by the users.
internal/controller/memcached_controller.go
: Our Controller Implementation
/*
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package controller
import (
"context"
"fmt"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"time"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
cachev1alpha1 "example.com/memcached/api/v1alpha1"
)
// Definitions to manage status conditions
const (
// typeAvailableMemcached represents the status of the Deployment reconciliation
typeAvailableMemcached = "Available"
// typeDegradedMemcached represents the status used when the custom resource is deleted and the finalizer operations are yet to occur.
typeDegradedMemcached = "Degraded"
)
// MemcachedReconciler reconciles a Memcached object
type MemcachedReconciler struct {
client.Client
Scheme *runtime.Scheme
}
// +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/finalizers,verbs=update
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// It is essential for the controller's reconciliation loop to be idempotent. By following the Operator
// pattern you will create Controllers which provide a reconcile function
// responsible for synchronizing resources until the desired state is reached on the cluster.
// Breaking this recommendation goes against the design principles of controller-runtime.
// and may lead to unforeseen consequences such as resources becoming stuck and requiring manual intervention.
// For further info:
// - About Operator Pattern: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
// - About Controllers: https://kubernetes.io/docs/concepts/architecture/controller/
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/reconcile
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
// Fetch the Memcached instance
// The purpose is check if the Custom Resource for the Kind Memcached
// is applied on the cluster if not we return nil to stop the reconciliation
memcached := &cachev1alpha1.Memcached{}
err := r.Get(ctx, req.NamespacedName, memcached)
if err != nil {
if apierrors.IsNotFound(err) {
// If the custom resource is not found then it usually means that it was deleted or not created
// In this way, we will stop the reconciliation
log.Info("memcached resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
// Error reading the object - requeue the request.
log.Error(err, "Failed to get memcached")
return ctrl.Result{}, err
}
// Let's just set the status as Unknown when no status is available
if memcached.Status.Conditions == nil || len(memcached.Status.Conditions) == 0 {
meta.SetStatusCondition(&memcached.Status.Conditions, metav1.Condition{Type: typeAvailableMemcached, Status: metav1.ConditionUnknown, Reason: "Reconciling", Message: "Starting reconciliation"})
if err = r.Status().Update(ctx, memcached); err != nil {
log.Error(err, "Failed to update Memcached status")
return ctrl.Result{}, err
}
// Let's re-fetch the memcached Custom Resource after updating the status
// so that we have the latest state of the resource on the cluster and we will avoid
// raising the error "the object has been modified, please apply
// your changes to the latest version and try again" which would re-trigger the reconciliation
// if we try to update it again in the following operations
if err := r.Get(ctx, req.NamespacedName, memcached); err != nil {
log.Error(err, "Failed to re-fetch memcached")
return ctrl.Result{}, err
}
}
// Check if the deployment already exists, if not create a new one
found := &appsv1.Deployment{}
err = r.Get(ctx, types.NamespacedName{Name: memcached.Name, Namespace: memcached.Namespace}, found)
if err != nil && apierrors.IsNotFound(err) {
// Define a new deployment
dep, err := r.deploymentForMemcached(memcached)
if err != nil {
log.Error(err, "Failed to define new Deployment resource for Memcached")
// The following implementation will update the status
meta.SetStatusCondition(&memcached.Status.Conditions, metav1.Condition{Type: typeAvailableMemcached,
Status: metav1.ConditionFalse, Reason: "Reconciling",
Message: fmt.Sprintf("Failed to create Deployment for the custom resource (%s): (%s)", memcached.Name, err)})
if err := r.Status().Update(ctx, memcached); err != nil {
log.Error(err, "Failed to update Memcached status")
return ctrl.Result{}, err
}
return ctrl.Result{}, err
}
log.Info("Creating a new Deployment",
"Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
if err = r.Create(ctx, dep); err != nil {
log.Error(err, "Failed to create new Deployment",
"Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
return ctrl.Result{}, err
}
// Deployment created successfully
// We will requeue the reconciliation so that we can ensure the state
// and move forward for the next operations
return ctrl.Result{RequeueAfter: time.Minute}, nil
} else if err != nil {
log.Error(err, "Failed to get Deployment")
// Let's return the error for the reconciliation be re-trigged again
return ctrl.Result{}, err
}
// The CRD API defines that the Memcached type have a MemcachedSpec.Size field
// to set the quantity of Deployment instances to the desired state on the cluster.
// Therefore, the following code will ensure the Deployment size is the same as defined
// via the Size spec of the Custom Resource which we are reconciling.
size := memcached.Spec.Size
if *found.Spec.Replicas != size {
found.Spec.Replicas = &size
if err = r.Update(ctx, found); err != nil {
log.Error(err, "Failed to update Deployment",
"Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
// Re-fetch the memcached Custom Resource before updating the status
// so that we have the latest state of the resource on the cluster and we will avoid
// raising the error "the object has been modified, please apply
// your changes to the latest version and try again" which would re-trigger the reconciliation
if err := r.Get(ctx, req.NamespacedName, memcached); err != nil {
log.Error(err, "Failed to re-fetch memcached")
return ctrl.Result{}, err
}
// The following implementation will update the status
meta.SetStatusCondition(&memcached.Status.Conditions, metav1.Condition{Type: typeAvailableMemcached,
Status: metav1.ConditionFalse, Reason: "Resizing",
Message: fmt.Sprintf("Failed to update the size for the custom resource (%s): (%s)", memcached.Name, err)})
if err := r.Status().Update(ctx, memcached); err != nil {
log.Error(err, "Failed to update Memcached status")
return ctrl.Result{}, err
}
return ctrl.Result{}, err
}
// Now, that we update the size we want to requeue the reconciliation
// so that we can ensure that we have the latest state of the resource before
// update. Also, it will help ensure the desired state on the cluster
return ctrl.Result{Requeue: true}, nil
}
// The following implementation will update the status
meta.SetStatusCondition(&memcached.Status.Conditions, metav1.Condition{Type: typeAvailableMemcached,
Status: metav1.ConditionTrue, Reason: "Reconciling",
Message: fmt.Sprintf("Deployment for custom resource (%s) with %d replicas created successfully", memcached.Name, size)})
if err := r.Status().Update(ctx, memcached); err != nil {
log.Error(err, "Failed to update Memcached status")
return ctrl.Result{}, err
}
return ctrl.Result{}, nil
}
// SetupWithManager sets up the controller with the Manager.
func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&cachev1alpha1.Memcached{}).
Owns(&appsv1.Deployment{}).
Named("memcached").
Complete(r)
}
// deploymentForMemcached returns a Memcached Deployment object
func (r *MemcachedReconciler) deploymentForMemcached(
memcached *cachev1alpha1.Memcached) (*appsv1.Deployment, error) {
replicas := memcached.Spec.Size
image := "memcached:1.6.26-alpine3.19"
dep := &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: memcached.Name,
Namespace: memcached.Namespace,
},
Spec: appsv1.DeploymentSpec{
Replicas: &replicas,
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{"app.kubernetes.io/name": "project"},
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{"app.kubernetes.io/name": "project"},
},
Spec: corev1.PodSpec{
SecurityContext: &corev1.PodSecurityContext{
RunAsNonRoot: &[]bool{true}[0],
SeccompProfile: &corev1.SeccompProfile{
Type: corev1.SeccompProfileTypeRuntimeDefault,
},
},
Containers: []corev1.Container{{
Image: image,
Name: "memcached",
ImagePullPolicy: corev1.PullIfNotPresent,
// Ensure restrictive context for the container
// More info: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
SecurityContext: &corev1.SecurityContext{
RunAsNonRoot: &[]bool{true}[0],
RunAsUser: &[]int64{1001}[0],
AllowPrivilegeEscalation: &[]bool{false}[0],
Capabilities: &corev1.Capabilities{
Drop: []corev1.Capability{
"ALL",
},
},
},
Ports: []corev1.ContainerPort{{
ContainerPort: 11211,
Name: "memcached",
}},
Command: []string{"memcached", "--memory-limit=64", "-o", "modern", "-v"},
}},
},
},
},
}
// Set the ownerRef for the Deployment
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/
if err := ctrl.SetControllerReference(memcached, dep, r.Scheme); err != nil {
return nil, err
}
return dep, nil
}
Diving Into the Controller Implementation
Setting Manager to Watching Resources
The whole idea is to be Watching the resources that matter for the controller. When a resource that the controller is interested in changes, the Watch triggers the controller’s reconciliation loop, ensuring that the actual state of the resource matches the desired state as defined in the controller’s logic.
Notice how we configured the Manager to monitor events such as the creation, update, or deletion of a Custom Resource (CR) of the Memcached kind, as well as any changes to the Deployment that the controller manages and owns:
// SetupWithManager sets up the controller with the Manager.
// The Deployment is also watched to ensure its
// desired state in the cluster.
func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
// Watch the Memcached Custom Resource and trigger reconciliation whenever it
//is created, updated, or deleted
For(&cachev1alpha1.Memcached{}).
// Watch the Deployment managed by the Memcached controller. If any changes occur to the Deployment
// owned and managed by this controller, it will trigger reconciliation, ensuring that the cluster
// state aligns with the desired state.
Owns(&appsv1.Deployment{}).
Complete(r)
}
But, How Does the Manager Know Which Resources Are Owned by It?
We do not want our Controller to watch any Deployment on the cluster and trigger our reconciliation loop. Instead, we only want to trigger reconciliation when the specific Deployment running our Memcached instance is changed. For example, if someone accidentally deletes our Deployment or changes the number of replicas, we want to trigger the reconciliation to ensure that it returns to the desired state.
The Manager knows which Deployment to observe because we set the ownerRef
(Owner Reference):
if err := ctrl.SetControllerReference(memcached, dep, r.Scheme); err != nil {
return nil, err
}
Granting Permissions
It’s important to ensure that the Controller has the necessary permissions(i.e. to create, get, update, and list) the resources it manages.
The RBAC permissions are now configured via RBAC markers, which are used to generate and update the
manifest files present in config/rbac/
. These markers can be found (and should be defined) on the Reconcile()
method of each controller, see
how it is implemented in our example:
// +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/finalizers,verbs=update
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
After making changes to the controller, run the make generate command. This will prompt controller-gen
to refresh the files located under config/rbac
.
config/rbac/role.yaml
: Our RBAC Role generated
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: manager-role
rules:
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- watch
- apiGroups:
- apps
resources:
- deployments
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- cache.example.com
resources:
- memcacheds
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- cache.example.com
resources:
- memcacheds/finalizers
verbs:
- update
- apiGroups:
- cache.example.com
resources:
- memcacheds/status
verbs:
- get
- patch
- update
Manager (main.go)
The Manager in the cmd/main.go
file is responsible for managing the controllers in your application.
cmd/main.gol
: Our main.go
/*
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
"crypto/tls"
"flag"
"os"
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"
cachev1alpha1 "example.com/memcached/api/v1alpha1"
"example.com/memcached/internal/controller"
// +kubebuilder:scaffold:imports
)
var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
)
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(cachev1alpha1.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
func main() {
var metricsAddr string
var enableLeaderElection bool
var probeAddr string
var secureMetrics bool
var enableHTTP2 bool
var tlsOpts []func(*tls.Config)
flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.BoolVar(&secureMetrics, "metrics-secure", true,
"If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.")
flag.BoolVar(&enableHTTP2, "enable-http2", false,
"If set, HTTP/2 will be enabled for the metrics and webhook servers")
opts := zap.Options{
Development: true,
}
opts.BindFlags(flag.CommandLine)
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
// if the enable-http2 flag is false (the default), http/2 should be disabled
// due to its vulnerabilities. More specifically, disabling http/2 will
// prevent from being vulnerable to the HTTP/2 Stream Cancellation and
// Rapid Reset CVEs. For more information see:
// - https://github.com/advisories/GHSA-qppj-fm5r-hxr3
// - https://github.com/advisories/GHSA-4374-p667-p6c8
disableHTTP2 := func(c *tls.Config) {
setupLog.Info("disabling http/2")
c.NextProtos = []string{"http/1.1"}
}
if !enableHTTP2 {
tlsOpts = append(tlsOpts, disableHTTP2)
}
webhookServer := webhook.NewServer(webhook.Options{
TLSOpts: tlsOpts,
})
// Metrics endpoint is enabled in 'config/default/kustomization.yaml'. The Metrics options configure the server.
// More info:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/metrics/server
// - https://book.kubebuilder.io/reference/metrics.html
metricsServerOptions := metricsserver.Options{
BindAddress: metricsAddr,
SecureServing: secureMetrics,
// TODO(user): TLSOpts is used to allow configuring the TLS config used for the server. If certificates are
// not provided, self-signed certificates will be generated by default. This option is not recommended for
// production environments as self-signed certificates do not offer the same level of trust and security
// as certificates issued by a trusted Certificate Authority (CA). The primary risk is potentially allowing
// unauthorized access to sensitive metrics data. Consider replacing with CertDir, CertName, and KeyName
// to provide certificates, ensuring the server communicates using trusted and secure certificates.
TLSOpts: tlsOpts,
}
if secureMetrics {
// FilterProvider is used to protect the metrics endpoint with authn/authz.
// These configurations ensure that only authorized users and service accounts
// can access the metrics endpoint. The RBAC are configured in 'config/rbac/kustomization.yaml'. More info:
// https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/metrics/filters#WithAuthenticationAndAuthorization
metricsServerOptions.FilterProvider = filters.WithAuthenticationAndAuthorization
}
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
Metrics: metricsServerOptions,
WebhookServer: webhookServer,
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "4b13cc52.example.com",
// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader don't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after
// the manager stops, so would be fine to enable this option. However,
// if you are doing or is intended to do any operation such as perform cleanups
// after the manager stops then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,
})
if err != nil {
setupLog.Error(err, "unable to start manager")
os.Exit(1)
}
if err = (&controller.MemcachedReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "Memcached")
os.Exit(1)
}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up health check")
os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up ready check")
os.Exit(1)
}
setupLog.Info("starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
setupLog.Error(err, "problem running manager")
os.Exit(1)
}
}
Checking the Project running in the cluster
At this point you can check the steps to validate the project on the cluster by looking the steps defined in the Quick Start, see: Run It On the Cluster
Next Steps
- To delve deeper into developing your solution, consider going through the CronJob Tutorial
- For insights on optimizing your approach, refer to the Best Practices documentation.
Tutorial: Building CronJob
Too many tutorials start out with some really contrived setup, or some toy application that gets the basics across, and then stalls out on the more complicated stuff. Instead, this tutorial should take you through (almost) the full gamut of complexity with Kubebuilder, starting off simple and building up to something pretty full-featured.
Let’s pretend (and sure, this is a teensy bit contrived) that we’ve finally gotten tired of the maintenance burden of the non-Kubebuilder implementation of the CronJob controller in Kubernetes, and we’d like to rewrite it using Kubebuilder.
The job (no pun intended) of the CronJob controller is to run one-off tasks on the Kubernetes cluster at regular intervals. It does this by building on top of the Job controller, whose task is to run one-off tasks once, seeing them to completion.
Instead of trying to tackle rewriting the Job controller as well, we’ll use this as an opportunity to see how to interact with external types.
Scaffolding Out Our Project
As covered in the quick start, we’ll need to scaffold out a new project. Make sure you’ve installed Kubebuilder, then scaffold out a new project:
# create a project directory, and then run the init command.
mkdir project
cd project
# we'll use a domain of tutorial.kubebuilder.io,
# so all API groups will be <group>.tutorial.kubebuilder.io.
kubebuilder init --domain tutorial.kubebuilder.io --repo tutorial.kubebuilder.io/project
Now that we’ve got a project in place, let’s take a look at what Kubebuilder has scaffolded for us so far…
What’s in a basic project?
When scaffolding out a new project, Kubebuilder provides us with a few basic pieces of boilerplate.
Build Infrastructure
First up, basic infrastructure for building your project:
go.mod
: A new Go module matching our project, with
basic dependencies
module tutorial.kubebuilder.io/project
go 1.22.0
require (
github.com/onsi/ginkgo/v2 v2.19.0
github.com/onsi/gomega v1.33.1
github.com/robfig/cron v1.2.0
k8s.io/api v0.31.0
k8s.io/apimachinery v0.31.0
k8s.io/client-go v0.31.0
sigs.k8s.io/controller-runtime v0.19.0
)
require (
github.com/antlr4-go/antlr/v4 v4.13.0 // indirect
github.com/asaskevich/govalidator v0.0.0-20190424111038-f61b66f89f4a // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/blang/semver/v4 v4.0.0 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
github.com/evanphx/json-patch/v5 v5.9.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-logr/zapr v1.3.0 // indirect
github.com/go-openapi/jsonpointer v0.19.6 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/swag v0.22.4 // indirect
github.com/go-task/slim-sprig/v3 v3.0.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/cel-go v0.20.1 // indirect
github.com/google/gnostic-models v0.6.8 // indirect
github.com/google/go-cmp v0.6.0 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/pprof v0.0.0-20240525223248-4bfdf5a9a2af // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0 // indirect
github.com/imdario/mergo v0.3.6 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/prometheus/client_golang v1.19.1 // indirect
github.com/prometheus/client_model v0.6.1 // indirect
github.com/prometheus/common v0.55.0 // indirect
github.com/prometheus/procfs v0.15.1 // indirect
github.com/spf13/cobra v1.8.1 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/stoewer/go-strcase v1.2.0 // indirect
github.com/x448/float16 v0.8.4 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.53.0 // indirect
go.opentelemetry.io/otel v1.28.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.28.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0 // indirect
go.opentelemetry.io/otel/metric v1.28.0 // indirect
go.opentelemetry.io/otel/sdk v1.28.0 // indirect
go.opentelemetry.io/otel/trace v1.28.0 // indirect
go.opentelemetry.io/proto/otlp v1.3.1 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.26.0 // indirect
golang.org/x/exp v0.0.0-20230515195305-f3d0a9c9a5cc // indirect
golang.org/x/net v0.26.0 // indirect
golang.org/x/oauth2 v0.21.0 // indirect
golang.org/x/sync v0.7.0 // indirect
golang.org/x/sys v0.21.0 // indirect
golang.org/x/term v0.21.0 // indirect
golang.org/x/text v0.16.0 // indirect
golang.org/x/time v0.3.0 // indirect
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d // indirect
gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240528184218-531527333157 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240701130421-f6361c86f094 // indirect
google.golang.org/grpc v1.65.0 // indirect
google.golang.org/protobuf v1.34.2 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/apiextensions-apiserver v0.31.0 // indirect
k8s.io/apiserver v0.31.0 // indirect
k8s.io/component-base v0.31.0 // indirect
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 // indirect
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.30.3 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)
Makefile
: Make targets for building and deploying your controller
# Image URL to use all building/pushing image targets
IMG ?= controller:latest
# ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
ENVTEST_K8S_VERSION = 1.31.0
# Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
ifeq (,$(shell go env GOBIN))
GOBIN=$(shell go env GOPATH)/bin
else
GOBIN=$(shell go env GOBIN)
endif
# CONTAINER_TOOL defines the container tool to be used for building images.
# Be aware that the target commands are only tested with Docker which is
# scaffolded by default. However, you might want to replace it to use other
# tools. (i.e. podman)
CONTAINER_TOOL ?= docker
# Setting SHELL to bash allows bash commands to be executed by recipes.
# Options are set to exit when a recipe line exits non-zero or a piped command fails.
SHELL = /usr/bin/env bash -o pipefail
.SHELLFLAGS = -ec
.PHONY: all
all: build
##@ General
# The help target prints out all targets with their descriptions organized
# beneath their categories. The categories are represented by '##@' and the
# target descriptions by '##'. The awk command is responsible for reading the
# entire set of makefiles included in this invocation, looking for lines of the
# file as xyz: ## something, and then pretty-format the target and help. Then,
# if there's a line with ##@ something, that gets pretty-printed as a category.
# More info on the usage of ANSI control characters for terminal formatting:
# https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_parameters
# More info on the awk command:
# http://linuxcommand.org/lc3_adv_awk.php
.PHONY: help
help: ## Display this help.
@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
##@ Development
.PHONY: manifests
manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
# Note that the option maxDescLen=0 was added in the default scaffold in order to sort out the issue
# Too long: must have at most 262144 bytes. By using kubectl apply to create / update resources an annotation
# is created by K8s API to store the latest version of the resource ( kubectl.kubernetes.io/last-applied-configuration).
# However, it has a size limit and if the CRD is too big with so many long descriptions as this one it will cause the failure.
$(CONTROLLER_GEN) rbac:roleName=manager-role crd:maxDescLen=0 webhook paths="./..." output:crd:artifacts:config=config/crd/bases
.PHONY: generate
generate: controller-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
$(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./..."
.PHONY: fmt
fmt: ## Run go fmt against code.
go fmt ./...
.PHONY: vet
vet: ## Run go vet against code.
go vet ./...
.PHONY: test
test: manifests generate fmt vet envtest ## Run tests.
KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test $$(go list ./... | grep -v /e2e) -coverprofile cover.out
# TODO(user): To use a different vendor for e2e tests, modify the setup under 'tests/e2e'.
# The default setup assumes Kind is pre-installed and builds/loads the Manager Docker image locally.
# Prometheus and CertManager are installed by default; skip with:
# - PROMETHEUS_INSTALL_SKIP=true
# - CERT_MANAGER_INSTALL_SKIP=true
.PHONY: test-e2e
test-e2e: manifests generate fmt vet ## Run the e2e tests. Expected an isolated environment using Kind.
@command -v kind >/dev/null 2>&1 || { \
echo "Kind is not installed. Please install Kind manually."; \
exit 1; \
}
@kind get clusters | grep -q 'kind' || { \
echo "No Kind cluster is running. Please start a Kind cluster before running the e2e tests."; \
exit 1; \
}
go test ./test/e2e/ -v -ginkgo.v
.PHONY: lint
lint: golangci-lint ## Run golangci-lint linter
$(GOLANGCI_LINT) run
.PHONY: lint-fix
lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes
$(GOLANGCI_LINT) run --fix
##@ Build
.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
go build -o bin/manager cmd/main.go
.PHONY: run
run: manifests generate fmt vet ## Run a controller from your host.
go run ./cmd/main.go
# If you wish to build the manager image targeting other platforms you can use the --platform flag.
# (i.e. docker build --platform linux/arm64). However, you must enable docker buildKit for it.
# More info: https://docs.docker.com/develop/develop-images/build_enhancements/
.PHONY: docker-build
docker-build: ## Build docker image with the manager.
$(CONTAINER_TOOL) build -t ${IMG} .
.PHONY: docker-push
docker-push: ## Push docker image with the manager.
$(CONTAINER_TOOL) push ${IMG}
# PLATFORMS defines the target platforms for the manager image be built to provide support to multiple
# architectures. (i.e. make docker-buildx IMG=myregistry/mypoperator:0.0.1). To use this option you need to:
# - be able to use docker buildx. More info: https://docs.docker.com/build/buildx/
# - have enabled BuildKit. More info: https://docs.docker.com/develop/develop-images/build_enhancements/
# - be able to push the image to your registry (i.e. if you do not set a valid value via IMG=<myregistry/image:<tag>> then the export will fail)
# To adequately provide solutions that are compatible with multiple platforms, you should consider using this option.
PLATFORMS ?= linux/arm64,linux/amd64,linux/s390x,linux/ppc64le
.PHONY: docker-buildx
docker-buildx: ## Build and push docker image for the manager for cross-platform support
# copy existing Dockerfile and insert --platform=${BUILDPLATFORM} into Dockerfile.cross, and preserve the original Dockerfile
sed -e '1 s/\(^FROM\)/FROM --platform=\$$\{BUILDPLATFORM\}/; t' -e ' 1,// s//FROM --platform=\$$\{BUILDPLATFORM\}/' Dockerfile > Dockerfile.cross
- $(CONTAINER_TOOL) buildx create --name project-builder
$(CONTAINER_TOOL) buildx use project-builder
- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMG} -f Dockerfile.cross .
- $(CONTAINER_TOOL) buildx rm project-builder
rm Dockerfile.cross
.PHONY: build-installer
build-installer: manifests generate kustomize ## Generate a consolidated YAML with CRDs and deployment.
mkdir -p dist
cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
$(KUSTOMIZE) build config/default > dist/install.yaml
##@ Deployment
ifndef ignore-not-found
ignore-not-found = false
endif
.PHONY: install
install: manifests kustomize ## Install CRDs into the K8s cluster specified in ~/.kube/config.
$(KUSTOMIZE) build config/crd | $(KUBECTL) apply -f -
.PHONY: uninstall
uninstall: manifests kustomize ## Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
$(KUSTOMIZE) build config/crd | $(KUBECTL) delete --ignore-not-found=$(ignore-not-found) -f -
.PHONY: deploy
deploy: manifests kustomize ## Deploy controller to the K8s cluster specified in ~/.kube/config.
cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
$(KUSTOMIZE) build config/default | $(KUBECTL) apply -f -
.PHONY: undeploy
undeploy: kustomize ## Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
$(KUSTOMIZE) build config/default | $(KUBECTL) delete --ignore-not-found=$(ignore-not-found) -f -
##@ Dependencies
## Location to install dependencies to
LOCALBIN ?= $(shell pwd)/bin
$(LOCALBIN):
mkdir -p $(LOCALBIN)
## Tool Binaries
KUBECTL ?= kubectl
KUSTOMIZE ?= $(LOCALBIN)/kustomize
CONTROLLER_GEN ?= $(LOCALBIN)/controller-gen
ENVTEST ?= $(LOCALBIN)/setup-envtest
GOLANGCI_LINT = $(LOCALBIN)/golangci-lint
## Tool Versions
KUSTOMIZE_VERSION ?= v5.4.3
CONTROLLER_TOOLS_VERSION ?= v0.16.4
ENVTEST_VERSION ?= release-0.19
GOLANGCI_LINT_VERSION ?= v1.59.1
.PHONY: kustomize
kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
$(KUSTOMIZE): $(LOCALBIN)
$(call go-install-tool,$(KUSTOMIZE),sigs.k8s.io/kustomize/kustomize/v5,$(KUSTOMIZE_VERSION))
.PHONY: controller-gen
controller-gen: $(CONTROLLER_GEN) ## Download controller-gen locally if necessary.
$(CONTROLLER_GEN): $(LOCALBIN)
$(call go-install-tool,$(CONTROLLER_GEN),sigs.k8s.io/controller-tools/cmd/controller-gen,$(CONTROLLER_TOOLS_VERSION))
.PHONY: envtest
envtest: $(ENVTEST) ## Download setup-envtest locally if necessary.
$(ENVTEST): $(LOCALBIN)
$(call go-install-tool,$(ENVTEST),sigs.k8s.io/controller-runtime/tools/setup-envtest,$(ENVTEST_VERSION))
.PHONY: golangci-lint
golangci-lint: $(GOLANGCI_LINT) ## Download golangci-lint locally if necessary.
$(GOLANGCI_LINT): $(LOCALBIN)
$(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/cmd/golangci-lint,$(GOLANGCI_LINT_VERSION))
# go-install-tool will 'go install' any package with custom target and name of binary, if it doesn't exist
# $1 - target path with name of binary
# $2 - package url which can be installed
# $3 - specific version of package
define go-install-tool
@[ -f "$(1)-$(3)" ] || { \
set -e; \
package=$(2)@$(3) ;\
echo "Downloading $${package}" ;\
rm -f $(1) || true ;\
GOBIN=$(LOCALBIN) go install $${package} ;\
mv $(1) $(1)-$(3) ;\
} ;\
ln -sf $(1)-$(3) $(1)
endef
PROJECT
: Kubebuilder metadata for scaffolding new components
# Code generated by tool. DO NOT EDIT.
# This file is used to track the info used to scaffold your project
# and allow the plugins properly work.
# More info: https://book.kubebuilder.io/reference/project-config.html
domain: tutorial.kubebuilder.io
layout:
- go.kubebuilder.io/v4
projectName: project
repo: tutorial.kubebuilder.io/project
resources:
- api:
crdVersion: v1
namespaced: true
controller: true
domain: tutorial.kubebuilder.io
group: batch
kind: CronJob
path: tutorial.kubebuilder.io/project/api/v1
version: v1
webhooks:
defaulting: true
validation: true
webhookVersion: v1
version: "3"
Launch Configuration
We also get launch configurations under the
config/
directory. Right now, it just contains
Kustomize YAML definitions required to
launch our controller on a cluster, but once we get started writing our
controller, it’ll also hold our CustomResourceDefinitions, RBAC
configuration, and WebhookConfigurations.
config/default
contains a Kustomize base for launching
the controller in a standard configuration.
Each other directory contains a different piece of configuration, refactored out into its own base:
-
config/manager
: launch your controllers as pods in the cluster -
config/rbac
: permissions required to run your controllers under their own service account
The Entrypoint
Last, but certainly not least, Kubebuilder scaffolds out the basic
entrypoint of our project: main.go
. Let’s take a look at that next…
Every journey needs a start, every program needs a main
Apache License
Copyright 2022 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Our package starts out with some basic imports. Particularly:
- The core controller-runtime library
- The default controller-runtime logging, Zap (more on that a bit later)
package main
import (
"flag"
"os"
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
"sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"
// +kubebuilder:scaffold:imports
)
Every set of controllers needs a Scheme, which provides mappings between Kinds and their corresponding Go types. We’ll talk a bit more about Kinds when we write our API definition, so just keep this in mind for later.
var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
)
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
At this point, our main function is fairly simple:
-
We set up some basic flags for metrics.
-
We instantiate a manager, which keeps track of running all of our controllers, as well as setting up shared caches and clients to the API server (notice we tell the manager about our Scheme).
-
We run our manager, which in turn runs all of our controllers and webhooks. The manager is set up to run until it receives a graceful shutdown signal. This way, when we’re running on Kubernetes, we behave nicely with graceful pod termination.
While we don’t have anything to run just yet, remember where that
+kubebuilder:scaffold:builder
comment is – things’ll get interesting there
soon.
func main() {
var metricsAddr string
var enableLeaderElection bool
var probeAddr string
flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
opts := zap.Options{
Development: true,
}
opts.BindFlags(flag.CommandLine)
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
Metrics: server.Options{
BindAddress: metricsAddr,
},
WebhookServer: webhook.NewServer(webhook.Options{Port: 9443}),
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "80807133.tutorial.kubebuilder.io",
})
if err != nil {
setupLog.Error(err, "unable to start manager")
os.Exit(1)
}
Note that the Manager
can restrict the namespace that all controllers will watch for resources by:
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
Cache: cache.Options{
DefaultNamespaces: map[string]cache.Config{
namespace: {},
},
},
Metrics: server.Options{
BindAddress: metricsAddr,
},
WebhookServer: webhook.NewServer(webhook.Options{Port: 9443}),
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "80807133.tutorial.kubebuilder.io",
})
The above example will change the scope of your project to a single Namespace
. In this scenario,
it is also suggested to restrict the provided authorization to this namespace by replacing the default
ClusterRole
and ClusterRoleBinding
to Role
and RoleBinding
respectively.
For further information see the Kubernetes documentation about Using RBAC Authorization.
Also, it is possible to use the DefaultNamespaces
from cache.Options{}
to cache objects in a specific set of namespaces:
var namespaces []string // List of Namespaces
defaultNamespaces := make(map[string]cache.Config)
for _, ns := range namespaces {
defaultNamespaces[ns] = cache.Config{}
}
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
Cache: cache.Options{
DefaultNamespaces: defaultNamespaces,
},
Metrics: server.Options{
BindAddress: metricsAddr,
},
WebhookServer: webhook.NewServer(webhook.Options{Port: 9443}),
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "80807133.tutorial.kubebuilder.io",
})
For further information see cache.Options{}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up health check")
os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up ready check")
os.Exit(1)
}
setupLog.Info("starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
setupLog.Error(err, "problem running manager")
os.Exit(1)
}
}
With that out of the way, we can get on to scaffolding our API!
Groups and Versions and Kinds, oh my!
Actually, before we get started with our API, we should talk terminology a bit.
When we talk about APIs in Kubernetes, we often use 4 terms: groups, versions, kinds, and resources.
Groups and Versions
An API Group in Kubernetes is simply a collection of related functionality. Each group has one or more versions, which, as the name suggests, allow us to change how an API works over time.
Kinds and Resources
Each API group-version contains one or more API types, which we call Kinds. While a Kind may change forms between versions, each form must be able to store all the data of the other forms, somehow (we can store the data in fields, or in annotations). This means that using an older API version won’t cause newer data to be lost or corrupted. See the Kubernetes API guidelines for more information.
You’ll also hear mention of resources on occasion. A resource is simply
a use of a Kind in the API. Often, there’s a one-to-one mapping between
Kinds and resources. For instance, the pods
resource corresponds to the
Pod
Kind. However, sometimes, the same Kind may be returned by multiple
resources. For instance, the Scale
Kind is returned by all scale
subresources, like deployments/scale
or replicasets/scale
. This is
what allows the Kubernetes HorizontalPodAutoscaler to interact with
different resources. With CRDs, however, each Kind will correspond to
a single resource.
Notice that resources are always lowercase, and by convention are the lowercase form of the Kind.
So, how does that correspond to Go?
When we refer to a kind in a particular group-version, we’ll call it a GroupVersionKind, or GVK for short. Same with resources and GVR. As we’ll see shortly, each GVK corresponds to a given root Go type in a package.
Now that we have our terminology straight, we can actually create our API!
So, how can we create our API?
In the next section, Adding a new API, we will check how the tool helps us to
create our own APIs with the command kubebuilder create api
.
The goal of this command is to create Custom Resource (CR) and Custom Resource Definition (CRD) for our Kind(s). To check it further see; Extend the Kubernetes API with CustomResourceDefinitions.
But, why create APIs at all?
New APIs are how we teach Kubernetes about our custom objects. The Go structs are used to generate a CRD which includes the schema for our data as well as tracking data like what our new type is called. We can then create instances of our custom objects which will be managed by our controllers.
Our APIs and resources represent our solutions on the clusters. Basically, the CRDs are a definition of our customized Objects, and the CRs are an instance of it.
Ah, do you have an example?
Let’s think about the classic scenario where the goal is to have an application and its database running on the platform with Kubernetes. Then, one CRD could represent the App, and another one could represent the DB. By having one CRD to describe the App and another one for the DB, we will not be hurting concepts such as encapsulation, the single responsibility principle, and cohesion. Damaging these concepts could cause unexpected side effects, such as difficulty in extending, reuse, or maintenance, just to mention a few.
In this way, we can create the App CRD which will have its controller and which would be responsible for things like creating Deployments that contain the App and creating Services to access it and etc. Similarly, we could create a CRD to represent the DB, and deploy a controller that would manage DB instances.
Err, but what’s that Scheme thing?
The Scheme
we saw before is simply a way to keep track of what Go type
corresponds to a given GVK (don’t be overwhelmed by its
godocs).
For instance, suppose we mark the
"tutorial.kubebuilder.io/api/v1".CronJob{}
type as being in the
batch.tutorial.kubebuilder.io/v1
API group (implicitly saying it has the
Kind CronJob
).
Then, we can later construct a new &CronJob{}
given some JSON from the
API server that says
{
"kind": "CronJob",
"apiVersion": "batch.tutorial.kubebuilder.io/v1",
...
}
or properly look up the group-version when we go to submit a &CronJob{}
in an update.
Adding a new API
To scaffold out a new Kind (you were paying attention to the last
chapter, right?) and corresponding
controller, we can use kubebuilder create api
:
kubebuilder create api --group batch --version v1 --kind CronJob
Press y
for “Create Resource” and “Create Controller”.
The first time we call this command for each group-version, it will create a directory for the new group-version.
In this case, the
api/v1/
directory is created, corresponding to the
batch.tutorial.kubebuilder.io/v1
(remember our --domain
setting from the
beginning?).
It has also added a file for our CronJob
Kind,
api/v1/cronjob_types.go
. Each time we call the command with a different
kind, it’ll add a corresponding new file.
Let’s take a look at what we’ve been given out of the box, then we can move on to filling it out.
Apache License
Copyright 2022.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
We start out simply enough: we import the meta/v1
API group, which is not
normally exposed by itself, but instead contains metadata common to all
Kubernetes Kinds.
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
Next, we define types for the Spec and Status of our Kind. Kubernetes functions
by reconciling desired state (Spec
) with actual cluster state (other objects’
Status
) and external state, and then recording what it observed (Status
).
Thus, every functional object includes spec and status. A few types, like
ConfigMap
don’t follow this pattern, since they don’t encode desired state,
but most types do.
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
// CronJobSpec defines the desired state of CronJob
type CronJobSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make" to regenerate code after modifying this file
}
// CronJobStatus defines the observed state of CronJob
type CronJobStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file
}
Next, we define the types corresponding to actual Kinds, CronJob
and CronJobList
.
CronJob
is our root type, and describes the CronJob
kind. Like all Kubernetes objects, it contains
TypeMeta
(which describes API version and Kind), and also contains ObjectMeta
, which holds things
like name, namespace, and labels.
CronJobList
is simply a container for multiple CronJob
s. It’s the Kind used in bulk operations,
like LIST.
In general, we never modify either of these – all modifications go in either Spec or Status.
That little +kubebuilder:object:root
comment is called a marker. We’ll see
more of them in a bit, but know that they act as extra metadata, telling
controller-tools (our code and YAML generator) extra information.
This particular one tells the object
generator that this type represents
a Kind. Then, the object
generator generates an implementation of the
runtime.Object interface for us, which is the standard
interface that all types representing Kinds must implement.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// CronJob is the Schema for the cronjobs API
type CronJob struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec CronJobSpec `json:"spec,omitempty"`
Status CronJobStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// CronJobList contains a list of CronJob
type CronJobList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []CronJob `json:"items"`
}
Finally, we add the Go types to the API group. This allows us to add the types in this API group to any Scheme.
func init() {
SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}
Now that we’ve seen the basic structure, let’s fill it out!
Designing an API
In Kubernetes, we have a few rules for how we design APIs. Namely, all
serialized fields must be camelCase
, so we use JSON struct tags to
specify this. We can also use the omitempty
struct tag to mark that
a field should be omitted from serialization when empty.
Fields may use most of the primitive types. Numbers are the exception:
for API compatibility purposes, we accept three forms of numbers: int32
and int64
for integers, and resource.Quantity
for decimals.
Hold up, what's a Quantity?
Quantities are a special notation for decimal numbers that have an explicitly fixed representation that makes them more portable across machines. You’ve probably noticed them when specifying resources requests and limits on pods in Kubernetes.
They conceptually work similar to floating point numbers: they have a significant, base, and exponent. Their serializable and human readable format uses whole numbers and suffixes to specify values much the way we describe computer storage.
For instance, the value 2m
means 0.002
in decimal notation. 2Ki
means 2048
in decimal, while 2K
means 2000
in decimal. If we want
to specify fractions, we switch to a suffix that lets us use a whole
number: 2.5
is 2500m
.
There are two supported bases: 10 and 2 (called decimal and binary,
respectively). Decimal base is indicated with “normal” SI suffixes (e.g.
M
and K
), while Binary base is specified in “mebi” notation (e.g. Mi
and Ki
). Think megabytes vs
mebibytes.
There’s one other special type that we use: metav1.Time
. This functions
identically to time.Time
, except that it has a fixed, portable
serialization format.
With that out of the way, let’s take a look at what our CronJob object looks like!
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
package v1
Imports
import (
batchv1 "k8s.io/api/batch/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
First, let’s take a look at our spec. As we discussed before, spec holds desired state, so any “inputs” to our controller go here.
Fundamentally a CronJob needs the following pieces:
- A schedule (the cron in CronJob)
- A template for the Job to run (the job in CronJob)
We’ll also want a few extras, which will make our users’ lives easier:
- A deadline for starting jobs (if we miss this deadline, we’ll just wait till the next scheduled time)
- What to do if multiple jobs would run at once (do we wait? stop the old one? run both?)
- A way to pause the running of a CronJob, in case something’s wrong with it
- Limits on old job history
Remember, since we never read our own status, we need to have some other way to keep track of whether a job has run. We can use at least one old job to do this.
We’ll use several markers (// +comment
) to specify additional metadata. These
will be used by controller-tools when generating our CRD manifest.
As we’ll see in a bit, controller-tools will also use GoDoc to form descriptions for
the fields.
// CronJobSpec defines the desired state of CronJob.
type CronJobSpec struct {
// +kubebuilder:validation:MinLength=0
// The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
Schedule string `json:"schedule"`
// +kubebuilder:validation:Minimum=0
// Optional deadline in seconds for starting the job if it misses scheduled
// time for any reason. Missed jobs executions will be counted as failed ones.
// +optional
StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`
// Specifies how to treat concurrent executions of a Job.
// Valid values are:
// - "Allow" (default): allows CronJobs to run concurrently;
// - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
// - "Replace": cancels currently running job and replaces it with a new one
// +optional
ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`
// This flag tells the controller to suspend subsequent executions, it does
// not apply to already started executions. Defaults to false.
// +optional
Suspend *bool `json:"suspend,omitempty"`
// Specifies the job that will be created when executing a CronJob.
JobTemplate batchv1.JobTemplateSpec `json:"jobTemplate"`
// +kubebuilder:validation:Minimum=0
// The number of successful finished jobs to retain.
// This is a pointer to distinguish between explicit zero and not specified.
// +optional
SuccessfulJobsHistoryLimit *int32 `json:"successfulJobsHistoryLimit,omitempty"`
// +kubebuilder:validation:Minimum=0
// The number of failed finished jobs to retain.
// This is a pointer to distinguish between explicit zero and not specified.
// +optional
FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`
}
We define a custom type to hold our concurrency policy. It’s actually just a string under the hood, but the type gives extra documentation, and allows us to attach validation on the type instead of the field, making the validation more easily reusable.
// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
// +kubebuilder:validation:Enum=Allow;Forbid;Replace
type ConcurrencyPolicy string
const (
// AllowConcurrent allows CronJobs to run concurrently.
AllowConcurrent ConcurrencyPolicy = "Allow"
// ForbidConcurrent forbids concurrent runs, skipping next run if previous
// hasn't finished yet.
ForbidConcurrent ConcurrencyPolicy = "Forbid"
// ReplaceConcurrent cancels currently running job and replaces it with a new one.
ReplaceConcurrent ConcurrencyPolicy = "Replace"
)
Next, let’s design our status, which holds observed state. It contains any information we want users or other controllers to be able to easily obtain.
We’ll keep a list of actively running jobs, as well as the last time that we successfully
ran our job. Notice that we use metav1.Time
instead of time.Time
to get the stable
serialization, as mentioned above.
// CronJobStatus defines the observed state of CronJob.
type CronJobStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file
// A list of pointers to currently running jobs.
// +optional
Active []corev1.ObjectReference `json:"active,omitempty"`
// Information when was the last time the job was successfully scheduled.
// +optional
LastScheduleTime *metav1.Time `json:"lastScheduleTime,omitempty"`
}
Finally, we have the rest of the boilerplate that we’ve already discussed. As previously noted, we don’t need to change this, except to mark that we want a status subresource, so that we behave like built-in kubernetes types.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// CronJob is the Schema for the cronjobs API.
type CronJob struct {
Root Object Definitions
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec CronJobSpec `json:"spec,omitempty"`
Status CronJobStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// CronJobList contains a list of CronJob.
type CronJobList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []CronJob `json:"items"`
}
func init() {
SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}
Now that we have an API, we’ll need to write a controller to actually implement the functionality.
A Brief Aside: What’s the rest of this stuff?
If you’ve taken a peek at the rest of the files in the
api/v1/
directory, you might have noticed two additional files beyond
cronjob_types.go
: groupversion_info.go
and zz_generated.deepcopy.go
.
Neither of these files ever needs to be edited (the former stays the same and the latter is autogenerated), but it’s useful to know what’s in them.
groupversion_info.go
groupversion_info.go
contains common metadata about the group-version:
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
First, we have some package-level markers that denote that there are
Kubernetes objects in this package, and that this package represents the group
batch.tutorial.kubebuilder.io
. The object
generator makes use of the
former, while the latter is used by the CRD generator to generate the right
metadata for the CRDs it creates from this package.
// Package v1 contains API Schema definitions for the batch v1 API group.
// +kubebuilder:object:generate=true
// +groupName=batch.tutorial.kubebuilder.io
package v1
import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)
Then, we have the commonly useful variables that help us set up our Scheme.
Since we need to use all the types in this package in our controller, it’s
helpful (and the convention) to have a convenient method to add all the types to
some other Scheme
. SchemeBuilder makes this easy for us.
var (
// GroupVersion is group version used to register these objects.
GroupVersion = schema.GroupVersion{Group: "batch.tutorial.kubebuilder.io", Version: "v1"}
// SchemeBuilder is used to add go types to the GroupVersionKind scheme.
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)
zz_generated.deepcopy.go
zz_generated.deepcopy.go
contains the autogenerated implementation of
the aforementioned runtime.Object
interface, which marks all of our root
types as representing Kinds.
The core of the runtime.Object
interface is a deep-copy method,
DeepCopyObject
.
The object
generator in controller-tools also generates two other handy
methods for each root type and all its sub-types: DeepCopy
and
DeepCopyInto
.
What’s in a controller?
Controllers are the core of Kubernetes, and of any operator.
It’s a controller’s job to ensure that, for any given object, the actual state of the world (both the cluster state, and potentially external state like running containers for Kubelet or loadbalancers for a cloud provider) matches the desired state in the object. Each controller focuses on one root Kind, but may interact with other Kinds.
We call this process reconciling.
In controller-runtime, the logic that implements the reconciling for a specific kind is called a Reconciler. A reconciler takes the name of an object, and returns whether or not we need to try again (e.g. in case of errors or periodic controllers, like the HorizontalPodAutoscaler).
Apache License
Copyright 2022.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
First, we start out with some standard imports. As before, we need the core controller-runtime library, as well as the client package, and the package for our API types.
package controllers
import (
"context"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
)
Next, kubebuilder has scaffolded a basic reconciler struct for us. Pretty much every reconciler needs to log, and needs to be able to fetch objects, so these are added out of the box.
// CronJobReconciler reconciles a CronJob object
type CronJobReconciler struct {
client.Client
Scheme *runtime.Scheme
}
Most controllers eventually end up running on the cluster, so they need RBAC permissions, which we specify using controller-tools RBAC markers. These are the bare minimum permissions needed to run. As we add more functionality, we’ll need to revisit these.
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/status,verbs=get;update;patch
The ClusterRole
manifest at config/rbac/role.yaml
is generated from the above markers via controller-gen with the following command:
// make manifests
NOTE: If you receive an error, please run the specified command in the error and re-run make manifests
.
Reconcile
actually performs the reconciling for a single named object.
Our Request just has a name, but we can use the client to fetch
that object from the cache.
We return an empty result and no error, which indicates to controller-runtime that we’ve successfully reconciled this object and don’t need to try again until there’s some changes.
Most controllers need a logging handle and a context, so we set them up here.
The context is used to allow cancellation of
requests, and potentially things like tracing. It’s the first argument to all
client methods. The Background
context is just a basic context without any
extra data or timing restrictions.
The logging handle lets us log. controller-runtime uses structured logging through a library called logr. As we’ll see shortly, logging works by attaching key-value pairs to a static message. We can pre-assign some pairs at the top of our reconcile method to have those attached to all log lines in this reconciler.
func (r *CronJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
_ = log.FromContext(ctx)
// your logic here
return ctrl.Result{}, nil
}
Finally, we add this reconciler to the manager, so that it gets started when the manager is started.
For now, we just note that this reconciler operates on CronJob
s. Later,
we’ll use this to mark that we care about related objects as well.
func (r *CronJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&batchv1.CronJob{}).
Complete(r)
}
Now that we’ve seen the basic structure of a reconciler, let’s fill out
the logic for CronJob
s.
Implementing a controller
The basic logic of our CronJob controller is this:
-
Load the named CronJob
-
List all active jobs, and update the status
-
Clean up old jobs according to the history limits
-
Check if we’re suspended (and don’t do anything else if we are)
-
Get the next scheduled run
-
Run a new job if it’s on schedule, not past the deadline, and not blocked by our concurrency policy
-
Requeue when we either see a running job (done automatically) or it’s time for the next scheduled run.
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
We’ll start out with some imports. You’ll see below that we’ll need a few more imports than those scaffolded for us. We’ll talk about each one when we use it.
package controller
import (
"context"
"fmt"
"sort"
"time"
"github.com/robfig/cron"
kbatch "k8s.io/api/batch/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
ref "k8s.io/client-go/tools/reference"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
)
Next, we’ll need a Clock, which will allow us to fake timing in our tests.
// CronJobReconciler reconciles a CronJob object
type CronJobReconciler struct {
client.Client
Scheme *runtime.Scheme
Clock
}
Clock
We’ll mock out the clock to make it easier to jump around in time while testing,
the “real” clock just calls time.Now
.
type realClock struct{}
func (_ realClock) Now() time.Time { return time.Now() }
// Clock knows how to get the current time.
// It can be used to fake out timing for testing.
type Clock interface {
Now() time.Time
}
Notice that we need a few more RBAC permissions – since we’re creating and managing jobs now, we’ll need permissions for those, which means adding a couple more markers.
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/finalizers,verbs=update
// +kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch,resources=jobs/status,verbs=get
Now, we get to the heart of the controller – the reconciler logic.
var (
scheduledTimeAnnotation = "batch.tutorial.kubebuilder.io/scheduled-at"
)
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the CronJob object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/reconcile
func (r *CronJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
1: Load the CronJob by name
We’ll fetch the CronJob using our client. All client methods take a
context (to allow for cancellation) as their first argument, and the object
in question as their last. Get is a bit special, in that it takes a
NamespacedName
as the middle argument (most don’t have a middle argument, as we’ll see
below).
Many client methods also take variadic options at the end.
var cronJob batchv1.CronJob
if err := r.Get(ctx, req.NamespacedName, &cronJob); err != nil {
log.Error(err, "unable to fetch CronJob")
// we'll ignore not-found errors, since they can't be fixed by an immediate
// requeue (we'll need to wait for a new notification), and we can get them
// on deleted requests.
return ctrl.Result{}, client.IgnoreNotFound(err)
}
2: List all active jobs, and update the status
To fully update our status, we’ll need to list all child jobs in this namespace that belong to this CronJob. Similarly to Get, we can use the List method to list the child jobs. Notice that we use variadic options to set the namespace and field match (which is actually an index lookup that we set up below).
var childJobs kbatch.JobList
if err := r.List(ctx, &childJobs, client.InNamespace(req.Namespace), client.MatchingFields{jobOwnerKey: req.Name}); err != nil {
log.Error(err, "unable to list child Jobs")
return ctrl.Result{}, err
}
Once we have all the jobs we own, we’ll split them into active, successful, and failed jobs, keeping track of the most recent run so that we can record it in status. Remember, status should be able to be reconstituted from the state of the world, so it’s generally not a good idea to read from the status of the root object. Instead, you should reconstruct it every run. That’s what we’ll do here.
We can check if a job is “finished” and whether it succeeded or failed using status conditions. We’ll put that logic in a helper to make our code cleaner.
// find the active list of jobs
var activeJobs []*kbatch.Job
var successfulJobs []*kbatch.Job
var failedJobs []*kbatch.Job
var mostRecentTime *time.Time // find the last run so we can update the status
isJobFinished
We consider a job “finished” if it has a “Complete” or “Failed” condition marked as true. Status conditions allow us to add extensible status information to our objects that other humans and controllers can examine to check things like completion and health.
isJobFinished := func(job *kbatch.Job) (bool, kbatch.JobConditionType) {
for _, c := range job.Status.Conditions {
if (c.Type == kbatch.JobComplete || c.Type == kbatch.JobFailed) && c.Status == corev1.ConditionTrue {
return true, c.Type
}
}
return false, ""
}
getScheduledTimeForJob
We’ll use a helper to extract the scheduled time from the annotation that we added during job creation.
getScheduledTimeForJob := func(job *kbatch.Job) (*time.Time, error) {
timeRaw := job.Annotations[scheduledTimeAnnotation]
if len(timeRaw) == 0 {
return nil, nil
}
timeParsed, err := time.Parse(time.RFC3339, timeRaw)
if err != nil {
return nil, err
}
return &timeParsed, nil
}
for i, job := range childJobs.Items {
_, finishedType := isJobFinished(&job)
switch finishedType {
case "": // ongoing
activeJobs = append(activeJobs, &childJobs.Items[i])
case kbatch.JobFailed:
failedJobs = append(failedJobs, &childJobs.Items[i])
case kbatch.JobComplete:
successfulJobs = append(successfulJobs, &childJobs.Items[i])
}
// We'll store the launch time in an annotation, so we'll reconstitute that from
// the active jobs themselves.
scheduledTimeForJob, err := getScheduledTimeForJob(&job)
if err != nil {
log.Error(err, "unable to parse schedule time for child job", "job", &job)
continue
}
if scheduledTimeForJob != nil {
if mostRecentTime == nil || mostRecentTime.Before(*scheduledTimeForJob) {
mostRecentTime = scheduledTimeForJob
}
}
}
if mostRecentTime != nil {
cronJob.Status.LastScheduleTime = &metav1.Time{Time: *mostRecentTime}
} else {
cronJob.Status.LastScheduleTime = nil
}
cronJob.Status.Active = nil
for _, activeJob := range activeJobs {
jobRef, err := ref.GetReference(r.Scheme, activeJob)
if err != nil {
log.Error(err, "unable to make reference to active job", "job", activeJob)
continue
}
cronJob.Status.Active = append(cronJob.Status.Active, *jobRef)
}
Here, we’ll log how many jobs we observed at a slightly higher logging level, for debugging. Notice how instead of using a format string, we use a fixed message, and attach key-value pairs with the extra information. This makes it easier to filter and query log lines.
log.V(1).Info("job count", "active jobs", len(activeJobs), "successful jobs", len(successfulJobs), "failed jobs", len(failedJobs))
Using the data we’ve gathered, we’ll update the status of our CRD.
Just like before, we use our client. To specifically update the status
subresource, we’ll use the Status
part of the client, with the Update
method.
The status subresource ignores changes to spec, so it’s less likely to conflict with any other updates, and can have separate permissions.
if err := r.Status().Update(ctx, &cronJob); err != nil {
log.Error(err, "unable to update CronJob status")
return ctrl.Result{}, err
}
Once we’ve updated our status, we can move on to ensuring that the status of the world matches what we want in our spec.
3: Clean up old jobs according to the history limit
First, we’ll try to clean up old jobs, so that we don’t leave too many lying around.
// NB: deleting these are "best effort" -- if we fail on a particular one,
// we won't requeue just to finish the deleting.
if cronJob.Spec.FailedJobsHistoryLimit != nil {
sort.Slice(failedJobs, func(i, j int) bool {
if failedJobs[i].Status.StartTime == nil {
return failedJobs[j].Status.StartTime != nil
}
return failedJobs[i].Status.StartTime.Before(failedJobs[j].Status.StartTime)
})
for i, job := range failedJobs {
if int32(i) >= int32(len(failedJobs))-*cronJob.Spec.FailedJobsHistoryLimit {
break
}
if err := r.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); client.IgnoreNotFound(err) != nil {
log.Error(err, "unable to delete old failed job", "job", job)
} else {
log.V(0).Info("deleted old failed job", "job", job)
}
}
}
if cronJob.Spec.SuccessfulJobsHistoryLimit != nil {
sort.Slice(successfulJobs, func(i, j int) bool {
if successfulJobs[i].Status.StartTime == nil {
return successfulJobs[j].Status.StartTime != nil
}
return successfulJobs[i].Status.StartTime.Before(successfulJobs[j].Status.StartTime)
})
for i, job := range successfulJobs {
if int32(i) >= int32(len(successfulJobs))-*cronJob.Spec.SuccessfulJobsHistoryLimit {
break
}
if err := r.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); err != nil {
log.Error(err, "unable to delete old successful job", "job", job)
} else {
log.V(0).Info("deleted old successful job", "job", job)
}
}
}
4: Check if we’re suspended
If this object is suspended, we don’t want to run any jobs, so we’ll stop now. This is useful if something’s broken with the job we’re running and we want to pause runs to investigate or putz with the cluster, without deleting the object.
if cronJob.Spec.Suspend != nil && *cronJob.Spec.Suspend {
log.V(1).Info("cronjob suspended, skipping")
return ctrl.Result{}, nil
}
5: Get the next scheduled run
If we’re not paused, we’ll need to calculate the next scheduled run, and whether or not we’ve got a run that we haven’t processed yet.
getNextSchedule
We’ll calculate the next scheduled time using our helpful cron library. We’ll start calculating appropriate times from our last run, or the creation of the CronJob if we can’t find a last run.
If there are too many missed runs and we don’t have any deadlines set, we’ll bail so that we don’t cause issues on controller restarts or wedges.
Otherwise, we’ll just return the missed runs (of which we’ll just use the latest), and the next run, so that we can know when it’s time to reconcile again.
getNextSchedule := func(cronJob *batchv1.CronJob, now time.Time) (lastMissed time.Time, next time.Time, err error) {
sched, err := cron.ParseStandard(cronJob.Spec.Schedule)
if err != nil {
return time.Time{}, time.Time{}, fmt.Errorf("Unparseable schedule %q: %v", cronJob.Spec.Schedule, err)
}
// for optimization purposes, cheat a bit and start from our last observed run time
// we could reconstitute this here, but there's not much point, since we've
// just updated it.
var earliestTime time.Time
if cronJob.Status.LastScheduleTime != nil {
earliestTime = cronJob.Status.LastScheduleTime.Time
} else {
earliestTime = cronJob.ObjectMeta.CreationTimestamp.Time
}
if cronJob.Spec.StartingDeadlineSeconds != nil {
// controller is not going to schedule anything below this point
schedulingDeadline := now.Add(-time.Second * time.Duration(*cronJob.Spec.StartingDeadlineSeconds))
if schedulingDeadline.After(earliestTime) {
earliestTime = schedulingDeadline
}
}
if earliestTime.After(now) {
return time.Time{}, sched.Next(now), nil
}
starts := 0
for t := sched.Next(earliestTime); !t.After(now); t = sched.Next(t) {
lastMissed = t
// An object might miss several starts. For example, if
// controller gets wedged on Friday at 5:01pm when everyone has
// gone home, and someone comes in on Tuesday AM and discovers
// the problem and restarts the controller, then all the hourly
// jobs, more than 80 of them for one hourly scheduledJob, should
// all start running with no further intervention (if the scheduledJob
// allows concurrency and late starts).
//
// However, if there is a bug somewhere, or incorrect clock
// on controller's server or apiservers (for setting creationTimestamp)
// then there could be so many missed start times (it could be off
// by decades or more), that it would eat up all the CPU and memory
// of this controller. In that case, we want to not try to list
// all the missed start times.
starts++
if starts > 100 {
// We can't get the most recent times so just return an empty slice
return time.Time{}, time.Time{}, fmt.Errorf("Too many missed start times (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.")
}
}
return lastMissed, sched.Next(now), nil
}
// figure out the next times that we need to create
// jobs at (or anything we missed).
missedRun, nextRun, err := getNextSchedule(&cronJob, r.Now())
if err != nil {
log.Error(err, "unable to figure out CronJob schedule")
// we don't really care about requeuing until we get an update that
// fixes the schedule, so don't return an error
return ctrl.Result{}, nil
}
We’ll prep our eventual request to requeue until the next job, and then figure out if we actually need to run.
scheduledResult := ctrl.Result{RequeueAfter: nextRun.Sub(r.Now())} // save this so we can re-use it elsewhere
log = log.WithValues("now", r.Now(), "next run", nextRun)
6: Run a new job if it’s on schedule, not past the deadline, and not blocked by our concurrency policy
If we’ve missed a run, and we’re still within the deadline to start it, we’ll need to run a job.
if missedRun.IsZero() {
log.V(1).Info("no upcoming scheduled times, sleeping until next")
return scheduledResult, nil
}
// make sure we're not too late to start the run
log = log.WithValues("current run", missedRun)
tooLate := false
if cronJob.Spec.StartingDeadlineSeconds != nil {
tooLate = missedRun.Add(time.Duration(*cronJob.Spec.StartingDeadlineSeconds) * time.Second).Before(r.Now())
}
if tooLate {
log.V(1).Info("missed starting deadline for last run, sleeping till next")
// TODO(directxman12): events
return scheduledResult, nil
}
If we actually have to run a job, we’ll need to either wait till existing ones finish, replace the existing ones, or just add new ones. If our information is out of date due to cache delay, we’ll get a requeue when we get up-to-date information.
// figure out how to run this job -- concurrency policy might forbid us from running
// multiple at the same time...
if cronJob.Spec.ConcurrencyPolicy == batchv1.ForbidConcurrent && len(activeJobs) > 0 {
log.V(1).Info("concurrency policy blocks concurrent runs, skipping", "num active", len(activeJobs))
return scheduledResult, nil
}
// ...or instruct us to replace existing ones...
if cronJob.Spec.ConcurrencyPolicy == batchv1.ReplaceConcurrent {
for _, activeJob := range activeJobs {
// we don't care if the job was already deleted
if err := r.Delete(ctx, activeJob, client.PropagationPolicy(metav1.DeletePropagationBackground)); client.IgnoreNotFound(err) != nil {
log.Error(err, "unable to delete active job", "job", activeJob)
return ctrl.Result{}, err
}
}
}
Once we’ve figured out what to do with existing jobs, we’ll actually create our desired job
constructJobForCronJob
We need to construct a job based on our CronJob’s template. We’ll copy over the spec from the template and copy some basic object meta.
Then, we’ll set the “scheduled time” annotation so that we can reconstitute our
LastScheduleTime
field each reconcile.
Finally, we’ll need to set an owner reference. This allows the Kubernetes garbage collector to clean up jobs when we delete the CronJob, and allows controller-runtime to figure out which cronjob needs to be reconciled when a given job changes (is added, deleted, completes, etc).
constructJobForCronJob := func(cronJob *batchv1.CronJob, scheduledTime time.Time) (*kbatch.Job, error) {
// We want job names for a given nominal start time to have a deterministic name to avoid the same job being created twice
name := fmt.Sprintf("%s-%d", cronJob.Name, scheduledTime.Unix())
job := &kbatch.Job{
ObjectMeta: metav1.ObjectMeta{
Labels: make(map[string]string),
Annotations: make(map[string]string),
Name: name,
Namespace: cronJob.Namespace,
},
Spec: *cronJob.Spec.JobTemplate.Spec.DeepCopy(),
}
for k, v := range cronJob.Spec.JobTemplate.Annotations {
job.Annotations[k] = v
}
job.Annotations[scheduledTimeAnnotation] = scheduledTime.Format(time.RFC3339)
for k, v := range cronJob.Spec.JobTemplate.Labels {
job.Labels[k] = v
}
if err := ctrl.SetControllerReference(cronJob, job, r.Scheme); err != nil {
return nil, err
}
return job, nil
}
// actually make the job...
job, err := constructJobForCronJob(&cronJob, missedRun)
if err != nil {
log.Error(err, "unable to construct job from template")
// don't bother requeuing until we get a change to the spec
return scheduledResult, nil
}
// ...and create it on the cluster
if err := r.Create(ctx, job); err != nil {
log.Error(err, "unable to create Job for CronJob", "job", job)
return ctrl.Result{}, err
}
log.V(1).Info("created Job for CronJob run", "job", job)
7: Requeue when we either see a running job or it’s time for the next scheduled run
Finally, we’ll return the result that we prepped above, that says we want to requeue when our next run would need to occur. This is taken as a maximum deadline – if something else changes in between, like our job starts or finishes, we get modified, etc, we might reconcile again sooner.
// we'll requeue once we see the running job, and update our status
return scheduledResult, nil
}
Setup
Finally, we’ll update our setup. In order to allow our reconciler to quickly look up Jobs by their owner, we’ll need an index. We declare an index key that we can later use with the client as a pseudo-field name, and then describe how to extract the indexed value from the Job object. The indexer will automatically take care of namespaces for us, so we just have to extract the owner name if the Job has a CronJob owner.
Additionally, we’ll inform the manager that this controller owns some Jobs, so that it will automatically call Reconcile on the underlying CronJob when a Job changes, is deleted, etc.
var (
jobOwnerKey = ".metadata.controller"
apiGVStr = batchv1.GroupVersion.String()
)
// SetupWithManager sets up the controller with the Manager.
func (r *CronJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
// set up a real clock, since we're not in a test
if r.Clock == nil {
r.Clock = realClock{}
}
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &kbatch.Job{}, jobOwnerKey, func(rawObj client.Object) []string {
// grab the job object, extract the owner...
job := rawObj.(*kbatch.Job)
owner := metav1.GetControllerOf(job)
if owner == nil {
return nil
}
// ...make sure it's a CronJob...
if owner.APIVersion != apiGVStr || owner.Kind != "CronJob" {
return nil
}
// ...and if so, return it
return []string{owner.Name}
}); err != nil {
return err
}
return ctrl.NewControllerManagedBy(mgr).
For(&batchv1.CronJob{}).
Owns(&kbatch.Job{}).
Named("cronjob").
Complete(r)
}
That was a doozy, but now we’ve got a working controller. Let’s test against the cluster, then, if we don’t have any issues, deploy it!
You said something about main?
But first, remember how we said we’d come back to main.go
again? Let’s take a look and see what’s
changed, and what we need to add.
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Imports
package main
import (
"crypto/tls"
"flag"
"os"
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
"tutorial.kubebuilder.io/project/internal/controller"
webhookbatchv1 "tutorial.kubebuilder.io/project/internal/webhook/v1"
// +kubebuilder:scaffold:imports
)
The first difference to notice is that kubebuilder has added the new API
group’s package (batchv1
) to our scheme. This means that we can use those
objects in our controller.
If we would be using any other CRD we would have to add their scheme the same way.
Builtin types such as Job have their scheme added by clientgoscheme
.
var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
)
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(batchv1.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
The other thing that’s changed is that kubebuilder has added a block calling our
CronJob controller’s SetupWithManager
method.
func main() {
old stuff
var metricsAddr string
var enableLeaderElection bool
var probeAddr string
var secureMetrics bool
var enableHTTP2 bool
var tlsOpts []func(*tls.Config)
flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.BoolVar(&secureMetrics, "metrics-secure", true,
"If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.")
flag.BoolVar(&enableHTTP2, "enable-http2", false,
"If set, HTTP/2 will be enabled for the metrics and webhook servers")
opts := zap.Options{
Development: true,
}
opts.BindFlags(flag.CommandLine)
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
// if the enable-http2 flag is false (the default), http/2 should be disabled
// due to its vulnerabilities. More specifically, disabling http/2 will
// prevent from being vulnerable to the HTTP/2 Stream Cancellation and
// Rapid Reset CVEs. For more information see:
// - https://github.com/advisories/GHSA-qppj-fm5r-hxr3
// - https://github.com/advisories/GHSA-4374-p667-p6c8
disableHTTP2 := func(c *tls.Config) {
setupLog.Info("disabling http/2")
c.NextProtos = []string{"http/1.1"}
}
if !enableHTTP2 {
tlsOpts = append(tlsOpts, disableHTTP2)
}
webhookServer := webhook.NewServer(webhook.Options{
TLSOpts: tlsOpts,
})
// Metrics endpoint is enabled in 'config/default/kustomization.yaml'. The Metrics options configure the server.
// More info:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/metrics/server
// - https://book.kubebuilder.io/reference/metrics.html
metricsServerOptions := metricsserver.Options{
BindAddress: metricsAddr,
SecureServing: secureMetrics,
// TODO(user): TLSOpts is used to allow configuring the TLS config used for the server. If certificates are
// not provided, self-signed certificates will be generated by default. This option is not recommended for
// production environments as self-signed certificates do not offer the same level of trust and security
// as certificates issued by a trusted Certificate Authority (CA). The primary risk is potentially allowing
// unauthorized access to sensitive metrics data. Consider replacing with CertDir, CertName, and KeyName
// to provide certificates, ensuring the server communicates using trusted and secure certificates.
TLSOpts: tlsOpts,
}
if secureMetrics {
// FilterProvider is used to protect the metrics endpoint with authn/authz.
// These configurations ensure that only authorized users and service accounts
// can access the metrics endpoint. The RBAC are configured in 'config/rbac/kustomization.yaml'. More info:
// https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/metrics/filters#WithAuthenticationAndAuthorization
metricsServerOptions.FilterProvider = filters.WithAuthenticationAndAuthorization
}
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
Metrics: metricsServerOptions,
WebhookServer: webhookServer,
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "80807133.tutorial.kubebuilder.io",
// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader don't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after
// the manager stops, so would be fine to enable this option. However,
// if you are doing or is intended to do any operation such as perform cleanups
// after the manager stops then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,
})
if err != nil {
setupLog.Error(err, "unable to start manager")
os.Exit(1)
}
if err = (&controller.CronJobReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "CronJob")
os.Exit(1)
}
old stuff
We’ll also set up webhooks for our type, which we’ll talk about next. We just need to add them to the manager. Since we might want to run the webhooks separately, or not run them when testing our controller locally, we’ll put them behind an environment variable.
We’ll just make sure to set ENABLE_WEBHOOKS=false
when we run locally.
// nolint:goconst
if os.Getenv("ENABLE_WEBHOOKS") != "false" {
if err = webhookbatchv1.SetupCronJobWebhookWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create webhook", "webhook", "CronJob")
os.Exit(1)
}
}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up health check")
os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up ready check")
os.Exit(1)
}
setupLog.Info("starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
setupLog.Error(err, "problem running manager")
os.Exit(1)
}
}
Now we can implement our controller.
Implementing defaulting/validating webhooks
If you want to implement admission webhooks
for your CRD, the only thing you need to do is to implement the CustomDefaulter
and (or) the CustomValidator
interface.
Kubebuilder takes care of the rest for you, such as
- Creating the webhook server.
- Ensuring the server has been added in the manager.
- Creating handlers for your webhooks.
- Registering each handler with a path in your server.
First, let’s scaffold the webhooks for our CRD (CronJob). We’ll need to run the following command with the --defaulting
and --programmatic-validation
flags (since our test project will use defaulting and validating webhooks):
kubebuilder create webhook --group batch --version v1 --kind CronJob --defaulting --programmatic-validation
This will scaffold the webhook functions and register your webhook with the manager in your main.go
for you.
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Go imports
package v1
import (
"context"
"fmt"
"github.com/robfig/cron"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime/schema"
validationutils "k8s.io/apimachinery/pkg/util/validation"
"k8s.io/apimachinery/pkg/util/validation/field"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/webhook"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
)
Next, we’ll setup a logger for the webhooks.
var cronjoblog = logf.Log.WithName("cronjob-resource")
Then, we set up the webhook with the manager.
// SetupCronJobWebhookWithManager registers the webhook for CronJob in the manager.
func SetupCronJobWebhookWithManager(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).For(&batchv1.CronJob{}).
WithValidator(&CronJobCustomValidator{}).
WithDefaulter(&CronJobCustomDefaulter{
DefaultConcurrencyPolicy: batchv1.AllowConcurrent,
DefaultSuspend: false,
DefaultSuccessfulJobsHistoryLimit: 3,
DefaultFailedJobsHistoryLimit: 1,
}).
Complete()
}
Notice that we use kubebuilder markers to generate webhook manifests. This marker is responsible for generating a mutating webhook manifest.
The meaning of each marker can be found here.
This marker is responsible for generating a mutation webhook manifest.
// +kubebuilder:webhook:path=/mutate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=true,failurePolicy=fail,sideEffects=None,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=mcronjob-v1.kb.io,admissionReviewVersions=v1
// CronJobCustomDefaulter struct is responsible for setting default values on the custom resource of the
// Kind CronJob when those are created or updated.
//
// NOTE: The +kubebuilder:object:generate=false marker prevents controller-gen from generating DeepCopy methods,
// as it is used only for temporary operations and does not need to be deeply copied.
type CronJobCustomDefaulter struct {
// Default values for various CronJob fields
DefaultConcurrencyPolicy batchv1.ConcurrencyPolicy
DefaultSuspend bool
DefaultSuccessfulJobsHistoryLimit int32
DefaultFailedJobsHistoryLimit int32
}
var _ webhook.CustomDefaulter = &CronJobCustomDefaulter{}
We use the webhook.CustomDefaulter
interface to set defaults to our CRD.
A webhook will automatically be served that calls this defaulting.
The Default
method is expected to mutate the receiver, setting the defaults.
// Default implements webhook.CustomDefaulter so a webhook will be registered for the Kind CronJob.
func (d *CronJobCustomDefaulter) Default(ctx context.Context, obj runtime.Object) error {
cronjob, ok := obj.(*batchv1.CronJob)
if !ok {
return fmt.Errorf("expected an CronJob object but got %T", obj)
}
cronjoblog.Info("Defaulting for CronJob", "name", cronjob.GetName())
// Set default values
d.applyDefaults(cronjob)
return nil
}
// applyDefaults applies default values to CronJob fields.
func (d *CronJobCustomDefaulter) applyDefaults(cronJob *batchv1.CronJob) {
if cronJob.Spec.ConcurrencyPolicy == "" {
cronJob.Spec.ConcurrencyPolicy = d.DefaultConcurrencyPolicy
}
if cronJob.Spec.Suspend == nil {
cronJob.Spec.Suspend = new(bool)
*cronJob.Spec.Suspend = d.DefaultSuspend
}
if cronJob.Spec.SuccessfulJobsHistoryLimit == nil {
cronJob.Spec.SuccessfulJobsHistoryLimit = new(int32)
*cronJob.Spec.SuccessfulJobsHistoryLimit = d.DefaultSuccessfulJobsHistoryLimit
}
if cronJob.Spec.FailedJobsHistoryLimit == nil {
cronJob.Spec.FailedJobsHistoryLimit = new(int32)
*cronJob.Spec.FailedJobsHistoryLimit = d.DefaultFailedJobsHistoryLimit
}
}
We can validate our CRD beyond what’s possible with declarative validation. Generally, declarative validation should be sufficient, but sometimes more advanced use cases call for complex validation.
For instance, we’ll see below that we use this to validate a well-formed cron schedule without making up a long regular expression.
If webhook.CustomValidator
interface is implemented, a webhook will automatically be
served that calls the validation.
The ValidateCreate
, ValidateUpdate
and ValidateDelete
methods are expected
to validate its receiver upon creation, update and deletion respectively.
We separate out ValidateCreate from ValidateUpdate to allow behavior like making
certain fields immutable, so that they can only be set on creation.
ValidateDelete is also separated from ValidateUpdate to allow different
validation behavior on deletion.
Here, however, we just use the same shared validation for ValidateCreate
and
ValidateUpdate
. And we do nothing in ValidateDelete
, since we don’t need to
validate anything on deletion.
This marker is responsible for generating a validation webhook manifest.
// +kubebuilder:webhook:path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=vcronjob-v1.kb.io,admissionReviewVersions=v1
// CronJobCustomValidator struct is responsible for validating the CronJob resource
// when it is created, updated, or deleted.
//
// NOTE: The +kubebuilder:object:generate=false marker prevents controller-gen from generating DeepCopy methods,
// as this struct is used only for temporary operations and does not need to be deeply copied.
type CronJobCustomValidator struct {
//TODO(user): Add more fields as needed for validation
}
var _ webhook.CustomValidator = &CronJobCustomValidator{}
// ValidateCreate implements webhook.CustomValidator so a webhook will be registered for the type CronJob.
func (v *CronJobCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
cronjob, ok := obj.(*batchv1.CronJob)
if !ok {
return nil, fmt.Errorf("expected a CronJob object but got %T", obj)
}
cronjoblog.Info("Validation for CronJob upon creation", "name", cronjob.GetName())
return nil, validateCronJob(cronjob)
}
// ValidateUpdate implements webhook.CustomValidator so a webhook will be registered for the type CronJob.
func (v *CronJobCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
cronjob, ok := newObj.(*batchv1.CronJob)
if !ok {
return nil, fmt.Errorf("expected a CronJob object for the newObj but got %T", newObj)
}
cronjoblog.Info("Validation for CronJob upon update", "name", cronjob.GetName())
return nil, validateCronJob(cronjob)
}
// ValidateDelete implements webhook.CustomValidator so a webhook will be registered for the type CronJob.
func (v *CronJobCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
cronjob, ok := obj.(*batchv1.CronJob)
if !ok {
return nil, fmt.Errorf("expected a CronJob object but got %T", obj)
}
cronjoblog.Info("Validation for CronJob upon deletion", "name", cronjob.GetName())
// TODO(user): fill in your validation logic upon object deletion.
return nil, nil
}
We validate the name and the spec of the CronJob.
// validateCronJob validates the fields of a CronJob object.
func validateCronJob(cronjob *batchv1.CronJob) error {
var allErrs field.ErrorList
if err := validateCronJobName(cronjob); err != nil {
allErrs = append(allErrs, err)
}
if err := validateCronJobSpec(cronjob); err != nil {
allErrs = append(allErrs, err)
}
if len(allErrs) == 0 {
return nil
}
return apierrors.NewInvalid(
schema.GroupKind{Group: "batch.tutorial.kubebuilder.io", Kind: "CronJob"},
cronjob.Name, allErrs)
}
Some fields are declaratively validated by OpenAPI schema.
You can find kubebuilder validation markers (prefixed
with // +kubebuilder:validation
) in the
Designing an API section.
You can find all of the kubebuilder supported markers for
declaring validation by running controller-gen crd -w
,
or here.
func validateCronJobSpec(cronjob *batchv1.CronJob) *field.Error {
// The field helpers from the kubernetes API machinery help us return nicely
// structured validation errors.
return validateScheduleFormat(
cronjob.Spec.Schedule,
field.NewPath("spec").Child("schedule"))
}
We’ll need to validate the cron schedule is well-formatted.
func validateScheduleFormat(schedule string, fldPath *field.Path) *field.Error {
if _, err := cron.ParseStandard(schedule); err != nil {
return field.Invalid(fldPath, schedule, err.Error())
}
return nil
}
Validate object name
Validating the length of a string field can be done declaratively by the validation schema.
But the ObjectMeta.Name
field is defined in a shared package under
the apimachinery repo, so we can’t declaratively validate it using
the validation schema.
func validateCronJobName(cronjob *batchv1.CronJob) *field.Error {
if len(cronjob.ObjectMeta.Name) > validationutils.DNS1035LabelMaxLength-11 {
// The job name length is 63 characters like all Kubernetes objects
// (which must fit in a DNS subdomain). The cronjob controller appends
// a 11-character suffix to the cronjob (`-$TIMESTAMP`) when creating
// a job. The job name length limit is 63 characters. Therefore cronjob
// names must have length <= 63-11=52. If we don't validate this here,
// then job creation will fail later.
return field.Invalid(field.NewPath("metadata").Child("name"), cronjob.ObjectMeta.Name, "must be no more than 52 characters")
}
return nil
}
Running and deploying the controller
Optional
If opting to make any changes to the API definitions, then before proceeding, generate the manifests like CRs or CRDs with
make manifests
To test out the controller, we can run it locally against the cluster. Before we do so, though, we’ll need to install our CRDs, as per the quick start. This will automatically update the YAML manifests using controller-tools, if needed:
make install
Now that we’ve installed our CRDs, we can run the controller against our cluster. This will use whatever credentials that we connect to the cluster with, so we don’t need to worry about RBAC just yet.
In a separate terminal, run
export ENABLE_WEBHOOKS=false
make run
You should see logs from the controller about starting up, but it won’t do anything just yet.
At this point, we need a CronJob to test with. Let’s write a sample to
config/samples/batch_v1_cronjob.yaml
, and use that:
apiVersion: batch.tutorial.kubebuilder.io/v1
kind: CronJob
metadata:
labels:
app.kubernetes.io/name: project
app.kubernetes.io/managed-by: kustomize
name: cronjob-sample
spec:
schedule: "*/1 * * * *"
startingDeadlineSeconds: 60
concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
kubectl create -f config/samples/batch_v1_cronjob.yaml
At this point, you should see a flurry of activity. If you watch the changes, you should see your cronjob running, and updating status:
kubectl get cronjob.batch.tutorial.kubebuilder.io -o yaml
kubectl get job
Now that we know it’s working, we can run it in the cluster. Stop the
make run
invocation, and run
make docker-build docker-push IMG=<some-registry>/<project-name>:tag
make deploy IMG=<some-registry>/<project-name>:tag
If we list cronjobs again like we did before, we should see the controller functioning again!
Deploying cert-manager
We suggest using cert-manager for provisioning the certificates for the webhook server. Other solutions should also work as long as they put the certificates in the desired location.
You can follow the cert-manager documentation to install it.
cert-manager also has a component called CA
Injector, which is responsible for
injecting the CA bundle into the MutatingWebhookConfiguration
/ ValidatingWebhookConfiguration
.
To accomplish that, you need to use an annotation with key
cert-manager.io/inject-ca-from
in the MutatingWebhookConfiguration
/ ValidatingWebhookConfiguration
objects.
The value of the annotation should point to an existing certificate request instance
in the format of <certificate-namespace>/<certificate-name>
.
This is the kustomize patch we
used for annotating the MutatingWebhookConfiguration
/ ValidatingWebhookConfiguration
objects.
Deploying Admission Webhooks
cert-manager
You need to follow this to install the cert-manager bundle.
Build your image
Run the following command to build your image locally.
make docker-build docker-push IMG=<some-registry>/<project-name>:tag
Deploy Webhooks
You need to enable the webhook and cert manager configuration through kustomize.
config/default/kustomization.yaml
should now look like the following:
# Adds namespace to all resources.
namespace: project-system
# Value of this field is prepended to the
# names of all resources, e.g. a deployment named
# "wordpress" becomes "alices-wordpress".
# Note that it should also match with the prefix (text before '-') of the namespace
# field above.
namePrefix: project-
# Labels to add to all resources and selectors.
#labels:
#- includeSelectors: true
# pairs:
# someName: someValue
resources:
- ../crd
- ../rbac
- ../manager
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
- ../prometheus
# [METRICS] Expose the controller manager metrics service.
- metrics_service.yaml
# [NETWORK POLICY] Protect the /metrics endpoint and Webhook Server with NetworkPolicy.
# Only Pod(s) running a namespace labeled with 'metrics: enabled' will be able to gather the metrics.
# Only CR(s) which requires webhooks and are applied on namespaces labeled with 'webhooks: enabled' will
# be able to communicate with the Webhook Server.
#- ../network-policy
# Uncomment the patches line if you enable Metrics, and/or are using webhooks and cert-manager
patches:
# [METRICS] The following patch will enable the metrics endpoint using HTTPS and the port :8443.
# More info: https://book.kubebuilder.io/reference/metrics
- path: manager_metrics_patch.yaml
target:
kind: Deployment
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- path: manager_webhook_patch.yaml
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
# Uncomment the following replacements to add the cert-manager CA injection annotations
replacements:
- source: # Uncomment the following block if you have a ValidatingWebhook (--programmatic-validation)
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # This name should match the one in certificate.yaml
fieldPath: .metadata.namespace # Namespace of the certificate CR
targets:
- select:
kind: ValidatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 0
create: true
- source:
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # This name should match the one in certificate.yaml
fieldPath: .metadata.name
targets:
- select:
kind: ValidatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 1
create: true
- source: # Uncomment the following block if you have a DefaultingWebhook (--defaulting )
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # This name should match the one in certificate.yaml
fieldPath: .metadata.namespace # Namespace of the certificate CR
targets:
- select:
kind: MutatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 0
create: true
- source:
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # This name should match the one in certificate.yaml
fieldPath: .metadata.name
targets:
- select:
kind: MutatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 1
create: true
- source: # Uncomment the following block if you have a ConversionWebhook (--conversion)
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # This name should match the one in certificate.yaml
fieldPath: .metadata.namespace # Namespace of the certificate CR
targets:
- select:
kind: CustomResourceDefinition
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 0
create: true
- source:
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # This name should match the one in certificate.yaml
fieldPath: .metadata.name
targets:
- select:
kind: CustomResourceDefinition
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 1
create: true
- source: # Uncomment the following block if you enable cert-manager
kind: Service
version: v1
name: webhook-service
fieldPath: .metadata.name # Name of the service
targets:
- select:
kind: Certificate
group: cert-manager.io
version: v1
fieldPaths:
- .spec.dnsNames.0
- .spec.dnsNames.1
options:
delimiter: '.'
index: 0
create: true
- source:
kind: Service
version: v1
name: webhook-service
fieldPath: .metadata.namespace # Namespace of the service
targets:
- select:
kind: Certificate
group: cert-manager.io
version: v1
fieldPaths:
- .spec.dnsNames.0
- .spec.dnsNames.1
options:
delimiter: '.'
index: 1
create: true
And config/crd/kustomization.yaml
should now look like the following:
# This kustomization.yaml is not intended to be run by itself,
# since it depends on service name and namespace that are out of this kustomize package.
# It should be run by config/default
resources:
- bases/batch.tutorial.kubebuilder.io_cronjobs.yaml
# +kubebuilder:scaffold:crdkustomizeresource
patches:
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix.
# patches here are for enabling the conversion webhook for each CRD
- path: patches/webhook_in_cronjobs.yaml
# +kubebuilder:scaffold:crdkustomizewebhookpatch
# [CERTMANAGER] To enable cert-manager, uncomment all the sections with [CERTMANAGER] prefix.
# patches here are for enabling the CA injection for each CRD
- path: patches/cainjection_in_cronjobs.yaml
# +kubebuilder:scaffold:crdkustomizecainjectionpatch
# [WEBHOOK] To enable webhook, uncomment the following section
# the following config is for teaching kustomize how to do kustomization for CRDs.
configurations:
- kustomizeconfig.yaml
Now you can deploy it to your cluster by
make deploy IMG=<some-registry>/<project-name>:tag
Wait a while till the webhook pod comes up and the certificates are provisioned. It usually completes within 1 minute.
Now you can create a valid CronJob to test your webhooks. The creation should successfully go through.
kubectl create -f config/samples/batch_v1_cronjob.yaml
You can also try to create an invalid CronJob (e.g. use an ill-formatted schedule field). You should see a creation failure with a validation error.
Writing controller tests
Testing Kubernetes controllers is a big subject, and the boilerplate testing files generated for you by kubebuilder are fairly minimal.
To walk you through integration testing patterns for Kubebuilder-generated controllers, we will revisit the CronJob we built in our first tutorial and write a simple test for it.
The basic approach is that, in your generated suite_test.go
file, you will use envtest to create a local Kubernetes API server, instantiate and run your controllers, and then write additional *_test.go
files to test it using Ginkgo.
If you want to tinker with how your envtest cluster is configured, see section Configuring envtest for integration tests as well as the envtest docs
.
Test Environment Setup
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Imports
When we created the CronJob API with kubebuilder create api
in a previous chapter, Kubebuilder already did some test work for you.
Kubebuilder scaffolded a internal/controller/suite_test.go
file that does the bare bones of setting up a test environment.
First, it will contain the necessary imports.
package controller
import (
"context"
"fmt"
"path/filepath"
"runtime"
"testing"
ctrl "sigs.k8s.io/controller-runtime"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/rest"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/envtest"
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
// +kubebuilder:scaffold:imports
)
// These tests use Ginkgo (BDD-style Go testing framework). Refer to
// http://onsi.github.io/ginkgo/ to learn more about Ginkgo.
Now, let’s go through the code generated.
var (
cfg *rest.Config
k8sClient client.Client // You'll be using this client in your tests.
testEnv *envtest.Environment
)
var ctx context.Context
var cancel context.CancelFunc
func TestControllers(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Controller Suite")
}
var _ = BeforeSuite(func() {
logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(true)))
ctx, cancel = context.WithCancel(context.TODO())
First, the envtest cluster is configured to read CRDs from the CRD directory Kubebuilder scaffolds for you.
By("bootstrapping test environment")
testEnv = &envtest.Environment{
CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
ErrorIfCRDPathMissing: true,
// The BinaryAssetsDirectory is only required if you want to run the tests directly
// without call the makefile target test. If not informed it will look for the
// default path defined in controller-runtime which is /usr/local/kubebuilder/.
// Note that you must have the required binaries setup under the bin directory to perform
// the tests directly. When we run make test it will be setup and used automatically.
BinaryAssetsDirectory: filepath.Join("..", "..", "bin", "k8s",
fmt.Sprintf("1.31.0-%s-%s", runtime.GOOS, runtime.GOARCH)),
}
Then, we start the envtest cluster.
var err error
// cfg is defined in this file globally.
cfg, err = testEnv.Start()
Expect(err).NotTo(HaveOccurred())
Expect(cfg).NotTo(BeNil())
The autogenerated test code will add the CronJob Kind schema to the default client-go k8s scheme. This ensures that the CronJob API/Kind will be used in our test controller.
err = batchv1.AddToScheme(scheme.Scheme)
Expect(err).NotTo(HaveOccurred())
After the schemas, you will see the following marker. This marker is what allows new schemas to be added here automatically when a new API is added to the project.
// +kubebuilder:scaffold:scheme
A client is created for our test CRUD operations.
k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
Expect(err).NotTo(HaveOccurred())
Expect(k8sClient).NotTo(BeNil())
One thing that this autogenerated file is missing, however, is a way to actually start your controller. The code above will set up a client for interacting with your custom Kind, but will not be able to test your controller behavior. If you want to test your custom controller logic, you’ll need to add some familiar-looking manager logic to your BeforeSuite() function, so you can register your custom controller to run on this test cluster.
You may notice that the code below runs your controller with nearly identical logic to your CronJob project’s main.go! The only difference is that the manager is started in a separate goroutine so it does not block the cleanup of envtest when you’re done running your tests.
Note that we set up both a “live” k8s client and a separate client from the manager. This is because when making
assertions in tests, you generally want to assert against the live state of the API server. If you use the client
from the manager (k8sManager.GetClient
), you’d end up asserting against the contents of the cache instead, which is
slower and can introduce flakiness into your tests. We could use the manager’s APIReader
to accomplish the same
thing, but that would leave us with two clients in our test assertions and setup (one for reading, one for writing),
and it’d be easy to make mistakes.
Note that we keep the reconciler running against the manager’s cache client, though – we want our controller to behave as it would in production, and we use features of the cache (like indices) in our controller which aren’t available when talking directly to the API server.
k8sManager, err := ctrl.NewManager(cfg, ctrl.Options{
Scheme: scheme.Scheme,
})
Expect(err).ToNot(HaveOccurred())
err = (&CronJobReconciler{
Client: k8sManager.GetClient(),
Scheme: k8sManager.GetScheme(),
}).SetupWithManager(k8sManager)
Expect(err).ToNot(HaveOccurred())
go func() {
defer GinkgoRecover()
err = k8sManager.Start(ctx)
Expect(err).ToNot(HaveOccurred(), "failed to run manager")
}()
})
Kubebuilder also generates boilerplate functions for cleaning up envtest and actually running your test files in your controllers/ directory. You won’t need to touch these.
var _ = AfterSuite(func() {
By("tearing down the test environment")
cancel()
err := testEnv.Stop()
Expect(err).NotTo(HaveOccurred())
})
Now that you have your controller running on a test cluster and a client ready to perform operations on your CronJob, we can start writing integration tests!
Testing your Controller’s Behavior
Apache License
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Ideally, we should have one <kind>_controller_test.go
for each controller scaffolded and called in the suite_test.go
.
So, let’s write our example test for the CronJob controller (cronjob_controller_test.go.
)
Imports
As usual, we start with the necessary imports. We also define some utility variables.
package controller
import (
"context"
"reflect"
"time"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
batchv1 "k8s.io/api/batch/v1"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
cronjobv1 "tutorial.kubebuilder.io/project/api/v1"
)
The first step to writing a simple integration test is to actually create an instance of CronJob you can run tests against. Note that to create a CronJob, you’ll need to create a stub CronJob struct that contains your CronJob’s specifications.
Note that when we create a stub CronJob, the CronJob also needs stubs of its required downstream objects. Without the stubbed Job template spec and the Pod template spec below, the Kubernetes API will not be able to create the CronJob.
var _ = Describe("CronJob controller", func() {
// Define utility constants for object names and testing timeouts/durations and intervals.
const (
CronjobName = "test-cronjob"
CronjobNamespace = "default"
JobName = "test-job"
timeout = time.Second * 10
duration = time.Second * 10
interval = time.Millisecond * 250
)
Context("When updating CronJob Status", func() {
It("Should increase CronJob Status.Active count when new Jobs are created", func() {
By("By creating a new CronJob")
ctx := context.Background()
cronJob := &cronjobv1.CronJob{
TypeMeta: metav1.TypeMeta{
APIVersion: "batch.tutorial.kubebuilder.io/v1",
Kind: "CronJob",
},
ObjectMeta: metav1.ObjectMeta{
Name: CronjobName,
Namespace: CronjobNamespace,
},
Spec: cronjobv1.CronJobSpec{
Schedule: "1 * * * *",
JobTemplate: batchv1.JobTemplateSpec{
Spec: batchv1.JobSpec{
// For simplicity, we only fill out the required fields.
Template: v1.PodTemplateSpec{
Spec: v1.PodSpec{
// For simplicity, we only fill out the required fields.
Containers: []v1.Container{
{
Name: "test-container",
Image: "test-image",
},
},
RestartPolicy: v1.RestartPolicyOnFailure,
},
},
},
},
},
}
Expect(k8sClient.Create(ctx, cronJob)).To(Succeed())
After creating this CronJob, let’s check that the CronJob’s Spec fields match what we passed in.
Note that, because the k8s apiserver may not have finished creating a CronJob after our Create()
call from earlier, we will use Gomega’s Eventually() testing function instead of Expect() to give the apiserver an opportunity to finish creating our CronJob.
Eventually()
will repeatedly run the function provided as an argument every interval seconds until
(a) the assertions done by the passed-in Gomega
succeed, or
(b) the number of attempts * interval period exceed the provided timeout value.
In the examples below, timeout and interval are Go Duration values of our choosing.
cronjobLookupKey := types.NamespacedName{Name: CronjobName, Namespace: CronjobNamespace}
createdCronjob := &cronjobv1.CronJob{}
// We'll need to retry getting this newly created CronJob, given that creation may not immediately happen.
Eventually(func(g Gomega) {
g.Expect(k8sClient.Get(ctx, cronjobLookupKey, createdCronjob)).To(Succeed())
}, timeout, interval).Should(Succeed())
// Let's make sure our Schedule string value was properly converted/handled.
Expect(createdCronjob.Spec.Schedule).To(Equal("1 * * * *"))
Now that we’ve created a CronJob in our test cluster, the next step is to write a test that actually tests our CronJob controller’s behavior. Let’s test the CronJob controller’s logic responsible for updating CronJob.Status.Active with actively running jobs. We’ll verify that when a CronJob has a single active downstream Job, its CronJob.Status.Active field contains a reference to this Job.
First, we should get the test CronJob we created earlier, and verify that it currently does not have any active jobs.
We use Gomega’s Consistently()
check here to ensure that the active job count remains 0 over a duration of time.
By("By checking the CronJob has zero active Jobs")
Consistently(func(g Gomega) {
g.Expect(k8sClient.Get(ctx, cronjobLookupKey, createdCronjob)).To(Succeed())
g.Expect(createdCronjob.Status.Active).To(HaveLen(0))
}, duration, interval).Should(Succeed())
Next, we actually create a stubbed Job that will belong to our CronJob, as well as its downstream template specs. We set the Job’s status’s “Active” count to 2 to simulate the Job running two pods, which means the Job is actively running.
We then take the stubbed Job and set its owner reference to point to our test CronJob. This ensures that the test Job belongs to, and is tracked by, our test CronJob. Once that’s done, we create our new Job instance.
By("By creating a new Job")
testJob := &batchv1.Job{
ObjectMeta: metav1.ObjectMeta{
Name: JobName,
Namespace: CronjobNamespace,
},
Spec: batchv1.JobSpec{
Template: v1.PodTemplateSpec{
Spec: v1.PodSpec{
// For simplicity, we only fill out the required fields.
Containers: []v1.Container{
{
Name: "test-container",
Image: "test-image",
},
},
RestartPolicy: v1.RestartPolicyOnFailure,
},
},
},
}
// Note that your CronJob’s GroupVersionKind is required to set up this owner reference.
kind := reflect.TypeOf(cronjobv1.CronJob{}).Name()
gvk := cronjobv1.GroupVersion.WithKind(kind)
controllerRef := metav1.NewControllerRef(createdCronjob, gvk)
testJob.SetOwnerReferences([]metav1.OwnerReference{*controllerRef})
Expect(k8sClient.Create(ctx, testJob)).To(Succeed())
// Note that you can not manage the status values while creating the resource.
// The status field is managed separately to reflect the current state of the resource.
// Therefore, it should be updated using a PATCH or PUT operation after the resource has been created.
// Additionally, it is recommended to use StatusConditions to manage the status. For further information see:
// https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
testJob.Status.Active = 2
Expect(k8sClient.Status().Update(ctx, testJob)).To(Succeed())
Adding this Job to our test CronJob should trigger our controller’s reconciler logic. After that, we can write a test that evaluates whether our controller eventually updates our CronJob’s Status field as expected!
By("By checking that the CronJob has one active Job")
Eventually(func(g Gomega) {
g.Expect(k8sClient.Get(ctx, cronjobLookupKey, createdCronjob)).To(Succeed(), "should GET the CronJob")
g.Expect(createdCronjob.Status.Active).To(HaveLen(1), "should have exactly one active job")
g.Expect(createdCronjob.Status.Active[0].Name).To(Equal(JobName), "the wrong job is active")
}, timeout, interval).Should(Succeed(), "should list our active job %s in the active jobs list in status", JobName)
})
})
})
After writing all this code, you can run go test ./...
in your controllers/
directory again to run your new test!
This Status update example above demonstrates a general testing strategy for a custom Kind with downstream objects. By this point, you hopefully have learned the following methods for testing your controller behavior:
- Setting up your controller to run on an envtest cluster
- Writing stubs for creating test objects
- Isolating changes to an object to test specific controller behavior
Advanced Examples
There are more involved examples of using envtest to rigorously test controller behavior. Examples include:
- Azure Databricks Operator: see their fully fleshed-out
suite_test.go
as well as any*_test.go
file in that directory like this one.
Epilogue
By this point, we’ve got a pretty full-featured implementation of the CronJob controller, made use of most of the features of Kubebuilder, and written tests for the controller using envtest.
If you want more, head over to the Multi-Version Tutorial to learn how to add new API versions to a project.
Additionally, you can try the following steps on your own – we’ll have a tutorial section on them Soon™:
- adding additional printer columns
kubectl get
Tutorial: Multi-Version API
Most projects start out with an alpha API that changes release to release. However, eventually, most projects will need to move to a more stable API. Once your API is stable though, you can’t make breaking changes to it. That’s where API versions come into play.
Let’s make some changes to the CronJob
API spec and make sure all the
different versions are supported by our CronJob project.
If you haven’t already, make sure you’ve gone through the base CronJob Tutorial.
Next, let’s figure out what changes we want to make…
Changing things up
A fairly common change in a Kubernetes API is to take some data that used
to be unstructured or stored in some special string format, and change it
to structured data. Our schedule
field fits the bill quite nicely for
this – right now, in v1
, our schedules look like
schedule: "*/1 * * * *"
That’s a pretty textbook example of a special string format (it’s also pretty unreadable unless you’re a Unix sysadmin).
Let’s make it a bit more structured. According to our CronJob code, we support “standard” Cron format.
In Kubernetes, all versions must be safely round-tripable through each other. This means that if we convert from version 1 to version 2, and then back to version 1, we must not lose information. Thus, any change we make to our API must be compatible with whatever we supported in v1, and also need to make sure anything we add in v2 is supported in v1. In some cases, this means we need to add new fields to v1, but in our case, we won’t have to, since we’re not adding new functionality.
Keeping all that in mind, let’s convert our example above to be slightly more structured:
schedule:
minute: */1
Now, at least, we’ve got labels for each of our fields, but we can still easily support all the different syntax for each field.
We’ll need a new API version for this change. Let’s call it v2:
kubebuilder create api --group batch --version v2 --kind CronJob
Press y
for “Create Resource” and n
for “Create Controller”.
Now, let’s copy over our existing types, and make the change:
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Since we’re in a v2 package, controller-gen will assume this is for the v2
version automatically. We could override that with the +versionName
marker.
package v2
Imports
import (
batchv1 "k8s.io/api/batch/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
We’ll leave our spec largely unchanged, except to change the schedule field to a new type.
// CronJobSpec defines the desired state of CronJob.
type CronJobSpec struct {
// The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
Schedule CronSchedule `json:"schedule"`
The rest of Spec
// +kubebuilder:validation:Minimum=0
// Optional deadline in seconds for starting the job if it misses scheduled
// time for any reason. Missed jobs executions will be counted as failed ones.
// +optional
StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`
// Specifies how to treat concurrent executions of a Job.
// Valid values are:
// - "Allow" (default): allows CronJobs to run concurrently;
// - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
// - "Replace": cancels currently running job and replaces it with a new one
// +optional
ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`
// This flag tells the controller to suspend subsequent executions, it does
// not apply to already started executions. Defaults to false.
// +optional
Suspend *bool `json:"suspend,omitempty"`
// Specifies the job that will be created when executing a CronJob.
JobTemplate batchv1.JobTemplateSpec `json:"jobTemplate"`
// +kubebuilder:validation:Minimum=0
// The number of successful finished jobs to retain.
// This is a pointer to distinguish between explicit zero and not specified.
// +optional
SuccessfulJobsHistoryLimit *int32 `json:"successfulJobsHistoryLimit,omitempty"`
// +kubebuilder:validation:Minimum=0
// The number of failed finished jobs to retain.
// This is a pointer to distinguish between explicit zero and not specified.
// +optional
FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`
}
Next, we’ll need to define a type to hold our schedule. Based on our proposed YAML above, it’ll have a field for each corresponding Cron “field”.
// describes a Cron schedule.
type CronSchedule struct {
// specifies the minute during which the job executes.
// +optional
Minute *CronField `json:"minute,omitempty"`
// specifies the hour during which the job executes.
// +optional
Hour *CronField `json:"hour,omitempty"`
// specifies the day of the month during which the job executes.
// +optional
DayOfMonth *CronField `json:"dayOfMonth,omitempty"`
// specifies the month during which the job executes.
// +optional
Month *CronField `json:"month,omitempty"`
// specifies the day of the week during which the job executes.
// +optional
DayOfWeek *CronField `json:"dayOfWeek,omitempty"`
}
Finally, we’ll define a wrapper type to represent a field. We could attach additional validation to this field, but for now we’ll just use it for documentation purposes.
// represents a Cron field specifier.
type CronField string
Other Types
All the other types will stay the same as before.
// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
// +kubebuilder:validation:Enum=Allow;Forbid;Replace
type ConcurrencyPolicy string
const (
// AllowConcurrent allows CronJobs to run concurrently.
AllowConcurrent ConcurrencyPolicy = "Allow"
// ForbidConcurrent forbids concurrent runs, skipping next run if previous
// hasn't finished yet.
ForbidConcurrent ConcurrencyPolicy = "Forbid"
// ReplaceConcurrent cancels currently running job and replaces it with a new one.
ReplaceConcurrent ConcurrencyPolicy = "Replace"
)
// CronJobStatus defines the observed state of CronJob
type CronJobStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file
// A list of pointers to currently running jobs.
// +optional
Active []corev1.ObjectReference `json:"active,omitempty"`
// Information when was the last time the job was successfully scheduled.
// +optional
LastScheduleTime *metav1.Time `json:"lastScheduleTime,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +versionName=v2
// CronJob is the Schema for the cronjobs API.
type CronJob struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec CronJobSpec `json:"spec,omitempty"`
Status CronJobStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// CronJobList contains a list of CronJob.
type CronJobList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []CronJob `json:"items"`
}
func init() {
SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}
Storage Versions
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
package v1
Imports
import (
batchv1 "k8s.io/api/batch/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
old stuff
// CronJobSpec defines the desired state of CronJob.
type CronJobSpec struct {
// +kubebuilder:validation:MinLength=0
// The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
Schedule string `json:"schedule"`
// +kubebuilder:validation:Minimum=0
// Optional deadline in seconds for starting the job if it misses scheduled
// time for any reason. Missed jobs executions will be counted as failed ones.
// +optional
StartingDeadlineSeconds *int64 `json:"startingDeadlineSeconds,omitempty"`
// Specifies how to treat concurrent executions of a Job.
// Valid values are:
// - "Allow" (default): allows CronJobs to run concurrently;
// - "Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
// - "Replace": cancels currently running job and replaces it with a new one
// +optional
ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`
// This flag tells the controller to suspend subsequent executions, it does
// not apply to already started executions. Defaults to false.
// +optional
Suspend *bool `json:"suspend,omitempty"`
// Specifies the job that will be created when executing a CronJob.
JobTemplate batchv1.JobTemplateSpec `json:"jobTemplate"`
// +kubebuilder:validation:Minimum=0
// The number of successful finished jobs to retain.
// This is a pointer to distinguish between explicit zero and not specified.
// +optional
SuccessfulJobsHistoryLimit *int32 `json:"successfulJobsHistoryLimit,omitempty"`
// +kubebuilder:validation:Minimum=0
// The number of failed finished jobs to retain.
// This is a pointer to distinguish between explicit zero and not specified.
// +optional
FailedJobsHistoryLimit *int32 `json:"failedJobsHistoryLimit,omitempty"`
}
// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
// +kubebuilder:validation:Enum=Allow;Forbid;Replace
type ConcurrencyPolicy string
const (
// AllowConcurrent allows CronJobs to run concurrently.
AllowConcurrent ConcurrencyPolicy = "Allow"
// ForbidConcurrent forbids concurrent runs, skipping next run if previous
// hasn't finished yet.
ForbidConcurrent ConcurrencyPolicy = "Forbid"
// ReplaceConcurrent cancels currently running job and replaces it with a new one.
ReplaceConcurrent ConcurrencyPolicy = "Replace"
)
// CronJobStatus defines the observed state of CronJob.
type CronJobStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file
// A list of pointers to currently running jobs.
// +optional
Active []corev1.ObjectReference `json:"active,omitempty"`
// Information when was the last time the job was successfully scheduled.
// +optional
LastScheduleTime *metav1.Time `json:"lastScheduleTime,omitempty"`
}
Since we’ll have more than one version, we’ll need to mark a storage version. This is the version that the Kubernetes API server uses to store our data. We’ll chose the v1 version for our project.
We’ll use the +kubebuilder:storageversion
to do this.
Note that multiple versions may exist in storage if they were written before the storage version changes – changing the storage version only affects how objects are created/updated after the change.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +versionName=v1
// +kubebuilder:storageversion
// CronJob is the Schema for the cronjobs API.
type CronJob struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec CronJobSpec `json:"spec,omitempty"`
Status CronJobStatus `json:"status,omitempty"`
}
old stuff
// +kubebuilder:object:root=true
// CronJobList contains a list of CronJob.
type CronJobList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []CronJob `json:"items"`
}
func init() {
SchemeBuilder.Register(&CronJob{}, &CronJobList{})
}
Now that we’ve got our types in place, we’ll need to set up conversion…
Hubs, spokes, and other wheel metaphors
Since we now have two different versions, and users can request either version, we’ll have to define a way to convert between our version. For CRDs, this is done using a webhook, similar to the defaulting and validating webhooks we defined in the base tutorial. Like before, controller-runtime will help us wire together the nitty-gritty bits, we just have to implement the actual conversion.
Before we do that, though, we’ll need to understand how controller-runtime thinks about versions. Namely:
Complete graphs are insufficiently nautical
A simple approach to defining conversion might be to define conversion functions to convert between each of our versions. Then, whenever we need to convert, we’d look up the appropriate function, and call it to run the conversion.
This works fine when we just have two versions, but what if we had 4 types? 8 types? That’d be a lot of conversion functions.
Instead, controller-runtime models conversion in terms of a “hub and spoke” model – we mark one version as the “hub”, and all other versions just define conversion to and from the hub:
Then, if we have to convert between two non-hub versions, we first convert to the hub version, and then to our desired version:
This cuts down on the number of conversion functions that we have to define, and is modeled off of what Kubernetes does internally.
What does that have to do with Webhooks?
When API clients, like kubectl or your controller, request a particular version of your resource, the Kubernetes API server needs to return a result that’s of that version. However, that version might not match the version stored by the API server.
In that case, the API server needs to know how to convert between the desired version and the stored version. Since the conversions aren’t built in for CRDs, the Kubernetes API server calls out to a webhook to do the conversion instead. For Kubebuilder, this webhook is implemented by controller-runtime, and performs the hub-and-spoke conversions that we discussed above.
Now that we have the model for conversion down pat, we can actually implement our conversions.
Implementing conversion
With our model for conversion in place, it’s time to actually implement
the conversion functions. We’ll put them in a file called
cronjob_conversion.go
next to our cronjob_types.go
file, to avoid
cluttering up our main types file with extra functions.
Hub…
First, we’ll implement the hub. We’ll choose the v1 version as the hub:
Apache License
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
package v1
Implementing the hub method is pretty easy – we just have to add an empty
method called Hub()
to serve as a
marker.
We could also just put this inline in our cronjob_types.go file.
// Hub marks this type as a conversion hub.
func (*CronJob) Hub() {}
… and Spokes
Then, we’ll implement our spoke, the v2 version:
Apache License
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
package v2
Imports
For imports, we’ll need the controller-runtime
conversion
package, plus the API version for our hub type (v1), and finally some of the
standard packages.
import (
"fmt"
"strings"
"sigs.k8s.io/controller-runtime/pkg/conversion"
v1 "tutorial.kubebuilder.io/project/api/v1"
)
Our “spoke” versions need to implement the
Convertible
interface. Namely, they’ll need ConvertTo()
and ConvertFrom()
methods to convert to/from the hub version.
ConvertTo is expected to modify its argument to contain the converted object. Most of the conversion is straightforward copying, except for converting our changed field.
// ConvertTo converts this CronJob to the Hub version (v1).
func (src *CronJob) ConvertTo(dstRaw conversion.Hub) error {
dst := dstRaw.(*v1.CronJob)
sched := src.Spec.Schedule
scheduleParts := []string{"*", "*", "*", "*", "*"}
if sched.Minute != nil {
scheduleParts[0] = string(*sched.Minute)
}
if sched.Hour != nil {
scheduleParts[1] = string(*sched.Hour)
}
if sched.DayOfMonth != nil {
scheduleParts[2] = string(*sched.DayOfMonth)
}
if sched.Month != nil {
scheduleParts[3] = string(*sched.Month)
}
if sched.DayOfWeek != nil {
scheduleParts[4] = string(*sched.DayOfWeek)
}
dst.Spec.Schedule = strings.Join(scheduleParts, " ")
rote conversion
The rest of the conversion is pretty rote.
// ObjectMeta
dst.ObjectMeta = src.ObjectMeta
// Spec
dst.Spec.StartingDeadlineSeconds = src.Spec.StartingDeadlineSeconds
dst.Spec.ConcurrencyPolicy = v1.ConcurrencyPolicy(src.Spec.ConcurrencyPolicy)
dst.Spec.Suspend = src.Spec.Suspend
dst.Spec.JobTemplate = src.Spec.JobTemplate
dst.Spec.SuccessfulJobsHistoryLimit = src.Spec.SuccessfulJobsHistoryLimit
dst.Spec.FailedJobsHistoryLimit = src.Spec.FailedJobsHistoryLimit
// Status
dst.Status.Active = src.Status.Active
dst.Status.LastScheduleTime = src.Status.LastScheduleTime
return nil
}
ConvertFrom is expected to modify its receiver to contain the converted object. Most of the conversion is straightforward copying, except for converting our changed field.
// ConvertFrom converts from the Hub version (v1) to this version.
func (dst *CronJob) ConvertFrom(srcRaw conversion.Hub) error {
src := srcRaw.(*v1.CronJob)
schedParts := strings.Split(src.Spec.Schedule, " ")
if len(schedParts) != 5 {
return fmt.Errorf("invalid schedule: not a standard 5-field schedule")
}
partIfNeeded := func(raw string) *CronField {
if raw == "*" {
return nil
}
part := CronField(raw)
return &part
}
dst.Spec.Schedule.Minute = partIfNeeded(schedParts[0])
dst.Spec.Schedule.Hour = partIfNeeded(schedParts[1])
dst.Spec.Schedule.DayOfMonth = partIfNeeded(schedParts[2])
dst.Spec.Schedule.Month = partIfNeeded(schedParts[3])
dst.Spec.Schedule.DayOfWeek = partIfNeeded(schedParts[4])
rote conversion
The rest of the conversion is pretty rote.
// ObjectMeta
dst.ObjectMeta = src.ObjectMeta
// Spec
dst.Spec.StartingDeadlineSeconds = src.Spec.StartingDeadlineSeconds
dst.Spec.ConcurrencyPolicy = ConcurrencyPolicy(src.Spec.ConcurrencyPolicy)
dst.Spec.Suspend = src.Spec.Suspend
dst.Spec.JobTemplate = src.Spec.JobTemplate
dst.Spec.SuccessfulJobsHistoryLimit = src.Spec.SuccessfulJobsHistoryLimit
dst.Spec.FailedJobsHistoryLimit = src.Spec.FailedJobsHistoryLimit
// Status
dst.Status.Active = src.Status.Active
dst.Status.LastScheduleTime = src.Status.LastScheduleTime
return nil
}
Now that we’ve got our conversions in place, all that we need to do is wire up our main to serve the webhook!
Setting up the webhooks
Our conversion is in place, so all that’s left is to tell controller-runtime about our conversion.
Normally, we’d run
kubebuilder create webhook --group batch --version v1 --kind CronJob --conversion
to scaffold out the webhook setup. However, we’ve already got webhook setup, from when we built our defaulting and validating webhooks!
Webhook setup…
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Go imports
package v1
import (
"context"
"fmt"
"github.com/robfig/cron"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime/schema"
validationutils "k8s.io/apimachinery/pkg/util/validation"
"k8s.io/apimachinery/pkg/util/validation/field"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/webhook"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
)
Next, we’ll setup a logger for the webhooks.
var cronjoblog = logf.Log.WithName("cronjob-resource")
This setup doubles as setup for our conversion webhooks: as long as our types implement the Hub and Convertible interfaces, a conversion webhook will be registered.
// SetupCronJobWebhookWithManager registers the webhook for CronJob in the manager.
func SetupCronJobWebhookWithManager(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).For(&batchv1.CronJob{}).
WithValidator(&CronJobCustomValidator{}).
WithDefaulter(&CronJobCustomDefaulter{
DefaultConcurrencyPolicy: batchv1.AllowConcurrent,
DefaultSuspend: false,
DefaultSuccessfulJobsHistoryLimit: 3,
DefaultFailedJobsHistoryLimit: 1,
}).
Complete()
}
Notice that we use kubebuilder markers to generate webhook manifests. This marker is responsible for generating a mutating webhook manifest.
The meaning of each marker can be found here.
This marker is responsible for generating a mutation webhook manifest.
// +kubebuilder:webhook:path=/mutate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=true,failurePolicy=fail,sideEffects=None,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=mcronjob-v1.kb.io,admissionReviewVersions=v1
// CronJobCustomDefaulter struct is responsible for setting default values on the custom resource of the
// Kind CronJob when those are created or updated.
//
// NOTE: The +kubebuilder:object:generate=false marker prevents controller-gen from generating DeepCopy methods,
// as it is used only for temporary operations and does not need to be deeply copied.
type CronJobCustomDefaulter struct {
// Default values for various CronJob fields
DefaultConcurrencyPolicy batchv1.ConcurrencyPolicy
DefaultSuspend bool
DefaultSuccessfulJobsHistoryLimit int32
DefaultFailedJobsHistoryLimit int32
}
var _ webhook.CustomDefaulter = &CronJobCustomDefaulter{}
We use the webhook.CustomDefaulter
interface to set defaults to our CRD.
A webhook will automatically be served that calls this defaulting.
The Default
method is expected to mutate the receiver, setting the defaults.
// Default implements webhook.CustomDefaulter so a webhook will be registered for the Kind CronJob.
func (d *CronJobCustomDefaulter) Default(ctx context.Context, obj runtime.Object) error {
cronjob, ok := obj.(*batchv1.CronJob)
if !ok {
return fmt.Errorf("expected an CronJob object but got %T", obj)
}
cronjoblog.Info("Defaulting for CronJob", "name", cronjob.GetName())
// Set default values
d.applyDefaults(cronjob)
return nil
}
// applyDefaults applies default values to CronJob fields.
func (d *CronJobCustomDefaulter) applyDefaults(cronJob *batchv1.CronJob) {
if cronJob.Spec.ConcurrencyPolicy == "" {
cronJob.Spec.ConcurrencyPolicy = d.DefaultConcurrencyPolicy
}
if cronJob.Spec.Suspend == nil {
cronJob.Spec.Suspend = new(bool)
*cronJob.Spec.Suspend = d.DefaultSuspend
}
if cronJob.Spec.SuccessfulJobsHistoryLimit == nil {
cronJob.Spec.SuccessfulJobsHistoryLimit = new(int32)
*cronJob.Spec.SuccessfulJobsHistoryLimit = d.DefaultSuccessfulJobsHistoryLimit
}
if cronJob.Spec.FailedJobsHistoryLimit == nil {
cronJob.Spec.FailedJobsHistoryLimit = new(int32)
*cronJob.Spec.FailedJobsHistoryLimit = d.DefaultFailedJobsHistoryLimit
}
}
We can validate our CRD beyond what’s possible with declarative validation. Generally, declarative validation should be sufficient, but sometimes more advanced use cases call for complex validation.
For instance, we’ll see below that we use this to validate a well-formed cron schedule without making up a long regular expression.
If webhook.CustomValidator
interface is implemented, a webhook will automatically be
served that calls the validation.
The ValidateCreate
, ValidateUpdate
and ValidateDelete
methods are expected
to validate its receiver upon creation, update and deletion respectively.
We separate out ValidateCreate from ValidateUpdate to allow behavior like making
certain fields immutable, so that they can only be set on creation.
ValidateDelete is also separated from ValidateUpdate to allow different
validation behavior on deletion.
Here, however, we just use the same shared validation for ValidateCreate
and
ValidateUpdate
. And we do nothing in ValidateDelete
, since we don’t need to
validate anything on deletion.
This marker is responsible for generating a validation webhook manifest.
// +kubebuilder:webhook:path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=vcronjob-v1.kb.io,admissionReviewVersions=v1
// CronJobCustomValidator struct is responsible for validating the CronJob resource
// when it is created, updated, or deleted.
//
// NOTE: The +kubebuilder:object:generate=false marker prevents controller-gen from generating DeepCopy methods,
// as this struct is used only for temporary operations and does not need to be deeply copied.
type CronJobCustomValidator struct {
//TODO(user): Add more fields as needed for validation
}
var _ webhook.CustomValidator = &CronJobCustomValidator{}
// ValidateCreate implements webhook.CustomValidator so a webhook will be registered for the type CronJob.
func (v *CronJobCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
cronjob, ok := obj.(*batchv1.CronJob)
if !ok {
return nil, fmt.Errorf("expected a CronJob object but got %T", obj)
}
cronjoblog.Info("Validation for CronJob upon creation", "name", cronjob.GetName())
return nil, validateCronJob(cronjob)
}
// ValidateUpdate implements webhook.CustomValidator so a webhook will be registered for the type CronJob.
func (v *CronJobCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
cronjob, ok := newObj.(*batchv1.CronJob)
if !ok {
return nil, fmt.Errorf("expected a CronJob object for the newObj but got %T", newObj)
}
cronjoblog.Info("Validation for CronJob upon update", "name", cronjob.GetName())
return nil, validateCronJob(cronjob)
}
// ValidateDelete implements webhook.CustomValidator so a webhook will be registered for the type CronJob.
func (v *CronJobCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
cronjob, ok := obj.(*batchv1.CronJob)
if !ok {
return nil, fmt.Errorf("expected a CronJob object but got %T", obj)
}
cronjoblog.Info("Validation for CronJob upon deletion", "name", cronjob.GetName())
// TODO(user): fill in your validation logic upon object deletion.
return nil, nil
}
We validate the name and the spec of the CronJob.
// validateCronJob validates the fields of a CronJob object.
func validateCronJob(cronjob *batchv1.CronJob) error {
var allErrs field.ErrorList
if err := validateCronJobName(cronjob); err != nil {
allErrs = append(allErrs, err)
}
if err := validateCronJobSpec(cronjob); err != nil {
allErrs = append(allErrs, err)
}
if len(allErrs) == 0 {
return nil
}
return apierrors.NewInvalid(
schema.GroupKind{Group: "batch.tutorial.kubebuilder.io", Kind: "CronJob"},
cronjob.Name, allErrs)
}
Some fields are declaratively validated by OpenAPI schema.
You can find kubebuilder validation markers (prefixed
with // +kubebuilder:validation
) in the
Designing an API section.
You can find all of the kubebuilder supported markers for
declaring validation by running controller-gen crd -w
,
or here.
func validateCronJobSpec(cronjob *batchv1.CronJob) *field.Error {
// The field helpers from the kubernetes API machinery help us return nicely
// structured validation errors.
return validateScheduleFormat(
cronjob.Spec.Schedule,
field.NewPath("spec").Child("schedule"))
}
We’ll need to validate the cron schedule is well-formatted.
func validateScheduleFormat(schedule string, fldPath *field.Path) *field.Error {
if _, err := cron.ParseStandard(schedule); err != nil {
return field.Invalid(fldPath, schedule, err.Error())
}
return nil
}
Validate object name
Validating the length of a string field can be done declaratively by the validation schema.
But the ObjectMeta.Name
field is defined in a shared package under
the apimachinery repo, so we can’t declaratively validate it using
the validation schema.
func validateCronJobName(cronjob *batchv1.CronJob) *field.Error {
if len(cronjob.ObjectMeta.Name) > validationutils.DNS1035LabelMaxLength-11 {
// The job name length is 63 characters like all Kubernetes objects
// (which must fit in a DNS subdomain). The cronjob controller appends
// a 11-character suffix to the cronjob (`-$TIMESTAMP`) when creating
// a job. The job name length limit is 63 characters. Therefore cronjob
// names must have length <= 63-11=52. If we don't validate this here,
// then job creation will fail later.
return field.Invalid(field.NewPath("metadata").Child("name"), cronjob.ObjectMeta.Name, "must be no more than 52 characters")
}
return nil
}
…and main.go
Similarly, our existing main file is sufficient:
Apache License
Copyright 2024 The Kubernetes authors.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Imports
package main
import (
"crypto/tls"
"flag"
"os"
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
kbatchv1 "k8s.io/api/batch/v1"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"
batchv1 "tutorial.kubebuilder.io/project/api/v1"
batchv2 "tutorial.kubebuilder.io/project/api/v2"
"tutorial.kubebuilder.io/project/internal/controller"
webhookbatchv1 "tutorial.kubebuilder.io/project/internal/webhook/v1"
webhookbatchv2 "tutorial.kubebuilder.io/project/internal/webhook/v2"
// +kubebuilder:scaffold:imports
)
existing setup
var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
)
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(kbatchv1.AddToScheme(scheme)) // we've added this ourselves
utilruntime.Must(batchv1.AddToScheme(scheme))
utilruntime.Must(batchv2.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
func main() {
existing setup
var metricsAddr string
var enableLeaderElection bool
var probeAddr string
var secureMetrics bool
var enableHTTP2 bool
var tlsOpts []func(*tls.Config)
flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.BoolVar(&secureMetrics, "metrics-secure", true,
"If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.")
flag.BoolVar(&enableHTTP2, "enable-http2", false,
"If set, HTTP/2 will be enabled for the metrics and webhook servers")
opts := zap.Options{
Development: true,
}
opts.BindFlags(flag.CommandLine)
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
// if the enable-http2 flag is false (the default), http/2 should be disabled
// due to its vulnerabilities. More specifically, disabling http/2 will
// prevent from being vulnerable to the HTTP/2 Stream Cancellation and
// Rapid Reset CVEs. For more information see:
// - https://github.com/advisories/GHSA-qppj-fm5r-hxr3
// - https://github.com/advisories/GHSA-4374-p667-p6c8
disableHTTP2 := func(c *tls.Config) {
setupLog.Info("disabling http/2")
c.NextProtos = []string{"http/1.1"}
}
if !enableHTTP2 {
tlsOpts = append(tlsOpts, disableHTTP2)
}
webhookServer := webhook.NewServer(webhook.Options{
TLSOpts: tlsOpts,
})
// Metrics endpoint is enabled in 'config/default/kustomization.yaml'. The Metrics options configure the server.
// More info:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/metrics/server
// - https://book.kubebuilder.io/reference/metrics.html
metricsServerOptions := metricsserver.Options{
BindAddress: metricsAddr,
SecureServing: secureMetrics,
// TODO(user): TLSOpts is used to allow configuring the TLS config used for the server. If certificates are
// not provided, self-signed certificates will be generated by default. This option is not recommended for
// production environments as self-signed certificates do not offer the same level of trust and security
// as certificates issued by a trusted Certificate Authority (CA). The primary risk is potentially allowing
// unauthorized access to sensitive metrics data. Consider replacing with CertDir, CertName, and KeyName
// to provide certificates, ensuring the server communicates using trusted and secure certificates.
TLSOpts: tlsOpts,
}
if secureMetrics {
// FilterProvider is used to protect the metrics endpoint with authn/authz.
// These configurations ensure that only authorized users and service accounts
// can access the metrics endpoint. The RBAC are configured in 'config/rbac/kustomization.yaml'. More info:
// https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.19.0/pkg/metrics/filters#WithAuthenticationAndAuthorization
metricsServerOptions.FilterProvider = filters.WithAuthenticationAndAuthorization
}
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
Metrics: metricsServerOptions,
WebhookServer: webhookServer,
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "80807133.tutorial.kubebuilder.io",
// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader don't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after
// the manager stops, so would be fine to enable this option. However,
// if you are doing or is intended to do any operation such as perform cleanups
// after the manager stops then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,
})
if err != nil {
setupLog.Error(err, "unable to start manager")
os.Exit(1)
}
if err = (&controller.CronJobReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "CronJob")
os.Exit(1)
}
Our existing call to SetupWebhookWithManager registers our conversion webhooks with the manager, too.
// nolint:goconst
if os.Getenv("ENABLE_WEBHOOKS") != "false" {
if err = webhookbatchv1.SetupCronJobWebhookWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create webhook", "webhook", "CronJob")
os.Exit(1)
}
}
// nolint:goconst
if os.Getenv("ENABLE_WEBHOOKS") != "false" {
if err = webhookbatchv2.SetupCronJobWebhookWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create webhook", "webhook", "CronJob")
os.Exit(1)
}
}
// +kubebuilder:scaffold:builder
existing setup
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up health check")
os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
setupLog.Error(err, "unable to set up ready check")
os.Exit(1)
}
setupLog.Info("starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
setupLog.Error(err, "problem running manager")
os.Exit(1)
}
}
Everything’s set up and ready to go! All that’s left now is to test out our webhooks.
Deployment and Testing
Before we can test out our conversion, we’ll need to enable them in our CRD:
Kubebuilder generates Kubernetes manifests under the config
directory with webhook
bits disabled. To enable them, we need to:
-
Enable
patches/webhook_in_<kind>.yaml
andpatches/cainjection_in_<kind>.yaml
inconfig/crd/kustomization.yaml
file. -
Enable
../certmanager
and../webhook
directories under thebases
section inconfig/default/kustomization.yaml
file. -
Enable all the vars under the
CERTMANAGER
section inconfig/default/kustomization.yaml
file.
Additionally, if present in our Makefile, we’ll need to set the CRD_OPTIONS
variable to just
"crd"
, removing the trivialVersions
option (this ensures that we
actually generate validation for each version, instead of
telling Kubernetes that they’re the same):
CRD_OPTIONS ?= "crd"
Now we have all our code changes and manifests in place, so let’s deploy it to the cluster and test it out.
You’ll need cert-manager installed
(version 0.9.0+
) unless you’ve got some other certificate management
solution. The Kubebuilder team has tested the instructions in this tutorial
with
0.9.0-alpha.0
release.
Once all our ducks are in a row with certificates, we can run make install deploy
(as normal) to deploy all the bits (CRD,
controller-manager deployment) onto the cluster.
Testing
Once all of the bits are up and running on the cluster with conversion enabled, we can test out our conversion by requesting different versions.
We’ll make a v2 version based on our v1 version (put it under config/samples
)
apiVersion: batch.tutorial.kubebuilder.io/v2
kind: CronJob
metadata:
labels:
app.kubernetes.io/name: project
app.kubernetes.io/managed-by: kustomize
name: cronjob-sample
spec:
schedule:
minute: "*/1"
startingDeadlineSeconds: 60
concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
Then, we can create it on the cluster:
kubectl apply -f config/samples/batch_v2_cronjob.yaml
If we’ve done everything correctly, it should create successfully, and we should be able to fetch it using both the v2 resource
kubectl get cronjobs.v2.batch.tutorial.kubebuilder.io -o yaml
apiVersion: batch.tutorial.kubebuilder.io/v2
kind: CronJob
metadata:
labels:
app.kubernetes.io/name: project
app.kubernetes.io/managed-by: kustomize
name: cronjob-sample
spec:
schedule:
minute: "*/1"
startingDeadlineSeconds: 60
concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
and the v1 resource
kubectl get cronjobs.v1.batch.tutorial.kubebuilder.io -o yaml
apiVersion: batch.tutorial.kubebuilder.io/v1
kind: CronJob
metadata:
labels:
app.kubernetes.io/name: project
app.kubernetes.io/managed-by: kustomize
name: cronjob-sample
spec:
schedule: "*/1 * * * *"
startingDeadlineSeconds: 60
concurrencyPolicy: Allow # explicitly specify, but Allow is also default.
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
Both should be filled out, and look equivalent to our v2 and v1 samples, respectively. Notice that each has a different API version.
Finally, if we wait a bit, we should notice that our CronJob continues to reconcile, even though our controller is written against our v1 API version.
Troubleshooting
Migrations
Migrating between project structures in Kubebuilder generally involves a bit of manual work.
This section details what’s required to migrate, between different versions of Kubebuilder scaffolding, as well as to more complex project layout structures.
Migration guides from Legacy versions < 3.0.0
Follow the migration guides from the legacy Kubebuilder versions up the required latest v3x version. Note that from v3, a new ecosystem using plugins is introduced for better maintainability, reusability and user experience .
For more info, see the design docs of:
- Extensible CLI and Scaffolding Plugins: phase 1
- Extensible CLI and Scaffolding Plugins: phase 1.5
- Extensible CLI and Scaffolding Plugins - Phase 2
Also, you can check the Plugins section.
Kubebuilder v1 vs v2 (Legacy v1.0.0+ to v2.0.0 Kubebuilder CLI versions)
This document cover all breaking changes when migrating from v1 to v2.
The details of all changes (breaking or otherwise) can be found in controller-runtime, controller-tools and kubebuilder release notes.
Common changes
V2 project uses go modules. But kubebuilder will continue to support dep
until
go 1.13 is out.
controller-runtime
-
Client.List
now uses functional options (List(ctx, list, ...option)
) instead ofList(ctx, ListOptions, list)
. -
Client.DeleteAllOf
was added to theClient
interface. -
Metrics are on by default now.
-
A number of packages under
pkg/runtime
have been moved, with their old locations deprecated. The old locations will be removed before controller-runtime v1.0.0. See the godocs for more information.
Webhook-related
-
Automatic certificate generation for webhooks has been removed, and webhooks will no longer self-register. Use controller-tools to generate a webhook configuration. If you need certificate generation, we recommend using cert-manager. Kubebuilder v2 will scaffold out cert manager configs for you to use – see the Webhook Tutorial for more details.
-
The
builder
package now has separate builders for controllers and webhooks, which facilitates choosing which to run.
controller-tools
The generator framework has been rewritten in v2. It still works the same as before in many cases, but be aware that there are some breaking changes. Please check marker documentation for more details.
Kubebuilder
-
Kubebuilder v2 introduces a simplified project layout. You can find the design doc here.
-
In v1, the manager is deployed as a
StatefulSet
, while it’s deployed as aDeployment
in v2. -
The
kubebuilder create webhook
command was added to scaffold mutating/validating/conversion webhooks. It replaces thekubebuilder alpha webhook
command. -
v2 uses
distroless/static
instead of Ubuntu as base image. This reduces image size and attack surface. -
v2 requires kustomize v3.1.0+.
Migration from v1 to v2
Make sure you understand the differences between Kubebuilder v1 and v2 before continuing
Please ensure you have followed the installation guide to install the required components.
The recommended way to migrate a v1 project is to create a new v2 project and copy over the API and the reconciliation code. The conversion will end up with a project that looks like a native v2 project. However, in some cases, it’s possible to do an in-place upgrade (i.e. reuse the v1 project layout, upgrading controller-runtime and controller-tools.
Let’s take as example an V1 project and migrate it to Kubebuilder v2. At the end, we should have something that looks like the example v2 project.
Preparation
We’ll need to figure out what the group, version, kind and domain are.
Let’s take a look at our current v1 project structure:
pkg/
├── apis
│ ├── addtoscheme_batch_v1.go
│ ├── apis.go
│ └── batch
│ ├── group.go
│ └── v1
│ ├── cronjob_types.go
│ ├── cronjob_types_test.go
│ ├── doc.go
│ ├── register.go
│ ├── v1_suite_test.go
│ └── zz_generated.deepcopy.go
├── controller
└── webhook
All of our API information is stored in pkg/apis/batch
, so we can look
there to find what we need to know.
In cronjob_types.go
, we can find
type CronJob struct {...}
In register.go
, we can find
SchemeGroupVersion = schema.GroupVersion{Group: "batch.tutorial.kubebuilder.io", Version: "v1"}
Putting that together, we get CronJob
as the kind, and batch.tutorial.kubebuilder.io/v1
as the group-version
Initialize a v2 Project
Now, we need to initialize a v2 project. Before we do that, though, we’ll need
to initialize a new go module if we’re not on the gopath
:
go mod init tutorial.kubebuilder.io/project
Then, we can finish initializing the project with kubebuilder:
kubebuilder init --domain tutorial.kubebuilder.io
Migrate APIs and Controllers
Next, we’ll re-scaffold out the API types and controllers. Since we want both, we’ll say yes to both the API and controller prompts when asked what parts we want to scaffold:
kubebuilder create api --group batch --version v1 --kind CronJob
If you’re using multiple groups, some manual work is required to migrate. Please follow this for more details.
Migrate the APIs
Now, let’s copy the API definition from pkg/apis/batch/v1/cronjob_types.go
to
api/v1/cronjob_types.go
. We only need to copy the implementation of the Spec
and Status
fields.
We can replace the +k8s:deepcopy-gen:interfaces=...
marker (which is
deprecated in kubebuilder) with
+kubebuilder:object:root=true
.
We don’t need the following markers any more (they’re not used anymore, and are relics from much older versions of Kubebuilder):
// +genclient
// +k8s:openapi-gen=true
Our API types should look like the following:
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// CronJob is the Schema for the cronjobs API
type CronJob struct {...}
// +kubebuilder:object:root=true
// CronJobList contains a list of CronJob
type CronJobList struct {...}
Migrate the Controllers
Now, let’s migrate the controller reconciler code from
pkg/controller/cronjob/cronjob_controller.go
to
controllers/cronjob_controller.go
.
We’ll need to copy
- the fields from the
ReconcileCronJob
struct toCronJobReconciler
- the contents of the
Reconcile
function - the rbac related markers to the new file.
- the code under
func add(mgr manager.Manager, r reconcile.Reconciler) error
tofunc SetupWithManager
Migrate the Webhooks
If you don’t have a webhook, you can skip this section.
Webhooks for Core Types and External CRDs
If you are using webhooks for Kubernetes core types (e.g. Pods), or for an external CRD that is not owned by you, you can refer the controller-runtime example for builtin types and do something similar. Kubebuilder doesn’t scaffold much for these cases, but you can use the library in controller-runtime.
Scaffold Webhooks for our CRDs
Now let’s scaffold the webhooks for our CRD (CronJob). We’ll need to run the
following command with the --defaulting
and --programmatic-validation
flags
(since our test project uses defaulting and validating webhooks):
kubebuilder create webhook --group batch --version v1 --kind CronJob --defaulting --programmatic-validation
Depending on how many CRDs need webhooks, we may need to run the above command multiple times with different Group-Version-Kinds.
Now, we’ll need to copy the logic for each webhook. For validating webhooks, we
can copy the contents from
func validatingCronJobFn
in pkg/default_server/cronjob/validating/cronjob_create_handler.go
to func ValidateCreate
in api/v1/cronjob_webhook.go
and then the same for update
.
Similarly, we’ll copy from func mutatingCronJobFn
to func Default
.
Webhook Markers
When scaffolding webhooks, Kubebuilder v2 adds the following markers:
// These are v2 markers
// This is for the mutating webhook
// +kubebuilder:webhook:path=/mutate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=true,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=mcronjob.kb.io
...
// This is for the validating webhook
// +kubebuilder:webhook:path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=create;update,versions=v1,name=vcronjob.kb.io
The default verbs are verbs=create;update
. We need to ensure verbs
matches
what we need. For example, if we only want to validate creation, then we would
change it to verbs=create
.
We also need to ensure failure-policy
is still the same.
Markers like the following are no longer needed (since they deal with self-deploying certificate configuration, which was removed in v2):
// v1 markers
// +kubebuilder:webhook:port=9876,cert-dir=/tmp/cert
// +kubebuilder:webhook:service=test-system:webhook-service,selector=app:webhook-server
// +kubebuilder:webhook:secret=test-system:webhook-server-secret
// +kubebuilder:webhook:mutating-webhook-config-name=test-mutating-webhook-cfg
// +kubebuilder:webhook:validating-webhook-config-name=test-validating-webhook-cfg
In v1, a single webhook marker may be split into multiple ones in the same paragraph. In v2, each webhook must be represented by a single marker.
Others
If there are any manual updates in main.go
in v1, we need to port the changes
to the new main.go
. We’ll also need to ensure all of the needed schemes have
been registered.
If there are additional manifests added under config
directory, port them as
well.
Change the image name in the Makefile if needed.
Verification
Finally, we can run make
and make docker-build
to ensure things are working
fine.
Kubebuilder v2 vs v3
Migration from v2 to v3
Make sure you understand the differences between Kubebuilder v2 and v3 before continuing.
Please ensure you have followed the installation guide to install the required components.
The recommended way to migrate a v2 project is to create a new v3 project and copy over the API and the reconciliation code. The conversion will end up with a project that looks like a native v3 project. However, in some cases, it’s possible to do an in-place upgrade (i.e. reuse the v2 project layout, upgrading controller-runtime and controller-tools).
Initialize a v3 Project
Create a new directory with the name of your project. Note that this name is used in the scaffolds to create the name of your manager Pod and of the Namespace where the Manager is deployed by default.
$ mkdir migration-project-name
$ cd migration-project-name
Now, we need to initialize a v3 project. Before we do that, though, we’ll need
to initialize a new go module if we’re not on the GOPATH
. While technically this is
not needed inside GOPATH
, it is still recommended.
go mod init tutorial.kubebuilder.io/migration-project
Then, we can finish initializing the project with kubebuilder.
kubebuilder init --domain tutorial.kubebuilder.io
Migrate APIs and Controllers
Next, we’ll re-scaffold out the API types and controllers.
kubebuilder create api --group batch --version v1 --kind CronJob
Migrate the APIs
Now, let’s copy the API definition from api/v1/<kind>_types.go
in our old project to the new one.
These files have not been modified by the new plugin, so you should be able to replace your freshly scaffolded files by your old one. There may be some cosmetic changes. So you can choose to only copy the types themselves.
Migrate the Controllers
Now, let’s migrate the controller code from controllers/cronjob_controller.go
in our old project to the new one. There is a breaking change and there may be some cosmetic changes.
The new Reconcile
method receives the context as an argument now, instead of having to create it with context.Background()
. You can copy the rest of the code in your old controller to the scaffolded methods replacing:
func (r *CronJobReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
ctx := context.Background()
log := r.Log.WithValues("cronjob", req.NamespacedName)
With:
func (r *CronJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := r.Log.WithValues("cronjob", req.NamespacedName)
Migrate the Webhooks
Now let’s scaffold the webhooks for our CRD (CronJob). We’ll need to run the
following command with the --defaulting
and --programmatic-validation
flags
(since our test project uses defaulting and validating webhooks):
kubebuilder create webhook --group batch --version v1 --kind CronJob --defaulting --programmatic-validation
Now, let’s copy the webhook definition from api/v1/<kind>_webhook.go
from our old project to the new one.
Others
If there are any manual updates in main.go
in v2, we need to port the changes to the new main.go
. We’ll also need to ensure all of the needed schemes have been registered.
If there are additional manifests added under config directory, port them as well.
Change the image name in the Makefile if needed.
Verification
Finally, we can run make
and make docker-build
to ensure things are working
fine.
Migration from v2 to v3 by updating the files manually
Make sure you understand the differences between Kubebuilder v2 and v3 before continuing
Please ensure you have followed the installation guide to install the required components.
The following guide describes the manual steps required to upgrade your config version and start using the plugin-enabled version.
This way is more complex, susceptible to errors, and success cannot be assured. Also, by following these steps you will not get the improvements and bug fixes in the default generated project files.
Usually you will only try to do it manually if you customized your project and deviated too much from the proposed scaffold. Before continuing, ensure that you understand the note about project customizations. Note that you might need to spend more effort to do this process manually than organize your project customizations to follow up the proposed layout and keep your project maintainable and upgradable with less effort in the future.
The recommended upgrade approach is to follow the Migration Guide v2 to V3 instead.
Migration from project config version “2” to “3”
Migrating between project configuration versions involves additions, removals, and/or changes
to fields in your project’s PROJECT
file, which is created by running the init
command.
The PROJECT
file now has a new layout. It stores more information about what resources are in use, to better enable plugins to make useful decisions when scaffolding.
Furthermore, the PROJECT
file itself is now versioned. The version
field corresponds to the version of the PROJECT
file itself, while the layout
field indicates the scaffolding and the primary plugin version in use.
Steps to migrate
The following steps describe the manual changes required to bring the project configuration file (PROJECT
). These change will add the information that Kubebuilder would add when generating the file. This file can be found in the root directory.
Add the projectName
The project name is the name of the project directory in lowercase:
...
projectName: example
...
Add the layout
The default plugin layout which is equivalent to the previous version is go.kubebuilder.io/v2
:
...
layout:
- go.kubebuilder.io/v2
...
Update the version
The version
field represents the version of project’s layout. Update this to "3"
:
...
version: "3"
...
Add the resource data
The attribute resources
represents the list of resources scaffolded in your project.
You will need to add the following data for each resource added to the project.
Add the Kubernetes API version by adding resources[entry].api.crdVersion: v1beta1
:
...
resources:
- api:
...
crdVersion: v1beta1
domain: my.domain
group: webapp
kind: Guestbook
...
Add the scope used do scaffold the CRDs by adding resources[entry].api.namespaced: true
unless they were cluster-scoped:
...
resources:
- api:
...
namespaced: true
group: webapp
kind: Guestbook
...
If you have a controller scaffolded for the API then, add resources[entry].controller: true
:
...
resources:
- api:
...
controller: true
group: webapp
kind: Guestbook
Add the resource domain such as resources[entry].domain: testproject.org
which usually will be the project domain unless the API scaffold is a core type and/or an external type:
...
resources:
- api:
...
domain: testproject.org
group: webapp
kind: Guestbook
Note that you will only need to add the domain
if your project has a scaffold for a core type API which the Domain
value is not empty in Kubernetes API group qualified scheme definition. (For example, see here that for Kinds from the API apps
it has not a domain when see here that for Kinds from the API authentication
its domain is k8s.io
)
Check the following the list to know the core types supported and its domain:
Core Type | Domain |
---|---|
admission | “k8s.io” |
admissionregistration | “k8s.io” |
apps | empty |
auditregistration | “k8s.io” |
apiextensions | “k8s.io” |
authentication | “k8s.io” |
authorization | “k8s.io” |
autoscaling | empty |
batch | empty |
certificates | “k8s.io” |
coordination | “k8s.io” |
core | empty |
events | “k8s.io” |
extensions | empty |
imagepolicy | “k8s.io” |
networking | “k8s.io” |
node | “k8s.io” |
metrics | “k8s.io” |
policy | empty |
rbac.authorization | “k8s.io” |
scheduling | “k8s.io” |
setting | “k8s.io” |
storage | “k8s.io” |
Following an example where a controller was scaffold for the core type Kind Deployment via the command create api --group apps --version v1 --kind Deployment --controller=true --resource=false --make=false
:
- controller: true
group: apps
kind: Deployment
path: k8s.io/api/apps/v1
version: v1
Add the resources[entry].path
with the import path for the api:
...
resources:
- api:
...
...
group: webapp
kind: Guestbook
path: example/api/v1
If your project is using webhooks then, add resources[entry].webhooks.[type]: true
for each type generated and then, add resources[entry].webhooks.webhookVersion: v1beta1
:
resources:
- api:
...
...
group: webapp
kind: Guestbook
webhooks:
defaulting: true
validation: true
webhookVersion: v1beta1
Check your PROJECT file
Now ensure that your PROJECT
file has the same information when the manifests are generated via Kubebuilder V3 CLI.
For the QuickStart example, the PROJECT
file manually updated to use go.kubebuilder.io/v2
would look like:
domain: my.domain
layout:
- go.kubebuilder.io/v2
projectName: example
repo: example
resources:
- api:
crdVersion: v1
namespaced: true
controller: true
domain: my.domain
group: webapp
kind: Guestbook
path: example/api/v1
version: v1
version: "3"
You can check the differences between the previous layout(version 2
) and the current format(version 3
) with the go.kubebuilder.io/v2
by comparing an example scenario which involves more than one API and webhook, see:
Example (Project version 2)
domain: testproject.org
repo: sigs.k8s.io/kubebuilder/example
resources:
- group: crew
kind: Captain
version: v1
- group: crew
kind: FirstMate
version: v1
- group: crew
kind: Admiral
version: v1
version: "2"
Example (Project version 3)
domain: testproject.org
layout:
- go.kubebuilder.io/v2
projectName: example
repo: sigs.k8s.io/kubebuilder/example
resources:
- api:
crdVersion: v1
namespaced: true
controller: true
domain: testproject.org
group: crew
kind: Captain
path: example/api/v1
version: v1
webhooks:
defaulting: true
validation: true
webhookVersion: v1
- api:
crdVersion: v1
namespaced: true
controller: true
domain: testproject.org
group: crew
kind: FirstMate
path: example/api/v1
version: v1
webhooks:
conversion: true
webhookVersion: v1
- api:
crdVersion: v1
controller: true
domain: testproject.org
group: crew
kind: Admiral
path: example/api/v1
plural: admirales
version: v1
webhooks:
defaulting: true
webhookVersion: v1
version: "3"
Verification
In the steps above, you updated only the PROJECT
file which represents the project configuration. This configuration is useful only for the CLI tool. It should not affect how your project behaves.
There is no option to verify that you properly updated the configuration file. The best way to ensure the configuration file has the correct V3+
fields is to initialize a project with the same API(s), controller(s), and webhook(s) in order to compare generated configuration with the manually changed configuration.
If you made mistakes in the above process, you will likely face issues using the CLI.
Update your project to use go/v3 plugin
Migrating between project plugins involves additions, removals, and/or changes
to files created by any plugin-supported command, e.g. init
and create
. A plugin supports
one or more project config versions; make sure you upgrade your project’s
config version to the latest supported by your target plugin version before upgrading plugin versions.
The following steps describe the manual changes required to modify the project’s layout enabling your project to use the go/v3
plugin. These steps will not help you address all the bug fixes of the already generated scaffolds.
Steps to migrate
Update your plugin version into the PROJECT file
Before updating the layout
, please ensure you have followed the above steps to upgrade your Project version to 3
. Once you have upgraded the project version, update the layout
to the new plugin version go.kubebuilder.io/v3
as follows:
domain: my.domain
layout:
- go.kubebuilder.io/v3
...
Upgrade the Go version and its dependencies:
Ensure that your go.mod
is using Go version 1.15
and the following dependency versions:
module example
go 1.18
require (
github.com/onsi/ginkgo/v2 v2.1.4
github.com/onsi/gomega v1.19.0
k8s.io/api v0.24.0
k8s.io/apimachinery v0.24.0
k8s.io/client-go v0.24.0
sigs.k8s.io/controller-runtime v0.12.1
)
Update the golang image
In the Dockerfile, replace:
# Build the manager binary
FROM golang:1.13 as builder
With:
# Build the manager binary
FROM golang:1.16 as builder
Update your Makefile
To allow controller-gen to scaffold the nw Kubernetes APIs
To allow controller-gen
and the scaffolding tool to use the new API versions, replace:
CRD_OPTIONS ?= "crd:trivialVersions=true"
With:
CRD_OPTIONS ?= "crd"
To allow automatic downloads
To allow downloading the newer versions of the Kubernetes binaries required by Envtest into the testbin/
directory of your project instead of the global setup, replace:
# Run tests
test: generate fmt vet manifests
go test ./... -coverprofile cover.out
With:
# Setting SHELL to bash allows bash commands to be executed by recipes.
# Options are set to exit when a recipe line exits non-zero or a piped command fails.
SHELL = /usr/bin/env bash -o pipefail
.SHELLFLAGS = -ec
ENVTEST_ASSETS_DIR=$(shell pwd)/testbin
test: manifests generate fmt vet ## Run tests.
mkdir -p ${ENVTEST_ASSETS_DIR}
test -f ${ENVTEST_ASSETS_DIR}/setup-envtest.sh || curl -sSLo ${ENVTEST_ASSETS_DIR}/setup-envtest.sh https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/v0.8.3/hack/setup-envtest.sh
source ${ENVTEST_ASSETS_DIR}/setup-envtest.sh; fetch_envtest_tools $(ENVTEST_ASSETS_DIR); setup_envtest_env $(ENVTEST_ASSETS_DIR); go test ./... -coverprofile cover.out
To upgrade controller-gen
and kustomize
dependencies versions used
To upgrade the controller-gen
and kustomize
version used to generate the manifests replace:
# find or download controller-gen
# download controller-gen if necessary
controller-gen:
ifeq (, $(shell which controller-gen))
@{ \
set -e ;\
CONTROLLER_GEN_TMP_DIR=$$(mktemp -d) ;\
cd $$CONTROLLER_GEN_TMP_DIR ;\
go mod init tmp ;\
go get sigs.k8s.io/controller-tools/cmd/controller-gen@v0.2.5 ;\
rm -rf $$CONTROLLER_GEN_TMP_DIR ;\
}
CONTROLLER_GEN=$(GOBIN)/controller-gen
else
CONTROLLER_GEN=$(shell which controller-gen)
endif
With:
##@ Build Dependencies
## Location to install dependencies to
LOCALBIN ?= $(shell pwd)/bin
$(LOCALBIN):
mkdir -p $(LOCALBIN)
## Tool Binaries
KUSTOMIZE ?= $(LOCALBIN)/kustomize
CONTROLLER_GEN ?= $(LOCALBIN)/controller-gen
ENVTEST ?= $(LOCALBIN)/setup-envtest
## Tool Versions
KUSTOMIZE_VERSION ?= v3.8.7
CONTROLLER_TOOLS_VERSION ?= v0.9.0
KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"
.PHONY: kustomize
kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
$(KUSTOMIZE): $(LOCALBIN)
test -s $(LOCALBIN)/kustomize || { curl -Ss $(KUSTOMIZE_INSTALL_SCRIPT) | bash -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
.PHONY: controller-gen
controller-gen: $(CONTROLLER_GEN) ## Download controller-gen locally if necessary.
$(CONTROLLER_GEN): $(LOCALBIN)
test -s $(LOCALBIN)/controller-gen || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-tools/cmd/controller-gen@$(CONTROLLER_TOOLS_VERSION)
.PHONY: envtest
envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
$(ENVTEST): $(LOCALBIN)
test -s $(LOCALBIN)/setup-envtest || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
And then, to make your project use the kustomize
version defined in the Makefile, replace all usage of kustomize
with $(KUSTOMIZE)
Update your controllers
Replace:
func (r *<MyKind>Reconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
ctx := context.Background()
log := r.Log.WithValues("cronjob", req.NamespacedName)
With:
func (r *<MyKind>Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := r.Log.WithValues("cronjob", req.NamespacedName)
Update your controller and webhook test suite
Replace:
. "github.com/onsi/ginkgo"
With:
. "github.com/onsi/ginkgo/v2"
Also, adjust your test suite.
For Controller Suite:
RunSpecsWithDefaultAndCustomReporters(t,
"Controller Suite",
[]Reporter{printer.NewlineReporter{}})
With:
RunSpecs(t, "Controller Suite")
For Webhook Suite:
RunSpecsWithDefaultAndCustomReporters(t,
"Webhook Suite",
[]Reporter{printer.NewlineReporter{}})
With:
RunSpecs(t, "Webhook Suite")
Last but not least, remove the timeout variable from the BeforeSuite
blocks:
Replace:
var _ = BeforeSuite(func(done Done) {
....
}, 60)
With
var _ = BeforeSuite(func(done Done) {
....
})
Change Logger to use flag options
In the main.go
file replace:
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseDevMode(true)))
With:
opts := zap.Options{
Development: true,
}
opts.BindFlags(flag.CommandLine)
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
Rename the manager flags
The manager flags --metrics-addr
and enable-leader-election
were renamed to --metrics-bind-address
and --leader-elect
to be more aligned with core Kubernetes Components. More info: #1839.
In your main.go
file replace:
func main() {
var metricsAddr string
var enableLeaderElection bool
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
With:
func main() {
var metricsAddr string
var enableLeaderElection bool
flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
And then, rename the flags in the config/default/manager_auth_proxy_patch.yaml
and config/default/manager.yaml
:
- name: manager
args:
- "--health-probe-bind-address=:8081"
- "--metrics-bind-address=127.0.0.1:8080"
- "--leader-elect"
Verification
Finally, we can run make
and make docker-build
to ensure things are working
fine.
Change your project to remove the Kubernetes deprecated API versions usage
The following steps describe a workflow to upgrade your project to remove the deprecated Kubernetes APIs: apiextensions.k8s.io/v1beta1
, admissionregistration.k8s.io/v1beta1
, cert-manager.io/v1alpha2
.
The Kubebuilder CLI tool does not support scaffolded resources for both Kubernetes API versions such as; an API/CRD with apiextensions.k8s.io/v1beta1
and another one with apiextensions.k8s.io/v1
.
The first step is to update your PROJECT
file by replacing the api.crdVersion:v1beta
and webhooks.WebhookVersion:v1beta
with api.crdVersion:v1
and webhooks.WebhookVersion:v1
which would look like:
domain: my.domain
layout: go.kubebuilder.io/v3
projectName: example
repo: example
resources:
- api:
crdVersion: v1
namespaced: true
group: webapp
kind: Guestbook
version: v1
webhooks:
defaulting: true
webhookVersion: v1
version: "3"
You can try to re-create the APIS(CRDs) and Webhooks manifests by using the --force
flag.
Now, re-create the APIS(CRDs) and Webhooks manifests by running the kubebuilder create api
and kubebuilder create webhook
for the same group, kind and versions with the flag --force
, respectively.
V3 - Plugins Layout Migration Guides
Following the migration guides from the plugins versions. Note that the plugins ecosystem
was introduced with Kubebuilder v3.0.0 release where the go/v3 version is the default layout
since 28 Apr 2021
.
Therefore, you can check here how to migrate the projects built from Kubebuilder 3.x with the plugin go/v3 to the latest.
go/v3 vs go/v4
This document covers all breaking changes when migrating from projects built using the plugin go/v3 (default for any scaffold done since 28 Apr 2021
) to the next version of the Golang plugin go/v4
.
The details of all changes (breaking or otherwise) can be found in:
- controller-runtime
- controller-tools
- kustomize
- kb-releases release notes.
Common changes
go/v4
projects use Kustomize v5x (instead of v3x)- note that some manifests under
config/
directory have been changed in order to no longer use the deprecated Kustomize features such as env vars. - A
kustomization.yaml
is scaffolded underconfig/samples
. This helps simply and flexibly generate sample manifests:kustomize build config/samples
. - adds support for Apple Silicon M1 (darwin/arm64)
- remove support to CRD/WebHooks Kubernetes API v1beta1 version which are no longer supported since k8s 1.22
- no longer scaffold webhook test files with
"k8s.io/api/admission/v1beta1"
the k8s API which is no longer served since k8s1.25
. By default webhooks test files are scaffolding using"k8s.io/api/admission/v1"
which is support from k8s1.20
- no longer provide backwards compatible support with k8s versions <
1.16
- change the layout to accommodate the community request to follow the Standard Go Project Layout
by moving the api(s) under a new directory called
api
, controller(s) under a new directory calledinternal
and themain.go
under a new directory namedcmd
TL;DR of the New go/v4
Plugin
More details on this can be found at here, but for the highlights, check below
Migrating to Kubebuilder go/v4
If you want to upgrade your scaffolding to use the latest and greatest features then, follow the guide which will cover the steps in the most straightforward way to allow you to upgrade your project to get all latest changes and improvements.
- Migration Guide go/v3 to go/v4 (Recommended)
By updating the files manually
If you want to use the latest version of Kubebuilder CLI without changing your scaffolding then, check the following guide which will describe the steps to be performed manually to upgrade only your PROJECT version and start using the plugins versions.
This way is more complex, susceptible to errors, and success cannot be assured. Also, by following these steps you will not get the improvements and bug fixes in the default generated project files.
Migration from go/v3 to go/v4
Make sure you understand the differences between Kubebuilder go/v3 and go/v4 before continuing.
Please ensure you have followed the installation guide to install the required components.
The recommended way to migrate a go/v3
project is to create a new go/v4
project and
copy over the API and the reconciliation code. The conversion will end up with a
project that looks like a native go/v4 project layout (latest version).
However, in some cases, it’s possible to do an in-place upgrade (i.e. reuse the go/v3 project layout, upgrading the PROJECT file, and scaffolds manually). For further information see Migration from go/v3 to go/v4 by updating the files manually
Initialize a go/v4 Project
Create a new directory with the name of your project. Note that this name is used in the scaffolds to create the name of your manager Pod and of the Namespace where the Manager is deployed by default.
$ mkdir migration-project-name
$ cd migration-project-name
Now, we need to initialize a go/v4 project. Before we do that, we’ll need
to initialize a new go module if we’re not on the GOPATH
. While technically this is
not needed inside GOPATH
, it is still recommended.
go mod init tutorial.kubebuilder.io/migration-project
Now, we can finish initializing the project with kubebuilder.
kubebuilder init --domain tutorial.kubebuilder.io --plugins=go/v4
Migrate APIs and Controllers
Next, we’ll re-scaffold out the API types and controllers.
kubebuilder create api --group batch --version v1 --kind CronJob
Migrate the APIs
Now, let’s copy the API definition from api/v1/<kind>_types.go
in our old project to the new one.
These files have not been modified by the new plugin, so you should be able to replace your freshly scaffolded files by your old one. There may be some cosmetic changes. So you can choose to only copy the types themselves.
Migrate the Controllers
Now, let’s migrate the controller code from controllers/cronjob_controller.go
in our old project to internal/controller/cronjob_controller.go
in the new one.
Migrate the Webhooks
Now let’s scaffold the webhooks for our CRD (CronJob). We’ll need to run the
following command with the --defaulting
and --programmatic-validation
flags
(since our test project uses defaulting and validating webhooks):
kubebuilder create webhook --group batch --version v1 --kind CronJob --defaulting --programmatic-validation
Now, let’s copy the webhook definition from api/v1/<kind>_webhook.go
from our old project to the new one.
Others
If there are any manual updates in main.go
in v3, we need to port the changes to the new main.go
. We’ll also need to ensure all of needed controller-runtime schemes
have been registered.
If there are additional manifests added under config directory, port them as well. Please, be aware that the new version go/v4 uses Kustomize v5x and no longer Kustomize v4. Therefore, if added customized implementations in the config you need to ensure that they can work with Kustomize v5 and if not update/upgrade any breaking change that you might face.
In v4, installation of Kustomize has been changed from bash script to go get
. Change the kustomize
dependency in Makefile to
.PHONY: kustomize
kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary. If wrong version is installed, it will be removed before downloading.
$(KUSTOMIZE): $(LOCALBIN)
@if test -x $(LOCALBIN)/kustomize && ! $(LOCALBIN)/kustomize version | grep -q $(KUSTOMIZE_VERSION); then \
echo "$(LOCALBIN)/kustomize version is not expected $(KUSTOMIZE_VERSION). Removing it before installing."; \
rm -rf $(LOCALBIN)/kustomize; \
fi
test -s $(LOCALBIN)/kustomize || GOBIN=$(LOCALBIN) GO111MODULE=on go install sigs.k8s.io/kustomize/kustomize/v5@$(KUSTOMIZE_VERSION)
Change the image name in the Makefile if needed.
Verification
Finally, we can run make
and make docker-build
to ensure things are working
fine.
Migration from go/v3 to go/v4 by updating the files manually
Make sure you understand the differences between Kubebuilder go/v3 and go/v4 before continuing.
Please ensure you have followed the installation guide to install the required components.
The following guide describes the manual steps required to upgrade your PROJECT config file to begin using go/v4
.
This way is more complex, susceptible to errors, and success cannot be assured. Also, by following these steps you will not get the improvements and bug fixes in the default generated project files.
Usually it is suggested to do it manually if you have customized your project and deviated too much from the proposed scaffold. Before continuing, ensure that you understand the note about [project customizations][project-customizations]. Note that you might need to spend more effort to do this process manually than to organize your project customizations. The proposed layout will keep your project maintainable and upgradable with less effort in the future.
The recommended upgrade approach is to follow the Migration Guide go/v3 to go/v4 instead.
Migration from project config version “go/v3” to “go/v4”
Update the PROJECT
file layout which stores information about the resources that are used to enable plugins make
useful decisions while scaffolding. The layout
field indicates the scaffolding and the primary plugin version in use.
Steps to migrate
Migrate the layout version into the PROJECT file
The following steps describe the manual changes required to bring the project configuration file (PROJECT
).
These change will add the information that Kubebuilder would add when generating the file. This file can be found in the root directory.
Update the PROJECT file by replacing:
layout:
- go.kubebuilder.io/v3
With:
layout:
- go.kubebuilder.io/v4
Changes to the layout
New layout:
- The directory
apis
was renamed toapi
to follow the standard - The
controller(s)
directory has been moved under a new directory calledinternal
and renamed to singular as wellcontroller
- The
main.go
previously scaffolded in the root directory has been moved under a new directory calledcmd
Therefore, you can check the changes in the layout results into:
...
├── cmd
│ └── main.go
├── internal
│ └── controller
└── api
Migrating to the new layout:
- Create a new directory
cmd
and move themain.go
under it. - If your project support multi-group the APIs are scaffold under a directory called
apis
. Rename this directory toapi
- Move the
controllers
directory under theinternal
and rename it forcontroller
- Now ensure that the imports will be updated accordingly by:
- Update the
main.go
imports to look for the new path of your controllers under theinternal/controller
directory
- Update the
Then, let’s update the scaffolds paths
- Update the Dockerfile to ensure that you will have:
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY internal/controller/ internal/controller/
Then, replace:
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager main.go
With:
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go
- Update the Makefile targets to build and run the manager by replacing:
.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
go build -o bin/manager main.go
.PHONY: run
run: manifests generate fmt vet ## Run a controller from your host.
go run ./main.go
With:
.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
go build -o bin/manager cmd/main.go
.PHONY: run
run: manifests generate fmt vet ## Run a controller from your host.
go run ./cmd/main.go
- Update the
internal/controller/suite_test.go
to set the path for theCRDDirectoryPaths
:
Replace:
CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
With:
CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
Note that if your project has multiple groups (multigroup:true
) then the above update should result into "..", "..", "..",
instead of "..",".."
Now, let’s update the PATHs in the PROJECT file accordingly
The PROJECT tracks the paths of all APIs used in your project. Ensure that they now point to api/...
as the following example:
Before update:
group: crew
kind: Captain
path: sigs.k8s.io/kubebuilder/testdata/project-v4/apis/crew/v1
After Update:
group: crew
kind: Captain
path: sigs.k8s.io/kubebuilder/testdata/project-v4/api/crew/v1
Update kustomize manifests with the changes made so far
- Update the manifest under
config/
directory with all changes performed in the default scaffold done withgo/v4
plugin. (see for exampletestdata/project-v4/config/
) to get all changes in the default scaffolds to be applied on your project - Create
config/samples/kustomization.yaml
with all Custom Resources samples specified intoconfig/samples
. (see for exampletestdata/project-v4/config/samples/kustomization.yaml
)
If you have webhooks:
Replace the import admissionv1beta1 "k8s.io/api/admission/v1beta1"
with admissionv1 "k8s.io/api/admission/v1"
in the webhook test files
Makefile updates
Update the Makefile with the changes which can be found in the samples under testdata for the release tag used. (see for example testdata/project-v4/Makefile
)
Update the dependencies
Update the go.mod
with the changes which can be found in the samples under testdata
for the release tag used. (see for example testdata/project-v4/go.mod
). Then, run
go mod tidy
to ensure that you get the latest dependencies and your Golang code has no breaking changes.
Verification
In the steps above, you updated your project manually with the goal of ensuring that it follows
the changes in the layout introduced with the go/v4
plugin that update the scaffolds.
There is no option to verify that you properly updated the PROJECT
file of your project.
The best way to ensure that everything is updated correctly, would be to initialize a project using the go/v4
plugin,
(ie) using kubebuilder init --domain tutorial.kubebuilder.io plugins=go/v4
and generating the same API(s),
controller(s), and webhook(s) in order to compare the generated configuration with the manually changed configuration.
Also, after all updates you would run the following commands:
make manifests
(to re-generate the files using the latest version of the contrller-gen after you update the Makefile)make all
(to ensure that you are able to build and perform all operations)
Single Group to Multi-Group
Let’s migrate the CronJob example.
To change the layout of your project to support Multi-Group run the command
kubebuilder edit --multigroup=true
. Once you switch to a multi-group layout, the new Kinds
will be generated in the new layout but additional manual work is needed
to move the old API groups to the new layout.
Generally, we use the prefix for the API group as the directory name. We
can check api/v1/groupversion_info.go
to find that out:
// +groupName=batch.tutorial.kubebuilder.io
package v1
Then, we’ll rename move our existing APIs into a new subdirectory, “batch”:
mkdir api/batch
mv api/* api/batch
After moving the APIs to a new directory, the same needs to be applied to the controllers. For go/v4:
mkdir internal/controller/batch
mv internal/controller/* internal/controller/batch/