Kubernetes Operators: Custom Kubernetes Controller Guide
2026 guide to building Kubernetes Operators — what custom controllers actually do, how to scaffold one with the Operator SDK, and the mistake most teams make the first time.

Picture this: your StatefulSet starts clean. Probes pass. Logs look fine. And then, somewhere in that fifteen-second window between "Running" and "Ready," your app tries to connect to a database that's still mid-initialization.
Three hours go into it. Grep logs. Adjust probe timings. Restart pods in a different order and watch hopefully. Eventually the truth lands: Kubernetes had zero knowledge that these two services had an ordering relationship at all. It was behaving perfectly. Scheduling containers. Restarting failures. Updating status. The rule that said don't serve traffic until the migration finishes lived in a Slack thread from eight months ago. Not anywhere near the cluster.
That's the problem Operators were built for. And once you've hit it once, you don't want to hit it again.
What an Operator Actually Is (Not the Definition, the Reality)
Most people hear "Kubernetes Operator" and reach for the wrong analogy. It's not a plugin. Not middleware. Not a special cluster mode you toggle on.
Here's what it is: a program, usually written in Go, running as an ordinary pod inside your cluster, watching the API server and taking action whenever something you care about changes. You define what "something you care about" means and what action makes sense. The framework (controller-runtime, underneath the Operator SDK) runs the loop.
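One detail of that loop is worth seeing early: the controller doesn't get one call per event. Watch events become namespace/name keys on a work queue, and a key that's already waiting isn't added again. The sketch below is a toy model in plain Go. The queue type and its methods are hypothetical illustrations, not the client-go workqueue API.

```go
package main

import "fmt"

// queue is a toy model of the work queue controller-runtime places in front
// of Reconcile. It is NOT the client-go workqueue API; it only reproduces one
// behavior: a key that is already waiting is not enqueued a second time.
type queue struct {
	order   []string        // keys waiting to be processed, oldest first
	pending map[string]bool // set of keys currently in order
}

func newQueue() *queue { return &queue{pending: map[string]bool{}} }

// add enqueues a "namespace/name" key unless it is already waiting.
func (q *queue) add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// drain hands every waiting key to reconcile, oldest first, and returns
// how many reconcile calls were made.
func (q *queue) drain(reconcile func(key string)) int {
	calls := 0
	for len(q.order) > 0 {
		key := q.order[0]
		q.order = q.order[1:]
		delete(q.pending, key)
		reconcile(key)
		calls++
	}
	return calls
}

func main() {
	q := newQueue()
	// Four watch events arrive: three for one object, one for another.
	for _, key := range []string{
		"default/example-myapp", "default/example-myapp",
		"default/other-myapp", "default/example-myapp",
	} {
		q.add(key)
	}
	calls := q.drain(func(key string) { fmt.Println("reconcile", key) })
	fmt.Println("reconcile calls:", calls)
}
```

Four events, two reconcile calls. That's why a reconciler should always re-read the object from the API instead of assuming it knows which change triggered it.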
Two components make it work.
Custom Resource Definitions give you new vocabulary. You're not extending Kubernetes so much as teaching it a new noun. Call it MyApp, PostgresCluster, KafkaTopic, whatever fits your domain. From Kubernetes' perspective, a CRD is just another API object with a spec and a status. To your program, it's a declaration of intent it can read and act on.
Custom controllers do the actual work. Watch for changes, compare what exists against what should exist, close the gap. Create, update, delete. The loop fires on all of them, and it also runs when nothing changed: whenever your code asks for a requeue, and on the manager's periodic resync. Because things drift. Pods restart. Network partitions heal. A well-written controller stays honest about reality rather than trusting its last known state.
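"Compare what exists against what should exist" is the heart of it, and the comparison deliberately ignores which event fired. A minimal sketch, assuming a hypothetical naming convention of <app>-<index> for child pods and plain strings in place of real API objects:

```go
package main

import (
	"fmt"
	"sort"
)

// plan compares the desired replica count against the pod names that
// actually exist and returns what to create and what to delete. It never
// looks at which event triggered it: the same inputs always give the same
// plan, which is what makes a periodic requeue able to repair drift.
// The "<app>-<index>" naming is a hypothetical convention for this sketch.
func plan(app string, desired int, actual []string) (create, remove []string) {
	want := make(map[string]bool, desired)
	for i := 0; i < desired; i++ {
		want[fmt.Sprintf("%s-%d", app, i)] = true
	}
	have := make(map[string]bool, len(actual))
	for _, name := range actual {
		have[name] = true
		if !want[name] {
			remove = append(remove, name)
		}
	}
	for name := range want {
		if !have[name] {
			create = append(create, name)
		}
	}
	// Map iteration order is random; sort so the plan is stable.
	sort.Strings(create)
	sort.Strings(remove)
	return create, remove
}

func main() {
	// Someone deleted web-1 by hand, and a stray web-7 is still around.
	create, remove := plan("web", 3, []string{"web-0", "web-2", "web-7"})
	fmt.Println("create:", create, "delete:", remove)
}
```

A converged cluster produces an empty plan, so running this on every resync costs nothing, while drift such as a hand-deleted pod produces exactly the repair.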
One thing worth saying plainly: writing an Operator is real engineering work. A three-service app with no stateful components? Overkill. Full stop. But for a database cluster you operate yourself, a distributed queue, or any workload whose upgrade path has ordering constraints humans keep getting wrong at 2am, this pattern pays back the investment. Tribal knowledge encoded in a wiki doesn't act when it's needed. A running controller does.
Setting Up the Tools
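Three things to install. The Operator SDK also expects a Go toolchain and Docker (or another container tool) on your PATH, and the commands below assume Linux on amd64. Do them in order and you'll avoid dependency headaches.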
# Install kubectl
curl -LO https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Install minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
# Install Operator SDK
curl -LO https://github.com/operator-framework/operator-sdk/releases/latest/download/operator-sdk_linux_amd64
sudo install operator-sdk_linux_amd64 /usr/local/bin/operator-sdk
minikube is fine for working through this. When you move to a real cluster, check RBAC before anything else. Your controller needs specific verbs on specific API groups, declared in +kubebuilder:rbac marker comments, which the SDK's make manifests target turns into a ClusterRole. Get those wrong and reconciliation silently stops. No obvious error. No crash. Nothing happening. You'll sit there wondering if the binary even started.
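Quick sanity check: after deploying, run
kubectl auth can-i list myapps.myapp.mycompany.com --as=system:serviceaccount:myapp-operator-system:myapp-operator-controller-manager
If it returns no, that's your problem right there. The namespace and service account above are the scaffold's defaults; adjust them if you changed config/default/kustomization.yaml.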
Building a Real Operator, Step by Step
For this walkthrough, the custom resource is called MyApp. Simple web service, returns a welcome message. Small enough to hold in your head, structured enough to extend into something real.
Step 1: Initialize the Project
# init scaffolds into the current directory, so create and enter it first
mkdir myapp-operator && cd myapp-operator
operator-sdk init --domain=mycompany.com --repo=github.com/mycompany/myapp-operator
After this runs, you have a complete project structure: manager entrypoint, scheme registration, controller-runtime wiring. None of that is yours to write. Your job is the domain logic and the type definitions. The SDK owns the scaffolding.
Step 2: Generate the Scaffolding
operator-sdk create api --group=myapp --version=v1 --kind=MyApp --resource --controller
One command. Produces the Go types file, the controller file with its RBAC marker comments pre-populated, and a sample custom resource under config/samples. You're not starting from a blank file. Combined with --domain=mycompany.com from Step 1, the resource's API group becomes myapp.mycompany.com.
Step 3: Define Your Custom Resource
Open api/v1/myapp_types.go and write the shape of your resource:
// api/v1/myapp_types.go
package v1
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
// MyAppSpec defines the desired state of MyApp
type MyAppSpec struct {
	// Add fields as needed
}
// MyAppStatus defines the observed state of MyApp
type MyAppStatus struct {
	// Add fields as needed
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// MyApp is the Schema for the myapps API
type MyApp struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec   MyAppSpec   `json:"spec,omitempty"`
	Status MyAppStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// MyAppList contains a list of MyApp
type MyAppList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []MyApp `json:"items"`
}
func init() {
	// SchemeBuilder is declared in the scaffolded api/v1/groupversion_info.go
	SchemeBuilder.Register(&MyApp{}, &MyAppList{})
}
Spec is intent. Status is observed reality. Keep them separate in your mental model too, not just in the struct. When your program reads Spec, it's asking "what does the user want?" When it reads Status, it's asking "what does the cluster actually have right now?" Blur that line and you'll end up writing a program that gets confused about what it's even trying to fix. Those bugs are miserable to trace.
Every time you change the types, regenerate the deepcopy functions and the CRD manifest so they match:
make generate manifests
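One common way to keep Spec and Status honest is an observedGeneration field in Status. With the status subresource enabled, the API server bumps metadata.generation on spec changes, and the controller records which generation its status describes. The sketch uses toy structs, not the real MyApp types; Replicas, ReadyReplicas and ObservedGeneration are hypothetical fields added for illustration.

```go
package main

import "fmt"

// Toy stand-ins for the MyApp types. Replicas, ReadyReplicas and
// ObservedGeneration are hypothetical fields for illustration. Generation
// mirrors metadata.generation, which the API server bumps on spec changes.
type MyAppSpec struct{ Replicas int }

type MyAppStatus struct {
	ReadyReplicas      int
	ObservedGeneration int64
}

type MyApp struct {
	Generation int64
	Spec       MyAppSpec
	Status     MyAppStatus
}

// updateStatus records what was observed and which spec generation it was
// computed against. It reads Spec but never writes it: Spec belongs to the
// user, Status belongs to the controller.
func updateStatus(app *MyApp, readyPods int) {
	app.Status.ReadyReplicas = readyPods
	app.Status.ObservedGeneration = app.Generation
}

// upToDate reports whether Status describes the latest Spec and matches it.
func upToDate(app MyApp) bool {
	return app.Status.ObservedGeneration == app.Generation &&
		app.Status.ReadyReplicas == app.Spec.Replicas
}

func main() {
	app := MyApp{Generation: 1, Spec: MyAppSpec{Replicas: 3}}
	updateStatus(&app, 3)
	fmt.Println("up to date:", upToDate(app))

	// The user edits the spec; the API server bumps the generation.
	app.Spec.Replicas = 5
	app.Generation = 2
	fmt.Println("up to date:", upToDate(app))
}
```

Anyone reading the object, whether a human with kubectl or another controller, can then tell a stale status apart from a spec the controller simply hasn't caught up with yet.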
Step 4: Write the Reconcile Function
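This is the only part that genuinely matters. In current SDK versions (the go/v4 layout) the file is internal/controller/myapp_controller.go; projects scaffolded with the older go/v3 layout keep it at controllers/myapp_controller.go. Open it:
// internal/controller/myapp_controller.go
package controller
import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	myappv1 "github.com/mycompany/myapp-operator/api/v1"
)
// MyAppReconciler reconciles a MyApp object
type MyAppReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}
// +kubebuilder:rbac:groups=myapp.mycompany.com,resources=myapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=myapp.mycompany.com,resources=myapps/status,verbs=get;update;patch
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// controller-runtime injects a logger into ctx that already carries the
	// object's name and namespace, so no Log field is needed on the struct.
	logger := log.FromContext(ctx)
	// Fetch the MyApp instance named in the request
	myapp := &myappv1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myapp); err != nil {
		// NotFound means the object was deleted after the request was queued:
		// nothing to do and nothing to retry. Any other error is returned so
		// controller-runtime requeues the request with backoff.
		if client.IgnoreNotFound(err) != nil {
			logger.Error(err, "unable to fetch MyApp")
		}
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Reconciliation logic goes here. For now, just prove the request arrived.
	logger.Info("reconciled MyApp", "generation", myapp.Generation)
	return ctrl.Result{}, nil
}
func (r *MyAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// Trigger Reconcile for create, update and delete events on MyApp objects
	return ctrl.NewControllerManagedBy(mgr).
		For(&myappv1.MyApp{}).
		Complete(r)
}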
Reconcile runs whenever a watched object changes, but it receives only the object's namespace and name, not the event itself. Read current state, find the gap, close it. One rule you cannot skip: write this function to be idempotent. Running it twice on the same unchanged state should produce zero side effects the second time. controller-runtime calls it more often than you expect. During startup. After leader election handoffs. After the controller pod itself restarts. On a cluster with hundreds of existing objects, the initial list at startup queues every one of them, so expect a burst of calls before anything has actually changed. Plan for it.
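Idempotency mostly comes down to one shape: read, compare, write only if different. Here's a toy version in plain Go, where a hypothetical in-memory fakeStore stands in for the API server and counts writes, so you can watch the second identical pass do nothing. The <app>-config naming and the message format are made up for the example.

```go
package main

import "fmt"

// fakeStore is a hypothetical in-memory stand-in for the API server that
// counts writes, so we can check that a repeated pass changes nothing.
type fakeStore struct {
	objects map[string]string // object name -> data
	writes  int
}

// ensure creates or updates an object only when it differs from what
// already exists. This compare-then-write shape is what makes a reconcile
// pass safe to repeat.
func (s *fakeStore) ensure(name, data string) {
	if current, ok := s.objects[name]; ok && current == data {
		return // already in the desired state: no write
	}
	s.objects[name] = data
	s.writes++
}

// reconcile derives the desired child object from the parent's spec.
// The "-config" suffix and "message=" format are illustrative only.
func reconcile(s *fakeStore, appName, message string) {
	s.ensure(appName+"-config", "message="+message)
}

func main() {
	s := &fakeStore{objects: map[string]string{}}
	reconcile(s, "example-myapp", "welcome")
	reconcile(s, "example-myapp", "welcome") // e.g. after a controller restart
	fmt.Println("writes after two passes:", s.writes)
}
```

In a real controller you rarely hand-roll this: controller-runtime's controllerutil.CreateOrUpdate follows the same get, mutate, compare, update-if-changed sequence.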
Build incrementally inside this function. One concrete action first. Confirm it works end-to-end. Then add the next thing. Chain multiple writes in a single pass and you'll spend real time debugging a reconciler that does five things at once. It's not fun.
Step 5: Build and Run
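# While developing: install the CRD into the current cluster, then run the
# controller as a local process against your kubeconfig
make install run
# To run it in-cluster instead, push an image somewhere your cluster can pull
# from (<registry> is a placeholder for your own registry), then deploy it
make docker-build docker-push IMG=<registry>/myapp-operator:v0.0.1
make deploy IMG=<registry>/myapp-operator:v0.0.1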
Applying Your First Custom Resource
With everything running, create a MyApp object:
# myapp-instance.yaml
apiVersion: myapp.mycompany.com/v1
kind: MyApp
metadata:
name: example-myapp
spec: {} # add custom spec fields here as MyAppSpec grows
Apply it:
kubectl apply -f myapp-instance.yaml
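Check the logs. With make run, they stream to the terminal running the controller. If you deployed in-cluster with the default scaffold, the manager runs in the myapp-operator-system namespace:
kubectl logs deployment/myapp-operator-controller-manager -n myapp-operator-system -c manager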
Nothing dramatic happens yet. The Reconcile body has no real logic. That's intentional. What you're verifying is that the event reaches the function, the object gets fetched, and nothing panics. Boring success. That's the right baseline before adding real behavior.
Why This Pattern Actually Matters
Back to that incident. StatefulSet race condition, ordering dependency, three hours of log-grepping.
A well-written controller rewrites that scenario entirely. Before touching the app, the program checks whether the database object has a Ready status condition set to true. If not, early return, requeue in ten seconds, check again. No race. No guesswork. No 2am page because someone deployed services in the wrong order.
I've watched this pattern play out across a few different teams. Over months of running something stateful, operational knowledge accumulates in scattered places. Incident retrospectives. On-call handoff notes. The heads of whoever was there when things broke. Putting it in a controller pulls it into code. Version-controlled. Testable. Running continuously, not sitting in a doc nobody opens until something's already on fire.
This scaffold is minimal by design. What goes inside Reconcile is where the real investment lives, and it pays back as the workload grows more complex. Start simple. Keep the function idempotent. Let the pattern do what it was designed for.
Himanshu Pant, Chief Operating Officer at Innostax
About Innostax
Innostax is a global software consulting and custom software development company helping growth-stage startups, scaleups, and enterprises build reliable, scalable digital products. Founded in 2014 and headquartered in Framingham, Massachusetts, Innostax specializes in custom software development, web and mobile app development, IT staff augmentation, offshore software development, and digital transformation services — across industries including healthcare, retail, education, travel, and fintech. With a dedicated development team model, a 2-week risk-free trial, and deep expertise in technologies like React.js, Node.js, Python, .NET, and React Native, Innostax co-creates breakthrough solutions that help founders, CTOs, and product leaders ship better software, faster. Learn more at innostax.com.





