Kubernetes on Backend Engineering Strategy Tools

Gardener on Cleura

Tue, 16 Jun 2026 00:00:00 +0000

Gardener is a Kubernetes-as-a-Service framework that runs on Kubernetes and manages the lifecycle of other clusters declaratively. Rather than managing control planes by hand, Gardener treats clusters as a resource — defined, created, upgraded, and deleted via the Gardener API.

Concepts

Gardener uses three layers:

Layer	What it is
Garden cluster	Runs Gardener itself — the management control plane
Seed cluster	Hosts the control planes of shoot clusters (as pods)
Shoot cluster	The cluster you actually use — nodes run on the target cloud

The shoot cluster’s API server does not run on the shoot nodes. It runs as a pod inside the seed cluster. From the outside it behaves like any other Kubernetes cluster; internally the control plane is isolated from the data plane.

Shoot clusters are defined as Shoot resources applied to the garden cluster:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-cluster
  namespace: garden-my-project
spec:
  cloudProfileName: openstack
  region: sto2
  provider:
    type: openstack
    workers:
    - name: worker-pool
      machine:
        type: l2.c2r4
      minimum: 1
      maximum: 3
  kubernetes:
    version: "1.30"
  networking:
    type: calico
    pods: 100.128.0.0/11
    nodes: 10.250.0.0/16
    services: 100.112.0.0/13

Shoot cluster on Cleura

Cleura is a European OpenStack provider. Gardener provisions shoot nodes as OpenStack VMs via the OpenStack machine controller.

Key integrations:

Component	Implementation
Node provisioning	OpenStack VMs via Gardener machine controller
Load balancers	Octavia via cloud-controller-manager
Block storage	Cinder via CSI driver
DNS	Manual or external-dns
CNI	Calico (default) or configurable

Gardener on Cleura does not provide an ingress controller or API gateway — these are brought in separately.

Networking

Gardener manages the cluster network configuration as part of the shoot spec. Pod, node, and service CIDRs are defined at cluster creation and must not overlap with the OpenStack network.
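
The non-overlap rule can be checked before a shoot is created, using Python's standard ipaddress module. This is a minimal sketch: the three shoot CIDRs come from the Shoot example above, while the `openstack-net` range and the `find_overlaps` helper are hypothetical and exist only for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return (name_a, name_b) pairs whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# CIDRs from the Shoot spec shown earlier.
shoot_cidrs = {
    "pods": "100.128.0.0/11",
    "nodes": "10.250.0.0/16",
    "services": "100.112.0.0/13",
}

# Hypothetical existing OpenStack network that collides with the node range.
with_existing = {**shoot_cidrs, "openstack-net": "10.250.128.0/17"}

print(find_overlaps(shoot_cidrs))    # no conflicts among the shoot ranges
print(find_overlaps(with_existing))  # reports the nodes / openstack-net clash
```

An empty list means the ranges are safe to use together; any reported pair has to be resolved before the cluster is created, because the CIDRs are fixed at creation time.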

On Cleura, nodes get OpenStack floating IPs for egress. Pod-to-pod traffic stays within the cluster overlay network (Calico by default). Traffic entering from outside the cluster goes through a LoadBalancer service — either directly for raw TCP, or via a gateway controller for HTTP.

Ingress — classic vs Gateway API

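The classic Kubernetes Ingress resource covers HTTP and HTTPS only, has no TCP support, and its feature set varies across implementations via non-standard annotations. The community ingress-nginx controller, the most widely used implementation, has been retired by the Kubernetes project (announced in November 2025), which recommends migrating to Gateway API. F5's separate NGINX Ingress Controller is a different project; NGINX's own Gateway API implementation is NGINX Gateway Fabric.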

The Kubernetes Gateway API is the forward path — a set of CRDs (Gateway, HTTPRoute, TCPRoute, TLSRoute) with a standardized spec and first-class support for both HTTP and TCP.

Resource	Protocol	API	Status
`Ingress`	HTTP only	Kubernetes	Stable, legacy
`HTTPRoute`	HTTP/HTTPS	Gateway API	Stable
`TCPRoute`	Raw TCP	Gateway API	Experimental
`TLSRoute`	TLS passthrough	Gateway API	Experimental

Envoy Gateway

Envoy Gateway is an implementation of the Kubernetes Gateway API from the Envoy project (itself a CNCF project), using Envoy as the data plane. It supports HTTPRoute, TCPRoute, and TLSRoute through a single Gateway resource: one entry point, both protocols.

                 Octavia LB   ← one LoadBalancer service per Gateway
                     |
              Envoy Gateway pod
                     |
        +------------+------------+
        |                         |
HTTPRoute → ClusterIP pods   TCPRoute → ClusterIP pods

Envoy Gateway is deployed into the shoot cluster and exposes a LoadBalancer service via Octavia, the same as any other service. The Gateway API resources then declare what routes through it.

TCPRoute — declaring TCP services

TCPRoute attaches to a Gateway listener and routes raw TCP traffic to a backend service. This is how a non-HTTP workload (e.g. a game server, a database proxy, a custom protocol service) gets exposed through the Gateway API rather than a standalone LoadBalancer service.

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: my-tcp-service
  namespace: my-app
spec:
  parentRefs:
  - name: my-gateway
    sectionName: tcp-listener
  rules:
  - backendRefs:
    - name: my-service
      port: 1234

The corresponding Gateway listener:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: my-app
spec:
  gatewayClassName: envoy-gateway
  listeners:
  - name: tcp-listener
    protocol: TCP
    port: 1234
  - name: http-listener
    protocol: HTTP
    port: 80

One Gateway, both protocols declared explicitly. The TCPRoute API is in the experimental channel and requires opting in when installing Envoy Gateway.

HTTPRoute — HTTP services

HTTPRoute handles HTTP and HTTPS traffic with routing by hostname, path, header, or method — without annotations.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-http-service
  namespace: my-app
spec:
  parentRefs:
  - name: my-gateway
    sectionName: http-listener
  hostnames:
  - my-app.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: my-service
      port: 8080

LoadBalancer — direct TCP via Octavia

For cases where a TCPRoute is not appropriate (or the Gateway API experimental channel is not enabled), a LoadBalancer service provisions an Octavia LB directly:

apiVersion: v1
kind: Service
metadata:
  name: my-tcp-service
  namespace: my-app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 1234
    targetPort: 1234
    protocol: TCP

Annotations control Octavia behaviour — timeouts, health check parameters, internal vs external. These are provider-specific and not standardised across OpenStack deployments.

Storage

Cinder block volumes are available via the CSI driver. A PersistentVolumeClaim provisions a Cinder volume automatically using the cluster’s default storage class.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

Cinder volumes are ReadWriteOnce — they attach to a single node. For stateful workloads, use StatefulSet rather than Deployment to get stable volume binding across pod restarts.

Provisioning a shoot cluster on Cleura

Cleura wraps Gardener behind their own REST API at rest.cleura.cloud. The garden cluster kubeconfig is not exposed — gardenctl does not work directly. Cluster lifecycle is managed through HTTP calls.

Authentication

Every call requires a token obtained once per session:

curl -s -X POST https://rest.cleura.cloud/auth/v1/tokens \
 -H "Content-Type: application/json" \
 -d '{"auth": {"login": "you@example.com", "password": "yourpass"}}' \
 | jq '{token: .token}'

Pass X-AUTH-LOGIN and X-AUTH-TOKEN headers on all subsequent calls.
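
The same two steps can be sketched in Python with only the standard library. The endpoint, payload shape, and header names are taken from the curl call above; the `token` response field is inferred from the jq filter, and the function names are hypothetical. Treat it as a starting point under those assumptions, not as a documented client.

```python
import json
import urllib.request

API = "https://rest.cleura.cloud"


def token_request(login, password):
    """Build the POST request that asks for a session token."""
    body = json.dumps({"auth": {"login": login, "password": password}}).encode()
    return urllib.request.Request(
        f"{API}/auth/v1/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(login, password):
    """Send the request; assumes the response JSON has a top-level 'token' field."""
    with urllib.request.urlopen(token_request(login, password), timeout=30) as resp:
        return json.load(resp)["token"]


def auth_headers(login, token):
    """Headers to attach to every subsequent call."""
    return {"X-AUTH-LOGIN": login, "X-AUTH-TOKEN": token}
```

A session would call `fetch_token(...)` once and then reuse `auth_headers(...)` for the bootstrap, create, and poll requests.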

Bootstrap (once per project/region)

Before creating any clusters, the project must be bootstrapped — this wires up the OpenStack credentials that Gardener uses to provision nodes:

curl -X POST \
 https://rest.cleura.cloud/gardener/v1/public/secret/kna1/{projectId}/bootstrap \
 -H "X-AUTH-LOGIN: ..." -H "X-AUTH-TOKEN: ..."

Safe to call repeatedly; idempotent.

Create a shoot cluster

curl -X POST \
  https://rest.cleura.cloud/gardener/v1/public/shoot/kna1/{projectId} \
  -H "X-AUTH-LOGIN: ..." -H "X-AUTH-TOKEN: ..." \
  -H "Content-Type: application/json" \
  -d '{
    "shoot": {
      "name": "my-cluster",
      "kubernetes": {"version": "1.31.0"},
      "provider": {
        "infrastructureConfig": {"floatingPoolName": "ext-net"},
        "workers": [{
          "name": "default",
          "machine": {
            "type": "4C-8GB-50GB",
            "image": {"name": "ubuntu", "version": "22.4.20230301"}
          },
          "minimum": 1,
          "maximum": 3,
          "volume": {"size": "50Gi"}
        }]
      }
    }
  }'

Poll until ready

curl https://rest.cleura.cloud/gardener/v1/public/shoot/kna1/{projectId}/my-cluster \
 -H "X-AUTH-LOGIN: ..." -H "X-AUTH-TOKEN: ..." \
 | jq '.lastOperation | {state, description, progress}'

Poll until lastOperation.state == "Succeeded". Takes roughly 10–15 minutes on first provision.
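
The polling step can be wrapped in a small loop with a deadline. In this sketch `wait_for_shoot` and its parameters are hypothetical names; `fetch_shoot` stands for any callable that returns the decoded JSON of the GET call above. Only the `lastOperation.state == "Succeeded"` condition comes from the text; the 30-second interval and 30-minute timeout are assumptions.

```python
import time


def wait_for_shoot(fetch_shoot, interval=30.0, timeout=1800.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_shoot() until lastOperation.state is 'Succeeded' or the deadline passes."""
    deadline = clock() + timeout
    while True:
        operation = fetch_shoot().get("lastOperation") or {}
        if operation.get("state") == "Succeeded":
            return operation
        if clock() >= deadline:
            raise TimeoutError(f"shoot not ready; last state: {operation.get('state')!r}")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"lastOperation": {"state": "Processing", "progress": 40}},
    {"lastOperation": {"state": "Succeeded", "progress": 100}},
])
final = wait_for_shoot(lambda: next(canned), interval=0)
print(final)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting; in real use the defaults apply and `fetch_shoot` performs the authenticated GET.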

Fetch kubeconfig

The Cleura docs reference two kubeconfig paths: GET /kubeconfig (lowercase k) and POST /Kubeconfig (capital K). Neither worked reliably in practice. The endpoint that actually returns a kubeconfig is:

curl -s -X POST \
 https://rest.cleura.cloud/gardener/v1/public/shoot/kna1/{projectId}/my-cluster/adminkubeconfig \
 -H "X-AUTH-LOGIN: ..." -H "X-AUTH-TOKEN: ..." \
 -H "Content-Type: application/json" \
 -d '{"config": {"expirationSeconds": 3600}}' \
 | jq -r '.' > my-cluster-kubeconfig.yaml

The expirationSeconds field controls credential lifetime. A bug report has been filed with Cleura about the endpoint inconsistency — the adminkubeconfig path is not documented.

Path	Method	Documented	Works
`/kubeconfig`	GET	yes	unclear
`/Kubeconfig`	POST	yes	unclear
`/adminkubeconfig`	POST	no	yes

→ Cleura docs issue #534 — kubeconfig endpoint inconsistencies in Gardener REST API

Script

A bash script wrapping the full workflow (list, create, wait, kubeconfig, delete) is available: cleura-shoot.sh

export CLEURA_LOGIN="you@example.com"
export CLEURA_PASSWORD="yourpass"

./cleura-shoot.sh list
./cleura-shoot.sh create my-cluster
./cleura-shoot.sh wait my-cluster
./cleura-shoot.sh kubeconfig my-cluster
./cleura-shoot.sh delete my-cluster

IaC options

No native Terraform provider exists for Cleura’s Gardener REST API. The Gardener Terraform provider (registry.terraform.io/providers/gardener/gardener) requires the garden cluster kubeconfig, which Cleura does not expose. Options:

Approach	Notes
Bash + curl	Minimal deps — just `curl` and `jq`
Crossplane `provider-http`	Declarative, Kubernetes-native, reconciliation loop
Custom Terraform provider	Full `plan`/`apply` semantics — requires Go provider development
Pulumi custom dynamic provider	Python/TypeScript, similar effort to custom Terraform provider

Rook

Thu, 14 May 2026 00:00:00 +0000

Rook is a Kubernetes operator that deploys and manages storage systems — primarily Ceph — as native Kubernetes resources. The distinction: Ceph is the storage system; Rook is the Kubernetes wiring around it.

Without Rook you would run Ceph manually (or via cephadm) and then configure the Kubernetes CSI driver separately. Rook collapses that into CRDs and handles the full lifecycle: deployment, configuration, expansion, upgrades, and failure recovery.

How it works

Rook introduces several CRDs:

CephCluster — declares the cluster: which nodes, which disks to use as OSDs, replication settings.

CephBlockPool — defines a Ceph pool (replication factor, failure domain). Maps to an RBD pool.

StorageClass — references a CephBlockPool and enables dynamic PVC provisioning. Kubernetes workloads request storage; Rook/Ceph fulfils it.

CephFilesystem — deploys CephFS + MDS for POSIX shared filesystem access.

CephObjectStore — deploys the Ceph RGW S3-compatible object storage gateway.

Typical install sequence

kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/common.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/operator.yaml

Then apply a CephCluster manifest declaring your storage topology, followed by CephBlockPool and StorageClass for PVC support.

Single-node considerations

A single-node setup needs allowMultiplePerNode: true (set under mon in the CephCluster spec) if it runs more than one MON, since MONs, MGR, and OSDs all land on the same node. Replication size must be set to 1 because there is nowhere else to replicate. This works for experimentation; it is not a production configuration. See Ceph for details on the replication model.

Related

Rook documentation
Ceph — the underlying storage system
Rook + Ceph in the homelab

Talos Linux + Omni

Thu, 14 May 2026 00:00:00 +0000

Talos Linux is an immutable, minimal operating system designed specifically for running Kubernetes. There is no shell, no SSH, no package manager. The entire OS is read-only and managed via a gRPC API (talosctl). Node configuration is declarative YAML applied over the API; changes that require a reboot take effect on the next boot.

The tradeoff is rigidity for operational simplicity. You cannot log into a Talos node and fix something by hand. In return, nodes are deterministic, reproducible, and there is no configuration drift.

Comparison to other installs

Method	OS	Config	Mutable
kubeadm	Ubuntu / RHEL / etc	Manual + scripts	Yes
k3s	Any Linux	Minimal	Yes
Talos	Talos Linux	Declarative API	No

k3s and kubeadm give you more flexibility and a familiar Linux environment. Talos is the right choice when you want the cluster nodes to behave like appliances — provisioned, never touched.

Omni

Omni is a cluster management platform by Sidero Labs built on top of Talos. It handles:

Node registration (nodes boot and phone home to the Omni API)
Cluster creation and machine assignment
Kubernetes upgrades (one action in the UI)
talosctl and kubeconfig access via the Omni CLI

Nodes register via a join token embedded in the kernel command line at PXE boot time. The cluster runs on your hardware; Omni provides only the management plane.

Hobby tier: 10 nodes, non-commercial use, free. Sidero Labs also offers a self-hosted version.

Image Factory

factory.talos.dev generates custom Talos images with hardware extensions included. Notable extensions:

siderolabs/bnx2 — Broadcom NetXtreme II (BCM5708/BCM5709) NIC firmware, required on some enterprise hardware (IBM x3550 M3, HP Gen 6/7 blades)
siderolabs/intel-ucode — Intel microcode updates
siderolabs/nvidia-* — NVIDIA GPU support

The factory produces both ISO and PXE artifacts (kernel + initramfs). See the OPNSense + iPXE reference for how to serve these over TFTP.

Supporting Sidero Labs

Talos and Omni are built by Sidero Labs — good people doing good work. I sponsor them via GitHub Sponsors at the fanboi tier.

etcd

Mon, 01 Jan 2024 00:00:00 +0000

etcd is the distributed key-value store that backs Kubernetes. Every Kubernetes object — pods, services, deployments, configmaps, secrets — is stored in etcd. The API server is the only component that reads and writes it directly; everything else in the cluster reads from the API server’s cache. etcd’s reliability is the cluster’s reliability: if etcd loses quorum, the Kubernetes control plane stops functioning.

Raft consensus

etcd uses the Raft consensus algorithm. The cluster elects a leader; all writes go through the leader, which replicates them to a majority of members before acknowledging the write. A cluster of n members needs a quorum of floor(n/2) + 1 and therefore tolerates floor((n-1)/2) failures: a three-node cluster survives one failure, a five-node cluster survives two. An even member count adds no tolerance (four nodes still survive only one failure), which is why control plane node counts are conventionally odd. Three nodes is standard for production; five for clusters where control plane availability is critical.
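
The quorum arithmetic is easy to tabulate. The helper names below are illustrative; the formulas are the standard majority rule that Raft relies on.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The table shows that going from three to four members raises the quorum from two to three without raising the number of failures the cluster can survive.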

Watches and revisions

Every write increments a global revision counter. Clients can watch a key or key prefix and receive every change since a given revision. This is what makes the Kubernetes controller manager and scheduler event-driven: they hold long-lived watch connections to the API server, which in turn watches etcd, and react to changes in specific resource types without polling.
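
A toy in-memory model makes the revision and watch semantics concrete. This is not the etcd client API: `ToyStore`, `Event`, and the starting revision of zero are simplifications invented for illustration, and the keys only mimic the `/registry/...` layout Kubernetes uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    revision: int
    key: str
    value: str


class ToyStore:
    """Append-only history with one global revision counter."""

    def __init__(self):
        self.revision = 0
        self.history = []

    def put(self, key, value):
        # Every write bumps the store-wide revision, regardless of key.
        self.revision += 1
        self.history.append(Event(self.revision, key, value))
        return self.revision

    def watch(self, prefix, start_revision):
        # Replay every change under the prefix from start_revision onwards.
        return [e for e in self.history
                if e.key.startswith(prefix) and e.revision >= start_revision]


store = ToyStore()
last_seen = store.put("/registry/pods/default/web", "Pending")
store.put("/registry/services/default/web", "ClusterIP")
store.put("/registry/pods/default/web", "Running")

# A watcher that already processed revision `last_seen` resumes from the next one.
missed = store.watch("/registry/pods/", last_seen + 1)
print(missed)
```

Because the watcher resumes from a revision rather than from "now", it can reconnect after a dropped connection without missing the change to the pod.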

Operations

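# Snapshot backup
etcdctl snapshot save /backup/etcd-snapshot.db \
 --endpoints=https://127.0.0.1:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
 --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

# Restore from snapshot (etcdutl replaces "etcdctl snapshot restore", deprecated since etcd 3.5)
etcdutl snapshot restore /backup/etcd-snapshot.db --data-dir=/var/lib/etcd-restore

# Check cluster health (needs the same TLS flags as the snapshot command)
etcdctl endpoint health --cluster \
 --endpoints=https://127.0.0.1:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
 --key=/etc/kubernetes/pki/etcd/healthcheck-client.key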

Backing up etcd regularly is the most critical operational task for a Kubernetes cluster. The snapshot is the only path to full recovery if cluster state is lost.

Istio

Mon, 01 Jan 2024 00:00:00 +0000

Istio is a service mesh for Kubernetes. In its classic sidecar mode it injects an Envoy proxy into every pod, and all traffic between pods flows through these proxies rather than directly between containers. This gives the mesh control over traffic routing, security, and observability without any changes to application code.

What it solves

In a large microservice deployment, every service needs to handle retries, timeouts, circuit breaking, mutual TLS, and metrics collection — or skip them and accept the risk. Without a mesh, each team implements this differently, or not at all. Istio moves these concerns out of the application and into the infrastructure layer, where they are configured once and applied uniformly.

Traffic management

Istio’s VirtualService and DestinationRule CRDs give fine-grained control over how traffic is routed:

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: test-user
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1

This routes a specific user to v2 of a service while everyone else gets v1 — canary testing without a load balancer rule or code change.
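
The http rules in a VirtualService are evaluated in order and the first match wins, which is why the catch-all v1 route comes last. The toy model below mirrors only that ordering behaviour; `pick_subset` and the rule dictionaries are hypothetical simplifications, not Istio's data model.

```python
def pick_subset(headers, rules):
    """Return the subset of the first rule whose header conditions all match."""
    for rule in rules:
        wanted = rule.get("match", {})  # a rule without 'match' accepts everything
        if all(headers.get(name) == value for name, value in wanted.items()):
            return rule["subset"]
    return None


rules = [
    {"match": {"end-user": "test-user"}, "subset": "v2"},
    {"subset": "v1"},  # catch-all, must stay last
]

print(pick_subset({"end-user": "test-user"}, rules))  # test user goes to v2
print(pick_subset({"end-user": "someone"}, rules))    # everyone else gets v1
print(pick_subset({"end-user": "test-user"}, list(reversed(rules))))  # catch-all first shadows v2
```

Reversing the list sends every request to v1, which is the usual symptom of a catch-all route placed above a more specific one.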

mTLS

Istio issues and rotates certificates for every workload and enforces mutual TLS between services automatically. Services authenticate each other’s identity, not just encrypt the connection. A PeerAuthentication policy can enforce strict mTLS across a namespace, ensuring no plaintext traffic is accepted.

Observability

Because all traffic flows through Envoy sidecars, Istio generates L7 metrics (request rate, error rate, latency percentiles), distributed traces, and access logs for every service-to-service call — without instrumentation in the services themselves. This integrates with Prometheus, Grafana, and Jaeger.

Cost

Istio adds latency (two extra proxy hops per call) and resource overhead (a sidecar per pod). For clusters with tens of services, the operational benefit is clear. For small clusters or teams early in a microservices journey, the complexity may outweigh the gains.

Kubernetes

Mon, 01 Jan 2024 00:00:00 +0000

Kubernetes (K8s) is the de facto standard for container orchestration and the second largest open source project after the Linux kernel. It has well and truly reached the plateau of productivity — the ecosystem is mature and it genuinely delivers.

That said, the honest take: K8s is ridiculously hard to deploy and manage (day 2 operations especially), while Docker Swarm is ridiculously easy to get started with. For raw scale, Mesos/DC/OS wins: clusters of 80k+ nodes have been documented in the wild, versus the officially supported Kubernetes limit of 5,000 nodes per cluster.

So the real question is whether the ecosystem justifies the complexity for your situation. For most teams doing cloud-native work, it does.

Core concepts

The main building blocks:

Pods — smallest deployable unit, wrapping one or more containers that share network and storage.

Deployments — declare desired state; K8s handles rolling updates and self-healing.

Secrets — store sensitive data (passwords, tokens, keys) separately from application config.

DaemonSets — run a pod on every node. Typical use: log collectors, monitoring agents.

ReplicaSets — ensure N copies of a pod are running at any given time.

Ingress — HTTP/S routing rules at layer 7. Your load balancer config, declarative.

CronJobs — scheduled jobs, K8s-native.

Custom Resource Definitions (CRDs) — extend the K8s API with your own resource types. The foundation of most K8s operators.

Architecture

How the pieces fit together internally:

Containers vs virtual machines

Not an either/or — they solve different problems and are frequently combined.

Local clusters for development

When you need K8s without a full cluster:

Tool	Best for
MicroK8s	Ubuntu, snap-based, batteries included
Minikube	The classic, broad driver support
Kind	K8s in Docker, great for CI pipelines
K3D	K3s in Docker, fast startup
K3S	Lightweight K8s, edge and IoT use cases

Resources

kubernetes.io
CNCF Landscape — map of the cloud-native ecosystem
TGI Kubernetes intro (YouTube)
Setting up MicroK8s with RBAC and Storage

Kubernetes Autoscaling

Mon, 01 Jan 2024 00:00:00 +0000

Kubernetes has autoscaling at two levels: the Horizontal Pod Autoscaler scales the number of pod replicas based on CPU, memory, or other metrics exposed through the metrics APIs, and the Cluster Autoscaler adds or removes nodes when pods can’t be scheduled. KEDA and Karpenter extend these primitives: KEDA pushes workload scaling further, and Karpenter replaces the node provisioner entirely.

KEDA

Kubernetes Event-Driven Autoscaling. KEDA extends the HPA to scale workloads based on external event sources: Kafka consumer lag, queue depth in SQS or RabbitMQ, HTTP request rate, database query results, cron schedules. Out of the box the HPA scales on CPU and memory; custom and external metrics require a metrics adapter. KEDA acts as that adapter and ships a long list of scalers for external systems. The important capability it adds is scale-to-zero: a consumer that has no messages to process can scale down to zero pods and scale back up when work arrives. This makes it well-suited for event-driven workloads and batch processing where idle replicas waste resources.
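
The arithmetic behind both behaviours is short. `hpa_desired_replicas` follows the formula in the Kubernetes HPA documentation (ignoring its tolerance band and stabilisation windows); `queue_replicas` is a simplified, hypothetical model of queue-driven scaling with scale-to-zero, not KEDA's actual code. In KEDA itself, the step between zero and one replica is handled by KEDA and scaling between one and N is delegated to the HPA.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """desired = ceil(current_replicas * current_metric / target_metric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


def queue_replicas(queue_length, target_per_replica, max_replicas):
    """Replicas for a queue consumer; an empty queue scales to zero."""
    if queue_length == 0:
        return 0
    return min(max_replicas, math.ceil(queue_length / target_per_replica))


# 4 replicas at 200m CPU each against a 100m target: the HPA doubles the count.
print(hpa_desired_replicas(4, 200, 100))

# Queue depth 95 with a target of 10 messages per replica, capped at 20.
print(queue_replicas(95, 10, 20))
print(queue_replicas(0, 10, 20))
```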

Karpenter

A node provisioner that replaces the Cluster Autoscaler, originally from AWS and now maintained under Kubernetes SIG Autoscaling, with providers for other clouds. Where the Cluster Autoscaler works by adjusting existing Auto Scaling Groups, Karpenter provisions EC2 instances (or equivalent) directly based on the actual resource requirements of pending pods, choosing the right instance type, size, and purchase option (on-demand vs spot) in real time. This makes provisioning significantly faster and more cost-efficient: the cluster gets exactly the nodes the pending workload needs, not the nearest pre-configured node group. Karpenter also handles consolidation, continuously evaluating whether running workloads could be packed onto fewer nodes and replacing over-provisioned nodes accordingly.

KubeVirt

Mon, 01 Jan 2024 00:00:00 +0000

See Virtualization — KVM and KubeVirt for full coverage of both KVM and KubeVirt.

Kyverno

Mon, 01 Jan 2024 00:00:00 +0000

Kyverno is a policy engine for Kubernetes. It runs as an admission controller and intercepts every resource creation or update, applying rules that validate, mutate, or generate resources. Policies are written as Kubernetes CRDs in YAML — no Rego, no separate language to learn. If you can write a Kubernetes manifest, you can write a Kyverno policy.

Three rule types

Validate — reject resources that don’t meet requirements:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  rules:
  - name: check-team-label
    match:
      any:
      - resources:
          kinds: [Deployment]
    validate:
      message: "Deployments must have a 'team' label."
      pattern:
        metadata:
          labels:
            team: "?*"

Mutate — automatically add or modify fields on admission:

- name: add-default-resources
  match:
    any:
    - resources:
        kinds: [Pod]
  mutate:
    patchStrategicMerge:
      spec:
        containers:
        - (name): "*"
          resources:
            requests:
              +(memory): "64Mi"
              +(cpu): "250m"

Generate — create related resources automatically. A common use: generate a NetworkPolicy every time a new namespace is created.

Enforcement vs audit

Policies run in enforce mode (block non-compliant resources) or audit mode (allow but report violations). Audit mode is the right starting point — understand your existing state before enforcing.
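
The difference between the two modes can be shown with a toy admission check for the team-label rule above. This is an illustration only, not Kyverno's implementation: `check_team_label` and `admit` are hypothetical helpers, and Python's fnmatch stands in for the `?*` wildcard pattern (one character followed by anything, so at least one character in total).

```python
from fnmatch import fnmatchcase

MESSAGE = "Deployments must have a 'team' label."


def check_team_label(resource):
    """Return a list of violations for the require-labels rule."""
    team = resource.get("metadata", {}).get("labels", {}).get("team", "")
    return [] if fnmatchcase(team, "?*") else [MESSAGE]


def admit(resource, mode):
    """Enforce blocks on violations; Audit admits the resource but still reports them."""
    violations = check_team_label(resource)
    allowed = not violations or mode == "Audit"
    return allowed, violations


labelled = {"metadata": {"labels": {"team": "payments"}}}
unlabelled = {"metadata": {"labels": {}}}

print(admit(labelled, "Enforce"))
print(admit(unlabelled, "Enforce"))
print(admit(unlabelled, "Audit"))
```

In audit mode the violation list is the same as in enforce mode; only the admission decision changes, which is what makes it a safe way to survey existing workloads first.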

Common policies

The Kyverno policy library has ready-made policies for common requirements: disallow privileged containers, require image tags to not be latest, enforce resource limits, restrict hostPath mounts. Most teams start from the library and customise.

Local Kubernetes

Mon, 01 Jan 2024 00:00:00 +0000

Running Kubernetes locally is useful for development, testing, and CI — a real cluster without the cloud bill. The options differ mainly in weight, startup speed, and whether they target local dev, CI pipelines, or lightweight production use.

MiniKube

The original local Kubernetes, maintained by the Kubernetes project itself. By default it runs a single-node cluster inside a VM (VirtualBox, HyperKit) or a Docker container. It tracks upstream Kubernetes closely, so behaviour seen in MiniKube generally carries over to a real cluster. Slower to start than the container-based options and heavier on resources, but the most faithful representation of a real cluster. Good for getting started and for testing things that need VM-level isolation.

Kind

Kubernetes IN Docker — each cluster node runs as a Docker container, no VM required. Fast startup (seconds), low overhead, and multi-node clusters are easy to spin up. The standard choice for running Kubernetes in CI pipelines: create a cluster, run tests, tear it down. The Kubernetes project itself uses Kind for conformance testing. Not designed for running workloads long-term, but excellent for ephemeral test environments.

K3S

Lightweight Kubernetes from Rancher (now SUSE), packaged as a single binary under 100MB. It strips out cloud-provider integrations, in-tree storage drivers, and alpha features — the result is a fully conformant Kubernetes that runs on hardware where full K8s won’t. Used in production for edge deployments, IoT, and resource-constrained environments. Also a good choice when you want a real persistent cluster locally without the overhead of MiniKube.

K3D

K3S running inside Docker containers — the same relationship Kind has to standard Kubernetes. Fast, lightweight, multi-node clusters in Docker. The advantage over Kind is that K3S starts faster and uses less memory per node. Good choice for local dev and CI when you want the lightweight K3S runtime rather than full upstream Kubernetes.

MicroK8S

Canonical’s take on local Kubernetes, distributed as a snap package on Ubuntu. Single-command install, add-ons (DNS, storage, ingress, observability) enabled with microk8s enable <addon>. Opinionated and tightly integrated with the Ubuntu/Canonical ecosystem. The right choice if you’re on Ubuntu and want a low-friction local cluster with batteries included — less so outside that ecosystem.

Which to use

Tool	Best for
MiniKube	Getting started, testing with VM isolation
Kind	CI pipelines, ephemeral test clusters
K3S	Persistent local cluster, edge/IoT production
K3D	Fast local dev and CI with K3S runtime
MicroK8S	Ubuntu users wanting a managed local cluster

Managing Secrets in Kubernetes

Mon, 01 Jan 2024 00:00:00 +0000

Kubernetes has a built-in Secret resource, but it is not a secrets management solution — it is base64-encoded storage with no encryption at rest by default and no access audit trail. How you actually manage secrets in a Kubernetes cluster depends on how far you need to go beyond the default.

Native Kubernetes Secrets

The baseline. A Secret is a key-value store mounted into pods as environment variables or files:

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=  # base64("admin")
  password: cGFzc3dvcmQ=

The problems: base64 is encoding, not encryption. Secrets are stored in etcd, and enabling encryption at rest is a cluster configuration step that is easy to skip. Secrets are readable by anyone whose RBAC permissions allow kubectl get secret in that namespace. For anything beyond a local dev cluster or a low-sensitivity workload, you need something more.
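
To see why base64 offers no protection, the values from the manifest above can be decoded with a few lines of standard-library Python and no key of any kind:

```python
import base64

# The data section of the Secret shown above.
secret_data = {
    "username": "YWRtaW4=",
    "password": "cGFzc3dvcmQ=",
}

decoded = {key: base64.b64decode(value).decode("utf-8")
           for key, value in secret_data.items()}
print(decoded)
```

Anyone who can read the Secret object, or an unencrypted etcd backup, can do the same.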

Sealed Secrets

A Kubernetes controller from Bitnami. SealedSecret resources contain secrets encrypted with the cluster’s public key — only the controller running in that cluster can decrypt them. The encrypted form is safe to commit to Git, which makes GitOps workflows possible without a separate secrets store. Simple to operate, no external dependency. The tradeoff: secrets are tied to a specific cluster’s key, cross-cluster sharing requires re-encryption, and there is no centralised audit trail.

External Secrets Operator

ESO reads secrets from an external store (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault, 1Password) and syncs them into native Kubernetes Secrets. Your source of truth stays in the external system; the K8s Secret is a read-only projection of it, refreshed on a configurable interval:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials
  data:
  - secretKey: password
    remoteRef:
      key: prod/db/password

ESO is the right choice when you already have a secrets store and want Kubernetes workloads to consume from it without changing how secrets are managed elsewhere.

Secrets Store CSI Driver

An alternative to ESO for the same problem: mount secrets from an external store directly as files in a pod, without creating a Kubernetes Secret at all. The secret materialises only in the pod’s filesystem, is not stored in etcd, and disappears when the pod terminates. Supported by AWS, Azure, GCP, and Vault providers. Used in combination with a SecretProviderClass to define what to fetch and where to mount it.

HashiCorp Vault

A dedicated secrets management platform. Vault stores arbitrary secrets, issues dynamic credentials (database passwords that expire, AWS IAM credentials valid for an hour), manages PKI, and provides a full audit log of every read and write. Kubernetes workloads authenticate to Vault via the Kubernetes auth method (using the pod’s service account token) and receive a Vault token scoped to the secrets their service account is allowed to read. More to operate than the other options, but the right answer for organisations that need dynamic credentials, fine-grained access control, and audit logs.

Summary

Approach	Good for
Native Secrets	Local dev, low-sensitivity workloads
Sealed Secrets	GitOps, single-cluster, no external dependency
External Secrets Operator	Syncing from existing external stores
Secrets Store CSI	Avoiding etcd entirely, file-based secret injection
HashiCorp Vault	Dynamic credentials, audit logs, enterprise requirements

OpenShift Data Foundation

Mon, 01 Jan 2024 00:00:00 +0000

OpenShift Data Foundation (ODF) is Red Hat’s enterprise Kubernetes storage platform, built on Ceph orchestrated by Rook. Where Rook-Ceph is the open source upstream, ODF packages it with an operator, a validated configuration, enterprise support, and integration with the OpenShift console. It provides block (RBD), file (CephFS), and object (S3-compatible via Ceph RGW) storage as Kubernetes StorageClasses on the same hardware.

What it provides

Three storage modes from one cluster:

Mode	StorageClass	Use case
Block (RBD)	`ocs-storagecluster-ceph-rbd`	Databases, stateful apps needing a single-writer disk
File (CephFS)	`ocs-storagecluster-cephfs`	Shared filesystems, multiple pods reading/writing the same volume
Object	S3-compatible endpoint	Buckets via `ObjectBucketClaim`, backup targets, artifact storage

Installation

ODF installs via the ODF operator from OperatorHub. The operator creates a StorageCluster CR that drives the Ceph deployment:

apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  storageDeviceSets:
  - name: ocs-deviceset
    count: 1
    replica: 3
    dataPVCTemplate:
      spec:
        storageClassName: local-storage
        volumeMode: Block
        resources:
          requests:
            storage: 1Ti

Requires at minimum three nodes with dedicated block devices. The operator handles Ceph cluster formation, monitors, MGRs, and OSDs.
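
A rough capacity estimate for the StorageCluster above, as a sketch. It assumes that each device set yields count × replica OSDs and that pools use three-way replication; both are assumptions to verify against your ODF version, and `odf_capacity` is a hypothetical helper. Real usable capacity is lower still once Ceph overhead and full-ratio thresholds are taken into account.

```python
def odf_capacity(count, replica, device_tib, pool_replication=3):
    """Return (osd_count, raw_tib, approx_usable_tib) for one device set."""
    osds = count * replica
    raw_tib = osds * device_tib
    usable_tib = raw_tib / pool_replication
    return osds, raw_tib, usable_tib


# Values from the manifest: count 1, replica 3, 1Ti per device.
print(odf_capacity(count=1, replica=3, device_tib=1.0))
```

Under these assumptions, three 1 TiB devices provide roughly 1 TiB of usable space.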

vs Rook-Ceph

ODF is Rook-Ceph under the hood. The difference is packaging and support: ODF is tested and supported on OpenShift, includes the NooBaa multi-cloud gateway for object storage federation, and integrates with the OpenShift UI. For self-managed Kubernetes outside OpenShift, raw Rook-Ceph is the equivalent path.

Velero

Mon, 01 Jan 2024 00:00:00 +0000

Velero backs up and restores Kubernetes clusters. It captures both Kubernetes resource definitions (deployments, services, configmaps, secrets, CRDs) and persistent volume data, stores them in object storage (S3, GCS, Azure Blob), and can restore them to the same cluster or a different one. The primary use cases are disaster recovery, cluster migration, and namespace cloning.

How it works

Velero runs as a controller in the cluster. A Backup CR triggers a snapshot of selected resources:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
  namespace: velero
spec:
  includedNamespaces:
  - production
  storageLocation: default
  ttl: 720h  # 30 days

Persistent volume data is handled via storage provider snapshots (CSI snapshots, AWS EBS snapshots) or a file-system-level backup performed by the node-agent daemonset (previously named after Restic; it can use Kopia or Restic as the uploader). CSI snapshot integration is the preferred modern approach.

Scheduled backups run via a Schedule CR:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    ttl: 720h
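
The schedule and the ttl together determine how many backups sit in object storage at steady state. A small worked calculation, assuming every scheduled run succeeds and expired backups are removed promptly (`retained_backups` is a hypothetical helper):

```python
def retained_backups(ttl_hours, backups_per_day):
    """Backups kept at steady state: retention window in days times daily frequency."""
    return int(ttl_hours / 24 * backups_per_day)


# "0 2 * * *" runs once a day; ttl 720h keeps each backup for 30 days.
print(retained_backups(ttl_hours=720, backups_per_day=1))

# The same ttl with a run every six hours keeps four times as many.
print(retained_backups(ttl_hours=720, backups_per_day=4))
```

This is useful when sizing the bucket: storage use grows with both the ttl and the schedule frequency.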

Restore

A restore is a Restore CR pointing at a backup; the CLI creates one for you:

velero restore create --from-backup daily-backup

Velero recreates the Kubernetes objects and restores volume data. Namespaces can be remapped on restore — useful for cloning production to staging.

Cluster migration

The standard migration pattern: back up from the source cluster, configure the destination cluster to point at the same object storage bucket, restore. Velero handles the resource recreation; DNS cutover is a separate step.
