<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kubernetes on Backend Engineering Strategy Tools</title><link>https://backend-engineering-strategy-tools.github.io/site/tags/kubernetes/</link><description>Recent content in Kubernetes on Backend Engineering Strategy Tools</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 18 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://backend-engineering-strategy-tools.github.io/site/tags/kubernetes/index.xml" rel="self" type="application/rss+xml"/><item><title>Image Tooling</title><link>https://backend-engineering-strategy-tools.github.io/site/projects/image-tooling/</link><pubDate>Thu, 18 Jun 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/projects/image-tooling/</guid><description>&lt;p&gt;Versioned, multi-arch Docker images for Kubernetes workflows — built with &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/dagger/" &gt;Dagger&lt;/a&gt;, published to Docker Hub, triggered by a version tag.&lt;/p&gt;
&lt;p&gt;The motivation is in &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/thinking/shared-tooling-images/" &gt;Shared Tooling Images&lt;/a&gt;: one image, consistent versions, three contexts — CI, local, colleagues.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="images"&gt;Images
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;GitHub repo&lt;/th&gt;
 &lt;th&gt;Docker Hub&lt;/th&gt;
 &lt;th&gt;Contents&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://github.com/Backend-Engineering-Strategy-Tools/image-tooling" target="_blank" rel="noopener"
 &gt;&lt;code&gt;image-tooling&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;best-tools/tooling-k8s&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;kubectl, helm, kustomize, argocd CLI, k9s, jq, yq&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;image-tooling&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;best-tools/tooling-k8s-aws&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;tooling-k8s&lt;/code&gt; + AWS CLI&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;image-tooling&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;best-tools/tooling-k8s-openstack&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;tooling-k8s&lt;/code&gt; + OpenStack CLI&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://github.com/Backend-Engineering-Strategy-Tools/image-buildx" target="_blank" rel="noopener"
 &gt;&lt;code&gt;image-buildx&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;best-tools/buildx&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;CI builder — Docker buildx, AWS CLI, Dagger CLI&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://github.com/Backend-Engineering-Strategy-Tools/image-pandoc" target="_blank" rel="noopener"
 &gt;&lt;code&gt;image-pandoc&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;best-tools/pandoc&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;PDF generation — pandoc + TeX Live&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;All images publish as multi-arch manifests: &lt;code&gt;linux/amd64&lt;/code&gt; + &lt;code&gt;linux/arm64&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="quick-start"&gt;Quick start
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Interactive shell with kubeconfig mounted:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run -it --rm &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -v ~/.kube:/mnt/kube:ro &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -v &lt;span style="color:#66d9ef"&gt;$(&lt;/span&gt;pwd&lt;span style="color:#66d9ef"&gt;)&lt;/span&gt;:/work &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -w /work &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; docker.io/best-tools/tooling-k8s:latest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On startup, the image entry point symlinks &lt;code&gt;/root/.kube&lt;/code&gt; to the mounted &lt;code&gt;/mnt/kube&lt;/code&gt;, so &lt;code&gt;kubectl&lt;/code&gt; picks up the kubeconfig immediately.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shell alias for daily use:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alias k8s&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;docker run -it --rm \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; -v ~/.kube:/mnt/kube:ro \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; -v $(pwd):/work -w /work \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; docker.io/best-tools/tooling-k8s:latest&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;k8s helm lint .
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;k8s kubectl get pods -n argocd
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;In CI (GitHub Actions):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Lint chart&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;run&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;docker run --rm -v ${{ github.workspace }}:/work -w /work docker.io/best-tools/tooling-k8s:latest helm lint .&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or reference the image directly as the job container — no install step needed.&lt;/p&gt;
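&lt;p&gt;A minimal sketch of the job-container variant, assuming a GitHub-hosted Linux runner. The job name &lt;code&gt;lint&lt;/code&gt; and the checkout step are illustrative and not taken from an existing workflow:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;jobs:
  lint:                      # hypothetical job name
    runs-on: ubuntu-latest
    container:
      # Every step in this job runs inside the tooling image
      image: docker.io/best-tools/tooling-k8s:latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint chart
        # helm comes from the image, so no install step is needed
        run: helm lint .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pinning a version tag instead of &lt;code&gt;latest&lt;/code&gt; would keep CI runs reproducible across releases.&lt;/p&gt;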
&lt;hr&gt;
&lt;h2 id="setup-contributors--maintainers"&gt;Setup (contributors / maintainers)
&lt;/h2&gt;&lt;p&gt;Credentials are set once as &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/github/" &gt;GitHub org-level secrets&lt;/a&gt; and inherited by all &lt;code&gt;image-*&lt;/code&gt; repos automatically.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Secret&lt;/th&gt;
 &lt;th&gt;Where to get it&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DOCKERHUB_TOKEN&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;hub.docker.com → Account → Security → Access Tokens (Read, Write, Delete)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DAGGER_CLOUD_TOKEN&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;cloud.dagger.io → Organisation → Tokens&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Path: github.com/Backend-Engineering-Strategy-Tools → Settings → Secrets and variables → Actions → New organisation secret.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="releasing"&gt;Releasing
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git tag -a v1.0.0 -m &lt;span style="color:#e6db74"&gt;&amp;#34;Release v1.0.0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git push origin v1.0.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The GitHub Actions workflow triggers on &lt;code&gt;v*.*.*&lt;/code&gt; tags, calls &lt;code&gt;dagger call publish-multi-arch&lt;/code&gt;, and pushes both &lt;code&gt;best-tools/&amp;lt;image&amp;gt;:v1.0.0&lt;/code&gt; and &lt;code&gt;best-tools/&amp;lt;image&amp;gt;:latest&lt;/code&gt; to Docker Hub. Pipeline trace at &lt;a class="link" href="https://dagger.cloud/" target="_blank" rel="noopener"
 &gt;cloud.dagger.io&lt;/a&gt;.&lt;/p&gt;
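&lt;p&gt;As a rough sketch, a tag-triggered workflow of this kind could look like the following. It assumes the Dagger CLI is already available on the runner; the workflow and job names, the absence of function arguments, and the way the Docker Hub token reaches the pipeline are all assumptions, not the contents of the actual workflow file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;name: release                # hypothetical workflow name
on:
  push:
    tags:
      - "v*.*.*"             # same tag pattern as described above
jobs:
  publish:                   # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Publish multi-arch image
        # Function arguments are omitted; the real parameter list is not shown on this page
        run: dagger call publish-multi-arch
        env:
          DAGGER_CLOUD_TOKEN: ${{ secrets.DAGGER_CLOUD_TOKEN }}
          # Assumption: the pipeline reads the Docker Hub token from the environment
          DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;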
&lt;hr&gt;
&lt;h2 id="links"&gt;Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/Backend-Engineering-Strategy-Tools" target="_blank" rel="noopener"
 &gt;Backend-Engineering-Strategy-Tools org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://hub.docker.com/u/best-tools" target="_blank" rel="noopener"
 &gt;best-tools on Docker Hub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/dagger/" &gt;Dagger pipelines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Gardener on Cleura</title><link>https://backend-engineering-strategy-tools.github.io/site/projects/gardener/</link><pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/projects/gardener/</guid><description>&lt;p&gt;Getting hands-on with &lt;a class="link" href="https://gardener.cloud/" target="_blank" rel="noopener"
 &gt;Gardener&lt;/a&gt; on &lt;a class="link" href="https://cleura.com/" target="_blank" rel="noopener"
 &gt;Cleura&lt;/a&gt; — a European OpenStack cloud — ahead of using it professionally. The focus is on the networking and traffic ingress side: how does a Gardener shoot cluster on OpenStack expose services, what does the LoadBalancer path actually look like, and when does ingress apply versus when it does not.&lt;/p&gt;
&lt;p&gt;The test application is a &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/projects/minecraft/" &gt;Minecraft server with Velocity proxy&lt;/a&gt; — useful precisely because it is raw TCP rather than HTTP, which forces the full LoadBalancer path rather than an ingress shortcut.&lt;/p&gt;
&lt;p&gt;→ &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/gardener/" &gt;Gardener on Cleura — technical notes&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="steps"&gt;Steps
&lt;/h2&gt;&lt;h3 id="1--shoot-cluster"&gt;1 — Shoot cluster
&lt;/h3&gt;&lt;p&gt;Provision a Gardener shoot cluster on Cleura. Cleura wraps Gardener behind their own REST API — &lt;code&gt;gardenctl&lt;/code&gt; and the Gardener Terraform provider require the garden cluster kubeconfig, which Cleura does not expose. Cluster lifecycle goes through their REST API instead.&lt;/p&gt;
&lt;p&gt;→ &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/gardener/#provisioning-a-shoot-cluster-on-cleura" &gt;Provisioning via Cleura REST API&lt;/a&gt;&lt;br&gt;
→ &lt;a class="link" href="https://github.com/cleura/docs/issues/533" target="_blank" rel="noopener"
 &gt;Cleura docs issue #533 — IaC and gardenctl access&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="2--minecraft-via-standard-loadbalancer"&gt;2 — Minecraft via standard LoadBalancer
&lt;/h3&gt;&lt;p&gt;Deploy &lt;a class="link" href="https://github.com/itzg/docker-minecraft-server" target="_blank" rel="noopener"
 &gt;&lt;code&gt;itzg/minecraft-server&lt;/code&gt;&lt;/a&gt; as a StatefulSet with a plain &lt;code&gt;LoadBalancer&lt;/code&gt; service for TCP 25565 — the direct Octavia path, no Gateway involved. Gets the server running quickly and confirms TCP exposure works on Cleura independently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Internet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TCP 25565
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Octavia LB (direct LoadBalancer service)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Minecraft Pod (itzg/minecraft-server)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PVC (Cinder)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Manifests: &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/scripts/mc/stateful_set.yaml" &gt;stateful_set.yaml&lt;/a&gt; · &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/scripts/mc/service.yaml" &gt;service.yaml&lt;/a&gt;&lt;/p&gt;
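&lt;p&gt;The linked files are the source of truth. For orientation only, a plain &lt;code&gt;LoadBalancer&lt;/code&gt; service for this setup would have roughly the following shape, assuming the &lt;code&gt;mc-example&lt;/code&gt; name and &lt;code&gt;app&lt;/code&gt; label used in the commands on this page; the port name is illustrative:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: v1
kind: Service
metadata:
  name: mc-example
spec:
  # On OpenStack the cloud-controller-manager turns this into an Octavia load balancer
  type: LoadBalancer
  selector:
    app: mc-example
  ports:
    - name: minecraft        # illustrative port name
      protocol: TCP
      port: 25565
      targetPort: 25565
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;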
&lt;p&gt;Apply directly from this repo:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -k &lt;span style="color:#e6db74"&gt;&amp;#34;https://github.com/Backend-Engineering-Strategy-Tools/site//static/scripts/mc?ref=main&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Check the rollout and grab the external IP:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl get all -l app&lt;span style="color:#f92672"&gt;=&lt;/span&gt;mc-example
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl get svc mc-example &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -o jsonpath&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="25--migrate-to-helm-chart"&gt;2.5 — Migrate to Helm chart
&lt;/h3&gt;&lt;p&gt;Swap the raw manifests for the &lt;a class="link" href="https://github.com/itzg/minecraft-server-charts" target="_blank" rel="noopener"
 &gt;&lt;code&gt;itzg/minecraft-server-charts&lt;/code&gt;&lt;/a&gt; Helm chart — actively maintained, covers server type, persistence, RCON, backups, and extra ports (BlueMap, Dynmap). The raw YAML stays useful as a reference for the underlying shape.&lt;/p&gt;
&lt;p&gt;Manifests: &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/scripts/mc-helm/values.yaml" &gt;values.yaml&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm repo add minecraft-server-charts https://itzg.github.io/minecraft-server-charts/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm upgrade --install mc minecraft-server-charts/minecraft &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -f https://backend-engineering-strategy-tools.github.io/site/scripts/mc-helm/values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Check the rollout and grab the external IP:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl get all -l app.kubernetes.io/instance&lt;span style="color:#f92672"&gt;=&lt;/span&gt;mc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl get svc mc-example -o jsonpath&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="3--envoy-gateway"&gt;3 — Envoy Gateway
&lt;/h3&gt;&lt;p&gt;Deploy &lt;a class="link" href="https://gateway.envoyproxy.io/" target="_blank" rel="noopener"
 &gt;Envoy Gateway&lt;/a&gt; into the shoot cluster. It is an implementation of the &lt;a class="link" href="https://gateway-api.sigs.k8s.io/" target="_blank" rel="noopener"
 &gt;Kubernetes Gateway API&lt;/a&gt; maintained under the CNCF Envoy project. The community ingress-nginx controller has been retired; Gateway API is the forward path with a standardised spec for both HTTP and TCP.&lt;/p&gt;
&lt;p&gt;Envoy Gateway exposes a single &lt;code&gt;LoadBalancer&lt;/code&gt; service via Octavia. Everything routes through it.&lt;/p&gt;
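&lt;p&gt;A minimal sketch of the two resources that bring up that single entry point, assuming Envoy Gateway is already installed in the cluster. The name &lt;code&gt;eg&lt;/code&gt; and the plain HTTP listener are illustrative choices:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg                   # hypothetical name
spec:
  # Hands Gateways of this class to the Envoy Gateway controller
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg                   # hypothetical name
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;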
&lt;h3 id="4--httproute-certificates-and-bluemap"&gt;4 — HTTPRoute, certificates, and BlueMap
&lt;/h3&gt;&lt;p&gt;Deploy &lt;a class="link" href="https://bluemap.bluecolored.de/" target="_blank" rel="noopener"
 &gt;BlueMap&lt;/a&gt; — a Minecraft mod that renders the world as a live 3D web map served over HTTP. Route it through the Gateway with a &lt;code&gt;HTTPRoute&lt;/code&gt; and wire &lt;a class="link" href="https://cert-manager.io/" target="_blank" rel="noopener"
 &gt;cert-manager&lt;/a&gt; to provision a Let&amp;rsquo;s Encrypt certificate.&lt;/p&gt;
&lt;p&gt;A real HTTP service with a real use, not a throwaway test page. Validates the full HTTP + TLS path before touching the game server.&lt;/p&gt;
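&lt;p&gt;A sketch of the route itself, under stated assumptions: a Gateway named &lt;code&gt;eg&lt;/code&gt;, a Service named &lt;code&gt;bluemap&lt;/code&gt; listening on port 8100, and the placeholder hostname &lt;code&gt;map.example.com&lt;/code&gt;. None of these names come from the project. The TLS listener and the cert-manager wiring are left out:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: bluemap              # hypothetical name
spec:
  parentRefs:
    - name: eg               # hypothetical Gateway name
  hostnames:
    - map.example.com        # placeholder hostname
  rules:
    - backendRefs:
        - name: bluemap      # hypothetical Service name
          port: 8100         # assumed web port; adjust to the actual Service
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;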
&lt;h3 id="5--migrate-to-tcproute"&gt;5 — Migrate to TCPRoute
&lt;/h3&gt;&lt;p&gt;Migrate the TCP service to a &lt;code&gt;TCPRoute&lt;/code&gt; through Envoy Gateway. &lt;code&gt;TCPRoute&lt;/code&gt; is in the Gateway API experimental channel — this step validates that a single Gateway handles both HTTP and raw TCP.&lt;/p&gt;
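&lt;p&gt;A sketch of what that could look like: a TCP listener on the Gateway plus a &lt;code&gt;TCPRoute&lt;/code&gt; bound to it. It assumes the experimental-channel CRDs are installed; the Gateway name &lt;code&gt;eg&lt;/code&gt;, the listener name, and the &lt;code&gt;mc-example&lt;/code&gt; backend are illustrative:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg                   # hypothetical Gateway name
spec:
  gatewayClassName: eg
  listeners:
    - name: minecraft        # hypothetical listener name
      protocol: TCP
      port: 25565
      allowedRoutes:
        kinds:
          - kind: TCPRoute
---
# TCPRoute lives in the experimental channel, hence v1alpha2
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: minecraft            # hypothetical name
spec:
  parentRefs:
    - name: eg
      sectionName: minecraft # binds the route to the TCP listener above
  rules:
    - backendRefs:
        - name: mc-example   # assumed Service name
          port: 25565
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;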
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Internet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Octavia LB (one Gateway LoadBalancer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Envoy Gateway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;+------------------------------+------------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HTTPRoute → BlueMap TCPRoute → Minecraft
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="6--velocity-if-needed"&gt;6 — Velocity (if needed)
&lt;/h3&gt;&lt;p&gt;Add &lt;a class="link" href="https://papermc.io/software/velocity" target="_blank" rel="noopener"
 &gt;Velocity&lt;/a&gt; as a TCP proxy in front of the Minecraft server if multi-server routing becomes relevant — lobby, modded, survival as separate backends. Skip if a single server is enough.&lt;/p&gt;
&lt;p&gt;→ &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/projects/minecraft/" &gt;Minecraft project&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="7--plugin-pipeline"&gt;7 — Plugin pipeline
&lt;/h3&gt;&lt;p&gt;A colleague is building a Minecraft plugin. The goal is a &lt;a class="link" href="https://dagger.io/" target="_blank" rel="noopener"
 &gt;Dagger&lt;/a&gt; pipeline with GitHub Actions — the same build running locally and in CI, covering the JVM toolchain and packaging steps.&lt;/p&gt;
&lt;h3 id="8--ai"&gt;8 — AI
&lt;/h3&gt;&lt;p&gt;Something with NPC behaviour, a bot, or plugin-side automation. Low priority, high fun.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="iac-gap"&gt;IaC gap
&lt;/h2&gt;&lt;p&gt;Cleura does not expose the garden cluster kubeconfig. That one limitation closes off the entire Gardener tooling ecosystem: &lt;code&gt;gardenctl&lt;/code&gt; requires it, the &lt;a class="link" href="https://registry.terraform.io/providers/gardener/gardener" target="_blank" rel="noopener"
 &gt;Gardener Terraform provider&lt;/a&gt; requires it, and any Crossplane provider built on the Gardener API would require it too. There is no HCL path here.&lt;/p&gt;
&lt;p&gt;What remains is Cleura&amp;rsquo;s own REST API — which is fine for interactive use but falls short the moment you want to drive cluster lifecycle from a pipeline. A bash script wrapping &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;jq&lt;/code&gt; works, and that is what &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/scripts/cleura-shoot.sh" &gt;cleura-shoot.sh&lt;/a&gt; does, but it is a workaround rather than a solution. No state, no plan, no diff — just imperative API calls.&lt;/p&gt;
&lt;p&gt;Options if this needs to graduate beyond a script:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Crossplane &lt;code&gt;provider-http&lt;/code&gt;&lt;/strong&gt; — can wrap the REST API declaratively, but has no native polling or deletion hooks, so the reconciliation story is awkward&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom Terraform provider&lt;/strong&gt; — full &lt;code&gt;plan&lt;/code&gt;/&lt;code&gt;apply&lt;/code&gt; semantics, but requires writing a Go provider from scratch&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pulumi dynamic provider&lt;/strong&gt; — similar effort, Python or TypeScript&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A feature request for gardenctl access or a native IaC provider has been filed with Cleura (→ &lt;a class="link" href="https://github.com/cleura/docs/issues/533" target="_blank" rel="noopener"
 &gt;cleura/docs#533&lt;/a&gt;). Until something changes there, the bash script is as good as it gets.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="status"&gt;Status
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Step&lt;/th&gt;
 &lt;th&gt;Status&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1 — Shoot cluster on Cleura&lt;/td&gt;
 &lt;td&gt;done&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2 — Minecraft via LoadBalancer (itzg)&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2.5 — Migrate to Helm chart&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3 — Envoy Gateway&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4 — HTTPRoute + cert-manager + BlueMap&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5 — Migrate to TCPRoute&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6 — Velocity&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7 — Plugin pipeline (Dagger)&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;8 — AI&lt;/td&gt;
 &lt;td&gt;planned&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Building this out — notes will expand as each step lands.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Gardener on Cleura</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/gardener/</link><pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/gardener/</guid><description>&lt;p&gt;&lt;a class="link" href="https://gardener.cloud/" target="_blank" rel="noopener"
 &gt;Gardener&lt;/a&gt; is a Kubernetes-as-a-Service framework that runs on Kubernetes and manages the lifecycle of other clusters declaratively. Rather than managing control planes by hand, Gardener treats clusters as a resource — defined, created, upgraded, and deleted via the Gardener API.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="concepts"&gt;Concepts
&lt;/h2&gt;&lt;p&gt;Gardener uses three layers:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Layer&lt;/th&gt;
 &lt;th&gt;What it is&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Garden cluster&lt;/td&gt;
 &lt;td&gt;Runs Gardener itself — the management control plane&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Seed cluster&lt;/td&gt;
 &lt;td&gt;Hosts the control planes of shoot clusters (as pods)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Shoot cluster&lt;/td&gt;
 &lt;td&gt;The cluster you actually use — nodes run on the target cloud&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The shoot cluster&amp;rsquo;s API server does not run on the shoot nodes. It runs as a pod inside the seed cluster. From the outside it behaves like any other Kubernetes cluster; internally the control plane is isolated from the data plane.&lt;/p&gt;
&lt;p&gt;Shoot clusters are defined as &lt;code&gt;Shoot&lt;/code&gt; resources applied to the garden cluster:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;core.gardener.cloud/v1beta1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Shoot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-cluster&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;garden-my-project&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;cloudProfileName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;openstack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;region&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;sto2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;provider&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;openstack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;workers&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-pool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;machine&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;l2.c2r4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;minimum&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;maximum&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kubernetes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;version&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;1.30&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;networking&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;calico&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;pods&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;100.128.0.0&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;/11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;nodes&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;10.250.0.0&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;/16&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;services&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;100.112.0.0&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;/13&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="shoot-cluster-on-cleura"&gt;Shoot cluster on Cleura
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://cleura.com/" target="_blank" rel="noopener"
 &gt;Cleura&lt;/a&gt; is a European OpenStack provider. Gardener provisions shoot nodes as OpenStack VMs via the OpenStack machine controller.&lt;/p&gt;
&lt;p&gt;Key integrations:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;Implementation&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Node provisioning&lt;/td&gt;
 &lt;td&gt;OpenStack VMs via Gardener machine controller&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Load balancers&lt;/td&gt;
 &lt;td&gt;Octavia via cloud-controller-manager&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Block storage&lt;/td&gt;
 &lt;td&gt;Cinder via CSI driver&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;DNS&lt;/td&gt;
 &lt;td&gt;Manual or external-dns&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CNI&lt;/td&gt;
 &lt;td&gt;Calico (default) or configurable&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Gardener on Cleura does not provide an ingress controller or API gateway — these are brought in separately.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="networking"&gt;Networking
&lt;/h2&gt;&lt;p&gt;Gardener manages the cluster network configuration as part of the shoot spec. Pod, node, and service CIDRs are defined at cluster creation and must not overlap with the OpenStack network.&lt;/p&gt;
&lt;p&gt;On Cleura, nodes get OpenStack floating IPs for egress. Pod-to-pod traffic stays within the cluster overlay network (Calico by default). Traffic entering from outside the cluster goes through a &lt;code&gt;LoadBalancer&lt;/code&gt; service — either directly for raw TCP, or via a gateway controller for HTTP.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="ingress--classic-vs-gateway-api"&gt;Ingress — classic vs Gateway API
&lt;/h2&gt;&lt;p&gt;The classic Kubernetes &lt;code&gt;Ingress&lt;/code&gt; resource is HTTP-only, has no TCP support, and depends on non-standard, controller-specific annotations for anything beyond basic routing. The community ingress-nginx controller, the most widely used implementation, is being retired by the Kubernetes project; F5 NGINX maintains a separate &lt;a class="link" href="https://github.com/nginxinc/nginx-gateway-fabric" target="_blank" rel="noopener"
 &gt;Gateway API implementation&lt;/a&gt;, NGINX Gateway Fabric.&lt;/p&gt;
&lt;p&gt;The &lt;a class="link" href="https://gateway-api.sigs.k8s.io/" target="_blank" rel="noopener"
 &gt;Kubernetes Gateway API&lt;/a&gt; is the forward path — a set of CRDs (&lt;code&gt;Gateway&lt;/code&gt;, &lt;code&gt;HTTPRoute&lt;/code&gt;, &lt;code&gt;TCPRoute&lt;/code&gt;, &lt;code&gt;TLSRoute&lt;/code&gt;) with a standardized spec and first-class support for both HTTP and TCP.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Resource&lt;/th&gt;
 &lt;th&gt;Protocol&lt;/th&gt;
 &lt;th&gt;API&lt;/th&gt;
 &lt;th&gt;Status&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;Ingress&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;HTTP only&lt;/td&gt;
 &lt;td&gt;Kubernetes&lt;/td&gt;
 &lt;td&gt;Stable, legacy&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;HTTPRoute&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;HTTP/HTTPS&lt;/td&gt;
 &lt;td&gt;Gateway API&lt;/td&gt;
 &lt;td&gt;Stable&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TCPRoute&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Raw TCP&lt;/td&gt;
 &lt;td&gt;Gateway API&lt;/td&gt;
 &lt;td&gt;Experimental&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TLSRoute&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;TLS passthrough&lt;/td&gt;
 &lt;td&gt;Gateway API&lt;/td&gt;
 &lt;td&gt;Experimental&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="envoy-gateway"&gt;Envoy Gateway
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://gateway.envoyproxy.io/" target="_blank" rel="noopener"
 &gt;Envoy Gateway&lt;/a&gt; is an implementation of the Kubernetes Gateway API from the Envoy project (a CNCF project), using &lt;a class="link" href="https://www.envoyproxy.io/" target="_blank" rel="noopener"
 &gt;Envoy&lt;/a&gt; as the data plane. It supports &lt;code&gt;HTTPRoute&lt;/code&gt;, &lt;code&gt;TCPRoute&lt;/code&gt;, and &lt;code&gt;TLSRoute&lt;/code&gt; through a single &lt;code&gt;Gateway&lt;/code&gt; resource: one entry point for both HTTP and TCP.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;              Octavia LB   ← one LoadBalancer service per Gateway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                   |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           Envoy Gateway pod
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                   |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;+------------------+------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|                                     |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HTTPRoute → ClusterIP pods            TCPRoute → ClusterIP pods
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Envoy Gateway is deployed into the shoot cluster and exposes a &lt;code&gt;LoadBalancer&lt;/code&gt; service via Octavia, the same as any other service. The Gateway API resources then declare what routes through it.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tcproute--declaring-tcp-services"&gt;TCPRoute — declaring TCP services
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;TCPRoute&lt;/code&gt; attaches to a &lt;code&gt;Gateway&lt;/code&gt; listener and routes raw TCP traffic to a backend service. This is how a non-HTTP workload (e.g. a game server, a database proxy, a custom protocol service) gets exposed through the Gateway API rather than a standalone &lt;code&gt;LoadBalancer&lt;/code&gt; service.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;gateway.networking.k8s.io/v1alpha2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;TCPRoute&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-tcp-service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;parentRefs&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-gateway&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;sectionName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;tcp-listener&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;rules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;backendRefs&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;port&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The corresponding &lt;code&gt;Gateway&lt;/code&gt; listener:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Gateway&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-gateway&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;gatewayClassName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;envoy-gateway&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;listeners&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;tcp-listener&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;protocol&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;TCP&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;port&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;http-listener&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;protocol&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;HTTP&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;port&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;One Gateway, both protocols declared explicitly. The &lt;code&gt;TCPRoute&lt;/code&gt; API is in the experimental channel and requires opting in when installing Envoy Gateway.&lt;/p&gt;
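&lt;p&gt;The &lt;code&gt;Gateway&lt;/code&gt; above only reconciles if a &lt;code&gt;GatewayClass&lt;/code&gt; named &lt;code&gt;envoy-gateway&lt;/code&gt; exists in the cluster. A minimal sketch, assuming Envoy Gateway runs with its default controller name (&lt;code&gt;gateway.envoyproxy.io/gatewayclass-controller&lt;/code&gt;); verify that value against the installed release before relying on it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# GatewayClass is cluster-scoped, so it carries no namespace.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;GatewayClass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#75715e"&gt;# Must match spec.gatewayClassName on the Gateway.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;envoy-gateway&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#75715e"&gt;# Assumed default controller name for Envoy Gateway.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;controllerName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;gateway.envoyproxy.io/gatewayclass-controller&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;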
&lt;hr&gt;
&lt;h2 id="httproute--http-services"&gt;HTTPRoute — HTTP services
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;HTTPRoute&lt;/code&gt; handles HTTP and HTTPS traffic with routing by hostname, path, header, or method — without annotations.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;HTTPRoute&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-http-service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;parentRefs&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-gateway&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;sectionName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;http-listener&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;hostnames&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#ae81ff"&gt;my-app.example.com&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;rules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;matches&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;path&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;PathPrefix&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;value&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;backendRefs&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;port&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;8080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="loadbalancer--direct-tcp-via-octavia"&gt;LoadBalancer — direct TCP via Octavia
&lt;/h2&gt;&lt;p&gt;For cases where a &lt;code&gt;TCPRoute&lt;/code&gt; is not appropriate (or the Gateway API experimental channel is not enabled), a &lt;code&gt;LoadBalancer&lt;/code&gt; service provisions an Octavia LB directly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-tcp-service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;LoadBalancer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;selector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;app&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;ports&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;port&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;targetPort&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;protocol&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;TCP&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Annotations control Octavia behaviour — timeouts, health check parameters, internal vs external. These are provider-specific and not standardised across OpenStack deployments.&lt;/p&gt;
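&lt;p&gt;As an illustration of those annotations, the sketch below adds an internal-only flag, idle timeouts, and a health monitor to the same service. The annotation keys are the ones documented for the upstream OpenStack cloud-controller-manager; whether Cleura&amp;rsquo;s deployment honours each of them is an assumption to verify, and the values are placeholders:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-tcp-service&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;annotations&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# Keep the load balancer on the internal network (no floating IP).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;service.beta.kubernetes.io/openstack-internal-load-balancer&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;true&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# Idle timeouts in milliseconds (placeholder values).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;loadbalancer.openstack.org/timeout-client-data&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;300000&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;loadbalancer.openstack.org/timeout-member-data&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;300000&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# Ask Octavia to health-check the pool members.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;loadbalancer.openstack.org/enable-health-monitor&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;true&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;LoadBalancer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;selector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;app&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;ports&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;port&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;targetPort&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;protocol&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;TCP&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;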
&lt;hr&gt;
&lt;h2 id="storage"&gt;Storage
&lt;/h2&gt;&lt;p&gt;Cinder block volumes are available via the CSI driver. A &lt;code&gt;PersistentVolumeClaim&lt;/code&gt; provisions a Cinder volume automatically using the cluster&amp;rsquo;s default storage class.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;accessModes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#ae81ff"&gt;ReadWriteOnce&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;storage&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;20Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Cinder volumes are &lt;code&gt;ReadWriteOnce&lt;/code&gt; — they attach to a single node. For stateful workloads, use &lt;code&gt;StatefulSet&lt;/code&gt; rather than &lt;code&gt;Deployment&lt;/code&gt; to get stable volume binding across pod restarts.&lt;/p&gt;
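&lt;p&gt;A minimal &lt;code&gt;StatefulSet&lt;/code&gt; sketch of that pattern: &lt;code&gt;volumeClaimTemplates&lt;/code&gt; creates one claim per replica, and each pod is re-bound to its own claim after a restart. The workload name, image, and mount path are hypothetical, and a headless service called &lt;code&gt;my-app&lt;/code&gt; is assumed to exist:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;apps/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;StatefulSet&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#75715e"&gt;# Headless service assumed to exist; gives pods stable DNS names.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;serviceName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;replicas&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;selector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;matchLabels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;app&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;template&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;labels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;app&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;containers&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#75715e"&gt;# Hypothetical image reference.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;image&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;registry.example.com/my-app:1.0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;volumeMounts&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;mountPath&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;/var/lib/my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#75715e"&gt;# One Cinder-backed claim is created per replica (data-my-app-0, ...).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;volumeClaimTemplates&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - &lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;accessModes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#ae81ff"&gt;ReadWriteOnce&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;storage&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;20Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;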
&lt;hr&gt;
&lt;h2 id="provisioning-a-shoot-cluster-on-cleura"&gt;Provisioning a shoot cluster on Cleura
&lt;/h2&gt;&lt;p&gt;Cleura wraps Gardener behind their own REST API at &lt;code&gt;rest.cleura.cloud&lt;/code&gt;. The garden cluster kubeconfig is not exposed — &lt;code&gt;gardenctl&lt;/code&gt; does not work directly. Cluster lifecycle is managed through HTTP calls.&lt;/p&gt;
&lt;h3 id="authentication"&gt;Authentication
&lt;/h3&gt;&lt;p&gt;Every call requires a token obtained once per session:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl -s -X POST https://rest.cleura.cloud/auth/v1/tokens &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;Content-Type: application/json&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -d &lt;span style="color:#e6db74"&gt;&amp;#39;{&amp;#34;auth&amp;#34;: {&amp;#34;login&amp;#34;: &amp;#34;you@example.com&amp;#34;, &amp;#34;password&amp;#34;: &amp;#34;yourpass&amp;#34;}}&amp;#39;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | jq &lt;span style="color:#e6db74"&gt;&amp;#39;{token: .token}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pass &lt;code&gt;X-AUTH-LOGIN&lt;/code&gt; and &lt;code&gt;X-AUTH-TOKEN&lt;/code&gt; headers on all subsequent calls.&lt;/p&gt;
&lt;h3 id="bootstrap-once-per-projectregion"&gt;Bootstrap (once per project/region)
&lt;/h3&gt;&lt;p&gt;Before creating any clusters, the project must be bootstrapped — this wires up the OpenStack credentials that Gardener uses to provision nodes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl -X POST &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; https://rest.cleura.cloud/gardener/v1/public/secret/kna1/&lt;span style="color:#f92672"&gt;{&lt;/span&gt;projectId&lt;span style="color:#f92672"&gt;}&lt;/span&gt;/bootstrap &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-LOGIN: ...&amp;#34;&lt;/span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-TOKEN: ...&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The call is idempotent, so it is safe to repeat.&lt;/p&gt;
&lt;h3 id="create-a-shoot-cluster"&gt;Create a shoot cluster
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl -X POST &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; https://rest.cleura.cloud/gardener/v1/public/shoot/kna1/&lt;span style="color:#f92672"&gt;{&lt;/span&gt;projectId&lt;span style="color:#f92672"&gt;}&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-LOGIN: ...&amp;#34;&lt;/span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-TOKEN: ...&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;Content-Type: application/json&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -d &lt;span style="color:#e6db74"&gt;&amp;#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;shoot&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;name&amp;#34;: &amp;#34;my-cluster&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;kubernetes&amp;#34;: {&amp;#34;version&amp;#34;: &amp;#34;1.31.0&amp;#34;},
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;provider&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;infrastructureConfig&amp;#34;: {&amp;#34;floatingPoolName&amp;#34;: &amp;#34;ext-net&amp;#34;},
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;workers&amp;#34;: [{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;name&amp;#34;: &amp;#34;default&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;machine&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;type&amp;#34;: &amp;#34;4C-8GB-50GB&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;image&amp;#34;: {&amp;#34;name&amp;#34;: &amp;#34;ubuntu&amp;#34;, &amp;#34;version&amp;#34;: &amp;#34;22.4.20230301&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;minimum&amp;#34;: 1,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;maximum&amp;#34;: 3,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &amp;#34;volume&amp;#34;: {&amp;#34;size&amp;#34;: &amp;#34;50Gi&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; }]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; }&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="poll-until-ready"&gt;Poll until ready
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl https://rest.cleura.cloud/gardener/v1/public/shoot/kna1/&lt;span style="color:#f92672"&gt;{&lt;/span&gt;projectId&lt;span style="color:#f92672"&gt;}&lt;/span&gt;/my-cluster &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-LOGIN: ...&amp;#34;&lt;/span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-TOKEN: ...&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | jq &lt;span style="color:#e6db74"&gt;&amp;#39;.lastOperation | {state, description, progress}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Repeat the call until &lt;code&gt;lastOperation.state == &amp;quot;Succeeded&amp;quot;&lt;/code&gt;. The first provision takes roughly 10–15 minutes.&lt;/p&gt;
&lt;h3 id="fetch-kubeconfig"&gt;Fetch kubeconfig
&lt;/h3&gt;&lt;p&gt;The Cleura docs reference two kubeconfig paths: &lt;code&gt;GET /kubeconfig&lt;/code&gt; (lowercase) and &lt;code&gt;POST /Kubeconfig&lt;/code&gt; (capital K). Neither worked reliably in practice. The endpoint that actually returns a kubeconfig is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl -s -X POST &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; https://rest.cleura.cloud/gardener/v1/public/shoot/kna1/&lt;span style="color:#f92672"&gt;{&lt;/span&gt;projectId&lt;span style="color:#f92672"&gt;}&lt;/span&gt;/my-cluster/adminkubeconfig &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-LOGIN: ...&amp;#34;&lt;/span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;X-AUTH-TOKEN: ...&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -H &lt;span style="color:#e6db74"&gt;&amp;#34;Content-Type: application/json&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -d &lt;span style="color:#e6db74"&gt;&amp;#39;{&amp;#34;config&amp;#34;: {&amp;#34;expirationSeconds&amp;#34;: 3600}}&amp;#39;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | jq -r &lt;span style="color:#e6db74"&gt;&amp;#39;.&amp;#39;&lt;/span&gt; &amp;gt; my-cluster-kubeconfig.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;expirationSeconds&lt;/code&gt; field controls credential lifetime. A bug report has been filed with Cleura about the endpoint inconsistency — the &lt;code&gt;adminkubeconfig&lt;/code&gt; path is not documented.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Method&lt;/th&gt;
 &lt;th&gt;Documented&lt;/th&gt;
 &lt;th&gt;Works&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/kubeconfig&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;GET&lt;/td&gt;
 &lt;td&gt;yes&lt;/td&gt;
 &lt;td&gt;unclear&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/Kubeconfig&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;POST&lt;/td&gt;
 &lt;td&gt;yes&lt;/td&gt;
 &lt;td&gt;unclear&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/adminkubeconfig&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;POST&lt;/td&gt;
 &lt;td&gt;no&lt;/td&gt;
 &lt;td&gt;yes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;→ &lt;a class="link" href="https://github.com/cleura/docs/issues/534" target="_blank" rel="noopener"
 &gt;Cleura docs issue #534 — kubeconfig endpoint inconsistencies in Gardener REST API&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="script"&gt;Script
&lt;/h3&gt;&lt;p&gt;A bash script wrapping the full workflow (list, create, wait, kubeconfig, delete) is available: &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/scripts/cleura-shoot.sh" &gt;cleura-shoot.sh&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export CLEURA_LOGIN&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;you@example.com&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export CLEURA_PASSWORD&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;yourpass&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./cleura-shoot.sh list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./cleura-shoot.sh create my-cluster
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./cleura-shoot.sh wait my-cluster
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./cleura-shoot.sh kubeconfig my-cluster
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./cleura-shoot.sh delete my-cluster
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="iac-options"&gt;IaC options
&lt;/h3&gt;&lt;p&gt;No native Terraform provider exists for Cleura&amp;rsquo;s Gardener REST API. The Gardener Terraform provider (&lt;code&gt;registry.terraform.io/providers/gardener/gardener&lt;/code&gt;) requires the garden cluster kubeconfig, which Cleura does not expose. Options:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Approach&lt;/th&gt;
 &lt;th&gt;Notes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Bash + curl&lt;/td&gt;
 &lt;td&gt;Minimal deps — just &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;jq&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Crossplane &lt;code&gt;provider-http&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Declarative, Kubernetes-native, reconciliation loop&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Custom Terraform provider&lt;/td&gt;
 &lt;td&gt;Full &lt;code&gt;plan&lt;/code&gt;/&lt;code&gt;apply&lt;/code&gt; semantics — requires Go provider development&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Pulumi custom dynamic provider&lt;/td&gt;
 &lt;td&gt;Python/TypeScript, similar effort to custom Terraform provider&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;</description></item><item><title>Kubernetes Policy</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/policy-as-code/kubernetes-policy/</link><pubDate>Mon, 08 Jun 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/policy-as-code/kubernetes-policy/</guid><description>&lt;p&gt;Kubernetes has three distinct policy enforcement mechanisms. They sit at the same point in the request lifecycle — the admission controller — but differ in language, capability, and operational complexity.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;Kyverno&lt;/th&gt;
 &lt;th&gt;Gatekeeper (OPA)&lt;/th&gt;
 &lt;th&gt;ValidatingAdmissionPolicy&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Language&lt;/td&gt;
 &lt;td&gt;YAML/JMESPath&lt;/td&gt;
 &lt;td&gt;Rego&lt;/td&gt;
 &lt;td&gt;CEL&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Native to K8s&lt;/td&gt;
 &lt;td&gt;No (CRD)&lt;/td&gt;
 &lt;td&gt;No (CRD)&lt;/td&gt;
 &lt;td&gt;Yes (built-in)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Validate&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Mutate&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Limited&lt;/td&gt;
 &lt;td&gt;Alpha, via MutatingAdmissionPolicy (1.32+)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Generate&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Image verify&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GA since&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;td&gt;K8s 1.30&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Good for&lt;/td&gt;
 &lt;td&gt;Full-featured, K8s-native feel&lt;/td&gt;
 &lt;td&gt;Rego-first teams, policy-as-data&lt;/td&gt;
 &lt;td&gt;Simple rules, no extra install&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="validatingadmissionpolicy-vap"&gt;ValidatingAdmissionPolicy (VAP)
&lt;/h2&gt;&lt;p&gt;Added as alpha in Kubernetes 1.26, &lt;strong&gt;GA in 1.30&lt;/strong&gt;. Policies are evaluated inside the API server, so there is no admission webhook to deploy or maintain. Rules are written in &lt;strong&gt;CEL&lt;/strong&gt; (Common Expression Language), the same expression language Kubernetes uses for &lt;code&gt;x-kubernetes-validations&lt;/code&gt; rules in CRD schemas.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;admissionregistration.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ValidatingAdmissionPolicy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;require-run-as-non-root&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;failurePolicy&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Fail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;matchConstraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;resourceRules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;apiGroups&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;apps&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;apiVersions&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;v1&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;operations&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;CREATE&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;deployments&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;validations&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;expression&lt;/span&gt;: &amp;gt;&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; object.spec.template.spec.securityContext.runAsNonRoot == true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;message&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;Pods must run as non-root&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;admissionregistration.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ValidatingAdmissionPolicyBinding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;require-run-as-non-root-binding&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;policyName&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;require-run-as-non-root&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;validationActions&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;Deny]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;matchResources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;namespaceSelector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;matchLabels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;enforce-policy&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;true&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;ValidatingAdmissionPolicyBinding&lt;/code&gt; scopes where a policy applies — cluster-wide, specific namespaces, or by label selector.&lt;/p&gt;
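&lt;p&gt;As a minimal sketch of what the binding above selects: a namespace opts in by carrying the &lt;code&gt;enforce-policy: &amp;#34;true&amp;#34;&lt;/code&gt; label. The namespace name &lt;code&gt;payments&lt;/code&gt; is a placeholder, not something from this note.&lt;/p&gt;

```yaml
# Hypothetical namespace; only the label matters to the binding's namespaceSelector.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    enforce-policy: "true"
```

&lt;p&gt;Deployments in namespaces without this label are not evaluated through this binding.&lt;/p&gt;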
&lt;h3 id="cel-basics"&gt;CEL basics
&lt;/h3&gt;&lt;p&gt;CEL expressions have access to &lt;code&gt;object&lt;/code&gt; (the incoming resource), &lt;code&gt;oldObject&lt;/code&gt; (for updates), &lt;code&gt;request&lt;/code&gt; (metadata, user, etc.), and &lt;code&gt;params&lt;/code&gt; (a referenced ConfigMap or CRD for parameterisation).&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# Simple field check
object.spec.replicas &amp;lt;= 10

# Nested optional field (use .? for optional traversal)
object.spec.template.spec.?securityContext.?runAsNonRoot == optional.of(true)

# List comprehension — all containers must have limits
object.spec.template.spec.containers.all(c,
  has(c.resources) &amp;amp;&amp;amp; has(c.resources.limits)
)

# String operations
object.metadata.name.startsWith(&amp;#34;prod-&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="mutatingadmissionpolicy"&gt;MutatingAdmissionPolicy
&lt;/h3&gt;&lt;p&gt;Added in Kubernetes 1.32 (alpha). Brings CEL-based mutation — set defaults, inject labels, patch fields — without Kyverno or a webhook. Still early; not production-ready yet.&lt;/p&gt;
&lt;h2 id="gatekeeper"&gt;Gatekeeper
&lt;/h2&gt;&lt;p&gt;Gatekeeper is OPA running as a Kubernetes admission webhook. Policies are written in Rego and packaged as &lt;code&gt;ConstraintTemplate&lt;/code&gt; resources. The separation between template (the Rego logic) and constraint (the enforcement configuration + parameters) is the key design pattern.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;templates.gatekeeper.sh/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ConstraintTemplate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;k8srequiredlabels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;crd&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;names&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;K8sRequiredLabels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;validation&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;openAPIV3Schema&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;properties&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;labels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;array&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;items&lt;/span&gt;: {&lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;string}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;targets&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;target&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;admission.k8s.gatekeeper.sh&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;rego&lt;/span&gt;: |&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; package k8srequiredlabels
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; violation[{&amp;#34;msg&amp;#34;: msg}] {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; provided := {label | input.review.object.metadata.labels[label]}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; required := {label | label := input.parameters.labels[_]}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; missing := required - provided
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; count(missing) &amp;gt; 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; msg := sprintf(&amp;#34;missing required labels: %v&amp;#34;, [missing])
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;constraints.gatekeeper.sh/v1beta1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;K8sRequiredLabels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;require-team-label&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;match&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kinds&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;apiGroups&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kinds&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;Namespace&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;labels&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;team&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;environment&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Gatekeeper also supports &lt;strong&gt;audit mode&lt;/strong&gt; — it continuously evaluates existing resources against policies and surfaces violations without blocking. Useful for measuring compliance against policies you&amp;rsquo;re not yet ready to enforce.&lt;/p&gt;
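&lt;p&gt;One way to measure compliance without blocking, sketched here under the assumption that the &lt;code&gt;K8sRequiredLabels&lt;/code&gt; template above is installed, is to set the constraint&amp;rsquo;s &lt;code&gt;enforcementAction&lt;/code&gt; to &lt;code&gt;dryrun&lt;/code&gt;. The constraint name is a placeholder.&lt;/p&gt;

```yaml
# Report-only variant of the constraint above; the name is hypothetical.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label-dryrun
spec:
  enforcementAction: dryrun   # violations are recorded, requests are not denied
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team", "environment"]
```

&lt;p&gt;Audit results are written to the constraint&amp;rsquo;s &lt;code&gt;status.violations&lt;/code&gt; field, so &lt;code&gt;kubectl get k8srequiredlabels require-team-label-dryrun -o yaml&lt;/code&gt; should list the offending namespaces.&lt;/p&gt;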
&lt;h3 id="gatekeeper-vs-kyverno"&gt;Gatekeeper vs Kyverno
&lt;/h3&gt;&lt;p&gt;Kyverno is better if your team does not know Rego and wants policies that look like Kubernetes manifests. Gatekeeper is better if you are already invested in OPA/Rego and want a single policy language across K8s and non-K8s surfaces (via Conftest).&lt;/p&gt;
&lt;p&gt;The built-in VAP is the right default for simple validation rules on new clusters: there is nothing extra to install. It does not cover mutation (that is the separate MutatingAdmissionPolicy, alpha from 1.32), generation, or image verification.&lt;/p&gt;
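&lt;p&gt;To make the &amp;ldquo;policies that look like Kubernetes manifests&amp;rdquo; point concrete, here is a rough Kyverno counterpart to the Gatekeeper required-labels example. It is a sketch, not taken from this note: the policy and rule names are placeholders, and newer Kyverno releases prefer a per-rule &lt;code&gt;failureAction&lt;/code&gt; over the spec-level field used here, so check the version you run.&lt;/p&gt;

```yaml
# Hypothetical Kyverno policy: namespaces must carry team and environment labels.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce   # assumption: spec-level field, deprecated in newer releases
  rules:
    - name: check-required-labels
      match:
        any:
          - resources:
              kinds:
                - Namespace
      validate:
        message: "Namespaces must have team and environment labels."
        pattern:
          metadata:
            labels:
              team: "?*"          # "?*" means at least one character
              environment: "?*"
```

&lt;p&gt;The rule is a YAML pattern overlaid on the resource rather than a Rego program, which is the trade-off the comparison above describes.&lt;/p&gt;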
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://open-policy-agent.github.io/gatekeeper/" target="_blank" rel="noopener"
 &gt;Gatekeeper documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/open-policy-agent/gatekeeper-library" target="_blank" rel="noopener"
 &gt;Gatekeeper library&lt;/a&gt; — ready-made ConstraintTemplates&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/" target="_blank" rel="noopener"
 &gt;ValidatingAdmissionPolicy docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kubernetes.io/docs/reference/using-api/cel/" target="_blank" rel="noopener"
 &gt;CEL in Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kyverno/" &gt;Kyverno&lt;/a&gt; — separate note&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Crossplane</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/infra-as-code/crossplane/</link><pubDate>Wed, 03 Jun 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/infra-as-code/crossplane/</guid><description>&lt;p&gt;Crossplane is Kubernetes-native infrastructure management. Where Terraform runs as a CLI tool that applies changes and exits, Crossplane runs as a controller inside a Kubernetes cluster and continuously reconciles infrastructure — the same control loop model as Kubernetes itself.&lt;/p&gt;
&lt;p&gt;Cloud resources become Kubernetes objects. You &lt;code&gt;kubectl apply&lt;/code&gt; an RDS instance the same way you apply a Deployment. Crossplane&amp;rsquo;s controllers watch those objects and make the API calls to converge actual infrastructure to the desired state.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="core-concepts"&gt;Core concepts
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Providers&lt;/strong&gt; extend Crossplane with CRDs for a specific cloud. &lt;code&gt;provider-aws&lt;/code&gt; adds Kubernetes resources for every AWS service — S3 buckets, RDS instances, VPCs. Apply a provider, get hundreds of new resource types.&lt;/p&gt;
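&lt;p&gt;Installing a provider is itself a &lt;code&gt;kubectl apply&lt;/code&gt;. The sketch below assumes the Upbound family provider for S3, which serves the &lt;code&gt;s3.aws.upbound.io&lt;/code&gt; API group; the package path and version tag are assumptions to verify against the registry you use.&lt;/p&gt;

```yaml
# Assumed package reference; pin whichever version you have validated.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-s3
spec:
  package: xpkg.upbound.io/upbound/provider-aws-s3:v1.1.0
```

&lt;p&gt;The AWS provider is split into per-service family packages, so a cluster only needs to install the CRDs for the services it actually manages.&lt;/p&gt;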
&lt;p&gt;&lt;strong&gt;Managed Resources (MRs)&lt;/strong&gt; are the individual cloud resources:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;s3.aws.upbound.io/v1beta1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Bucket&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-assets&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;forProvider&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;region&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;eu-central-1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;tags&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;Environment&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;prod&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Crossplane creates this bucket and keeps it in sync. If someone deletes it outside of Crossplane, the controller recreates it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Composite Resources (XRs)&lt;/strong&gt; are the powerful part. You define your own CRDs — a &lt;code&gt;Platform&lt;/code&gt; or &lt;code&gt;DatabaseCluster&lt;/code&gt; — that compose multiple managed resources. A developer applies a &lt;code&gt;DatabaseCluster&lt;/code&gt; and gets an RDS instance, a subnet group, a parameter group, and security groups, all wired together, without needing to know any of the details.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;XRDs (Composite Resource Definitions)&lt;/strong&gt; define the schema for composite resources — what fields the developer sees, what defaults apply.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compositions&lt;/strong&gt; define how a composite resource maps to managed resources — the implementation behind the abstraction.&lt;/p&gt;
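&lt;p&gt;A minimal sketch of the XRD side, assuming the Crossplane v1 API with claims (v2 changes this model). The group &lt;code&gt;platform.example.org&lt;/code&gt;, the field names, and the claim itself are all hypothetical; the Composition that maps the claim to RDS resources is omitted.&lt;/p&gt;

```yaml
# Hypothetical XRD: defines the schema developers see for a DatabaseCluster.
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabaseclusters.platform.example.org   # must be plural.group
spec:
  group: platform.example.org
  names:
    kind: XDatabaseCluster
    plural: xdatabaseclusters
  claimNames:
    kind: DatabaseCluster
    plural: databaseclusters
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engineVersion:
                  type: string
                storageGB:
                  type: integer
                  default: 20
              required:
                - engineVersion
---
# What a developer applies; everything else is decided by the Composition.
apiVersion: platform.example.org/v1alpha1
kind: DatabaseCluster
metadata:
  name: orders-db
  namespace: team-a
spec:
  engineVersion: "16"
  storageGB: 50
```

&lt;p&gt;The developer only sees two fields; subnet groups, parameter groups, and security groups stay behind the abstraction.&lt;/p&gt;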
&lt;hr&gt;
&lt;h2 id="the-platform-engineering-model"&gt;The platform engineering model
&lt;/h2&gt;&lt;p&gt;Crossplane&amp;rsquo;s real value is as a platform layer. A platform team owns the Compositions — they define what a &amp;ldquo;compliant database&amp;rdquo; or &amp;ldquo;standard app environment&amp;rdquo; looks like. Dev teams consume the simplified abstractions without touching the underlying cloud resources.&lt;/p&gt;
&lt;p&gt;Self-service infrastructure with guardrails baked in.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="vs-terraform"&gt;vs Terraform
&lt;/h2&gt;&lt;p&gt;Crossplane and Terraform are not direct alternatives — they solve the problem differently.&lt;/p&gt;
&lt;p&gt;Terraform is a CLI tool: run plan, review, apply, exit. State is a file. Good for human-in-the-loop workflows and one-off provisioning.&lt;/p&gt;
&lt;p&gt;Crossplane is a control plane: always running, always reconciling. Better for continuous enforcement and self-service platforms. More complex to set up and operate.&lt;/p&gt;
&lt;p&gt;In practice: Terraform for provisioning foundational infrastructure (clusters, networks, accounts). Crossplane for what runs on top of the cluster — letting application teams provision their own databases, queues, and object storage through Kubernetes-native APIs.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="upbound"&gt;Upbound
&lt;/h2&gt;&lt;p&gt;The commercial platform behind Crossplane. Managed control plane hosting, a marketplace of providers and compositions, and tooling for building and publishing your own platform APIs. Worth evaluating if you are building a serious internal platform.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="learning-curve"&gt;Learning curve
&lt;/h2&gt;&lt;p&gt;Steep. You need to understand Kubernetes controllers, CRDs, and the Crossplane composition model before you can be productive. The payoff is a genuinely powerful platform abstraction — but it is not a beginner tool.&lt;/p&gt;
&lt;p&gt;A good framing: Crossplane is a &lt;strong&gt;digital twin of your infrastructure&lt;/strong&gt;. The cluster holds the desired state of everything — cloud resources, application configuration, other tools — and continuously reconciles reality to match it.&lt;/p&gt;
&lt;p&gt;Genuinely cool and worth learning if you have a cluster. The provider model has expanded well beyond cloud infrastructure — from v2 onwards Crossplane can manage applications, not just infra. There are also providers for Ansible and Terraform/OpenTofu, which means Crossplane can be the orchestration layer that drives other IaC tools. One control plane to rule them all.&lt;/p&gt;
&lt;p&gt;The prerequisite is the cluster itself. If you already run Kubernetes, Crossplane is a natural extension of the same model you already operate. If you do not, it is not the tool to start with.&lt;/p&gt;</description></item><item><title>ASGARD — the blade cluster</title><link>https://backend-engineering-strategy-tools.github.io/site/homelab/asgard-blades/</link><pubDate>Fri, 15 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/homelab/asgard-blades/</guid><description>&lt;p&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/inventory/systems/" &gt;ASGARD (SYS-007)&lt;/a&gt; is the HP BladeSystem C7000 with 16× BL460c Gen8 blades. The reason to use it is profile switching: boot a blade as a Slurm compute node, run the experiment, reimage it as a Talos worker, run the next one. The same iPXE boot menu already set up for &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/talos-omni/" &gt;ODEN&lt;/a&gt; works here — the C7000 Onboard Administrator lets you configure boot order per blade slot, so switching roles is a BIOS setting and a PXE entry, not a reinstall.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="power-reality"&gt;Power reality
&lt;/h2&gt;&lt;p&gt;Before committing to blades as the permanent always-on platform, it&amp;rsquo;s worth being honest about the enclosure overhead. The C7000 has fixed costs regardless of how many blades are populated: 10 fans, dual OA modules, 2 interconnect switches, backplane management. It doesn&amp;rsquo;t scale down gracefully.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Setup&lt;/th&gt;
 &lt;th&gt;Approx power&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;C7000 enclosure alone (no blades)&lt;/td&gt;
 &lt;td&gt;200–400W&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;C7000 + 1 blade&lt;/td&gt;
 &lt;td&gt;350–550W&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;C7000 + 3 blades&lt;/td&gt;
 &lt;td&gt;500–800W&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ODEN alone (1U M3, Talos)&lt;/td&gt;
 &lt;td&gt;100–150W&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;HEIMDAL alone (Sun X4150, router)&lt;/td&gt;
 &lt;td&gt;150–200W&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ODEN + HEIMDAL&lt;/td&gt;
 &lt;td&gt;250–350W&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Two pizza boxes beat three blades in the enclosure on power. The overhead only amortises at 8+ populated slots. For a permanent minimal setup, the 1U rack servers win. For experiments where you want to run 8–16 nodes at once, ASGARD earns its place.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="what-each-role-actually-needs"&gt;What each role actually needs
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;th&gt;RAM&lt;/th&gt;
 &lt;th&gt;Disk&lt;/th&gt;
 &lt;th&gt;Network&lt;/th&gt;
 &lt;th&gt;Limiting factor&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Talos / K8s worker&lt;/td&gt;
 &lt;td&gt;32–64GB&lt;/td&gt;
 &lt;td&gt;1× OSD disk&lt;/td&gt;
 &lt;td&gt;1GbE fine&lt;/td&gt;
 &lt;td&gt;RAM — current blades too thin&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OpenStack compute&lt;/td&gt;
 &lt;td&gt;32–64GB&lt;/td&gt;
 &lt;td&gt;local ephemeral&lt;/td&gt;
 &lt;td&gt;1GbE fine&lt;/td&gt;
 &lt;td&gt;RAM&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OpenStack control&lt;/td&gt;
 &lt;td&gt;32GB+&lt;/td&gt;
 &lt;td&gt;small&lt;/td&gt;
 &lt;td&gt;1GbE fine&lt;/td&gt;
 &lt;td&gt;RAM&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Slurm compute&lt;/td&gt;
 &lt;td&gt;as much as possible&lt;/td&gt;
 &lt;td&gt;fast scratch&lt;/td&gt;
 &lt;td&gt;1GbE mediocre&lt;/td&gt;
 &lt;td&gt;network&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Ceph OSD&lt;/td&gt;
 &lt;td&gt;16–32GB&lt;/td&gt;
 &lt;td&gt;more / bigger disks&lt;/td&gt;
 &lt;td&gt;1GbE&lt;/td&gt;
 &lt;td&gt;disk count&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The network note matters for Slurm: blade LOM connects to the enclosure switch backplane at &lt;strong&gt;1GbE&lt;/strong&gt;, not 10GbE. The switch has 10GbE uplinks going out, but blade-to-blade traffic inside the enclosure goes through the switch at 1GbE. For Talos and OpenStack this is fine. For MPI jobs exchanging large datasets between Slurm nodes it&amp;rsquo;s a real bottleneck — HPC wants InfiniBand, which the empty interconnect bays 5–8 could take (plus matching mezzanine cards in each blade), but that&amp;rsquo;s a separate cost. For learning Slurm, 1GbE is workable.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="current-blade-state"&gt;Current blade state
&lt;/h2&gt;&lt;p&gt;Most blades are underpowered for any of the roles above. CPUs are also unknown across all 16 slots — the OA web GUI reports CPU model and core count per blade and should be checked first. The E5-2600 v1 range runs from E5-2603 (4c, 80W) to E5-2690 (8c/16t, 135W), which matters significantly for role assignment.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Slot&lt;/th&gt;
 &lt;th&gt;RAM&lt;/th&gt;
 &lt;th&gt;Disk&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-001&lt;/td&gt;
 &lt;td&gt;4GB&lt;/td&gt;
 &lt;td&gt;2× 146GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-002&lt;/td&gt;
 &lt;td&gt;14GB (mixed, odd count)&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-003&lt;/td&gt;
 &lt;td&gt;32GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-004&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-005&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;1× 146GB + 1× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-006&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-007&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;2× 900GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-008&lt;/td&gt;
 &lt;td&gt;16GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-009&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-010&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-011&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-012&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-013&lt;/td&gt;
 &lt;td&gt;32GB&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-014&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-015&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;2× 300GB SAS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLD-016&lt;/td&gt;
 &lt;td&gt;8GB&lt;/td&gt;
 &lt;td&gt;—&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;BLD-003 and BLD-013 are already at 32GB and are natural candidates for control-plane or master roles once CPUs are confirmed.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="suggested-configuration-from-existing-stock"&gt;Suggested configuration from existing stock
&lt;/h2&gt;&lt;p&gt;Available spare hardware:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;14× RAM-007 (8GB DDR3 1600MHz ECC Reg) — unassigned&lt;/li&gt;
&lt;li&gt;2× HDD-004 (120GB SATA SSD) — spare&lt;/li&gt;
&lt;li&gt;6× HDD-002 (146GB 10K SAS) — spare&lt;/li&gt;
&lt;li&gt;Embedded P220i on each blade (can be set to JBOD/passthrough for Ceph)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Fat&amp;rdquo; nodes × 2&lt;/strong&gt; — Talos control plane, OpenStack control, Slurm master:
Add 4× RAM-007 to each blade. Both candidates, BLD-006 and BLD-010, start at 8GB, so 4× 8GB brings each to 40GB, and both have 2× 300GB SAS for local storage. This uses 8 of the 14 spare sticks. Install one of the spare 120GB SSDs as the boot disk in each.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Medium&amp;rdquo; nodes × 3&lt;/strong&gt; — Talos workers, OpenStack compute, Slurm compute:
Add 2× RAM-007 to each → 24GB from the 8GB base. Candidates: BLD-008 (already 16GB, gets to 32GB), BLD-011, BLD-012. All three have 300GB SAS for scratch or Ceph OSDs. Costs the remaining 6 spare sticks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rest&lt;/strong&gt; — thin compute, storage expansion, or powered off:
Leave at current RAM. BLD-007&amp;rsquo;s 900GB SAS pair is better used elsewhere (see below). BLD-003 and BLD-013 at 32GB can step up to fat-node role once CPUs are confirmed.&lt;/p&gt;
&lt;p&gt;That leaves 5 blades properly kitted and 11 available for experiments or idle.&lt;/p&gt;
&lt;p&gt;BL460c Gen8 DIMM rule: populate each CPU symmetrically, in matched pairs or quads spread across its memory channels, for best throughput. Avoid odd DIMM counts.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="storage--what-moves-where"&gt;Storage — what moves where
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Pull the 900GB SAS drives from BLD-007 now.&lt;/strong&gt; HDD-013 (HGST 900GB) and HDD-014 (Toshiba 900GB) are the two largest drives in the blade pool and they&amp;rsquo;re sitting in a blade that may end up as a thin compute worker. Move them into ODEN or LOKE as permanent Ceph OSDs. This immediately gives the always-on cluster substantially more storage than the current 120GB SSDs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MIMIR&lt;/strong&gt; (SYS-004, 15× 1TB SAS) is the Ceph expansion story for later. To connect it: install CTRL-006 (ServeRAID-8e, 2 on hand and unplaced) into a server with a free PCIe slot, then cable it with an SFF-8470 → SFF-8088 cable (not currently owned, inexpensive). TOR is the natural host: it already has CTRL-003 in HBA mode and free PCIe slots. Not urgent, but the hardware is almost all there.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;What&lt;/th&gt;
 &lt;th&gt;Goes to&lt;/th&gt;
 &lt;th&gt;When&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;900GB SAS ×2 from BLD-007&lt;/td&gt;
 &lt;td&gt;ODEN or LOKE, permanent Ceph OSDs&lt;/td&gt;
 &lt;td&gt;Now&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;120GB SSD ×2 spare&lt;/td&gt;
 &lt;td&gt;BLD fat node boot disks&lt;/td&gt;
 &lt;td&gt;Before Talos on blades&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;300GB SAS in blades&lt;/td&gt;
 &lt;td&gt;Local scratch or blade Ceph OSDs&lt;/td&gt;
 &lt;td&gt;During ASGARD experiments&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;MIMIR 15× 1TB SAS&lt;/td&gt;
 &lt;td&gt;TOR via CTRL-006, Ceph expansion&lt;/td&gt;
 &lt;td&gt;Later (needs cable)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="three-things-to-do-before-blades-can-boot-anything"&gt;Three things to do before blades can boot anything
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Identify CPUs.&lt;/strong&gt; Connect to the OA management port, open the web GUI, check CPU model per slot. Ten minutes. Everything else depends on this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Network uplink.&lt;/strong&gt; The blade switches in bays 1 and 2 have 4× RJ45 1GbE uplinks (ports 22–25). Run a patch cable from one to any available switch — MODI, MAGNI, whatever&amp;rsquo;s reachable from the cable box. That&amp;rsquo;s enough for blades to reach DHCP and iPXE.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;RAM redistribution.&lt;/strong&gt; Pull the 14 spare RAM-007 sticks and install them in the chosen fat and medium nodes per the profile above.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id="the-permanent-vs-experiment-split"&gt;The permanent vs experiment split
&lt;/h2&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;Always on (~350–500W total):
  HEIMDAL → OPNsense router, Sun X4150, ~150–200W
  ODEN    → Talos, Minecraft + small services, ~100–150W
  LOKE    → 2nd Talos node (needs RAM-007 × 8 + SSD boot), ~100–150W

Experiments (fire up, learn, power off):
  ASGARD        → 3–16 blades for Slurm / OpenStack / larger Talos cluster
  TYR+TOR+FREJA → Proxmox cluster (M1 DDR2, temporary)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Once the Proxmox experiment wraps, TYR, TOR, and FREJA can be powered down permanently. If ASGARD blades eventually become the long-term compute platform, OPNsense can move to a VM on a blade at that point — but not before the blades are stable and trusted. Don&amp;rsquo;t consolidate the router onto experimental infrastructure.&lt;/p&gt;</description></item><item><title>BGP</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/networking/bgp/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/networking/bgp/</guid><description>&lt;p&gt;BGP (Border Gateway Protocol) is the routing protocol that holds the internet together. Every major network operator uses it to advertise which IP prefixes they own and to exchange that information with peers. In a homelab context the scale is different but the mechanics are the same.&lt;/p&gt;
&lt;p&gt;BGP is a path-vector protocol: each router advertises routes along with the path (sequence of ASNs) taken to reach them. Routers choose the best path based on a set of attributes and policy rules, then advertise that path to their peers.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="ebgp-vs-ibgp"&gt;eBGP vs iBGP
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;eBGP&lt;/strong&gt; (external BGP) — sessions between routers in &lt;em&gt;different&lt;/em&gt; autonomous systems. Each party has a different ASN. This is what you configure between VyOS and OPNsense, and between VyOS and MetalLB.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;iBGP&lt;/strong&gt; (internal BGP) — sessions between routers in the &lt;em&gt;same&lt;/em&gt; autonomous system. Used inside large networks to distribute external routes internally. Not relevant for a basic homelab setup.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="asns-for-private-use"&gt;ASNs for private use
&lt;/h2&gt;&lt;p&gt;Autonomous System Numbers in the range &lt;strong&gt;64512–65534&lt;/strong&gt; are reserved for private use (&lt;a class="link" href="https://www.rfc-editor.org/rfc/rfc6996" target="_blank" rel="noopener"
 &gt;RFC 6996&lt;/a&gt;) — the same concept as RFC 1918 private IP addresses. Assign one to each participant in your BGP topology:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Participant&lt;/th&gt;
 &lt;th&gt;Example ASN&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;OPNsense&lt;/td&gt;
 &lt;td&gt;64512&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;VyOS&lt;/td&gt;
 &lt;td&gt;64513&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;MetalLB (Talos cluster)&lt;/td&gt;
 &lt;td&gt;64514&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="why-bgp-for-kubernetes-loadbalancer-ips"&gt;Why BGP for Kubernetes LoadBalancer IPs
&lt;/h2&gt;&lt;p&gt;Kubernetes &lt;code&gt;LoadBalancer&lt;/code&gt; services need something external to the cluster to route traffic to them. In a cloud environment the cloud provider handles this automatically. On bare metal you need to do it yourself.&lt;/p&gt;
&lt;p&gt;Two common approaches with &lt;a class="link" href="https://metallb.universe.tf" target="_blank" rel="noopener"
 &gt;MetalLB&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;L2 mode&lt;/strong&gt; — MetalLB uses ARP (IPv4) or NDP (IPv6) to announce service IPs directly on the LAN. Simple to set up. Limitations: only one node handles traffic for each IP at a time (no real load balancing at the network layer), and the service IP must be in the same subnet as the nodes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a class="link" href="https://metallb.universe.tf/concepts/bgp/" target="_blank" rel="noopener"
 &gt;BGP mode&lt;/a&gt;&lt;/strong&gt; — MetalLB establishes a BGP session with an upstream router (VyOS, for example) and announces service IPs as /32 prefixes. The router learns the route and can ECMP across all nodes that are advertising it. More correct: actual load balancing, no subnet constraint, clean separation between cluster and network layer.&lt;/p&gt;
&lt;p&gt;The tradeoff is that BGP mode requires a BGP-capable router in the path, which is why VyOS exists in this topology.&lt;/p&gt;
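&lt;p&gt;A minimal sketch of what BGP mode could look like on the MetalLB side, assuming the example ASNs from the table above (MetalLB 64514, VyOS 64513). The peer address, the address pool, and all resource names are placeholders, not values from this homelab:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Sketch only: peerAddress, the address range, and all names are placeholders.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: vyos
  namespace: metallb-system
spec:
  myASN: 64514          # MetalLB side (example ASN from the table)
  peerASN: 64513        # VyOS side (example ASN from the table)
  peerAddress: 192.0.2.1
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lb-pool
  namespace: metallb-system
spec:
  addresses:
    - 198.51.100.0/27   # in BGP mode this need not be inside the node subnet
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: lb-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - lb-pool
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The router needs a matching neighbor entry for each node with the ASNs mirrored; until that session is established, no service IPs are announced.&lt;/p&gt;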
&lt;hr&gt;
&lt;h2 id="testing-with-a-real-bgp-network"&gt;Testing with a real BGP network
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://dn42.eu" target="_blank" rel="noopener"
 &gt;DN42&lt;/a&gt; is a community-run experimental network that simulates the real internet using actual BGP, DNS, and whois infrastructure. Participants connect via WireGuard or other tunnels and peer with each other using real BGP sessions and real (private-range) ASNs. A good way to practice BGP outside the homelab without needing a production ASN.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="related"&gt;Related
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/networking/vyos/" &gt;VyOS&lt;/a&gt; — the BGP peer router&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/vyos-bgp/" &gt;VyOS + BGP experiment&lt;/a&gt; — the actual setup in this homelab&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Ceph</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/</guid><description>&lt;p&gt;Ceph is an open-source distributed storage platform providing object, block, and file storage in a single unified system. It runs across multiple nodes and has no single point of failure.&lt;/p&gt;
&lt;p&gt;The core idea: data is not stored on specific disks on specific nodes. Instead, the CRUSH algorithm distributes data across all available OSDs (Object Storage Daemons) based on a placement map. Add nodes and the cluster rebalances automatically. Lose a node and Ceph re-replicates from surviving copies without operator intervention.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="storage-types"&gt;Storage types
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Interface&lt;/th&gt;
 &lt;th&gt;Typical use&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Block (RBD)&lt;/td&gt;
 &lt;td&gt;Kernel block device / iSCSI&lt;/td&gt;
 &lt;td&gt;Kubernetes PVCs, VM disks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Object (RGW)&lt;/td&gt;
 &lt;td&gt;S3-compatible API&lt;/td&gt;
 &lt;td&gt;Backups, artifacts, media&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;File (CephFS)&lt;/td&gt;
 &lt;td&gt;POSIX filesystem / NFS&lt;/td&gt;
 &lt;td&gt;Shared filesystems, home dirs&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For Kubernetes workloads, RBD block storage via a StorageClass is the common path.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="components"&gt;Components
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;MON (Monitor)&lt;/strong&gt; — maintains the cluster map; quorum-based, so run an odd number (typically 3 or 5). Not in the data path: clients read and write directly to OSDs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OSD (Object Storage Daemon)&lt;/strong&gt; — one per disk; handles actual data reads/writes and replication.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MGR (Manager)&lt;/strong&gt; — collects metrics, hosts the dashboard, runs modules (balancer, prometheus, etc.).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MDS (Metadata Server)&lt;/strong&gt; — only required for CephFS; manages the filesystem namespace.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="single-node-constraint"&gt;Single-node constraint
&lt;/h2&gt;&lt;p&gt;A single-node Ceph cluster can be made to run (&lt;code&gt;allowMultiplePerNode: true&lt;/code&gt; in Rook, replication &lt;code&gt;size: 1&lt;/code&gt;), but it provides no actual redundancy. There is nothing to replicate to. This is fine for testing concepts; it is not a valid storage setup for anything you care about.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="related"&gt;Related
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.ceph.com/" target="_blank" rel="noopener"
 &gt;Ceph documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/rook/" &gt;Rook&lt;/a&gt; — Kubernetes operator that manages Ceph clusters inside K8s&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/proxmox/" &gt;Proxmox&lt;/a&gt; — Ceph is a native storage backend in Proxmox clusters&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/rook-ceph/" &gt;Rook + Ceph in the homelab&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Rook</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/rook/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/rook/</guid><description>&lt;p&gt;Rook is a Kubernetes operator that deploys and manages storage systems — primarily &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/" &gt;Ceph&lt;/a&gt; — as native Kubernetes resources. The distinction: Ceph is the storage system; Rook is the Kubernetes wiring around it.&lt;/p&gt;
&lt;p&gt;Without Rook you would run Ceph manually (or via &lt;code&gt;cephadm&lt;/code&gt;) and then configure the Kubernetes CSI driver separately. Rook collapses that into CRDs and handles the full lifecycle: deployment, configuration, expansion, upgrades, and failure recovery.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="how-it-works"&gt;How it works
&lt;/h2&gt;&lt;p&gt;Rook introduces several CRDs:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CephCluster&lt;/strong&gt; — declares the cluster: which nodes, which disks to use as OSDs, replication settings.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CephBlockPool&lt;/strong&gt; — defines a Ceph pool (replication factor, failure domain). Maps to an RBD pool.&lt;/p&gt;
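&lt;p&gt;&lt;strong&gt;StorageClass&lt;/strong&gt; — a standard Kubernetes resource rather than a Rook CRD, listed here because it completes the chain: it references a CephBlockPool and enables dynamic PVC provisioning. Kubernetes workloads request storage; Rook/Ceph fulfils it.&lt;/p&gt;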
&lt;p&gt;&lt;strong&gt;CephFilesystem&lt;/strong&gt; — deploys CephFS + MDS for POSIX shared filesystem access.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CephObjectStore&lt;/strong&gt; — deploys the Ceph RGW S3-compatible object storage gateway.&lt;/p&gt;
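&lt;p&gt;As an illustration of the replication factor and failure domain settings mentioned for &lt;code&gt;CephBlockPool&lt;/code&gt;, a multi-node pool might be declared roughly like this. This is a sketch, assuming three or more nodes with OSDs; the pool name and size are illustrative:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Sketch only: pool name and replica count are illustrative.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # place each replica on a different node
  replicated:
    size: 3             # keep three copies of every object
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With &lt;code&gt;failureDomain: host&lt;/code&gt;, losing one node costs at most one copy of any object.&lt;/p&gt;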
&lt;hr&gt;
&lt;h2 id="typical-install-sequence"&gt;Typical install sequence
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/crds.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/common.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/operator.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then apply a &lt;code&gt;CephCluster&lt;/code&gt; manifest declaring your storage topology, followed by &lt;code&gt;CephBlockPool&lt;/code&gt; and &lt;code&gt;StorageClass&lt;/code&gt; for PVC support.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="single-node-considerations"&gt;Single-node considerations
&lt;/h2&gt;&lt;p&gt;A single-node setup requires &lt;code&gt;allowMultiplePerNode: true&lt;/code&gt; in the &lt;code&gt;CephCluster&lt;/code&gt; spec (MONs, MGR, and OSDs all land on the same node). Replication &lt;code&gt;size&lt;/code&gt; must be set to &lt;code&gt;1&lt;/code&gt; — there is nowhere else to replicate. This works for experimentation; it is not a production configuration. See &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/" &gt;Ceph&lt;/a&gt; for details on the replication model.&lt;/p&gt;
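&lt;p&gt;In the Rook example manifests the flag sits under both &lt;code&gt;mon&lt;/code&gt; and &lt;code&gt;mgr&lt;/code&gt;. A fragment of the &lt;code&gt;CephCluster&lt;/code&gt; spec for a single node might look like the following; it is a sketch, not a complete manifest, and the counts are assumptions for a one-node test:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Fragment of a CephCluster spec, single-node sketch; not a complete manifest.
spec:
  mon:
    count: 1                     # one monitor, so no quorum redundancy
    allowMultiplePerNode: true
  mgr:
    count: 1
    allowMultiplePerNode: true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;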
&lt;hr&gt;
&lt;h2 id="related"&gt;Related
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://rook.io/docs/rook/latest/" target="_blank" rel="noopener"
 &gt;Rook documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/" &gt;Ceph&lt;/a&gt; — the underlying storage system&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/rook-ceph/" &gt;Rook + Ceph in the homelab&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Rook + Ceph on ODEN</title><link>https://backend-engineering-strategy-tools.github.io/site/homelab/rook-ceph/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/homelab/rook-ceph/</guid><description>&lt;p&gt;Attempting to add persistent block storage to the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/inventory/systems/" &gt;ODEN&lt;/a&gt; single-node Talos cluster using &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/rook/" &gt;Rook&lt;/a&gt; and &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/" &gt;Ceph&lt;/a&gt;. This did not fully succeed — the setup reached the point of a bound PVC and a working write test, but the cluster was not left in a clean stable state. Notes are here for completeness.&lt;/p&gt;
&lt;p&gt;This builds on the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/talos-omni/" &gt;Talos cluster setup on ODEN&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardware"&gt;Hardware
&lt;/h2&gt;&lt;p&gt;ODEN has five storage devices:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Device&lt;/th&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Size&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/dev/sdb&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Kingston SA400S3 SSD (SATA)&lt;/td&gt;
 &lt;td&gt;120 GB&lt;/td&gt;
 &lt;td&gt;Boot disk — leave alone&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/dev/nvme0n1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Samsung 970 EVO NVMe&lt;/td&gt;
 &lt;td&gt;500 GB&lt;/td&gt;
 &lt;td&gt;OSD&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/dev/sdc&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Kingston SA400S3 SSD (SATA)&lt;/td&gt;
 &lt;td&gt;120 GB&lt;/td&gt;
 &lt;td&gt;OSD&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/dev/sdd&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Kingston SA400S3 SSD (SATA)&lt;/td&gt;
 &lt;td&gt;120 GB&lt;/td&gt;
 &lt;td&gt;OSD&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/dev/sde&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Kingston SA400S3 SSD (SATA)&lt;/td&gt;
 &lt;td&gt;120 GB&lt;/td&gt;
 &lt;td&gt;OSD&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Do not add &lt;code&gt;/dev/sdb&lt;/code&gt; to Ceph. It is the boot disk.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-1--install-the-rook-operator"&gt;Step 1 — Install the Rook operator
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/crds.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/common.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.17.9/deploy/examples/operator.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Wait for the operator pod to be running in the &lt;code&gt;rook-ceph&lt;/code&gt; namespace before continuing.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-2--cephcluster-single-node"&gt;Step 2 — CephCluster (single-node)
&lt;/h2&gt;&lt;p&gt;Single-node requires &lt;code&gt;allowMultiplePerNode: true&lt;/code&gt; and explicit disk selection. The cluster-test example from the Rook repo is a reasonable starting point:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;storage&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;useAllNodes&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;nodes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;192.168.1.171&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;devices&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;nvme0n1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;sdc&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;sdd&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;sde&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Reference: &lt;a class="link" href="https://github.com/rook/rook/blob/release-1.17/deploy/examples/cluster-test.yaml" target="_blank" rel="noopener"
 &gt;cluster-test.yaml&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-3--cephblockpool-and-storageclass"&gt;Step 3 — CephBlockPool and StorageClass
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ceph.rook.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;CephBlockPool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;replicapool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rook-ceph&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;replicated&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;size&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;requireSafeReplicaSize&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;storage.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;StorageClass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rook-ceph-block&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;provisioner&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rook-ceph.rbd.csi.ceph.com&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;clusterID&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rook-ceph&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;pool&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;replicapool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;imageFormat&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;imageFeatures&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;layering&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;reclaimPolicy&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Delete&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="step-4--pvc-test"&gt;Step 4 — PVC test
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;test-pvc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;accessModes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#ae81ff"&gt;ReadWriteOnce&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;storageClassName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rook-ceph-block&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;storage&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;10Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;PVC reached &lt;code&gt;Bound&lt;/code&gt;. A BusyBox pod mounting it could write to &lt;code&gt;/mnt&lt;/code&gt;. The Ceph dashboard (&lt;code&gt;kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000&lt;/code&gt;) showed OSDs active and the pool present.&lt;/p&gt;
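&lt;p&gt;The write test can be reproduced with a pod along these lines. This is a reconstruction, not the manifest that was actually used; the pod name, image tag, and command are assumptions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Sketch only: pod name, image tag, and command are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-write-test
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox:1.36
      # Write a timestamp to the volume, then stay up for inspection.
      command: ["sh", "-c", "date | tee /mnt/written-at.txt; sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /mnt
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc   # the PVC defined in this step
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the pod reaches &lt;code&gt;Running&lt;/code&gt; and the file is present under &lt;code&gt;/mnt&lt;/code&gt;, provisioning and mounting both worked.&lt;/p&gt;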
&lt;hr&gt;
&lt;h2 id="what-did-not-work"&gt;What did not work
&lt;/h2&gt;&lt;p&gt;The cluster ran but was not left stable. Single-node Ceph produces health warnings by design (no redundancy, no failure domain separation). More importantly, the setup was not revisited after initial testing and there are unresolved questions about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CSI driver behaviour on Talos (Talos has specific requirements for CSI socket paths)&lt;/li&gt;
&lt;li&gt;Whether the dashboard warnings were cosmetic or indicated real issues&lt;/li&gt;
&lt;li&gt;Long-term stability under actual workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is left as a draft until there is time to run it properly — ideally on more than one node.&lt;/p&gt;</description></item><item><title>Talos Linux + Omni</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/talos/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/talos/</guid><description>&lt;p&gt;Talos Linux is an immutable, minimal operating system designed specifically for running Kubernetes. There is no shell, no SSH, no package manager. The entire OS is read-only and managed via a gRPC API (&lt;code&gt;talosctl&lt;/code&gt;). Node configuration is declarative YAML applied over the API; changes that require a reboot take effect on the next boot.&lt;/p&gt;
&lt;p&gt;The tradeoff is rigidity for operational simplicity. You cannot log into a Talos node and fix something by hand. In return, nodes are deterministic, reproducible, and there is no configuration drift.&lt;/p&gt;
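&lt;p&gt;As a rough illustration of what that declarative YAML looks like, here is a minimal machine config patch. It is a sketch under assumptions, not configuration taken from a real node: the hostname &lt;code&gt;oden&lt;/code&gt; and the NTP server are placeholder values.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Hypothetical machine config patch; all values are placeholders.
machine:
  network:
    hostname: oden            # assumed node name
  time:
    servers:
      - time.cloudflare.com   # assumed NTP server
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A patch like this is merged into the node&amp;rsquo;s machine config over the API (with &lt;code&gt;talosctl&lt;/code&gt;, or as a config patch in Omni) rather than edited on the node itself.&lt;/p&gt;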
&lt;hr&gt;
&lt;h2 id="comparison-to-other-installs"&gt;Comparison to other installs
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Method&lt;/th&gt;
 &lt;th&gt;OS&lt;/th&gt;
 &lt;th&gt;Config&lt;/th&gt;
 &lt;th&gt;Mutable&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;kubeadm&lt;/td&gt;
 &lt;td&gt;Ubuntu / RHEL / etc&lt;/td&gt;
 &lt;td&gt;Manual + scripts&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;k3s&lt;/td&gt;
 &lt;td&gt;Any Linux&lt;/td&gt;
 &lt;td&gt;Minimal&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Talos&lt;/td&gt;
 &lt;td&gt;Talos Linux&lt;/td&gt;
 &lt;td&gt;Declarative API&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;k3s and kubeadm give you more flexibility and a familiar Linux environment. Talos is the right choice when you want the cluster nodes to behave like appliances — provisioned, never touched.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="omni"&gt;Omni
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://omni.siderolabs.com" target="_blank" rel="noopener"
 &gt;Omni&lt;/a&gt; is a cluster management platform by Sidero Labs built on top of Talos. It handles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Node registration (nodes boot and phone home to the Omni API)&lt;/li&gt;
&lt;li&gt;Cluster creation and machine assignment&lt;/li&gt;
&lt;li&gt;Kubernetes upgrades (one action in the UI)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;talosctl&lt;/code&gt; and &lt;code&gt;kubeconfig&lt;/code&gt; access via the Omni CLI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Nodes register via a join token embedded in the kernel command line at PXE boot time. The cluster, including the Kubernetes control plane, runs on your hardware; Omni provides only the management layer on top.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hobby tier&lt;/strong&gt;: 10 nodes, non-commercial use, free. Sidero Labs also offers a self-hosted version.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="image-factory"&gt;Image Factory
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://factory.talos.dev" target="_blank" rel="noopener"
 &gt;factory.talos.dev&lt;/a&gt; generates custom Talos images with hardware extensions included. Notable extensions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;siderolabs/bnx2&lt;/code&gt; — Broadcom NetXtreme II (BCM5708/BCM5709) NIC firmware, required on some enterprise hardware (IBM x3550 M3, HP Gen 6/7 blades)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;siderolabs/intel-ucode&lt;/code&gt; — Intel microcode updates&lt;/li&gt;
&lt;li&gt;&lt;code&gt;siderolabs/nvidia-*&lt;/code&gt; — NVIDIA GPU support&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The factory produces both ISO and PXE artifacts (kernel + initramfs). See the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/hardware/hardware-provisioning/ipxe-opnsense/" &gt;OPNSense + iPXE reference&lt;/a&gt; for how to serve these over TFTP.&lt;/p&gt;
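&lt;p&gt;The extension selection is described to the factory as a schematic. The following is a minimal sketch, assuming the extension names are published exactly as written in the list above; the name the factory actually uses for the Broadcom firmware may differ, so check it against the extension list on the factory site.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Image Factory schematic (sketch). Extension names are copied from this note
# and should be verified against the list the factory publishes.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/bnx2          # assumed name for the BNX2 firmware extension
      - siderolabs/intel-ucode
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Submitting a schematic yields a schematic ID, and that ID is what identifies the custom image in the ISO and PXE download URLs.&lt;/p&gt;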
&lt;hr&gt;
&lt;h2 id="supporting-sidero-labs"&gt;Supporting Sidero Labs
&lt;/h2&gt;&lt;p&gt;Talos and Omni are built by &lt;a class="link" href="https://github.com/siderolabs" target="_blank" rel="noopener"
 &gt;Sidero Labs&lt;/a&gt; — good people doing good work. I sponsor them via &lt;a class="link" href="https://github.com/sponsors/siderolabs" target="_blank" rel="noopener"
 &gt;GitHub Sponsors&lt;/a&gt; at the fanboi tier.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="relevant-links"&gt;Relevant links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://www.talos.dev/latest/" target="_blank" rel="noopener"
 &gt;Talos Linux docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://omni.siderolabs.com/docs" target="_blank" rel="noopener"
 &gt;Omni docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://factory.talos.dev" target="_blank" rel="noopener"
 &gt;Image factory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/siderolabs" target="_blank" rel="noopener"
 &gt;Sidero Labs GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/sponsors/siderolabs" target="_blank" rel="noopener"
 &gt;Sponsor Sidero Labs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Talos Linux in the homelab via Omni</title><link>https://backend-engineering-strategy-tools.github.io/site/homelab/talos-omni/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/homelab/talos-omni/</guid><description>&lt;p&gt;Getting &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/talos/" &gt;Talos Linux&lt;/a&gt; running in the homelab via PXE boot and &lt;a class="link" href="https://omni.siderolabs.com" target="_blank" rel="noopener"
 &gt;Omni&lt;/a&gt; — starting with &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/inventory/systems/" &gt;ODEN (SYS-005)&lt;/a&gt;, an IBM System x3550 M3. The full OPNSense + iPXE configuration lives in the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/hardware/hardware-provisioning/ipxe-opnsense/" &gt;reference note&lt;/a&gt;; this covers what actually happened, in order.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="setup"&gt;Setup
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Hardware&lt;/strong&gt;: ODEN (SYS-005) — IBM x3550 M3, Broadcom BNX2 NICs (BCM5709)&lt;br&gt;
&lt;strong&gt;Network&lt;/strong&gt;: OPNSense router on LAN; ODEN connected via one NIC (start with one — removes variables)&lt;br&gt;
&lt;strong&gt;Target&lt;/strong&gt;: Single-node Talos cluster registered in Omni&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-1--opnsense-dhcp-and-tftp"&gt;Step 1 — OPNSense DHCP and TFTP
&lt;/h2&gt;&lt;p&gt;Enable network booting on the LAN DHCP server and download the iPXE binaries to the TFTP root. Full field values in the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/hardware/hardware-provisioning/ipxe-opnsense/" &gt;iPXE reference note&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One thing to check first: if you previously set DHCP options 66 and 67 as raw additional options, remove them. OPNSense&amp;rsquo;s built-in network boot fields do the same job and having both causes conflicts.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-2--ipxe-boot-script"&gt;Step 2 — iPXE boot script
&lt;/h2&gt;&lt;p&gt;Write &lt;code&gt;default.ipxe&lt;/code&gt; to &lt;code&gt;/usr/local/tftp/&lt;/code&gt;. Include a boot menu with at minimum a Talos option and a shell fallback — the shell is genuinely useful when something fails and you need to debug from the boot prompt. Full script in the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/hardware/hardware-provisioning/ipxe-opnsense/" &gt;reference note&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Talos entry in the menu needs the Omni join token from your Omni console. Generate a join link in Omni; it provides the API endpoint, token, and SideroLink addresses.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-3--talos-kernel-and-initramfs"&gt;Step 3 — Talos kernel and initramfs
&lt;/h2&gt;&lt;p&gt;The standard Talos release binaries do not include BNX2 firmware. Since around Talos 1.6 that firmware has been available as a system extension but not in the mainline image. Without it, the node boots, fails to initialise the NIC, and produces &lt;code&gt;can't load firmware bnx2&lt;/code&gt; errors. Everything else looks fine until you notice the node never gets an IP and never appears in Omni.&lt;/p&gt;
&lt;p&gt;Fix: generate a custom image at &lt;a class="link" href="https://factory.talos.dev" target="_blank" rel="noopener"
 &gt;factory.talos.dev&lt;/a&gt; with the &lt;code&gt;siderolabs/bnx2&lt;/code&gt; extension included, then download the PXE kernel and initramfs from the factory URL. Commands in the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/hardware/hardware-provisioning/ipxe-opnsense/" &gt;reference note&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-4--first-boot"&gt;Step 4 — First boot
&lt;/h2&gt;&lt;p&gt;Go into BIOS and set the boot device to PXE. On the M3, UEFI boot with &lt;code&gt;ipxe.efi&lt;/code&gt; fails silently — the image is too large for the NIC&amp;rsquo;s PXE memory buffer. Switch to legacy/BIOS mode and use &lt;code&gt;undionly.kpxe&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;The machine takes a while to POST and boot. This is normal for old enterprise hardware. It is also why demos typically use virtual machines.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-5--static-ip"&gt;Step 5 — Static IP
&lt;/h2&gt;&lt;p&gt;After the BNX2 fix the node boots Talos successfully but still does not appear in Omni. The DHCP assignment for the node is not being picked up during early boot. Workaround: add a static IP via kernel params in the iPXE script:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-ipxe" data-lang="ipxe"&gt;ip=192.168.1.171::192.168.1.1:255.255.255.0::eth0:off
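&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Add this to the &lt;code&gt;kernel&lt;/code&gt; line in the Talos iPXE entry. The format is &lt;code&gt;ip=&amp;lt;client-ip&amp;gt;::&amp;lt;gateway&amp;gt;:&amp;lt;netmask&amp;gt;::&amp;lt;iface&amp;gt;:off&lt;/code&gt;. The two empty fields are the boot server IP and the hostname, and the trailing &lt;code&gt;off&lt;/code&gt; disables autoconfiguration (DHCP) on that interface.&lt;/p&gt;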
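&lt;p&gt;The kernel parameter only matters for early boot. Once the node is registered, the same address could also be pinned in the Talos machine config. This is a sketch of what such a patch might look like, not what was applied here; it assumes the interface is still named &lt;code&gt;eth0&lt;/code&gt; after boot and reuses the addresses from the kernel parameter above.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Hypothetical machine config patch mirroring the ip= kernel parameter.
machine:
  network:
    interfaces:
      - interface: eth0            # assumed interface name
        dhcp: false                # static addressing only
        addresses:
          - 192.168.1.171/24       # 255.255.255.0 written as a prefix length
        routes:
          - network: 0.0.0.0/0     # default route
            gateway: 192.168.1.1
&lt;/code&gt;&lt;/pre&gt;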
&lt;hr&gt;
&lt;h2 id="step-6--omni-registration"&gt;Step 6 — Omni registration
&lt;/h2&gt;&lt;p&gt;With a working NIC and an IP, the node contacts the Omni API using the join token. It appears in the Omni console as an unallocated machine. Create a cluster, assign the machine, and let Omni configure it. The initial cluster bootstrap takes a few minutes.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="step-7--fix-the-bios-boot-order"&gt;Step 7 — Fix the BIOS boot order
&lt;/h2&gt;&lt;p&gt;After the cluster is up, change the BIOS boot order so the disk is first. If PXE remains the primary boot device, every reboot drops the machine back to the iPXE menu instead of booting the installed Talos. I discovered this on the first reboot; it is noted here so you don&amp;rsquo;t make the same trip to the garage.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="upgrade"&gt;Upgrade
&lt;/h2&gt;&lt;p&gt;Omni makes single-node upgrades straightforward: open the cluster in the Omni console, select a new Talos version, apply. The node reboots once. Single-node means the cluster has downtime during the reboot; that is expected. Nothing else to do.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="result"&gt;Result
&lt;/h2&gt;&lt;p&gt;Single-node Kubernetes cluster running on ODEN, managed via Omni. &lt;code&gt;kubectl&lt;/code&gt; and &lt;code&gt;talosctl&lt;/code&gt; access via the Omni CLI. Next experiment: &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/rook-ceph/" &gt;Rook + Ceph&lt;/a&gt; for persistent storage.&lt;/p&gt;</description></item><item><title>Kubernetes Across the Stack</title><link>https://backend-engineering-strategy-tools.github.io/site/projects/kubernetes-stack/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/projects/kubernetes-stack/</guid><description>&lt;p&gt;A documented comparison of running Kubernetes across every major hosting model: cloud managed, self-managed on cloud, private cloud, and bare metal at home. The goal is an honest, practical reference for each environment: what it costs you in time and money, where the rough edges are, and how the networking story differs between them.&lt;/p&gt;
&lt;p&gt;The thread running through all of it is &lt;a class="link" href="https://www.talos.dev/" target="_blank" rel="noopener"
 &gt;Talos Linux&lt;/a&gt; — an immutable, API-driven OS built specifically for Kubernetes. No SSH, no shell, no config drift. The same OS everywhere means the operational model stays consistent regardless of what is running underneath.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Environment&lt;/th&gt;
 &lt;th&gt;Approach&lt;/th&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;OpenStack — &lt;a class="link" href="https://cleura.com/" target="_blank" rel="noopener"
 &gt;Cleura&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Talos &amp;amp; Terraform&lt;/td&gt;
 &lt;td&gt;draft exists&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OpenStack — &lt;a class="link" href="https://cleura.com/" target="_blank" rel="noopener"
 &gt;Cleura&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Talos, with Omni&lt;/td&gt;
 &lt;td&gt;maybe?&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OpenStack — &lt;a class="link" href="https://elastx.se/" target="_blank" rel="noopener"
 &gt;ElastX&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Talos &amp;amp; Terraform&lt;/td&gt;
 &lt;td&gt;draft exists&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OpenStack — &lt;a class="link" href="https://elastx.se/" target="_blank" rel="noopener"
 &gt;ElastX&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Talos, with Omni&lt;/td&gt;
 &lt;td&gt;maybe?&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Homelab — bare metal&lt;/td&gt;
 &lt;td&gt;Talos + Pixieboot + Omni&lt;/td&gt;
 &lt;td&gt;draft exists&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Homelab — bare metal&lt;/td&gt;
 &lt;td&gt;Talos + Pixieboot without Omni&lt;/td&gt;
 &lt;td&gt;maybe?&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Homelab — OpenStack&lt;/td&gt;
 &lt;td&gt;OpenStack on bare metal, Talos running on top&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;(stretch)&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Homelab — OpenStack&lt;/td&gt;
 &lt;td&gt;Talos on bare metal, OpenStack inside cluster&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;(stretch)&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AWS&lt;/td&gt;
 &lt;td&gt;Talos on EC2&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;(stretch)&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Azure&lt;/td&gt;
 &lt;td&gt;Talos on VMs&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;(stretch)&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GCP&lt;/td&gt;
 &lt;td&gt;Talos on Compute Engine&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;(stretch)&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Stretch goals&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;AWS, Azure, GCP — same Talos approach, different underlying infrastructure. Interesting eventually, but not the priority.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Omni&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="link" href="https://omni.siderolabs.com/" target="_blank" rel="noopener"
 &gt;Omni&lt;/a&gt; is Sidero Labs&amp;rsquo; hosted management layer for Talos clusters, so it is worth documenting both with and without it. Without Omni gives you the full picture of what Talos management looks like manually; with Omni shows what the managed layer buys you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Homelab provisioning&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Nodes provisioned via Pixieboot — no USB sticks, no manual installations. A node powers on, boots from the network, and registers. The goal is a fully reproducible cluster from scratch with minimal human steps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cluster provisioning and bootstrap for each environment&lt;/li&gt;
&lt;li&gt;Networking — CNI choices, ingress, cross-cluster connectivity&lt;/li&gt;
&lt;li&gt;Storage — what you get managed vs what you have to bring yourself&lt;/li&gt;
&lt;li&gt;Operational differences — upgrades, node management, observability&lt;/li&gt;
&lt;li&gt;Cost and trade-off summary across environments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Making it usable&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Getting a cluster running is the easy part. Making it usable is where environments diverge. Each environment needs an answer for ingress, DNS, and storage — and the answer varies significantly depending on what the underlying platform provides.&lt;/p&gt;
&lt;p&gt;On managed cloud you can lean on load balancers and block storage from the provider. On OpenStack you have those options if the provider exposes them. On bare metal at home you are on your own — MetalLB or similar for load balancer IPs, a local DNS solution, and either local storage or something like Rook/Ceph. Same Kubernetes, very different operational story underneath.&lt;/p&gt;
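&lt;p&gt;To make the bare-metal case concrete, this is roughly what handing MetalLB a block of LAN addresses looks like in layer 2 mode. It is a sketch under assumptions: the resource names and the address range are invented, and the range must sit outside the DHCP pool of whatever router serves the LAN.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Hypothetical MetalLB layer 2 setup; names and address range are placeholders.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # assumed free range on the LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool                   # announce addresses from the pool above
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With this in place, a &lt;code&gt;Service&lt;/code&gt; of type &lt;code&gt;LoadBalancer&lt;/code&gt; receives an address from the pool, which is the part a cloud provider would otherwise supply.&lt;/p&gt;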
&lt;p&gt;&lt;em&gt;Notes exist in various states — pulling them together, testing, and documenting properly is the work.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Minecraft Server</title><link>https://backend-engineering-strategy-tools.github.io/site/projects/minecraft/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/projects/minecraft/</guid><description>&lt;p&gt;Building and running a Minecraft server with the kids — hosted in the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/homelab/" &gt;homelab&lt;/a&gt; on bare metal rather than paying for a managed service. Part infrastructure project, part excuse to learn together.&lt;/p&gt;
&lt;p&gt;The longer-term goal is a proper setup: automated backups, world persistence across restarts, maybe some automation around starting and stopping the server on demand.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Notes and repo to follow.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;More to come.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Argo</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-project/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-project/</guid><description>&lt;p&gt;The Argo project is a suite of Kubernetes-native tools for running and managing workloads and deployments. Each tool solves a distinct problem and they compose well together, but they are independent: you can use any one without the others. The four core tools (ArgoCD, Argo Workflows, Argo Rollouts, and Argo Events) sit under the Argo umbrella, which is a CNCF graduated project; Kargo, covered last, is a separate project from Akuity.&lt;/p&gt;
&lt;h2 id="argocd"&gt;ArgoCD
&lt;/h2&gt;&lt;p&gt;GitOps continuous delivery. ArgoCD watches a Git repository and continuously reconciles the cluster state to match it; drift is detected and, with self-heal enabled, corrected automatically. It is the CD half of a modern Kubernetes delivery pipeline: a CI system builds and pushes an image and updates the image tag in Git, and ArgoCD detects that change and rolls it out. See the &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/" &gt;ArgoCD&lt;/a&gt; note for a full walkthrough including App of Apps, bootstrapping, and self-management.&lt;/p&gt;
&lt;h2 id="argo-workflows"&gt;Argo Workflows
&lt;/h2&gt;&lt;p&gt;A general-purpose workflow execution engine for Kubernetes. Workflows are CRDs that define DAGs or sequential step graphs — each step runs in a container, with outputs passed as artifacts or parameters to downstream steps. Used for CI pipelines, ML training jobs, data processing, and batch workloads. Where Tekton models CI-specific primitives (Tasks, Pipelines), Argo Workflows is lower-level and more flexible: any containerised workload that has dependencies between steps fits the model.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Workflow&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;entrypoint&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;build-test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;templates&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;build-test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;dag&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;tasks&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;build&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;template&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;run-step&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;arguments&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;parameters&lt;/span&gt;: [{&lt;span style="color:#f92672"&gt;name: cmd, value&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;make build&amp;#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;dependencies&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;build]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;template&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;run-step&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;arguments&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;parameters&lt;/span&gt;: [{&lt;span style="color:#f92672"&gt;name: cmd, value&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;make test&amp;#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;run-step&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;inputs&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;cmd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;container&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;image&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;golang:1.22&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;command&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;sh, -c]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;args&lt;/span&gt;: [&lt;span style="color:#e6db74"&gt;&amp;#34;{{inputs.parameters.cmd}}&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="argo-rollouts"&gt;Argo Rollouts
&lt;/h2&gt;&lt;p&gt;Progressive delivery for Kubernetes. Where a standard Kubernetes &lt;code&gt;Deployment&lt;/code&gt; does a rolling update (replace pods gradually), Argo Rollouts adds canary and blue-green strategies with analysis gates. A canary rollout shifts a percentage of traffic to the new version, runs automated analysis (checking metrics from Prometheus, Datadog, or similar), and either promotes fully or rolls back based on the result. This makes deployments measurably safer — a bad release fails the analysis gate before it reaches 100% of traffic.&lt;/p&gt;
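&lt;p&gt;A canary with an analysis gate might be declared roughly as follows. This is a sketch under assumptions rather than a tested manifest: the application name, image, weights, and pause durations are illustrative, and it presumes an &lt;code&gt;AnalysisTemplate&lt;/code&gt; named &lt;code&gt;success-rate&lt;/code&gt; already exists in the namespace.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Hypothetical canary Rollout; names, image, and weights are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myorg/my-app:1.2.0
  strategy:
    canary:
      steps:
        - setWeight: 20               # send 20% to the new version
        - pause: {duration: 5m}       # let metrics accumulate
        - analysis:
            templates:
              - templateName: success-rate   # assumed AnalysisTemplate
        - setWeight: 50
        - pause: {duration: 5m}
        # after the last step the rollout promotes to 100%
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the analysis step fails, the rollout aborts and traffic returns to the stable version instead of progressing to the next weight.&lt;/p&gt;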
&lt;h2 id="argo-events"&gt;Argo Events
&lt;/h2&gt;&lt;p&gt;Event-driven automation. Argo Events defines &lt;code&gt;EventSources&lt;/code&gt; (listeners that consume external events such as git pushes, S3 uploads, Kafka messages, webhooks, and cron schedules) and &lt;code&gt;Sensors&lt;/code&gt; (which declare the events they depend on and the triggers to fire in response: creating Argo Workflows, sending notifications, or calling other systems). It is the glue that ties the rest of the Argo stack together: a git push fires an EventSource, a Sensor detects it and creates a Workflow, the Workflow builds and tests, and once the new image tag is committed to Git, ArgoCD picks up the change and rolls it out.&lt;/p&gt;
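&lt;p&gt;The pairing of an event listener and a trigger can be sketched as below. Treat it as an illustration under assumptions: the port, endpoint, service account &lt;code&gt;operate-workflow-sa&lt;/code&gt;, and the &lt;code&gt;WorkflowTemplate&lt;/code&gt; named &lt;code&gt;build-test&lt;/code&gt; are all hypothetical, and field names should be checked against the Argo Events version in use.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Hypothetical webhook listener; port and endpoint are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: webhook
spec:
  webhook:
    push:
      port: &amp;#34;12000&amp;#34;
      endpoint: /push
      method: POST
---
# Hypothetical Sensor that starts a Workflow for every push event.
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: build-on-push
spec:
  template:
    serviceAccountName: operate-workflow-sa   # assumed; needs rights to create Workflows
  dependencies:
    - name: push-dep
      eventSourceName: webhook                # matches the EventSource above
      eventName: push
  triggers:
    - template:
        name: build-workflow
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: build-test-
              spec:
                workflowTemplateRef:
                  name: build-test            # assumed WorkflowTemplate
&lt;/code&gt;&lt;/pre&gt;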
&lt;h2 id="kargo"&gt;Kargo
&lt;/h2&gt;&lt;p&gt;A newer tool from Akuity (a company founded by Argo&amp;rsquo;s creators) that solves multi-stage GitOps promotion. ArgoCD is good at keeping one environment in sync with a Git ref, but promoting a release through dev → staging → production requires updating that ref in each environment and coordinating the sequence. Kargo models this as &lt;code&gt;Stages&lt;/code&gt; that &lt;code&gt;Freight&lt;/code&gt; moves through: a release is a piece of freight that must pass through each stage in order, with optional approval gates between them. It sits above ArgoCD in the stack and handles the promotion logic that ArgoCD deliberately leaves out.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://argoproj.github.io/" target="_blank" rel="noopener"
 &gt;Argo project&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://argoproj.github.io/argo-workflows/" target="_blank" rel="noopener"
 &gt;Argo Workflows documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://argoproj.github.io/argo-rollouts/" target="_blank" rel="noopener"
 &gt;Argo Rollouts documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://argoproj.github.io/argo-events/" target="_blank" rel="noopener"
 &gt;Argo Events documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.kargo.io/" target="_blank" rel="noopener"
 &gt;Kargo documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>ArgoCD</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/</guid><description>&lt;p&gt;&lt;img alt="ArgoCD" class="gallery-image" data-flex-basis="240px" data-flex-grow="100" height="268" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/argo.png" width="268"&gt;&lt;/p&gt;
&lt;p&gt;You deploy with &lt;code&gt;kubectl apply&lt;/code&gt; from your laptop. It works. Then a colleague edits a deployment directly on the cluster to fix something urgent. Now what is running no longer matches what is in Git. That is drift, and it is silent — until something breaks in production and nobody can explain why the live state differs from the last known good config.&lt;/p&gt;
&lt;p&gt;So you use ArgoCD. Git becomes the single source of truth. Every change flows through a pull request, gets reviewed, and syncs to the cluster automatically. If anyone touches a resource directly, ArgoCD detects the divergence and reverts it. The cluster converges to Git, always.&lt;/p&gt;
&lt;p&gt;This is GitOps: the deployment pipeline is driven by Git state, not by humans running commands.&lt;/p&gt;
&lt;h2 id="ci-vs-cd"&gt;CI vs CD
&lt;/h2&gt;&lt;p&gt;A useful mental separation: CI and CD are different concerns and should be handled by different tools.&lt;/p&gt;
&lt;p&gt;&lt;img alt="CI/CD flow" class="gallery-image" data-flex-basis="426px" data-flex-grow="177" height="540" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/cicd_flow.png" srcset="https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/cicd_flow_hu_6a2bb36163cfd265.png 800w, https://backend-engineering-strategy-tools.github.io/site/public-notes/cicd/argo-cd/cicd_flow.png 960w" width="960"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CI&lt;/strong&gt; (Continuous Integration) is about code — build, test, produce an artifact (a container image). A pipeline in GitHub Actions, Tekton, or Jenkins owns this. It ends with an image pushed to a registry.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CD&lt;/strong&gt; (Continuous Delivery) is about cluster state — take that artifact and make sure the right version is running in the right environment. ArgoCD owns this. It watches Git, not the CI pipeline.&lt;/p&gt;
&lt;p&gt;Keeping them separate means your deployment logic is not buried inside a CI pipeline that developers need to understand and maintain. ArgoCD runs in the cluster and continuously reconciles state. It is always on.&lt;/p&gt;
&lt;h2 id="applications"&gt;Applications
&lt;/h2&gt;&lt;p&gt;ArgoCD manages &lt;strong&gt;Applications&lt;/strong&gt; — a CRD that maps a Git source to a cluster destination:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Application&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argocd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;project&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;source&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;https://github.com/myorg/my-app-config&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;main&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;path&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;manifests/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;destination&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;server&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;https://kubernetes.default.svc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;syncPolicy&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;automated&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;prune&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;selfHeal&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;prune: true&lt;/code&gt;: resources removed from Git are deleted from the cluster.
&lt;code&gt;selfHeal: true&lt;/code&gt;: manual changes to resources this Application manages are reverted to the state in Git.&lt;/p&gt;
&lt;h2 id="app-of-apps"&gt;App of Apps
&lt;/h2&gt;&lt;p&gt;Managing dozens of Applications individually gets unwieldy. The &lt;strong&gt;App of Apps&lt;/strong&gt; pattern solves this: one root Application whose source is a directory of other Application manifests. ArgoCD applies the root, which creates all the child Applications, which in turn sync their own workloads. One repo, one sync, everything deployed.&lt;/p&gt;
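&lt;p&gt;A minimal sketch of such a root Application, assuming the child Application manifests live under &lt;code&gt;cluster/staging/app-of-apps&lt;/code&gt; in a hypothetical &lt;code&gt;myorg/cluster-config&lt;/code&gt; repo. The name &lt;code&gt;root&lt;/code&gt;, the repo URL, and the path are illustrative assumptions, not anything ArgoCD prescribes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;apiVersion: argoproj.io/v1alpha1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kind: Application
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metadata:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  name: root # hypothetical name for the root Application
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  namespace: argocd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spec:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  project: default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  source:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    repoURL: https://github.com/myorg/cluster-config # assumed repo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    targetRevision: main
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    path: cluster/staging/app-of-apps # directory of child Application manifests
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  destination:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    server: https://kubernetes.default.svc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    namespace: argocd # the children are Application objects, created here
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  syncPolicy:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    automated:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      prune: true # deleting a child manifest from Git removes that Application
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      selfHeal: true
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The destination namespace is &lt;code&gt;argocd&lt;/code&gt; because the resources this Application creates are themselves Application objects, which ArgoCD looks for in its own namespace by default.&lt;/p&gt;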
&lt;h2 id="sync-strategies"&gt;Sync strategies
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Strategy&lt;/th&gt;
 &lt;th&gt;Behaviour&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Automated&lt;/td&gt;
 &lt;td&gt;ArgoCD applies every Git change automatically once it detects it (by polling the repo, or sooner via a webhook)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Manual&lt;/td&gt;
 &lt;td&gt;Changes are detected and shown as OutOfSync — a human triggers the sync&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Automated sync with selfHeal is the purest GitOps posture. Manual sync is useful for production environments where you want a human approval step before changes roll out.&lt;/p&gt;
&lt;h2 id="rollback"&gt;Rollback
&lt;/h2&gt;&lt;p&gt;Because every state the cluster has ever been in corresponds to a Git commit, rollback is a &lt;code&gt;git revert&lt;/code&gt;, or a rollback to an earlier revision from the application history in the ArgoCD UI. With automated sync enabled, use &lt;code&gt;git revert&lt;/code&gt;: ArgoCD does not allow a rollback while auto-sync is on. No special tooling, no runbooks, just Git history.&lt;/p&gt;
&lt;h2 id="repo-structure"&gt;Repo structure
&lt;/h2&gt;&lt;p&gt;A layout that works well in practice separates ArgoCD&amp;rsquo;s own installation from the workloads it manages:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;cluster/&amp;lt;cluster&amp;gt;/
 cfg/argo-cd/ # ArgoCD install only — CRDs and Helm values
 app-of-apps/ # Root Application, Projects, app definitions
 overlay/&amp;lt;app&amp;gt;/ # Per-cluster Kustomize patches, secret/config overrides

external/ # Reusable base manifests shared across clusters
internal/ # Internal app base manifests
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The key separations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ArgoCD install is isolated&lt;/strong&gt; in &lt;code&gt;cfg/argo-cd&lt;/code&gt; to avoid recursive install loops and make upgrades predictable. ArgoCD is not managing its own installation yet at this point — that comes later.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;App-of-Apps lives separately&lt;/strong&gt; from the install. Once ArgoCD is running, applying &lt;code&gt;app-of-apps/&lt;/code&gt; bootstraps the entire cluster in one step.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Base vs overlay&lt;/strong&gt; — &lt;code&gt;external/&lt;/code&gt; and &lt;code&gt;internal/&lt;/code&gt; define &lt;em&gt;what an app is&lt;/em&gt;. The cluster overlay defines &lt;em&gt;how it runs in this environment&lt;/em&gt;. Cluster-specific concerns (resource limits, replica counts, secret refs) stay in the cluster directory and never bleed into the base.&lt;/li&gt;
&lt;/ul&gt;
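&lt;p&gt;The base/overlay split above might look like this for one app. This is a sketch under assumptions: the app name &lt;code&gt;my-app&lt;/code&gt;, the cluster name &lt;code&gt;staging&lt;/code&gt;, and the patch file &lt;code&gt;replicas-patch.yaml&lt;/code&gt; are all hypothetical, and the base is assumed to be a Kustomize directory under &lt;code&gt;internal/&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# cluster/staging/overlay/my-app/kustomization.yaml (hypothetical file)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;apiVersion: kustomize.config.k8s.io/v1beta1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kind: Kustomization
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;namespace: my-app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;resources:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - ../../../../internal/my-app # base: what the app is
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;patches:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - path: replicas-patch.yaml # overlay: how it runs on this cluster
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    target:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      kind: Deployment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      name: my-app
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this layout, the Application for &lt;code&gt;my-app&lt;/code&gt; would point its &lt;code&gt;path&lt;/code&gt; at the overlay directory, never at the base, so the base stays free of cluster-specific values.&lt;/p&gt;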
&lt;h2 id="bootstrapping-a-cluster"&gt;Bootstrapping a cluster
&lt;/h2&gt;&lt;p&gt;There is a chicken-and-egg problem: ArgoCD manages everything, but something has to install ArgoCD first. The two-step bootstrap solves it cleanly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1 — Install ArgoCD manually (once):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm repo add argo https://argoproj.github.io/argo-helm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm repo update
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm install argocd argo/argo-cd &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  -n argocd --create-namespace &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -f cluster/staging/cfg/argo-cd/values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 2 — Apply the App-of-Apps root:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl apply -k cluster/&amp;lt;cluster&amp;gt;/app-of-apps/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;From this point ArgoCD reconciles the entire cluster. Every subsequent change goes through Git — you never run &lt;code&gt;helm install&lt;/code&gt; or &lt;code&gt;kubectl apply&lt;/code&gt; for workloads again.&lt;/p&gt;
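&lt;p&gt;Step 2 uses &lt;code&gt;kubectl apply -k&lt;/code&gt;, which expects a &lt;code&gt;kustomization.yaml&lt;/code&gt; in the target directory. A minimal sketch of that file follows; the three resource file names are hypothetical and only illustrate the idea of listing Projects and Application manifests together:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# cluster/staging/app-of-apps/kustomization.yaml (hypothetical contents)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;apiVersion: kustomize.config.k8s.io/v1beta1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kind: Kustomization
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;namespace: argocd # Applications and AppProjects live in the ArgoCD namespace
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;resources:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - projects.yaml # AppProject definitions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - root-application.yaml # the root Application itself
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  - apps/my-app.yaml # one child Application per workload
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Adding a workload to the cluster then means adding one Application manifest and one line under &lt;code&gt;resources&lt;/code&gt; in a pull request.&lt;/p&gt;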
&lt;h2 id="self-management"&gt;Self-management
&lt;/h2&gt;&lt;p&gt;The final step is making ArgoCD manage its own upgrades. Create an Application that points at &lt;code&gt;cluster/&amp;lt;cluster&amp;gt;/cfg/argo-cd&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Application&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argocd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argocd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;project&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;source&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;https://github.com/myorg/cluster-config&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;main&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;path&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;cluster/staging/cfg/argo-cd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;destination&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;server&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;https://kubernetes.default.svc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argocd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;syncPolicy&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;automated&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;prune&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt; &lt;span style="color:#75715e"&gt;# be cautious pruning ArgoCD&amp;#39;s own resources&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;selfHeal&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now ArgoCD upgrades itself when you update the Helm values in Git. No more manual &lt;code&gt;helm upgrade&lt;/code&gt; — the cluster is fully self-managing. Changes to ArgoCD config go through the same PR review process as everything else.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://argo-cd.readthedocs.io/" target="_blank" rel="noopener"
 &gt;ArgoCD documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://opengitops.dev/" target="_blank" rel="noopener"
 &gt;GitOps principles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://argo-cd.readthedocs.io/en/stable/user-guide/best_practices/" target="_blank" rel="noopener"
 &gt;ArgoCD best practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Docker &amp; OCI</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/docker/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/docker/</guid><description>&lt;p&gt;Docker packages applications and their dependencies into portable, reproducible units called containers. Unlike virtual machines, containers share the host kernel — they&amp;rsquo;re isolated processes, not emulated hardware. This makes them fast to start, light on resources, and consistent across environments: the same image runs on a developer&amp;rsquo;s laptop, in CI, and in production.&lt;/p&gt;
&lt;p&gt;Docker popularised containers, but the underlying standard is now open. The &lt;strong&gt;OCI (Open Container Initiative)&lt;/strong&gt; defines three specifications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Image spec&lt;/strong&gt; — the format of a container image: layers, config, manifest&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime spec&lt;/strong&gt; — how a container is run: namespaces, cgroups, lifecycle&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Distribution spec&lt;/strong&gt; — how images are pushed and pulled from registries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An OCI image built by any tool can run on any OCI-compliant runtime. Docker is one implementation. It is still the most natural entry point and the &lt;code&gt;docker&lt;/code&gt; CLI remains the most familiar interface, but it is worth knowing that the ecosystem is broader than Docker Inc.&lt;/p&gt;
&lt;h2 id="oci-images-and-containers"&gt;OCI images and containers
&lt;/h2&gt;&lt;p&gt;An &lt;strong&gt;image&lt;/strong&gt; is a read-only, layered filesystem snapshot built from a Dockerfile — each layer is a diff on top of the previous one. A &lt;strong&gt;container&lt;/strong&gt; is a running instance of an image — an isolated process with its own filesystem, network interface, and process space, sharing the host kernel.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker build -t myapp:1.0 . &lt;span style="color:#75715e"&gt;# build OCI image from Dockerfile&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run -p 8080:8080 myapp:1.0 &lt;span style="color:#75715e"&gt;# start container&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker ps &lt;span style="color:#75715e"&gt;# list running containers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker exec -it &amp;lt;id&amp;gt; bash &lt;span style="color:#75715e"&gt;# shell into a running container&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Images are stored in registries — Docker Hub, GitHub Container Registry, ECR, Nexus. All speak the OCI distribution spec, so images built with any tool push and pull the same way.&lt;/p&gt;
&lt;h2 id="dockerfile"&gt;Dockerfile
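&lt;/h2&gt;&lt;p&gt;The Dockerfile defines how an image is built. Instructions that change the filesystem (&lt;code&gt;RUN&lt;/code&gt;, &lt;code&gt;COPY&lt;/code&gt;, &lt;code&gt;ADD&lt;/code&gt;) each add a layer; the others only record metadata:&lt;/p&gt;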
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#e6db74"&gt;golang:1.22-alpine&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;build&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;WORKDIR&lt;/span&gt; &lt;span style="color:#e6db74"&gt;/app&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COPY&lt;/span&gt; go.mod go.sum ./&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;RUN&lt;/span&gt; go mod download&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COPY&lt;/span&gt; . .&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;RUN&lt;/span&gt; go build -o /app/server .&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#e6db74"&gt;alpine:3.19&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COPY&lt;/span&gt; --from&lt;span style="color:#f92672"&gt;=&lt;/span&gt;build /app/server /server&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXPOSE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;8080&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ENTRYPOINT&lt;/span&gt; [&lt;span style="color:#e6db74"&gt;&amp;#34;/server&amp;#34;&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Multi-stage builds&lt;/strong&gt; keep the final image lean: the first stage compiles using the full toolchain, the second copies only the binary. No compiler, no source, no build cache in the image you ship.&lt;/p&gt;
&lt;p&gt;Order matters for layer caching — put things that change rarely (dependency downloads) before things that change often (source code). A cache miss invalidates all subsequent layers.&lt;/p&gt;
&lt;h2 id="volumes-and-bind-mounts"&gt;Volumes and bind mounts
&lt;/h2&gt;&lt;p&gt;Containers have ephemeral filesystems: anything written to the container&amp;rsquo;s writable layer survives a stop and restart but is lost when the container is removed. Persist data with volumes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker volume create pgdata
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run -d -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:16
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For local development, bind mounts map a host directory into the container:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run -v &lt;span style="color:#66d9ef"&gt;$(&lt;/span&gt;pwd&lt;span style="color:#66d9ef"&gt;)&lt;/span&gt;:/app -w /app node:20 npm test
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="networking"&gt;Networking
&lt;/h2&gt;&lt;p&gt;Containers on the same Docker network can reach each other by name. Docker Compose creates a default network automatically; named networks can be created explicitly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker network create backend
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run -d --network backend --name db -e POSTGRES_PASSWORD=secret postgres:16
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --network backend myapp &lt;span style="color:#75715e"&gt;# can reach &amp;#39;db&amp;#39; by hostname&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="podman"&gt;Podman
&lt;/h2&gt;&lt;p&gt;Podman is a drop-in Docker replacement that runs without a daemon and without root. The CLI is intentionally compatible:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alias docker&lt;span style="color:#f92672"&gt;=&lt;/span&gt;podman &lt;span style="color:#75715e"&gt;# usually just works&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Rootless containers mean a process that escapes a compromised container holds only the privileges of the unprivileged user who started it, not host root. Daemonless means no long-running background service with broad system access. On RHEL and Fedora, Podman is the default. For CI environments and security-conscious setups it is often the better choice.&lt;/p&gt;
&lt;p&gt;Podman also supports &lt;strong&gt;pods&lt;/strong&gt; — groups of containers sharing a network namespace, mirroring the Kubernetes pod model. Useful for local development that needs to mirror how things will run in the cluster.&lt;/p&gt;
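&lt;p&gt;As a sketch of that pod model, the Kubernetes-style manifest below could be started locally with &lt;code&gt;podman kube play pod.yaml&lt;/code&gt;. The file name, the pod name, and the locally built &lt;code&gt;localhost/myapp:1.0&lt;/code&gt; image are assumptions for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;apiVersion: v1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kind: Pod
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metadata:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  name: myapp-dev # hypothetical pod name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spec:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  containers:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - name: app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      image: localhost/myapp:1.0 # assumed to be built locally beforehand
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      ports:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - containerPort: 8080
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          hostPort: 8080 # published on the host
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - name: db
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      image: docker.io/library/postgres:16
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      env:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - name: POSTGRES_PASSWORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          value: secret # local development only
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because both containers share one network namespace, the app reaches Postgres on &lt;code&gt;localhost:5432&lt;/code&gt;, the same way it would inside a Kubernetes pod.&lt;/p&gt;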
&lt;h2 id="buildah"&gt;Buildah
&lt;/h2&gt;&lt;p&gt;Buildah builds OCI images without a Docker daemon. It can build from a Dockerfile or construct images programmatically using shell commands — useful in CI pipelines where running a privileged Docker daemon is undesirable:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;buildah bud -t myapp:1.0 . &lt;span style="color:#75715e"&gt;# build from Dockerfile&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;buildah push myapp:1.0 registry/myapp:1.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Buildah and Podman share the same underlying storage, so images built with Buildah are immediately available to Podman.&lt;/p&gt;
&lt;h2 id="docker-compose"&gt;Docker Compose
&lt;/h2&gt;&lt;p&gt;Compose manages multi-container applications defined in &lt;code&gt;compose.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;services&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;app&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;build&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;ports&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#e6db74"&gt;&amp;#34;8080:8080&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;environment&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;DATABASE_URL&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;postgres://app:secret@db/appdb&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;depends_on&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#ae81ff"&gt;db&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;db&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;image&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;postgres:16&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;volumes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#ae81ff"&gt;pgdata:/var/lib/postgresql/data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;environment&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;POSTGRES_USER&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;app&lt;/span&gt; &lt;span style="color:#75715e"&gt;# must match the user in DATABASE_URL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;POSTGRES_PASSWORD&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;POSTGRES_DB&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;appdb&lt;/span&gt; &lt;span style="color:#75715e"&gt;# must match the database in DATABASE_URL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;volumes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;pgdata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker compose up -d &lt;span style="color:#75715e"&gt;# start in background&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker compose logs -f &lt;span style="color:#75715e"&gt;# stream logs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker compose down &lt;span style="color:#75715e"&gt;# stop and remove containers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Compose is useful for local development environments. It is a shame it exists as a separate abstraction — it taught people to think in multi-container terms without teaching them Kubernetes, and then left them with a gap to cross when they needed to go to production. That said, it is practical for what it does and is not going away.&lt;/p&gt;
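&lt;p&gt;One practical detail: the short list form of &lt;code&gt;depends_on&lt;/code&gt; only orders container start; it does not wait for the database to accept connections. A hedged sketch of the long form with a healthcheck follows, assuming the default &lt;code&gt;postgres&lt;/code&gt; user and illustrative timing values:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;services:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  app:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    build: .
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    depends_on:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      db:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        condition: service_healthy # wait for the healthcheck, not just container start
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  db:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    image: postgres:16
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    environment:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      POSTGRES_PASSWORD: secret
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    healthcheck:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      test: pg_isready -U postgres # string form runs through the container shell
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      interval: 5s # illustrative values, tune for your setup
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      timeout: 3s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      retries: 10
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This keeps the app from crash-looping on startup while Postgres is still initialising its data directory.&lt;/p&gt;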
&lt;p&gt;For production orchestration, see &lt;a class="link" href="../../kubernetes/kubernetes/" &gt;Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="skopeo"&gt;Skopeo
&lt;/h2&gt;&lt;p&gt;Skopeo works with OCI images directly — copy, inspect, and convert — without pulling them to local storage. Useful in pipelines and for auditing registries:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Inspect an image without pulling it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;skopeo inspect docker://registry.example.com/myapp:1.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Copy between registries without touching local disk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;skopeo copy docker://source-registry/myapp:1.0 docker://dest-registry/myapp:1.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Copy to a local OCI layout&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;skopeo copy docker://myapp:1.0 oci:myapp-local:1.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;skopeo inspect&lt;/code&gt; is particularly useful for checking image metadata, digest, and labels in CI before deciding whether to promote an image.&lt;/p&gt;
&lt;h2 id="oras"&gt;ORAS
&lt;/h2&gt;&lt;p&gt;ORAS (OCI Registry As Storage) pushes and pulls arbitrary artifacts to OCI registries, not just container images. Helm charts, SBOMs, attestations, Terraform modules, and binary releases can all be stored in any registry that speaks the OCI distribution spec:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Push a file as an OCI artifact&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oras push registry.example.com/myapp-sbom:1.0 sbom.json:application/spdx+json
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Pull it back&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oras pull registry.example.com/myapp-sbom:1.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This matters because it means a single registry can become the distribution mechanism for the entire software supply chain — image, SBOM, signature, attestation — all with the same access controls and audit trail.&lt;/p&gt;
&lt;h2 id="useful-practices"&gt;Useful practices
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Use specific image tags (&lt;code&gt;postgres:16.2&lt;/code&gt;, not &lt;code&gt;postgres:latest&lt;/code&gt;) — &lt;code&gt;latest&lt;/code&gt; changes under you&lt;/li&gt;
&lt;li&gt;Reference images by digest in production (&lt;code&gt;myapp@sha256:abc123&lt;/code&gt;) — tags are mutable, digests are not&lt;/li&gt;
&lt;li&gt;Run as a non-root user: &lt;code&gt;USER appuser&lt;/code&gt; in the Dockerfile&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;.dockerignore&lt;/code&gt; to exclude &lt;code&gt;.git&lt;/code&gt;, &lt;code&gt;node_modules&lt;/code&gt;, build artefacts from the build context&lt;/li&gt;
&lt;li&gt;Keep images small — large images are slow to push, pull, and scan&lt;/li&gt;
&lt;/ul&gt;
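&lt;p&gt;Two of these practices can also be enforced where the container is run rather than where it is built. A minimal Compose sketch, assuming a hypothetical registry path and an arbitrary non-root UID; the digest is left as a placeholder to fill in from your own build:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;services:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  app:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    # Pin by digest: the tag can move, the digest cannot (placeholder value)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    image: registry.example.com/myapp@sha256:&amp;lt;digest&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    # Run as a non-root UID:GID even if the image does not set USER (assumed IDs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    user: &amp;#34;10001:10001&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Setting &lt;code&gt;USER&lt;/code&gt; in the Dockerfile is still the better default, since it travels with the image; the runtime setting is a second line of defence.&lt;/p&gt;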
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://opencontainers.org/" target="_blank" rel="noopener"
 &gt;OCI specifications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.docker.com/" target="_blank" rel="noopener"
 &gt;Docker documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://podman.io/docs" target="_blank" rel="noopener"
 &gt;Podman documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://buildah.io/" target="_blank" rel="noopener"
 &gt;Buildah documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/containers/skopeo" target="_blank" rel="noopener"
 &gt;Skopeo GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://oras.land/docs/" target="_blank" rel="noopener"
 &gt;ORAS documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>etcd</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/etcd/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/etcd/</guid><description>&lt;p&gt;etcd is the distributed key-value store that backs Kubernetes. Every Kubernetes object — pods, services, deployments, configmaps, secrets — is stored in etcd. The API server is the only component that reads and writes it directly; everything else in the cluster reads from the API server&amp;rsquo;s cache. etcd&amp;rsquo;s reliability is the cluster&amp;rsquo;s reliability: if etcd loses quorum, the Kubernetes control plane stops functioning.&lt;/p&gt;
&lt;h2 id="raft-consensus"&gt;Raft consensus
&lt;/h2&gt;&lt;p&gt;etcd uses the Raft consensus algorithm. The cluster elects a leader; all writes go through the leader, which replicates them to a majority of members before acknowledging the write. A cluster of &lt;code&gt;n&lt;/code&gt; members tolerates &lt;code&gt;(n-1)/2&lt;/code&gt; failures, rounded down: a three-node cluster survives one failure, a five-node cluster survives two. An even member count adds no fault tolerance (four nodes still survive only one failure), which is why control plane node counts are normally odd. Three nodes is standard for production; five for clusters where control plane availability is critical.&lt;/p&gt;
&lt;h2 id="watches-and-revisions"&gt;Watches and revisions
&lt;/h2&gt;&lt;p&gt;Every write increments a global revision counter. Clients can watch a key or key prefix and receive every change since a given revision. The API server builds its own watch API on top of this mechanism, which is how the Kubernetes controller manager and scheduler work: they hold long-lived watch connections to the API server and react to changes in specific resource types without polling.&lt;/p&gt;
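&lt;p&gt;A toy model may make the revision and watch semantics more concrete. This is a sketch in plain shell, not the etcd API: the helpers &lt;code&gt;put&lt;/code&gt; and &lt;code&gt;watch_from&lt;/code&gt; and the example keys are hypothetical, and the history is kept in a shell variable rather than a replicated log.&lt;/p&gt;

```shell
# Toy model of etcd revisions and watches; this is not the etcd API.
revision=0   # global counter, incremented by every write
log=""       # append-only history, one "revision key value" record per line

# Record a write and bump the global revision.
put() {
  revision=$((revision + 1))
  log="${log}${revision} $1 $2
"
}

# Replay every change under a key prefix, starting at a given revision.
watch_from() {
  prefix=$1
  start=$2
  printf '%s' "$log" | while read -r rev key value; do
    if [ "$rev" -ge "$start" ]; then
      case "$key" in
        "$prefix"*) echo "rev=$rev PUT $key $value" ;;
      esac
    fi
  done
}

put /registry/pods/default/web Pending
put /registry/services/default/web ClusterIP
put /registry/pods/default/web Running

# A watcher that last saw revision 1 resumes at revision 2.
watch_from /registry/pods/ 2
```

&lt;p&gt;The watcher receives only the pod change at revision 3: the write at revision 1 predates its starting point, and the service key at revision 2 falls outside the prefix. Against a real cluster, the rough equivalent would be &lt;code&gt;etcdctl watch --prefix /registry/pods/ --rev=2&lt;/code&gt;.&lt;/p&gt;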
&lt;h2 id="operations"&gt;Operations
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Snapshot backup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcdctl snapshot save /backup/etcd-snapshot.db &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --endpoints&lt;span style="color:#f92672"&gt;=&lt;/span&gt;https://127.0.0.1:2379 &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --cacert&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/etc/kubernetes/pki/etcd/ca.crt &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --cert&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/etc/kubernetes/pki/etcd/healthcheck-client.crt &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --key&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/etc/kubernetes/pki/etcd/healthcheck-client.key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Restore from snapshot (deprecated in etcdctl since etcd 3.5; newer releases use: etcdutl snapshot restore)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcdctl snapshot restore /backup/etcd-snapshot.db --data-dir&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/var/lib/etcd-restore
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Check cluster health&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcdctl endpoint health --cluster
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Backing up etcd regularly is the most critical operational task for a Kubernetes cluster. The snapshot is the only path to full recovery if cluster state is lost.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://etcd.io/docs/" target="_blank" rel="noopener"
 &gt;etcd documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/" target="_blank" rel="noopener"
 &gt;Kubernetes etcd administration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Istio</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/istio/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/istio/</guid><description>&lt;p&gt;Istio is a service mesh for Kubernetes. It injects a sidecar proxy (Envoy) into every pod, and all traffic between pods flows through these proxies rather than directly between containers. This gives the mesh control over traffic routing, security, and observability without any changes to application code.&lt;/p&gt;
&lt;h2 id="what-it-solves"&gt;What it solves
&lt;/h2&gt;&lt;p&gt;In a large microservice deployment, every service needs to handle retries, timeouts, circuit breaking, mutual TLS, and metrics collection — or skip them and accept the risk. Without a mesh, each team implements this differently, or not at all. Istio moves these concerns out of the application and into the infrastructure layer, where they are configured once and applied uniformly.&lt;/p&gt;
&lt;h2 id="traffic-management"&gt;Traffic management
&lt;/h2&gt;&lt;p&gt;Istio&amp;rsquo;s &lt;code&gt;VirtualService&lt;/code&gt; and &lt;code&gt;DestinationRule&lt;/code&gt; CRDs give fine-grained control over how traffic is routed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;networking.istio.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;VirtualService&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;reviews&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;hosts&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#ae81ff"&gt;reviews&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;http&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;match&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;headers&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;end-user&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;exact&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;test-user&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;route&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;destination&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;host&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;reviews&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;subset&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;route&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;destination&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;host&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;reviews&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;subset&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v1&lt;/span&gt;
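&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This routes requests carrying the header &lt;code&gt;end-user: test-user&lt;/code&gt; to &lt;code&gt;v2&lt;/code&gt; of the service while everyone else gets &lt;code&gt;v1&lt;/code&gt;: canary testing without a load balancer rule or code change. The &lt;code&gt;v1&lt;/code&gt; and &lt;code&gt;v2&lt;/code&gt; subsets must be defined in a matching &lt;code&gt;DestinationRule&lt;/code&gt;.&lt;/p&gt;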
&lt;h2 id="mtls"&gt;mTLS
&lt;/h2&gt;&lt;p&gt;Istio issues and rotates certificates for every workload and enforces mutual TLS between services automatically. Services authenticate each other&amp;rsquo;s identity, not just encrypt the connection. A &lt;code&gt;PeerAuthentication&lt;/code&gt; policy can enforce strict mTLS across a namespace, ensuring no plaintext traffic is accepted.&lt;/p&gt;
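&lt;p&gt;As a minimal sketch, a namespace-wide strict policy could look like the manifest printed below. The helper &lt;code&gt;strict_mtls_manifest&lt;/code&gt; and the &lt;code&gt;payments&lt;/code&gt; namespace are placeholders, and the &lt;code&gt;security.istio.io/v1&lt;/code&gt; API version assumes a recent Istio release (older releases serve &lt;code&gt;v1beta1&lt;/code&gt;).&lt;/p&gt;

```shell
# Print a PeerAuthentication manifest that enforces strict mTLS in one namespace.
# The namespace ($1) is supplied by the caller; "payments" below is a placeholder.
strict_mtls_manifest() {
  printf '%s\n' \
    'apiVersion: security.istio.io/v1' \
    'kind: PeerAuthentication' \
    'metadata:' \
    '  name: default' \
    "  namespace: $1" \
    'spec:' \
    '  mtls:' \
    '    mode: STRICT'
}

strict_mtls_manifest payments
```

&lt;p&gt;To apply it, pipe the output to kubectl: &lt;code&gt;strict_mtls_manifest payments | kubectl apply -f -&lt;/code&gt;.&lt;/p&gt;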
&lt;h2 id="observability"&gt;Observability
&lt;/h2&gt;&lt;p&gt;Because all traffic flows through Envoy sidecars, Istio generates L7 metrics (request rate, error rate, latency percentiles), distributed traces, and access logs for every service-to-service call — without instrumentation in the services themselves. This integrates with &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/prometheus/" &gt;Prometheus&lt;/a&gt;, &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/grafana/" &gt;Grafana&lt;/a&gt;, and &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/jaeger/" &gt;Jaeger&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="cost"&gt;Cost
&lt;/h2&gt;&lt;p&gt;Istio adds latency (two extra proxy hops per call) and resource overhead (a sidecar per pod). For clusters with tens of services, the operational benefit is clear. For small clusters or teams early in a microservices journey, the complexity may outweigh the gains.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://istio.io/latest/docs/" target="_blank" rel="noopener"
 &gt;Istio documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://istio.io/latest/docs/concepts/" target="_blank" rel="noopener"
 &gt;Istio concepts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>K9s &amp; Lens</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/k9s/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/k9s/</guid><description>&lt;p&gt;You run everything with &lt;code&gt;kubectl&lt;/code&gt;. Get pods, describe, logs, exec, delete, apply — fifty times a day across five namespaces. It works, but every command is a context switch: type, wait, read, type again. &lt;code&gt;-n namespace&lt;/code&gt; on every single invocation.&lt;/p&gt;
&lt;p&gt;So you use K9s. A terminal UI that shows your entire cluster in one view. Switch namespaces and clusters in a keystroke, tail logs in real time, exec into a pod without constructing the command — everything you reach for in &lt;code&gt;kubectl&lt;/code&gt;, but without the friction.&lt;/p&gt;
&lt;h2 id="k9s"&gt;K9s
&lt;/h2&gt;&lt;p&gt;K9s is a TUI (terminal UI) for Kubernetes. It stays in your terminal, updates live, and is keyboard-driven throughout.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brew install derailed/k9s/k9s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;k9s &lt;span style="color:#75715e"&gt;# connect to current context&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;k9s --context prod &lt;span style="color:#75715e"&gt;# specific context&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;k9s -n monitoring &lt;span style="color:#75715e"&gt;# start in a specific namespace&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="navigation"&gt;Navigation
&lt;/h3&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Key&lt;/th&gt;
 &lt;th&gt;Action&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;:pod&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Jump to pods view&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;:deploy&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Deployments&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;:svc&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Services&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;:ns&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Switch namespace&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Filter/search&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;l&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Logs&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;e&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Edit resource YAML&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;d&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Describe&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;s&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Shell into pod&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;ctrl-d&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Delete&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;?&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Help / full keybinding list&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Most resource types are reachable by typing &lt;code&gt;:&lt;/code&gt; followed by the resource name — &lt;code&gt;:configmap&lt;/code&gt;, &lt;code&gt;:secret&lt;/code&gt;, &lt;code&gt;:ingress&lt;/code&gt;, &lt;code&gt;:pvc&lt;/code&gt;, and so on.&lt;/p&gt;
&lt;h3 id="why-tui-over-gui"&gt;Why TUI over GUI
&lt;/h3&gt;&lt;p&gt;K9s lives in the terminal alongside your other tools. No window switching, works over SSH, starts instantly, and the keyboard-driven workflow is faster once it is in muscle memory. For day-to-day cluster work it is the right default.&lt;/p&gt;
&lt;h2 id="lens"&gt;Lens
&lt;/h2&gt;&lt;p&gt;Lens is a desktop GUI for Kubernetes — a full IDE-style interface with a visual cluster overview, resource browsing, metrics charts, log streaming, and terminal access built in.&lt;/p&gt;
&lt;p&gt;It is the better choice when you need to onboard someone who is not yet comfortable with the terminal, or when you want a visual overview to share with a non-technical stakeholder. For engineers doing operational work all day, K9s is faster.&lt;/p&gt;
&lt;p&gt;Worth noting: Lens has moved toward a commercial model (Lens Desktop Pro). &lt;strong&gt;&lt;a class="link" href="https://github.com/MuhammedKalkan/OpenLens" target="_blank" rel="noopener"
 &gt;OpenLens&lt;/a&gt;&lt;/strong&gt; is the open-source build of the same codebase, without the account requirement.&lt;/p&gt;
&lt;h2 id="kubectx--kubens"&gt;kubectx / kubens
&lt;/h2&gt;&lt;p&gt;If K9s is more than you need and you just want to stop typing &lt;code&gt;--context&lt;/code&gt; and &lt;code&gt;-n&lt;/code&gt; on every command, &lt;code&gt;kubectx&lt;/code&gt; and &lt;code&gt;kubens&lt;/code&gt; solve exactly that:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectx &lt;span style="color:#75715e"&gt;# list contexts&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectx prod &lt;span style="color:#75715e"&gt;# switch to prod context&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectx - &lt;span style="color:#75715e"&gt;# switch back to previous context&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubens &lt;span style="color:#75715e"&gt;# list namespaces&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubens monitoring &lt;span style="color:#75715e"&gt;# switch default namespace&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;No TUI, no GUI: just fast context and namespace switching. Both tools write the change to your kubeconfig, so it persists across terminal sessions and applies to every shell that uses that file. Install alongside K9s; they complement each other.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brew install kubectx
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://k9scli.io/" target="_blank" rel="noopener"
 &gt;K9s documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/derailed/k9s" target="_blank" rel="noopener"
 &gt;K9s GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://k8slens.dev/" target="_blank" rel="noopener"
 &gt;Lens&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/MuhammedKalkan/OpenLens" target="_blank" rel="noopener"
 &gt;OpenLens&lt;/a&gt; — open-source Lens build&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ahmetb/kubectx" target="_blank" rel="noopener"
 &gt;kubectx/kubens&lt;/a&gt; — fast context and namespace switching&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Kubernetes</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/</guid><description>&lt;p&gt;Kubernetes (K8s) is the de facto standard for container orchestration and the second largest open source project after the Linux kernel. It has well and truly reached the plateau of productivity — the ecosystem is mature and it genuinely delivers.&lt;/p&gt;
&lt;p&gt;That said, the honest take: &lt;strong&gt;K8s is ridiculously hard to deploy and manage&lt;/strong&gt; (day 2 operations especially). Docker Swarm, by contrast, is ridiculously easy to get started with. For raw scale, Mesos/DC/OS wins: clusters of 80k+ nodes have been documented in the wild, versus the roughly 5k nodes per cluster that Kubernetes officially supports.&lt;/p&gt;
&lt;p&gt;So the real question is whether the ecosystem justifies the complexity for your situation. For most teams doing cloud-native work, it does.&lt;/p&gt;
&lt;h2 id="core-concepts"&gt;Core concepts
&lt;/h2&gt;&lt;p&gt;The main building blocks:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pods&lt;/strong&gt; — smallest deployable unit, wrapping one or more containers that share network and storage.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Pods" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/pods.png" width="800"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Deployments&lt;/strong&gt; — declare desired state; K8s handles rolling updates and self-healing.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Deployments" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/deployments.png" width="800"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Secrets&lt;/strong&gt; — store sensitive data (passwords, tokens, keys) separately from application config.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Secrets" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/secrets.png" width="800"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DaemonSets&lt;/strong&gt; — run a pod on every node. Typical use: log collectors, monitoring agents.&lt;/p&gt;
&lt;p&gt;&lt;img alt="DaemonSets" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/daemonsets.png" width="800"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ReplicaSets&lt;/strong&gt; — ensure N copies of a pod are running at any given time.&lt;/p&gt;
&lt;p&gt;&lt;img alt="ReplicaSets" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/replicasets.png" width="800"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ingress&lt;/strong&gt; — HTTP/S routing rules at layer 7. Your load balancer config, declarative.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Ingress" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="1920" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/ingress.png" srcset="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/ingress_hu_b0bb8a58e86a1fe2.png 800w, https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/ingress_hu_34af71a3280d9a15.png 1600w, https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/ingress.png 2400w" width="2400"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CronJobs&lt;/strong&gt; — scheduled jobs, K8s-native.&lt;/p&gt;
&lt;p&gt;&lt;img alt="CronJobs" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/cronjobs.png" width="800"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Custom Resource Definitions (CRDs)&lt;/strong&gt; — extend the K8s API with your own resource types. The foundation of most K8s operators.&lt;/p&gt;
&lt;p&gt;&lt;img alt="CRDs" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/crd.png" width="800"&gt;&lt;/p&gt;
&lt;h2 id="architecture"&gt;Architecture
&lt;/h2&gt;&lt;p&gt;How the pieces fit together internally:&lt;/p&gt;
&lt;p&gt;&lt;img alt="K8s internal architecture" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="1280" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/inner.png" srcset="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/inner_hu_ca8f0543462c084c.png 800w, https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/inner.png 1600w" width="1600"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="K8s component overview" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/zoo.png" width="800"&gt;&lt;/p&gt;
&lt;h2 id="containers-vs-virtual-machines"&gt;Containers vs virtual machines
&lt;/h2&gt;&lt;p&gt;Not an either/or — they solve different problems and are frequently combined.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Separate: containers alongside VMs" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="1280" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/seperate.png" srcset="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/seperate_hu_557182bcbb415ab4.png 800w, https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/seperate.png 1600w" width="1600"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Combined: containers running on top of VMs" class="gallery-image" data-flex-basis="300px" data-flex-grow="125" height="1280" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/combined.png" srcset="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/combined_hu_9f383172dd98181c.png 800w, https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubernetes/combined.png 1600w" width="1600"&gt;&lt;/p&gt;
&lt;h2 id="local-clusters-for-development"&gt;Local clusters for development
&lt;/h2&gt;&lt;p&gt;When you need K8s without a full cluster:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Tool&lt;/th&gt;
 &lt;th&gt;Best for&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://microk8s.io/" target="_blank" rel="noopener"
 &gt;MicroK8s&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Ubuntu, snap-based, batteries included&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://minikube.sigs.k8s.io/" target="_blank" rel="noopener"
 &gt;Minikube&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;The classic, broad driver support&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://kind.sigs.k8s.io/" target="_blank" rel="noopener"
 &gt;Kind&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;K8s in Docker, great for CI pipelines&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://k3d.io/" target="_blank" rel="noopener"
 &gt;k3d&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;K3s in Docker, fast startup&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;a class="link" href="https://k3s.io/" target="_blank" rel="noopener"
 &gt;K3s&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Lightweight K8s, edge and IoT use cases&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://kubernetes.io/" target="_blank" rel="noopener"
 &gt;kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://landscape.cncf.io/" target="_blank" rel="noopener"
 &gt;CNCF Landscape&lt;/a&gt; — map of the cloud-native ecosystem&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.youtube.com/watch?v=PH-2FfFD2PU" target="_blank" rel="noopener"
 &gt;TGI Kubernetes intro (YouTube)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://igy.cx/posts/setup-microk8s-rbac-storage/" target="_blank" rel="noopener"
 &gt;Setting up MicroK8s with RBAC and Storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Kubernetes Autoscaling</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/k8s-autoscaling/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/k8s-autoscaling/</guid><description>&lt;p&gt;Kubernetes has built-in autoscaling at two levels: the Horizontal Pod Autoscaler scales the number of pod replicas based on CPU or memory, and the Cluster Autoscaler adds or removes nodes when pods can&amp;rsquo;t be scheduled. KEDA and Karpenter extend these primitives — KEDA pushing workload scaling further, Karpenter replacing the node provisioner entirely.&lt;/p&gt;
&lt;h2 id="keda"&gt;KEDA
&lt;/h2&gt;&lt;p&gt;Kubernetes Event-Driven Autoscaling. KEDA extends the HPA to scale workloads based on external event sources: Kafka consumer lag, queue depth in SQS or RabbitMQ, HTTP request rate, database query results, cron schedules. Out of the box the HPA scales on CPU and memory; anything else requires a custom or external metrics adapter. KEDA provides that adapter along with a long list of scalers for external systems. The important capability it adds is scale-to-zero: a consumer that has no messages to process can scale down to zero pods and scale back up when work arrives. This makes it well-suited for event-driven workloads and batch processing where idle replicas waste resources.&lt;/p&gt;
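&lt;p&gt;A hedged sketch of what scale-to-zero on Kafka lag might look like as a &lt;code&gt;ScaledObject&lt;/code&gt;. The helper name, the Deployment name, the broker address, the topic, the consumer group, and the thresholds are all placeholders chosen for illustration.&lt;/p&gt;

```shell
# Print a KEDA ScaledObject that scales a Deployment ($1) on Kafka consumer lag.
# Broker address, topic, consumer group and thresholds are illustrative values.
keda_kafka_manifest() {
  printf '%s\n' \
    'apiVersion: keda.sh/v1alpha1' \
    'kind: ScaledObject' \
    'metadata:' \
    "  name: ${1}-scaler" \
    'spec:' \
    '  scaleTargetRef:' \
    "    name: $1" \
    '  minReplicaCount: 0' \
    '  maxReplicaCount: 20' \
    '  triggers:' \
    '  - type: kafka' \
    '    metadata:' \
    '      bootstrapServers: kafka.example.svc:9092' \
    '      consumerGroup: orders-consumer' \
    '      topic: orders' \
    '      lagThreshold: "50"'
}

keda_kafka_manifest orders-worker
```

&lt;p&gt;With &lt;code&gt;minReplicaCount: 0&lt;/code&gt;, KEDA removes all replicas while the consumer group has no lag and activates the Deployment again when messages arrive. Pipe the output to &lt;code&gt;kubectl apply -f -&lt;/code&gt; to try it in a cluster that has KEDA installed.&lt;/p&gt;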
&lt;h2 id="karpenter"&gt;Karpenter
&lt;/h2&gt;&lt;p&gt;A node provisioner that replaces the Cluster Autoscaler, originally from AWS and now maintained under Kubernetes SIG Autoscaling, with providers for other clouds. Where the Cluster Autoscaler works by resizing pre-defined node groups (Auto Scaling Groups on AWS), Karpenter provisions EC2 instances (or the equivalent) directly based on the actual resource requirements of pending pods, choosing the instance type, size, and purchase option (on-demand vs spot) in real time. This typically makes provisioning faster and more cost-efficient: the cluster gets the nodes the pending workload needs, not the nearest pre-configured node group. Karpenter also handles consolidation: it continuously evaluates whether running workloads could be packed onto fewer nodes and replaces over-provisioned nodes accordingly.&lt;/p&gt;
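&lt;p&gt;As a rough illustration, the constraints Karpenter chooses within are declared in a &lt;code&gt;NodePool&lt;/code&gt;. The sketch below assumes the AWS provider and the &lt;code&gt;karpenter.sh/v1&lt;/code&gt; API; the helper name, the &lt;code&gt;EC2NodeClass&lt;/code&gt; called &lt;code&gt;default&lt;/code&gt;, and the CPU limit are hypothetical values.&lt;/p&gt;

```shell
# Print a Karpenter NodePool allowing spot and on-demand capacity with consolidation.
# Assumes the AWS provider; the EC2NodeClass "default" and the limits are placeholders.
karpenter_nodepool_manifest() {
  printf '%s\n' \
    'apiVersion: karpenter.sh/v1' \
    'kind: NodePool' \
    'metadata:' \
    "  name: $1" \
    'spec:' \
    '  template:' \
    '    spec:' \
    '      nodeClassRef:' \
    '        group: karpenter.k8s.aws' \
    '        kind: EC2NodeClass' \
    '        name: default' \
    '      requirements:' \
    '      - key: karpenter.sh/capacity-type' \
    '        operator: In' \
    '        values: ["spot", "on-demand"]' \
    '  limits:' \
    '    cpu: "100"' \
    '  disruption:' \
    '    consolidationPolicy: WhenEmptyOrUnderutilized' \
    '    consolidateAfter: 1m'
}

karpenter_nodepool_manifest general
```

&lt;p&gt;The &lt;code&gt;requirements&lt;/code&gt; list bounds which instances Karpenter may launch, and the &lt;code&gt;disruption&lt;/code&gt; block turns on the consolidation behaviour described above.&lt;/p&gt;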
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://keda.sh/docs/" target="_blank" rel="noopener"
 &gt;KEDA documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://keda.sh/docs/scalers/" target="_blank" rel="noopener"
 &gt;KEDA scalers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://karpenter.sh/docs/" target="_blank" rel="noopener"
 &gt;Karpenter documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://karpenter.sh/docs/concepts/" target="_blank" rel="noopener"
 &gt;Karpenter concepts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>KubeVirt</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubevirt/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kubevirt/</guid><description>&lt;p&gt;See &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/virtualization/" &gt;Virtualization — KVM and KubeVirt&lt;/a&gt; for full coverage of both KVM and KubeVirt.&lt;/p&gt;</description></item><item><title>Kyverno</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kyverno/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/kyverno/</guid><description>&lt;p&gt;Kyverno is a policy engine for Kubernetes. It runs as an admission controller and intercepts every resource creation or update, applying rules that validate, mutate, or generate resources. Policies are written as Kubernetes CRDs in YAML — no Rego, no separate language to learn. If you can write a Kubernetes manifest, you can write a Kyverno policy.&lt;/p&gt;
&lt;h2 id="three-rule-types"&gt;Three rule types
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Validate&lt;/strong&gt; — reject resources that don&amp;rsquo;t meet requirements:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;kyverno.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ClusterPolicy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;require-labels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;rules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;check-team-label&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;match&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;any&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kinds&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;Deployment]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;validate&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;message&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;Deployments must have a &amp;#39;team&amp;#39; label.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;pattern&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;labels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;team&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;?*&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Mutate&lt;/strong&gt; — automatically add or modify fields on admission:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;add-default-resources&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;match&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;any&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kinds&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;Pod]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;mutate&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;patchStrategicMerge&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;containers&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;(name)&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;*&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;+(memory)&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;64Mi&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;+(cpu)&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;250m&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Generate&lt;/strong&gt; — create related resources automatically. A common use: generate a &lt;code&gt;NetworkPolicy&lt;/code&gt; every time a new namespace is created.&lt;/p&gt;
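&lt;p&gt;A minimal sketch of such a generate rule, assuming a default-deny ingress policy is the baseline you want in every new namespace. The policy and rule names are illustrative, and depending on the Kyverno version its service account may need extra RBAC to create &lt;code&gt;NetworkPolicy&lt;/code&gt; objects:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-networkpolicy   # hypothetical name
spec:
  rules:
  - name: default-deny-ingress
    match:
      any:
      - resources:
          kinds: [Namespace]
    generate:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: default-deny-ingress
      # Create the policy inside the namespace that triggered the rule
      namespace: &amp;#34;{{request.object.metadata.name}}&amp;#34;
      # Keep the generated resource aligned with this definition
      synchronize: true
      data:
        spec:
          podSelector: {}        # selects every pod in the namespace
          policyTypes:
          - Ingress
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With &lt;code&gt;synchronize: true&lt;/code&gt;, Kyverno should restore the generated policy if someone deletes or edits it by hand.&lt;/p&gt;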
&lt;h2 id="enforcement-vs-audit"&gt;Enforcement vs audit
&lt;/h2&gt;&lt;p&gt;Policies run in &lt;code&gt;enforce&lt;/code&gt; mode (block non-compliant resources) or &lt;code&gt;audit&lt;/code&gt; mode (allow but report violations). Audit mode is the right starting point — understand your existing state before enforcing.&lt;/p&gt;
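&lt;p&gt;As a sketch, the mode is a field on the policy. The fragment below uses the policy-level &lt;code&gt;validationFailureAction&lt;/code&gt; field; newer Kyverno releases move this to a per-rule &lt;code&gt;failureAction&lt;/code&gt; under &lt;code&gt;validate&lt;/code&gt;, so check the documentation for the version you run:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;spec:
  # Audit: admit the resource and record the violation in a policy report.
  # Enforce: reject the resource at admission.
  validationFailureAction: Audit
  # Also evaluate resources that already exist in the cluster.
  background: true
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Once the policy reports are clean, switching the value to &lt;code&gt;Enforce&lt;/code&gt; starts blocking new violations.&lt;/p&gt;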
&lt;h2 id="common-policies"&gt;Common policies
&lt;/h2&gt;&lt;p&gt;The &lt;a class="link" href="https://kyverno.io/policies/" target="_blank" rel="noopener"
 &gt;Kyverno policy library&lt;/a&gt; has ready-made policies for common requirements: disallow privileged containers, disallow the &lt;code&gt;latest&lt;/code&gt; image tag, require resource limits, restrict &lt;code&gt;hostPath&lt;/code&gt; mounts. Most teams start from the library and customise.&lt;/p&gt;
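&lt;p&gt;For a sense of what a library-style policy looks like, here is a rule fragment in the same shape as the examples above that rejects an explicit &lt;code&gt;:latest&lt;/code&gt; tag. It is a simplified sketch, not a copy of the library policy, and it does not catch images with no tag at all, which also resolve to &lt;code&gt;latest&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;- name: disallow-latest-tag
  match:
    any:
    - resources:
        kinds: [Pod]
  validate:
    message: &amp;#34;Images must not use the &amp;#39;latest&amp;#39; tag.&amp;#34;
    pattern:
      spec:
        containers:
        # The leading ! negates the wildcard match
        - image: &amp;#34;!*:latest&amp;#34;
&lt;/code&gt;&lt;/pre&gt;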
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://kyverno.io/docs/" target="_blank" rel="noopener"
 &gt;Kyverno documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kyverno.io/policies/" target="_blank" rel="noopener"
 &gt;Kyverno policy library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Local Kubernetes</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/local-kubernetes/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/local-kubernetes/</guid><description>&lt;p&gt;Running Kubernetes locally is useful for development, testing, and CI — a real cluster without the cloud bill. The options differ mainly in weight, startup speed, and whether they target local dev, CI pipelines, or lightweight production use.&lt;/p&gt;
&lt;h2 id="minikube"&gt;MiniKube
&lt;/h2&gt;&lt;p&gt;The original local Kubernetes, maintained by the Kubernetes project itself. Runs a single-node cluster by default, inside a VM (VirtualBox, HyperKit) or a Docker container. It runs upstream Kubernetes, so most things that work on a real cluster also work in MiniKube. With a VM driver it is slower to start and heavier on resources than the container-based options, but closer to a machine-backed cluster. Good for getting started and for testing things that need VM-level isolation.&lt;/p&gt;
&lt;h2 id="kind"&gt;Kind
&lt;/h2&gt;&lt;p&gt;Kubernetes IN Docker — each cluster node runs as a Docker container, no VM required. Fast startup (seconds), low overhead, and multi-node clusters are easy to spin up. The standard choice for running Kubernetes in CI pipelines: create a cluster, run tests, tear it down. The Kubernetes project itself uses Kind for conformance testing. Not designed for running workloads long-term, but excellent for ephemeral test environments.&lt;/p&gt;
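&lt;p&gt;A multi-node cluster is described in a small config file. A minimal sketch, assuming it is saved as &lt;code&gt;kind-config.yaml&lt;/code&gt; (a hypothetical filename) and passed to &lt;code&gt;kind create cluster --config kind-config.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
# Each entry becomes one Docker container acting as a node
- role: control-plane
- role: worker
- role: worker
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In CI, the matching teardown is &lt;code&gt;kind delete cluster&lt;/code&gt; once the tests have run.&lt;/p&gt;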
&lt;h2 id="k3s"&gt;K3S
&lt;/h2&gt;&lt;p&gt;Lightweight Kubernetes from Rancher (now SUSE), packaged as a single binary under 100MB. It strips out cloud-provider integrations, in-tree storage drivers, and alpha features — the result is a fully conformant Kubernetes that runs on hardware where full K8s won&amp;rsquo;t. Used in production for edge deployments, IoT, and resource-constrained environments. Also a good choice when you want a real persistent cluster locally without the overhead of MiniKube.&lt;/p&gt;
&lt;h2 id="k3d"&gt;K3D
&lt;/h2&gt;&lt;p&gt;K3S running inside Docker containers — the same relationship Kind has to standard Kubernetes. Fast, lightweight, multi-node clusters in Docker. The advantage over Kind is that K3S starts faster and uses less memory per node. Good choice for local dev and CI when you want the lightweight K3S runtime rather than full upstream Kubernetes.&lt;/p&gt;
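&lt;p&gt;K3D accepts a similar declarative config. A minimal sketch, assuming a recent K3D v5 release that understands the &lt;code&gt;v1alpha5&lt;/code&gt; schema; the cluster name and filename are illustrative:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# k3d-config.yaml (hypothetical filename)
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: dev        # hypothetical cluster name
servers: 1         # K3S server (control-plane) nodes
agents: 2          # K3S agent (worker) nodes
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Create the cluster with &lt;code&gt;k3d cluster create --config k3d-config.yaml&lt;/code&gt;.&lt;/p&gt;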
&lt;h2 id="microk8s"&gt;MicroK8S
&lt;/h2&gt;&lt;p&gt;Canonical&amp;rsquo;s take on local Kubernetes, distributed as a snap package on Ubuntu. Single-command install, add-ons (DNS, storage, ingress, observability) enabled with &lt;code&gt;microk8s enable &amp;lt;addon&amp;gt;&lt;/code&gt;. Opinionated and tightly integrated with the Ubuntu/Canonical ecosystem. The right choice if you&amp;rsquo;re on Ubuntu and want a low-friction local cluster with batteries included — less so outside that ecosystem.&lt;/p&gt;
&lt;h2 id="which-to-use"&gt;Which to use
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Tool&lt;/th&gt;
 &lt;th&gt;Best for&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;MiniKube&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Getting started, testing with VM isolation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Kind&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;CI pipelines, ephemeral test clusters&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;K3S&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Persistent local cluster, edge/IoT production&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;K3D&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Fast local dev and CI with K3S runtime&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;MicroK8S&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Ubuntu users wanting a batteries-included local cluster&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://minikube.sigs.k8s.io/docs/" target="_blank" rel="noopener"
 &gt;MiniKube documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kind.sigs.k8s.io/" target="_blank" rel="noopener"
 &gt;Kind documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.k3s.io/" target="_blank" rel="noopener"
 &gt;K3S documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://k3d.io/" target="_blank" rel="noopener"
 &gt;K3D documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://microk8s.io/docs" target="_blank" rel="noopener"
 &gt;MicroK8S documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Loki</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/loki/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/loki/</guid><description>&lt;p&gt;&lt;a class="link" href="../prometheus/" &gt;Prometheus&lt;/a&gt; tells you &lt;em&gt;that&lt;/em&gt; something is wrong and &lt;em&gt;when&lt;/em&gt; it started. Loki tells you &lt;em&gt;what&lt;/em&gt; happened — it is the log aggregation layer of the observability stack. Logs from every pod across every node are collected, indexed, and made searchable in one place. Grafana is the front end for both.&lt;/p&gt;
&lt;h2 id="how-it-works"&gt;How it works
&lt;/h2&gt;&lt;p&gt;Loki stores logs as compressed chunks, indexed only by labels (not by content). This makes it cheap to store and fast to query by label — namespace, pod name, app — but slower for full-text search than something like Elasticsearch. The trade-off is intentional: label-scoped queries cover the vast majority of real operational use, and the storage cost is dramatically lower.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Promtail&lt;/strong&gt; runs as a DaemonSet on every node, tails log files from &lt;code&gt;/var/log/pods/&lt;/code&gt;, attaches Kubernetes labels, and ships to Loki. Grafana queries Loki directly.&lt;/p&gt;
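&lt;p&gt;In the Promtail Helm chart, the shipping target is set through the chart values. A minimal sketch, assuming Loki was installed with the release name &lt;code&gt;loki&lt;/code&gt; in the &lt;code&gt;monitoring&lt;/code&gt; namespace with its gateway enabled and multi-tenancy disabled; the service hostname is an assumption to adjust for your install:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# promtail-values.yaml
config:
  clients:
    # Push endpoint of the Loki gateway service (assumed name and namespace)
    - url: http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push
&lt;/code&gt;&lt;/pre&gt;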
&lt;h2 id="deployment-modes"&gt;Deployment modes
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;SingleBinary&lt;/strong&gt;: ingestion, querying, and management all run in a single instance. Simple to deploy, minimal operational overhead. It is also a single point of failure: if it goes down, ingestion stops, and logs can be lost once the collectors exhaust their retries. The right starting point for most clusters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SimpleScalable&lt;/strong&gt;: responsibilities are split into separate read, write, and backend pods, each running a minimum of two instances for HA. Ingestion (write), querying (read), and background components such as the compactor (backend) can be scaled independently. Significantly more operational overhead, but fault-tolerant and tunable under load. The right move for production once you have volume and reliability requirements.&lt;/p&gt;
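&lt;p&gt;In the &lt;code&gt;loki&lt;/code&gt; Helm chart the mode is selected in the values file. A minimal SingleBinary sketch, assuming chart version 6.x, filesystem storage, and a single tenant; the schema start date is a placeholder:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# loki-values.yaml
deploymentMode: SingleBinary
loki:
  auth_enabled: false          # single tenant, no X-Scope-OrgID header needed
  commonConfig:
    replication_factor: 1      # only one instance holds the data
  storage:
    type: filesystem           # local volume, fine for test clusters
  schemaConfig:
    configs:
    - from: &amp;#34;2024-01-01&amp;#34;     # placeholder start date
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
singleBinary:
  replicas: 1
# Scale the SimpleScalable targets down to zero
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Moving to SimpleScalable later means changing &lt;code&gt;deploymentMode&lt;/code&gt;, giving the read, write, and backend targets replicas, and switching to object storage.&lt;/p&gt;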
&lt;h2 id="getting-started"&gt;Getting started
&lt;/h2&gt;&lt;p&gt;The fastest path to a working stack is deploying Loki alongside &lt;code&gt;kube-prometheus-stack&lt;/code&gt;, which brings up Prometheus, Grafana, and Alertmanager together. See the &lt;a class="link" href="../prometheus/" &gt;Prometheus&lt;/a&gt; note for the kube-prometheus-stack setup and the ArgoCD CRD workaround.&lt;/p&gt;
&lt;p&gt;Loki and Promtail are installed as a separate ArgoCD Application, using multiple Helm sources with values pulled from the cluster config repo:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Application&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;log-ingestion&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argo-cd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;project&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;sources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;# Loki&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;https://grafana.github.io/helm-charts&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;chart&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;loki&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;6.55.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;helm&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;releaseName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;loki&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;valueFiles&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#ae81ff"&gt;$values/cluster/testing/overlay/monitoring/helm/loki-values.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;# Promtail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;https://grafana.github.io/helm-charts&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;chart&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;promtail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;6.17.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;helm&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;releaseName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;promtail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;valueFiles&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#ae81ff"&gt;$values/cluster/testing/overlay/monitoring/helm/promtail-values.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;# Values source — cluster config repo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#39;git@github.com:example-org/cluster-config.git&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;HEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;ref&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;destination&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;server&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;https://kubernetes.default.svc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;monitoring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;syncPolicy&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;automated&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;selfHeal&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;prune&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;syncOptions&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#ae81ff"&gt;CreateNamespace=true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#ae81ff"&gt;ServerSideApply=true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note: &lt;code&gt;targetRevision: HEAD&lt;/code&gt; on the values repo is fine for testing environments. Pin it to a tag or commit for staging and production.&lt;/p&gt;
&lt;h2 id="promtail-deprecation"&gt;Promtail deprecation
&lt;/h2&gt;&lt;p&gt;Promtail is deprecated. It entered long-term support (LTS) in February 2025, which means security and critical bug fixes only, no new features. Grafana&amp;rsquo;s deprecation notice puts the expected EOL at 2 March 2026.&lt;/p&gt;
&lt;p&gt;The Grafana-recommended replacement is &lt;strong&gt;&lt;a class="link" href="https://grafana.com/docs/alloy/latest/" target="_blank" rel="noopener"
 &gt;Grafana Alloy&lt;/a&gt;&lt;/strong&gt;, a more capable collector that handles metrics, logs, and traces in a single agent. The migration path is not yet settled enough for a confident recommendation — worth waiting for clear community consensus before moving. Until then, Promtail continues to work and the LTS window gives time to plan.&lt;/p&gt;
&lt;h2 id="grafana-integration"&gt;Grafana integration
&lt;/h2&gt;&lt;p&gt;Add Loki as a data source in Grafana and logs become queryable alongside metrics. A useful starting point is a simple app-oriented logs dashboard — filter by namespace and pod, tail in near-real-time, correlate timestamps with Prometheus spikes.&lt;/p&gt;
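&lt;p&gt;The data source can also be declared as a Grafana provisioning file rather than added by hand. A minimal sketch, assuming the Loki gateway service is reachable at the hostname below and multi-tenancy is disabled (otherwise an &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header is needed):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy      # Grafana's backend makes the requests
    # Assumed service name and namespace, adjust to your install
    url: http://loki-gateway.monitoring.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;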
&lt;p&gt;LogQL, Loki&amp;rsquo;s query language, mirrors PromQL in style:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-logql" data-lang="logql"&gt;# All error logs from a namespace
{namespace=&amp;#34;production&amp;#34;} |= &amp;#34;error&amp;#34;

# Parse and filter structured logs
{app=&amp;#34;my-api&amp;#34;} | json | status &amp;gt;= 500

# Rate of error log lines over time
rate({namespace=&amp;#34;production&amp;#34;} |= &amp;#34;error&amp;#34; [5m])
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://grafana.com/docs/loki/latest/" target="_blank" rel="noopener"
 &gt;Loki documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://grafana.com/docs/alloy/latest/" target="_blank" rel="noopener"
 &gt;Grafana Alloy documentation&lt;/a&gt; — future Promtail replacement&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/grafana/helm-charts/tree/main/charts/loki-stack" target="_blank" rel="noopener"
 &gt;loki-stack Helm chart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack" target="_blank" rel="noopener"
 &gt;kube-prometheus-stack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Managing Secrets in Kubernetes</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/k8s-secrets/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/k8s-secrets/</guid><description>&lt;p&gt;Kubernetes has a built-in &lt;code&gt;Secret&lt;/code&gt; resource, but it is not a secrets management solution — it is base64-encoded storage with no encryption at rest by default and no access audit trail. How you actually manage secrets in a Kubernetes cluster depends on how far you need to go beyond the default.&lt;/p&gt;
&lt;h2 id="native-kubernetes-secrets"&gt;Native Kubernetes Secrets
&lt;/h2&gt;&lt;p&gt;The baseline. A &lt;code&gt;Secret&lt;/code&gt; is a key-value object exposed to pods as environment variables or mounted as files:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;db-credentials&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;type&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Opaque&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;data&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;username&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;YWRtaW4= &lt;/span&gt; &lt;span style="color:#75715e"&gt;# base64(&amp;#34;admin&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;password&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;cGFzc3dvcmQ=&lt;/span&gt; &lt;span style="color:#75715e"&gt;# base64(&amp;#34;password&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The problems: base64 is encoding, not encryption. Secrets are stored in etcd, and enabling encryption at rest for them is a cluster configuration step that is easy to skip. Any user or service account with RBAC permission to &lt;code&gt;get&lt;/code&gt; Secrets in a namespace can read their values, for example with &lt;code&gt;kubectl get secret -o yaml&lt;/code&gt;. For anything beyond a local dev cluster or a low-sensitivity workload, you need something more.&lt;/p&gt;
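&lt;p&gt;On a self-managed control plane, encryption at rest is switched on by handing the API server an &lt;code&gt;EncryptionConfiguration&lt;/code&gt; file through &lt;code&gt;--encryption-provider-config&lt;/code&gt;. A minimal sketch with a locally held AES key; the key name and value are placeholders, and managed Kubernetes services usually expose this as a provider setting instead:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider is used to encrypt new writes
      - aescbc:
          keys:
            - name: key1
              secret: REPLACE_WITH_BASE64_ENCODED_32_BYTE_KEY
      # Fallback so existing, unencrypted Secrets stay readable
      - identity: {}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Secrets written before the change stay unencrypted until they are rewritten.&lt;/p&gt;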
&lt;h2 id="sealed-secrets"&gt;Sealed Secrets
&lt;/h2&gt;&lt;p&gt;A Kubernetes controller from Bitnami. &lt;code&gt;SealedSecret&lt;/code&gt; resources contain secrets encrypted with the public key of the controller running in the cluster, and only that controller holds the private key needed to decrypt them. The encrypted form is safe to commit to Git, which makes GitOps workflows possible without a separate secrets store. Simple to operate, no external dependency. The trade-off: secrets are tied to a specific cluster&amp;rsquo;s key, cross-cluster sharing requires re-encryption, and there is no centralised audit trail.&lt;/p&gt;
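&lt;p&gt;For orientation, this is roughly what ends up in Git. A sketch of a &lt;code&gt;SealedSecret&lt;/code&gt; as the &lt;code&gt;kubeseal&lt;/code&gt; CLI would produce it; the namespace is an assumption and the ciphertext is a placeholder, not real output:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: default           # assumed namespace
spec:
  encryptedData:
    # Placeholder: paste the ciphertext emitted by kubeseal here
    password: REPLACE_WITH_KUBESEAL_OUTPUT
  template:
    # Metadata for the Secret the controller creates after decryption
    metadata:
      name: db-credentials
      namespace: default
&lt;/code&gt;&lt;/pre&gt;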
&lt;h2 id="external-secrets-operator"&gt;External Secrets Operator
&lt;/h2&gt;&lt;p&gt;ESO reads secrets from an external store (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault, 1Password) and syncs them into native Kubernetes Secrets. Your source of truth stays in the external system; the K8s Secret is a read-only projection of it, refreshed on a configurable interval:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;external-secrets.io/v1beta1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ExternalSecret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;db-credentials&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;refreshInterval&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1h&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;secretStoreRef&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;aws-secrets-manager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ClusterSecretStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;target&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;db-credentials&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;data&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;secretKey&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;password&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;remoteRef&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;key&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;prod/db/password&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;ESO is the right choice when you already have a secrets store and want Kubernetes workloads to consume from it without changing how secrets are managed elsewhere.&lt;/p&gt;
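&lt;p&gt;The &lt;code&gt;ExternalSecret&lt;/code&gt; above refers to a &lt;code&gt;ClusterSecretStore&lt;/code&gt; named &lt;code&gt;aws-secrets-manager&lt;/code&gt;. A minimal sketch of what that store could look like, assuming AWS Secrets Manager with IAM roles for service accounts; the region and service account names are hypothetical, and the &lt;code&gt;apiVersion&lt;/code&gt; should match whatever your installed ESO release serves:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1                  # hypothetical region
      auth:
        jwt:
          # Service account whose IAM role may read the secrets
          serviceAccountRef:
            name: external-secrets       # hypothetical name
            namespace: external-secrets  # hypothetical namespace
&lt;/code&gt;&lt;/pre&gt;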
&lt;h2 id="secrets-store-csi-driver"&gt;Secrets Store CSI Driver
&lt;/h2&gt;&lt;p&gt;An alternative to ESO for the same problem: mount secrets from an external store directly as files in a pod, without creating a Kubernetes Secret at all. The secret materialises only in the pod&amp;rsquo;s filesystem, is not stored in etcd, and disappears when the pod terminates. Supported by AWS, Azure, GCP, and Vault providers. Used in combination with a &lt;code&gt;SecretProviderClass&lt;/code&gt; to define what to fetch and where to mount it.&lt;/p&gt;
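&lt;p&gt;A minimal sketch of the two pieces involved, assuming the AWS provider is installed alongside the driver. Object names, the image, and the mount path are illustrative:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: db-credentials
spec:
  provider: aws
  parameters:
    # What to fetch from the external store
    objects: |
      - objectName: &amp;#34;prod/db/password&amp;#34;
        objectType: &amp;#34;secretsmanager&amp;#34;
---
apiVersion: v1
kind: Pod
metadata:
  name: my-api                            # hypothetical workload
spec:
  serviceAccountName: my-api              # must be allowed to read the secret
  containers:
  - name: my-api
    image: example.org/my-api:1.0.0       # placeholder image
    volumeMounts:
    - name: secrets
      mountPath: /mnt/secrets             # secret appears as a file here
      readOnly: true
  volumes:
  - name: secrets
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: db-credentials
&lt;/code&gt;&lt;/pre&gt;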
&lt;h2 id="hashicorp-vault"&gt;HashiCorp Vault
&lt;/h2&gt;&lt;p&gt;A dedicated secrets management platform. Vault stores arbitrary secrets, issues dynamic credentials (database passwords that expire, AWS IAM credentials valid for an hour), manages PKI, and provides a full audit log of every read and write. Kubernetes workloads authenticate to Vault via the Kubernetes auth method (using the pod&amp;rsquo;s service account token) and receive a Vault token scoped to the secrets their service account is allowed to read. More to operate than the other options, but the right answer for organisations that need dynamic credentials, fine-grained access control, and audit logs.&lt;/p&gt;
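&lt;p&gt;One common way to wire a workload to Vault is the Vault Agent Injector, which reacts to pod annotations. The sketch below assumes the injector is installed, and that a Kubernetes auth role named &lt;code&gt;my-api&lt;/code&gt; and a database secrets engine at &lt;code&gt;database/creds/my-api&lt;/code&gt; already exist in Vault; all of those names are hypothetical:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
      annotations:
        # Ask the injector to add a Vault Agent sidecar
        vault.hashicorp.com/agent-inject: &amp;#34;true&amp;#34;
        # Vault role bound to this pod&amp;#39;s service account
        vault.hashicorp.com/role: &amp;#34;my-api&amp;#34;
        # Render dynamic DB credentials to /vault/secrets/db-creds
        vault.hashicorp.com/agent-inject-secret-db-creds: &amp;#34;database/creds/my-api&amp;#34;
    spec:
      serviceAccountName: my-api
      containers:
      - name: my-api
        image: example.org/my-api:1.0.0   # placeholder image
&lt;/code&gt;&lt;/pre&gt;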
&lt;h2 id="summary"&gt;Summary
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Approach&lt;/th&gt;
 &lt;th&gt;Good for&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Native Secrets&lt;/td&gt;
 &lt;td&gt;Local dev, low-sensitivity workloads&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Sealed Secrets&lt;/td&gt;
 &lt;td&gt;GitOps, single-cluster, no external dependency&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;External Secrets Operator&lt;/td&gt;
 &lt;td&gt;Syncing from existing external stores&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Secrets Store CSI&lt;/td&gt;
 &lt;td&gt;Avoiding etcd entirely, file-based secret injection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;HashiCorp Vault&lt;/td&gt;
 &lt;td&gt;Dynamic credentials, audit logs, enterprise requirements&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/bitnami-labs/sealed-secrets" target="_blank" rel="noopener"
 &gt;Sealed Secrets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://external-secrets.io/" target="_blank" rel="noopener"
 &gt;External Secrets Operator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://secrets-store-csi-driver.sigs.k8s.io/" target="_blank" rel="noopener"
 &gt;Secrets Store CSI Driver&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://developer.hashicorp.com/vault/docs/platform/k8s" target="_blank" rel="noopener"
 &gt;HashiCorp Vault on Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>OpenShift Data Foundation</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/odf/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/odf/</guid><description>&lt;p&gt;OpenShift Data Foundation (ODF) is Red Hat&amp;rsquo;s enterprise Kubernetes storage platform, built on &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/cloud-infrastructure/ceph/" &gt;Ceph&lt;/a&gt; orchestrated by &lt;a class="link" href="https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/rook/" &gt;Rook&lt;/a&gt;. Where Rook-Ceph is the open source upstream, ODF packages it with an operator, a validated configuration, enterprise support, and integration with the OpenShift console. It provides block (RBD), file (CephFS), and object (S3-compatible via Ceph RGW) storage as Kubernetes StorageClasses on the same hardware.&lt;/p&gt;
&lt;h2 id="what-it-provides"&gt;What it provides
&lt;/h2&gt;&lt;p&gt;Three storage modes from one cluster:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Mode&lt;/th&gt;
 &lt;th&gt;StorageClass&lt;/th&gt;
 &lt;th&gt;Use case&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Block (RBD)&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;ocs-storagecluster-ceph-rbd&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Databases, stateful apps needing a single-writer disk&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;File (CephFS)&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;ocs-storagecluster-cephfs&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Shared filesystems, multiple pods reading/writing the same volume&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Object&lt;/td&gt;
 &lt;td&gt;S3-compatible endpoint&lt;/td&gt;
 &lt;td&gt;Buckets via &lt;code&gt;ObjectBucketClaim&lt;/code&gt;, backup targets, artifact storage&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
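&lt;p&gt;Consuming the block and file modes is ordinary PVC usage. A sketch, assuming the default StorageClass names from the table; the claim names and sizes are hypothetical:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Single-writer block volume, e.g. for a database
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
# Shared filesystem that several pods can mount read-write
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;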
&lt;h2 id="installation"&gt;Installation
&lt;/h2&gt;&lt;p&gt;ODF installs via the ODF operator from OperatorHub. The operator creates a &lt;code&gt;StorageCluster&lt;/code&gt; CR that drives the Ceph deployment:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ocs.openshift.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;StorageCluster&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ocs-storagecluster&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;openshift-storage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;storageDeviceSets&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ocs-deviceset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;count&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;replica&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;dataPVCTemplate&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;storageClassName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;local-storage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;volumeMode&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Block&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#f92672"&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;              &lt;span style="color:#f92672"&gt;storage&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;1Ti&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Requires at minimum three nodes with dedicated block devices. The operator handles Ceph cluster formation, monitors, MGRs, and OSDs.&lt;/p&gt;
&lt;h2 id="vs-rook-ceph"&gt;vs Rook-Ceph
&lt;/h2&gt;&lt;p&gt;ODF is Rook-Ceph under the hood. The difference is packaging and support: ODF is tested and supported on OpenShift, includes the NooBaa multi-cloud gateway for object storage federation, and integrates with the OpenShift UI. For self-managed Kubernetes outside OpenShift, raw Rook-Ceph is the equivalent path.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/" target="_blank" rel="noopener"
 &gt;ODF documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://rook.io/docs/rook/latest/" target="_blank" rel="noopener"
 &gt;Rook documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Reloader</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/reloader/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/reloader/</guid><description>&lt;p&gt;Reloader is a Kubernetes controller from Stakater that watches ConfigMaps and Secrets and automatically triggers rolling restarts of Deployments, StatefulSets, and DaemonSets when the watched resources change. Kubernetes does not do this natively — updating a ConfigMap does not restart pods that consume it, so configuration changes don&amp;rsquo;t take effect until the next deploy.&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage
&lt;/h2&gt;&lt;p&gt;Annotate a Deployment to watch a specific ConfigMap or Secret:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;apps/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Deployment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;annotations&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;reloader.stakater.com/auto&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;true&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;# watch all referenced ConfigMaps/Secrets&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# OR be specific:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;configmap.reloader.stakater.com/reload&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;my-config&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;secret.reloader.stakater.com/reload&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;my-secret&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When the ConfigMap or Secret changes, Reloader detects it and triggers a rolling restart by patching the workload&amp;rsquo;s pod template (by default it updates a placeholder environment variable; an annotation-based strategy is also available). The Deployment then rolls out new pods that pick up the updated configuration.&lt;/p&gt;
&lt;h2 id="installation"&gt;Installation
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm repo add stakater https://stakater.github.io/stakater-charts
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm install reloader stakater/reloader -n reloader --create-namespace
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="why-not-use-a-hash-annotation-manually"&gt;Why not use a hash annotation manually
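&lt;/h2&gt;&lt;p&gt;The common alternative is to change the pod template whenever the configuration changes: Helm can inject a hash of the ConfigMap into the pod template annotations, and Kustomize&amp;rsquo;s &lt;code&gt;configMapGenerator&lt;/code&gt; appends a content hash to the ConfigMap name. Either way, a changed hash makes Kubernetes roll the Deployment. This works but requires build-time tooling. Reloader handles it at runtime without any changes to the deployment pipeline.&lt;/p&gt;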
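&lt;p&gt;For comparison, the Helm variant is usually a single annotation in the Deployment template. A sketch, assuming the chart contains a &lt;code&gt;templates/configmap.yaml&lt;/code&gt; (a hypothetical file name):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# templates/deployment.yaml (excerpt)
spec:
  template:
    metadata:
      annotations:
        # Rendered at install/upgrade time. A changed ConfigMap yields a new
        # hash, which changes the pod template and triggers a rollout.
        checksum/config: {{ include (print $.Template.BasePath &#34;/configmap.yaml&#34;) . | sha256sum }}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The hash is only recomputed when the chart is rendered, so a ConfigMap edited outside the chart goes unnoticed. That is the gap a runtime controller covers.&lt;/p&gt;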
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/stakater/Reloader" target="_blank" rel="noopener"
 &gt;Reloader GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/stakater/stakater-charts" target="_blank" rel="noopener"
 &gt;Stakater charts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Velero</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/velero/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/velero/</guid><description>&lt;p&gt;Velero backs up and restores Kubernetes clusters. It captures both Kubernetes resource definitions (deployments, services, configmaps, secrets, CRDs) and persistent volume data, stores them in object storage (S3, GCS, Azure Blob), and can restore them to the same cluster or a different one. The primary use cases are disaster recovery, cluster migration, and namespace cloning.&lt;/p&gt;
&lt;h2 id="how-it-works"&gt;How it works
&lt;/h2&gt;&lt;p&gt;Velero runs as a controller in the cluster. A &lt;code&gt;Backup&lt;/code&gt; CR triggers a snapshot of selected resources:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;velero.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Backup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;daily-backup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;velero&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;includedNamespaces&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#ae81ff"&gt;production&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;storageLocation&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;ttl&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;720h &lt;/span&gt; &lt;span style="color:#75715e"&gt;# 30 days&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Persistent volume data is handled either via storage provider snapshots (CSI snapshots, AWS EBS snapshots) or by a file-system-level backup run by the node-agent daemonset (formerly the Restic daemonset). CSI snapshot integration is the preferred modern approach.&lt;/p&gt;
&lt;p&gt;Scheduled backups run via a &lt;code&gt;Schedule&lt;/code&gt; CR:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;velero.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Schedule&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;daily&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;velero&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;schedule&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;0 2 * * *&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;template&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;includedNamespaces&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#ae81ff"&gt;production&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;ttl&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;720h&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="restore"&gt;Restore
&lt;/h2&gt;&lt;p&gt;A restore is a &lt;code&gt;Restore&lt;/code&gt; CR pointing at a backup; the CLI creates one for you:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;velero restore create --from-backup daily-backup
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Velero recreates the Kubernetes objects and restores volume data. Namespaces can be remapped on restore — useful for cloning production to staging.&lt;/p&gt;
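&lt;p&gt;The same restore can be written as a manifest, including the namespace remapping. A sketch; the resource name and the &lt;code&gt;staging&lt;/code&gt; target namespace are hypothetical:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: velero.io/v1
kind: Restore
metadata:
  name: production-to-staging   # hypothetical name
  namespace: velero
spec:
  backupName: daily-backup
  includedNamespaces:
    - production
  namespaceMapping:
    production: staging         # objects backed up from production land in staging
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;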
&lt;h2 id="cluster-migration"&gt;Cluster migration
&lt;/h2&gt;&lt;p&gt;The standard migration pattern: back up from the source cluster, configure the destination cluster to point at the same object storage bucket, restore. Velero handles the resource recreation; DNS cutover is a separate step.&lt;/p&gt;
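&lt;p&gt;On the destination cluster, the shared bucket is declared as a &lt;code&gt;BackupStorageLocation&lt;/code&gt;. A sketch, assuming the AWS object-store plugin; the bucket name and region are hypothetical. Setting the location to read-only keeps the destination from creating or deleting backups in the bucket while the migration is in progress:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws                 # assumption: AWS plugin; use your provider
  objectStorage:
    bucket: my-velero-backups   # hypothetical; same bucket as the source cluster
  config:
    region: eu-west-1           # hypothetical
  accessMode: ReadOnly
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;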
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://velero.io/docs/" target="_blank" rel="noopener"
 &gt;Velero documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://velero.io/docs/main/csi/" target="_blank" rel="noopener"
 &gt;CSI snapshot support&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Virtualization — KVM and KubeVirt</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/virtualization/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/frameworks-tools/virtualization/</guid><description>&lt;p&gt;KVM is the Linux kernel&amp;rsquo;s native hypervisor. KubeVirt extends Kubernetes to run virtual machines using KVM under the hood. They are the same virtualization layer at different levels of abstraction — KVM on bare metal, KubeVirt in a Kubernetes cluster.&lt;/p&gt;
&lt;h2 id="kvm"&gt;KVM
&lt;/h2&gt;&lt;p&gt;Kernel-based Virtual Machine. KVM turns the Linux kernel into a hypervisor using hardware virtualization extensions (Intel VT-x, AMD-V). Virtual machines run as regular Linux processes backed by QEMU for device emulation. Managed via &lt;code&gt;libvirt&lt;/code&gt; and its CLI tools (&lt;code&gt;virsh&lt;/code&gt;, &lt;code&gt;virt-install&lt;/code&gt;) or the &lt;code&gt;virt-manager&lt;/code&gt; GUI.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Create a VM from an ISO&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virt-install &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --name ubuntu-vm &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --ram &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --vcpus &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --disk path&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/var/lib/libvirt/images/ubuntu.qcow2,size&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --cdrom /tmp/ubuntu.iso &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --os-variant ubuntu22.04
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# List running VMs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virsh list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Start/stop&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virsh start ubuntu-vm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virsh shutdown ubuntu-vm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Connect to console&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virsh console ubuntu-vm
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;KVM gives near-native performance for CPU-bound workloads. Network and disk I/O use virtio drivers for efficient paravirtualised I/O. Live migration moves a running VM between hosts with minimal downtime; it is most straightforward when both hosts share storage.&lt;/p&gt;
&lt;h2 id="kubevirt"&gt;KubeVirt
&lt;/h2&gt;&lt;p&gt;KubeVirt adds &lt;code&gt;VirtualMachine&lt;/code&gt; and &lt;code&gt;VirtualMachineInstance&lt;/code&gt; CRDs to Kubernetes. VMs are defined as Kubernetes resources, scheduled by the Kubernetes scheduler, and managed alongside containers. Under the hood, each VM runs as a pod containing a QEMU-KVM process.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;kubevirt.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;VirtualMachine&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;ubuntu-vm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;running&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;template&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;domain&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;devices&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;disks&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rootdisk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;              &lt;span style="color:#f92672"&gt;disk&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                &lt;span style="color:#f92672"&gt;bus&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;virtio&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#f92672"&gt;memory&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;4Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#f92672"&gt;cpu&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;volumes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;rootdisk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          &lt;span style="color:#f92672"&gt;containerDisk&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#f92672"&gt;image&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;kubevirt/fedora-cloud-container-disk-demo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;virtctl&lt;/code&gt; CLI complements &lt;code&gt;kubectl&lt;/code&gt; for VM-specific operations:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtctl start ubuntu-vm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtctl stop ubuntu-vm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtctl console ubuntu-vm &lt;span style="color:#75715e"&gt;# serial console&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtctl ssh ubuntu-vm &lt;span style="color:#75715e"&gt;# SSH via the Kubernetes API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtctl migrate ubuntu-vm &lt;span style="color:#75715e"&gt;# live migrate to another node&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="cdi--containerized-data-importer"&gt;CDI — Containerized Data Importer
&lt;/h2&gt;&lt;p&gt;KubeVirt is typically paired with CDI, which imports VM disk images from URLs, container registries, or existing PVCs. Each import is declared as a &lt;code&gt;DataVolume&lt;/code&gt; resource, which CDI turns into a populated PVC that a VM can boot from. CDI handles the data flow; the VM definition just references the DataVolume.&lt;/p&gt;
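&lt;p&gt;A sketch of that split, assuming a cloud image served over HTTP; the URL, names, and size are hypothetical. The &lt;code&gt;DataVolume&lt;/code&gt; describes the import, and the VM replaces its &lt;code&gt;containerDisk&lt;/code&gt; volume with a &lt;code&gt;dataVolume&lt;/code&gt; reference:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-rootdisk         # hypothetical name
spec:
  source:
    http:
      url: https://example.com/images/ubuntu-22.04.qcow2   # hypothetical URL
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 40Gi
---
# Excerpt of the VirtualMachine's spec.template.spec
volumes:
  - name: rootdisk
    dataVolume:
      name: ubuntu-rootdisk     # must match the DataVolume above
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;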
&lt;h2 id="why-vms-in-kubernetes"&gt;Why VMs in Kubernetes
&lt;/h2&gt;&lt;p&gt;Some workloads can&amp;rsquo;t be containerised — legacy applications expecting a full OS, Windows workloads, software with kernel module requirements. KubeVirt lets those workloads live in the same cluster as containers, managed with the same tooling, subject to the same scheduling and networking policies.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://www.linux-kvm.org/page/Documents" target="_blank" rel="noopener"
 &gt;KVM documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kubevirt.io/user-guide/" target="_blank" rel="noopener"
 &gt;KubeVirt documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/kubevirt/containerized-data-importer" target="_blank" rel="noopener"
 &gt;CDI documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>