<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Distributed-Systems on Backend Engineering Strategy Tools</title><link>https://backend-engineering-strategy-tools.github.io/site/tags/distributed-systems/</link><description>Recent content in Distributed-Systems on Backend Engineering Strategy Tools</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Mon, 01 Jan 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://backend-engineering-strategy-tools.github.io/site/tags/distributed-systems/index.xml" rel="self" type="application/rss+xml"/><item><title>etcd</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/etcd/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/kubernetes/etcd/</guid><description>&lt;p&gt;etcd is the distributed key-value store that backs Kubernetes. Every Kubernetes object — pods, services, deployments, configmaps, secrets — is stored in etcd. The API server is the only component that reads and writes it directly; everything else in the cluster reads from the API server&amp;rsquo;s cache. etcd&amp;rsquo;s reliability is the cluster&amp;rsquo;s reliability: if etcd loses quorum, the Kubernetes control plane stops functioning.&lt;/p&gt;
&lt;h2 id="raft-consensus"&gt;Raft consensus
&lt;/h2&gt;&lt;p&gt;etcd uses the Raft consensus algorithm. The cluster elects a leader; all writes go through the leader, which acknowledges a write only after a majority of members (a quorum) has persisted it. A cluster of &lt;code&gt;n&lt;/code&gt; members tolerates &lt;code&gt;floor((n-1)/2)&lt;/code&gt; failures: a three-node cluster survives one failure, a five-node cluster survives two. This is why etcd member counts are normally odd: adding a fourth member raises the quorum to three without raising the number of failures tolerated. Three nodes is standard for production; five suits clusters where control plane availability is critical.&lt;/p&gt;
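&lt;p&gt;The quorum arithmetic can be checked with a few lines of shell. This is a minimal sketch: the function names &lt;code&gt;quorum_size&lt;/code&gt; and &lt;code&gt;fault_tolerance&lt;/code&gt; are made up for this example and are not part of etcd or etcdctl.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of etcd.

# quorum_size N: smallest majority of N voting members, floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that can fail while a quorum remains,
# which equals floor((N-1)/2).
fault_tolerance() {
  echo $(( $1 - $1 / 2 - 1 ))
}

# Print one line per cluster size to compare odd and even member counts.
for n in 1 2 3 4 5 6 7; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

&lt;p&gt;For four members the loop reports a quorum of 3 and 1 tolerated failure, the same tolerance as three members, which is the reason even sizes buy nothing.&lt;/p&gt;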
&lt;h2 id="watches-and-revisions"&gt;Watches and revisions
&lt;/h2&gt;&lt;p&gt;Every write increments a global revision counter. Clients can watch a key or key prefix and receive every change since a given revision, provided that revision has not yet been compacted. The API server builds its own watch API on this mechanism, which is how the Kubernetes controller manager and scheduler work: they hold long-lived watch connections to the API server and react to changes in specific resource types without polling.&lt;/p&gt;
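&lt;p&gt;The idea of replaying changes from a revision can be illustrated with a toy model in shell. This is a sketch of the concept only, not etcd code: &lt;code&gt;put&lt;/code&gt;, &lt;code&gt;watch_from&lt;/code&gt;, and the in-memory event list are invented for the example, and the keys are illustrative.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Toy model of a revisioned key-value store. NOT etcd code: it only
# illustrates "replay every change at or after revision R under a prefix".
revision=0
events=()   # each entry: "REVISION KEY VALUE"

# put KEY VALUE: record a write and bump the global revision.
put() {
  revision=$(( revision + 1 ))
  events+=("$revision $1 $2")
}

# watch_from PREFIX START_REV: print recorded events whose revision is at
# least START_REV and whose key starts with PREFIX.
watch_from() {
  local prefix=$1 start=$2 entry rev rest key value
  for entry in "${events[@]}"; do
    rev=${entry%% *}
    rest=${entry#* }
    key=${rest%% *}
    value=${rest#* }
    if [ "$rev" -ge "$start" ]; then
      case "$key" in
        "$prefix"*) echo "rev=$rev PUT $key $value" ;;
      esac
    fi
  done
}

# Illustrative keys and values only.
put /registry/pods/default/web Pending
put /registry/services/default/web ClusterIP
put /registry/pods/default/web Running

# Replay pod changes starting at revision 2.
watch_from /registry/pods/ 2
```

&lt;p&gt;Only the third write is printed: the first is older than the requested revision and the second falls outside the prefix. A real watch also stays open and streams later changes, which this model leaves out.&lt;/p&gt;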
&lt;h2 id="operations"&gt;Operations
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Snapshot backup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcdctl snapshot save /backup/etcd-snapshot.db &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --endpoints&lt;span style="color:#f92672"&gt;=&lt;/span&gt;https://127.0.0.1:2379 &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --cacert&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/etc/kubernetes/pki/etcd/ca.crt &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --cert&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/etc/kubernetes/pki/etcd/healthcheck-client.crt &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --key&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/etc/kubernetes/pki/etcd/healthcheck-client.key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Restore from snapshot (etcd 3.5 deprecates this in favor of etcdutl snapshot restore)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcdctl snapshot restore /backup/etcd-snapshot.db --data-dir&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/var/lib/etcd-restore
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Check cluster health (on a TLS cluster, pass the same --endpoints and certificate flags as the backup)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcdctl endpoint health --cluster
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Backing up etcd regularly is the most critical operational task for a Kubernetes cluster. If quorum is permanently lost or the data is corrupted, a recent snapshot is the only path to full recovery of cluster state.&lt;/p&gt;
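&lt;p&gt;Regular backups are easier to manage when each snapshot gets its own timestamped file. The following is a minimal sketch under stated assumptions: &lt;code&gt;BACKUP_DIR&lt;/code&gt;, &lt;code&gt;ETCDCTL&lt;/code&gt;, &lt;code&gt;snapshot_path&lt;/code&gt;, and &lt;code&gt;backup_etcd&lt;/code&gt; are hypothetical names for this example, and the endpoint and certificate paths are copied from the commands above.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only. BACKUP_DIR and ETCDCTL are hypothetical knobs for this
# example, not etcd settings.
BACKUP_DIR=${BACKUP_DIR:-/backup}
ETCDCTL=${ETCDCTL:-etcdctl}

# snapshot_path: print a snapshot filename carrying a UTC timestamp.
snapshot_path() {
  echo "$BACKUP_DIR/etcd-snapshot-$(date -u +%Y%m%dT%H%M%SZ).db"
}

# backup_etcd TARGET: save one snapshot to TARGET and return non-zero if
# the command fails or the resulting file is missing or empty.
backup_etcd() {
  local target=$1
  "$ETCDCTL" snapshot save "$target" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key || return 1
  [ -s "$target" ]
}
```

&lt;p&gt;A scheduled job could then call &lt;code&gt;backup_etcd "$(snapshot_path)"&lt;/code&gt; and alert on a non-zero exit status. The size check only catches an empty file; it does not prove the snapshot can be restored.&lt;/p&gt;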
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://etcd.io/docs/" target="_blank" rel="noopener"
 &gt;etcd documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/" target="_blank" rel="noopener"
 &gt;Kubernetes etcd administration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>