<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Grafana on Backend Engineering Strategy Tools</title><link>https://backend-engineering-strategy-tools.github.io/site/tags/grafana/</link><description>Recent content in Grafana on Backend Engineering Strategy Tools</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Mon, 01 Jan 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://backend-engineering-strategy-tools.github.io/site/tags/grafana/index.xml" rel="self" type="application/rss+xml"/><item><title>Grafana</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/grafana/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/grafana/</guid><description>&lt;p&gt;Prometheus shows you the spike. It tells you memory climbed at 14:32, error rate crossed 5% at 14:35, and latency hit 2 seconds at 14:37. But raw PromQL results are numbers in a table. You cannot see the shape of an incident in a table. You cannot hand a table to a product manager and explain what happened.&lt;/p&gt;
&lt;p&gt;So you use Grafana. It connects to Prometheus (and Loki, and a dozen other data sources) and turns those numbers into dashboards. You see the spike, the timeline, the correlation between services — all on one screen.&lt;/p&gt;
&lt;h2 id="data-sources"&gt;Data sources
&lt;/h2&gt;&lt;p&gt;Grafana is a visualisation layer, not a storage layer. It queries data sources and renders the results. In a Kubernetes observability stack, a typical setup looks like this:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Data source&lt;/th&gt;
 &lt;th&gt;What it provides&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Prometheus&lt;/td&gt;
 &lt;td&gt;Metrics — CPU, memory, request rates, error rates, latency&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Loki&lt;/td&gt;
 &lt;td&gt;Logs — searchable, filterable, correlated with metrics by time&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Jaeger / Tempo&lt;/td&gt;
 &lt;td&gt;Traces — individual request journeys across services&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Adding a data source takes a few fields in the UI, or a provisioning file mounted from a ConfigMap if you manage Grafana as code.&lt;/p&gt;
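&lt;p&gt;As a rough sketch of the as-code route, a data source provisioning file could look like the following. The service URLs are assumptions for a cluster where Prometheus and Loki run in a &lt;code&gt;monitoring&lt;/code&gt; namespace; substitute the addresses from your own install.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Data source provisioning file, typically mounted under
# /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Assumed in-cluster service address
    url: http://prometheus-server.monitoring.svc:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    # Assumed in-cluster service address
    url: http://loki.monitoring.svc:3100
&lt;/code&gt;&lt;/pre&gt;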
&lt;h2 id="dashboards"&gt;Dashboards
&lt;/h2&gt;&lt;p&gt;A dashboard is a collection of panels. Each panel runs a query against a data source and renders the result as a graph, gauge, stat, table, or heatmap.&lt;/p&gt;
&lt;p&gt;The fastest way to get useful dashboards is &lt;a class="link" href="https://grafana.com/grafana/dashboards/" target="_blank" rel="noopener"
 &gt;grafana.com/grafana/dashboards&lt;/a&gt; — a library of community dashboards for almost every common component. Import by ID:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;1860&lt;/strong&gt; — Node Exporter Full (host metrics: CPU, memory, disk, network)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;6417&lt;/strong&gt; — Kubernetes cluster overview&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;7362&lt;/strong&gt; — MySQL overview&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;9628&lt;/strong&gt; — Postgres overview&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Import these on day one and you have coverage before writing a single PromQL query.&lt;/p&gt;
&lt;h2 id="variables"&gt;Variables
&lt;/h2&gt;&lt;p&gt;Dashboard variables make panels reusable across namespaces, clusters, or services. A variable populated from a Prometheus label query:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;label_values(kube_pod_info{namespace=~&amp;#34;.+&amp;#34;}, namespace)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now every panel can use &lt;code&gt;$namespace&lt;/code&gt; in its query, and a dropdown at the top of the dashboard filters the whole view.&lt;/p&gt;
&lt;h2 id="alerting"&gt;Alerting
&lt;/h2&gt;&lt;p&gt;Grafana has its own alert engine that evaluates queries on a schedule and routes alerts through contact points (Slack, PagerDuty, email). For Kubernetes setups already using Alertmanager, it is usually cleaner to define alert rules in Prometheus and use Grafana purely for visualisation — one place for alert rules, not two.&lt;/p&gt;
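&lt;p&gt;To illustrate keeping alert rules on the Prometheus side, here is a minimal sketch of a rule group. The metric name &lt;code&gt;http_requests_total&lt;/code&gt;, its &lt;code&gt;status&lt;/code&gt; label, the 5% threshold, and the &lt;code&gt;severity&lt;/code&gt; value are all assumptions chosen for the example, not values taken from a real cluster.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Prometheus rule file (sketch)
groups:
  - name: example-api
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses over the last 5 minutes (assumed metric and label)
        expr: |
          sum(rate(http_requests_total{status=~&amp;#34;5..&amp;#34;}[5m]))
            / sum(rate(http_requests_total[5m])) &amp;gt; 0.05
        # Only fire if the condition holds for 5 minutes
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Error rate above 5% for 5 minutes
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With the Prometheus Operator, the same &lt;code&gt;groups&lt;/code&gt; list would sit under the &lt;code&gt;spec&lt;/code&gt; of a &lt;code&gt;PrometheusRule&lt;/code&gt; resource rather than in a standalone file.&lt;/p&gt;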
&lt;h2 id="managing-grafana-as-code"&gt;Managing Grafana as code
&lt;/h2&gt;&lt;p&gt;Dashboards built in the UI are fragile — they live in a database and disappear if you rebuild the stack. Two better approaches:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Grafana provisioning&lt;/strong&gt; — mount dashboard JSON files via ConfigMap. Grafana loads them on startup and they survive restarts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Grafonnet / Jsonnet&lt;/strong&gt; — generate dashboard JSON programmatically. Verbose but version-controllable and reviewable in pull requests.&lt;/p&gt;
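&lt;p&gt;For the provisioning approach, Grafana also needs a small provider file telling it where the mounted JSON lives. A minimal sketch, assuming the dashboards ConfigMap is mounted at &lt;code&gt;/var/lib/grafana/dashboards&lt;/code&gt; (the path and provider name are illustrative):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Dashboard provider file, typically mounted under
# /etc/grafana/provisioning/dashboards/
apiVersion: 1
providers:
  - name: default
    type: file
    # Keep provisioned dashboards even if their file briefly disappears
    disableDeletion: true
    # How often Grafana rescans the directory for changes
    updateIntervalSeconds: 30
    options:
      # Assumed mount path of the ConfigMap holding dashboard JSON
      path: /var/lib/grafana/dashboards
&lt;/code&gt;&lt;/pre&gt;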
&lt;h2 id="the-observability-trio"&gt;The observability trio
&lt;/h2&gt;&lt;p&gt;Grafana is the front end for the full observability stack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prometheus&lt;/strong&gt; — something is wrong, here are the numbers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Loki&lt;/strong&gt; — here are the log lines from that time window&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jaeger&lt;/strong&gt; — here is the exact request that failed and where it slowed down&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each answers a different question. Grafana is where you look at all three in one place.&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://grafana.com/docs/grafana/latest/" target="_blank" rel="noopener"
 &gt;Grafana documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://grafana.com/grafana/dashboards/" target="_blank" rel="noopener"
 &gt;Grafana dashboard library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack" target="_blank" rel="noopener"
 &gt;kube-prometheus-stack&lt;/a&gt; — installs Prometheus + Grafana together&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Loki</title><link>https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/loki/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://backend-engineering-strategy-tools.github.io/site/public-notes/observability/loki/</guid><description>&lt;p&gt;&lt;a class="link" href="../prometheus/" &gt;Prometheus&lt;/a&gt; tells you &lt;em&gt;that&lt;/em&gt; something is wrong and &lt;em&gt;when&lt;/em&gt; it started. Loki tells you &lt;em&gt;what&lt;/em&gt; happened — it is the log aggregation layer of the observability stack. Logs from every pod across every node are collected, indexed, and made searchable in one place. Grafana is the front end for both.&lt;/p&gt;
&lt;h2 id="how-it-works"&gt;How it works
&lt;/h2&gt;&lt;p&gt;Loki stores logs as compressed chunks, indexed only by labels (not by content). This makes it cheap to store and fast to query by label — namespace, pod name, app — but slower for full-text search than something like Elasticsearch. The trade-off is intentional: label-scoped queries cover the vast majority of real operational use, and the storage cost is dramatically lower.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Promtail&lt;/strong&gt; runs as a DaemonSet on every node, tails log files from &lt;code&gt;/var/log/pods/&lt;/code&gt;, attaches Kubernetes labels, and ships to Loki. Grafana queries Loki directly.&lt;/p&gt;
&lt;h2 id="deployment-modes"&gt;Deployment modes
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;SingleBinary&lt;/strong&gt;: ingestion, querying, and management all run in a single instance. Simple to deploy, with minimal operational overhead, but a single point of failure: if it goes down, ingestion stops and logs can be lost. The right starting point for most clusters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SimpleScalable&lt;/strong&gt;: responsibilities are split into separate read, write, and backend pods, each running a minimum of two instances for HA. Ingestion, querying, and the compactor can be scaled independently. Significantly more operational overhead, but fault-tolerant and tunable under load. The right move for production once you have volume and reliability requirements.&lt;/p&gt;
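&lt;p&gt;In the &lt;code&gt;loki&lt;/code&gt; Helm chart, the mode is selected through values. The fragment below is a sketch of a SingleBinary setup, not a complete values file: it omits the schema and persistence settings a real install needs, and the key names should be checked against the values reference for the chart version you pin.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# loki-values.yaml (fragment, assumed key names)
deploymentMode: SingleBinary
loki:
  commonConfig:
    # One instance, so no replication
    replication_factor: 1
  storage:
    # Local filesystem storage; fine for testing, not durable
    type: filesystem
singleBinary:
  replicas: 1
# Zero out the SimpleScalable targets so only the single binary runs
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
&lt;/code&gt;&lt;/pre&gt;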
&lt;h2 id="getting-started"&gt;Getting started
&lt;/h2&gt;&lt;p&gt;The fastest path to a working stack is deploying Loki alongside &lt;code&gt;kube-prometheus-stack&lt;/code&gt;, which brings up Prometheus, Grafana, and Alertmanager together. See the &lt;a class="link" href="../prometheus/" &gt;Prometheus&lt;/a&gt; note for the kube-prometheus-stack setup and the ArgoCD CRD workaround.&lt;/p&gt;
&lt;p&gt;Loki and Promtail are installed as a separate ArgoCD Application, using multiple Helm sources with values pulled from the cluster config repo:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;apiVersion&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;kind&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;Application&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;log-ingestion&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;argo-cd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;project&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;sources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# Loki&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;https://grafana.github.io/helm-charts&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;chart&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;loki&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;6.55.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;helm&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;releaseName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;loki&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;valueFiles&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          - &lt;span style="color:#ae81ff"&gt;$values/cluster/testing/overlay/monitoring/helm/loki-values.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# Promtail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;https://grafana.github.io/helm-charts&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;chart&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;promtail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;6.17.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;helm&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;releaseName&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;promtail&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#f92672"&gt;valueFiles&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          - &lt;span style="color:#ae81ff"&gt;$values/cluster/testing/overlay/monitoring/helm/promtail-values.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#75715e"&gt;# Values source: cluster config repo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    - &lt;span style="color:#f92672"&gt;repoURL&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#39;git@github.com:example-org/cluster-config.git&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;targetRevision&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;HEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;ref&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;destination&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;server&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;https://kubernetes.default.svc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;namespace&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;monitoring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  &lt;span style="color:#f92672"&gt;syncPolicy&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;automated&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;selfHeal&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      &lt;span style="color:#f92672"&gt;prune&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#f92672"&gt;syncOptions&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#ae81ff"&gt;CreateNamespace=true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      - &lt;span style="color:#ae81ff"&gt;ServerSideApply=true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note: &lt;code&gt;targetRevision: HEAD&lt;/code&gt; is fine for testing environments. Pin to a tag for staging and production.&lt;/p&gt;
&lt;h2 id="promtail-deprecation"&gt;Promtail deprecation
&lt;/h2&gt;&lt;p&gt;Promtail has been deprecated since February 2025 and is in long-term support (LTS): security and critical bug fixes only, no new features. Grafana expects it to reach end-of-life on March 2, 2026.&lt;/p&gt;
&lt;p&gt;The Grafana-recommended replacement is &lt;strong&gt;&lt;a class="link" href="https://grafana.com/docs/alloy/latest/" target="_blank" rel="noopener"
 &gt;Grafana Alloy&lt;/a&gt;&lt;/strong&gt;, a more capable collector that handles metrics, logs, and traces in a single agent. The migration path is not yet settled enough for a confident recommendation — worth waiting for clear community consensus before moving. Until then, Promtail continues to work and the LTS window gives time to plan.&lt;/p&gt;
&lt;h2 id="grafana-integration"&gt;Grafana integration
&lt;/h2&gt;&lt;p&gt;Add Loki as a data source in Grafana and logs become queryable alongside metrics. A useful starting point is a simple app-oriented logs dashboard — filter by namespace and pod, tail in near-real-time, correlate timestamps with Prometheus spikes.&lt;/p&gt;
&lt;p&gt;LogQL, Loki&amp;rsquo;s query language, mirrors PromQL in style:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-logql" data-lang="logql"&gt;# All error logs from a namespace
{namespace=&amp;#34;production&amp;#34;} |= &amp;#34;error&amp;#34;

# Parse and filter structured logs
{app=&amp;#34;my-api&amp;#34;} | json | status &amp;gt;= 500

# Rate of error log lines over time
rate({namespace=&amp;#34;production&amp;#34;} |= &amp;#34;error&amp;#34; [5m])
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="resources"&gt;Resources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://grafana.com/docs/loki/latest/" target="_blank" rel="noopener"
 &gt;Loki documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://grafana.com/docs/alloy/latest/" target="_blank" rel="noopener"
 &gt;Grafana Alloy documentation&lt;/a&gt; — future Promtail replacement&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/grafana/helm-charts/tree/main/charts/loki-stack" target="_blank" rel="noopener"
 &gt;loki-stack Helm chart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack" target="_blank" rel="noopener"
 &gt;kube-prometheus-stack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>