Quickwit+Vector for Kubernetes (EKS-flavored) Observability

Quickwit filled with Kube Logs (With Vector+ArgoCD+Grafana)

An experienced operator's guide to streaming Kubernetes workload logs into Quickwit.

  1. Overview

  2. How does it work?

  3. Quickwit Costs

  4. Argo Configurations

  5. AWS Configuration

  6. Using Quickwit

  7. Log Management

  8. Grafana Integration

How does it work?

Every Kubernetes pod has its STDOUT and STDERR streams written to the kube node filesystem. Ever wondered how kubectl logs ... works? Well, the container logs are always written to disk and automatically trimmed before they can fill up the kube node's filesystem. The node's kubelet streams them to you on-demand -- going from filesystem to kubelet to kube API and finally over to your kubectl! Read more about Kubernetes standard log management.
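
You can see this for yourself from a node shell (or a privileged debug pod). A minimal sketch -- the paths and output are illustrative, following the /var/log/pods/<namespace>_<pod>_<uid>/<container>/N.log layout and the CRI log line format of timestamp, stream, partial/full tag, then message:

% ls /var/log/pods/
argo_argo-cd-repo-server-869d695dc8-fgmqc_eab867f0-389b-4b6b-9b7f-69c7c3474c45
[[[ SNIP ]]]
% tail -n 1 /var/log/pods/argo_argo-cd-repo-server-*/repo-server/2.log
2024-10-31T02:28:03.185047076Z stderr F time="2024-10-31T02:28:03Z" level=info msg="finished unary call with code OK" ...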

We want to create an alternative path for these logs -- they will still be available for use with commands such as kubectl logs ... but we want to tail the files and ship their contents off to Quickwit as soon as possible. Getting to the point where everything works can be an odyssey, so this is my attempt at writing the book -- er, pamphlet.


Quickwit Costs

I have been accumulating data with Quickwit+Vector for about two days. The visibility has already helped me identify orphaned workloads and misconfigured logging in important cronjobs. And it has done so with a minimal Kubernetes footprint and negligible S3 usage.

1.9GB of compressed data in S3:

[screenshot: the S3 console showing ~1.9GB stored in the quickwit-logs bucket]

and the kube footprint is minuscule (and also not configured for high-availability just yet!)

% kubectl -n vector top pods
NAME                 CPU(cores)   MEMORY(bytes)
vector-agent-2fw7s   2m           25Mi
vector-agent-448zq   1m           26Mi
vector-agent-45m98   2m           27Mi
vector-agent-4b4r4   1m           21Mi
vector-agent-67lkg   1m           18Mi
[[[ SNIP ]]]

and

% kubectl -n quickwit top pods
NAME                                              CPU(cores)   MEMORY(bytes)
quickwit-logs-control-plane-7447dfb4d9-xgb2m      2m           11Mi
quickwit-logs-indexer-0                           25m          166Mi
quickwit-logs-janitor-9cb844987-whg7g             2m           19Mi
quickwit-logs-metastore-54d7c68f59-gnfjx          2m           15Mi
quickwit-logs-searcher-0                          2m           249Mi

Around these parts, we use ArgoCD to manage our apps. We don't manage helm repos locally and we don't run helm release updates by hand. We rely on gitops and active state sync to keep our many apps in sync. Kubernetes is complex and ArgoCD adds just a smidge more complexity in order to give unified visibility.

The best part of ArgoCD: it's really easy to share definitive Application specifications!

Argo Project Definitions

You have to create a tenancy with some guard rails. We want to limit which helm charts are installed into which namespaces (and what objects the charts can and cannot install in the cluster). Here are the relevant AppProjects, edited for clarity:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: kube-system
  namespace: argo
spec:
  destinations:
    - namespace: 'kube-system'
      server: '*'
    - namespace: 'prometheus' # creates a ton of metric-gathering pods
      server: '*'
    - namespace: 'vector' # creates a ton of vector agent pods
      server: '*'
  sourceRepos:
    - 'https://prometheus-community.github.io/helm-charts'
    - 'https://helm.vector.dev'
  clusterResourceWhitelist: # required to install the Prometheus CRDs
    - group: "*"
      kind: "*"

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: quickwit
  namespace: argo
spec:
  destinations:
    - namespace: "quickwit"
      server: "*"
  sourceRepos:
    - "https://helm.quickwit.io"
    - "https://github.com/xrl/quickwit-helm-charts.git"

these AppProject definitions are naturally managed by Git but that whole state-sync-webhooktastic architecture is out of scope.

Argo Application Definitions

Vector runs in agent mode:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vector-agent
  namespace: argo
spec:
  project: kube-system
  syncPolicy:
    automated:
      prune: true
  source:
    repoURL: https://helm.vector.dev
    targetRevision: 0.37.0
    chart: vector
    helm:
      releaseName: vector-agent
      values: |
        fullnameOverride: "vector-agent"
        role: Agent
        customConfig:
          data_dir: /vector-data-dir
          api:
            enabled: true
            address: 0.0.0.0:8686
          sources:
            kubernetes_logs:
              type: kubernetes_logs
          transforms:
            filtered_logs:
              type: remap
              inputs: ["kubernetes_logs"]
              source: |
                .message = string!(.message)
                if contains(.message, "GET /ready HTTP/1.1") {
                  abort # we don't care about RX health messages
                }
            kube_logs_to_otel:
              type: remap
              inputs: ["filtered_logs"]
              source: |
                .timestamp_nanos = to_unix_timestamp!(.timestamp, unit: "nanoseconds")
                .severity_text = "INFO"
                .body = {
                  "message": .message,
                  "stream": .stream
                }
                .attributes = .kubernetes

                del(.file)
                del(.timestamp)
                del(.source_type)
                del(.stream)
                del(.kubernetes)
                del(.message)
          sinks:
            quickwit_logs:
              type: http
              method: post
              inputs: ["kube_logs_to_otel"]
              encoding:
                codec: "json"
              framing:
                method: "newline_delimited"
              uri: "http://quickwit-logs-indexer.quickwit.svc.cluster.local:7280/api/v1/otel-logs-v0_7/ingest"
        # livenessProbe -- Override default liveness probe settings, if customConfig is used requires customConfig.api.enabled true
        ## Requires Vector's API to be enabled
        livenessProbe:
          httpGet:
            path: /health
            port: api

        # readinessProbe -- Override default readiness probe settings, if customConfig is used requires customConfig.api.enabled true
        ## Requires Vector's API to be enabled
        readinessProbe:
          httpGet:
            path: /health
            port: api
  destination:
    server: https://kubernetes.default.svc
    namespace: vector

good to notice in this config:

  • kubernetes logs are filtered to remove nuisance messages
  • remap transform is used to painfully convert to OTel-compatible log format
    • use kubectl -n vector exec -it $vector_pod -- vector tap kubernetes_logs to see what a kube log looks like (I passed the output through printf '%s' "$json" | jq to format it):
{
  "file": "/var/log/pods/argo_argo-cd-repo-server-869d695dc8-fgmqc_eab867f0-389b-4b6b-9b7f-69c7c3474c45/repo-server/2.log",
  "kubernetes": {
    "container_id": "containerd://d3e051e44f0fe97790d4615998b4747abe3a1f3adae0d7a9395934d526386615",
    "container_image": "quay.io/argoproj/argocd:v2.11.7",
    "container_image_id": "quay.io/argoproj/argocd@sha256:47e3e00dc501680e77b2496c67ed2e6bff8de1c71e55b56b37b9b11fc34f2ed4",
    "container_name": "repo-server",
    "namespace_labels": {
      "kubernetes.io/metadata.name": "argo"
    },
    "node_labels": {
      "arch": "amd64",
      "beta.kubernetes.io/arch": "amd64",
      "beta.kubernetes.io/instance-type": "r5.xlarge",
      "beta.kubernetes.io/os": "linux",
      "eks.amazonaws.com/capacityType": "ON_DEMAND",
      "eks.amazonaws.com/nodegroup": "ondemand-1b-2024083115310830840000000d",
      "eks.amazonaws.com/nodegroup-image": "ami-039bdded3573af90a",
      "failure-domain.beta.kubernetes.io/region": "eu-central-1",
      "failure-domain.beta.kubernetes.io/zone": "eu-central-1b",
      "k8s.io/cloud-provider-aws": "3a3320977962e39cf45d0123eecd5f54",
      "kubernetes.io/arch": "amd64",
      "kubernetes.io/hostname": "ip-172-30-50-168.eu-central-1.compute.internal",
      "kubernetes.io/os": "linux",
      "lifecycle": "ondemand",
      "node.kubernetes.io/instance-type": "r5.xlarge",
      "nodegroup": "ondemand-eu-central-1b",
      "topology.ebs.csi.aws.com/zone": "eu-central-1b",
      "topology.k8s.aws/zone-id": "euc1-az3",
      "topology.kubernetes.io/region": "eu-central-1",
      "topology.kubernetes.io/zone": "eu-central-1b"
    },
    "pod_annotations": {
      "checksum/cm": "860c7d2900972fc99c6d7059e06a25d9646dcbf74da82484611321c8cce79377",
      "checksum/cmd-params": "4c016fc0004793cf74267de6a9da23ad69fb79f0f9cd503ffae016297898f41d"
    },
    "pod_ip": "172.30.34.204",
    "pod_ips": [
      "172.30.34.204"
    ],
    "pod_labels": {
      "app.kubernetes.io/component": "repo-server",
      "app.kubernetes.io/instance": "argo-cd",
      "app.kubernetes.io/managed-by": "Helm",
      "app.kubernetes.io/name": "argocd-repo-server",
      "app.kubernetes.io/part-of": "argocd",
      "app.kubernetes.io/version": "v2.11.7",
      "helm.sh/chart": "argo-cd-7.3.11",
      "pod-template-hash": "869d695dc8"
    },
    "pod_name": "argo-cd-repo-server-869d695dc8-fgmqc",
    "pod_namespace": "argo",
    "pod_node_name": "ip-172-30-50-168.eu-central-1.compute.internal",
    "pod_owner": "ReplicaSet/argo-cd-repo-server-869d695dc8",
    "pod_uid": "eab867f0-389b-4b6b-9b7f-69c7c3474c45"
  },
  "message": "time=\"2024-10-31T02:28:03Z\" level=info msg=\"finished unary call with code OK\" grpc.code=OK grpc.method=Check grpc.service=grpc.health.v1.Health grpc.start_time=\"2024-10-31T02:28:03Z\" grpc.time_ms=0.019 span.kind=server system=grpc",
  "source_type": "kubernetes_logs",
  "stream": "stderr",
  "timestamp": "2024-10-31T02:28:03.185047076Z"
}
  • use kubectl -n vector exec -it $vector_pod -- vector top to see how many messages are moving through the agent
  • quickwit is hit with vector's generic http sink, values taken from the quickwit vector docs
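
If you want to sanity-check that ingest endpoint without Vector in the loop, port-forward to the indexer and POST a newline-delimited JSON document yourself. A minimal sketch, assuming the service name and index from the config above (run the curl from a second terminal):

% kubectl -n quickwit port-forward svc/quickwit-logs-indexer 7280:7280
% curl -X POST http://localhost:7280/api/v1/otel-logs-v0_7/ingest \
    --data '{"body":{"message":"hello from curl","stream":"stdout"},"severity_text":"INFO","timestamp_nanos":1730342779504241000}'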

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: quickwit-logs
  namespace: argo
spec:
  project: quickwit
  syncPolicy:
    automated:
      prune: true
  source:
    repoURL: "https://github.com/xrl/quickwit-helm-charts.git"
    path: charts/quickwit
    targetRevision: per-service-env-from
    helm:
      releaseName: quickwit-logs
      values: |
        fullnameOverride: quickwit-logs
        config:
          default_index_root_uri: s3://quickwit-logs
          storage:
            s3:
              region: eu-central-1
        metastore:
          extraEnv:
            - name: QW_METASTORE_URI
              valueFrom:
                secretKeyRef:
                  name: quickwitlogs-secret
                  key: POSTGRES_URL
        searcher:
          replicaCount: 1
        serviceAccount:
          create: true
          annotations:
            eks.amazonaws.com/role-arn: "arn:aws:iam::1234567890:role/quickwit-logs"
  destination:
    server: https://kubernetes.default.svc
    namespace: quickwit

good to notice in this config:

  • uses a fork of the helm chart until this PR can be addressed
    • I have an RDS postgres instance I want to connect to. I will never put postgres credentials in my helm values 🫡
  • I only need one searcher for now
  • uses the EKS service account mechanism to inject an AWS session token into the pod
    • I will never roll/manage AWS service account credentials again 🫡
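
For reference, the secretKeyRef above expects a plain Kubernetes Secret shaped roughly like this. A sketch -- the connection string is made up, and in a gitops world you'd create this out-of-band (external-secrets, sealed-secrets, etc.) rather than commit it:

apiVersion: v1
kind: Secret
metadata:
  name: quickwitlogs-secret
  namespace: quickwit
type: Opaque
stringData:
  # hypothetical RDS connection string -- substitute your own
  POSTGRES_URL: postgres://quickwit:CHANGE_ME@my-rds.eu-central-1.rds.amazonaws.com:5432/quickwit_metastore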

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  namespace: argo
spec:
  project: kube-system
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ServerSideApply=true
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: 65.3.2
    chart: kube-prometheus-stack
    helm:
      releaseName: prometheus
      values: |
        fullnameOverride: "prometheus"

        grafana:
          env:
            GF_INSTALL_PLUGINS: "quickwit-quickwit-datasource"
          persistence:
            enabled: true
          additionalDataSources:
            - name: Quickwit Logs
              type: quickwit-quickwit-datasource
              url: http://quickwit-logs-searcher.quickwit.svc.cluster.local:7280/api/v1
              jsonData:
                index: otel-logs-v0_7
                logMessageField: body
                logLevelField: severity_text
          grafana.ini:
            auth:
              disable_login_form: true
              disable_signout_menu: true
            auth.anonymous:
              enabled: true
              org_name: Main Org.
              org_role: Editor
              # database:
              # type: postgres
              # url: "${POSTGRES_URL}"

        prometheusOperator:
          kubeletService:
            enabled: false

        prometheus:
          prometheusSpec:
            resources:
              requests:
                memory: "28Gi"
                cpu: "2000m"
            ## Prometheus StorageSpec for persistent data
            ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/storage.md
            ##
            storageSpec:
              volumeClaimTemplate:
                spec:
                  storageClassName: gp2
                  accessModes: ["ReadWriteOnce"]
                  resources:
                    requests:
                      storage: 200Gi
        kube-state-metrics:
          podSecurityPolicy:
            enabled: false
  destination:
    server: https://kubernetes.default.svc
    namespace: prometheus

good to notice in this config:

  • this configures a prometheus instance AND a daemonset (node-exporter) that exposes metrics from every kube node for prometheus to scrape
  • the GF_INSTALL_PLUGINS ENV var lets us install the quickwit plugin on every container boot (that was new to me!)
  • we configure the data source right in the helm chart values (also new to me, I usually did clickops for that)
  • the persistence configuration kind of stinks: the default rollout strategy can deadlock over the PVC. deleting the stuck replicaset works around it
    • the ultimate goal should be to use postgres to store my grafana dashboard data, no PVCs
  • I disable the grafana auth machinery because I have an OIDC gateway in front of the service
    • out of scope from this documentation
    • you can go straight to the grafana service with kubectl -n prometheus port-forward svc/prometheus-grafana 8080:80, then open http://localhost:8080 in your browser. no login required.

AWS Role IAM Terraform

resource "aws_iam_role" "quickwit-logs" {
  name = "quickwit-logs"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = format("arn:aws:iam::%s:oidc-provider/%s", var.aws_account_id, var.oidc_provider_id)
        }
        Condition = {
          StringLike = {
            "${var.oidc_provider_id}:sub" : "system:serviceaccount:quickwit:quickwit-logs",
            "${var.oidc_provider_id}:aud" : "sts.amazonaws.com"
          }
        }
      }
    ]
  })

  inline_policy {
    name = "s3-access"
    policy = jsonencode({
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Effect" : "Allow",
          "Action" : [
            "s3:ListBucket"
          ],
          "Resource" : [
            "arn:aws:s3:::quickwit-logs"
          ]
        },
        {
          "Effect" : "Allow",
          "Action" : [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListMultipartUploadParts",
            "s3:AbortMultipartUpload"
          ],
          "Resource" : [
            "arn:aws:s3:::quickwit-logs/*"
          ]
        }
      ]
    })
  }

  managed_policy_arns = []
}

good to notice in this terraform:

  • OIDC trust relationship with the Kubernetes cluster (see the link to the IAM docs above)
    • limited access to kubernetes service accounts in the quickwit namespace
  • grants a variety of S3 permissions to the quickwit-logs bucket
    • the S3 permissions were found in quickwit's AWS S3 storage docs
    • remember you want to follow the principle of least privilege, grant only what is strictly necessary

the exact mechanism of running this terraform is out of scope -- but take comfort in knowing that it was applied through a gitops workflow.
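
For completeness, the role above assumes two input variables. A sketch of their declarations (the example issuer value is hypothetical):

variable "aws_account_id" {
  description = "AWS account that owns the EKS OIDC provider"
  type        = string
}

variable "oidc_provider_id" {
  description = "EKS OIDC issuer host and path, e.g. oidc.eks.eu-central-1.amazonaws.com/id/EXAMPLE1234567890"
  type        = string
}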

Accessing Quickwit Directly

Port-forward into the quickwit searcher service: kubectl -n quickwit port-forward svc/quickwit-logs-searcher 7280:7280, then open your browser to http://localhost:7280 and you'll see this:

Quickwit UI Homepage
The Quickwit UI homepage showing the search interface

and if you go to look at the automatic otel logs index, you'll see this:

[screenshot: the auto-created otel-logs-v0_7 index overview in the Quickwit UI]

things to notice:

  • compression is a great thing. we pay for ~250MB of S3 storage for almost 5GB of JSON (there's a lot of junk in there I plan to strip out with vector's VRL)
    • at the time of writing this incarnation of quickwit had been running for less than 8 hours. this is a production kube cluster with ~150 nodes handling an enterprise workload.
    • the constant object writes, reads, write-backs might add up. I'll keep an eye on things best I can.
  • the number of splits changes often as the quickwit service creates, merges, and garbage-collects them
    • a split is a single file with all the contents of a tantivy segment in one compressed, seekable blob. check out the contents in S3 like this:
% aws s3 ls s3://quickwit-logs/otel-logs-v0_7/
2024-10-30 16:24:59    7336474 01JBFHMNRV7XQJAF3B4PHBJ2CY.split
2024-10-30 16:26:38    9817784 01JBFHQNS1J8KGC50MJ3BP6S9V.split
2024-10-30 16:27:33   29905109 01JBFHSATR3TXDDDD65R3K4RHN.split
2024-10-30 17:25:12      63586 01JBFN2RTR4Y10873NBMP0A3NQ.split
2024-10-30 17:25:17      80415 01JBFN2XQ4PB5PJJBD2G8SKMCQ.split
2024-10-30 19:22:47  149153365 01JBFVT4BJ6DYAB86SK8V6B439.split
[[[ SNIP ]]]
  • the quickwit project has not tackled the unenviable task of authentication at the quickwit level. don't expose quickwit to the open internet.
    • grafana is the full-fledged dashboard builder so it's probably best to leave the auth to them. when I get grafana OIDC working I'll update this document.


Querying the Kubernetes Logs from the Quickwit UI

When visiting http://localhost:7280, I can query the Kubernetes logs and just make sure I understand what a document looks like.

The Query Editor panel of the Quickwit UI looks like this when our index has data flowing:

[screenshot: the Quickwit UI Query Editor with log hits for the otel-logs-v0_7 index]

and one log's JSON looks like:

{
  "attributes": {
    "container_id": "containerd://9f6e5be434e97b7e37628b5f7a2423c4ec293939fbf58b22a66446ebff54ba87",
    "container_image": "registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7",
    "container_image_id": "registry.k8s.io/ingress-nginx/controller@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7",
    "container_name": "controller",
    "namespace_labels": {
      "kubernetes.io/metadata.name": "ingress-nginx"
    },
    "node_labels": {
      "arch": "amd64",
      "beta.kubernetes.io/arch": "amd64",
      "beta.kubernetes.io/instance-type": "r5.xlarge",
      "beta.kubernetes.io/os": "linux",
      "eks.amazonaws.com/capacityType": "ON_DEMAND",
      "eks.amazonaws.com/nodegroup": "ondemand-1a-20240831153108298900000007",
      "eks.amazonaws.com/nodegroup-image": "ami-039bdded3573af90a",
      "failure-domain.beta.kubernetes.io/region": "eu-central-1",
      "failure-domain.beta.kubernetes.io/zone": "eu-central-1a",
      "k8s.io/cloud-provider-aws": "3a3320977962e39cf45d0123eecd5f54",
      "kubernetes.io/arch": "amd64",
      "kubernetes.io/hostname": "ip-172-30-22-198.eu-central-1.compute.internal",
      "kubernetes.io/os": "linux",
      "lifecycle": "ondemand",
      "node.kubernetes.io/instance-type": "r5.xlarge",
      "nodegroup": "ondemand-eu-central-1a",
      "topology.ebs.csi.aws.com/zone": "eu-central-1a",
      "topology.k8s.aws/zone-id": "euc1-az2",
      "topology.kubernetes.io/region": "eu-central-1",
      "topology.kubernetes.io/zone": "eu-central-1a"
    },
    "pod_annotations": {
      "kubectl.kubernetes.io/restartedAt": "2023-12-06T01:04:59Z"
    },
    "pod_ip": "172.30.11.35",
    "pod_ips": [
      "172.30.11.35"
    ],
    "pod_labels": {
      "app.kubernetes.io/component": "controller",
      "app.kubernetes.io/instance": "ingress-nginx",
      "app.kubernetes.io/managed-by": "Helm",
      "app.kubernetes.io/name": "ingress-nginx",
      "app.kubernetes.io/part-of": "ingress-nginx",
      "app.kubernetes.io/version": "1.11.3",
      "helm.sh/chart": "ingress-nginx-4.11.3",
      "pod-template-hash": "6bc959cb88"
    },
    "pod_name": "ingress-nginx-controller-6bc959cb88-fp97t",
    "pod_namespace": "ingress-nginx",
    "pod_node_name": "ip-172-30-22-198.eu-central-1.compute.internal",
    "pod_owner": "ReplicaSet/ingress-nginx-controller-6bc959cb88",
    "pod_uid": "3d301f4e-b13a-45b3-8853-99b836e464a1"
  },
  "body": {
    "message": "172.30.9.182 - - [31/Oct/2024:02:46:19 +0000] \"GET /inventory?id=8934812a-40c7-4df9-8b79-32a02f358282 HTTP/1.1\" 200 11645 \"-\" \"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)\" 392 0.159 [production-rx-production-80] [] 172.30.55.162:80 11624 0.160 200 2d56343d741dbdbbc2a2d0dfbbcbe7f8",
    "stream": "stdout"
  },
  "severity_text": "INFO",
  "timestamp_nanos": 1730342779504241000
}

Things to note:

  • the object structure is dictated by my VRL from the vector-agent Application definition above, but copied here:
kube_logs_to_otel:
  type: remap
  inputs: ["filtered_logs"]
  source: |
    .timestamp_nanos = to_unix_timestamp!(.timestamp, unit: "nanoseconds")
    .severity_text = "INFO"
    .body = {
      "message": .message,
      "stream": .stream
    }
    .attributes = .kubernetes

    del(.file)
    del(.timestamp)
    del(.source_type)
    del(.stream)
    del(.kubernetes)
    del(.message)
  • This is the first time I have tried to match the otel-logs schema

My must-haves to perform field-based search (each sketched in query syntax after this list):

  • a field must be an exact value: attributes.container_id:"containerd://9f6e5be434e97b7e37628b5f7a2423c4ec293939fbf58b22a66446ebff54ba87"
  • a field must be one of a list of values
  • a field must not be a value -attributes.container_id:"containerd://9f6e5be434e97b7e37628b5f7a2423c4ec293939fbf58b22a66446ebff54ba87" (note the minus)
  • a field should be present (no specific value in mind)
  • a field should not be present
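
For reference, here's how I'd express those in Quickwit's query language. A sketch -- double-check the query-language docs for your Quickwit version, especially the IN set query and the field-presence wildcard:

attributes.container_name:"controller"                  # exact value
attributes.pod_namespace: IN [quickwit vector]          # one of a list of values
-attributes.container_name:"controller"                 # must not be a value
attributes.pod_owner:*                                  # field is present
-attributes.pod_owner:*                                 # field is absent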

The hits just keep coming!

[screenshot: more log hits streaming into the Quickwit UI]

Managing Log Retention

How do we tweak the otel logs retention? 🤔
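
I haven't wired this up yet, but per my read of the Quickwit docs, retention is a policy in the index config and the janitor deletes expired splits. A sketch of the relevant stanza (whether you can apply it in place to the auto-created otel-logs-v0_7 index depends on your Quickwit version):

version: 0.7
index_id: otel-logs-v0_7
# ... doc_mapping elided ...
retention:
  period: 30 days   # drop splits once they age out
  schedule: daily   # how often the janitor evaluates the policy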

Grafana

You don't want to use the Quickwit UI for day-to-day observability/incident response. It's handy, for sure, but you want to build dashboards that lay out all the data at once.

This tutorial focuses on the Grafana version bundled with the chart pinned above (11.x at the time of writing).

Creating a dashboard

Remember that we're building on top of the popular kube-prometheus-stack helm chart. This chart injects a whole suite of prometheus-centric dashboards into Grafana, creating what is essentially a stdlib of monitoring. And we're totally able to add more dashboards.

  1. Make a Logs folder where we can store our quickwit-backed dashboards:

Creating Logs folder in Grafana
Creating a new folder to organize Quickwit log dashboards

  2. Select "New Dashboard" from the dropdown menu:

New Dashboard option
Selecting the New Dashboard option from Grafana menu

  3. And we'll create a new dashboard to lay out some smart widgets:

Dashboard creation screen
Empty dashboard ready for new panels

  4. Add your first visualization panel:

Add visualization panel
Adding a new visualization panel to the dashboard

  5. Let's just start by saving the empty dashboard:

Save dashboard dialog
Saving the dashboard with initial configuration

  6. Confirm the save operation:

Dashboard save confirmation
Confirming dashboard save operation

Summarizing Logs with Aggregations

Aggregations, or bucketing, are used for generating summary statistics of a dataset. The dataset is split into multiple buckets and we can ask Quickwit to generate summary statistics on each bucket. The usual suspects for summarizing a bucket: count, average, min, max, sum, percentiles, etc. The docs will tell you that aggregations are only performed on fast fields -- stats are calculated from the columnar portion of the quickwit split without having to read all the data.

Let's use aggregations to identify the noisiest kube cluster namespaces. We want to group by kube namespace and emit a count metric for each group. We'll further aggregate our data by time so we get a sense for the trends.
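
Under the hood, the panel we're about to build boils down to an aggregation query against Quickwit's search API. A hand-rolled sketch you can try through a searcher port-forward (grouping on attributes.pod_namespace for simplicity; the aggregation syntax is the Elasticsearch-compatible subset, so verify against the aggregations docs for your version):

% kubectl -n quickwit port-forward svc/quickwit-logs-searcher 7280:7280
% curl -X POST http://localhost:7280/api/v1/otel-logs-v0_7/search \
    -H 'Content-Type: application/json' \
    --data '{
      "query": "*",
      "max_hits": 0,
      "aggs": {
        "per_5m": {
          "date_histogram": {"field": "timestamp_nanos", "fixed_interval": "5m"},
          "aggs": {
            "per_namespace": {"terms": {"field": "attributes.pod_namespace", "size": 10}}
          }
        }
      }
    }'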

  1. Add a visualization, choose the Quickwit Logs data source (remember we configured this as part of the helm values for the kube-prometheus-stack):

Selecting Quickwit Logs datasource
Selecting the Quickwit Logs data source for the visualization

  2. You'll then be met with an intimidating blank panel editing screen:

Blank panel editor
The initial blank panel editing interface

  3. Be sure your panel is the Metric type and specify the timestamp_nanos field for the aggregation:

Metric type configuration
Configuring the metric type and timestamp field

I don't like the line graph. In the top-right you can change the visualization type to Bar Chart:

Changing to bar chart
Switching the visualization type to Bar Chart

Set the first group by statement to build a date histogram on the timestamp_nanos field, then hit Grafana's dashboard refresh icon and you'll see:

Initial date histogram
Initial date histogram visualization

You'll need to hit the Grafana refresh button often; the quickwit plugin doesn't seem to reissue the query when you change things in the UI. This button:

Refresh button location
Location of the refresh button for updating visualizations

Make the bar chart more intelligible by increasing the bucketing interval; set it to 5m:

Adjusting bucket interval
Setting the bucket interval to 5 minutes

And we'll further subdivide those buckets by adding another term aggregation; let's group by attributes.namespace_labels.kubernetes.io/metadata.name. Click on the + icon to the right of the Group By expression builder:

Adding term aggregation
Adding a new term aggregation

Group by configuration
Configuring the group by settings

And thankfully you can type-ahead to discover relevant fields:

Type-ahead field discovery
Using type-ahead to find relevant fields

And you'll get something like this, with the bars side-by-side:

Side by side bars
Visualization showing side-by-side bar chart

Search the options on the right side for stacking:

Stacking option search
Locating the stacking option in settings

And then choose normal:

Normal stacking selection
Selecting normal stacking mode

Give the panel a smart title:

Panel title entry
Adding a descriptive title to the panel

And that's enough of the screenshot parade. Hit Save on the page and now you have your first starter dashboard:

Final dashboard view
The completed dashboard with configured visualization


Viewing Logs

Aggregations are great for summarizing what's going on. But what about when it's time to dig into specifics? Thankfully, Grafana has a built-in panel type for displaying log data. It has a few gotchas but let's go ahead and add a new panel to our dashboard:

Note: I am filtering to quickwit's pods

  1. Add another visualization

Adding new visualization
Adding a new visualization panel to the dashboard

  2. Change the visualization type from "Time series" to "Logs"

Changing to Logs visualization
Switching visualization type to Logs view

  3. Change query type to "Logs"

Changing query type
Setting the query type to Logs

  4. Click "refresh dashboard" to fetch data and populate the panel

Refreshing dashboard
Initial view after refreshing the dashboard

Notice the little chevrons and the lack of log textual data. Grafana is definitely fetching the data from quickwit but the data is not coming back in a format matching the panel's conventions. It's not really documented anywhere, but the logs panel presents data based on its position in the dataset returned by the datasource plugin. The logs panel does not depend on the name of the field, just the position. You can see the data using the table view toggle:

Table view of logs
Table view showing the raw data structure

Notice how $qw_message is the second column and it's blank. I'm not sure what the $qw_message template variable is used for, but we want to reorder the dataset and put body.message as the second column. Good news for everyone: Grafana has data transformations (they were new to me). Transformations seem like a powerful feature, so let's try one out here. Switch from Query to Transform data (0):

Transform data option
Accessing the data transform options

Click "Add transformation" and search for "organize":

Adding organize transformation
Adding the organize transformation

This will then show the columns as they are currently sorted:

Current column sorting
Current order of columns before reorganization

Scroll down, find the body.message column and drag it up. You'll be rewarded with an instantaneous re-rendering showing the logs:

Reordered columns with logs
Logs displaying correctly after column reordering
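
If you keep dashboards as code, this same reordering lands in the panel JSON as an organize transformation, roughly like so (field names from our schema; the exact index values depend on which other columns your query returns):

"transformations": [
  {
    "id": "organize",
    "options": {
      "excludeByName": {},
      "renameByName": {},
      "indexByName": {
        "timestamp_nanos": 0,
        "body.message": 1
      }
    }
  }
]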

Good time to hit "save" -- click on the dashboard's name in the breadcrumbs ("Kube logs") to leave the editor:

Dashboard breadcrumb navigation
Using breadcrumb navigation to exit editor

Which now shows:

Updated dashboard view
Dashboard view after saving changes

Let's drag the logs panel below the aggregations and make it full width:

Final dashboard layout
Final dashboard layout with full-width logs panel

Looking good folks!

@ToroNZ commented Feb 1, 2025

Thanks for putting this together! 🍻
