
Root Causes of the Cloudflare ↔ Backstage Connectivity Failures

  1. Incorrect DNS Resolution

    • Problem: Cloudflare Tunnel tried to reach backstage.backstage-dev (short name) but needed the fully qualified Kubernetes service DNS name:
      backstage.backstage-dev.svc.cluster.local
    • Symptom: i/o timeout errors in Cloudflare logs.
    • Fix: Updated the Terraform tunnel config to use the full DNS path (see the ingress sketch after this list):
      service = "http://backstage.backstage-dev.svc.cluster.local:80"
  2. Misaligned Ports

    • Problem:
      • Backstage listened on port 7007 (default).
      • Cloudflare Tunnel sent traffic to port 80 (no service listening there).
    • Symptom: Connection refused errors.
    • Fix (sketched after this list):
      • Helm values.yaml aligned service.port: 80 with targetPort: 7007.
      • Backstage config explicitly set listen: { port: 7007, host: 0.0.0.0 }.
  3. Invalid Health Checks

    • Problem:
      • Probes used /* (invalid path) → Kubernetes marked pods as unhealthy.
      • Cloudflare routed traffic to "unready" pods.
    • Symptom: Pods stuck in CrashLoopBackOff.
    • Fix:
      • Temporarily switched probes to / (root path); see the probe sketch after this list.
      • Future: Implement /healthcheck endpoint.
  4. Hardcoded Localhost URLs

    • Problem:
      • app-config.yaml used http://localhost:7007 → Pods couldn’t be reached externally.
    • Fix:
      • Updated to dynamic URLs:
        baseUrl: http://{{ .Values.backstage.internalHost }}:7007
  5. Missing Kubernetes Resource Headers

    • Problem:
      • Helm templates lacked apiVersion/kind → Helm failed to deploy.
    • Fix:
      • Converted app-config.yaml into a proper ConfigMap resource (see the ConfigMap sketch after this list).
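
For root cause 1, the tunnel's ingress rules need the fully qualified service DNS name. A minimal sketch of the ingress rules the Terraform change amounts to, shown as plain cloudflared config; the hostname backstage.example.com is a placeholder, not from the original setup:

```yaml
# cloudflared ingress rules (sketch) – hostname is a placeholder
ingress:
  - hostname: backstage.example.com
    service: http://backstage.backstage-dev.svc.cluster.local:80
  - service: http_status:404   # required catch-all rule
```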
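
For root causes 2 and 3, the port alignment and the temporary root-path probes can be expressed in the chart's values.yaml. A minimal sketch, assuming the chart exposes service.port/targetPort and top-level probe blocks as in the gist's snippets (exact key names depend on the chart):

```yaml
# values.yaml (sketch): route external port 80 to Backstage's 7007
service:
  type: ClusterIP
  port: 80          # port the Cloudflare Tunnel targets
  targetPort: 7007  # port Backstage actually listens on

# Temporary root-path probes; switch to /healthcheck once it exists
readinessProbe:
  httpGet:
    path: /
    port: 7007
livenessProbe:
  httpGet:
    path: /
    port: 7007
```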
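
For root causes 4 and 5, the app-config ships as a real ConfigMap so Helm gets the apiVersion/kind/metadata it expects, and the baseUrl is templated instead of hardcoded to localhost. A minimal sketch; the ConfigMap name and the app.baseUrl line are assumptions, only backend.listen and the internalHost value come from the snippets above:

```yaml
# templates/app-config.yaml (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: backstage-app-config   # name is an assumption
data:
  app-config.yaml: |
    app:
      baseUrl: http://{{ .Values.backstage.internalHost }}:7007
    backend:
      baseUrl: http://{{ .Values.backstage.internalHost }}:7007
      listen:
        port: 7007
        host: 0.0.0.0   # bind to all interfaces, not localhost
```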

How These Changes Fixed the Issues

| Issue | Change Applied | Result |
| --- | --- | --- |
| Cloudflare couldn’t resolve DNS | Used the full Kubernetes DNS name (svc.cluster.local) | Tunnel now finds the service |
| Port mismatch | Aligned service.port: 80 with targetPort: 7007 | Traffic reaches Backstage |
| Unhealthy pods | Fixed probes (temporarily to /) | Kubernetes routes traffic only to ready pods |
| Localhost binding | Set host: 0.0.0.0 in the Backstage config | Pods accept external connections |
| Invalid Helm templates | Added apiVersion/kind to all templates | Helm deploys successfully |

Key Lessons Learned

  1. Kubernetes Networking 101:
    Always use full DNS names (<service>.<namespace>.svc.cluster.local) for internal communication.
  2. Probes Are Critical:
    Invalid health checks break pod availability, even if the app is running.
  3. Helm Best Practices:
    Templates must be valid Kubernetes manifests (apiVersion, kind, metadata).

Visual Flow (Before vs. After)

Before (Broken)

```mermaid
graph LR
  A[Cloudflare] -->|DNS: backstage.backstage-dev| B(Timeout)
  B --> C[Pod Unhealthy: /* probes failed]
```

After (Fixed)

```mermaid
graph LR
  A[Cloudflare] -->|DNS: backstage.backstage-dev.svc.cluster.local:80| B[Healthy Pod]
  B -->|Probes: / → 200 OK| C[Traffic Flows]
```

Next Steps

  1. Implement /healthcheck (permanent fix for probes; see the probe sketch below).
  2. Test in Staging: Verify end-to-end connectivity.
  3. Document: Add a troubleshooting guide for similar issues.
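
Once the /healthcheck endpoint lands, the temporary root-path probes can point at it. A minimal sketch; the delay/period values are placeholders, not tuned numbers:

```yaml
readinessProbe:
  httpGet:
    path: /healthcheck
    port: 7007
  initialDelaySeconds: 10
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthcheck
    port: 7007
  initialDelaySeconds: 10
  periodSeconds: 10
```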

These changes transformed the setup from failing silently to routing reliably. Let me know if you’d like a deeper dive into any part! 🔍
