
Root Causes of the Cloudflare ↔ Backstage Connectivity Failures

  1. Incorrect DNS Resolution

    • Problem: Cloudflare Tunnel tried to reach backstage.backstage-dev (short name) but needed the fully qualified Kubernetes service DNS name:
      backstage.backstage-dev.svc.cluster.local
    • Symptom: i/o timeout errors in Cloudflare logs.
    • Fix: Updated the Terraform tunnel config to use the full DNS path (see the ingress sketch after this list):
      service = "http://backstage.backstage-dev.svc.cluster.local:80"
  2. Misaligned Ports

    • Problem:
      • Backstage listened on port 7007 (default).
      • Cloudflare Tunnel sent traffic to port 80 (no service listening there).
    • Symptom: Connection refused errors.
    • Fix (sketched after this list):
      • Helm values.yaml aligned service.port: 80 with targetPort: 7007.
      • Backstage config explicitly set listen: { port: 7007, host: 0.0.0.0 }.
  3. Invalid Health Checks

    • Problem:
      • Probes used /* (invalid path) → Kubernetes marked pods as unhealthy.
      • Cloudflare routed traffic to "unready" pods.
    • Symptom: Pods stuck in CrashLoopBackOff.
    • Fix:
      • Temporarily switched probes to / (root path); see the probe sketch after this list.
      • Future: Implement /healthcheck endpoint.
  4. Hardcoded Localhost URLs

    • Problem:
      • app-config.yaml used http://localhost:7007 → Pods couldn’t be reached externally.
    • Fix:
      • Updated to dynamic URLs:
        baseUrl: http://{{ .Values.backstage.internalHost }}:7007
  5. Missing Kubernetes Resource Headers

    • Problem:
      • Helm templates lacked apiVersion/kind → Helm failed to deploy.
    • Fix:
      • Converted app-config.yaml into a proper ConfigMap resource (see the ConfigMap sketch after this list).
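
For root cause 1, the tunnel's ingress rules need the fully qualified service DNS name. A minimal sketch of the ingress rules the Terraform change amounts to, shown as plain cloudflared config; the hostname backstage.example.com is a placeholder, not from the original setup:

```yaml
# cloudflared ingress rules (sketch) – hostname is a placeholder
ingress:
  - hostname: backstage.example.com
    service: http://backstage.backstage-dev.svc.cluster.local:80
  - service: http_status:404   # required catch-all rule
```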
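
For root causes 2 and 3, the port alignment and the temporary root-path probes can be expressed in the chart's values.yaml. A minimal sketch, assuming the chart exposes service.port/targetPort and top-level probe blocks as in the gist's snippets (exact key names depend on the chart):

```yaml
# values.yaml (sketch): route external port 80 to Backstage's 7007
service:
  type: ClusterIP
  port: 80          # port the Cloudflare Tunnel targets
  targetPort: 7007  # port Backstage actually listens on

# Temporary root-path probes; switch to /healthcheck once it exists
readinessProbe:
  httpGet:
    path: /
    port: 7007
livenessProbe:
  httpGet:
    path: /
    port: 7007
```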
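
For root causes 4 and 5, the app-config ships as a real ConfigMap so Helm gets the apiVersion/kind/metadata it expects, and the baseUrl is templated instead of hardcoded to localhost. A minimal sketch; the ConfigMap name and the app.baseUrl line are assumptions, only backend.listen and the internalHost value come from the snippets above:

```yaml
# templates/app-config.yaml (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: backstage-app-config   # name is an assumption
data:
  app-config.yaml: |
    app:
      baseUrl: http://{{ .Values.backstage.internalHost }}:7007
    backend:
      baseUrl: http://{{ .Values.backstage.internalHost }}:7007
      listen:
        port: 7007
        host: 0.0.0.0   # bind to all interfaces, not localhost
```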

How These Changes Fixed the Issues

| Issue | Change Applied | Result |
| --- | --- | --- |
| Cloudflare couldn’t resolve DNS | Used the full Kubernetes DNS name (svc.cluster.local) | Tunnel now finds the service |
| Port mismatch | Aligned service.port: 80 with targetPort: 7007 | Traffic reaches Backstage |
| Unhealthy pods | Fixed probes (temporarily to /) | Kubernetes routes traffic only to ready pods |
| Localhost binding | Set host: 0.0.0.0 in the Backstage config | Pods accept external connections |
| Invalid Helm templates | Added apiVersion/kind to all templates | Helm deploys successfully |

Key Lessons Learned

  1. Kubernetes Networking 101:
    Always use full DNS names (<service>.<namespace>.svc.cluster.local) for internal communication.
  2. Probes Are Critical:
    Invalid health checks break pod availability, even if the app is running.
  3. Helm Best Practices:
    Templates must be valid Kubernetes manifests (apiVersion, kind, metadata).

Visual Flow (Before vs. After)

Before (Broken)

```mermaid
graph LR
  A[Cloudflare] -->|DNS: backstage.backstage-dev| B(Timeout)
  B --> C[Pod Unhealthy: /* probes failed]
```

After (Fixed)

```mermaid
graph LR
  A[Cloudflare] -->|DNS: backstage.backstage-dev.svc.cluster.local:80| B[Healthy Pod]
  B -->|Probes: / → 200 OK| C[Traffic Flows]
```

Next Steps

  1. Implement /healthcheck (permanent fix for probes; see the probe sketch below).
  2. Test in Staging: Verify end-to-end connectivity.
  3. Document: Add a troubleshooting guide for similar issues.
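
Once the /healthcheck endpoint lands, the temporary root-path probes can point at it. A minimal sketch; the delay/period values are placeholders, not tuned numbers:

```yaml
readinessProbe:
  httpGet:
    path: /healthcheck
    port: 7007
  initialDelaySeconds: 10
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthcheck
    port: 7007
  initialDelaySeconds: 10
  periodSeconds: 10
```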

These changes transformed the setup from failing silently to routing reliably. Let me know if you’d like a deeper dive into any part! 🔍
