@aungkyawminn
Last active October 28, 2025 07:27
Designing a Scalable and Cost-Efficient Kong Gateway Architecture on AWS

Using Amazon ECS and Aurora Serverless


1. Overview

This architecture demonstrates a Kong Gateway Hybrid Mode (Control Plane + Data Plane) deployment using AWS ECS (Fargate) and Aurora Serverless PostgreSQL, optimized for scalability, security, and cost efficiency.


2. Core Components

| Layer | AWS Service | Description |
|---|---|---|
| Networking | VPC (Public/Private Subnets), Route53 | Provides network isolation and internal routing. |
| Load Balancer | Shared ALB | Serves both API and Admin traffic through the same entry point (kong-gw.example.com). |
| Control Plane (CP) | ECS Fargate Service | Manages configuration, plugins, consumers, and certs. Accessible only via Data Plane proxy. |
| Data Plane (DP) | ECS Fargate Service | Processes API traffic and securely proxies Admin API requests to Control Plane. |
| Database | Aurora Serverless PostgreSQL | Central configuration store for Control Plane. |
| Service Discovery | ECS Namespace (Cloud Map, internal) | Enables private DNS resolution between ECS services. |
| Security | Security Groups, IAM, WAF, API Keys | Enforces least privilege, authentication, and controlled access. |

3. Overall Architecture Diagram (Unified Domain)

flowchart TB
classDef public fill:#E6F4FF,stroke:#1A73E8,stroke-width:1px,color:#000
classDef private fill:#E8F5E9,stroke:#2E7D32,stroke-width:1px,color:#000
classDef secure fill:#FFF3E0,stroke:#F57C00,stroke-width:1px,color:#000
classDef data fill:#FCE4EC,stroke:#C2185B,stroke-width:1px,color:#000

subgraph Internet
    User["Public API Clients"]:::public
    Admin["Authorized Admin<br/>(IP Restricted + API Key)"]:::secure
end

subgraph "AWS Route53"
    DNS["kong-gw.example.com"]:::public
end

subgraph VPC
  direction TB
  subgraph "Public Subnets"
    ALB["Shared ALB<br>(All traffic routed to Data Plane)"]:::public
  end
  subgraph "Private Subnets"
    DP["ECS Service: Kong Data Plane<br>(Auto Scaling)<br>Routes: / → APIs, /kong-admin → CP Proxy"]:::private
    CP["ECS Service: Kong Control Plane<br>(Private Access Only)"]:::private
    DB["Aurora Serverless PostgreSQL"]:::data
  end
end

User -->|"https://kong-gw.example.com"| DNS --> ALB --> DP
Admin -->|"https://kong-gw.example.com/kong-admin<br/>(API Key + IP Restriction)"| DNS --> ALB --> DP
DP -->|"Cluster sync over mTLS (8005) + Admin API (8001)<br/>via ECS Namespace"| CP
CP --> DB

Explanation:
All external access (API + Admin) enters via a single endpoint https://kong-gw.example.com through the shared ALB.
The Data Plane proxies /kong-admin traffic to the private Control Plane’s Admin API (8001) while continuing to sync configuration over the mTLS-secured cluster channel (8005).
The Control Plane remains private and connects to Aurora Serverless.


4. Network and Access Control Diagram (Aligned)

flowchart TD
classDef public fill:#E6F4FF,stroke:#1A73E8,stroke-width:1px,color:#000
classDef private fill:#E8F5E9,stroke:#2E7D32,stroke-width:1px,color:#000
classDef secure fill:#FFF3E0,stroke:#F57C00,stroke-width:1px,color:#000
classDef data fill:#FCE4EC,stroke:#C2185B,stroke-width:1px,color:#000

VPC["VPC 10.0.0.0/16 (Public + Private Subnets)"]:::private

ALB_SG["SG: Shared ALB<br>Inbound: TCP 443 from 0.0.0.0/0"]:::secure
DP_SG["SG: Data Plane<br>Inbound: TCP 8000 from SG:ALB<br>Outbound: TCP 8001,8005 to SG:CP"]:::secure
CP_SG["SG: Control Plane<br>Inbound: TCP 8001,8005 from SG:DP<br>Outbound: TCP 5432 to SG:DB"]:::secure
DB_SG["SG: Aurora DB<br>Inbound: TCP 5432 from SG:CP only"]:::secure

ALB["Shared ALB<br>Listener: HTTPS 443<br>Target: Data Plane only"]:::public
DP["ECS Tasks: Kong Data Plane"]:::private
CP["ECS Tasks: Kong Control Plane (Private)"]:::private
DB["Aurora Serverless PostgreSQL<br>Port 5432"]:::data

ALB --> DP
DP -->|"mTLS Sync (8005) + Admin Proxy (8001)"| CP
CP -->|"TCP 5432"| DB

Explanation:

  • The ALB handles all HTTPS traffic and forwards to the Data Plane only.
  • The Data Plane securely communicates with the Control Plane on ports 8001 (Admin API) and 8005 (mTLS-secured cluster sync).
  • Aurora Serverless accepts inbound traffic only from Control Plane SG.
  • All management and configuration flows remain inside the private subnet.
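
The security group chain above can be expressed as data and checked mechanically. The following is a minimal sketch, assuming the five inbound rules shown in the diagram are the only ones in place; the `Rule` type and the helper names are illustrative and not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str  # security group (or CIDR) the traffic comes from
    target: str  # security group that receives the traffic
    port: int    # TCP port


# Inbound rules transcribed from the diagram above.
RULES = {
    Rule("0.0.0.0/0", "ALB", 443),
    Rule("ALB", "DP", 8000),
    Rule("DP", "CP", 8001),
    Rule("DP", "CP", 8005),
    Rule("CP", "DB", 5432),
}


def is_allowed(source: str, target: str, port: int) -> bool:
    """True only if an explicit rule permits this exact hop."""
    return Rule(source, target, port) in RULES


def direct_targets(source: str) -> set:
    """Security groups that `source` may open a connection to."""
    return {rule.target for rule in RULES if rule.source == source}


if __name__ == "__main__":
    print(is_allowed("ALB", "CP", 8001))  # the ALB cannot reach the Control Plane
    print(direct_targets("DP"))           # the Data Plane only talks to the Control Plane
```

A check like this could run in CI against the rule set exported from infrastructure code, so that an accidental ALB-to-CP or DP-to-DB rule is caught before deployment.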

5. Control Plane Access via Data Plane Proxy

sequenceDiagram
    participant Admin as "Authorized Admin"
    participant DP as "Kong Data Plane (Proxy /kong-admin)"
    participant CP as "Kong Control Plane (Private ECS)"
    participant DB as "Aurora Serverless"

    Admin->>DP: HTTPS /kong-admin (API Key + IP Restriction)
    DP->>CP: HTTP 8001 (Admin API)
    CP->>DB: Read/Write Configurations
    CP-->>DP: Response
    DP-->>Admin: JSON (Proxied Admin API Response)

How it works:

  • The Admin API (8001) remains private.
  • Data Plane defines a Service + Route to forward /kong-admin to Control Plane:
    Service: kong-admin
    URL: http://kong-cp.namespace.local:8001
    Route: /kong-admin
    Plugins: key-auth, ip-restriction
    
  • Only authorized IPs and API keys can reach it.
  • Kong Manager UI runs locally with:
    VUE_APP_KONG_ADMIN_API=https://kong-gw.example.com/kong-admin
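
The Service, Route, and plugin settings listed above could be written as a Kong declarative configuration. The sketch below builds that structure as a Python dictionary and prints it as JSON. It assumes the declarative format version "3.0", the `key_names` option of key-auth, and the `allow` option of ip-restriction; the route name, the header name `apikey`, and the CIDR `203.0.113.0/24` are placeholders chosen for illustration.

```python
import json


def build_admin_proxy_config(cp_url, allowed_cidrs, key_header="apikey"):
    """Return a declarative config that exposes /kong-admin behind two plugins."""
    return {
        "_format_version": "3.0",
        "services": [
            {
                "name": "kong-admin",
                "url": cp_url,
                "routes": [
                    {
                        "name": "kong-admin-route",  # hypothetical route name
                        "paths": ["/kong-admin"],
                        "strip_path": True,  # forward /kong-admin/services as /services
                    }
                ],
                "plugins": [
                    # Require an API key in the given header.
                    {"name": "key-auth", "config": {"key_names": [key_header]}},
                    # Accept requests only from the listed admin networks.
                    {"name": "ip-restriction", "config": {"allow": list(allowed_cidrs)}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    config = build_admin_proxy_config(
        "http://kong-cp.namespace.local:8001",
        ["203.0.113.0/24"],  # placeholder admin CIDR
    )
    print(json.dumps(config, indent=2))
```

In hybrid mode the Data Plane receives its configuration from the Control Plane, so a definition like this would be applied through the Control Plane and then synced. key-auth also needs a consumer with a key credential, which is omitted here.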

6. Scaling & Cost Optimization

flowchart TD
    classDef startEnd fill:#E3F2FD,stroke:#1E88E5,color:#000,stroke-width:1px
    classDef action fill:#E8F5E9,stroke:#43A047,color:#000,stroke-width:1px
    classDef decision fill:#FFF3E0,stroke:#FB8C00,color:#000,stroke-width:1px
    classDef result fill:#F3E5F5,stroke:#8E24AA,color:#000,stroke-width:1px

    A["1️⃣ Data Plane scales up<br/>(CloudWatch Alarm or ECS Event)"]:::startEnd
    B["2️⃣ Start Control Plane<br/>(ECS desiredCount = 1)"]:::action
    C["3️⃣ CP connects to Aurora<br/>Aurora auto-resumes from 0 ACU"]:::action
    D["4️⃣ Wait 5 min for DP tasks<br/>to sync configuration"]:::action
    E["5️⃣ Start 30 min cooldown timer"]:::action
    F{"6️⃣ Any new scale-up or<br/>config push during cooldown?"}:::decision
    G["7️⃣ Extend cooldown + 15 min"]:::action
    H{"8️⃣ DP stable & idle<br/>after cooldown?"}:::decision
    I["9️⃣ Stop Control Plane<br/>(desiredCount = 0)"]:::action
    J["🔟 Aurora auto-pauses to 0 ACU<br/>DP continues serving cached config"]:::result

    A --> B --> C --> D --> E --> F
    F -->|Yes| G --> F
    F -->|No| H
    H -->|No| F
    H -->|Yes| I --> J
    J -->|Next traffic spike| A
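
The cooldown logic in the flow above can be simulated with a few lines of arithmetic. This is a sketch under stated assumptions, not the author's automation: times are minutes since the scale-up, events during the 5-minute sync wait are ignored, and `idle_from_min` is a hypothetical input marking when the Data Plane becomes stable and idle.

```python
SYNC_WAIT_MIN = 5    # step 4: wait for DP tasks to sync
COOLDOWN_MIN = 30    # step 5: initial cooldown
EXTENSION_MIN = 15   # step 7: extension per new event


def control_plane_stop_minute(start_min, event_minutes=(), idle_from_min=0):
    """Return the minute at which the Control Plane would be stopped.

    event_minutes: times of scale-ups or config pushes after the start.
    idle_from_min: first minute at which the Data Plane is stable and idle.
    """
    cooldown_start = start_min + SYNC_WAIT_MIN
    deadline = cooldown_start + COOLDOWN_MIN
    for t in sorted(event_minutes):
        # Only events inside the (possibly extended) cooldown window count.
        if cooldown_start <= t < deadline:
            deadline += EXTENSION_MIN
    # Step 8: even after the cooldown, wait until the DP is stable and idle.
    return max(deadline, idle_from_min)


if __name__ == "__main__":
    print(control_plane_stop_minute(0))            # no events during cooldown
    print(control_plane_stop_minute(0, [10, 45]))  # two extensions
```

An event that arrives after the deadline is not counted here, because in the flow it would start a new cycle from step 1.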

7. Cost and Efficiency Summary

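| Component | Scaling Behavior | Cost Benefit |
|---|---|---|
| Aurora Serverless | Auto-scale & pause | No fixed DB compute cost (min 0 ACU); storage is still billed |
| ECS Control Plane | Started and stopped on demand (desiredCount 1 ↔ 0) | Zero runtime cost when idle |
| ECS Data Plane | Auto Scaling | Pay only for actual load |
| Shared ALB | Unified entrypoint | Single ALB cost (no separate load balancer for the Control Plane) |
| CloudWatch | Pay-per-metric | Lightweight observability |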

8. Security Highlights (Aligned)

  • Control Plane fully private — no direct ALB exposure.
  • /kong-admin proxy protected with API Key and IP restriction.
  • SG-to-SG communication ensures minimal attack surface.
  • Aurora only accepts inbound from CP SG.
  • mTLS sync (8005) between DP↔CP.
  • Centralized logging via CloudWatch and S3.

9. Summary

This architecture unifies ingress under a single domain (kong-gw.example.com) and fully isolates the Control Plane inside private subnets.
It reduces operational complexity, enhances security, and optimizes cost while maintaining full hybrid synchronization and administrative flexibility.


Kong Gateway – AWS Cost & Transaction Estimate (Singapore Region)

This document estimates monthly cost and throughput for the Event-Driven Kong Gateway Architecture deployed on AWS ECS Fargate + Aurora Serverless v2 (PostgreSQL) in the ap-southeast-1 (Singapore) region.


Architecture Summary

| Component | Mode | Behavior |
|---|---|---|
| Data Plane (DP) | ECS Fargate | Always on – handles API traffic |
| Control Plane (CP) | ECS Fargate | Auto-start on scale/config, stops after cooldown |
| Aurora Serverless v2 (Postgres) | Min 0 ACU | Auto-pause (0 ACU when idle) |
| ALB | Public | Handles HTTPS to DP |
| Route 53 + CloudWatch | | DNS + logs |

AWS Service Rates (Singapore Region)

| Service | Rate |
|---|---|
| Aurora Serverless v2 | $0.20 / ACU-hour |
| ECS Fargate (x86) | $0.05056 / vCPU-hour, $0.00553 / GB-hour |
| Storage (Aurora) | $0.115 / GB-month |
| ALB | $0.025 / hr base + $0.008 / LCU-hr (≈ fixed) |
| Route 53 + CloudWatch | ~$3.5 / month |

Monthly Cost Estimate (25 KB Payload)

| Scenario | CP+Aurora Active Time | ALB (Fixed) | DP Fargate (1 vCPU + 2 GB) | CP Fargate (0.25 vCPU + 0.5 GB) | Aurora Compute (1 ACU) | Aurora Storage (20 GB) | Route 53 + CW | Total (USD) |
|---|---|---|---|---|---|---|---|---|
| A – Low load (UAT) | 2 h/day | $20 | $10.7 | $0.9 | $12.0 | $2.3 | $3.5 | $49 |
| B – Medium load (Prod) | 6 h/day | $20 | $10.7 | $2.6 | $36.0 | $2.3 | $3.5 | $76 |
| C – Heavy load | 12 h/day | $20 | $10.7 | $5.2 | $72.0 | $2.3 | $3.5 | $114 |
| D – Full load (24 h/day) | 24 h/day | $20 | $10.7 | $10.4 | $144.0 | $2.3 | $3.5 | $191 |

💡 Assumptions
• ECS Fargate vCPU cost $0.05056/hr + Memory $0.00553/GB-hr
• DP task runs 24×7 (steady traffic)
• CP task starts automatically on updates or scale events, then stops after cooldown
• Aurora Serverless v2 = 1 ACU @ $0.20/hr (2 GB RAM equiv)
• Aurora Storage ≈ 20 GB @ $0.115/GB-month
• ALB base $0.025/hr + minimal LCU usage (~$1–2/month, effectively fixed)
• Route 53 + CloudWatch ≈ $3.5 flat monthly
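
To show how the scenario totals follow from the active hours, here is a minimal sketch. It assumes a 30-day month, computes Aurora compute and storage from the listed rates, and takes the ALB, DP, always-on CP, and Route 53 + CloudWatch figures directly from the table rather than recomputing them from per-hour rates. The function name and constants are illustrative.

```python
DAYS_PER_MONTH = 30

AURORA_USD_PER_ACU_HOUR = 0.20   # from the rate table
AURORA_USD_PER_GB_MONTH = 0.115  # from the rate table

# Monthly figures taken as-is from the cost table (assumed inputs).
ALB_MONTHLY = 20.0
DP_MONTHLY = 10.7
CP_MONTHLY_ALWAYS_ON = 10.4      # CP cost when active 24 h/day
DNS_CW_MONTHLY = 3.5


def monthly_cost(active_hours_per_day, acus=1, storage_gb=20):
    """Estimated monthly total (USD) for a given CP+Aurora active time."""
    cp = CP_MONTHLY_ALWAYS_ON * active_hours_per_day / 24
    aurora_compute = (
        AURORA_USD_PER_ACU_HOUR * acus * active_hours_per_day * DAYS_PER_MONTH
    )
    aurora_storage = AURORA_USD_PER_GB_MONTH * storage_gb
    total = (
        ALB_MONTHLY + DP_MONTHLY + cp
        + aurora_compute + aurora_storage + DNS_CW_MONTHLY
    )
    return round(total, 1)


if __name__ == "__main__":
    for name, hours in [("A", 2), ("B", 6), ("C", 12), ("D", 24)]:
        print(name, monthly_cost(hours))
```

The results land within about a dollar of the table totals; the remaining gaps come from rounding in the individual columns.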


Transaction Throughput (25 KB avg payload)

At 0.1 LCU/hr (≈ 0.1 GB/hr ≈ 28 KB/s):

| Payload | TPS | Tx / Month (30 days) |
|---|---|---|
| 25 KB (standard) | ≈ 1.1 | ≈ 2.9 M |
| 10 KB (light) | ≈ 2.8 | ≈ 7.3 M |
| 50 KB (heavy) | ≈ 0.56 | ≈ 1.4 M |
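
The throughput figures follow from the ALB processed-bytes dimension, where 1 LCU corresponds to 1 GB per hour. The sketch below reproduces the arithmetic, assuming decimal units (1 GB = 10^9 bytes, 1 KB = 1000 bytes) and a 30-day month; the function name is illustrative.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_MONTH = 30 * 24 * SECONDS_PER_HOUR  # 2,592,000 seconds
BYTES_PER_LCU_HOUR = 1e9                        # 1 LCU = 1 GB/hour of processed bytes


def throughput(payload_kb, lcu=0.1):
    """Return (transactions per second, transactions per month)."""
    bytes_per_second = lcu * BYTES_PER_LCU_HOUR / SECONDS_PER_HOUR
    tps = bytes_per_second / (payload_kb * 1000)
    return tps, tps * SECONDS_PER_MONTH


if __name__ == "__main__":
    for payload_kb in (25, 10, 50):
        tps, per_month = throughput(payload_kb)
        print(f"{payload_kb} KB: {tps:.2f} TPS, {per_month / 1e6:.2f} M tx/month")
```

Small differences from the table (for example 7.2 M versus 7.3 M at 10 KB) come from rounding the TPS value before multiplying.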

Cost Efficiency (25 KB payload)

| Scenario | Monthly Cost | Tx / Month | Cost / 1 M Tx |
|---|---|---|---|
| A – Low load | $49 | 2.9 M | $17 / M tx |
| B – Medium load | $76 | 4.4 M | $17 / M tx |
| C – Heavy load | $114 | 7.2 M | $16 / M tx |
| D – Full load | $191 | 14.4 M | $13 / M tx |
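
The last column is simply the monthly cost divided by the monthly volume in millions. A short sketch that recomputes it from the table values (the dictionary layout is illustrative):

```python
# (monthly cost in USD, transactions per month in millions), from the table above
SCENARIOS = {
    "A - Low load": (49, 2.9),
    "B - Medium load": (76, 4.4),
    "C - Heavy load": (114, 7.2),
    "D - Full load": (191, 14.4),
}


def cost_per_million(monthly_cost_usd, tx_millions):
    """USD per one million transactions."""
    return monthly_cost_usd / tx_millions


if __name__ == "__main__":
    for name, (cost, volume) in SCENARIOS.items():
        print(f"{name}: ${cost_per_million(cost, volume):.1f} per M tx")
```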

Observations

  • Aurora cost dominates during active periods ($0.20 / ACU-hr).
  • Auto-pause Aurora and event-driven CP start save ~70% vs always-on.
  • DP + ALB base floor ≈ $30/mo, keeping system always reachable.
  • Cost per 1M transactions improves at higher utilization (economy of scale).
  • Fixed non-compute cost ≈ $26/mo (ALB + storage + DNS/metrics); with the always-on DP task the idle floor is ≈ $36.5/mo.

Cost vs Transaction Chart (Markdown Table)

| Scenario | Transaction Volume (M/month) | Cost (USD) |
|---|---|---|
| A – UAT / Low Load | 2.9 | $49 |
| B – Prod / Medium | 4.4 | $76 |
| C – Heavy Load | 7.2 | $114 |
| D – Full Load (24h/day) | 14.4 | $191 |

Scaling Behavior and Cost Growth Analysis

When transaction volume doubles (e.g., 14.4 M → 28.8 M / month), cost does not quite double, even though under sustained full load both the Control Plane (CP) and Aurora are always active.

| Component | Behavior | Cost Growth |
|---|---|---|
| ALB | Fixed hourly + minor LCU change | ~$20 → ~$22 |
| DP Fargate (Base) | Always-on task (1 vCPU + 2 GB) | ~$10.7 → ~$21.4 (if scaled to 2 tasks) |
| CP Fargate | Always-on under full load (0.25 vCPU + 0.5 GB) | ~$10.4 → ~$20.8 |
| Aurora Compute | 1 ACU → 2 ACUs 24h/day | ~$144 → ~$288 |
| Aurora Storage | Slight increase (logs, cache) | ~$2.3 → ~$3 |
| Route 53 + CW | Fixed | ~$3.5 → ~$3.5 |

Estimated Monthly Total: ≈ $359 (the sum of the doubled-traffic figures in the table above),
still below 2× linear cost ($382) because ALB, storage, and monitoring are largely fixed.

⚙️ Rule of Thumb: When both Aurora and CP are always active, cost scales at roughly +88% per 100% increase in traffic (≈ $191 → ≈ $359).
Main drivers: Aurora ACUs dominate total cost, followed by DP and CP task compute.


AWS Services Inventory for Kong Gateway ECS + Aurora Serverless Architecture

This document lists all AWS services used in the Kong Gateway ECS + Aurora Serverless v2 Architecture, annotated with (Optional) where the service is not strictly required for the current minimal, cost-optimized design.


1. Networking & Routing

| Service | Purpose |
|---|---|
| Amazon VPC | Provides private network isolation for ECS tasks, Aurora DB, and service discovery. |
| Subnets (Public & Private) | Public subnets host ALB; private subnets host ECS tasks and Aurora. |
| Route Tables | Manage routing between public and private subnets. |
| Internet Gateway (IGW) | Connects the public subnets to the internet so that clients can reach the ALB. |
| NAT Gateway (Optional) | Needed only if ECS tasks in private subnets must reach the internet (for example, to pull container images when no VPC endpoints are configured). |
| AWS Route 53 | Manages DNS records (e.g., kong-gw.example.com). |
| AWS Cloud Map (Namespace) | Enables internal ECS service discovery (kong-cp.namespace.local). |
| Security Groups (SGs) | Control inbound/outbound traffic between ALB, ECS tasks, and Aurora. |
| Network ACLs (Optional) | Extra subnet-level security control (not required for current design). |

2. Compute & Container Orchestration

| Service | Purpose |
|---|---|
| Amazon ECS (Fargate) | Runs Control Plane (CP) and Data Plane (DP) containers serverlessly. |
| ECS Services | Maintain desired task counts and apply scaling policies. |
| ECS Task Definitions | Define containers, ports, environment variables, and IAM roles. |
| ECS Auto Scaling | Scales Data Plane tasks based on CloudWatch metrics. |
| ECS Service Discovery (Cloud Map) | Resolves Control Plane DNS internally for mTLS and Admin API access. |

3. Database & Storage

| Service | Purpose |
|---|---|
| Amazon Aurora Serverless v2 (PostgreSQL) | Primary config database for Kong Control Plane; auto-scales to 0 ACU. |
| Amazon RDS Proxy (Optional) | Improves connection pooling if Control Plane activity increases. |
| Amazon S3 (Optional) | Used only if ALB or CloudWatch logs need to be archived. |

4. Load Balancing & Traffic Management

| Service | Purpose |
|---|---|
| Application Load Balancer (ALB) | Single entrypoint for all traffic (kong-gw.example.com). |
| ALB Target Group | Routes incoming HTTPS requests to ECS Data Plane tasks. |
| ALB Listener Rules | Forwards all HTTPS traffic (port 443 → DP TG). |
| AWS Certificate Manager (ACM) | Manages SSL/TLS certificates for *.example.com. |

5. Security, Authentication & Compliance

| Service | Purpose |
|---|---|
| AWS Identity and Access Management (IAM) | Provides Task Roles and Execution Roles for ECS tasks. |
| AWS Secrets Manager | Stores Aurora credentials and API keys securely. |
| AWS WAF (Optional) | Protects ALB from web exploits if exposed to public internet. |
| AWS Shield (Standard) | Provides baseline DDoS protection (default for ALB). |
| API Key Authentication (Kong Plugin) | Secures /kong-admin route for Control Plane proxy. |
| IP Restriction (Kong Plugin) | Restricts /kong-admin route to trusted admin IPs. |

6. Monitoring, Logging & Observability

| Service | Purpose |
|---|---|
| Amazon CloudWatch Logs | Collects ECS logs for both CP and DP tasks. |
| Amazon CloudWatch Metrics | Monitors ECS resource utilization and Aurora ACUs. |
| Amazon CloudWatch Alarms | Triggers Control Plane start/stop or scaling events. |
| AWS CloudTrail (Optional) | Audits API calls for compliance and governance. |
| Amazon S3 (Optional) | Stores ALB or CloudWatch log archives if long-term retention is needed. |

7. Automation & Scaling Coordination

| Service | Purpose |
|---|---|
| Amazon EventBridge (Optional) | Triggers ECS service actions (e.g., start Control Plane on scale-up). |
| AWS Lambda (Optional) | Executes automation logic for ECS lifecycle or scaling. |
| ECS Auto Scaling Policies | Automatically scale Data Plane tasks based on traffic and metrics. |
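
One way the EventBridge and Lambda pieces could start and stop the Control Plane is sketched below. It relies on the boto3 ECS `update_service` call with `desiredCount`; the event field `action`, the environment variable names, and the default cluster and service names are hypothetical and would need to match the real deployment.

```python
import os


def set_control_plane_count(ecs_client, cluster, service, desired_count):
    """Set the Control Plane service to 0 (stopped) or 1 (running)."""
    if desired_count not in (0, 1):
        raise ValueError("desired_count must be 0 or 1")
    ecs_client.update_service(
        cluster=cluster, service=service, desiredCount=desired_count
    )
    return desired_count


def handler(event, context, ecs_client=None):
    """Lambda entry point. `event["action"]` is an assumed field: "start" or "stop"."""
    if ecs_client is None:
        import boto3  # imported lazily so the module loads without the AWS SDK

        ecs_client = boto3.client("ecs")
    cluster = os.environ.get("KONG_CLUSTER", "kong-cluster")          # hypothetical name
    service = os.environ.get("KONG_CP_SERVICE", "kong-control-plane")  # hypothetical name
    action = event.get("action")
    if action == "start":
        count = set_control_plane_count(ecs_client, cluster, service, 1)
    elif action == "stop":
        count = set_control_plane_count(ecs_client, cluster, service, 0)
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return {"service": service, "desiredCount": count}
```

Passing the client as an optional argument keeps the handler testable with a stub, and the Lambda execution role would need permission for `ecs:UpdateService` on the Control Plane service.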

8. Developer & Operations Tools

| Service | Purpose |
|---|---|
| AWS Systems Manager (Parameter Store) | Alternative lightweight config storage. |
| AWS Config (Optional) | Detects infrastructure configuration drift. |
| AWS Budgets / Cost Explorer | Monitors and forecasts service usage and costs. |
| AWS CodePipeline / CodeBuild (Optional) | Automates ECS image builds and deployments. |

✅ Core Required Stack

| Category | Essential Services |
|---|---|
| Networking | VPC, Subnets, Route 53, Cloud Map, Security Groups |
| Compute | ECS (Fargate), ECS Service, ECS Task Definition |
| Database | Aurora Serverless v2 (PostgreSQL) |
| Traffic | ALB, Target Group, ACM |
| Security | IAM, Secrets Manager, API Key + IP Restriction |
| Monitoring | CloudWatch (Logs, Metrics, Alarms) |

All other services are optional extensions for automation, compliance, or scaling enhancements.

