@Barnyvic
Last active May 23, 2026 13:46
Notification System Technical Specification (RabbitMQ)

Overview

Design a highly scalable notification system for 1M+ users supporting push, SMS, and email with strict reliability guarantees:

  • No duplicate sends
  • No missed notifications
  • Graceful degradation when providers fail
  • Horizontal scalability

High-Level Architecture

                +----------------------+
                |  Client Applications |
                +----------+-----------+
                           |
                           v
                +----------------------+
                | Notification API     |
                | (Gateway Layer)      |
                +----------+-----------+
                           |
                           v
                +----------------------+
                | Notification Service |
                | Validation + Routing |
                +----------+-----------+
                           |
               Writes Notification Jobs
                           |
                           v
                +----------------------+
                | PostgreSQL           |
                | Notifications Table  |
                +----------+-----------+
                           |
                           v
                +----------------------+
                | Outbox Table         |
                +----------+-----------+
                           |
                    CDC / Poller
                           |
                           v
                +----------------------+
                | RabbitMQ             |
                +----------+-----------+
                           |
        +------------------+------------------+
        |                  |                  |
        v                  v                  v
+---------------+  +---------------+  +---------------+
| Email Worker  |  | SMS Worker    |  | Push Worker   |
+-------+-------+  +-------+-------+  +-------+-------+
        |                  |                  |
        v                  v                  v
+---------------+  +---------------+  +---------------+
| Provider      |  | Provider      |  | Provider      |
| Adapters      |  | Adapters      |  | Adapters      |
+-------+-------+  +-------+-------+  +-------+-------+
        |                  |                  |
        v                  v                  v
  SendGrid/SES        Twilio/Termii      FCM/APNs


Core Design

End-to-End Flow Overview

When a user triggers a notification (email, SMS, or push), the system follows a strict flow to ensure reliability and no duplication:

  1. The API receives the request and validates it.
  2. The Notification Service creates a notification record in PostgreSQL.
  3. In the same database transaction, an outbox event is stored.
  4. A background publisher reads from the outbox table and publishes messages to RabbitMQ.
  5. RabbitMQ routes the message to the appropriate channel queue (email, SMS, or push).
  6. Channel workers consume the message and process it.
  7. Workers send the notification via external providers.
  8. The notification status is updated in PostgreSQL.
  9. If a failure occurs, the message goes through retry or fallback flow.

This flow ensures durability, traceability, and fault tolerance from request to delivery.


Idempotency

Each request carries a unique idempotency key:

idempotency_key = SHA256(user_id + template_id + event_id)

The key is enforced by a unique constraint in PostgreSQL, which prevents duplicate processing across:

  • API retries
  • Queue redeliveries
  • Worker restarts
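
A minimal sketch of the key derivation above, using only the standard library. The ":" separator is an assumption that is not in the formula as written; without some separator, plain concatenation lets different ID triples (for example "ab" + "c" and "a" + "bc") produce the same input string.

```python
import hashlib


def idempotency_key(user_id: str, template_id: str, event_id: str) -> str:
    """Return the hex SHA-256 digest of the three identifiers."""
    # Assumption: IDs never contain ":" (true for UUIDs), so the joined
    # string is unambiguous.
    raw = ":".join((user_id, template_id, event_id))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

The same inputs always yield the same 64-character key, so an API retry for the same event collides with the existing row on the unique constraint.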

Transactional Outbox

The notification and its outbox event are written in a single database transaction. A background publisher then reads the outbox and publishes each event to RabbitMQ.

This guarantees no message is lost between the database and queue layer.
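
The single-transaction write can be sketched as follows. To keep the example runnable without a database server, it uses the standard-library sqlite3 module as a stand-in for PostgreSQL; the trimmed-down schema and the function names are illustrative assumptions, not the spec's actual code.

```python
import json
import sqlite3
import uuid
from typing import Optional

# Assumption: a reduced version of the two tables, enough to show the pattern.
SCHEMA = """
CREATE TABLE notifications (
    id TEXT PRIMARY KEY,
    channel TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING',
    idempotency_key TEXT NOT NULL UNIQUE
);
CREATE TABLE outbox_events (
    id TEXT PRIMARY KEY,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_notification(conn: sqlite3.Connection, channel: str,
                        payload: dict, key: str) -> Optional[str]:
    """Insert the notification and its outbox event atomically.

    Returns the new notification id, or None if the idempotency key
    already exists (nothing is written in that case).
    """
    notification_id = str(uuid.uuid4())
    body = json.dumps(payload)
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO notifications (id, channel, payload, idempotency_key) "
                "VALUES (?, ?, ?, ?)",
                (notification_id, channel, body, key),
            )
            conn.execute(
                "INSERT INTO outbox_events (id, aggregate_id, event_type, payload) "
                "VALUES (?, ?, ?, ?)",
                (str(uuid.uuid4()), notification_id, f"{channel}.send", body),
            )
    except sqlite3.IntegrityError:
        return None  # duplicate request: unique constraint rejected it
    return notification_id
```

Because both inserts share one transaction, there is never a notification row without a matching outbox row, or the reverse.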


Delivery Model

  • RabbitMQ provides at-least-once delivery (with manual consumer acknowledgements and publisher confirms)
  • Workers are designed to be idempotent
  • Notification state is persisted in PostgreSQL for consistency

Database Design

notifications

  • id (UUID)
  • user_id (UUID)
  • channel (email | sms | push)
  • payload (JSONB)
  • status (PENDING | PROCESSING | SENT | FAILED | RETRYING | DLQ)
  • provider
  • provider_message_id
  • retry_count
  • idempotency_key (unique)
  • created_at, updated_at

Indexes:

  • (user_id, created_at)
  • (status)
  • UNIQUE (idempotency_key)
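
The status column implies a lifecycle. One way to make it explicit is a transition table, sketched below. The spec lists the status values but not the allowed transitions, so the ALLOWED mapping is purely an assumption for illustration.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    SENT = "SENT"
    FAILED = "FAILED"
    RETRYING = "RETRYING"
    DLQ = "DLQ"


# Assumption: these transitions are hypothetical; adjust to the real workflow.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.SENT, Status.RETRYING, Status.FAILED},
    Status.RETRYING: {Status.PROCESSING, Status.DLQ},
    Status.FAILED: {Status.DLQ},
    Status.SENT: set(),   # terminal
    Status.DLQ: set(),    # terminal
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if moving from `current` to `new` is permitted."""
    return new in ALLOWED[current]
```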

outbox_events

  • id
  • aggregate_id
  • event_type
  • payload
  • processed
  • created_at

RabbitMQ Design

Exchange

  • notifications.exchange (topic)

Queue Configuration

All queues are configured as follows:

  • durable: true (queues survive a broker restart)
  • messages published as persistent (delivery mode 2)
  • message TTL used for retry control
  • priority enabled for critical notifications (OTP > alerts > marketing)

Example queues:

  • notifications.email.queue
  • notifications.sms.queue
  • notifications.push.queue
  • notifications.retry.queue
  • notifications.dlq.queue
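
As a rough illustration, the settings above map onto queue-declaration arguments and message properties like this. The helpers only build plain dictionaries; the priority numbers and the maximum of 10 are assumptions. With a client such as pika, the values would be passed to `channel.queue_declare(queue=..., durable=..., arguments=...)` and `pika.BasicProperties(delivery_mode=..., priority=...)`.

```python
from typing import Any, Dict

CHANNEL_QUEUES = {
    "email": "notifications.email.queue",
    "sms": "notifications.sms.queue",
    "push": "notifications.push.queue",
}

# Assumption: concrete priority numbers are illustrative (OTP > alerts > marketing).
PRIORITIES = {"otp": 9, "alert": 5, "marketing": 1}

PERSISTENT = 2  # AMQP delivery mode for persistent messages


def main_queue_declaration(channel: str, max_priority: int = 10) -> Dict[str, Any]:
    """Arguments for declaring a durable, priority-enabled channel queue."""
    return {
        "queue": CHANNEL_QUEUES[channel],
        "durable": True,
        "arguments": {"x-max-priority": max_priority},
    }


def message_properties(kind: str) -> Dict[str, int]:
    """Per-message properties: persistent, with a priority by notification kind."""
    return {"delivery_mode": PERSISTENT, "priority": PRIORITIES[kind]}
```

Note that durability is a property of the queue while persistence is a property of each message; both are needed for a message to survive a broker restart.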

Publisher Confirms

To guarantee no message loss between publisher and RabbitMQ:

  • Publisher confirms are enabled on the publishing channel
  • Every publish waits for the broker's ACK or NACK
  • On NACK or timeout, the message is republished

This ensures messages are never silently dropped.
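
The confirm-then-mark-processed loop can be sketched as below. The `publish` callable is a hypothetical stand-in for a real broker client: it is assumed to return True on a broker ACK, False on a NACK, and to raise TimeoutError when no confirm arrives in time. `OutboxEvent` and `relay_outbox` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class OutboxEvent:
    id: str
    routing_key: str
    payload: str
    processed: bool = False


def relay_outbox(events: Iterable[OutboxEvent],
                 publish: Callable[[str, str], bool],
                 max_attempts: int = 3) -> int:
    """Publish unprocessed events; mark one processed only after a broker ACK.

    Returns the number of events confirmed in this pass. Events that are
    never confirmed stay unprocessed and are picked up by the next pass.
    """
    confirmed = 0
    for event in events:
        if event.processed:
            continue
        for _ in range(max_attempts):
            try:
                acked = publish(event.routing_key, event.payload)
            except TimeoutError:
                acked = False  # treat a missing confirm like a NACK
            if acked:
                event.processed = True
                confirmed += 1
                break
    return confirmed
```

A retry after a lost confirm can publish the same event twice, which is why the consumers still need to be idempotent. With pika's blocking adapter, confirms are switched on with `channel.confirm_delivery()`.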


Routing Keys

  • email.send
  • sms.send
  • push.send
  • notification.retry
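
To show how a topic exchange would resolve these keys, here is a small pure-Python matcher following the topic rules ("*" matches exactly one word, "#" matches zero or more). The binding table is an assumption: the spec lists routing keys and queues but not which key is bound to which queue.

```python
from typing import List

# Assumption: one binding per queue, inferred from the queue names.
BINDINGS = {
    "email.send": "notifications.email.queue",
    "sms.send": "notifications.sms.queue",
    "push.send": "notifications.push.queue",
    "notification.retry": "notifications.retry.queue",
}


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if `routing_key` matches the topic binding `pattern`."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # "#" may consume zero or more words
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def queues_for(routing_key: str) -> List[str]:
    """Queues that would receive a message published with `routing_key`."""
    return [q for pattern, q in BINDINGS.items() if topic_matches(pattern, routing_key)]
```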

DLX Retry Flow

Dead Letter Exchange is used for retries:

Main Queue → Failure → DLX → Retry Queue → TTL expires → back to Main Queue

After max retries → DLQ for permanent failure handling.
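
A sketch of the two pieces this flow needs: the arguments that make a retry queue hand expired messages back to the main exchange, and a way to count earlier attempts from the `x-death` header that RabbitMQ attaches to dead-lettered messages. The function names, the limit of 4 retries, and the choice to count only "rejected" entries are assumptions.

```python
from typing import Any, Dict, Optional


def retry_queue_arguments(ttl_ms: int, main_exchange: str,
                          main_routing_key: str) -> Dict[str, Any]:
    """Queue arguments: hold messages for ttl_ms, then dead-letter them back."""
    return {
        "x-message-ttl": ttl_ms,
        "x-dead-letter-exchange": main_exchange,
        "x-dead-letter-routing-key": main_routing_key,
    }


def retry_count(headers: Optional[Dict[str, Any]], main_queue: str) -> int:
    """How many times the message was rejected from `main_queue`."""
    deaths = (headers or {}).get("x-death", [])
    return sum(
        d.get("count", 0)
        for d in deaths
        if d.get("queue") == main_queue and d.get("reason") == "rejected"
    )


def next_hop(headers: Optional[Dict[str, Any]], main_queue: str,
             max_retries: int = 4) -> str:
    """Decide where a failed message goes: 'retry' or 'dlq'."""
    return "dlq" if retry_count(headers, main_queue) >= max_retries else "retry"
```

A single TTL on the retry queue gives a fixed delay. Supporting several different delays would typically need one retry queue per delay, since a queue-level TTL applies to every message in it.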


Prefetch Tuning

Controls consumer throughput:

  • SMS/OTP: 5–10
  • Email: 10–50
  • Push: 50–100

Rule:

prefetch ≈ worker concurrency × 2
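
The rule and the per-channel ranges can be combined in a few lines. Clamping the result to the listed range is an assumption about how the two are meant to interact.

```python
# Ranges taken from the list above (low, high) per channel.
PREFETCH_RANGES = {"sms": (5, 10), "email": (10, 50), "push": (50, 100)}


def prefetch_for(channel: str, worker_concurrency: int) -> int:
    """Apply prefetch = concurrency x 2, clamped to the channel's range."""
    low, high = PREFETCH_RANGES[channel]
    return max(low, min(high, worker_concurrency * 2))
```

For example, an SMS worker running 4 concurrent tasks gets a prefetch of 8. With pika, the value would be applied via `channel.basic_qos(prefetch_count=...)`.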

Worker Architecture

Workers are stateless and horizontally scalable.

Responsibilities:

  • Template rendering
  • Provider selection
  • Idempotency check
  • Retry handling
  • Status updates
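
A minimal sketch of an idempotent message handler covering the idempotency check, send, and status update. The in-memory `store` dict stands in for the notifications table, and `send` is a hypothetical provider call that returns a provider message id; both are assumptions.

```python
from typing import Any, Callable, Dict


def handle_message(notification_id: str,
                   store: Dict[str, Dict[str, Any]],
                   send: Callable[[Any], str]) -> str:
    """Process one delivery. Returns 'skipped', 'sent' or 'failed'."""
    record = store[notification_id]
    if record["status"] == "SENT":
        return "skipped"  # redelivered message: already handled, just ack it
    record["status"] = "PROCESSING"
    try:
        record["provider_message_id"] = send(record["payload"])
    except Exception:
        # Any provider error is treated as retryable in this sketch.
        record["status"] = "RETRYING"
        record["retry_count"] += 1
        return "failed"
    record["status"] = "SENT"
    return "sent"
```

A crash between the provider call and the status update still leaves a window for a duplicate send, so a provider-side idempotency token is worth using where the provider supports one.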

Retry & Failover

Retry Strategy

  • backoff with increasing delays (30s → 2m → 10m → 30m)
  • only retry transient failures
  • permanent failures go to DLQ
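
The retry decision can be sketched as a small function. The delay schedule comes from the list above; the set of HTTP status codes treated as transient is an assumption, since the spec does not define which failures count as transient.

```python
from typing import Optional, Tuple

BACKOFF_SECONDS = [30, 120, 600, 1800]  # 30s, 2m, 10m, 30m

# Assumption: timeouts, rate limiting and server errors are retryable.
TRANSIENT_STATUS = {408, 429, 500, 502, 503, 504}


def next_delay(retry_count: int) -> Optional[int]:
    """Delay before the next attempt, or None when retries are exhausted."""
    if retry_count < len(BACKOFF_SECONDS):
        return BACKOFF_SECONDS[retry_count]
    return None


def decide(status_code: int, retry_count: int) -> Tuple[str, Optional[int]]:
    """Return ('retry', delay_seconds) or ('dlq', None) for a failed send."""
    if status_code not in TRANSIENT_STATUS:
        return ("dlq", None)  # permanent failure: do not retry
    delay = next_delay(retry_count)
    return ("retry", delay) if delay is not None else ("dlq", None)
```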

Provider Failover

If a provider fails:

Twilio → Termii
SendGrid → SES
FCM → APNs (Apple devices only; APNs cannot deliver to Android)

Circuit breaker prevents cascading failures.
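
One possible shape for failover with a circuit breaker, as a self-contained sketch. The threshold of 3 failures, the 30-second reset window, and all class and function names are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class CircuitBreaker:
    """Opens after N failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


def send_with_failover(providers: List[Tuple[str, Callable[[Any], Any]]],
                       breakers: Dict[str, CircuitBreaker],
                       message: Any) -> Tuple[str, Any]:
    """Try providers in order, skipping any whose breaker is open."""
    for name, send in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = send(message)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers unavailable")
```

Once the primary's breaker opens, traffic goes straight to the secondary without paying for a failing call on every message, until the cool-down lets a trial request through.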


Graceful Degradation

If one channel fails:

Push → SMS → Email

Applied based on notification priority.
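
The channel fallback chain can be expressed as a lookup. How many fallback steps each priority is allowed is not stated in the spec, so the MAX_FALLBACKS values are assumptions.

```python
from typing import List

FALLBACK_ORDER = ["push", "sms", "email"]

# Assumption: higher-priority notifications may fall back further.
MAX_FALLBACKS = {"otp": 2, "alert": 1, "marketing": 0}


def fallback_channels(failed_channel: str, priority: str) -> List[str]:
    """Channels to try, in order, after `failed_channel` has failed."""
    start = FALLBACK_ORDER.index(failed_channel) + 1
    return FALLBACK_ORDER[start:start + MAX_FALLBACKS[priority]]
```

Under these assumptions, a failed OTP push falls back to SMS and then email, while a failed marketing push is not rerouted at all.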


Caching

Redis is used for caching hot paths such as templates, rate limits, and deduplication locks.
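
A deduplication lock in Redis is commonly taken with `SET key value NX EX seconds`, which succeeds only if the key is absent and expires it automatically. The class below is an in-memory stand-in that mimics those semantics so the behavior can be shown without a Redis server; it is not a Redis client, and its name is hypothetical.

```python
import time
from typing import Callable, Dict


class DedupLocks:
    """In-memory imitation of SET NX EX: first caller wins until the TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def acquire(self, key: str, ttl_seconds: float) -> bool:
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # lock still held by an earlier caller
        self._expiry[key] = now + ttl_seconds
        return True
```

With redis-py, the equivalent call would be `client.set(key, value, nx=True, ex=ttl_seconds)`. The lock is an optimization only; the PostgreSQL unique constraint remains the source of truth for deduplication.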


Reliability & Scaling

Reliability is achieved through idempotency keys, transactional outbox, RabbitMQ durable queues, publisher confirms, DLX retries, and idempotent workers.

The system scales horizontally at the API layer and at the worker layer, where each channel's consumers scale independently.


Technology Choice: RabbitMQ vs Kafka

Factor             RabbitMQ                                 Kafka
Use case fit       Message routing (email vs SMS vs push)   Event streaming
Ops complexity     Lower                                    Higher
Replay capability  No                                       Yes
Max throughput     20K msg/sec                              100K+ msg/sec

For notifications, RabbitMQ is the better fit because of its flexible routing, simpler operations, and sufficient throughput.

Example Flow (OTP)

  1. API receives request
  2. Notification + outbox stored in DB
  3. Publisher sends event to RabbitMQ
  4. SMS worker consumes message
  5. Provider sends OTP
  6. Status updated in DB
  7. On failure → retry or fallback provider

Conclusion

This design provides a reliable, scalable notification system using RabbitMQ with:

  • Strong delivery guarantees
  • No duplicates via idempotency
  • No message loss via transactional outbox + publisher confirms
  • Controlled retries via DLX
  • Graceful degradation across channels
  • Horizontal scalability for 1M+ users