System Design Resiliency Cards
- Design for horizontal scaling to handle increased load.
- Implement load balancing to evenly distribute traffic across servers.
- Use sharding or partitioning to manage large datasets.
- Consider eventual consistency for systems that need to scale easily.
- Strong consistency across replicas often requires consensus algorithms (e.g., Paxos, Raft); eventual consistency relaxes that requirement.
- Explore sharding and data read/write separation as easier-to-implement alternatives.
- Use double-entry, insertion-ordered ledgers to track transactions and changes accurately.
- Ensure high availability with active-active deployments, autoscaling, or predictive scaling.
- Use caching and backoff retries to avoid cascading failures.
- Maintain multiple copies of critical data.
- Optimize response times with in-memory caching (e.g., Redis, Memcached).
- Use indexing and efficient algorithms to enhance database query speed.
- Go easy on the lambdas; use them sparingly.
- Incorporate fault tolerance mechanisms such as circuit breakers and backoff retries.
- Use dead-letter queues to capture tasks that still fail after automatic retries.
- Implement redundancy to eliminate single points of failure.
- Use external session stores (e.g., Redis) to manage session data across servers.
- Move state into transactions wherever possible.
- You cannot have too much monitoring and observability (logging, alerting).
- Use message queues (e.g., Kafka, RabbitMQ) for decoupling services.
- Implement background jobs for tasks that do not require immediate user feedback.
- Avoid reporting batch-job hell if you can.
- Use elastic scaling to optimize resource usage based on demand.
- Pre-leased instances often favor fewer, larger machines over horizontally scaled "cattle" fleets.
- Optimize cloud resources to reduce costs without sacrificing performance.
- Use a CDN for cost-effective content delivery and reduced server load; push compute to the edge where possible.
- Implement authentication and authorization to secure access.
- Encrypt sensitive data in transit and at rest.
- Use rate limiting and throttling to protect against abuse and overuse.
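The rate limiting card above can be sketched as a token bucket, one common way to throttle callers while still allowing short bursts. This is a minimal, single-process illustration, not a production limiter. The class name `TokenBucket`, its parameters, and the injectable `clock` are hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: refills `rate` tokens per second,
    allowing bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket can hold
        self.clock = clock            # injectable clock (an assumption, eases testing)
        self.tokens = float(capacity) # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens and return True if available, else return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per client key and reject or delay the request when `allow()` returns False. Sharing the limit across servers would need an external store, which this sketch does not cover.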
- S: Scalability (horizontal scaling, load balancing).
- C: Consistency (eventual consistency, sharding, ledgers).
- A: Availability (redundancy, failover).
- L: Latency and performance (caching, indexing).
- E: Error resilience (circuit breakers, retries).
- S: State management (session stores, state machines).
- M: Monitoring and observability (logging, alerting).
- A: Asynchronous processing (queues, event-driven).
- R: Resource efficiency (elastic scaling, cost control).
- T: Trust and security (encryption, authentication).
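The backoff retries listed under error resilience can be sketched as follows. This is a minimal illustration assuming exponential backoff with full jitter, which is one of several reasonable strategies. The function name `retry_with_backoff`, its parameters, and the injectable `sleep` and `rng` hooks are hypothetical names chosen for this example.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(); on an exception, wait and retry up to max_attempts times.

    The wait ceiling doubles on each attempt (base_delay * 2**attempt),
    is capped at max_delay, and is scaled by a random factor (full jitter)
    so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)  # random delay between 0 and the ceiling
```

In practice the `except` clause would be narrowed to errors known to be transient, and a circuit breaker would sit in front of the call so that a persistently failing dependency is not retried indefinitely.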