"Go is about making software engineering more effective, not just making programmers more productive." - The Go Team
Target Go Version: Go 1.23+ (examples align with 1.23–1.24 features)
This guide synthesizes production wisdom from Google, Uber, and the broader Go community into actionable patterns for building maintainable, performant Go applications. It emphasizes explicit over implicit, composition over inheritance, and simplicity over cleverness.
Prerequisites: Basic Go syntax knowledge and familiarity with standard library concepts.
Topic | ✅ Do This | ❌ Never This | Why (What Goes Wrong) |
---|---|---|---|
Dependencies | Inject via constructors | Global variables | Testing impossible, hidden coupling, debugging nightmare |
Errors | `errors.Is/As` checking | `err == ErrFoo` | Wrapped errors break equality, silent failures in production |
Context | Pass as first parameter | Store in structs | Stale contexts, goroutine leaks, lifecycle chaos |
Concurrency | Use `errgroup` | Bare `sync.WaitGroup` | No error handling, no cancellation, silent failures |
Interfaces | Small, focused, consumer-defined | God interfaces | Impossible to test, violates ISP, tight coupling |
Logging | Structured with `log/slog` | `fmt.Printf` debugging | No searchability, impossible monitoring/alerting |
Performance | Pre-allocate with capacity | Growing slices | Multiple reallocations, memory churn, GC pressure |
Naming | Intention-revealing | Cryptic abbreviations | Code becomes write-only, maintenance nightmare |
- Dependencies are explicit and injectable
- Errors are handled, never ignored
- Concurrency uses errgroup, never bare sync.WaitGroup
- Context flows through parameters, never stored in structs
- Interfaces are small and defined where they're used
Quick Navigation:
- Foundations: §1-3 (Philosophy, Structure, Style)
- Core Patterns: §4-7 (Errors, Concurrency, Architecture, Observability)
- Advanced Topics: §8-11 (Performance, Generics, Patterns, Footguns)
- Reference: §12 (Anti-Patterns Checklist)
2.3. The Dependency Rule
2.5. Package Naming Rules
3.1. The Art of Go Naming
3.3. Context-Aware Naming
4.2. The Golden Rules
7.4. Log Level Strategy
10.7. Go 1.24+ Features Update
11.2. The Repository Pattern
12.2. Interface Nil Gotcha
12.3. Slice Header Gotchas
12.4. Map Concurrency Gotcha
12.5. Networking Footguns
12.7. Defer in Loops Gotcha
13.1. Go's Testing Philosophy
13.8. Testing Anti-Patterns
15. References
Go's apparent simplicity is actually sophisticated constraint. By limiting language features, Go reduces cognitive load and forces developers to write explicit, maintainable code.
Decision Rationale: These constraints aren't limitations—they're design enablers. When you can't hide complexity, you're forced to address it directly, leading to better architecture.
What Makes Go Different:
Aspect | Go's Approach | Why This Matters | What Goes Wrong Without It |
---|---|---|---|
Formatting | Single style (`gofmt`) | Eliminates bike-shedding, enables automatic formatting | Teams waste hours debating tabs vs spaces, inconsistent codebases become unreadable |
Error Handling | Explicit returns | No hidden control flow, all error paths visible | Exceptions hide failure paths, leading to unhandled edge cases and mysterious crashes |
Inheritance | Composition via interfaces | Prevents deep hierarchies, enables flexible design | Deep inheritance chains become unmaintainable, change in base class breaks everything |
Concurrency | Goroutines + channels | Makes concurrent programming accessible | Thread management becomes complex, deadlocks and race conditions proliferate |
Dependencies | Explicit imports | No hidden dependencies or global state | Hidden dependencies make testing impossible, circular imports, mysterious side effects |
Production Disaster #1: Hidden Global State
// BAD: Hidden global state - The debugging nightmare
var DB *sql.DB // Where does this come from? How do I test it?
func GetUser(id string) (*User, error) {
return DB.Query(...) // Cannot mock, cannot test, cannot trace failures
}
What Happened in Production: A major e-commerce platform spent 6 months debugging intermittent database errors. The root cause? A global `DB` variable was being shared across goroutines without proper synchronization. During peak traffic, connection pool exhaustion caused random query failures that were impossible to trace because the dependency was hidden throughout the codebase.
Why This Is a Nightmare:
- Testing Impossible: Cannot inject mock database for unit tests
- Debugging Hell: No way to trace which code path caused the DB failure
- Hidden Coupling: Every package secretly depends on database initialization order
- Race Conditions: Global state shared across goroutines without synchronization
- Environment Issues: Cannot easily switch between dev/staging/prod databases
// What developers coming from Java/C# try to do:
func processOrder(orderID string) {
order := getOrder(orderID) // What if this fails?
payment := processPayment(order) // What if payment gateway is down?
shipment := createShipment(order, payment) // What if shipping service fails?
// Three points of failure, zero error handling
}
What Went Wrong: A payment processing service had 47 different points where external API calls could fail. Without explicit error handling, failures bubbled up as generic "internal server error" messages. Customer support couldn't troubleshoot issues because there was no way to determine if the problem was with payment gateway, inventory service, or shipping calculator.
With Explicit Dependencies & Error Handling:
// GOOD: Clear, testable, traceable design
type UserService struct {
db *sql.DB
logger *slog.Logger
}
func NewUserService(db *sql.DB, logger *slog.Logger) *UserService {
return &UserService{
db: db,
logger: logger,
}
}
func (s *UserService) GetUser(ctx context.Context, id string) (*User, error) {
	var user User
	// Select the columns we scan, rather than SELECT *
	err := s.db.QueryRowContext(ctx, "SELECT id, email FROM users WHERE id = $1", id).
		Scan(&user.ID, &user.Email)
	if err != nil {
		s.logger.ErrorContext(ctx, "database query failed",
			"user_id", id,
			"error", err)
		return nil, fmt.Errorf("failed to get user %s: %w", id, err)
	}
	return &user, nil // Explicit success path
}
Why This Works:
- Mockable: Can inject test database for unit tests
- Traceable: Clear error context with structured logging
- Explicit: All dependencies visible in constructor
- Contextual: Request tracing through context parameter
- Debuggable: Each error path provides specific failure information
✅ Perfect for Go:
- Network Services: HTTP/gRPC APIs, microservices
- CLI Tools: Single binary deployment, fast startup
- System Tools: Container orchestration, infrastructure tools
- Data Pipelines: Concurrent processing, stream handling
- Backend Services: API servers, background processors
❌ Consider Alternatives:
- GUI Applications: Limited ecosystem (use web tech for UI)
- Scientific Computing: NumPy/SciPy ecosystem in Python is mature
- Game Development: GC pauses affect frame rates (use C++/Rust)
- Embedded Systems: Runtime overhead may be prohibitive
Decision Rationale: Go optimizes for network-connected, concurrent, server-side applications. It's the infrastructure language of the cloud-native era.
`golang-standards/project-layout` is a community proposal, not an official Go standard. The Go team advocates for starting simple.
Official Go Advice: Start with a flat structure and only add directories when complexity demands them.
What Goes Wrong with Premature Organization:
- Over-Engineering: Creating `cmd/`, `internal/`, `pkg/` for a 500-line project
- Artificial Complexity: Forcing abstractions before understanding the domain
- Maintenance Burden: More directories mean more places to look for code
- Cognitive Overhead: Developers spend time navigating structure instead of solving problems
Simple Project (Start Here):
github.com/ashokallu/myservice/
├── main.go # Single binary? Keep it simple
├── user.go # Domain logic
├── handler.go # HTTP handlers
├── repository.go # Data access
├── go.mod
└── go.sum
When to Evolve to Complex Structure:
github.com/ashokallu/myservice/
├── cmd/ # Multiple binaries
│ ├── server/main.go # Main application server
│ └── migrator/main.go # Database migration tool
├── internal/ # Private application code
│ ├── user/ # Domain packages
│ │ ├── service.go
│ │ ├── repository.go
│ │ └── models.go
│ ├── order/
│ └── shared/ # Shared internal utilities
├── pkg/ # Public library code
│ └── client/ # Client SDK for this service
│ ├── client.go
│ └── models.go
├── api/ # API definitions
│ ├── proto/ # Protocol buffer definitions
│ └── openapi/ # OpenAPI specifications
├── go.mod
└── go.sum
When to Add Directories:
- `cmd/` - Only when you have multiple binaries. Use the `go run ./cmd/server` pattern.
- `internal/` - When you need to prevent external imports
- `pkg/` - Only when other projects will import your code as a library
- `api/` - When you have API contracts (protobuf, OpenAPI)
- `tools/` - For development tools with the `//go:build tools` pattern
Tools Directory Pattern:
// tools/tools.go
//go:build tools

// The blank line above is required: a build constraint must be
// followed by a blank line to be recognized by the toolchain.
package tools

import (
	_ "github.com/google/wire/cmd/wire"
	_ "github.com/golang/mock/mockgen"
	_ "github.com/golangci/golangci-lint/cmd/golangci-lint"
)
Multi-Architecture Builds:
// internal/platform/cpu_amd64.go
//go:build amd64

package platform

const Arch = "amd64"

// internal/platform/cpu_arm64.go
//go:build arm64

package platform

const Arch = "arm64"
Decision Rationale: Premature organization is as harmful as premature optimization. Start simple, refactor when complexity demands structure.
✅ DO: Design packages around capabilities, not data structures
// GOOD: Package organized by capability
package user
type Service struct {
repo Repository
logger *slog.Logger
}
type Repository interface {
GetUser(ctx context.Context, id string) (*User, error)
SaveUser(ctx context.Context, user *User) error
}
func (s *Service) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
// Business logic here
}
❌ DON'T: Create god packages or organize by technical layers
// BAD: Layer-based organization creates tight coupling
package models // Everything in one package
type User struct {...}
type Order struct {...}
type UserService struct {...} // Business logic mixed with data
type OrderService struct {...}
// ... 50 more types creating circular dependencies
What Goes Wrong with God Packages:
- Circular Dependencies: User needs Order, Order needs User, import cycle hell
- Compilation Bottlenecks: Changes to any type force recompilation of everything
- Testing Nightmare: Cannot test individual components in isolation
- Merge Conflicts: Multiple developers editing the same massive file
- Cognitive Load: 2000-line files where finding anything requires scrolling
Production War Story: A fintech startup had a `models` package with 127 types. Every change required 15+ minute rebuilds, developers spent more time resolving merge conflicts than writing features, and the codebase became so tightly coupled that simple changes broke unrelated functionality.
Critical Principle: Your package dependencies must form a Directed Acyclic Graph (DAG). Dependencies should always point inward, from outer layers to inner layers.
┌─────────────┐
│ handler │ ──────┐
└─────────────┘ │
▼
┌─────────────┐ ┌─────────────┐
│ transport │──▶│ service │
│ (HTTP/gRPC)│ │ (business) │
└─────────────┘ └─────────────┘
│
▼
┌──────────────┐
│ repository │
│ (interface) │
└──────────────┘
▲
│
┌──────────────────┐
│ repository/mysql │
│ (implementation) │
└──────────────────┘
Dependency Flow Rules:
- `handler` depends on `service`
- `service` depends on `repository` (interface)
- `repository/mysql` (implementation) depends on `repository` (interface)
- The `service` package knows nothing about HTTP or MySQL
- The `repository` interface knows nothing about specific implementations
Why This Works: This inversion of control, facilitated by interfaces, makes the architecture testable and flexible. You can swap MySQL for PostgreSQL without changing business logic.
What Goes Wrong with Circular Dependencies:
// DISASTER: Circular dependency hell
package user
import "myapp/order" // User needs Order
type User struct {
Orders []order.Order // User knows about orders
}
package order
import "myapp/user" // Order needs User
type Order struct {
Customer user.User // Order knows about user
}
// Result: go build fails with "import cycle not allowed"
How to Break Circular Dependencies:
// SOLUTION 1: Define interfaces in consuming package
package user
type OrderService interface {
GetOrdersForUser(ctx context.Context, userID string) ([]Order, error)
}
type Service struct {
orderService OrderService // Depends on interface, not concrete type
}
// SOLUTION 2: Create shared domain package
package domain
type User struct { ID string }
type Order struct { UserID string }
// Both user and order packages import domain, no cycle
Module Toolchain Directive: Ensure reproducible builds with explicit toolchain versions:
// go.mod
module github.com/ashokallu/myservice
go 1.23
toolchain go1.24.5 // Guarantees exact Go toolchain for reproducible builds
require (
golang.org/x/sync v0.7.0
)
Why Toolchain Matters: Different Go versions can produce different binaries. The `toolchain` directive prevents subtle bugs from compiler differences between development and production environments.
Monorepo with Workspaces (Go 1.18+):
// go.work
go 1.23
use (
./service-a
./service-b
./shared
)
Run `go mod tidy -go=1.23` to maintain version consistency across workspace modules.
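A minimal sketch of creating that workspace from the repository root (the module paths are the hypothetical ones above):
# Create go.work referencing the local modules, then align dependency versions
go work init ./service-a ./service-b ./shared
go work sync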
Type | Convention | Example | Rationale | What Goes Wrong |
---|---|---|---|---|
Package | Short, lowercase, no underscores | `user`, `http`, `time` | Clear, concise, no stuttering | `userManagement` creates verbose imports |
Interface | Behavior-focused, often ends in -er | `Reader`, `UserRepository` | Describes what it does | `IUser` brings Java naming to Go |
Struct | Clear noun, avoid stuttering | `User` (not `UserStruct`) | Package context provides clarity | `user.UserStruct` is redundant |
Methods | Verb or verb phrase | `GetUser`, `ProcessPayment` | Action-oriented | `RetrieveUserDataFromDatabase` is verbose |
Decision Rationale: Good names are documentation. They should reveal intent and make incorrect usage difficult.
What Goes Wrong with Poor Naming:
- Write-Only Code: Code becomes impossible to maintain
- Context Loss: Future developers (including you) can't understand intent
- Bug Introduction: Cryptic names lead to wrong assumptions
- Onboarding Nightmare: New team members can't contribute effectively
Element | Convention | Example | Anti-Pattern | What Goes Wrong |
---|---|---|---|---|
Package | Short, descriptive | `user`, `payment` | `userManagement`, `utils` | Verbose imports, unclear purpose |
Interface | Behavior + -er (if single method) | `Reader`, `UserFinder` | `IUser`, `UserInterface` | Java-style naming |
Struct | Clear noun | `User`, `PaymentRequest` | `UserStruct`, `UserData` | Redundant suffixes |
Method | Verb phrase | `GetUser`, `ProcessPayment` | `RetrieveUserData` | Overly verbose |
Variable | Concise in narrow scope | `u` for user in short function | `userData` in 2-line scope | Unnecessary verbosity |
Constant | Descriptive | `DefaultTimeout`, `MaxRetries` | `TIMEOUT_VALUE` | C-style naming |
Error Variable | `Err` prefix | `ErrUserNotFound` | `ErrorUserNotFound` | Non-standard pattern |
✅ DO: Use intention-revealing names with proper scope
// EXCELLENT: Clear intent and appropriate scope
func (s *PaymentService) ProcessTransaction(ctx context.Context, amount decimal.Decimal, cardToken string) (*Transaction, error) {
// Short names in narrow scope are perfectly fine
tx := &Transaction{
ID: generateID(),
Amount: amount,
Status: StatusPending,
}
// Descriptive names for important operations
	// (A named result would be an unused variable; discard it explicitly)
	if _, err := s.validatePaymentDetails(ctx, amount, cardToken); err != nil {
		return nil, fmt.Errorf("payment validation failed: %w", err)
	}
return tx, nil
}
// Helper function to generate unique IDs
func generateID() string {
return fmt.Sprintf("tx_%d", time.Now().UnixNano())
}
❌ DON'T: Use cryptic abbreviations or overly verbose names
// BAD: Cryptic and unclear
func (s *PaymentService) ProcTx(c context.Context, amt decimal.Decimal, ct string) (*Transaction, error) {
t := &Transaction{...} // What is 't'?
vr, e := s.valPmtDtls(c, amt, ct) // Impossible to understand
return t, e
}
// ALSO BAD: Overly verbose
func (s *PaymentService) ProcessPaymentTransactionWithCardTokenValidation(
applicationContext context.Context,
paymentAmountInDecimalFormat decimal.Decimal,
encryptedCardTokenString string) (*PaymentTransactionResult, error) {
// Too verbose for simple concepts
}
What Goes Wrong with Bad Naming:
// Production debugging nightmare:
func (s *Service) p(c context.Context, d []byte) error {
r, e := s.r.g(c, string(d)) // What does this do?
if e != nil {
return e // Which error? From where?
}
return s.pr(c, r) // What is 'pr'? What is 'r'?
}
Real Production Story: A team inherited code where functions were named `p1`, `p2`, `p3` (for "process 1", "process 2", etc.). A critical bug required understanding the business logic, but with names like `r := s.p2(d.f, d.s)`, debugging took 3 weeks instead of 3 hours.
✅ DO: Follow standard library conventions
// Error variables: Err prefix, clear message
var (
ErrUserNotFound = errors.New("user not found")
ErrInvalidEmail = errors.New("invalid email address")
ErrPaymentDeclined = errors.New("payment declined")
ErrInsufficientFunds = errors.New("insufficient funds")
ErrEmailRequired = errors.New("email is required")
ErrEmailInvalid = errors.New("email format is invalid")
ErrNameRequired = errors.New("name is required")
)
// Constants: Clear, descriptive
const (
DefaultTimeout = 30 * time.Second
MaxRetryAttempts = 3
APIVersion = "v1"
)
Why This Matters: Consistent error naming enables reliable error handling with `errors.Is` and makes error types instantly recognizable in logs and debugging.
Note: Package-level sentinel errors (e.g., `var ErrX = errors.New("x")`) are idiomatic and expected. What's not acceptable are mutable global state and comparing errors with `==` instead of `errors.Is`.
Go's explicit error handling is a feature, not a bug. It forces you to think about failure modes and makes error paths visible.
Decision Rationale: Hidden exceptions create invisible control flow. Explicit errors make failure modes obvious and force developers to handle them consciously.
✅ ALWAYS Use `errors.Is` and `errors.As`:
// MODERN: Works with wrapped errors
if errors.Is(err, ErrUserNotFound) {
return handleNotFound(ctx, userID)
}
var validationErr *ValidationError
if errors.As(err, &validationErr) {
return handleValidationError(ctx, validationErr)
}
❌ NEVER Use Equality Operators:
// LEGACY: Breaks with wrapped errors
if err == ErrUserNotFound { // ❌ Fragile
return handleNotFound(ctx, userID)
}
if err.Error() == "user not found" { // ❌ Extremely fragile
return handleNotFound(ctx, userID)
}
What Goes Wrong with Error Equality:
// The Production Disaster Scenario
func (s *UserService) GetUser(ctx context.Context, id string) (*User, error) {
user, err := s.repo.GetUser(ctx, id)
if err != nil {
// This wrapping breaks == comparison
return nil, fmt.Errorf("failed to get user %s: %w", id, err)
}
return user, nil
}
func (h *HTTPHandler) GetUser(w http.ResponseWriter, r *http.Request) {
user, err := h.service.GetUser(r.Context(), userID)
if err == ErrUserNotFound { // ❌ This will NEVER be true now!
http.Error(w, "User not found", http.StatusNotFound)
return
}
// User not found becomes 500 Internal Server Error instead of 404
http.Error(w, "Internal error", http.StatusInternalServerError)
}
Production War Story: An e-commerce API was returning 500 errors for "product not found" instead of 404s. The issue: error wrapping in the repository layer broke `==` comparisons in HTTP handlers. This caused search engines to stop indexing their product pages because they were seen as server errors instead of missing products.
Why `errors.Is` Works:
- Unwraps Error Chains: Traverses the entire error chain looking for matches
- Handles Wrapping: Works even when errors are wrapped multiple times
- Semantic Equality: Compares error identity, not string representation
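A minimal, runnable sketch of that unwrapping behavior (using a hypothetical `ErrUserNotFound` sentinel):
package main

import (
	"errors"
	"fmt"
)

var ErrUserNotFound = errors.New("user not found")

func main() {
	// Wrap the sentinel twice, as layered repositories and services often do
	err := fmt.Errorf("repo: %w", ErrUserNotFound)
	err = fmt.Errorf("service: %w", err)

	fmt.Println(err == ErrUserNotFound)          // false: wrapping breaks ==
	fmt.Println(errors.Is(err, ErrUserNotFound)) // true: Is walks the chain
}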
✅ DO: Provide rich context while preserving error chains
func (s *UserService) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
logger := s.logger.With("operation", "create_user", "email", req.Email)
if err := s.validateUser(req); err != nil {
logger.WarnContext(ctx, "user validation failed", "error", err)
return nil, fmt.Errorf("CreateUser: validation failed for %s: %w", req.Email, err)
}
user := &User{
ID: generateID(),
Email: req.Email,
CreatedAt: time.Now(),
}
if err := s.repo.SaveUser(ctx, user); err != nil {
logger.ErrorContext(ctx, "failed to save user", "error", err, "user_id", user.ID)
return nil, fmt.Errorf("CreateUser: failed to save user %s: %w", user.ID, err)
}
logger.InfoContext(ctx, "user created successfully", "user_id", user.ID)
return user, nil
}
Why This Pattern Is Critical:
- Debugging Speed: Full error context enables fast problem resolution
- Error Chain Preservation: The `%w` verb preserves the original error for `errors.Is/As`
- Operation Context: Clear operation names help identify where failures occur
- Structured Logging: Consistent logging enables automated alerting
✅ DO: Create structured error types when you need rich error information
// ServiceError provides structured error context
type ServiceError struct {
Code string `json:"code"`
Message string `json:"message"`
Service string `json:"service"`
Operation string `json:"operation"`
RequestID string `json:"request_id,omitempty"`
Details map[string]interface{} `json:"details,omitempty"`
Cause error `json:"-"` // Don't serialize wrapped error
}
func (e *ServiceError) Error() string {
if e.Cause != nil {
return fmt.Sprintf("%s.%s failed: %s (caused by: %v)",
e.Service, e.Operation, e.Message, e.Cause)
}
return fmt.Sprintf("%s.%s failed: %s", e.Service, e.Operation, e.Message)
}
func (e *ServiceError) Unwrap() error {
return e.Cause
}
// Builder pattern for cleaner error creation
func NewServiceError(code, service, operation, message string) *ServiceError {
return &ServiceError{
Code: code,
Service: service,
Operation: operation,
Message: message,
Details: make(map[string]interface{}),
}
}
func (e *ServiceError) WithCause(err error) *ServiceError {
e.Cause = err
return e
}
func (e *ServiceError) WithRequestID(id string) *ServiceError {
e.RequestID = id
return e
}
func (e *ServiceError) WithDetail(key string, value interface{}) *ServiceError {
e.Details[key] = value
return e
}
When to Use Custom Error Types:
- API Responses: When errors need to be serialized to clients
- Retry Logic: When different error types require different retry strategies
- Monitoring: When you need to categorize errors for metrics
- Debugging: When you need rich context for production troubleshooting
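A short usage sketch of the builder above (the gateway error and request ID are hypothetical placeholders from a surrounding handler):
// Wrap a gateway failure with structured context for upstream handling
func chargeCard(gatewayErr error, requestID string) error {
	return NewServiceError("PAYMENT_DECLINED", "payment", "charge", "card was declined").
		WithCause(gatewayErr).
		WithRequestID(requestID).
		WithDetail("retryable", false)
}

// Callers anywhere up the chain can recover the typed error:
//
//	var svcErr *ServiceError
//	if errors.As(err, &svcErr) {
//		// e.g. map svcErr.Code to an HTTP status
//	}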
✅ DO: Use `errors.Join` for combining multiple errors
import (
	"context"
	"errors"
	"fmt"
	"strings"
)
// Define sentinel errors for specific validation failures
var (
ErrEmailRequired = errors.New("email is required")
ErrEmailInvalid = errors.New("email format is invalid")
ErrNameRequired = errors.New("name is required")
)
func isValidEmail(email string) bool {
return strings.Contains(email, "@") // Simplified validation
}
func (s *UserService) ValidateAndCreate(ctx context.Context, req *CreateUserRequest) (*User, error) {
var errs []error
// Collect all validation errors using sentinel values
if req.Email == "" {
errs = append(errs, ErrEmailRequired)
} else if !isValidEmail(req.Email) {
errs = append(errs, ErrEmailInvalid)
}
if req.Name == "" {
errs = append(errs, ErrNameRequired)
}
// Join all errors if any exist
if len(errs) > 0 {
return nil, errors.Join(errs...)
}
return s.createUser(ctx, req)
}
// Handle multi-errors with proper unwrapping
func handleValidationErrors(err error) {
// Check for specific errors using errors.Is (works with Join chains)
if errors.Is(err, ErrEmailRequired) {
fmt.Println("Email is required for user creation")
}
if errors.Is(err, ErrEmailInvalid) {
fmt.Println("Email format is invalid")
}
// Correctly traverse joined errors (Unwrap() []error for Join)
type unwrapper interface{ Unwrap() []error }
var stack []error
stack = append(stack, err)
for len(stack) > 0 {
// pop
e := stack[len(stack)-1]
stack = stack[:len(stack)-1]
fmt.Printf("Validation error: %v\n", e)
if u, ok := e.(unwrapper); ok {
stack = append(stack, u.Unwrap()...)
} else if ue := errors.Unwrap(e); ue != nil {
stack = append(stack, ue)
}
}
}
Why Multi-Error Handling Matters:
- Better User Experience: Show all validation issues at once, not one at a time
- API Design: RESTful APIs can return all field errors in a single response
- Batch Processing: Collect all failures when processing multiple items
Note: `errors.Join` returns an error whose `Unwrap()` method yields `[]error`. Do not use a simple `errors.Unwrap` loop; rely on `errors.Is/As` or explicitly walk the unwrapped slice as shown.
❌ Critical Mistakes to Avoid:
Anti-Pattern | Why It's Wrong | Production Impact | What Goes Wrong |
---|---|---|---|
`if err != nil { panic(err) }` | Crashes entire service | Service downtime, cascading failures | Customer-facing 500 errors, data loss |
`_, err := fn(); err != nil { return }` | Ignores errors silently | Silent failures, data corruption | Money transfers fail silently, user data lost |
`err.Error() == "some string"` | Breaks with wrapped errors | Missed error conditions | Wrong HTTP status codes, broken retry logic |
Global mutable variables | Hidden coupling, test flakiness | Difficult to test, race conditions | Inject via constructors instead |
`log.Printf("Error: %v", err)` | Unstructured, no context | Difficult to debug, no alerting | Cannot search logs, no automated incident response |
Production Horror Story: A payment service used `panic(err)` for database connection failures. During a brief database outage, the entire payment system crashed instead of gracefully degrading. This caused a 2-hour service outage during Black Friday, resulting in $2M in lost revenue.
Decision Rationale: `sync.WaitGroup` is too primitive for most production systems. `errgroup` provides error handling, context cancellation, and resource management that real applications need. However, `sync.WaitGroup` is acceptable for fire-and-forget operations where errors can be safely ignored.
Feature | sync.WaitGroup | errgroup | What Goes Wrong with WaitGroup |
---|---|---|---|
Error Handling | Manual collection, race conditions | Automatic error propagation | Errors lost, silent failures, no way to stop other goroutines when one fails |
Cancellation | Manual context checking | Built-in context cancellation | Goroutines continue running even after parent context is cancelled |
Resource Limits | No built-in limiting | `SetLimit()` prevents resource exhaustion | Unbounded goroutine creation leads to OOM crashes |
First Error | Must collect all errors | First error cancels all operations | Cannot implement fail-fast behavior, waste resources on doomed operations |
API Cleanliness | Manual Add/Done counting | Clean function-based API | Easy to forget Done() calls, leading to deadlocks |
Best For | Fire-and-forget fan-out | Operations needing error handling | Reserve WaitGroup for best-effort logging or metrics collection, where failures don't matter (see the sketch below) |
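To make that last row concrete, here is a hedged sketch of an acceptable bare-WaitGroup use: best-effort metrics fan-out where failures are deliberately dropped (`Point` and `Sink` are hypothetical types):
type Point struct {
	Name  string
	Value float64
}

type Sink interface {
	Record(points []Point) error
}

func emitMetrics(points []Point, sinks []Sink) {
	var wg sync.WaitGroup
	for _, sink := range sinks {
		wg.Add(1)
		go func(s Sink) {
			defer wg.Done()
			_ = s.Record(points) // Best-effort: errors intentionally ignored
		}(sink)
	}
	wg.Wait() // No errors to collect, no cancellation needed
}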
Production Disaster with sync.WaitGroup:
// DANGEROUS: Real production bug that caused service outage
func processUserBatch(userIDs []string) error {
var wg sync.WaitGroup
var errors []error
var mu sync.Mutex
for _, userID := range userIDs {
wg.Add(1)
go func(id string) {
defer wg.Done()
if err := processUser(id); err != nil {
mu.Lock()
				errors = append(errors, err) // Safe only while every writer remembers the mutex
mu.Unlock()
}
}(userID)
}
wg.Wait()
if len(errors) > 0 {
return fmt.Errorf("processing failed: %v", errors[0])
}
return nil
}
What Went Wrong:
- No Resource Limits: With 10,000 user IDs, created 10,000 goroutines simultaneously
- Memory Exhaustion: Server ran out of memory and crashed
- No Cancellation: Even when first user processing failed, all other goroutines continued
- Fragile Error Collection: Correct only if every goroutine takes the mutex; one missed lock is a data race
- No Context: Impossible to implement timeouts or cancellation
✅ DO: Use errgroup for all concurrent operations
import "golang.org/x/sync/errgroup"
func (s *UserService) ProcessUserBatch(ctx context.Context, userIDs []string) error {
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(10) // ALWAYS set limits to prevent resource exhaustion
for _, userID := range userIDs {
		userID := userID // Redundant since Go 1.22's per-iteration loop variables; kept for older toolchains
g.Go(func() error {
return s.processUser(ctx, userID)
})
}
return g.Wait() // Returns first error, cancels all other operations
}
Why This Works:
- Resource Control: `SetLimit(10)` prevents creating thousands of goroutines
- Fail Fast: First error cancels context, stopping other operations
- Error Propagation: First error is returned to caller
- Context Integration: Respects parent context cancellation
- Clean API: No manual Add/Done counting
flowchart LR
A["Start parent context"]:::context
B["errgroup.WithContext"]:::context
C{"Spawn goroutines"}:::decision
D["Go fn (task 1)"]:::task
E["Go fn (task 2)"]:::task
F["Go fn (task 3)"]:::task
G{"Error occurred?"}:::decision
H{"Error occurred?"}:::decision
I{"Error occurred?"}:::decision
J["Return first error"]:::error
K["Cancel context"]:::context
L["Stop remaining work"]:::completion
M["All tasks completed"]:::completion
N["g.Wait returns nil"]:::success
A --> B
B --> C
C -->|task 1| D
C -->|task 2| E
C -->|task 3| F
D --> G
E --> H
F --> I
G -->|Yes| J
H -->|Yes| J
I -->|Yes| J
J --> K
K --> L
G -->|No| M
H -->|No| M
I -->|No| M
M --> N
classDef context fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#0d47a1;
classDef decision fill:#e0f7fa,stroke:#00838f,stroke-width:2px,color:#006064;
classDef task fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c;
classDef error fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#b71c1c;
classDef completion fill:#fff8e1,stroke:#ffa000,stroke-width:2px,color:#e65100;
classDef success fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#1b5e20;
🚨 THE BIGGEST GOTCHA: Panics in errgroup goroutines are NOT caught and will crash your entire service!
Decision Rationale: Go's panic/recover mechanism doesn't cross goroutine boundaries. A panic in one goroutine crashes the entire program unless explicitly handled.
❌ DANGEROUS:
func processItems(ctx context.Context, items []Item) error {
g, ctx := errgroup.WithContext(ctx)
for _, item := range items {
item := item
g.Go(func() error {
// If this panics, your entire service crashes!
result := item.Data["key"].(string) // Potential panic
return processResult(ctx, result)
})
}
return g.Wait() // Panic bubbles up and crashes the program
}
Production War Story: A high-traffic API service had a panic in one goroutine due to a malformed JSON response from an external service. This single panic crashed the entire service, causing a 15-minute outage affecting millions of users.
✅ PRODUCTION SOLUTION: Panic-safe errgroup wrapper
import (
"context"
"fmt"
"runtime/debug"
"golang.org/x/sync/errgroup"
"log/slog"
)
// SafeGroup wraps errgroup.Group with panic recovery
type SafeGroup struct {
group *errgroup.Group
logger *slog.Logger
}
func NewSafeGroup(ctx context.Context, logger *slog.Logger) (*SafeGroup, context.Context) {
g, ctx := errgroup.WithContext(ctx)
return &SafeGroup{
group: g,
logger: logger,
}, ctx
}
func (sg *SafeGroup) Go(ctx context.Context, fn func() error) {
sg.group.Go(func() (err error) {
defer func() {
if r := recover(); r != nil {
stack := debug.Stack()
// Convert panic to error
panicErr := fmt.Errorf("goroutine panic: %v", r)
sg.logger.ErrorContext(ctx, "goroutine panic recovered",
"panic", r,
"stack_trace", string(stack))
err = panicErr
}
}()
return fn()
})
}
func (sg *SafeGroup) SetLimit(n int) {
sg.group.SetLimit(n)
}
func (sg *SafeGroup) Wait() (err error) {
// CRITICAL: Handle panics during Wait() itself
defer func() {
if r := recover(); r != nil {
sg.logger.Error("panic during SafeGroup.Wait()",
"panic", r,
"stack_trace", string(debug.Stack()))
err = fmt.Errorf("wait panic: %v", r)
}
}()
return sg.group.Wait()
}
// Usage: Production-safe concurrent processing
func (s *PaymentService) ProcessBatchPayments(ctx context.Context, paymentIDs []string) error {
logger := s.logger.With("operation", "process_batch_payments")
g, ctx := NewSafeGroup(ctx, logger)
g.SetLimit(5) // Limit concurrent operations
for _, paymentID := range paymentIDs {
paymentID := paymentID
g.Go(ctx, func() error {
// Now safe from panics that could crash the service
return s.processPayment(ctx, paymentID)
})
}
return g.Wait()
}
✅ DO: Use channels for elegant coordination patterns
Input Channel ──┬─► Worker 1 ──┐
├─► Worker 2 ──┼─► Collector ──► Output Channel
└─► Worker 3 ──┘
Simple Example: Here's a basic illustration before diving into the full implementation:
// Simple fan-out to 2 workers, fan-in to a single result channel
func simpleFanOutFanIn(ctx context.Context, input <-chan int) <-chan int {
// Step 1: Fan-out - distribute work
worker1Chan := make(chan int)
worker2Chan := make(chan int)
go func() {
defer close(worker1Chan)
defer close(worker2Chan)
for item := range input {
			// Send each item to whichever worker is ready first
select {
case worker1Chan <- item:
case worker2Chan <- item:
case <-ctx.Done():
return
}
}
}()
// Step 2: Worker processing (simple operation)
	processWorker := func(in <-chan int, out chan<- int) {
		defer close(out) // Signal the collector that this worker is finished
		for item := range in {
select {
case out <- item * 2: // Simple processing
case <-ctx.Done():
return
}
}
}
worker1Out := make(chan int)
worker2Out := make(chan int)
go processWorker(worker1Chan, worker1Out)
go processWorker(worker2Chan, worker2Out)
// Step 3: Fan-in - collect results
output := make(chan int)
go func() {
defer close(output)
for {
select {
case res1, ok1 := <-worker1Out:
if ok1 {
output <- res1
} else {
worker1Out = nil // Channel closed, stop selecting from it
}
case res2, ok2 := <-worker2Out:
if ok2 {
output <- res2
} else {
worker2Out = nil
}
case <-ctx.Done():
return
}
// Exit loop if both worker channels are closed
if worker1Out == nil && worker2Out == nil {
break
}
}
}()
return output
}
Full Example with Error Handling & SafeGroup: This is a more robust implementation suitable for production.
// Fan-out/Fan-in pattern with proper error handling
func (s *DataService) processDataPipeline(ctx context.Context, input <-chan DataItem) <-chan Result {
const numWorkers = 3
// Create worker channels
workChan := make(chan DataItem, numWorkers*2) // Buffered for better throughput
resultChan := make(chan Result, numWorkers)
// Start workers using SafeGroup
g, ctx := NewSafeGroup(ctx, s.logger)
// Fan-out: Start workers to process items
for i := 0; i < numWorkers; i++ {
workerID := i
g.Go(ctx, func() error {
for {
select {
case item, ok := <-workChan:
if !ok {
return nil // Work channel closed
}
result, err := s.processDataItem(ctx, item)
if err != nil {
return fmt.Errorf("worker %d failed: %w", workerID, err)
}
select {
case resultChan <- result:
case <-ctx.Done():
return ctx.Err()
}
case <-ctx.Done():
return ctx.Err()
}
}
})
}
// Fan-in: Collect and forward results
output := make(chan Result)
go func() {
defer close(output)
// Forward input to workers
go func() {
defer close(workChan)
for {
select {
case item, ok := <-input:
if !ok {
return // Input closed
}
select {
case workChan <- item:
case <-ctx.Done():
return
}
case <-ctx.Done():
return
}
}
}()
// Wait for workers and collect results
go func() {
defer close(resultChan)
			_ = g.Wait() // First worker error cancels ctx; the error value itself is dropped here
}()
// Forward results to output
for result := range resultChan {
select {
case output <- result:
case <-ctx.Done():
return
}
}
}()
return output
}
Why This Pattern Works:
- Backpressure: Buffered channels prevent goroutine blocking
- Context Integration: Proper cancellation throughout the pipeline
- Error Propagation: SafeGroup handles worker failures correctly
- Resource Cleanup: Channels are properly closed to prevent goroutine leaks
Decision Rationale: Context enables request tracing, cancellation, and timeout control across service boundaries. Without proper context propagation, debugging distributed systems becomes impossible.
✅ ALWAYS: Use unexported key types as context keys
// CORRECT: An unexported named type prevents collisions across packages.
// (Avoid &struct{}{} keys: the Go spec does not guarantee that pointers
// to distinct zero-size variables are unique.)
type ctxKey int

const (
	requestIDKey ctxKey = iota
	correlationIDKey
	userIDKey
)
func WithRequestID(parent context.Context, id string) context.Context {
return context.WithValue(parent, requestIDKey, id)
}
func GetRequestID(ctx context.Context) string {
if id, ok := ctx.Value(requestIDKey).(string); ok && id != "" {
return id
}
return "unknown-request"
}
❌ NEVER: Use string types as context keys
// WRONG: String keys can collide across packages
const userIDKey = "user_id" // ❌ Risk of collision
func StoreUserID(ctx context.Context, userID string) context.Context {
return context.WithValue(ctx, userIDKey, userID) // ❌ Fragile
}
What Goes Wrong with String Keys:
// Package A
const requestIDKey = "request_id"
func SetRequestID(ctx context.Context, id string) context.Context {
return context.WithValue(ctx, requestIDKey, id)
}
// Package B (different team, unaware of Package A)
const requestIDKey = "request_id" // Same string!
func SetRequestData(ctx context.Context, data string) context.Context {
return context.WithValue(ctx, requestIDKey, data) // Overwrites Package A's value!
}
// Result: Silent data corruption, impossible to debug
Production Disaster: A microservice had two teams using "user_id" as context keys. Team A stored the authenticated user ID, Team B stored the target user ID for admin operations. The values overwrote each other, causing admin operations to be performed on the wrong users, including deleting accounts belonging to other customers.
flowchart TB
subgraph Root["Root Context"]
R["context.Background()"]:::background
end
R --> A["WithTimeout 2m (batchCtx)"]:::timeout
A --> B["WithValue operation=process_user_batch"]:::value
A --> C["WithCancel (worker 1)"]:::cancel
A --> D["WithCancel (worker 2)"]:::cancel
A --> E["WithCancel (worker 3)"]:::cancel
classDef background fill:#e0f2f1,stroke:#00897b,stroke-width:2px,color:#00695c;
classDef timeout fill:#fff8e1,stroke:#ffa000,stroke-width:2px,color:#e65100;
classDef value fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#0d47a1;
classDef cancel fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c;
❌ ANTI-PATTERN #1: Storing context in struct fields
// WRONG: Context stored in struct
type Service struct {
ctx context.Context // ❌ NEVER DO THIS
logger *slog.Logger
}
func (s *Service) ProcessData(data string) error {
return s.doWork(s.ctx, data) // ❌ Using potentially stale/cancelled context
}
Why This Is Catastrophically Wrong:
1. Lifecycle Mismatch Nightmare:
// The disaster scenario
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
service := &Service{ctx: ctx} // Context with 5 second timeout
cancel() // Context is now cancelled
// 10 minutes later...
service.ProcessData("important data") // Uses CANCELLED context!
// All operations fail immediately with "context canceled"
}
2. Stale Context Hell:
func (h *HTTPHandler) CreateService(w http.ResponseWriter, r *http.Request) {
// Request-scoped context
service := &Service{ctx: r.Context()}
// Store service globally (common pattern)
GlobalServices["user"] = service
// Request ends, context is cancelled
}
// Later, different request:
func (h *HTTPHandler) ProcessUser(w http.ResponseWriter, r *http.Request) {
service := GlobalServices["user"] // Service has STALE context
service.ProcessData("user data") // Fails with cancelled context!
}
3. Race Conditions:
type Service struct {
ctx context.Context // Shared across goroutines
}
func (s *Service) HandleRequests() {
for i := 0; i < 100; i++ {
go func() {
// Multiple goroutines reading/writing s.ctx
s.ctx = context.WithValue(s.ctx, "key", "value") // Race condition!
}()
}
}
4. Testing Nightmare:
func TestService_ProcessData(t *testing.T) {
service := &Service{ctx: context.Background()}
// Cannot inject different contexts for different test cases
// Cannot test timeout behavior
// Cannot test cancellation behavior
// Cannot test with different request IDs
}
Production War Story: A background job service stored context in a struct field. The context came from an HTTP request with a 30-second timeout. When the HTTP request finished, the context was cancelled, but the background jobs kept running for hours using the cancelled context. All database operations failed with "context canceled" errors, but the jobs didn't realize they were failing and kept retrying forever, creating an infinite loop of database connection attempts.
✅ CORRECT: Context flows through method parameters
// RIGHT: Context passed as parameter
type Service struct {
logger *slog.Logger // ✅ No context stored
}
func (s *Service) ProcessData(ctx context.Context, data string) error {
return s.doWork(ctx, data) // ✅ Fresh context for each operation
}
Why This Works:
- Lifecycle Control: Each operation gets appropriate context
- Testability: Can inject different contexts per test
- Cancellation: Operations can be cancelled independently
- Tracing: Request-specific tracing information flows correctly
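Because the context is now a parameter, each test can inject its own lifecycle. A sketch, assuming the `Service` above and that `doWork` honors cancellation:
func TestProcessData_Cancelled(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // Simulate a caller that has already given up

	s := &Service{logger: slog.Default()}
	if err := s.ProcessData(ctx, "payload"); err == nil {
		t.Fatal("expected an error from the cancelled context")
	}
}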
Understanding Context Derivation: Every derived context shares certain properties with its parent while having its own lifecycle. This is fundamental to Go's context system.
What Derived Contexts Share with Parent:
- Values: All key-value pairs from parent context
- Deadline: The most restrictive deadline (parent or child)
- Done Channel: Parent cancellation cascades to children
- Error: Cancellation reason propagates
What Derived Contexts Don't Share:
- Cancellation Functions: Each has its own cancel function
- Lifecycle: Child can be cancelled without affecting parent
- Memory: Child context doesn't hold parent alive unnecessarily
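A minimal, runnable sketch of these sharing rules:
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithTimeout(parent, time.Minute)
	_ = child

	cancelChild() // Cancelling the child...
	fmt.Println("parent still alive:", parent.Err() == nil) // true: parent unaffected

	cancelParent() // ...but cancelling the parent cascades downward
	grandchild, cancel := context.WithCancel(parent)
	defer cancel()
	<-grandchild.Done() // Already done: derived from a cancelled parent
	fmt.Println("grandchild error:", grandchild.Err()) // context.Canceled
}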
✅ DO: Master context derivation patterns
// Context derivation hierarchy
func (s *UserService) ProcessUserBatch(ctx context.Context, userIDs []string) error {
// 1. Operation-level context with timeout
batchCtx, batchCancel := context.WithTimeout(ctx, 2*time.Minute)
defer batchCancel()
// Add operation-specific values
batchCtx = context.WithValue(batchCtx, operationKey, "process_user_batch")
batchCtx = context.WithValue(batchCtx, batchSizeKey, len(userIDs))
logger := s.logger.With(
"operation", "process_user_batch",
"batch_size", len(userIDs),
"correlation_id", getCorrelationID(batchCtx),
)
logger.InfoContext(batchCtx, "batch processing started")
// 2. Individual user processing contexts
g, groupCtx := NewSafeGroup(batchCtx, logger)
g.SetLimit(10)
for _, userID := range userIDs {
userID := userID
g.Go(groupCtx, func() error {
// 3. Per-user context with individual timeout
userCtx, userCancel := context.WithTimeout(groupCtx, 30*time.Second)
defer userCancel()
// Add user-specific values
userCtx = context.WithValue(userCtx, userIDKey, userID)
return s.processUserWithContext(userCtx, userID)
})
}
if err := g.Wait(); err != nil {
logger.ErrorContext(batchCtx, "batch processing failed", "error", err)
return fmt.Errorf("batch processing failed: %w", err)
}
logger.InfoContext(batchCtx, "batch processing completed successfully")
return nil
}
// Helper functions for context values
// Unexported key type prevents cross-package collisions
type batchCtxKey int

const (
	operationKey batchCtxKey = iota
	batchSizeKey
	userIDKey
	correlationIDKey // Used by getCorrelationID below
)
func getCorrelationID(ctx context.Context) string {
if id, ok := ctx.Value(correlationIDKey).(string); ok {
return id
}
return "unknown-correlation"
}
❌ FOOTGUN #1: Storing large objects in context values
// WRONG: Large objects in context cause memory retention
func processUser(ctx context.Context, userData []byte) error {
// ❌ This keeps userData alive for the entire context tree lifetime
ctx = context.WithValue(ctx, userDataKey, userData)
return longRunningOperation(ctx)
}
What Goes Wrong: Context values are copied to all derived contexts. Large objects (slices, structs) can cause memory leaks if the context tree is long-lived.
✅ The Fix: Pass data as function parameters
// GOOD: Pass data explicitly
func processUser(ctx context.Context, userData []byte) error {
return longRunningOperation(ctx, userData) // ✅ Explicit parameter
}
❌ FOOTGUN #2: Not cancelling derived contexts
// WRONG: Context leak - derived context never cancelled
func performOperation(ctx context.Context) error {
opCtx, _ := context.WithTimeout(ctx, time.Minute) // ❌ Cancel function ignored
return doWork(opCtx) // Context may leak resources
}
✅ The Fix: Always defer cancel
// GOOD: Proper context cleanup
func performOperation(ctx context.Context) error {
opCtx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel() // ✅ Always clean up
return doWork(opCtx)
}
✅ Context Do's:
- Always pass context as first parameter
- Use context.WithTimeout for operations with time limits
- Derive contexts for adding values or deadlines
- Always defer cancel() when creating cancellable contexts
- Use context for cancellation, deadlines, and request-scoped values
❌ Context Don'ts:
- Never store context in struct fields
- Don't use context for passing optional parameters
- Don't put large objects in context values
- Don't ignore context cancellation in loops (see the sketch after this list)
- Don't create contexts without cancel functions
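A minimal sketch of the loop-cancellation point above (`Item` and `process` are hypothetical):
func processAll(ctx context.Context, items []Item) error {
	for _, item := range items {
		select {
		case <-ctx.Done():
			return ctx.Err() // Stop promptly instead of grinding through the batch
		default:
		}
		if err := process(ctx, item); err != nil {
			return err
		}
	}
	return nil
}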
Go's interfaces enable true dependency inversion without complex frameworks. The key principle: define interfaces where they're used, not where they're implemented.
Decision Rationale: Small, focused interfaces enable testing, flexibility, and clear API boundaries. They're the foundation of maintainable systems.
flowchart LR
subgraph Web[HTTP Layer]
H[Handler]
end
subgraph Biz[Domain/Service]
S[UserService]
end
subgraph Ports[Interfaces]
I1[(UserRepository)]
I2[(EmailSender)]
end
subgraph Adapters[Adapters]
A1[PostgresRepo]
A2[SendgridSender]
end
H --> S
S --> I1
S --> I2
I1 -.implemented by.-> A1
I2 -.implemented by.-> A2
classDef layer fill:#f5f5f5,stroke:#777;
class Web,Biz,Ports,Adapters layer;
What Goes Wrong Without Interface-Driven Design:
// DISASTER: Tight coupling nightmare
package handler
import "myapp/mysql" // Direct dependency on MySQL
type HTTPHandler struct {
db *mysql.Database // Tightly coupled to MySQL
}
func (h *HTTPHandler) GetUser(w http.ResponseWriter, r *http.Request) {
// Direct MySQL calls in HTTP handler
user := h.db.QueryUser("SELECT * FROM users WHERE id = ?", userID)
	json.NewEncoder(w).Encode(user)
}
Production Nightmare: A team needed to migrate from MySQL to PostgreSQL. Because HTTP handlers directly imported MySQL packages, they had to:
- Change 47 import statements across 23 files
- Rewrite all SQL queries (MySQL → PostgreSQL syntax differences)
- Update all database connection handling
- Rewrite all unit tests to use PostgreSQL test containers
- Deal with compilation errors for 2 weeks while migration was in progress
The Fix: Interface-driven design
// SOLUTION: Interface-driven design
package handler
// Small, focused interfaces defined WHERE THEY'RE USED
type UserService interface {
GetUser(ctx context.Context, id string) (*User, error)
CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error)
}
type HTTPHandler struct {
userService UserService // Depends on interface, not concrete type
logger *slog.Logger
}
func (h *HTTPHandler) GetUser(w http.ResponseWriter, r *http.Request) {
user, err := h.userService.GetUser(r.Context(), userID)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
json.NewEncoder(w).Encode(user)
}
Result: Database migration required zero changes to HTTP handlers. Swapped implementation at startup, everything worked perfectly.
✅ DO: Define small, focused interfaces in consuming packages
// Define interfaces where they're used
package handler
// Small, focused interfaces
type UserService interface {
GetUser(ctx context.Context, id string) (*User, error)
CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error)
}
type EmailSender interface {
SendEmail(ctx context.Context, to, subject, body string) error
}
// Handler depends on interfaces, not concrete types
type HTTPHandler struct {
userService UserService
emailSender EmailSender
logger *slog.Logger
}
func NewHTTPHandler(userService UserService, emailSender EmailSender, logger *slog.Logger) *HTTPHandler {
return &HTTPHandler{
userService: userService,
emailSender: emailSender,
logger: logger,
}
}
Why This Pattern Is Critical:
- Testing Simplicity: Easy to mock individual interfaces
- Clear Dependencies: You can see exactly what each component needs
- Interface Segregation: Small interfaces are easier to implement
- Flexible Implementation: Swap implementations without changing business logic
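For the testing point above, a hand-rolled stub is often all you need. A sketch against the `UserService` interface defined in this package (`stubEmailSender` is a hypothetical sibling fake):
type stubUserService struct {
	user *User
	err  error
}

func (s *stubUserService) GetUser(ctx context.Context, id string) (*User, error) {
	return s.user, s.err
}

func (s *stubUserService) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
	return s.user, s.err
}

// In a test:
//
//	h := NewHTTPHandler(&stubUserService{user: &User{ID: "u1"}}, stubEmailSender{}, slog.Default())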
What Goes Wrong with God Interfaces:
// BAD: God interface violates Interface Segregation Principle
type UserRepository interface {
// User operations
GetUser(ctx context.Context, id string) (*User, error)
CreateUser(ctx context.Context, user *User) error
UpdateUser(ctx context.Context, user *User) error
DeleteUser(ctx context.Context, id string) error
ListUsers(ctx context.Context, filter UserFilter) ([]*User, error)
// Order operations (why is this here?)
GetOrdersForUser(ctx context.Context, userID string) ([]*Order, error)
CreateOrder(ctx context.Context, order *Order) error
// Email operations (this is getting ridiculous)
SendWelcomeEmail(ctx context.Context, userID string) error
SendPasswordResetEmail(ctx context.Context, email string) error
// Analytics (seriously?)
TrackUserEvent(ctx context.Context, userID, event string) error
GetUserMetrics(ctx context.Context, userID string) (*UserMetrics, error)
}
Production Consequences:
- Impossible to Test: Must mock 12 methods for simple user creation test
- Violates SRP: One interface doing four different jobs
- Hard to Implement: New implementations must implement ALL methods
- Tight Coupling: Changes to email functionality force user repository changes
✅ DO: Use explicit constructor injection for all dependencies
// UserService with explicit dependencies
type UserService struct {
repo UserRepository
validator Validator
logger *slog.Logger
clock Clock // Even time is injectable for testing
}
// Clock interface makes time testable
type Clock interface {
Now() time.Time
}
type RealClock struct{}
func (RealClock) Now() time.Time { return time.Now() }
func NewUserService(repo UserRepository, validator Validator, logger *slog.Logger, clock Clock) *UserService {
return &UserService{
repo: repo,
validator: validator,
logger: logger,
clock: clock,
}
}
func (s *UserService) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
if err := s.validator.ValidateCreateUserRequest(req); err != nil {
return nil, fmt.Errorf("validation failed: %w", err)
}
user := &User{
ID: generateID(),
Email: req.Email,
CreatedAt: s.clock.Now(), // Testable time
}
return user, s.repo.SaveUser(ctx, user)
}
// Helper function for ID generation
func generateID() string {
return fmt.Sprintf("user_%d_%d", time.Now().UnixNano(), rand.Intn(1000))
}
Why This Works:
- Testable: Every dependency can be mocked
- Explicit: All dependencies visible in constructor signature
- Flexible: Can swap implementations easily
- No Globals: No hidden dependencies or side effects
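A sketch of the payoff, using a fixed clock to make time-dependent logic deterministic (`stubRepo` and `stubValidator` are hypothetical no-op fakes):
type fixedClock struct{ t time.Time }

func (c fixedClock) Now() time.Time { return c.t }

func TestCreateUser_StampsCreationTime(t *testing.T) {
	now := time.Date(2025, 1, 15, 10, 0, 0, 0, time.UTC)
	svc := NewUserService(stubRepo{}, stubValidator{}, slog.Default(), fixedClock{t: now})

	u, err := svc.CreateUser(context.Background(), &CreateUserRequest{Email: "a@b.test"})
	if err != nil {
		t.Fatal(err)
	}
	if !u.CreatedAt.Equal(now) {
		t.Fatalf("CreatedAt = %v, want %v", u.CreatedAt, now)
	}
}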
Decision Rationale: Go's embedding enables composition without inheritance complexity. Use it consciously to add meaningful functionality.
✅ DO: Use embedding to extend functionality
// Good: Embedding adds meaningful functionality
type LoggingDB struct {
*sql.DB
logger *slog.Logger
}
func (db *LoggingDB) QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
start := time.Now()
rows, err := db.DB.QueryContext(ctx, query, args...)
duration := time.Since(start)
db.logger.InfoContext(ctx, "database query executed",
"query", query,
"duration_ms", duration.Milliseconds(),
"error", err)
return rows, err
}
The Embedding Gotcha: Method promotion does not rewrap return values. When you call `db.Begin()`, it returns a standard `*sql.Tx`, not a wrapped, logging-aware transaction. Any subsequent calls on that transaction object will bypass your logging.
// The embedding footgun
func (db *LoggingDB) processTransaction() error {
tx, err := db.Begin() // Returns *sql.Tx, NOT *LoggingTx
if err != nil {
return err
}
defer tx.Rollback()
// These operations are NOT logged!
_, err = tx.Exec("INSERT INTO users...")
_, err = tx.Exec("UPDATE accounts...")
return tx.Commit()
}
A truly comprehensive logging solution for `database/sql` requires a more invasive wrapper or a different driver. This example illustrates the principle of embedding, but be aware of its boundaries.
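One hedged way to extend coverage across the transaction boundary is to override `Begin` and return your own wrapper. A sketch (`LoggingTx` is hypothetical and covers only `Exec`):
type LoggingTx struct {
	*sql.Tx
	logger *slog.Logger
}

func (db *LoggingDB) Begin() (*LoggingTx, error) {
	tx, err := db.DB.Begin()
	if err != nil {
		return nil, err
	}
	return &LoggingTx{Tx: tx, logger: db.logger}, nil
}

func (tx *LoggingTx) Exec(query string, args ...any) (sql.Result, error) {
	start := time.Now()
	res, err := tx.Tx.Exec(query, args...)
	tx.logger.Info("transaction statement executed",
		"query", query,
		"duration_ms", time.Since(start).Milliseconds(),
		"error", err)
	return res, err
}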
❌ DON'T: Embed types that expose unwanted functionality
// Bad: Exposes mutex methods to external users
type Counter struct {
sync.Mutex // ❌ Now Counter.Lock() and Counter.Unlock() are public
value int
}
// Good: Use composition instead
type Counter struct {
mu sync.Mutex // Private field
value int
}
func (c *Counter) Increment() {
c.mu.Lock()
defer c.mu.Unlock()
c.value++
}
Go 1.21's `log/slog` brings enterprise-grade structured logging to the standard library. Unstructured logs are useless at scale.
Decision Rationale: Structured logging is essential for production systems. Unstructured logs cannot be searched, aggregated, or alerted on effectively.
What Goes Wrong with Unstructured Logging:
// DISASTER: Unstructured logging nightmare
func (s *UserService) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
fmt.Printf("Creating user: %s\n", req.Email) // No timestamp, no context
user, err := s.repo.SaveUser(ctx, &User{Email: req.Email})
if err != nil {
fmt.Printf("Error: %v\n", err) // No context, no correlation ID
return nil, err
}
fmt.Printf("User created successfully\n") // Which user? When?
return user, nil
}
Production Horror Story: A payment service processed millions of transactions daily. When customers reported missing payments, debugging meant manually scanning 50GB of log files for patterns like "Error: database timeout". Finding why payment ID "pay_12345" failed took 3 days, because engineers had to grep through unstructured text with no correlation between related log entries.
After implementing structured logging, the same investigation took 3 minutes:
# Find all events for specific payment
kubectl logs | jq 'select(.payment_id == "pay_12345")'
# Find all database timeouts in the last hour
kubectl logs | jq 'select(.error_type == "database_timeout" and .timestamp > "2025-01-15T10:00:00Z")'
Production logging must be:
- Structured: Machine-readable format (JSON)
- Contextual: Request tracing and correlation
- Actionable: Logs that help solve problems
✅ DO: Implement production-grade logging from day one
import (
"context"
"log/slog"
"os"
"time"
)
func setupLogger(level string) (*slog.Logger, error) {
var logLevel slog.Level
switch level {
case "debug":
logLevel = slog.LevelDebug
case "info":
logLevel = slog.LevelInfo
case "warn":
logLevel = slog.LevelWarn
case "error":
logLevel = slog.LevelError
default:
logLevel = slog.LevelInfo
}
handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: logLevel,
AddSource: true, // Add file:line for debugging
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if a.Key == slog.TimeKey {
				// Rename the key and format the record's own timestamp as UTC RFC3339
				return slog.Attr{
					Key:   "timestamp",
					Value: slog.StringValue(a.Value.Time().UTC().Format(time.RFC3339)),
				}
			}
			return a
		},
})
return slog.New(handler), nil
}
Decision Rationale: In distributed systems, a single request spans multiple services. Without correlation IDs, debugging becomes impossible.
What Goes Wrong Without Correlation IDs:
# Distributed system without correlation - debugging nightmare
service-a: "Processing user creation"
service-b: "Database connection timeout"
service-c: "Email sent successfully"
service-a: "User creation failed"
service-b: "Database connection restored"
service-c: "Email delivery failed"
# Which logs are related? Which user creation failed?
# Which email failed? Impossible to tell!
✅ DO: Implement comprehensive request tracing
// Context keys for request tracking (unexported type prevents collisions)
type traceKey int

const (
	correlationIDKey traceKey = iota
	requestIDKey
	userIDKey
)
func getCorrelationID(ctx context.Context) string {
if id, ok := ctx.Value(correlationIDKey).(string); ok {
return id
}
return "unknown-correlation"
}
func getRequestID(ctx context.Context) string {
if id, ok := ctx.Value(requestIDKey).(string); ok {
return id
}
return "unknown-request"
}
// Service demonstrates structured logging with context
func (s *UserService) ProcessUser(ctx context.Context, userID string) error {
// Create operation-scoped logger with consistent context
logger := s.logger.With(
"service", "user_service",
"operation", "process_user",
"user_id", userID,
"correlation_id", getCorrelationID(ctx),
"request_id", getRequestID(ctx),
)
logger.InfoContext(ctx, "processing user started")
start := time.Now()
defer func() {
duration := time.Since(start)
logger.InfoContext(ctx, "processing user completed",
"duration_ms", duration.Milliseconds())
}()
user, err := s.repo.GetUser(ctx, userID)
if err != nil {
logger.ErrorContext(ctx, "failed to fetch user",
"error", err,
"step", "fetch_user")
return fmt.Errorf("failed to fetch user: %w", err)
}
logger.InfoContext(ctx, "user processing successful")
return nil
}
Result with Correlation IDs:
{"timestamp":"2025-01-15T10:15:30Z","level":"INFO","msg":"processing user started","correlation_id":"req_12345","service":"user_service","user_id":"user_789"}
{"timestamp":"2025-01-15T10:15:31Z","level":"ERROR","msg":"database timeout","correlation_id":"req_12345","service":"database","query":"SELECT * FROM users"}
{"timestamp":"2025-01-15T10:15:32Z","level":"ERROR","msg":"failed to fetch user","correlation_id":"req_12345","service":"user_service","user_id":"user_789"}
Now you can trace the entire request flow: user service → database timeout → user service failure, all linked by `correlation_id: req_12345`.
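How does the correlation ID get into the context in the first place? A minimal middleware sketch, assuming the correlationIDKey defined above; the `X-Correlation-ID` header name and the ID format are illustrative assumptions, not a standard:
import (
"context"
"fmt"
"net/http"
"time"
)
// correlationMiddleware seeds a correlation ID into the request context so
// downstream code (like ProcessUser above) can read it via getCorrelationID.
func correlationMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
id := r.Header.Get("X-Correlation-ID")
if id == "" {
id = fmt.Sprintf("req_%d", time.Now().UnixNano()) // assumed ID scheme
}
ctx := context.WithValue(r.Context(), correlationIDKey, id)
w.Header().Set("X-Correlation-ID", id) // echo back for client-side tracing
next.ServeHTTP(w, r.WithContext(ctx))
})
}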
Decision Rationale: Different log levels serve different purposes. Understanding when to use each level is crucial for effective monitoring.
Level | Purpose | Examples | Production Volume | What Goes Wrong If Misused |
---|---|---|---|---|
DEBUG | Development only | Variable values, flow control | Disabled | Performance impact, storage costs |
INFO | Business operations | User created, payment processed | Normal | Log spam if overused |
WARN | Recoverable issues | Retry attempts, fallback used | Low | Alert fatigue if not actionable |
ERROR | Service failures | Database errors, external API failures | Very low | Ignored if overused |
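One refinement worth knowing: `slog.LevelVar` makes the level adjustable at runtime (say, from an admin endpoint), so you can enable DEBUG temporarily without redeploying. A minimal sketch:
import (
"log/slog"
"os"
)
// newAdjustableLogger returns a logger whose level can be changed while running.
func newAdjustableLogger() (*slog.Logger, *slog.LevelVar) {
var level slog.LevelVar // zero value is LevelInfo
handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: &level})
return slog.New(handler), &level
}
// Later, e.g. from an admin endpoint:
//   levelVar.Set(slog.LevelDebug) // enable debug logging on the fly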
✅ Production Logging Example:
func (s *PaymentService) ProcessPayment(ctx context.Context, req *PaymentRequest) (*PaymentResult, error) {
logger := s.logger.With(
"service", "payment_service",
"operation", "process_payment",
"payment_id", req.PaymentID,
"amount", req.Amount.String(),
"correlation_id", getCorrelationID(ctx),
)
// INFO: Business operations
logger.InfoContext(ctx, "payment processing started")
if err := s.validatePaymentRequest(req); err != nil {
// WARN: Recoverable errors - user can fix this
logger.WarnContext(ctx, "payment validation failed",
"error", err,
"step", "validation")
return nil, &ValidationError{Field: "payment_request", Message: err.Error()}
}
result, err := s.gateway.ProcessPayment(ctx, req)
if err != nil {
// ERROR: Critical failures - requires investigation
logger.ErrorContext(ctx, "payment gateway failed",
"error", err,
"gateway", "stripe",
"step", "gateway_call")
return nil, fmt.Errorf("payment gateway failed: %w", err)
}
// INFO: Successful completion
logger.InfoContext(ctx, "payment processed successfully",
"transaction_id", result.TransactionID,
"status", result.Status)
return result, nil
}
Decision Rationale: Production services must handle shutdown signals gracefully to prevent data corruption, lost requests, and inconsistent state. Kubernetes and Docker send SIGTERM signals that must be handled properly.
What Goes Wrong Without Proper Signal Handling:
// DISASTER: Abrupt shutdown nightmare
func main() {
server := &http.Server{Addr: ":8080"}
// Start server
log.Fatal(server.ListenAndServe()) // Just dies on SIGTERM
}
Production Consequences:
- Data Loss: In-flight transactions are lost during abrupt shutdown
- Connection Leaks: Open database connections aren't closed properly
- Inconsistent State: Partially completed operations leave system in bad state
- Client Errors: Clients receive connection errors instead of graceful responses
- Kubernetes Issues: Pods get killed forcefully after grace period
Real Production Disaster: An e-commerce platform deployed to Kubernetes without graceful shutdown. During a deployment, 200 concurrent checkout operations were abruptly terminated when pods were killed. This resulted in:
- $50,000 in lost sales (customers got errors during checkout)
- 150 orders in inconsistent state (payment charged but order not created)
- 3 hours of manual data recovery
- Customer trust issues
sequenceDiagram
participant A as Kubernetes
participant B as Application
participant C as HTTP Server
A->>B: SIGTERM
B->>C: Shutdown
C-->>B: Draining connections
B-->>A: exit 0
Note right of A: Graceful shutdown sequence
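Before the full shutdown framework below, here is the minimal shape of that sequence using the standard library's `signal.NotifyContext` (Go 1.16+); a sketch, with the 30-second grace period as an assumption:
import (
"context"
"net/http"
"os/signal"
"syscall"
"time"
)
func run() error {
// ctx is cancelled when SIGINT or SIGTERM arrives
ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer stop()
server := &http.Server{Addr: ":8080"}
errCh := make(chan error, 1)
go func() { errCh <- server.ListenAndServe() }()
select {
case err := <-errCh:
return err // server failed to start or crashed
case <-ctx.Done():
// Drain in-flight connections, bounded by a grace period
shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
return server.Shutdown(shutdownCtx)
}
}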
Understanding Context Derivation: Every derived context shares certain properties with its parent while having its own lifecycle. This is fundamental to Go's context system.
Context Keys: The Idiomatic Way:
// ALWAYS use unexported pointer keys of a named, non-empty struct type.
// (Distinct zero-size variables may share an address, so bare &struct{}{}
// keys are not guaranteed unique.)
type contextKey struct{ name string }

var (
requestIDKey = &contextKey{"request_id"}
correlationIDKey = &contextKey{"correlation_id"}
userIDKey = &contextKey{"user_id"}
operationKey = &contextKey{"operation"}
batchSizeKey = &contextKey{"batch_size"}
userDataKey = &contextKey{"user_data"}
)
// Helper functions for type-safe context access
func WithRequestID(parent context.Context, id string) context.Context {
return context.WithValue(parent, requestIDKey, id)
}
func GetRequestID(ctx context.Context) string {
if id, ok := ctx.Value(requestIDKey).(string); ok && id != "" {
return id
}
return "unknown-request"
}
func WithCorrelationID(parent context.Context, id string) context.Context {
return context.WithValue(parent, correlationIDKey, id)
}
func GetCorrelationID(ctx context.Context) string {
if id, ok := ctx.Value(correlationIDKey).(string); ok && id != "" {
return id
}
return "unknown-correlation"
}
func WithUserID(parent context.Context, id string) context.Context {
return context.WithValue(parent, userIDKey, id)
}
func GetUserID(ctx context.Context) string {
if id, ok := ctx.Value(userIDKey).(string); ok && id != "" {
return id
}
return "anonymous"
}
Why Pointer Keys Work: Each pointer to a distinct, non-zero-size variable has a unique address, and the unexported type stops other packages from forging keys, so collisions across packages are impossible. The struct must be non-empty, though: the Go spec permits distinct zero-size variables to share an address. This is the same pattern net/http uses for its context keys.
What Derived Contexts Share with Parent:
- Values: All key-value pairs from parent context
- Deadline: The most restrictive deadline (parent or child)
- Done Channel: Parent cancellation cascades to children
- Error: Cancellation reason propagates
What Derived Contexts Don't Share:
- Cancellation Functions: Each has its own cancel function
- Lifecycle: Child can be cancelled without affecting parent
- Memory: Child context doesn't hold parent alive unnecessarily
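A small sketch illustrating the deadline rule: a child can tighten a parent's deadline but never extend it.
import (
"context"
"fmt"
"time"
)
func demonstrateDeadlineInheritance() {
parent, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()
// Requesting 5s cannot extend the parent's 1s deadline
child, childCancel := context.WithTimeout(parent, 5*time.Second)
defer childCancel()
pd, _ := parent.Deadline()
cd, _ := child.Deadline()
fmt.Println(cd.Equal(pd)) // true: the earlier deadline wins
}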
✅ DO: Master context derivation patterns
// Context derivation hierarchy
func (s *UserService) ProcessUserBatch(ctx context.Context, userIDs []string) error {
// 1. Operation-level context with timeout
batchCtx, batchCancel := context.WithTimeout(ctx, 2*time.Minute)
defer batchCancel()
// Add operation-specific values
batchCtx = context.WithValue(batchCtx, operationKey, "process_user_batch")
batchCtx = context.WithValue(batchCtx, batchSizeKey, len(userIDs))
logger := s.logger.With(
"operation", "process_user_batch",
"batch_size", len(userIDs),
"correlation_id", GetCorrelationID(batchCtx),
)
logger.InfoContext(batchCtx, "batch processing started")
// 2. Individual user processing contexts
g, groupCtx := NewSafeGroup(batchCtx, logger)
g.SetLimit(10)
for _, userID := range userIDs {
userID := userID // Copy for the goroutine; redundant since Go 1.22's per-iteration loop variables, kept for older toolchains
g.Go(groupCtx, func() error {
// 3. Per-user context with individual timeout
userCtx, userCancel := context.WithTimeout(groupCtx, 30*time.Second)
defer userCancel()
// Add user-specific values
userCtx = context.WithValue(userCtx, userIDKey, userID)
return s.processUserWithContext(userCtx, userID)
})
}
if err := g.Wait(); err != nil {
logger.ErrorContext(batchCtx, "batch processing failed", "error", err)
return fmt.Errorf("batch processing failed: %w", err)
}
logger.InfoContext(batchCtx, "batch processing completed successfully")
return nil
}
// Placeholder for processUserWithContext
func (s *UserService) processUserWithContext(ctx context.Context, userID string) error {
// Implementation would go here
return nil
}
// Example service struct
type UserService struct {
logger *slog.Logger
}
❌ FOOTGUN #1: Storing large objects in context values
// WRONG: Large objects in context cause memory retention
func processUser(ctx context.Context, userData []byte) error {
// ❌ This keeps userData alive for the entire context tree lifetime
ctx = context.WithValue(ctx, userDataKey, userData)
return longRunningOperation(ctx)
}
✅ The Fix: Pass data as function parameters
// GOOD: Pass data explicitly
func processUser(ctx context.Context, userData []byte) error {
return longRunningOperation(ctx, userData) // ✅ Explicit parameter
}
// Placeholder function
func longRunningOperation(ctx context.Context, data []byte) error {
// Implementation would process the data
return nil
}
❌ FOOTGUN #2: Not cancelling derived contexts
// WRONG: Context leak - derived context never cancelled
func performOperation(ctx context.Context) error {
opCtx, _ := context.WithTimeout(ctx, time.Minute) // ❌ Cancel function ignored
return doWork(opCtx) // Context may leak resources
}
✅ The Fix: Always defer cancel
// GOOD: Proper context cleanup
func performOperation(ctx context.Context) error {
opCtx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel() // ✅ Always clean up
return doWork(opCtx)
}
// Placeholder function
func doWork(ctx context.Context) error {
// Implementation would do the work
return nil
}
Context Builder for Complex Scenarios (Use Sparingly):
// ContextBuilder for request processing (only when needed)
type RequestContextBuilder struct {
parent context.Context
}
func NewRequestContext(parent context.Context) *RequestContextBuilder {
if parent == nil {
parent = context.Background()
}
return &RequestContextBuilder{parent: parent}
}
func (b *RequestContextBuilder) WithTimeout(timeout time.Duration) (*RequestContextBuilder, context.CancelFunc) {
ctx, cancel := context.WithTimeout(b.parent, timeout)
b.parent = ctx
return b, cancel
}
func (b *RequestContextBuilder) WithRequestID(id string) *RequestContextBuilder {
b.parent = context.WithValue(b.parent, requestIDKey, id)
return b
}
func (b *RequestContextBuilder) WithUserID(id string) *RequestContextBuilder {
b.parent = context.WithValue(b.parent, userIDKey, id)
return b
}
func (b *RequestContextBuilder) Build() context.Context {
return b.parent
}
// Usage for complex request setup
func (h *HTTPHandler) setupRequestContext(r *http.Request) (context.Context, context.CancelFunc) {
builder := NewRequestContext(r.Context())
// Add request timeout
builder, cancel := builder.WithTimeout(30 * time.Second)
// Add request metadata
requestID := r.Header.Get("X-Request-ID")
if requestID == "" {
requestID = generateRequestID()
}
userID := extractUserIDFromAuth(r)
ctx := builder.
WithRequestID(requestID).
WithUserID(userID).
Build()
return ctx, cancel
}
// Placeholder functions
func generateRequestID() string {
return fmt.Sprintf("req_%d", time.Now().UnixNano())
}
func extractUserIDFromAuth(r *http.Request) string {
// Implementation would extract user ID from auth header
return "user_123"
}
// Example HTTP handler struct
type HTTPHandler struct{}
💡 Advanced: Context cancellation reasons for better debugging
import "context"
// WithCancelCause enables rich cancellation reasons
func processWithDetailedCancellation(ctx context.Context) error {
ctx, cancel := context.WithCancelCause(ctx)
defer cancel(nil) // Call with nil if no specific cause
go func() {
// Background operation that might fail
if err := riskyOperation(); err != nil {
// Cancel with specific cause
cancel(fmt.Errorf("risky operation failed: %w", err))
}
}()
// Wait for completion or cancellation
<-ctx.Done()
// Check cancellation cause
if cause := context.Cause(ctx); cause != nil {
return fmt.Errorf("operation cancelled: %w", cause)
}
return ctx.Err() // Standard cancellation
}
// Placeholder function
func riskyOperation() error {
// Simulate a risky operation that might fail
return nil
}
When to Use: When you need to distinguish between different cancellation reasons for debugging or error handling.
✅ DO: Implement comprehensive signal handling
import (
"context"
"os"
"os/signal"
"syscall"
"time"
"log/slog"
"fmt"
)
// GracefulShutdown manages clean application termination
type GracefulShutdown struct {
logger *slog.Logger
timeout time.Duration
handlers []ShutdownHandler
}
type ShutdownHandler interface {
Shutdown(ctx context.Context) error
Name() string
}
func NewGracefulShutdown(logger *slog.Logger, timeout time.Duration) *GracefulShutdown {
return &GracefulShutdown{
logger: logger,
timeout: timeout,
}
}
func (gs *GracefulShutdown) RegisterHandler(handler ShutdownHandler) {
gs.handlers = append(gs.handlers, handler)
}
func (gs *GracefulShutdown) WaitForSignal(ctx context.Context) {
sigChan := make(chan os.Signal, 1)
// Register for shutdown signals
signal.Notify(sigChan,
syscall.SIGINT, // Ctrl+C
syscall.SIGTERM, // Kubernetes/Docker termination
syscall.SIGQUIT, // Quit signal
)
select {
case sig := <-sigChan:
gs.logger.InfoContext(ctx, "shutdown signal received", "signal", sig.String())
gs.performShutdown(ctx)
case <-ctx.Done():
gs.logger.InfoContext(ctx, "context cancelled, initiating shutdown")
gs.performShutdown(ctx)
}
}
func (gs *GracefulShutdown) performShutdown(ctx context.Context) {
shutdownCtx, cancel := context.WithTimeout(context.Background(), gs.timeout)
defer cancel()
gs.logger.InfoContext(shutdownCtx, "graceful shutdown initiated",
"timeout", gs.timeout,
"handlers_count", len(gs.handlers))
// Use SafeGroup for concurrent shutdown
g, gCtx := NewSafeGroup(shutdownCtx, gs.logger)
for _, handler := range gs.handlers {
handler := handler // Capture for goroutine
g.Go(gCtx, func() error {
start := time.Now()
err := handler.Shutdown(gCtx)
duration := time.Since(start)
if err != nil {
gs.logger.ErrorContext(gCtx, "component shutdown failed",
"component", handler.Name(),
"duration", duration,
"error", err)
return fmt.Errorf("shutdown failed for %s: %w", handler.Name(), err)
}
gs.logger.InfoContext(gCtx, "component shutdown completed",
"component", handler.Name(),
"duration", duration)
return nil
})
}
if err := g.Wait(); err != nil {
gs.logger.ErrorContext(shutdownCtx, "shutdown completed with errors", "error", err)
} else {
gs.logger.InfoContext(shutdownCtx, "graceful shutdown completed successfully")
}
}
import (
"net/http"
"database/sql"
)
// Shutdown handlers for different components
type HTTPServerHandler struct {
server *http.Server
logger *slog.Logger
}
func (h *HTTPServerHandler) Shutdown(ctx context.Context) error {
h.logger.InfoContext(ctx, "shutting down HTTP server")
return h.server.Shutdown(ctx)
}
func (h *HTTPServerHandler) Name() string {
return "http_server"
}
type DatabaseHandler struct {
db *sql.DB
logger *slog.Logger
}
func (h *DatabaseHandler) Shutdown(ctx context.Context) error {
h.logger.InfoContext(ctx, "closing database connections")
return h.db.Close()
}
func (h *DatabaseHandler) Name() string {
return "database"
}
type BackgroundWorkerHandler struct {
cancel context.CancelFunc
done <-chan struct{}
logger *slog.Logger
}
func (h *BackgroundWorkerHandler) Shutdown(ctx context.Context) error {
h.logger.InfoContext(ctx, "stopping background workers")
h.cancel() // Signal workers to stop
select {
case <-h.done:
return nil
case <-ctx.Done():
return fmt.Errorf("background workers shutdown timeout: %w", ctx.Err())
}
}
func (h *BackgroundWorkerHandler) Name() string {
return "background_workers"
}
// Cache handler for in-memory caches
type CacheHandler struct {
cache map[string]interface{}
logger *slog.Logger
}
func (h *CacheHandler) Shutdown(ctx context.Context) error {
h.logger.InfoContext(ctx, "clearing cache")
// Clear cache to free memory
for k := range h.cache {
delete(h.cache, k)
}
return nil
}
func (h *CacheHandler) Name() string {
return "cache"
}
✅ Complete Production Example:
// Application demonstrates signal handling integration
type Application struct {
httpServer *http.Server
db *sql.DB
logger *slog.Logger
shutdown *GracefulShutdown
// NOTE: NO context stored - contexts flow through method parameters
}
func (app *Application) Run(ctx context.Context) error {
// Setup shutdown handlers
app.shutdown.RegisterHandler(&HTTPServerHandler{
server: app.httpServer,
logger: app.logger,
})
app.shutdown.RegisterHandler(&DatabaseHandler{
db: app.db,
logger: app.logger,
})
// Start services with proper context management
g, appCtx := NewSafeGroup(ctx, app.logger)
// Start HTTP server
g.Go(appCtx, func() error {
app.logger.InfoContext(appCtx, "starting HTTP server", "addr", app.httpServer.Addr)
if err := app.httpServer.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
return fmt.Errorf("HTTP server failed: %w", err)
}
return nil
})
// Start background workers
workerCtx, workerCancel := context.WithCancel(appCtx)
workerDone := make(chan struct{})
app.shutdown.RegisterHandler(&BackgroundWorkerHandler{
cancel: workerCancel,
done: workerDone,
logger: app.logger,
})
g.Go(appCtx, func() error {
defer close(workerDone)
return app.runBackgroundWorkers(workerCtx)
})
// Wait for shutdown signal
g.Go(appCtx, func() error {
app.shutdown.WaitForSignal(appCtx)
return nil
})
return g.Wait()
}
func (app *Application) runBackgroundWorkers(ctx context.Context) error {
workerCtx := context.WithValue(ctx, operationKey, "background_workers")
app.logger.InfoContext(workerCtx, "starting background workers")
g, gCtx := NewSafeGroup(workerCtx, app.logger)
// Email worker
g.Go(gCtx, func() error {
return app.runEmailWorker(gCtx)
})
// Data sync worker
g.Go(gCtx, func() error {
return app.runDataSyncWorker(gCtx)
})
// Metrics collection worker
g.Go(gCtx, func() error {
return app.runMetricsWorker(gCtx)
})
return g.Wait()
}
func (app *Application) runEmailWorker(ctx context.Context) error {
emailCtx := context.WithValue(ctx, operationKey, "email_worker")
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()
for {
select {
case <-ticker.C:
// Process emails with operation-specific timeout
processCtx, cancel := context.WithTimeout(emailCtx, 25*time.Second)
if err := app.processEmailQueue(processCtx); err != nil {
app.logger.ErrorContext(emailCtx, "email processing failed", "error", err)
// Continue processing despite errors
}
cancel()
case <-ctx.Done():
app.logger.InfoContext(emailCtx, "email worker stopping")
return ctx.Err()
}
}
}
func (app *Application) runDataSyncWorker(ctx context.Context) error {
syncCtx := context.WithValue(ctx, operationKey, "data_sync_worker")
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for {
select {
case <-ticker.C:
syncOpCtx, cancel := context.WithTimeout(syncCtx, 4*time.Minute)
if err := app.syncData(syncOpCtx); err != nil {
app.logger.ErrorContext(syncCtx, "data sync failed", "error", err)
}
cancel()
case <-ctx.Done():
app.logger.InfoContext(syncCtx, "data sync worker stopping")
return ctx.Err()
}
}
}
func (app *Application) runMetricsWorker(ctx context.Context) error {
metricsCtx := context.WithValue(ctx, operationKey, "metrics_worker")
ticker := time.NewTicker(10 * time.Second)
defer ticker.Stop()
for {
select {
case <-ticker.C:
metricsOpCtx, cancel := context.WithTimeout(metricsCtx, 5*time.Second)
if err := app.collectMetrics(metricsOpCtx); err != nil {
app.logger.ErrorContext(metricsCtx, "metrics collection failed", "error", err)
}
cancel()
case <-ctx.Done():
app.logger.InfoContext(metricsCtx, "metrics worker stopping")
return ctx.Err()
}
}
}
// Placeholder methods
func (app *Application) processEmailQueue(ctx context.Context) error {
// Implementation would process email queue
return nil
}
func (app *Application) syncData(ctx context.Context) error {
// Implementation would sync data with external systems
return nil
}
func (app *Application) collectMetrics(ctx context.Context) error {
// Implementation would collect and export metrics
return nil
}
// Production main function with signal handling
func main() {
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: slog.LevelInfo,
}))
app, err := NewApplication(logger)
if err != nil {
logger.Error("failed to initialize application", "error", err)
os.Exit(1)
}
defer app.Close()
// Setup graceful shutdown with 30-second timeout
app.shutdown = NewGracefulShutdown(logger, 30*time.Second)
// Run application with root context
ctx := context.Background()
if err := app.Run(ctx); err != nil {
logger.Error("application failed", "error", err)
os.Exit(1)
}
logger.Info("application stopped successfully")
}
// Placeholder constructor and cleanup
func NewApplication(logger *slog.Logger) (*Application, error) {
// Implementation would initialize HTTP server, database, etc.
server := &http.Server{
Addr: ":8080",
ReadTimeout: 15 * time.Second,
WriteTimeout: 15 * time.Second,
}
return &Application{
httpServer: server,
logger: logger,
}, nil
}
func (app *Application) Close() {
// Cleanup any remaining resources
if app.db != nil {
app.db.Close()
}
}
// SafeGroup placeholder (would be imported from previous sections)
type SafeGroup struct {
// Implementation details would be here
}
func NewSafeGroup(ctx context.Context, logger *slog.Logger) (*SafeGroup, context.Context) {
// Implementation would return panic-safe errgroup wrapper
return &SafeGroup{}, ctx
}
func (sg *SafeGroup) Go(ctx context.Context, fn func() error) {
// Implementation would execute function with panic recovery
}
func (sg *SafeGroup) Wait() error {
// Implementation would wait for all goroutines to complete
return nil
}
✅ Context Do's:
- Always pass context as first parameter
- Use context.WithTimeout for operations with time limits
- Derive contexts for adding values or deadlines
- Always defer cancel() when creating cancellable contexts
- Use context for cancellation, deadlines, and request-scoped values
- Use struct pointer keys to prevent collisions
❌ Context Don'ts:
- Never store context in struct fields
- Don't use context for passing optional parameters
- Don't put large objects in context values
- Don't ignore context cancellation in loops
- Don't create contexts without cancel functions
- Don't use string keys for context values
✅ Graceful Shutdown Do's:
- Handle SIGTERM, SIGINT, and SIGQUIT signals
- Use timeout contexts for shutdown operations
- Shut down components concurrently when possible
- Log shutdown progress for debugging
- Clean up resources (database connections, file handles)
- Allow in-flight requests to complete
❌ Graceful Shutdown Don'ts:
- Don't ignore shutdown signals
- Don't use infinite timeouts for shutdown
- Don't forget to close background workers
- Don't block shutdown on non-critical operations
- Don't leak goroutines during shutdown
Production Impact: Proper signal handling prevents:
- Data corruption from interrupted transactions
- Resource leaks from unclosed connections
- Client errors from abrupt disconnections
- SLA violations from ungraceful restarts
- Manual cleanup work after deployments
Decision Rationale: Go's garbage collector is optimized for low-latency applications. Working with it, rather than against it, is crucial for performance.
- Stack allocation is free - avoid heap escapes
- Object pooling - reuse expensive objects
- Pre-allocation - avoid slice/map growth
- Cache-friendly layouts - minimize pointer chasing
Understanding What Escapes to Heap:
// STAYS ON STACK: Returning values
func createUser(name string) User {
return User{Name: name} // ✅ Stack allocated
}
// ESCAPES TO HEAP: Returning pointers
func createUserPtr(name string) *User {
return &User{Name: name} // ❌ Escapes to heap
}
// SOPHISTICATED: Control escape behavior
func processUsers(users []User) []Result {
// Pre-allocate prevents multiple heap allocations
results := make([]Result, 0, len(users)) // ✅ Single allocation
for _, user := range users {
result := processUser(user) // ✅ Value processing on stack
results = append(results, result)
}
return results
}
// Placeholder for processUser function
func processUser(user User) Result {
return Result{UserID: user.Name} // Simplified processing
}
// Example types for demonstration
type User struct {
Name string
}
type Result struct {
UserID string
}
How to Check Escape Analysis: You can use the Go compiler flags to see escape analysis decisions:
go build -gcflags="-m" your_file.go
# For more detailed output:
go build -gcflags="-m -m" your_file.go
This prints lines like `./file.go:10:2: moved to heap: varName`, indicating which values escape.
Why This Matters:
- Performance: Stack allocation is ~100x faster than heap allocation
- GC Pressure: Stack allocations don't contribute to garbage collection
- Memory Usage: Stack memory is automatically managed
What Goes Wrong with Heap Escapes:
// DISASTER: Unnecessary heap allocations
func processRequests(requests []Request) []*Response {
var responses []*Response
for _, req := range requests {
// Every response escapes to heap!
response := &Response{
ID: req.ID,
Data: processRequestData(req),
}
responses = append(responses, response) // Growing slice causes more allocations
}
return responses
}
Production Impact: A high-traffic API was allocating 50,000 response objects per second on the heap. Each allocation required garbage collection, causing 99th percentile latency to spike from 50ms to 300ms during GC pauses.
The Fix: Use values instead of pointers when possible
// BETTER: Stack-based processing
func processRequests(requests []Request) []Response {
// Pre-allocate with known capacity
responses := make([]Response, 0, len(requests))
for _, req := range requests {
// Value stays on stack
response := Response{
ID: req.ID,
Data: processRequestData(req),
}
responses = append(responses, response)
}
return responses
}
// Placeholder functions
type Request struct {
ID string
}
type Response struct {
ID string
Data string
}
func processRequestData(req Request) string {
return "processed_" + req.ID
}
🚨 CRITICAL PRODUCTION FOOTGUN: Taking a small slice from a very large one can prevent the large slice's underlying array from being garbage collected.
❌ The Silent Memory Leak:
func getFirstN(largeSlice []byte, n int) []byte {
// Returns a slice header pointing to the original large array.
// If largeSlice is 1GB, the GC cannot reclaim that 1GB
// as long as the small returned slice is alive.
return largeSlice[:n]
}
func processLogFile(filename string) ([]byte, error) {
// Read entire 100MB log file
data, err := os.ReadFile(filename)
if err != nil {
return nil, err
}
// Extract just the header (first 1KB)
header := data[:1024] // ❌ Keeps entire 100MB in memory!
return header, nil
}
Production War Story: A data processing service reading large files (1GB+) and extracting small headers (1KB) suffered from massive memory usage. The small header slices were preventing garbage collection of the entire large files, causing out-of-memory crashes. The service was using 50GB of RAM to store what should have been 50KB of actual data.
What Happens:
- `os.ReadFile` allocates a 100MB array for the file contents
- `data[:1024]` creates a slice header pointing to the same 100MB array
- Even though you only need 1KB, the entire 100MB stays in memory
- The GC cannot free the 100MB because the slice header still references it
- Multiply by 1000 files = 100GB of memory for 1MB of actual data
✅ The Fix: Explicitly Copy the Data:
func getFirstNSafe(largeSlice []byte, n int) []byte {
// Create a new slice of the exact size needed
result := make([]byte, n)
// Copy only the data we need
copy(result, largeSlice[:n])
// Now the reference to largeSlice's array is gone,
// and it can be garbage collected
return result
}
func processLogFileSafe(filename string) ([]byte, error) {
data, err := os.ReadFile(filename)
if err != nil {
return nil, err
}
// Extract header with explicit copy
header := make([]byte, 1024)
copy(header, data[:1024]) // ✅ Only keeps 1KB, not 100MB
return header, nil
}
When This Matters: Any time you're sub-slicing large data structures and keeping the small slice around long-term.
Additional Slice Footgun: Slice truncation with pointers
// MEMORY LEAK: Truncating slice with pointers
func removeFirst(items []*Item) []*Item {
truncated := items[1:] // Still references first item!
return truncated
}
// FIX: Nil out the reference
func removeFirstSafe(items []*Item) []*Item {
items[0] = nil // Allow GC to collect the first item
return items[1:]
}
// Example type for demonstration
type Item struct {
Data string
}
✅ DO: Use sync.Pool for expensive, reusable objects
import "sync"
// BufferPool manages reusable byte buffers
type BufferPool struct {
pool sync.Pool
}
func NewBufferPool() *BufferPool {
return &BufferPool{
pool: sync.Pool{
New: func() interface{} {
return make([]byte, 0, 1024) // Reasonable initial capacity
},
},
}
}
func (p *BufferPool) Get() []byte {
buf := p.pool.Get().([]byte)
return buf[:0] // Reset length but keep capacity
}
func (p *BufferPool) Put(buf []byte) {
// Only pool buffers within reasonable size limits
if cap(buf) > 64*1024 {
return // Don't pool very large buffers
}
// CRITICAL: Reset slice and prevent hidden capacity retention
p.pool.Put(buf[:0:cap(buf)])
}
// Usage in high-throughput service
func (h *HTTPHandler) processRequest(w http.ResponseWriter, r *http.Request) {
buf := h.bufferPool.Get()
defer h.bufferPool.Put(buf)
// Use buffer for processing
buf = processRequestData(r, buf)
w.Write(buf)
}
// Example HTTP handler struct and method
type HTTPHandler struct {
bufferPool *BufferPool
}
func processRequestData(r *http.Request, buf []byte) []byte {
// Simulate processing request data into buffer
data := "processed: " + r.URL.Path
return append(buf, data...)
}
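One caveat worth knowing: staticcheck (check SA6002) flags putting a plain slice into a sync.Pool, because converting the slice header to interface{} allocates on every Put. A common workaround, sketched below, is to pool pointers such as *bytes.Buffer instead:
import (
"bytes"
"sync"
)
// bufPool stores pointers, so Get/Put avoid per-call header allocations.
var bufPool = sync.Pool{
New: func() interface{} { return new(bytes.Buffer) },
}
func withPooledBuffer(fn func(*bytes.Buffer)) {
buf := bufPool.Get().(*bytes.Buffer)
buf.Reset() // clear previous contents, keep capacity
defer bufPool.Put(buf)
fn(buf)
}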
Production Impact: High-traffic APIs report 50-70% reduction in garbage generation and significantly improved 99th percentile latency.
What Goes Wrong Without Object Pooling:
// DISASTER: Constant allocation pressure
func handleRequests(requests <-chan Request) {
for req := range requests {
// Creates new buffer every time!
buf := make([]byte, 1024) // ❌ Constant heap allocation
processRequest(req, buf)
// Buffer is garbage collected, creating GC pressure
}
}
Production War Story: A message processing service was creating 100,000 temporary buffers per second. Each buffer allocation triggered garbage collection, causing periodic 100ms latency spikes that violated SLA requirements. Implementing buffer pooling reduced allocation rate by 95% and eliminated latency spikes.
✅ DO: Optimize struct field ordering for memory efficiency
// BAD: Poor field alignment wastes memory
type BadUser struct {
IsActive bool // 1 byte + 7 bytes padding
ID int64 // 8 bytes
Name string // 16 bytes
Age int32 // 4 bytes + 4 bytes padding
}
// Memory layout: [bool•••••••][int64---][string--------][int32••••]
// Total: 40 bytes with padding waste
// GOOD: Optimized field alignment
type GoodUser struct {
ID int64 // 8 bytes
Name string // 16 bytes
Age int32 // 4 bytes
IsActive bool // 1 byte + 3 bytes natural padding
}
// Memory layout: [int64---][string--------][int32][bool•••]
// Total: 32 bytes, 20% memory savings (• = padding)
Why This Matters: In high-volume applications, struct alignment can save significant memory. With millions of structs, 20% memory savings means better cache performance and lower memory costs.
Production Impact: A trading system reduced memory usage by 30% just by reordering struct fields in their high-frequency data structures. This improved L1 cache hit rates and reduced latency by 15%.
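You don't have to compute layouts by hand. A quick check with `unsafe.Sizeof` (using the BadUser/GoodUser types above; sizes assume a 64-bit platform) confirms the savings, and the `fieldalignment` analyzer from golang.org/x/tools can flag suboptimal orderings automatically:
import (
"fmt"
"unsafe"
)
func printStructSizes() {
fmt.Println(unsafe.Sizeof(BadUser{}))  // 40 on 64-bit platforms
fmt.Println(unsafe.Sizeof(GoodUser{})) // 32
}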
What Goes Wrong with Poor Alignment:
// WASTEFUL: Poor memory layout
type WastefulOrder struct {
IsProcessed bool // 1 byte + 7 bytes padding
OrderID int64 // 8 bytes
UserID int64 // 8 bytes
IsPaid bool // 1 byte + 7 bytes padding
Amount float64 // 8 bytes
IsActive bool // 1 byte + 7 bytes padding
}
// Total: 48 bytes (21 bytes wasted on padding!)
// EFFICIENT: Optimized memory layout
type EfficientOrder struct {
OrderID int64 // 8 bytes
UserID int64 // 8 bytes
Amount float64 // 8 bytes
IsProcessed bool // 1 byte
IsPaid bool // 1 byte
IsActive bool // 1 byte + 5 bytes padding
}
// Total: 32 bytes (16 bytes saved per struct)
✅ DO: Use strings.Builder for efficient string construction
import "strings"
// EXCELLENT: Pre-calculated string building
func buildQuery(table string, fields []string, conditions []string) string {
var builder strings.Builder
// Calculate capacity to avoid reallocations
estimatedSize := len("SELECT ") + len(" FROM ") + len(table)
for _, field := range fields {
estimatedSize += len(field) + 2 // ", "
}
for _, condition := range conditions {
estimatedSize += len(condition) + 5 // " AND "
}
builder.Grow(estimatedSize) // Pre-allocate
builder.WriteString("SELECT ")
for i, field := range fields {
if i > 0 {
builder.WriteString(", ")
}
builder.WriteString(field)
}
builder.WriteString(" FROM ")
builder.WriteString(table)
if len(conditions) > 0 {
builder.WriteString(" WHERE ")
for i, condition := range conditions {
if i > 0 {
builder.WriteString(" AND ")
}
builder.WriteString(condition)
}
}
return builder.String()
}
What Goes Wrong with String Concatenation:
// DISASTER: Quadratic performance nightmare
func buildQueryBad(fields []string) string {
query := "SELECT "
for i, field := range fields {
if i > 0 {
query += ", " // Creates new string every iteration!
}
query += field // Another new string!
}
return query
}
Production Horror: A report generation service used string concatenation to build SQL queries with 1000+ fields. Each field addition created a new string, copying all previous data. For 1000 fields, this performed 500,000+ string copies, making query building take longer than query execution.
Why String Concatenation Is Quadratic:
Iteration 1: "SELECT " (7 chars) → copy 7 chars
Iteration 2: "SELECT field1, " (15 chars) → copy 15 chars
Iteration 3: "SELECT field1, field2, " (23 chars) → copy 23 chars
...
Total copies: 7 + 15 + 23 + ... = O(n²) complexity
strings.Builder Performance:
func demonstrateStringPerformance() {
// BAD: O(n²) time complexity
start := time.Now()
result := ""
for i := 0; i < 10000; i++ {
result += fmt.Sprintf("field%d, ", i) // Copies entire string each time
}
fmt.Printf("Concatenation: %v\n", time.Since(start))
// GOOD: O(n) time complexity
start = time.Now()
var builder strings.Builder
builder.Grow(200000) // Pre-allocate capacity
for i := 0; i < 10000; i++ {
builder.WriteString(fmt.Sprintf("field%d, ", i)) // Appends to buffer
}
result = builder.String()
fmt.Printf("Builder: %v\n", time.Since(start))
}
Decision Rationale: GOMEMLIMIT enables soft memory limits, allowing applications to stay within memory budgets while maintaining performance. This is crucial for containerized environments with memory constraints.
✅ DO: Set memory limits for production applications
import (
"runtime/debug"
"os"
"strconv"
"fmt"
)
func setupMemoryLimit() {
// Option 1: Environment variable (recommended for containers)
// export GOMEMLIMIT=1GiB
// Option 2: Programmatic setting
if limit := os.Getenv("MEMORY_LIMIT_MB"); limit != "" {
if mb, err := strconv.Atoi(limit); err == nil {
// Set limit with 10% headroom for non-Go memory
limitBytes := int64(mb) * 1024 * 1024 * 9 / 10
debug.SetMemoryLimit(limitBytes)
fmt.Printf("Memory limit set to %d MB\n", mb*9/10)
}
}
}
// Production example: Container startup
func main() {
setupMemoryLimit()
// Your application logic...
}
Container Integration:
# Set memory limit in container
ENV GOMEMLIMIT=1GiB
# Container memory limit should be higher (e.g., 1.2GB) to account for non-Go memory
Why This Matters: Without GOMEMLIMIT, Go applications can consume all available memory before triggering garbage collection, leading to OOM kills in containers.
Production War Story: A microservice running in Kubernetes pods with 2GB memory limits was getting OOM killed during traffic spikes. Setting `GOMEMLIMIT=1.8GiB` allowed the GC to kick in earlier, preventing container kills and maintaining stable performance.
What Goes Wrong Without Memory Limits:
// DANGEROUS: Unbounded memory growth
func processLargeDataset() {
var cache [][]byte
for i := 0; i < 1000000; i++ {
// Each iteration adds 1MB to memory
data := make([]byte, 1024*1024)
cache = append(cache, data)
// Without GOMEMLIMIT, this keeps growing until OOM
// GC doesn't kick in until it's too late
}
}
The Fix with GOMEMLIMIT:
- GC triggers more aggressively as memory approaches limit
- Application stays within container memory boundaries
- Prevents sudden OOM kills during traffic spikes
- Maintains predictable performance characteristics
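To verify which limit is actually in effect (whether set via the environment or programmatically), `debug.SetMemoryLimit` can be queried without changing anything; a small sketch:
import (
"fmt"
"runtime/debug"
)
// currentMemoryLimit reports the effective soft memory limit in bytes.
// Passing a negative value returns the current limit without modifying it.
func currentMemoryLimit() int64 {
return debug.SetMemoryLimit(-1)
}
func main() {
fmt.Printf("effective GOMEMLIMIT: %d bytes\n", currentMemoryLimit())
}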
✅ DO: Use Go's built-in profiling tools
import (
_ "net/http/pprof" // Enable pprof endpoints
"net/http"
"log"
)
// Add to your main function or setup
func enableProfiling() {
go func() {
log.Println("pprof server starting on :6060")
log.Println(http.ListenAndServe(":6060", nil))
}()
}
// Usage:
// go tool pprof http://localhost:6060/debug/pprof/heap
// go test -bench=. -cpuprofile=cpu.prof
// go tool pprof cpu.prof
// Example benchmark with memory profiling (lives in a _test.go file; needs the testing package)
func BenchmarkDataProcessing(b *testing.B) {
data := generateLargeDataset(10000)
b.ResetTimer()
b.ReportAllocs() // Show allocation stats
for i := 0; i < b.N; i++ {
result := processData(data)
_ = result // Prevent optimization
}
}
// Placeholder functions for profiling example
func generateLargeDataset(size int) []int {
data := make([]int, size)
for i := range data {
data[i] = i
}
return data
}
func processData(data []int) []int {
result := make([]int, len(data))
for i, v := range data {
result[i] = v * v
}
return result
}
Production Profiling Workflow:
# 1. Capture heap profile from running service
go tool pprof http://your-service:6060/debug/pprof/heap
# 2. Analyze memory usage
(pprof) top10
(pprof) list functionName
(pprof) web
# 3. Capture CPU profile during load test
go tool pprof http://your-service:6060/debug/pprof/profile?seconds=30
# 4. Find CPU hotspots
(pprof) top10
(pprof) web
What Profiling Reveals:
// Example: Memory leak discovered through profiling
func leakyFunction() {
cache := make(map[string][]byte)
for i := 0; i < 1000000; i++ {
key := fmt.Sprintf("key_%d", i)
// This cache grows without bounds!
cache[key] = make([]byte, 1024)
}
// Profile shows this function consuming 1GB+ memory
// Solution: Add cache eviction or size limits
}
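Between full pprof captures, a lightweight in-process snapshot via `runtime.ReadMemStats` can be logged periodically to spot growth trends; a sketch:
import (
"log/slog"
"runtime"
)
// logMemStats emits a coarse heap snapshot; cheap enough to run on a ticker.
func logMemStats(logger *slog.Logger) {
var m runtime.MemStats
runtime.ReadMemStats(&m)
logger.Info("memstats",
"heap_alloc_mb", m.HeapAlloc/1024/1024,
"heap_objects", m.HeapObjects,
"num_gc", m.NumGC,
)
}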
❌ Critical Performance Mistakes:
Anti-Pattern | Why It's Slow | Better Approach | Production Impact |
---|---|---|---|
Growing slices without capacity | Multiple reallocations | `make([]T, 0, expectedSize)` | 10x slower allocation performance |
String concatenation with `+` | Creates new strings each time | `strings.Builder` | Quadratic performance degradation |
Not pooling expensive objects | Constant allocation/GC pressure | `sync.Pool` | High latency spikes during GC |
Unnecessary pointer escapes | Heap allocation overhead | Return values, not pointers | 100x allocation performance penalty |
Large struct copying | Expensive memory copies | Pass pointers for large structs | CPU cache misses, memory bandwidth waste |
Poor struct field alignment | Memory waste, cache misses | Order fields by size (large to small) | 20-40% memory waste |
Sub-slicing large arrays | Prevents GC of large backing arrays | Explicit copying with `make` + `copy` | Memory leaks, OOM crashes |
Ignoring GOMEMLIMIT | OOM kills in containers | Set appropriate memory limits | Service instability, container restarts |
Case Study 1: API Response Serialization:
// BEFORE: Unbuffered, allocation-heavy serialization
func (h *Handler) getUsers(w http.ResponseWriter, r *http.Request) {
users := h.service.GetUsers(r.Context())
// Fresh buffer allocations on every request
json.NewEncoder(w).Encode(users) // ❌ Per-request allocation churn
}
// AFTER: Pooled buffers reused across requests.
// (json.Encoder itself is cheap to construct; the win is the reusable buffer.)
type Handler struct {
service    UserGetter
bufferPool sync.Pool // pools *bytes.Buffer
}
// UserGetter is a placeholder for the service dependency
type UserGetter interface {
GetUsers(ctx context.Context) []User
}
func NewHandler(service UserGetter) *Handler {
return &Handler{
service: service,
bufferPool: sync.Pool{
New: func() interface{} { return new(bytes.Buffer) },
},
}
}
func (h *Handler) getUsersOptimized(w http.ResponseWriter, r *http.Request) {
users := h.service.GetUsers(r.Context())
buf := h.bufferPool.Get().(*bytes.Buffer)
buf.Reset() // keep capacity, drop old contents
defer h.bufferPool.Put(buf)
if err := json.NewEncoder(buf).Encode(users); err != nil {
http.Error(w, "encoding failed", http.StatusInternalServerError)
return
}
w.Write(buf.Bytes())
}
Result: 60% reduction in response time, 80% reduction in allocations.
Case Study 2: Database Query Building
// BEFORE: String concatenation nightmare
func buildComplexQuery(filters []Filter) string {
query := "SELECT * FROM users WHERE 1=1"
for _, filter := range filters {
query += " AND " + filter.Column + " = '" + filter.Value + "'" // ❌ Quadratic
}
return query
}
// AFTER: Pre-allocated builder
func buildComplexQueryOptimized(filters []Filter) string {
const baseQuery = "SELECT * FROM users WHERE 1=1"
// Estimate capacity
capacity := len(baseQuery)
for _, filter := range filters {
capacity += len(" AND ") + len(filter.Column) + len(" = '") + len(filter.Value) + len("'")
}
var builder strings.Builder
builder.Grow(capacity)
builder.WriteString(baseQuery)
for _, filter := range filters {
builder.WriteString(" AND ")
builder.WriteString(filter.Column)
builder.WriteString(" = '")
builder.WriteString(filter.Value)
builder.WriteString("'")
}
return builder.String()
}
// Example filter type
type Filter struct {
Column string
Value string
}
Result: O(n²) → O(n) complexity, 95% performance improvement for large filter sets. (Both versions inline filter values purely for illustration; real query builders must use parameterized queries to avoid SQL injection.)
❌ `runtime.GC()` Is Almost Always Wrong
// BAD: Fighting the runtime instead of working with it
func processLargeFile(filename string) error {
// ... processing logic
// Force GC "to help performance"
runtime.GC() // ❌ Usually makes things worse
return nil
}
Why Manual GC is Dangerous:
- Masks Memory Leaks: Hides the real problem
- Unpredictable Latency: Introduces pause spikes
- Runtime Conflict: Fights against optimized GC algorithms
- Code Smell: Usually indicates poor allocation patterns
What Happens with Manual GC:
- Stop-the-World: All goroutines pause during GC
- Poor Timing: GC runs when you force it, not when optimal
- Wasted Work: May run GC when little garbage exists
- Performance Regression: Often makes performance worse
✅ Better Solutions:
- Reduce Allocations: Use object pooling and pre-allocation
- Profile First: Use `go tool pprof` to understand allocation patterns
- Batch Processing: Control memory usage through batching
- Let GC Work: Trust the optimized garbage collector
- Use GOMEMLIMIT: Set soft memory limits instead
Production Example of Manual GC Gone Wrong:
// DISASTER: Manual GC causing latency spikes
func processUserBatch(users []User) {
for i, user := range users {
processUser(user)
// "Helping" the GC every 100 users
if i%100 == 99 {
runtime.GC() // ❌ Causes a 50ms stop-the-world pause every 100 users
}
}
}
// SOLUTION: Let GC work naturally
func processUserBatchOptimized(users []User) []Result {
// Pre-allocate if needed
results := make([]Result, 0, len(users))
for _, user := range users {
result := processUser(user) // Stack allocation when possible
results = append(results, result)
}
// GC runs automatically at optimal times
return results
}
The Rule: If you're calling `runtime.GC()`, you probably have an allocation problem, not a GC problem. Fix the allocations instead.
Decision Rationale: Generics enable type-safe, reusable code without the performance overhead of `interface{}` type assertions. They're essential for modern Go applications.
Decision Rationale: Generics are powerful but can add complexity. Use this framework to decide when they're appropriate.
// GENERICS DECISION TREE
// ✅ Use generics when:
// - Building reusable data structures (trees, caches, queues)
// - Implementing type-safe algorithms (sort, filter, map, reduce)
// - Avoiding interface{} type assertions in performance-critical paths
// - Creating type-safe APIs where compile-time checking prevents bugs
//
// ❌ Avoid generics when:
// - A simple interface suffices (e.g., io.Reader, fmt.Stringer)
// - Only one concrete type is needed (YAGNI principle)
// - They obscure readability without tangible benefits
// - The abstraction is forced rather than natural
// GOOD: Natural abstraction
func Map[T, U any](items []T, fn func(T) U) []U {
result := make([]U, len(items))
for i, item := range items {
result[i] = fn(item)
}
return result
}
// BAD: Unnecessary complexity
func Print[T any](value T) {
fmt.Println(value) // Just call fmt.Println directly; it already accepts any value
}
// GOOD: Type-safe cache
type Cache[K comparable, V any] struct {
data map[K]V
mu sync.RWMutex
}
func (c *Cache[K, V]) Get(key K) (V, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
val, ok := c.data[key]
return val, ok
}
// BAD: Over-genericized simple operation
func Add[T int | float64](a, b T) T {
return a + b // Just write separate functions if only 2 types needed
}
When in Doubt: Start with concrete types or interfaces. Refactor to generics only when you have multiple concrete implementations and proven need for type safety.
✅ DO: Use generics for type-safe repositories
import (
"context"
"database/sql"
"errors"
"fmt"
)
// Generic repository interface
type Repository[T any, ID comparable] interface {
Create(ctx context.Context, entity T) error
GetByID(ctx context.Context, id ID) (T, error)
Update(ctx context.Context, entity T) error
Delete(ctx context.Context, id ID) error
List(ctx context.Context, filter Filter) ([]T, error)
}
// Filter for listing operations
type Filter struct {
Limit int
Offset int
}
// Generic repository implementation
type SQLRepository[T any, ID comparable] struct {
db *sql.DB
tableName string
mapper EntityMapper[T]
}
func NewSQLRepository[T any, ID comparable](db *sql.DB, tableName string, mapper EntityMapper[T]) *SQLRepository[T, ID] {
return &SQLRepository[T, ID]{
db: db,
tableName: tableName,
mapper: mapper,
}
}
func (r *SQLRepository[T, ID]) GetByID(ctx context.Context, id ID) (T, error) {
var zero T
query := fmt.Sprintf("SELECT * FROM %s WHERE id = $1", r.tableName)
row := r.db.QueryRowContext(ctx, query, id)
entity, err := r.mapper.ScanFrom(row)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return zero, fmt.Errorf("%w: entity %v", sql.ErrNoRows, id)
}
return zero, fmt.Errorf("failed to get entity: %w", err)
}
return entity, nil
}
// EntityMapper defines contract for mapping database rows.
// Note: sql.Row.Scan and sql.Rows.Scan are standard methods for reading
// query results into Go variables. The mapper encapsulates this logic
// for a specific type T.
type EntityMapper[T any] interface {
ScanFrom(row *sql.Row) (T, error)
ScanFromRows(rows *sql.Rows) (T, error)
}
// Concrete implementation for User
type UserMapper struct{}
func (m UserMapper) ScanFrom(row *sql.Row) (User, error) {
var user User
err := row.Scan(&user.ID, &user.Email, &user.Name, &user.CreatedAt)
return user, err
}
func (m UserMapper) ScanFromRows(rows *sql.Rows) (User, error) {
var user User
err := rows.Scan(&user.ID, &user.Email, &user.Name, &user.CreatedAt)
return user, err
}
// Usage: Type-safe repositories
func setupRepositories(ctx context.Context, db *sql.DB) {
userRepo := NewSQLRepository[User, string](db, "users", UserMapper{})
orderRepo := NewSQLRepository[Order, int64](db, "orders", OrderMapper{})
// Type-safe usage: IDs and results are checked at compile time
user, err := userRepo.GetByID(ctx, "user-123")
order, err := orderRepo.GetByID(ctx, int64(456))
_, _, _ = user, order, err
}
// Example types for demonstration
type User struct {
ID string `json:"id"`
Email string `json:"email"`
Name string `json:"name"`
CreatedAt time.Time `json:"created_at"`
}
type Order struct {
ID int64 `json:"id"`
UserID string `json:"user_id"`
Amount int `json:"amount"`
}
type OrderMapper struct{}
func (m OrderMapper) ScanFrom(row *sql.Row) (Order, error) {
var order Order
err := row.Scan(&order.ID, &order.UserID, &order.Amount)
return order, err
}
func (m OrderMapper) ScanFromRows(rows *sql.Rows) (Order, error) {
var order Order
err := rows.Scan(&order.ID, &order.UserID, &order.Amount)
return order, err
}
✅ DO: Use constraints for type-safe generic functions
// Ordered constraint for sortable types
type Ordered interface {
~int | ~int8 | ~int16 | ~int32 | ~int64 |
~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
~float32 | ~float64 | ~string
}
// Generic min/max functions
func Min[T Ordered](a, b T) T {
if a < b {
return a
}
return b
}
func Max[T Ordered](a, b T) T {
if a > b {
return a
}
return b
}
// Numeric constraint for mathematical operations
type Numeric interface {
~int | ~int8 | ~int16 | ~int32 | ~int64 |
~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
~float32 | ~float64
}
// Generic mathematical functions
func Sum[T Numeric](values []T) T {
var sum T
for _, v := range values {
sum += v
}
return sum
}
func Average[T Numeric](values []T) T {
if len(values) == 0 {
return T(0)
}
return Sum(values) / T(len(values))
}
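Note that since Go 1.21 the standard library already ships this constraint as `cmp.Ordered`, and `min`/`max` are built-in functions, so hand-rolled versions like Min/Max above are mainly illustrations. A sketch combining both (Clamp is a hypothetical helper, not a stdlib function):
import "cmp"
// Clamp restricts v to the range [lo, hi] using the stdlib constraint
// and the min/max builtins (both Go 1.21+).
func Clamp[T cmp.Ordered](v, lo, hi T) T {
return max(lo, min(v, hi))
}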
✅ DO: Combine generics with functional options for flexible APIs
// Generic configuration with options
type Config[T any] struct {
timeout time.Duration
retries int
validator Validator[T]
transformer Transformer[T]
}
type Option[T any] func(*Config[T])
func WithTimeout[T any](timeout time.Duration) Option[T] {
return func(c *Config[T]) {
c.timeout = timeout
}
}
func WithRetries[T any](retries int) Option[T] {
return func(c *Config[T]) {
c.retries = retries
}
}
func WithValidator[T any](validator Validator[T]) Option[T] {
return func(c *Config[T]) {
c.validator = validator
}
}
// Interface definitions for type safety
type Validator[T any] interface {
Validate(T) error
}
type Transformer[T any] interface {
Transform(T) T
}
// Generic service with options
type Service[T any] struct {
config Config[T]
}
func NewService[T any](options ...Option[T]) *Service[T] {
config := Config[T]{
timeout: 30 * time.Second,
retries: 3,
}
for _, option := range options {
option(&config)
}
return &Service[T]{config: config}
}
// Example validators for concrete types
type UserValidator struct{}
func (v UserValidator) Validate(user User) error {
if user.Email == "" {
return errors.New("email is required")
}
return nil
}
type OrderValidator struct{}
func (v OrderValidator) Validate(order Order) error {
if order.Amount <= 0 {
return errors.New("amount must be positive")
}
return nil
}
// Usage: Type-safe service configuration
func configureServices() {
userService := NewService[User](
WithTimeout[User](10 * time.Second),
WithRetries[User](5),
WithValidator[User](UserValidator{}),
)
orderService := NewService[Order](
WithTimeout[Order](15 * time.Second),
WithValidator[Order](OrderValidator{}),
)
_, _ = userService, orderService // silence "declared and not used"
}
- GC-shape stenciling: The gc compiler generates one copy of the code per GC shape (all pointer types share an instantiation) plus runtime dictionaries, rather than fully monomorphizing every type
- Compilation Time: Many generic instantiations slow compilation
- Binary Size: Multiple instantiations increase binary size
- Runtime Performance: Generally same as hand-written code
✅ Best Practices:
- Use generics for genuine type safety, not just to avoid typing
- Prefer interface{} for simple cases where type safety isn't critical
- Profile before optimizing - measure actual impact
- Consider code generation for very performance-critical paths
Range over Functions (Stable since Go 1.23):
// Iterator pattern with range over func
func Numbers(start, end int) func(func(int) bool) {
return func(yield func(int) bool) {
for i := start; i < end; i++ {
if !yield(i) {
return
}
}
}
}
// Usage (range over a function value, stable since Go 1.23)
for n := range Numbers(1, 10) {
fmt.Println(n)
}
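The standard library's `iter` package (Go 1.23+) names this function shape `iter.Seq[int]`, and `iter.Pull` converts such a push iterator into a pull-style one; a sketch:
import "iter"
// firstN pulls at most n values from any iterator, e.g. Numbers(1, 100).
func firstN(seq iter.Seq[int], n int) []int {
next, stop := iter.Pull(seq)
defer stop() // release the iterator if we break early
out := make([]int, 0, n)
for range n { // integer range, Go 1.22+
v, ok := next()
if !ok {
break
}
out = append(out, v)
}
return out
}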
Enhanced Error Handling:
- Improved error messages in generics
- Better type inference in complex scenarios
- Performance improvements in error wrapping
Decision Rationale: Functional options provide flexible configuration without breaking API compatibility. Essential for libraries and services with many configuration parameters.
What Goes Wrong Without Functional Options:
// NIGHTMARE: Constructors with many parameters
func NewServer(addr string, readTimeout time.Duration, writeTimeout time.Duration,
maxHeaderBytes int, enableCORS bool, enableGzip bool, enableMetrics bool,
logLevel string, middleware []Middleware) *Server {
// 9 parameters is already unmanageable
// Adding new config options breaks all existing calls
}
// Caller nightmare
server := NewServer(":8080", 15*time.Second, 15*time.Second, 1024*1024,
true, true, false, "info", []Middleware{}) // What do these booleans mean?
✅ DO: Use functional options for complex configuration
// Server with many optional configurations
type Server struct {
addr string
readTimeout time.Duration
writeTimeout time.Duration
maxHeaderBytes int
middleware []Middleware
errorHandler ErrorHandler
}
type Option func(*Server)
func WithAddr(addr string) Option {
return func(s *Server) {
s.addr = addr
}
}
func WithTimeouts(read, write time.Duration) Option {
return func(s *Server) {
s.readTimeout = read
s.writeTimeout = write
}
}
func WithMiddleware(middleware ...Middleware) Option {
return func(s *Server) {
s.middleware = append(s.middleware, middleware...)
}
}
func WithErrorHandler(handler ErrorHandler) Option {
return func(s *Server) {
s.errorHandler = handler
}
}
func NewServer(options ...Option) *Server {
// Sensible defaults
server := &Server{
addr: ":8080",
readTimeout: 15 * time.Second,
writeTimeout: 15 * time.Second,
maxHeaderBytes: 1 << 20, // 1MB
errorHandler: DefaultErrorHandler{},
}
// Apply options (nil slice is safe - range over nil slice is no-op)
for _, option := range options {
if option != nil { // Defensive check for nil functional options
option(server)
}
}
return server
}
// Usage: Clean and extensible
server := NewServer(
WithAddr(":3000"),
WithTimeouts(10*time.Second, 30*time.Second),
WithMiddleware(
corsMiddleware(),
requestIDMiddleware(),
recoveryMiddleware(),
),
WithErrorHandler(CustomErrorHandler{}),
)
// Example middleware and handler types
type Middleware func(http.Handler) http.Handler
type ErrorHandler interface {
HandleError(w http.ResponseWriter, r *http.Request, err error)
}
type DefaultErrorHandler struct{}
func (h DefaultErrorHandler) HandleError(w http.ResponseWriter, r *http.Request, err error) {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
}
type CustomErrorHandler struct{}
func (h CustomErrorHandler) HandleError(w http.ResponseWriter, r *http.Request, err error) {
// Custom error handling logic
http.Error(w, "Custom Error Response", http.StatusInternalServerError)
}
// Example middleware functions
func corsMiddleware() Middleware {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Access-Control-Allow-Origin", "*")
next.ServeHTTP(w, r)
})
}
}
func requestIDMiddleware() Middleware {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
requestID := generateRequestID()
w.Header().Set("X-Request-ID", requestID)
next.ServeHTTP(w, r)
})
}
}
func recoveryMiddleware() Middleware {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
defer func() {
if err := recover(); err != nil {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
}
}()
next.ServeHTTP(w, r)
})
}
}
func generateRequestID() string {
return fmt.Sprintf("req_%d", time.Now().UnixNano())
}
Decision Rationale: Repository pattern separates data access from business logic, enabling testability and flexibility in storage implementations.
✅ DO: Implement clean repository interfaces
// Domain model (pure business logic)
type User struct {
ID string `json:"id"`
Email string `json:"email"`
Name string `json:"name"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
// Repository interface (defined in domain layer)
type UserRepository interface {
Create(ctx context.Context, user *User) error
GetByID(ctx context.Context, id string) (*User, error)
GetByEmail(ctx context.Context, email string) (*User, error)
Update(ctx context.Context, user *User) error
Delete(ctx context.Context, id string) error
List(ctx context.Context, filter UserFilter) ([]*User, error)
}
type UserFilter struct {
Email string
Limit int
Offset int
}
// SQL implementation (infrastructure layer)
type sqlUserRepository struct {
db *sql.DB
logger *slog.Logger
}
func NewSQLUserRepository(db *sql.DB, logger *slog.Logger) UserRepository {
return &sqlUserRepository{
db: db,
logger: logger,
}
}
func (r *sqlUserRepository) Create(ctx context.Context, user *User) error {
query := `
INSERT INTO users (id, email, name, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5)`
_, err := r.db.ExecContext(ctx, query,
user.ID, user.Email, user.Name, user.CreatedAt, user.UpdatedAt)
if err != nil {
r.logger.ErrorContext(ctx, "failed to create user",
"error", err,
"user_id", user.ID)
return fmt.Errorf("failed to create user: %w", err)
}
return nil
}
func (r *sqlUserRepository) GetByID(ctx context.Context, id string) (*User, error) {
query := `
SELECT id, email, name, created_at, updated_at
FROM users
WHERE id = $1`
var user User
err := r.db.QueryRowContext(ctx, query, id).Scan(
&user.ID, &user.Email, &user.Name, &user.CreatedAt, &user.UpdatedAt)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrUserNotFound
}
return nil, fmt.Errorf("failed to get user: %w", err)
}
return &user, nil
}
// Sentinel error, checked by callers with errors.Is
var ErrUserNotFound = errors.New("user not found")
Connecting the Dots: Remember to inject the concrete repository implementation into your services. For example:
// userService := NewUserService(NewSQLUserRepository(db, logger), logger)
This ties back to our dependency injection principles from Section 6.
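Because the interface lives in the domain layer, tests can swap in an in-memory implementation with no database at all. A minimal sketch of such a test double (Limit/Offset handling omitted for brevity):
import (
"context"
"sync"
)
// inMemoryUserRepository is a test double satisfying UserRepository.
type inMemoryUserRepository struct {
mu    sync.RWMutex
users map[string]*User
}
func NewInMemoryUserRepository() UserRepository {
return &inMemoryUserRepository{users: make(map[string]*User)}
}
func (r *inMemoryUserRepository) Create(ctx context.Context, u *User) error {
r.mu.Lock()
defer r.mu.Unlock()
r.users[u.ID] = u
return nil
}
func (r *inMemoryUserRepository) GetByID(ctx context.Context, id string) (*User, error) {
r.mu.RLock()
defer r.mu.RUnlock()
if u, ok := r.users[id]; ok {
return u, nil
}
return nil, ErrUserNotFound
}
func (r *inMemoryUserRepository) GetByEmail(ctx context.Context, email string) (*User, error) {
r.mu.RLock()
defer r.mu.RUnlock()
for _, u := range r.users {
if u.Email == email {
return u, nil
}
}
return nil, ErrUserNotFound
}
func (r *inMemoryUserRepository) Update(ctx context.Context, u *User) error {
r.mu.Lock()
defer r.mu.Unlock()
if _, ok := r.users[u.ID]; !ok {
return ErrUserNotFound
}
r.users[u.ID] = u
return nil
}
func (r *inMemoryUserRepository) Delete(ctx context.Context, id string) error {
r.mu.Lock()
defer r.mu.Unlock()
delete(r.users, id)
return nil
}
func (r *inMemoryUserRepository) List(ctx context.Context, f UserFilter) ([]*User, error) {
r.mu.RLock()
defer r.mu.RUnlock()
var out []*User
for _, u := range r.users {
if f.Email == "" || u.Email == f.Email {
out = append(out, u)
}
}
return out, nil
}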
The Most Common Go Bug (before Go 1.22): Loop variable capture bit even experienced developers for years. Go 1.22 changed the language so each loop iteration declares a fresh variable, but the pattern still matters for older toolchains and for reading existing code.
❌ The Bug:
// WRONG: All goroutines process the same item!
for _, item := range items {
go func() {
process(item) // BUG: captures the loop variable reference
}()
}
What Goes Wrong (Go 1.21 and earlier): All goroutines process the last item in the slice, because a single loop variable `item` was shared across iterations. Since Go 1.22, `for` loops declare a new variable per iteration, so this exact bug no longer occurs on modern toolchains.
Why This Happened (pre-1.22 semantics):
Iteration 1: item points to items[0] ──┐
Iteration 2: item points to items[1] ──┼─► All goroutines read this
Iteration 3: item points to items[2] ──┘ same memory location!
Production War Story: A major streaming service had recommendation bugs where all users saw the same content. The issue was loop variable capture in goroutines processing user preferences. 10 million users received identical recommendations because all goroutines captured the last user's preferences.
✅ The Fix (needed only for pre-1.22 modules):
// CORRECT: Each goroutine gets its own copy
for _, item := range items {
    item := item // Explicit copy; redundant (but harmless) since Go 1.22
    go func() {
        process(item) // Safe: each goroutine has its own item
    }()
}
// Better: Use SafeGroup (recommended), which is safe on every Go version
g, ctx := NewSafeGroup(ctx, logger)
for _, item := range items {
    item := item // Only needed if go.mod declares go < 1.22
    g.Go(ctx, func() error {
        return process(ctx, item) // context-aware variant of process
    })
}
return g.Wait()
The Surprise: An interface is only nil if both its type and value are nil.
❌ The Problem:
func returnsTypedNil() error {
var err *MyError = nil
return err // This interface is NOT nil!
}
func main() {
if err := returnsTypedNil(); err != nil {
fmt.Println("Error occurred") // This will print!
}
}
type MyError struct {
msg string
}
func (e *MyError) Error() string {
return e.msg
}
Why This Happens: The interface value contains a non-nil type word (*MyError) even though the value it carries is nil.
Interface Structure:
interface{} = (Type: *MyError, Value: nil) // NOT a nil interface
interface{} = (Type: nil, Value: nil)      // nil interface
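You can print both states directly. A tiny demonstration (the function name is illustrative, reusing the MyError type from this section):
func demoInterfaceStates() {
    var typed *MyError      // typed nil pointer
    var iface error = typed // interface now holds (Type: *MyError, Value: nil)
    var empty error         // interface holds (Type: nil, Value: nil)
    fmt.Println(iface == nil) // false: the type word is set
    fmt.Println(empty == nil) // true: a genuinely nil interface
}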
Production War Story: A logging service reported errors even when none occurred because functions returned typed nil errors that were treated as real errors. Alert systems fired continuously, causing on-call fatigue and masking real issues.
✅ The Fix:
func returnsNil() error {
var err *MyError = nil
if err != nil {
return err
}
return nil // Explicitly return untyped nil
}
// Better: Return concrete errors directly
func betterError() error {
if someCondition {
return &MyError{msg: "something went wrong"}
}
return nil
}
// Example condition for demonstration
var someCondition = false
The Confusion: Slice behavior depends on capacity, leading to unexpected mutations.
❌ The Problem:
func modifySlice(s []int) {
s[0] = 999 // Always modifies original
s = append(s, 42) // May or may not affect original
}
original := []int{1, 2, 3}
modifySlice(original)
// original[0] is now 999, but length unchanged!
What Goes Wrong: Index assignment always affects the underlying array, but append behavior depends on capacity.
Why This Happens:
Original: [1, 2, 3] (len=3, cap=3)
s[0] = 999: [999, 2, 3] (modifies original array)
append(s, 42): Creates new array because cap is full
original is still [999, 2, 3], append had no effect
✅ The Solution:
// Be explicit about mutation intentions
func modifySliceInPlace(s []int) {
if len(s) > 0 {
s[0] = 999 // Caller expects modification
}
}
func appendToSlice(s []int, value int) []int {
return append(s, value) // Return new slice
}
// Defensive copying when needed
func processSliceSafely(s []int) []int {
result := make([]int, len(s)) // Create copy
copy(result, s)
// Safe to modify result
for i := range result {
result[i] *= 2
}
return result
}
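When handing a slice to code you do not control, a three-index slice expression s[low:high:max] caps the capacity so that a callee's append must reallocate instead of overwriting your backing array. A small sketch of the difference (function name is illustrative):
func fullSliceExpressionDemo() {
    data := []int{1, 2, 3, 0, 0} // backing array with spare capacity
    view := data[:3]             // len=3, cap=5: an append can clobber data[3]
    safe := data[:3:3]           // len=3, cap=3: an append must reallocate

    _ = append(view, 99) // writes 99 into data[3] in place
    _ = append(safe, 42) // allocates a new array; data is untouched
    fmt.Println(data)    // [1 2 3 99 0]
}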
❌ The Problem: Maps are not thread-safe
// DANGEROUS: Concurrent map access
var cache = make(map[string]string)
func getValue(key string) string {
return cache[key] // Race condition!
}
func setValue(key, value string) {
cache[key] = value // Race condition!
}
What Goes Wrong: Concurrent reads and writes to maps cause race conditions that can corrupt data or crash the program.
Production Crash: A cache service crashed with "fatal error: concurrent map writes" during peak traffic. Multiple goroutines were updating the same map simultaneously, causing the Go runtime to detect the race condition and terminate the program.
✅ The Fix:
// Use sync.Map for concurrent access (best suited to mostly-read caches
// or disjoint key sets; for general use, prefer the mutex version below)
var cache sync.Map
func getValue(key string) (string, bool) {
value, ok := cache.Load(key)
if !ok {
return "", false
}
return value.(string), true
}
func setValue(key, value string) {
cache.Store(key, value)
}
// Or use a mutex for more complex operations
type SafeCache struct {
mu sync.RWMutex
cache map[string]string
}
func NewSafeCache() *SafeCache {
return &SafeCache{
cache: make(map[string]string),
}
}
func (c *SafeCache) Get(key string) (string, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
value, ok := c.cache[key]
return value, ok
}
func (c *SafeCache) Set(key, value string) {
c.mu.Lock()
defer c.mu.Unlock()
c.cache[key] = value
}
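For get-or-create semantics on sync.Map, LoadOrStore performs the check and the insert atomically, avoiding a separate check-then-set race. A minimal sketch (the helper name is illustrative; note that compute may still run even when another goroutine wins the store):
func getOrCompute(cache *sync.Map, key string, compute func() string) string {
    if v, ok := cache.Load(key); ok {
        return v.(string)
    }
    // If another goroutine stored first, LoadOrStore returns its value
    // and discards ours.
    actual, _ := cache.LoadOrStore(key, compute())
    return actual.(string)
}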
❌ The Problem: The default HTTP client has no timeouts
// DANGEROUS: Can hang forever
func callAPI() error {
resp, err := http.Get("https://api.example.com/data")
if err != nil {
return err
}
defer resp.Body.Close()
// Process response...
return nil
}
What Goes Wrong: Without timeouts, HTTP requests can hang indefinitely, causing goroutine leaks and service degradation.
✅ The Fix: Always set timeouts
import (
"net"
"net/http"
"time"
)
// SAFE: Proper timeout configuration
func callAPISafe() error {
client := &http.Client{
Timeout: 30 * time.Second,
Transport: &http.Transport{
DialContext: (&net.Dialer{
Timeout: 10 * time.Second,
}).DialContext,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 10 * time.Second,
IdleConnTimeout: 60 * time.Second,
MaxIdleConns: 10,
MaxIdleConnsPerHost: 2,
DisableKeepAlives: false, // Reuse connections for performance
},
}
resp, err := client.Get("https://api.example.com/data")
if err != nil {
return err
}
defer resp.Body.Close()
// Process response...
return nil
}
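The client-wide Timeout above bounds the whole request. For per-call deadlines that also honor upstream cancellation, attach a context to the request. A minimal sketch (the URL and the 5-second budget are illustrative):
func callAPIWithContext(ctx context.Context, client *http.Client) error {
    // This call gets at most 5 seconds, or less if the parent ctx
    // is cancelled first.
    ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.example.com/data", nil)
    if err != nil {
        return fmt.Errorf("failed to build request: %w", err)
    }
    resp, err := client.Do(req)
    if err != nil {
        return fmt.Errorf("request failed: %w", err)
    }
    defer resp.Body.Close()
    // Process response...
    return nil
}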
❌ The time.After Leak in Loops:
// BAD (pre-Go 1.23): creates a new timer on each iteration that lingers until it fires
for {
select {
case <-time.After(1 * time.Second): // ❌ Timer leak!
doWork()
case <-ctx.Done():
return
}
}
What Goes Wrong: Before Go 1.23, a timer created by time.After was not garbage collected until it fired, so an infinite loop accumulated thousands of live timers. Since Go 1.23, unreferenced timers are collected promptly, so this is no longer a true leak on current versions; reusing a single timer still avoids allocating one per iteration.
✅ The Fix: Reuse timers:
// GOOD: Reuse timer to prevent memory leaks
timer := time.NewTimer(1 * time.Second)
defer timer.Stop()
for {
select {
case <-timer.C:
doWork()
timer.Reset(1 * time.Second) // ✅ Safe reuse
case <-ctx.Done():
return
}
}
// Placeholder for doWork function
func doWork() {
// Implementation would go here
}
❌ Ticker Not Stopped:
// BAD: time.Tick gives no handle to stop the ticker
// (a permanent leak before Go 1.23; since 1.23 the GC can reclaim it once unreferenced)
ticker := time.Tick(5 * time.Second)
for range ticker {
    if shouldStop() {
        return // The ticker cannot be stopped explicitly
    }
    doPeriodicWork()
}
✅ The Fix: Use NewTicker:
// GOOD: NewTicker can be stopped
ticker := time.NewTicker(5 * time.Second)
defer ticker.Stop() // ✅ Properly cleaned up
for {
select {
case <-ticker.C:
if shouldStop() {
return
}
doPeriodicWork()
case <-ctx.Done():
return
}
}
// Placeholder functions
func shouldStop() bool {
return false // Implementation would determine when to stop
}
func doPeriodicWork() {
// Implementation would go here
}
❌ The Problem: Deferred functions accumulate
// BAD: Defers accumulate, not executed per iteration
func processFiles(filenames []string) error {
for _, filename := range filenames {
file, err := os.Open(filename)
if err != nil {
return err
}
defer file.Close() // ❌ All files stay open until function returns!
// Process file...
}
return nil
}
What Goes Wrong: All defer statements run when the function returns, not when the loop iteration ends. This keeps every file open simultaneously.
Production Impact: A log processing service tried to process 10,000 files using this pattern. It ran out of file descriptors and crashed because all files remained open.
✅ The Fix:
// GOOD: Use anonymous function or separate function
func processFiles(filenames []string) error {
for _, filename := range filenames {
if err := processFile(filename); err != nil {
return err
}
}
return nil
}
func processFile(filename string) error {
file, err := os.Open(filename)
if err != nil {
return err
}
defer file.Close() // ✅ Closes immediately after processing
// Process file...
return nil
}
// Alternative: Anonymous function
func processFilesAlternative(filenames []string) error {
for _, filename := range filenames {
err := func() error {
file, err := os.Open(filename)
if err != nil {
return err
}
defer file.Close() // ✅ Executes at end of anonymous function
// Process file...
return nil
}()
if err != nil {
return err
}
}
return nil
}
Go's built-in testing framework embodies the same principles as the language itself: simple, explicit, and effective. Writing testable code is a core theme throughout this guide—here are the canonical patterns for writing the tests themselves.
Decision Rationale: Go treats testing as a first-class citizen. The testing package provides everything needed for unit tests, benchmarks, and fuzzing without external dependencies.
✅ DO: Use table-driven tests for multiple test cases
import (
"testing"
)
// Example function being tested
func Min(a, b int) int {
if a < b {
return a
}
return b
}
func TestMin(t *testing.T) {
testCases := []struct {
name string
a, b int
expected int
}{
{"a is smaller", 5, 10, 5},
{"b is smaller", 10, 5, 5},
{"equal numbers", 5, 5, 5},
{"negative numbers", -5, -10, -10},
{"zero values", 0, 0, 0},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
got := Min(tc.a, tc.b)
if got != tc.expected {
t.Errorf("Min(%d, %d) = %d; want %d", tc.a, tc.b, got, tc.expected)
}
})
}
}
Why This Pattern Works:
- Maintainable: Easy to add new test cases
- Clear: Each case is self-documenting with a descriptive name
- Isolated: t.Run creates subtests that run independently
- Debuggable: Run specific cases with go test -run 'TestMin/equal'
✅ DO: Use t.Helper() in test utility functions
// Test helper that reports failures at the correct line
func assertEqual[T comparable](t *testing.T, got, want T) {
t.Helper() // This is the magic!
if got != want {
t.Errorf("want %v, got %v", want, got) // Standard library convention: want first
}
}
func assertNoError(t *testing.T, err error) {
t.Helper()
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
func TestUserService_CreateUser(t *testing.T) {
service := setupTestService(t)
user, err := service.CreateUser(context.Background(), &CreateUserRequest{
Email: "test@example.com",
Name: "Test User",
})
// If these fail, the error points to THIS line, not inside the helpers
assertNoError(t, err)
assertEqual(t, user.Email, "test@example.com")
assertEqual(t, user.Name, "Test User")
}
// Placeholder function for test setup
func setupTestService(t *testing.T) *UserService {
// Would return a properly configured service for testing
return &UserService{}
}
// Example types for testing
type CreateUserRequest struct {
Email string
Name string
}
type UserService struct{}
func (s *UserService) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
return &User{
Email: req.Email,
Name: req.Name,
}, nil
}
Why t.Helper() Is Critical: Without it, test failures are reported at a line inside the helper function, making debugging much harder.
✅ DO: Create test doubles using interfaces
import (
"context"
"errors"
"io"
"log/slog"
"testing"
)
// Mock implementation for testing
type mockUserRepository struct {
users map[string]*User
err error
}
func (m *mockUserRepository) GetUser(ctx context.Context, id string) (*User, error) {
if m.err != nil {
return nil, m.err
}
user, exists := m.users[id]
if !exists {
return nil, ErrUserNotFound
}
return user, nil
}
func (m *mockUserRepository) SaveUser(ctx context.Context, user *User) error {
if m.err != nil {
return m.err
}
m.users[user.ID] = user
return nil
}
// Test using the mock
func TestUserService_GetUser(t *testing.T) {
tests := []struct {
name string
userID string
setupMock func(*mockUserRepository)
expectedUser *User
expectedError error
}{
{
name: "user exists",
userID: "user-123",
setupMock: func(mock *mockUserRepository) {
mock.users["user-123"] = &User{
ID: "user-123",
Email: "test@example.com",
Name: "Test User",
}
},
expectedUser: &User{
ID: "user-123",
Email: "test@example.com",
Name: "Test User",
},
},
{
name: "user not found",
userID: "nonexistent",
setupMock: func(mock *mockUserRepository) {
// No setup needed - empty map
},
expectedError: ErrUserNotFound,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
mock := &mockUserRepository{users: make(map[string]*User)}
tt.setupMock(mock)
// Avoid global defaults in tests: build a logger explicitly
logger := slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{Level: slog.LevelDebug}))
service := NewUserService(mock, logger)
user, err := service.GetUser(context.Background(), tt.userID)
if tt.expectedError != nil {
if !errors.Is(err, tt.expectedError) {
t.Fatalf("expected error %v, got %v", tt.expectedError, err)
}
return // Error was expected, test passes
}
assertNoError(t, err)
assertEqual(t, user.ID, tt.expectedUser.ID)
assertEqual(t, user.Email, tt.expectedUser.Email)
})
}
}
✅ DO: Write benchmarks for performance-critical code
func BenchmarkUserProcessing(b *testing.B) {
service := setupBenchmarkService() // Setup test service with mocks
users := generateTestUsers(1000) // Create test data
b.ResetTimer() // Don't count setup time
b.ReportAllocs() // Report memory allocations
for i := 0; i < b.N; i++ {
for _, user := range users {
_ = service.ProcessUser(context.Background(), user)
}
}
}
// Benchmark different implementations
func BenchmarkStringBuilding(b *testing.B) {
tests := []struct {
name string
fn func([]string) string
}{
{"concat", concatenateStrings},
{"builder", buildStringsWithBuilder},
{"join", strings.Join},
}
words := []string{"hello", "world", "this", "is", "a", "benchmark"}
for _, test := range tests {
b.Run(test.name, func(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_ = test.fn(words)
}
})
}
}
// Example string building functions for benchmarking
func concatenateStrings(words []string) string {
result := ""
for _, word := range words {
result += word + " "
}
return result
}
func buildStringsWithBuilder(words []string) string {
var builder strings.Builder
for _, word := range words {
builder.WriteString(word)
builder.WriteString(" ")
}
return builder.String()
}
// Placeholder functions for benchmarking setup
func setupBenchmarkService() *UserService {
return &UserService{}
}
func generateTestUsers(count int) []User {
users := make([]User, count)
for i := 0; i < count; i++ {
users[i] = User{
ID: fmt.Sprintf("user-%d", i),
Email: fmt.Sprintf("user%d@example.com", i),
Name: fmt.Sprintf("User %d", i),
}
}
return users
}
func (s *UserService) ProcessUser(ctx context.Context, user User) error {
// Placeholder processing logic
return nil
}
Running Benchmarks:
# Run benchmarks with memory stats
go test -bench=. -benchmem
# Run with race detection
go test -race -run=^$ -bench=.
# Use benchstat for statistical comparison
go test -bench=. -count=10 > old.txt
# Make changes...
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
✅ DO: Use fuzzing to discover edge cases
func FuzzParseEmail(f *testing.F) {
// Seed corpus with known inputs
f.Add("test@example.com")
f.Add("user@example.org")
f.Add("invalid-email")
f.Fuzz(func(t *testing.T, email string) {
result, err := ParseEmail(email)
// Invariants that should always hold
if err == nil {
// Valid email should have @ symbol
if !strings.Contains(result.Address, "@") {
t.Errorf("valid email missing @: %s", result.Address)
}
// Should be able to round-trip
if result.String() != email {
t.Errorf("round-trip failed: %s != %s", result.String(), email)
}
}
// Function should never panic
// (if it panics, the fuzzer will catch it)
})
}
// Example email parsing function for fuzzing
type EmailAddress struct {
Address string
}
func (e EmailAddress) String() string {
return e.Address
}
func ParseEmail(email string) (*EmailAddress, error) {
if !strings.Contains(email, "@") {
return nil, errors.New("invalid email format")
}
return &EmailAddress{Address: email}, nil
}
Running Fuzzing:
# Run fuzzing for 10 seconds (good for CI)
go test -fuzz=FuzzParseEmail -fuzztime=10s
# Run until failure is found
go test -fuzz=FuzzParseEmail
Pattern | When to Use | Example |
---|---|---|
Table-Driven | Multiple similar test cases | Validation functions, algorithms |
Subtests | Grouping related tests | t.Run() for test organization |
Mocks/Stubs | External dependencies | Database, HTTP clients, time |
Benchmarks | Performance-critical code | Parsers, serialization, algorithms |
Fuzzing | Edge case discovery | Input validation, parsers |
Integration | End-to-end flows | HTTP handlers, database operations (see the sketch below) |
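For the Integration row above, net/http/httptest exercises a handler end to end without opening a real socket. A minimal sketch (the health handler is a stand-in):
import (
    "net/http"
    "net/http/httptest"
    "testing"
)

func TestHealthHandler(t *testing.T) {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("ok"))
    })

    req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
    rec := httptest.NewRecorder()
    handler.ServeHTTP(rec, req)

    if rec.Code != http.StatusOK {
        t.Fatalf("want status %d, got %d", http.StatusOK, rec.Code)
    }
    if got := rec.Body.String(); got != "ok" {
        t.Fatalf("want body %q, got %q", "ok", got)
    }
}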
❌ Common Testing Mistakes:
Anti-Pattern | Why It's Wrong | Better Approach |
---|---|---|
No t.Helper() | Confusing error locations | Always use in test utilities |
Testing Implementation | Brittle, breaks on refactoring | Test behavior, not internals |
Shared Mutable State | Tests affect each other | Create fresh test data per test |
No Error Testing | Missing unhappy path coverage | Test both success and failure cases |
Slow Tests | Developers skip running them | Use mocks, avoid real I/O |
Anti-Pattern | Why It's Wrong | Production Impact | Fix |
---|---|---|---|
Global Variables | Hidden dependencies, testing nightmare | Cannot test in isolation, race conditions, debugging hell | Dependency injection |
Context in Structs | Stale contexts, lifecycle issues | Goroutine leaks, cancelled operations, memory leaks | Context as parameters |
String Context Keys | Key collisions across packages | Silent data corruption, debugging nightmare | Unexported struct pointer keys |
sync.WaitGroup without error handling | No error handling, no cancellation | Silent failures, resource leaks, no fail-fast | Use errgroup |
Error Equality == | Breaks with wrapped errors | Wrong HTTP status codes, broken retry logic | Use errors.Is/As |
Ignoring Errors | Silent failures, data corruption | Money transfers fail silently, data loss | Always handle errors |
Hard-coded Dependencies | Cannot test, no flexibility | Impossible unit testing, deployment issues | Interface injection |
✅ Architecture:
- All dependencies injected through constructors
- Interfaces defined where they're used
- No global state (except errors/constants)
- Context flows through parameters
- Structured logging with correlation IDs
✅ Error Handling:
- All errors handled explicitly
- Use errors.Is/As for error checking
- Rich error context with wrapping
- Custom error types for structured handling
- No panics in production paths
✅ Concurrency:
- Use errgroup instead of sync.WaitGroup
- Panic recovery in concurrent code
- Context cancellation respected
- Resource limits set (g.SetLimit)
- Loop variables captured correctly (pre-Go 1.22 modules)
✅ Performance:
- Pre-allocate slices with known capacity
- Use object pooling for expensive allocations
- Understand escape analysis implications
- Avoid unnecessary pointer escapes
- Profile before optimizing
✅ Code Quality:
- Intention-revealing names
- Small, focused functions
- Single responsibility principle
- No god packages or interfaces
- Consistent error handling patterns
Type | Pattern | Example | Notes |
---|---|---|---|
Package | Short, lowercase | user, http | No underscores |
Interface | Behavior + -er | Reader, UserFinder | Single method = -er suffix |
Struct | Clear noun | User, HTTPClient | No stuttering |
Method | Verb phrase | GetUser, Close | Action-oriented |
Variable | Scope-appropriate | u (short scope), currentUser (long scope) | Short scope = short name |
Constant | Descriptive | DefaultTimeout | Clear intent |
Error Variable | Err prefix | ErrNotFound | Standard pattern |
Why the Err prefix matters: Consistent error naming enables reliable error handling with errors.Is and makes error types instantly recognizable in logs and debugging.
Go succeeds because it optimizes for the right things:
- Reading over Writing: Code is read 10x more than written
- Clarity over Cleverness: Simple solutions beat complex ones
- Explicit over Implicit: Make dependencies and error paths visible
- Composition over Inheritance: Build from small, composable parts
- Practicality over Purity: Choose what works in production
Remember: Following these patterns doesn't just make your Go code better—it makes you a better engineer. You learn to think in systems, value simplicity, design for testing, handle failures gracefully, and build for observability.
Go teaches us that great software engineering is about building reliable, maintainable systems—not showing off how clever we are.
Authoritative Sources:
- Effective Go - The foundational document
- Google Go Style Guide - Google's internal standards
- Uber Go Style Guide - Battle-tested patterns
- Go Code Review Comments - Community wisdom
- golang.org/x/sync/errgroup - Modern concurrency
- log/slog Package - Structured logging standard
- Contexts and structs - Context lifecycle management
Performance and Memory:
- Go Memory Model - Understanding concurrency
- Escape Analysis - Stack vs heap allocation
- Go GC Guide - Garbage collector behavior
Modern Go Features:
- Go Generics Tutorial - Type parameters
- When to Use Generics - Official guidance
- Go 1.24 Release Notes - Latest features
"The key to making programs fast is to make them do practically nothing." - Mike Haertel
This guide represents the collective wisdom of the Go community, distilled into actionable patterns for building production-grade applications. Master these patterns, and you'll write Go code that is not only correct and performant, but maintainable, testable, and joyful to work with.
Contributing: This guide evolves with the Go community. Share your production experiences, war stories, and improvements to help fellow engineers build better Go applications.