@embano1
Created December 15, 2025 12:21
durable-functions-clinerule.md

AWS Lambda Durable Functions SDK - Cline Development Guide

Prerequisites

Required Knowledge:

  • TypeScript/JavaScript fundamentals
  • AWS Lambda concepts and deployment
  • Promise-based asynchronous programming
  • Basic understanding of state machines and workflows

System Requirements:

  • Node.js 18+ for CDK (recommended: 22+, to match the runtime that Lambda durable functions require)
  • AWS CLI configured with appropriate permissions
  • AWS CDK 2.232.1+ (for durableConfig support)
  • TypeScript 4.5+ for proper type support

AWS Permissions Required:

  • AWSLambdaBasicDurableExecutionRolePolicy
  • lambda:InvokeFunction when using durable invokes
  • lambda:SendDurableExecutionCallbackSuccess (and related operations) when using callbacks
  • logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents (for logging, included in managed policy)
  • CloudFormation permissions (for CDK deployments)

Overview

AWS Lambda durable functions extend Lambda's programming model so that you can build multi-step applications and AI workflows with automatic state persistence. Executions can run for days or months and survive failures, and you pay only for actual compute time, not for time spent waiting on external events such as human-in-the-loop processes.

Core Concepts

The Three Core Primitives

  1. Steps - Execute business logic with automatic checkpointing and transparent retries
  2. Waits - Suspend execution without compute charges (for delays, human approvals, scheduled tasks)
  3. Durable Invokes - Reliable function chaining for modular, composable architectures

The Replay Model

CRITICAL CONCEPT: Durable functions use a "replay" execution model:

  • On first invocation: Code executes normally, steps run and checkpoint results
  • On replay (after wait/failure/resume): Code runs from the beginning
  • Steps that already completed return their checkpointed results WITHOUT re-executing
  • Code OUTSIDE steps executes again on every replay

This means: All non-deterministic code MUST be inside steps, or replay will produce inconsistent results.
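
The replay behavior can be illustrated with a toy model. The sketch below is not the SDK's implementation: ToyContext, its synchronous step() method, and the in-memory checkpoint store are assumptions made purely for illustration. It shows that a completed step body runs once, while code outside the step runs again on every replay.

```typescript
// Toy model of checkpoint-and-replay. All names here are hypothetical.
type Checkpoints = Record<string, unknown>;

class ToyContext {
  private checkpoints: Checkpoints;

  constructor(checkpoints: Checkpoints) {
    this.checkpoints = checkpoints;
  }

  // Runs fn the first time a name is seen; afterwards returns the stored result.
  step<T>(name: string, fn: () => T): T {
    if (Object.prototype.hasOwnProperty.call(this.checkpoints, name)) {
      return this.checkpoints[name] as T;
    }
    const result = fn();
    this.checkpoints[name] = result;
    return result;
  }
}

let outsideRuns = 0; // how often code outside the step executed
let insideRuns = 0; // how often the step body executed

function workflow(ctx: ToyContext): number {
  outsideRuns++; // outside a step: executes again on every replay
  return ctx.step("compute", () => {
    insideRuns++; // inside a step: executes once, then served from the checkpoint
    return 42;
  });
}

const store: Checkpoints = {}; // stands in for persisted state
const first = workflow(new ToyContext(store)); // first invocation
const replayed = workflow(new ToyContext(store)); // replay from the beginning

console.log({ first, replayed, outsideRuns, insideRuns });
// { first: 42, replayed: 42, outsideRuns: 2, insideRuns: 1 }
```

Anything placed where outsideRuns is incremented (a timestamp, a random value, an API call) would be evaluated twice here, which is why such code belongs inside a step.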


Critical Rules & Constraints

Rule 1: Deterministic Code Outside Steps

ALL code outside context.step() MUST be deterministic (produce the same result every time).

// ❌ WRONG: Non-deterministic code outside steps
export const handler = withDurableExecution(async (event, context) => {
  const id = uuid.v4(); // ⚠️ Different UUID on each replay!
  const timestamp = Date.now(); // ⚠️ Different time on each replay!
  const random = Math.random(); // ⚠️ Different value on each replay!

  await context.step(async () => processData(id, timestamp, random));
  return { id, timestamp, random };
});

// ✅ CORRECT: Non-deterministic code inside steps
export const handler = withDurableExecution(async (event, context) => {
  const id = await context.step("generate-id", async () => uuid.v4());
  const timestamp = await context.step("get-time", async () => Date.now());
  const random = await context.step("random", async () => Math.random());

  await context.step(async () => processData(id, timestamp, random));
  return { id, timestamp, random };
});

Must be in steps:

  • Date.now(), new Date(), timestamp generation
  • Math.random(), UUID generation
  • API calls (may return different results)
  • Database queries
  • File system operations
  • Any external I/O

Rule 2: No Nested Durable Operations

You CANNOT call durable operations (step, wait, etc.) inside a step function.

// ❌ WRONG: Nested durable operations
await context.step("process-order", async () => {
  await context.wait({ seconds: 1 }); // ❌ ERROR: context not available
  await context.step(async () => ...); // ❌ ERROR: cannot nest
  return result;
});

// ✅ CORRECT: Use runInChildContext for grouping
await context.runInChildContext("process-order", async (childCtx) => {
  await childCtx.wait({ seconds: 1 }); // ✅ childCtx has full capabilities
  const step1 = await childCtx.step(async () => validateOrder(order));
  const step2 = await childCtx.step(async () => chargePayment(step1));
  return step2;
});

ESLint Rule: Install @aws/durable-execution-sdk-js-eslint-plugin to catch this at development time.

Rule 3: Closure Variable Mutations Are Lost on Replay

Variables mutated inside steps are NOT preserved across replays.

// ❌ WRONG: Counter mutations lost on replay
export const handler = withDurableExecution(async (event, context) => {
  let counter = 0;

  await context.step(async () => {
    counter++; // Mutation happens during step execution
    return saveToDatabase(counter);
  });

  console.log(counter);
  // During first execution: 1 (mutation preserved)
  // During replay: 0 (mutation lost - step didn't re-execute!)
  // ⚠️ INCONSISTENT BEHAVIOR!

  return counter; // Returns different values!
});

// ✅ CORRECT: Return values from steps, don't rely on mutations
export const handler = withDurableExecution(async (event, context) => {
  let counter = 0;

  counter = await context.step(async () => {
    const newValue = counter + 1;
    await saveToDatabase(newValue);
    return newValue; // Return the new value
  });

  console.log(counter); // Consistently 1 on both execution and replay
  return counter;
});

// ✅ ALSO CORRECT: Keep state in returned objects
export const handler = withDurableExecution(async (event, context) => {
  const state = await context.step(async () => {
    return { counter: 1 };
  });

  console.log(state.counter); // Always 1
  return state;
});

Rule 4: Avoid Side Effects Outside Steps

Side effects (API calls, database writes) outside steps happen on EVERY replay.

Exception: context.logger is replay-aware and designed to be used anywhere. It automatically deduplicates log messages across replays and enriches logs with execution context (execution ID, step name, etc.).

// ❌ WRONG: console.log and API calls outside steps
export const handler = withDurableExecution(async (event, context) => {
  console.log("Processing user", event.userId); // Logs multiple times on replay!
  await sendEmail(event.userEmail, "Starting"); // ⚠️ Sends multiple emails!

  await context.step("process", async () => processUser(event.userId));

  console.log("User processed"); // Also logs on every replay!
  return "success";
});

// ✅ CORRECT: Use context.logger outside steps (replay-aware)
export const handler = withDurableExecution(async (event, context) => {
  // context.logger is safe to use anywhere - it's replay-aware
  context.logger.info("Processing user", { userId: event.userId });

  const result = await context.step("process", async (stepCtx) => {
    // Inside steps, use stepCtx.logger for step-scoped logging
    stepCtx.logger.debug("Executing process step");
    return processUser(event.userId);
  });

  context.logger.info("User processed", { result });
  return result;
});

Logging Best Practices:

  • ✅ Use context.logger outside steps for workflow-level logging
  • ✅ Use stepCtx.logger inside steps for step-scoped logging
  • ✅ Provide your own logger via context.configureLogger()
  • ❌ Avoid console.log - it lacks execution context and repeats on replay

SDK API Reference

Core Handler Wrapper

import {
  withDurableExecution,
  DurableContext,
} from "@aws/durable-execution-sdk-js";

// Wrap your Lambda handler
export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    // Your durable workflow code here
    return result;
  },
);

context.step() - Execute Atomic Operations

Execute a function with automatic retry and state persistence.

// Basic step
const result = await context.step(async () => {
  return fetchUserFromAPI(userId);
});

// Named step (recommended for operational visibility)
const result = await context.step("fetch-user", async () => {
  return fetchUserFromAPI(userId);
});

// Step with retry configuration
const result = await context.step("api-call", async () => callExternalAPI(), {
  retryStrategy: (error, attemptCount) => {
    if (attemptCount >= 3) {
      return { shouldRetry: false };
    }
    return {
      shouldRetry: true,
      delay: { seconds: Math.pow(2, attemptCount) },
    };
  },
  semantics: StepSemantics.AtMostOncePerRetry, // or AtLeastOncePerRetry
  serdes: customSerdes, // Custom serialization
});

Important: Step functions receive a StepContext (for logging only), NOT a full DurableContext, so durable operations cannot be nested inside a step.

context.wait() - Pause Execution

Suspend execution for a duration without compute charges.

// Wait 5 seconds
await context.wait({ seconds: 5 });

// Wait with multiple units
await context.wait({ hours: 1, minutes: 30, seconds: 15 });

// Named wait (recommended)
await context.wait("rate-limit-delay", { seconds: 30 });

// Long delays (e.g., 7 days)
await context.wait("followup-delay", { days: 7 });

Duration object: { days?, hours?, minutes?, seconds? }
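
To make the multi-unit form concrete, here is a minimal sketch of how the fields of such a duration object add up. The Duration interface and the toSeconds helper are hypothetical and written only for this illustration; they are not taken from the SDK.

```typescript
// Hypothetical types for illustration; field names mirror the duration object above.
interface Duration {
  days?: number;
  hours?: number;
  minutes?: number;
  seconds?: number;
}

// Adds up all provided units; missing units count as zero.
function toSeconds(d: Duration): number {
  return (
    (d.days ?? 0) * 86400 +
    (d.hours ?? 0) * 3600 +
    (d.minutes ?? 0) * 60 +
    (d.seconds ?? 0)
  );
}

console.log(toSeconds({ hours: 1, minutes: 30, seconds: 15 })); // 5415
console.log(toSeconds({ days: 7 })); // 604800
```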

context.invoke() - Call Other Durable Functions

Invoke another durable Lambda function with guaranteed idempotency.

⚠️ Qualified Invocation Required: When invoking durable functions, you MUST use a qualified function name (with version or alias, e.g., my-function:$LATEST or my-function:production). Non-durable Lambda functions can be invoked without qualification.

💡 Best Practice: Inject function ARNs/names dynamically via environment variables set in your CDK stack, rather than hardcoding them. This enables proper multi-environment deployments and avoids deployment-time coupling.

// ✅ RECOMMENDED: Use environment variables for function ARNs
// Set in CDK: environment: { PAYMENT_PROCESSOR_ARN: paymentFunction.functionArn }
const paymentProcessorArn = process.env.PAYMENT_PROCESSOR_ARN!;

const result = await context.invoke(
  "process-payment",
  paymentProcessorArn,
  { amount: 100, currency: "USD" },
);

// With custom serialization options
const orderProcessorArn = process.env.ORDER_PROCESSOR_ARN!;

const result = await context.invoke(
  "process-order",
  orderProcessorArn,
  orderData,
  { serdes: customSerdes },
);

// ❌ AVOID: Hardcoded ARNs (works but couples code to specific deployment)
const result = await context.invoke(
  "arn:aws:lambda:us-east-1:123456789012:function:payment-processor:prod",
  { amount: 100, currency: "USD" },
);
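
Because an unqualified name is easy to inject by mistake, it can help to validate the environment variable before the first invoke. The helper below is a hypothetical sketch, not part of the SDK. It assumes the value is either a plain function name or a full Lambda ARN, and it does not handle partial ARNs.

```typescript
// Hypothetical helper, not part of the SDK.
function isQualified(nameOrArn: string): boolean {
  const parts = nameOrArn.split(":");
  if (parts[0] === "arn") {
    // arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
    return parts.length === 8 && (parts[7] ?? "") !== "";
  }
  // Plain name: <name>[:<qualifier>]
  return parts.length === 2 && (parts[1] ?? "") !== "";
}

console.log(isQualified("my-function:$LATEST")); // true
console.log(isQualified("my-function")); // false
console.log(
  isQualified("arn:aws:lambda:us-east-1:123456789012:function:payment-processor:prod"),
); // true
```

A check like this is deterministic, so it can run outside a step, for example at the top of the handler.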

context.runInChildContext() - Group Operations

Run multiple durable operations with isolated state tracking.

// Basic child context
const result = await context.runInChildContext(async (childCtx) => {
  const step1 = await childCtx.step(async () => validate(data));
  await childCtx.wait({ seconds: 1 });
  const step2 = await childCtx.step(async () => transform(step1));
  return step2;
});

// Named child context
const result = await context.runInChildContext(
  "process-batch",
  async (childCtx) => {
    // Child context has its own step counter and state
    const validated = await childCtx.step("validate", async () =>
      validate(data),
    );
    const transformed = await childCtx.step("transform", async () =>
      transform(validated),
    );
    return transformed;
  },
  {
    subType: "batch-processor", // Optional subtype for tracking
    serdes: customSerdes,
  },
);

context.waitForCallback() - External System Integration

Wait for external systems to complete operations.

// Basic callback
const result = await context.waitForCallback(
  async (callbackId, ctx) => {
    // Submit callback ID to external system
    await submitToExternalAPI(callbackId);
  },
  { timeout: { minutes: 5 } },
);

// Named callback with submitter function
const result = await context.waitForCallback(
  "wait-for-approval",
  async (callbackId, ctx) => {
    ctx.logger.info("Sending approval request", { callbackId });
    await sendApprovalEmail(callbackId);
  },
  {
    timeout: { hours: 24 },
    serdes: customSerdes,
  },
);

// External system completes with:
// SendDurableExecutionCallbackSuccess(callbackId, result)
// SendDurableExecutionCallbackFailure(callbackId, error)

context.createCallback() - Manual Callback Management

Create a callback ID for external systems to use.

// Create callback and get ID
const [callbackPromise, callbackId] = await context.createCallback(
  "external-approval",
  { timeout: { hours: 1 } },
);

// Send callback ID to external system (inside a step, so it is not re-sent on replay)
await context.step("send-callback-id", async () =>
  sendToExternalSystem(callbackId),
);

// Wait for external system to complete
const result = await callbackPromise;

context.waitForCondition() - Polling Pattern

Wait for a condition by periodically checking state.

const finalState = await context.waitForCondition(
  "wait-for-job-completion",
  async (currentState, ctx) => {
    // Check current state and return updated state
    const status = await checkJobStatus(currentState.jobId);
    return { ...currentState, status };
  },
  {
    initialState: { jobId: "job-123", status: "pending" },
    waitStrategy: (state, attempt) => {
      if (state.status === "completed") {
        return { shouldContinue: false }; // Stop polling
      }
      return {
        shouldContinue: true,
        delay: { seconds: Math.min(attempt * 2, 60) }, // Linear backoff, capped at 60 seconds
      };
    },
    serdes: customSerdes,
  },
);
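
The delays produced by a wait strategy like the one above can be checked in isolation. The following sketch re-declares the strategy as a standalone function and drives it with a simulated job. The types, the simulate helper, and the assumption that the attempt counter starts at 1 are all illustrative, not taken from the SDK. No real waiting takes place.

```typescript
// Hypothetical types for illustration only.
interface Duration {
  seconds: number;
}

type WaitDecision =
  | { shouldContinue: false }
  | { shouldContinue: true; delay: Duration };

interface JobState {
  jobId: string;
  status: "pending" | "completed";
}

// Stop once the job completed; otherwise add two seconds per attempt, capped at 60.
function waitStrategy(state: JobState, attempt: number): WaitDecision {
  if (state.status === "completed") {
    return { shouldContinue: false };
  }
  return { shouldContinue: true, delay: { seconds: Math.min(attempt * 2, 60) } };
}

// Simulated polling loop: the job reports "completed" on the given check.
function simulate(completeOnCheck: number): number[] {
  const delays: number[] = [];
  let state: JobState = { jobId: "job-123", status: "pending" };
  for (let attempt = 1; ; attempt++) {
    state = { ...state, status: attempt >= completeOnCheck ? "completed" : "pending" };
    const decision = waitStrategy(state, attempt);
    if (!decision.shouldContinue) {
      return delays;
    }
    delays.push(decision.delay.seconds);
  }
}

console.log(simulate(4)); // [ 2, 4, 6 ]
```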

context.map() - Process Arrays with Concurrency Control

Map over items with durable operations, controlled concurrency, and completion policies.

const items = [1, 2, 3, 4, 5];

const results = await context.map(
  "process-items",
  items,
  async (ctx, item, index) => {
    return await ctx.step(`process-${index}`, async () => {
      return item * 2;
    });
  },
  {
    maxConcurrency: 2, // Process 2 at a time
    completionConfig: {
      minSuccessful: 4, // Need at least 4 successes
      toleratedFailureCount: 1, // Can tolerate 1 failure
    },
    itemNamer: (item, index) => `Item-${index}`, // Custom names
  },
);

// Check results (context.logger is replay-aware, unlike console.log)
context.logger.info("Map finished", {
  succeeded: results.successCount,
  failed: results.failureCount,
});
results.throwIfError(); // Throws if completion policy not met

// Get individual results
const allResults = results.getResults(); // [2, 4, 6, 8, 10]
const succeeded = results.getSucceeded(); // Successful results only
const failed = results.getFailed(); // Failed results with errors
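
To show what a concurrency limit means in practice, here is a plain in-process sketch. It is not how context.map() is implemented and it provides no durability or completion policy. The mapWithLimit and double functions are hypothetical names introduced for this example.

```typescript
// Hypothetical in-process helper; unlike context.map() it provides no durability.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims items one at a time until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Track how many calls are in flight to confirm the limit is respected.
let inFlight = 0;
let peak = 0;

async function double(item: number): Promise<number> {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return item * 2;
}

const done = mapWithLimit([1, 2, 3, 4, 5], 2, double).then((results) => {
  // Logs the doubled values in input order and a peak concurrency of 2.
  console.log(results, "peak concurrency:", peak);
  return results;
});
```

Results are stored by index, so the output order matches the input order even when items finish out of order.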

context.parallel() - Execute Functions in Parallel

Run multiple JavaScript functions (in-process) concurrently with controlled execution. Each branch receives a child context for durable operations.

Note: This executes local JavaScript functions in parallel, not separate Lambda invocations. To call other durable Lambda functions in parallel, use context.invoke() within your parallel branches.

// Basic parallel with unnamed branches
const results = await context.parallel([
  async (ctx) => ctx.step(async () => "result1"),
  async (ctx) => ctx.step(async () => "result2"),
  async (ctx) => ctx.step(async () => "result3"),
]);

// Named parallel with named branches
const results = await context.parallel(
  "parallel-operations",
  [
    {
      name: "task1",
      func: async (ctx) => await ctx.step(async () => fetchData1()),
    },
    {
      name: "task2",
      func: async (ctx) => await ctx.step(async () => fetchData2()),
    },
    async (ctx) => await ctx.step(async () => fetchData3()), // Unnamed branch
  ],
  {
    maxConcurrency: 2,
    completionConfig: {
      minSuccessful: 2,
      toleratedFailurePercentage: 33,
    },
  },
);

// Type-safe parallel (all same type)
const results = await context.parallel<string>([
  async (ctx) => ctx.step(async () => "task1"),
  async (ctx) => ctx.step(async () => "task2"),
]);

context.promise.* - Promise Combinators

For fast, in-memory operations (prefer map() or parallel() for durable operations).

// promise.all - Wait for all to resolve
const [user, posts, comments] = await context.promise.all([
  fetchUser(userId),
  fetchPosts(userId),
  fetchComments(userId),
]);

// promise.allSettled - Wait for all to settle
const results = await context.promise.allSettled([
  fetchData1(),
  fetchData2(),
  fetchData3(),
]);

// promise.any - First successful result
const result = await context.promise.any([
  fetchFromPrimary(),
  fetchFromSecondary(),
  fetchFromCache(),
]);

// promise.race - First to settle (resolve or reject)
const result = await context.promise.race([
  fetchFromAPI(userId),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error("Timeout")), 5000),
  ),
]);

// All support optional names
await context.promise.all("fetch-all-data", promises);

⚠️ Important: Promise combinators accept already-executing promises. They cannot:

  • Control concurrency
  • Provide durability (survive Lambda timeouts)
  • Implement completion policies
  • Retry individual operations

Use map() or parallel() instead for:

  • Concurrency control
  • Durability across replays
  • Completion policies
  • Per-item retry strategies
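
The phrase "already-executing promises" is the key limitation. The sketch below uses the standard Promise.all as a stand-in for the combinators, and task is a hypothetical function. It shows that every task has started before the combinator is even called, so the combinator has nothing left to schedule or throttle.

```typescript
const started: string[] = [];

// Hypothetical task: records its start the moment it is called.
function task(label: string): Promise<string> {
  started.push(label); // runs synchronously, before any combinator sees the promise
  return Promise.resolve(label);
}

// Building the array already starts every task.
const promises = [task("a"), task("b"), task("c")];
console.log(started); // [ 'a', 'b', 'c' ]

// The combinator only observes promises that are already in flight.
Promise.all(promises).then((results) => console.log(results));
```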

Configuration Options

Retry Strategies

import {
  retryPresets,
  createRetryStrategy,
  JitterStrategy,
} from "@aws/durable-execution-sdk-js";

// Use built-in presets
await context.step("api-call", async () => callAPI(), {
  retryStrategy: retryPresets.exponentialBackoff(),
});

// Custom retry strategy
await context.step("custom-retry", async () => riskyOperation(), {
  retryStrategy: (error, attemptCount) => ({
    shouldRetry: attemptCount < 5 && error.message.includes("timeout"),
    delay: { seconds: attemptCount * 2 },
  }),
});

// Advanced retry with builder
const retryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 1 },
  maxDelay: { seconds: 60 },
  exponentialDelayFactor: 2,
  jitterStrategy: JitterStrategy.FULL, // NONE, FULL, HALF
});
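
The options passed to createRetryStrategy suggest capped exponential backoff. The source does not spell out the exact formula, so the following is one common reading rather than the SDK's implementation. BackoffConfig and decideRetry are hypothetical, jitter is left out so the output is deterministic, and attemptCount is assumed to be 1 after the first failed attempt.

```typescript
// Hypothetical model of capped exponential backoff without jitter.
interface BackoffConfig {
  maxAttempts: number;
  initialDelaySeconds: number;
  maxDelaySeconds: number;
  exponentialDelayFactor: number;
}

type RetryDecision =
  | { shouldRetry: false }
  | { shouldRetry: true; delay: { seconds: number } };

// Assumption: attemptCount is 1 after the first failed attempt.
function decideRetry(config: BackoffConfig, attemptCount: number): RetryDecision {
  if (attemptCount >= config.maxAttempts) {
    return { shouldRetry: false };
  }
  const uncapped =
    config.initialDelaySeconds *
    Math.pow(config.exponentialDelayFactor, attemptCount - 1);
  return {
    shouldRetry: true,
    delay: { seconds: Math.min(uncapped, config.maxDelaySeconds) },
  };
}

const backoff: BackoffConfig = {
  maxAttempts: 5,
  initialDelaySeconds: 1,
  maxDelaySeconds: 60,
  exponentialDelayFactor: 2,
};

const schedule: string[] = [];
for (let attempt = 1; attempt <= 5; attempt++) {
  const decision = decideRetry(backoff, attempt);
  schedule.push(decision.shouldRetry ? `${decision.delay.seconds}s` : "stop");
}
console.log(schedule.join(", ")); // 1s, 2s, 4s, 8s, stop
```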

Step Semantics

import { StepSemantics } from "@aws/durable-execution-sdk-js";

// At-least-once per retry (default) - the step body may run more than once
// within a retry attempt, so the operation should be idempotent
await context.step("update-db", async () => updateDatabase(), {
  semantics: StepSemantics.AtLeastOncePerRetry,
});

// At-most-once per retry - the step body never runs twice within a retry
// attempt, which suits non-idempotent operations such as sending email
await context.step("send-notification", async () => sendEmail(), {
  semantics: StepSemantics.AtMostOncePerRetry,
});

Custom Serialization

import {
  createClassSerdes,
  createClassSerdesWithDates,
  defaultSerdes,
} from "@aws/durable-execution-sdk-js";

// Class with Date fields
class User {
  constructor(
    public name: string,
    public createdAt: Date,
    public updatedAt: Date,
  ) {}
}

const result = await context.step(
  "create-user",
  async () => new User("Alice", new Date(), new Date()),
  {
    serdes: createClassSerdesWithDates(User, ["createdAt", "updatedAt"]),
  },
);

// Simple class serialization
class MyClass {
  constructor(public value: string) {}
}

const serdes = createClassSerdes(MyClass);

// Use default JSON serialization
const serdes = defaultSerdes;
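
The motivation for class-aware serialization is that a plain JSON round trip turns class instances into bare objects and Date fields into strings. The sketch below demonstrates this with a hand-written serializer. ToySerdes is a hypothetical interface for illustration, and the shape of the SDK's own serdes type may differ.

```typescript
// Hypothetical interface for illustration; the SDK's own serdes type may differ.
interface ToySerdes<T> {
  serialize(value: T): string;
  deserialize(data: string): T;
}

class User {
  name: string;
  createdAt: Date;

  constructor(name: string, createdAt: Date) {
    this.name = name;
    this.createdAt = createdAt;
  }
}

const userSerdes: ToySerdes<User> = {
  // JSON.stringify converts the Date to an ISO string.
  serialize: (user) => JSON.stringify(user),
  // Rebuild both the class instance and the Date field.
  deserialize: (data) => {
    const raw = JSON.parse(data) as { name: string; createdAt: string };
    return new User(raw.name, new Date(raw.createdAt));
  },
};

const original = new User("Alice", new Date("2025-01-01T00:00:00.000Z"));

const plain = JSON.parse(JSON.stringify(original));
console.log(plain instanceof User, typeof plain.createdAt); // false string

const restored = userSerdes.deserialize(userSerdes.serialize(original));
console.log(restored instanceof User, restored.createdAt instanceof Date); // true true
```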

Completion Policies (map/parallel)

// Minimum successful items
const results = await context.map(items, processFn, {
  completionConfig: {
    minSuccessful: 8, // Need at least 8 successes
  },
});

// Tolerate failures by count
const results = await context.parallel(branches, {
  completionConfig: {
    toleratedFailureCount: 2, // Can tolerate up to 2 failures
  },
});

// Tolerate failures by percentage
const results = await context.map(items, processFn, {
  completionConfig: {
    toleratedFailurePercentage: 20, // Can tolerate 20% failures
  },
});

// Combine constraints
const results = await context.map(items, processFn, {
  completionConfig: {
    minSuccessful: 8,
    toleratedFailureCount: 2,
    toleratedFailurePercentage: 20,
    // Stops when first condition is met
  },
});
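
How the three constraints interact can be reasoned about with a small decision function. The sketch below is one possible reading of the options, written for illustration only: CompletionConfig, Verdict, and evaluate are hypothetical, and the SDK's exact evaluation rules and ordering may differ.

```typescript
// One possible reading of the completion options; the SDK's exact rules may differ.
interface CompletionConfig {
  minSuccessful?: number;
  toleratedFailureCount?: number;
  toleratedFailurePercentage?: number;
}

type Verdict = "continue" | "satisfied" | "failed";

function evaluate(
  config: CompletionConfig,
  total: number,
  succeeded: number,
  failed: number,
): Verdict {
  // Too many failures, by absolute count or by share of all items.
  if (
    config.toleratedFailureCount !== undefined &&
    failed > config.toleratedFailureCount
  ) {
    return "failed";
  }
  if (
    config.toleratedFailurePercentage !== undefined &&
    (failed / total) * 100 > config.toleratedFailurePercentage
  ) {
    return "failed";
  }
  // Enough successes: the remaining items no longer matter.
  if (config.minSuccessful !== undefined && succeeded >= config.minSuccessful) {
    return "satisfied";
  }
  // Everything finished: satisfied only if no success minimum was left unmet.
  if (succeeded + failed === total) {
    return config.minSuccessful === undefined ? "satisfied" : "failed";
  }
  return "continue";
}

const policy: CompletionConfig = {
  minSuccessful: 8,
  toleratedFailureCount: 2,
  toleratedFailurePercentage: 20,
};

console.log(evaluate(policy, 10, 5, 1)); // continue
console.log(evaluate(policy, 10, 8, 1)); // satisfied
console.log(evaluate(policy, 10, 4, 3)); // failed
```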

Testing Guide

⚠️ CRITICAL TESTING PATTERNS

DO:

  • ✅ Use runner.getOperation("name") to find operations
  • ✅ Use WaitingOperationStatus.STARTED for callback operations
  • ✅ JSON.stringify callback parameters: sendCallbackSuccess(JSON.stringify(data))
  • ✅ Parse callback results: JSON.parse(result.approval)
  • ✅ Name all operations for test reliability

DON'T:

  • ❌ Use getOperationByIndex() unless absolutely necessary
  • ❌ Assume operation indices are stable (parallel creates nested ops)
  • ❌ Send objects to sendCallbackSuccess - stringify first!
  • ❌ Forget that callback results are JSON strings
  • ❌ Use incorrect enum values (check @aws-sdk/client-lambda for current OperationType values)

Local Testing Setup

Install testing library:

npm install --save-dev @aws/durable-execution-sdk-js-testing

Basic Test Structure

import {
  LocalDurableTestRunner,
  OperationType,
  OperationStatus,
} from "@aws/durable-execution-sdk-js-testing";
import { handler } from "./my-handler";

describe("My Durable Function", () => {
  // Setup test environment once per suite
  beforeAll(() => LocalDurableTestRunner.setupTestEnvironment({ skipTime: true }));
  afterAll(() => LocalDurableTestRunner.teardownTestEnvironment());

  it("should process user data", async () => {
    const runner = new LocalDurableTestRunner({
      handlerFunction: handler,
    });

    // Run the handler (event data goes in the payload wrapper)
    const execution = await runner.run({ payload: { userId: "123" } });

    // Verify result
    expect(execution.getResult()).toEqual({ success: true });

    // Verify operations
    expect(execution.getOperations()).toHaveLength(3);

    // Check specific operation (index lookup shown for brevity; prefer getOperation("name"))
    const stepOp = runner.getOperationByIndex(0);
    expect(stepOp.getType()).toBe(OperationType.STEP);
    expect(stepOp.getStatus()).toBe(OperationStatus.SUCCEEDED);
    expect(stepOp.getStepDetails()?.result).toEqual("user processed");
  });
});

Test Helper Pattern

import { createTests } from "./utils/test-helper";

createTests({
  name: "step-basic test",
  functionName: "step-basic",
  handler,
  tests: (runner, isCloud) => {
    it("should execute step", async () => {
      const execution = await runner.run({ payload: { input: "test" } });

      expect(execution.getResult()).toStrictEqual("step completed");
      expect(execution.getOperations()).toHaveLength(1);

      const stepOp = runner.getOperationByIndex(0);
      expect(stepOp.getType()).toBe(OperationType.STEP);
      expect(stepOp.getStatus()).toBe(OperationStatus.SUCCEEDED);
    });
  },
});

Testing Durable Invokes

it("should test invoke operations", async () => {
  const childHandler = withDurableExecution(
    async (input: any, context: DurableContext) => {
      return await context.step(async () => `processed-${input.value}`);
    },
  );

  const runner = new LocalDurableTestRunner({
    handlerFunction: mainHandler,
  });

  // Register child function
  runner.registerDurableFunction("child-function", childHandler);

  const execution = await runner.run({ payload: { value: "test" } });
  expect(execution.getResult()).toEqual("processed-test");
});

Testing waitForCallback Operations

CRITICAL: When testing callback operations, use waitForData() to wait until the operation reaches the expected status before sending the callback. Otherwise promise races make the test flaky:

import { WaitingOperationStatus } from "@aws/durable-execution-sdk-js-testing";

it("should handle waitForCallback", async () => {
  const runner = new LocalDurableTestRunner({
    handlerFunction: handler,
  });

  // Start execution (it will pause at callback)
  const executionPromise = runner.run({ payload: data });

  // Get callback operation BY NAME (not by index!)
  const callbackOp = runner.getOperation("wait-for-approval");

  // Wait for operation to reach STARTED status
  await callbackOp.waitForData(WaitingOperationStatus.STARTED);

  // Send callback success - MUST use JSON.stringify!
  await callbackOp.sendCallbackSuccess(
    JSON.stringify({ approved: true, note: "Approved" }),
  );

  const execution = await executionPromise;

  // Parse result - waitForCallback returns JSON string!
  const result: any = execution.getResult();
  const approval =
    typeof result.approval === "string"
      ? JSON.parse(result.approval)
      : result.approval;

  expect(approval.approved).toBe(true);
});

Key Points:

  • Always use runner.getOperation("operation-name") not getOperationByIndex()
  • Wait for WaitingOperationStatus.STARTED to confirm the operation has started before sending the callback (avoids flaky tests caused by promise races)
  • Callback data MUST be JSON stringified: JSON.stringify(data)
  • Callback results are returned as JSON strings - parse them before assertions
  • Call sendCallbackSuccess()/sendCallbackFailure() on the operation, not the runner
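
The stringify-then-parse round trip can be wrapped in a small helper so assertions stay readable. parseCallbackResult is a hypothetical helper written for this sketch, not part of the testing library. It accepts either a JSON string or an already-parsed value.

```typescript
// Hypothetical helper: accepts either a JSON string or an already-parsed value.
function parseCallbackResult<T>(value: unknown): T {
  return (typeof value === "string" ? JSON.parse(value) : value) as T;
}

interface Approval {
  approved: boolean;
  note: string;
}

// What a test would pass to sendCallbackSuccess().
const sent = JSON.stringify({ approved: true, note: "Approved" });

// What the assertion side does with the returned value.
const approval = parseCallbackResult<Approval>(sent);
console.log(approval.approved, approval.note); // true Approved

// Already-parsed values pass through unchanged.
const passthrough = parseCallbackResult<Approval>({ approved: false, note: "n/a" });
console.log(passthrough.approved); // false
```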

For callback failures:

await callbackOp.sendCallbackFailure({
  ErrorMessage: "Rejection reason",
});

const execution = await executionPromise;
expect(execution.getError()).toBeDefined();

Testing Parallel Operations

When testing context.parallel(), understand the operation structure:

it("should verify parallel execution", async () => {
  const runner = new LocalDurableTestRunner({
    handlerFunction: handler,
  });

  const execution = await runner.run({ payload: data });

  // Get operations BY NAME for reliability
  const parallelOp = runner.getOperation("parallel-processing");
  const combineOp = runner.getOperation("combine-results");

  // Verify named branches exist
  const branchA = runner.getOperation("process-dataset-a");
  const branchB = runner.getOperation("process-dataset-b");

  expect(parallelOp).toBeDefined();
  expect(branchA).toBeDefined();
  expect(branchB).toBeDefined();
  expect(combineOp).toBeDefined();
});

IMPORTANT: Parallel operations create child contexts with nested operations. Don't rely on operation indices - always use operation names.

Operation Verification

import {
  OperationType,
  OperationStatus,
} from "@aws/durable-execution-sdk-js-testing";

const execution = await runner.run();

// Get by index
const op1 = runner.getOperationByIndex(0);

// Get by name
const op2 = runner.getOperation("fetch-user");

// Get by name and index (for repeated names)
const op3 = runner.getOperationByNameAndIndex("process-item", 2);

// Get by ID
const op4 = runner.getOperationById("op-id-123");

// Verify operation details
expect(op1.getType()).toBe(OperationType.STEP);
expect(op1.getStatus()).toBe(OperationStatus.SUCCEEDED);
expect(op1.getStepDetails()?.result).toEqual("expected result");

// Available operation types (from @aws-sdk/client-lambda)
OperationType.STEP;
OperationType.WAIT;
OperationType.CALLBACK;
OperationType.CHAINED_INVOKE;
OperationType.CONTEXT;
OperationType.EXECUTION;

Cloud Testing

import { CloudDurableTestRunner } from "@aws/durable-execution-sdk-js-testing";

const runner = new CloudDurableTestRunner({
  functionName: "my-deployed-lambda",
  clientConfig: {
    endpoint: process.env.LAMBDA_ENDPOINT,
    region: "us-east-1",
  },
});

const execution = await runner.run({ payload: { input: "test" } });
expect(execution.getResult()).toBeDefined();

Test Runner API Patterns

Issue: Incorrect test runner API usage causes type errors.

// ❌ WRONG: Type errors and incorrect parameter structure
const execution = await runner.run({ name: "Alice" });
const result = execution.getResult();
expect(result.greeting).toBe("Hello, Alice!"); // Error: 'result' is of type 'unknown'

// ✅ CORRECT: Use payload wrapper and type casting
const execution = await runner.run({ payload: { name: "Alice" } });
const result = execution.getResult() as any;
expect(result.greeting).toBe("Hello, Alice!");

Critical Test Runner Patterns:

  1. Payload Structure: Always wrap event data in payload object
  2. Type Casting: Cast getResult() to specific type or any
  3. Operation Access: Use getOperation("name") not getOperationByIndex()

// Complete test pattern
it("should execute handler", async () => {
  const runner = new LocalDurableTestRunner({
    handlerFunction: handler,
  });

  // ✅ Correct payload structure
  const execution = await runner.run({
    payload: { name: "Alice" },
  });

  // ✅ Type casting for result
  const result = execution.getResult() as {
    greeting: string;
    message: string;
    timestamp: string;
  };

  // ✅ Access operations by name
  const greetingStep = runner.getOperation("generate-greeting");
  expect(greetingStep.getStepDetails()?.result).toBe("Hello, Alice!");
});

Project Setup from Scratch

1. Install Dependencies

npm install @aws/durable-execution-sdk-js
npm install --save-dev @aws/durable-execution-sdk-js-eslint-plugin
npm install --save-dev @aws/durable-execution-sdk-js-testing
npm install --save-dev @types/aws-lambda

2. Create Lambda Handler

// handler.ts
import {
  withDurableExecution,
  DurableContext,
} from "@aws/durable-execution-sdk-js";

interface MyInput {
  userId: string;
}

export const handler = withDurableExecution(
  async (event: MyInput, context: DurableContext) => {
    // Your durable workflow
    const user = await context.step("fetch-user", async () => {
      return fetchUser(event.userId);
    });

    await context.wait({ seconds: 5 });

    const result = await context.step("process-user", async () => {
      return processUser(user);
    });

    return result;
  },
);

3. Configure ESLint

Install the ESLint plugin to prevent common mistakes:

npm install --save-dev @aws/durable-execution-sdk-js-eslint-plugin

Option A: Using ESLint flat config (eslint.config.js)

// eslint.config.js
import durableExecutionPlugin from "@aws/durable-execution-sdk-js-eslint-plugin";

export default [
  {
    plugins: {
      "@aws/durable-execution-sdk-js": durableExecutionPlugin,
    },
    rules: {
      "@aws/durable-execution-sdk-js/no-nested-durable-operations": "error",
    },
  },
];

Option B: Using recommended config

// eslint.config.js
import durableExecutionPlugin from "@aws/durable-execution-sdk-js-eslint-plugin";

export default [
  durableExecutionPlugin.configs.recommended,
  // Your other configs...
];

Option C: Legacy .eslintrc.json

{
  "plugins": ["@aws/durable-execution-sdk-js-eslint-plugin"],
  "extends": ["plugin:@aws/durable-execution-sdk-js-eslint-plugin/recommended"],
  "rules": {
    "@aws/durable-execution-sdk-js-eslint-plugin/no-nested-durable-operations": "error"
  }
}

4. TypeScript Configuration

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "lib": ["ES2022"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "moduleResolution": "node",
    "resolveJsonModule": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}

5. Package.json Scripts

{
  "scripts": {
    "build": "tsc",
    "test": "jest",
    "lint": "eslint src/**/*.ts"
  }
}

Deploying Durable Functions with AWS CDK

AWS Cloud Development Kit (CDK) provides a streamlined way to deploy durable Lambda functions with proper infrastructure configuration.

1. CDK Project Setup

Install CDK dependencies:

npm install aws-cdk-lib constructs
npm install -g aws-cdk  # Install CDK CLI globally (or without -g for local installation)

Project Structure:

my-cdk-project/
├── bin/
│   └── app.ts              # CDK app entry point
├── lib/
│   ├── stack.ts            # CDK stack definition
│   └── lambda/
│       ├── handler.ts      # Durable function handler
│       └── handler.test.ts # Tests
├── cdk.json                # CDK configuration
├── package.json
├── tsconfig.json
└── jest.config.js

2. Creating the CDK Stack

Create your stack file (lib/stack.ts):

import * as cdk from 'aws-cdk-lib/core';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodejs from 'aws-cdk-lib/aws-lambda-nodejs';
import { Construct } from 'constructs';
import * as path from 'path';

export class MyDurableStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create a Lambda function for durable execution
    const durableFunction = new nodejs.NodejsFunction(this, 'MyDurableFunction', {
      functionName: 'my-durable-function',
      description: 'Durable function with step-wait-step pattern',
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'handler',
      entry: path.join(__dirname, 'lambda', 'handler.ts'),
      
      // Standard Lambda timeout (per invocation)
      timeout: cdk.Duration.minutes(1),
      memorySize: 256,
      
      // Durable execution configuration
      durableConfig: {
        executionTimeout: cdk.Duration.minutes(15), // Max durable execution time
        retentionPeriod: cdk.Duration.days(1),      // State retention period
      },
      
      // Bundling configuration
      bundling: {
        minify: true,
        sourceMap: true,
        externalModules: [], // Bundle all dependencies including SDK
      },
      
      // Environment variables
      environment: {
        NODE_OPTIONS: '--enable-source-maps',
      },
    });

    // Output the function ARN
    new cdk.CfnOutput(this, 'FunctionArn', {
      value: durableFunction.functionArn,
      description: 'ARN of the durable function',
    });

    // Output the function name
    new cdk.CfnOutput(this, 'FunctionName', {
      value: durableFunction.functionName,
      description: 'Name of the durable function',
    });
  }
}

3. CDK Configuration (cdk.json)

{
  "app": "npx ts-node --prefer-ts-exts bin/app.ts",
  "watch": {
    "include": ["**"],
    "exclude": [
      "README.md",
      "cdk*.json",
      "**/*.d.ts",
      "**/*.js",
      "tsconfig.json",
      "package*.json",
      "yarn.lock",
      "node_modules",
      "test"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": ["aws", "aws-cn"]
  }
}

4. Understanding Durable Configuration

durableConfig Properties

⚠️ Version Requirement: The durableConfig property requires aws-cdk-lib 2.232.1 or later. If you are on an older version, upgrade your CDK dependencies first.

durableConfig: {
  // Maximum time for the entire durable execution
  executionTimeout: cdk.Duration.days(30),
  
  // How long to retain execution state after completion
  retentionPeriod: cdk.Duration.days(14),
}

5. Lambda Handler Structure

Create your handler file (lib/lambda/handler.ts):

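import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';
// fetchUser and processUser are application-specific helpers (not shown here);
// import them from your own modules.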

interface MyInput {
  userId: string;
}

export const handler = withDurableExecution(
  async (event: MyInput, context: DurableContext) => {
    // Step 1: Fetch data
    const user = await context.step('fetch-user', async (stepCtx) => {
      stepCtx.logger.info('Fetching user', { userId: event.userId });
      return await fetchUser(event.userId);
    });

    // Wait for processing delay
    await context.wait('processing-delay', { seconds: 30 });

    // Step 2: Process data
    const result = await context.step('process-user', async (stepCtx) => {
      stepCtx.logger.info('Processing user', { userId: event.userId });
      return await processUser(user);
    });

    return result;
  }
);

6. Testing Configuration for CDK Projects

Update jest.config.js to include Lambda handler tests:

module.exports = {
  preset: 'ts-jest',
  testEnvironment: 'node',
  roots: ['<rootDir>/test', '<rootDir>/lib'],
  testMatch: ['**/*.test.ts'],
  transform: {
    '^.+\\.tsx?$': 'ts-jest'
  },
  collectCoverageFrom: [
    'lib/**/*.ts',
    '!lib/**/*.d.ts',
  ],
};

Create handler tests (lib/lambda/handler.test.ts):

import { LocalDurableTestRunner, OperationType, OperationStatus } from '@aws/durable-execution-sdk-js-testing';
import { handler } from './handler';

describe('My Durable Function', () => {
  beforeAll(() => LocalDurableTestRunner.setupTestEnvironment({ skipTime: true }));
  afterAll(() => LocalDurableTestRunner.teardownTestEnvironment());

  it('should execute step-wait-step pattern', async () => {
    const runner = new LocalDurableTestRunner({
      handlerFunction: handler,
    });

    const execution = await runner.run({ payload: { userId: '123' } });

    // Verify invocations
    expect(execution.getInvocations().length).toBe(2);

    // Verify result
    const result: any = execution.getResult();
    expect(result).toBeDefined();

    // Verify operations
    const fetchStep = runner.getOperation('fetch-user');
    expect(fetchStep.getType()).toBe(OperationType.STEP);
    expect(fetchStep.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const waitOp = runner.getOperation('processing-delay');
    expect(waitOp.getType()).toBe(OperationType.WAIT);
    expect(waitOp.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const processStep = runner.getOperation('process-user');
    expect(processStep.getType()).toBe(OperationType.STEP);
    expect(processStep.getStatus()).toBe(OperationStatus.SUCCEEDED);
  });
});

7. Deployment Workflow

First-Time Deployment

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Bootstrap CDK (first time only)
cdk bootstrap aws://ACCOUNT-ID/REGION

# Synthesize CloudFormation template
cdk synth

# Deploy the stack
cdk deploy

Subsequent Deployments

# Build and test
npm run build && npm test

# Deploy changes
cdk deploy

# Deploy without confirmation prompt
cdk deploy --require-approval never

Useful CDK Commands

# List all stacks
cdk ls

# Compare deployed stack with current state
cdk diff

# Destroy the stack
cdk destroy

# Watch mode (auto-deploy on changes)
cdk watch

8. Best Practices for CDK Deployments

Timeout Configuration

// ✅ CORRECT: Set appropriate timeouts
const durableFunction = new nodejs.NodejsFunction(this, 'Function', {
  timeout: cdk.Duration.minutes(15), // Lambda timeout
  durableConfig: {
    executionTimeout: cdk.Duration.hours(1), // Workflow timeout
    retentionPeriod: cdk.Duration.days(7),
  },
});

Guidelines:

  • Set Lambda timeout to maximum expected single invocation time
  • Set executionTimeout to total workflow duration including waits
  • Set retentionPeriod based on debugging and audit requirements
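
The guidelines above can be turned into a quick sanity check before deploying. The following is a minimal sketch, not part of the CDK or the SDK: `TimeoutPlan`, `checkTimeouts`, and the estimate fields are hypothetical names, and all durations are plain seconds that you would estimate yourself.

```typescript
// Hypothetical pre-deployment check; none of these names come from the CDK or the SDK.
const LAMBDA_MAX_TIMEOUT_SECONDS = 900; // Lambda's per-invocation ceiling (15 minutes)

interface TimeoutPlan {
  lambdaTimeoutSeconds: number;     // value passed to `timeout`
  executionTimeoutSeconds: number;  // value passed to `durableConfig.executionTimeout`
  longestInvocationSeconds: number; // estimate: slowest single invocation
  totalComputeSeconds: number;      // estimate: all invocations added together
  totalWaitSeconds: number;         // estimate: all waits and callback timeouts
}

function checkTimeouts(plan: TimeoutPlan): string[] {
  const problems: string[] = [];
  if (plan.lambdaTimeoutSeconds > LAMBDA_MAX_TIMEOUT_SECONDS) {
    problems.push("Lambda timeout exceeds the 15 minute per-invocation limit");
  }
  if (plan.lambdaTimeoutSeconds < plan.longestInvocationSeconds) {
    problems.push("Lambda timeout is shorter than the slowest expected invocation");
  }
  const workflowSeconds = plan.totalComputeSeconds + plan.totalWaitSeconds;
  if (plan.executionTimeoutSeconds < workflowSeconds) {
    problems.push("executionTimeout is shorter than compute time plus waits");
  }
  return problems;
}

// A step-wait-step workflow: 1 minute timeout, 15 minute execution, 30 second wait.
const stepWaitStep = checkTimeouts({
  lambdaTimeoutSeconds: 60,
  executionTimeoutSeconds: 15 * 60,
  longestInvocationSeconds: 10,
  totalComputeSeconds: 20,
  totalWaitSeconds: 30,
});

// A 24 hour approval wait does not fit into a 1 hour executionTimeout.
const approvalTooShort = checkTimeouts({
  lambdaTimeoutSeconds: 15 * 60,
  executionTimeoutSeconds: 60 * 60,
  longestInvocationSeconds: 30,
  totalComputeSeconds: 60,
  totalWaitSeconds: 24 * 60 * 60,
});
```

The second plan is the kind of mismatch that is easy to miss: a callback timeout of 24 hours combined with a workflow timeout of 1 hour.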

Custom Log Group Management

Best Practice: Explicitly create and manage CloudWatch Log Groups for better control over retention, cleanup, and costs.

import * as logs from 'aws-cdk-lib/aws-logs';
import * as iam from 'aws-cdk-lib/aws-iam';

// 1. Create explicit log group with retention and removal policy
const functionLogGroup = new logs.LogGroup(this, 'MyFunctionLogGroup', {
  logGroupName: '/aws/lambda/my-durable-function',
  retention: logs.RetentionDays.ONE_WEEK,
  removalPolicy: cdk.RemovalPolicy.DESTROY, // Delete on stack destroy
});

// 2. Link to function using logGroup parameter
const durableFunction = new nodejs.NodejsFunction(this, 'MyDurableFunction', {
  // ... standard function configuration ...
  logGroup: functionLogGroup, // Link to managed log group
});

// 3. Add durable execution policy (required when using explicit log groups)
durableFunction.role?.addManagedPolicy(
  iam.ManagedPolicy.fromAwsManagedPolicyName(
    'service-role/AWSLambdaBasicDurableExecutionRolePolicy'
  )
);

Benefits:

  • Explicit Cleanup: removalPolicy: cdk.RemovalPolicy.DESTROY ensures log groups are deleted when the stack is destroyed, preventing orphaned resources
  • Custom Retention: Set retention periods that match your compliance/debugging needs (ONE_DAY, ONE_WEEK, ONE_MONTH, etc.)
  • Predictable Naming: Control the exact log group name for easier identification and log aggregation
  • Cost Control: Avoid accumulating costs from forgotten log groups after function deletion
  • Consistent Configuration: Apply the same log retention policy across multiple functions

When to use:

  • ✅ Production environments where log retention policies must be enforced
  • ✅ Development/test environments where automatic cleanup saves costs
  • ✅ Multi-function stacks where consistent log management is needed
  • ❌ Quick prototypes where default CDK behavior is acceptable (log groups persist by default)

Important: Don't forget to add the AWSLambdaBasicDurableExecutionRolePolicy managed policy when creating durable functions with explicit log groups, as CDK won't automatically add it.

IAM Permissions

For durable functions that invoke other (durable) functions:

// Grant permission to invoke another function
otherFunction.grantInvoke(durableFunction);

// Or grant broad Lambda invoke permissions (less secure)
durableFunction.addToRolePolicy(new iam.PolicyStatement({
  actions: ['lambda:InvokeFunction'],
  resources: ['arn:aws:lambda:*:*:function:*'],
}));

Stack Outputs

// Export important values
new cdk.CfnOutput(this, 'FunctionArn', {
  value: durableFunction.functionArn,
  exportName: 'DurableFunctionArn',
  description: 'ARN of the durable function',
});

// Export for cross-stack references
new cdk.CfnOutput(this, 'FunctionName', {
  value: durableFunction.functionName,
  exportName: 'DurableFunctionName',
});

9. Multi-Function Deployments

For complex workflows with multiple durable functions:

export class MyDurableStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Main orchestrator function
    const orchestrator = new nodejs.NodejsFunction(this, 'Orchestrator', {
      functionName: 'workflow-orchestrator',
      entry: path.join(__dirname, 'lambda', 'orchestrator.ts'),
      timeout: cdk.Duration.minutes(15),
      durableConfig: {
        executionTimeout: cdk.Duration.hours(2),
        retentionPeriod: cdk.Duration.days(7),
      },
    });

    // Worker function 1
    const worker1 = new nodejs.NodejsFunction(this, 'Worker1', {
      functionName: 'worker-validate',
      entry: path.join(__dirname, 'lambda', 'validate-worker.ts'),
      timeout: cdk.Duration.minutes(5),
      durableConfig: {
        executionTimeout: cdk.Duration.minutes(10),
        retentionPeriod: cdk.Duration.days(1),
      },
    });

    // Worker function 2
    const worker2 = new nodejs.NodejsFunction(this, 'Worker2', {
      functionName: 'worker-process',
      entry: path.join(__dirname, 'lambda', 'process-worker.ts'),
      timeout: cdk.Duration.minutes(10),
      durableConfig: {
        executionTimeout: cdk.Duration.minutes(30),
        retentionPeriod: cdk.Duration.days(1),
      },
    });

    // Grant orchestrator permission to invoke workers
    worker1.grantInvoke(orchestrator);
    worker2.grantInvoke(orchestrator);

    // Export ARNs for orchestrator to use
    new cdk.CfnOutput(this, 'Worker1Arn', {
      value: worker1.functionArn,
      exportName: 'Worker1Arn',
    });

    new cdk.CfnOutput(this, 'Worker2Arn', {
      value: worker2.functionArn,
      exportName: 'Worker2Arn',
    });
  }
}

10. Environment Management & Versioning

Function Versioning Strategy

// Create a separate function per environment
const prodFunction = new nodejs.NodejsFunction(this, 'ProdFunction', {
  functionName: 'my-durable-function-prod',
  // ... configuration
});

const devFunction = new nodejs.NodejsFunction(this, 'DevFunction', {
  functionName: 'my-durable-function-dev',
  // ... configuration
});

// Create aliases for stable references
const prodAlias = prodFunction.addAlias('production');
const stagingAlias = prodFunction.addAlias('staging');

Environment-Specific Configuration

const environment = this.node.tryGetContext('environment') || 'dev';

const durableFunction = new nodejs.NodejsFunction(this, 'Function', {
  functionName: `my-function-${environment}`,
  durableConfig: {
    executionTimeout: environment === 'prod' 
      ? cdk.Duration.hours(24) 
      : cdk.Duration.minutes(30),
    retentionPeriod: environment === 'prod' 
      ? cdk.Duration.days(30) 
      : cdk.Duration.days(7),
  },
  environment: {
    ENVIRONMENT: environment,
    LOG_LEVEL: environment === 'prod' ? 'INFO' : 'DEBUG',
  },
});
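
One caveat with the pattern above: any context value other than 'prod' (including a typo such as 'prd') silently receives the non-production settings. A minimal sketch of a stricter variant follows. `Environment`, `parseEnvironment`, and `settingsFor` are hypothetical helpers, and durations are plain numbers that you would still wrap in cdk.Duration inside the stack.

```typescript
// Hypothetical helpers; durations are plain numbers, not cdk.Duration objects.
type Environment = "dev" | "staging" | "prod";

interface DurableSettings {
  executionTimeoutMinutes: number;
  retentionDays: number;
  logLevel: "INFO" | "DEBUG";
}

// Accepts the raw context value and rejects anything unexpected.
function parseEnvironment(value: unknown): Environment {
  if (value === undefined || value === null) {
    return "dev"; // same default as the stack code above
  }
  if (value === "dev" || value === "staging" || value === "prod") {
    return value;
  }
  throw new Error(`Unknown environment "${String(value)}"`);
}

// Same values as the environment-specific configuration above.
function settingsFor(env: Environment): DurableSettings {
  const isProd = env === "prod";
  return {
    executionTimeoutMinutes: isProd ? 24 * 60 : 30,
    retentionDays: isProd ? 30 : 7,
    logLevel: isProd ? "INFO" : "DEBUG",
  };
}

const prodSettings = settingsFor(parseEnvironment("prod"));
const defaultSettings = settingsFor(parseEnvironment(undefined));

let typoRejected = false;
try {
  parseEnvironment("prd");
} catch {
  typoRejected = true;
}
```

In a stack you would call parseEnvironment(this.node.tryGetContext('environment')) and fail synthesis early instead of deploying development settings under a production name.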

Deployment Commands by Environment

# Deploy to development
cdk deploy --context environment=dev

# Deploy to staging
cdk deploy --context environment=staging

# Deploy to production
cdk deploy --context environment=prod --require-approval broadening

11. Common CDK Deployment Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Bundle size too large | Including unnecessary dependencies | Use externalModules for AWS SDK v3 |
| Timeout errors during deployment | Large Lambda bundle | Increase CDK timeout or optimize bundle |
| Permission errors | Missing IAM permissions | Use grantInvoke() or add policy statements |
| Cannot find handler | Incorrect entry path | Verify entry path in NodejsFunction |
| TypeScript compilation errors | Misconfigured tsconfig | Ensure proper module resolution |

12. Invoking Durable Functions with AWS CLI

After deploying your durable function, you can invoke it using the AWS CLI. Durable functions support both synchronous and asynchronous invocation patterns.

Critical Requirements

⚠️ Important Invocation Rules:

  1. Qualified Function Name Required: You MUST provide a qualified function name with a version (e.g., :$LATEST or :1). Unqualified invocations are not supported for durable functions.
  2. Idempotency with durable-execution-name: Use the --durable-execution-name parameter to ensure idempotency. The same execution name will always refer to the same execution, preventing duplicate processing.
  3. Binary Format: Use --cli-binary-format raw-in-base64-out to avoid base64 encoding issues with JSON payloads.
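
If you script invocations from Node.js, the three rules can be enforced in one place. This is a minimal sketch under stated assumptions: `qualifyFunctionName` and `buildInvokeArgs` are hypothetical helpers, and only the CLI flags named in the rules above are used.

```typescript
// Hypothetical helpers for scripts that call the AWS CLI; not part of any SDK.
function qualifyFunctionName(functionName: string, qualifier: string = "$LATEST"): string {
  if (functionName.includes(":")) {
    // Keeps the sketch simple: plain function names only, no ARNs.
    throw new Error(`Expected an unqualified function name, got "${functionName}"`);
  }
  return `${functionName}:${qualifier}`;
}

function buildInvokeArgs(
  functionName: string,
  executionName: string,
  payload: unknown,
  qualifier: string = "$LATEST",
  outFile: string = "response.json",
): string[] {
  return [
    "lambda", "invoke",
    "--function-name", qualifyFunctionName(functionName, qualifier), // rule 1
    "--durable-execution-name", executionName,                        // rule 2
    "--payload", JSON.stringify(payload),
    "--cli-binary-format", "raw-in-base64-out",                       // rule 3
    outFile,
  ];
}

const invokeArgs = buildInvokeArgs("my-durable-function", "execution-123", {
  userId: "12345",
});
```

Passing the array to a process API that does not go through a shell (for example Node's child_process.execFile with "aws" as the command) also avoids the quoting problem where a shell would expand $LATEST as a variable.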

Synchronous Invocation (RequestResponse)

Synchronous invocation blocks until the execution completes and returns the result in the response. This is suitable for workflows with short execution times.

# Basic synchronous invocation
aws lambda invoke \
  --function-name 'my-durable-function:$LATEST' \
  --invocation-type RequestResponse \
  --durable-execution-name "execution-123" \
  --payload '{"userId":"12345","action":"process"}' \
  --cli-binary-format raw-in-base64-out \
  --output json \
  response.json

# View the response
cat response.json

When to use RequestResponse:

  • Short-running workflows (under 15 minutes total)
  • When you need the result immediately
  • Interactive applications requiring synchronous responses

Asynchronous Invocation (Event)

Asynchronous invocation returns immediately with the execution ID, allowing the function to run in the background. This is ideal for long-running workflows.

# Basic asynchronous invocation
aws lambda invoke \
  --function-name 'my-durable-function:$LATEST' \
  --invocation-type Event \
  --durable-execution-name "background-task-456" \
  --payload '{"orderId":"ORD-789","amount":99.99}' \
  --cli-binary-format raw-in-base64-out \
  --output json \
  response.json

# Response contains execution ID, not the result
cat response.json

When to use Event:

  • Long-running workflows (hours, days, or longer)
  • Background processing tasks
  • When you don't need immediate results
  • Workflows with wait operations or human-in-the-loop steps

Idempotency with durable-execution-name

The --durable-execution-name parameter ensures that the same execution is never created twice:

# First invocation - creates new execution
aws lambda invoke \
  --function-name 'my-durable-function:$LATEST' \
  --invocation-type RequestResponse \
  --durable-execution-name "order-processing-ORD-123" \
  --payload '{"orderId":"ORD-123"}' \
  --cli-binary-format raw-in-base64-out \
  response.json

# Second invocation with same execution name - returns existing execution result
aws lambda invoke \
  --function-name 'my-durable-function:$LATEST' \
  --invocation-type RequestResponse \
  --durable-execution-name "order-processing-ORD-123" \
  --payload '{"orderId":"ORD-123"}' \
  --cli-binary-format raw-in-base64-out \
  response.json
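
The naming semantics can be pictured with a toy in-memory model. This sketch only illustrates the behavior described above (same name, same execution); it is not how the Lambda service is implemented, and `ExecutionRegistry` and `executionNameForOrder` are hypothetical names.

```typescript
// Toy in-memory model of the naming semantics described above.
// It is not how the Lambda service is implemented.
interface ExecutionRecord {
  executionName: string;
  result: string;
}

class ExecutionRegistry {
  private readonly executions = new Map<string, ExecutionRecord>();
  handlerRuns = 0;

  invoke(executionName: string, run: () => string): ExecutionRecord {
    const existing = this.executions.get(executionName);
    if (existing !== undefined) {
      return existing; // same name: refer to the existing execution
    }
    this.handlerRuns += 1;
    const record: ExecutionRecord = { executionName, result: run() };
    this.executions.set(executionName, record);
    return record;
  }
}

// Hypothetical naming rule: derive the name from a stable business key.
function executionNameForOrder(orderId: string): string {
  return `order-processing-${orderId}`;
}

const registry = new ExecutionRegistry();
const orderExecutionName = executionNameForOrder("ORD-123");
const firstCall = registry.invoke(orderExecutionName, () => "charged ORD-123");
const secondCall = registry.invoke(orderExecutionName, () => "charged ORD-123 again");
```

Deriving the name from a business key such as the order ID, rather than from a timestamp or random value, is what makes a client-side retry land on the same execution.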

Using Specific Function Versions

Invoke a specific function version or an alias. $LATEST is also supported, but an unqualified function name throws an exception:

# Invoke a specific version
aws lambda invoke \
  --function-name 'my-durable-function:1' \
  --invocation-type RequestResponse \
  --durable-execution-name "versioned-exec-1" \
  --payload '{"test":"data"}' \
  --cli-binary-format raw-in-base64-out \
  response.json

# Invoke using an alias
aws lambda invoke \
  --function-name 'my-durable-function:production' \
  --invocation-type RequestResponse \
  --durable-execution-name "prod-exec-1" \
  --payload '{"test":"data"}' \
  --cli-binary-format raw-in-base64-out \
  response.json

Error Handling & Debugging

Comprehensive Error Handling Patterns

Step-Level Error Handling

export const handler = withDurableExecution(async (event, context) => {
  try {
    // Critical step with custom retry
    const result = await context.step("critical-operation", async () => {
      return await riskyExternalCall();
    }, {
      retryStrategy: (error, attempt) => ({
        shouldRetry: attempt < 5 && error.statusCode !== 404,
        delay: { seconds: Math.min(Math.pow(2, attempt), 60) },
      }),
    });

    return result;
  } catch (error: any) {
    // Log error with context
    context.logger.error("Workflow failed", { 
      error: error.message,
      stack: error.stack,
      event 
    });
    
    // Perform cleanup if needed
    await context.step("cleanup-on-error", async () => {
      return await performCleanup(event);
    });
    
    throw error; // Re-throw to mark execution as failed
  }
});
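
To see what the retryStrategy above actually schedules, the same arithmetic can be run outside the SDK. This is a minimal sketch: `decideRetry` is a hypothetical stand-in that uses plain seconds instead of the SDK's duration object, and it assumes the attempt counter starts at 1, which the source does not state.

```typescript
interface RetryDecision {
  shouldRetry: boolean;
  delaySeconds: number;
}

// Mirrors the retryStrategy above with plain numbers.
// Assumption: `attempt` is 1 for the first failure.
function decideRetry(statusCode: number | undefined, attempt: number): RetryDecision {
  return {
    shouldRetry: attempt < 5 && statusCode !== 404,
    delaySeconds: Math.min(Math.pow(2, attempt), 60), // exponential backoff, capped at 60s
  };
}

// Transient failure (HTTP 503) across six attempts.
const transientSchedule = [1, 2, 3, 4, 5, 6].map((attempt) => decideRetry(503, attempt));

// A 404 is treated as permanent and never retried.
const notFound = decideRetry(404, 1);
```

Under that assumption the delays are 2, 4, 8, 16, 32, and 60 seconds, and retries stop once the attempt number reaches 5, so the 60 second cap is never reached in practice with this limit.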

Timeout Handling

Handling waitForCallback Timeouts:

When waitForCallback times out, it throws an error. You can catch and handle this error to implement fallback behavior:

import { 
  withDurableExecution, 
  DurableContext,
} from "@aws/durable-execution-sdk-js";

export const handler = withDurableExecution(async (event, context) => {
  try {
    // Wait for external approval with timeout
    const approval = await context.waitForCallback(
      "wait-for-approval",
      async (callbackId, ctx) => {
        ctx.logger.info("Sending approval request", { callbackId });
        await sendApprovalEmail(event.approverEmail, callbackId);
      },
      { timeout: { hours: 24 } },
    );

    context.logger.info("Approval received", { approval });
    return { status: "approved", approval };

  } catch (error: any) {
    // Check for callback timeout
    if (error.name === "CallbackTimeoutError" || error.message?.includes("timeout")) {
      context.logger.warn("Approval timed out after 24 hours", {
        approverEmail: event.approverEmail,
        error: error.message,
      });

      // Implement fallback: auto-escalate or auto-reject
      await context.step("handle-timeout", async (stepCtx) => {
        stepCtx.logger.info("Escalating to manager due to timeout");
        await escalateToManager(event);
      });

      return { status: "timeout", escalated: true };
    }

    // Re-throw other errors
    throw error;
  }
});

General Timeout Pattern with Promise.race:

For step-level timeouts (within a single Lambda invocation), use Promise.race:

export const handler = withDurableExecution(async (event, context) => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    // Operation with local timeout
    const result = await Promise.race([
      context.step("long-operation", async () => longRunningTask()),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("Operation timeout")), 30000);
      }),
    ]);

    return result;
  } catch (error: any) {
    if (error.message === "Operation timeout") {
      // Promise.race does not cancel the losing promise: the step may still be running
      context.logger.warn("Operation timed out, implementing fallback");
      return await context.step("fallback", async () => fallbackOperation());
    }
    throw error;
  } finally {
    // Clear the timer so it cannot fire after the race has settled
    clearTimeout(timer);
  }
});

Note: The Promise.race pattern only works within a single Lambda invocation. For timeouts across replays (e.g., long waits), use the timeout option on waitForCallback or waitForCondition.

Debugging Production Issues

CloudWatch Logs Analysis

# Filter logs by execution ID
aws logs filter-log-events \
  --log-group-name "/aws/lambda/my-durable-function" \
  --filter-pattern '{ $.executionId = "exec-123" }' \
  --start-time 1640995200000

# Search for errors
aws logs filter-log-events \
  --log-group-name "/aws/lambda/my-durable-function" \
  --filter-pattern "ERROR" \
  --max-items 50
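
When these queries are generated from a script, the filter pattern and the epoch-millisecond start time are the two values that are easy to get wrong. A minimal sketch follows; `executionFilterPattern` and `startTimeMillis` are hypothetical helpers, and the `executionId` field name is taken from the example above rather than from a documented log schema.

```typescript
// Hypothetical helpers for building `aws logs filter-log-events` arguments.
function executionFilterPattern(executionId: string): string {
  // JSON.stringify adds the surrounding double quotes and escapes embedded ones.
  return `{ $.executionId = ${JSON.stringify(executionId)} }`;
}

function startTimeMillis(isoTimestamp: string): number {
  const millis = Date.parse(isoTimestamp);
  if (Number.isNaN(millis)) {
    throw new Error(`Not a valid timestamp: ${isoTimestamp}`);
  }
  return millis; // --start-time expects milliseconds since the Unix epoch
}

const filterPattern = executionFilterPattern("exec-123");
const startTime = startTimeMillis("2022-01-01T00:00:00Z");
```

The start time used in the CLI example above, 1640995200000, corresponds to 2022-01-01T00:00:00Z.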

Common Error Patterns

| Error Pattern | Cause | Solution |
| --- | --- | --- |
| DurableExecutionTimeout | Execution exceeded timeout | Increase executionTimeout or optimize workflow |
| StepRetryExhausted | Step failed after all retries | Review retry strategy and error handling |
| CallbackTimeout | External system didn't respond | Increase callback timeout or add fallback |
| SerializationError | Cannot serialize step result | Use custom serdes or simplify return objects |

Common Patterns & Examples

1. Simple Multi-Step Workflow

export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    context.logger.info("Starting workflow", { event });

    // Step 1: Validate input
    const validated = await context.step("validate", async (stepCtx) => {
      stepCtx.logger.info("Validating input");
      return validateInput(event);
    });

    // Step 2: Process data
    const processed = await context.step("process", async (stepCtx) => {
      stepCtx.logger.info("Processing data", { validated });
      return processData(validated);
    });

    // Step 3: Wait before final step
    await context.wait("cooldown-period", { seconds: 30 });

    // Step 4: Send results
    await context.step("send-results", async (stepCtx) => {
      stepCtx.logger.info("Sending results");
      return sendResults(processed);
    });

    context.logger.info("Workflow completed successfully");
    return { success: true, data: processed };
  },
);

2. GenAI Agent with Agentic Loop

export const handler = withDurableExecution(
  async (event: { prompt: string }, context: DurableContext) => {
    context.logger.info("Starting AI agent", { prompt: event.prompt });
    const messages = [{ role: "user", content: event.prompt }];

    while (true) {
      // Invoke AI model
      const { response, reasoning, tool } = await context.step(
        "invoke-model",
        async (stepCtx) => {
          stepCtx.logger.info("Invoking AI model", { messageCount: messages.length });
          return await invokeAIModel(messages);
        },
      );

      // If no tool needed, return response
      if (tool == null) {
        context.logger.info("AI agent completed - no tool needed");
        return response;
      }

      // Execute tool
      const toolResult = await context.step(
        `execute-tool-${tool.name}`,
        async (stepCtx) => {
          stepCtx.logger.info("Executing tool", { toolName: tool.name });
          return await executeTool(tool, response);
        },
      );

      // Add result to conversation
      messages.push({
        role: "assistant",
        content: toolResult,
      });

      context.logger.debug("Tool result added to conversation", { toolName: tool.name });
    }
  },
);

3. Human-in-the-Loop Approval

export const handler = withDurableExecution(
  async (
    event: { actionData: any; approverEmail: string },
    context: DurableContext,
  ) => {
    context.logger.info("Starting approval workflow", { 
      approverEmail: event.approverEmail 
    });

    // Generate action plan
    const actionPlan = await context.step("generate-plan", async (stepCtx) => {
      stepCtx.logger.info("Generating action plan");
      return await generateActionPlan(event.actionData);
    });

    // Wait for human approval
    context.logger.info("Waiting for human approval", { 
      timeout: "24 hours" 
    });
    
    const answer = await context.waitForCallback(
      "wait-for-approval",
      async (callbackId, ctx) => {
        ctx.logger.info("Sending approval email", { 
          approverEmail: event.approverEmail, 
          callbackId 
        });
        await sendApprovalEmail(event.approverEmail, actionPlan, callbackId);
      },
      { timeout: { hours: 24 } },
    );

    // Execute based on approval
    if (answer === "APPROVED") {
      context.logger.info("Action approved, executing");
      await context.step("execute-action", async (stepCtx) => {
        stepCtx.logger.info("Performing approved action");
        return await performAction(actionPlan);
      });
      return { status: "completed", actionPlan };
    } else {
      context.logger.info("Action rejected", { answer });
      await context.step("record-rejection", async (stepCtx) => {
        stepCtx.logger.info("Recording rejection");
        return await recordRejection(actionPlan, event.approverEmail);
      });
      return { status: "rejected", actionPlan };
    }
  },
);

4. Saga Pattern (Compensating Transactions)

export const handler = withDurableExecution(
  async (
    event: { customerId: string; flight: any; car: any; hotel: any },
    context: DurableContext,
  ) => {
    context.logger.info("Starting travel booking saga", { 
      customerId: event.customerId 
    });

    const compensations: Array<{ name: string; fn: () => Promise<void> }> = [];

    try {
      // Book flight
      await context.step("book-flight", async (stepCtx) => {
        stepCtx.logger.info("Booking flight", { flight: event.flight });
        await flightClient.book(event.customerId, event.flight);
      });
      compensations.push({ 
        name: "cancel-flight", 
        fn: () => flightClient.cancel(event.customerId) 
      });

      // Book car rental
      await context.step("book-car", async (stepCtx) => {
        stepCtx.logger.info("Booking car rental", { car: event.car });
        await carRentalClient.book(event.customerId, event.car);
      });
      compensations.push({ 
        name: "cancel-car", 
        fn: () => carRentalClient.cancel(event.customerId) 
      });

      // Book hotel
      await context.step("book-hotel", async (stepCtx) => {
        stepCtx.logger.info("Booking hotel", { hotel: event.hotel });
        await hotelClient.book(event.customerId, event.hotel);
      });
      compensations.push({ 
        name: "cancel-hotel", 
        fn: () => hotelClient.cancel(event.customerId) 
      });

      context.logger.info("All bookings completed successfully");
      return {
        success: true,
        bookings: { flight: true, car: true, hotel: true },
      };
    } catch (error: any) {
      // Rollback all bookings in reverse order
      context.logger.error("Booking failed, starting rollback", { 
        error: error.message,
        completedBookings: compensations.length,
      });

      for (const compensation of compensations.reverse()) {
        await context.step(compensation.name, async (stepCtx) => {
          stepCtx.logger.info("Executing compensation", { name: compensation.name });
          await compensation.fn();
        });
      }

      context.logger.info("Rollback completed");
      throw error;
    }
  },
);
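
The rollback ordering is the part of the saga that is easiest to get wrong, so it helps to look at it in isolation. The following synchronous toy model leaves out the SDK entirely: `SagaStep` and `runSaga` are hypothetical names, and in the real handler every action and compensation would run inside context.step.

```typescript
// Synchronous toy model of the rollback ordering; no SDK involved.
interface SagaStep {
  name: string;
  undoName: string;
  action: () => void; // throws on failure
}

function runSaga(steps: SagaStep[]): { success: boolean; log: string[] } {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      step.action();
      log.push(step.name);
      completed.push(step); // register the compensation only after success
    }
    return { success: true, log };
  } catch {
    // Copy before reversing so `completed` keeps its original order.
    for (const step of [...completed].reverse()) {
      log.push(step.undoName);
    }
    return { success: false, log };
  }
}

const booked = (): void => {};
const soldOut = (): void => {
  throw new Error("hotel sold out");
};

const outcome = runSaga([
  { name: "book-flight", undoName: "cancel-flight", action: booked },
  { name: "book-car", undoName: "cancel-car", action: booked },
  { name: "book-hotel", undoName: "cancel-hotel", action: soldOut },
]);
```

Here the hotel booking fails, so the log reads book-flight, book-car, cancel-car, cancel-flight: only completed bookings are compensated, newest first.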

Development Tips & Common Setup Issues

Tool Usage & File Navigation

Issue: Attempting to read directories as files causes errors.

# ❌ WRONG: Using read_file on a directory
# Error: EISDIR: illegal operation on a directory, read

# ✅ CORRECT: Use list_files for directories
list_files(path: "demo-samples", recursive: true)

Best Practices:

  • Use list_files to explore directory structure
  • Use read_file only for individual files
  • Check file extensions in environment details to identify files vs directories

Jest Configuration Requirements

Issue: Missing Jest configuration causes TypeScript parsing errors.

Jest encountered an unexpected token
SyntaxError: Missing semicolon

Solution: Always create jest.config.js with proper TypeScript support:

module.exports = {
  preset: "ts-jest",
  testEnvironment: "node",
  roots: ["<rootDir>/src"],
  testMatch: ["**/*.test.ts"],
  transform: {
    "^.+\\.ts$": "ts-jest",
  },
  moduleNameMapper: {
    "^@aws/durable-execution-sdk-js$":
      "<rootDir>/../packages/aws-durable-execution-sdk-js/src",
    "^@aws/durable-execution-sdk-js-testing$":
      "<rootDir>/../packages/aws-durable-execution-sdk-js-testing/src",
  },
};

Key Configuration Points:

  • preset: 'ts-jest' - Essential for TypeScript support
  • transform - Maps .ts files to ts-jest transformer
  • moduleNameMapper - Links to SDK source files in monorepo setups
  • testMatch - Specifies test file patterns

TypeScript Configuration Tips

Ensure proper module resolution in monorepo setups:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "lib": ["ES2022"],
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "types": ["jest", "node"],
    "paths": {
      "@aws/durable-execution-sdk-js": [
        "../packages/aws-durable-execution-sdk-js/src"
      ],
      "@aws/durable-execution-sdk-js-testing": [
        "../packages/aws-durable-execution-sdk-js-testing/src"
      ]
    }
  }
}

Quick Setup Checklist for New Projects

When starting a new durable function project:

  • Install dependencies (@aws/durable-execution-sdk-js, testing & eslint packages)
  • Create jest.config.js with ts-jest preset
  • Configure tsconfig.json with proper paths and module resolution
  • Set up ESLint with durable execution plugin
  • Create handler with withDurableExecution wrapper
  • Write tests using LocalDurableTestRunner
  • Use skipTime: true for fast test execution
  • Verify TypeScript compilation with npx tsc --noEmit
  • Run tests to confirm setup: npm test

Note on Documentation: Only create README.md files when explicitly requested by the user. Focus on code implementation and testing first.

Common Setup Errors & Solutions

| Error | Cause | Solution |
| --- | --- | --- |
| EISDIR: illegal operation on a directory | Using read_file on directory | Use list_files instead |
| Jest encountered an unexpected token | Missing Jest configuration | Create jest.config.js with ts-jest |
| 'result' is of type 'unknown' | Missing type casting in tests | Cast result: as any or specific type |
| 'name' does not exist in type 'InvokeRequest' | Wrong test runner API | Wrap event in payload: {} |
| Cannot find module '@aws/durable-execution-sdk-js' | Missing module mapping | Add paths in tsconfig.json |
| Nested operation errors | Durable operations in step functions | Use runInChildContext instead |
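
For the 'result' is of type 'unknown' error, a type guard is a safer alternative to casting with as any, because the test fails loudly when the shape is wrong. A minimal sketch follows; `ProcessedUser` is a hypothetical result shape, so substitute whatever your handler actually returns.

```typescript
// Hypothetical result shape; replace with your handler's real return type.
interface ProcessedUser {
  userId: string;
  status: string;
}

// Narrows `unknown` to ProcessedUser by checking each field at runtime.
function isProcessedUser(value: unknown): value is ProcessedUser {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return typeof candidate.userId === "string" && typeof candidate.status === "string";
}

function assertProcessedUser(value: unknown): ProcessedUser {
  if (!isProcessedUser(value)) {
    throw new Error("Result does not look like a ProcessedUser");
  }
  return value;
}

// Stand-in for a value returned from a test run, typed as unknown.
const rawResult: unknown = JSON.parse('{"userId":"123","status":"processed"}');
const typedResult = assertProcessedUser(rawResult);
```

In a test you would pass the value returned by execution.getResult() to assertProcessedUser and then assert on typedResult.status with full type checking.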

Summary & Quick Reference

Essential Checklist

When writing durable functions, always verify:

Code Structure:

  • All non-deterministic code (timestamps, UUIDs, random, API calls) is inside steps
  • No durable operations nested inside step functions (use runInChildContext)
  • No reliance on closure variable mutations (return values from steps)
  • Side effects (logging, external calls) are inside steps or use context.logger
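
Why the first rule matters can be shown with a toy model of checkpoint-and-replay. This sketch is not the SDK: `ToyContext` is a hypothetical class that only caches step results by name, and `nextId` stands in for any non-deterministic call such as a UUID generator.

```typescript
// Toy model of checkpoint-and-replay. It is not the SDK.
class ToyContext {
  private readonly checkpoints: Map<string, unknown>;
  executedSteps: string[] = [];

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  step<T>(stepName: string, fn: () => T): T {
    if (this.checkpoints.has(stepName)) {
      return this.checkpoints.get(stepName) as T; // replay: reuse the stored result
    }
    const result = fn();
    this.executedSteps.push(stepName);
    this.checkpoints.set(stepName, result);
    return result;
  }
}

let idCounter = 0;
const nextId = (): string => `id-${++idCounter}`; // stands in for a UUID generator

function workflow(ctx: ToyContext): { outside: string; inside: string } {
  const outside = nextId();                   // re-evaluated on every replay
  const inside = ctx.step("make-id", nextId); // checkpointed once
  return { outside, inside };
}

const checkpointStore = new Map<string, unknown>();
const firstRun = workflow(new ToyContext(checkpointStore));
const replayContext = new ToyContext(checkpointStore);
const replayRun = workflow(replayContext);
```

The value generated inside the step is identical in both runs, while the one generated outside changes on replay, which is exactly the kind of divergence the checklist item guards against.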

Operations:

  • All important operations have descriptive names
  • Batch operations have appropriate completion policies
  • Concurrent operations use parallel() or map() with child contexts
  • Error handling includes retry strategies where appropriate

Testing & Development:

  • Testing includes verification of operation order and results
  • ESLint plugin is installed and configured
  • Jest configuration is properly set up with ts-jest
  • Test runner uses correct API (payload wrapper, type casting)
  • TypeScript paths are configured for monorepo