Search workflow editor design thoughts

Goal

Build a simple, type-safe workflow system where users compose basic building blocks into their own search experience.

Data Types System

Simplicity and strong typing are explicit goals, and are prioritized over infinite flexibility.

  • All interfaces use discriminated / tagged unions for identity
  • Primitive data types that flow between nodes are limited, although internal/intermediate data types within processing steps may diverge.
  • All types are serializable (no function or map types etc)
  • Both structured and unstructured data should be supported as inputs and outputs.
  • Unstructured data values are explicitly wrapped with polymorphic interfaces (no any, object, unknown, or JSON types as I/O, although these can be wrapped in a well-typed interface that explicitly identifies the unstructured data).
Code snippet
// Restricted set of primitives that flow between nodes. I don't like the name "Document" here but couldn't think of anything better for now
type DataValue =
  | StringValue         // raw text, e.g. LLM output
  | StringListValue     // list of text, e.g. multi-LLM output, or list of summaries, etc
  | DocumentValue       // serializable structured data, e.g. a search result
  | DocumentListValue;  // list of serializable structured data

interface StringValue {
  type: 'string';
  value: string;
}

interface StringListValue {
  type: 'string_list';
  value: string[];
}

interface DocumentValue {
  type: 'document';
  value: {
    id: string;
    metadata: Record<string, string | number>;
    content: unknown; // ... structured data, exact shape TBD
  };
}

interface DocumentListValue {
  type: 'document_list';
  value: DocumentValue['value'][];
}
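
The type discriminants make consumers of DataValue straightforward to write. A minimal sketch (the describeValue helper is hypothetical, purely to show narrowing on the tag):

Code snippet
// Hypothetical helper: switching on the `type` tag narrows each branch to the
// corresponding interface, with no casts or instanceof checks.
function describeValue(value: DataValue): string {
  switch (value.type) {
    case 'string':
      return value.value;                        // StringValue
    case 'string_list':
      return value.value.join('\n');             // StringListValue
    case 'document':
      return `document ${value.value.id}`;       // DocumentValue
    case 'document_list':
      return `${value.value.length} documents`;  // DocumentListValue
  }
}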

Node Types System

  • Transforms via programmatic logic
  • Transforms via LLM
  • Conditionals for branching
  • Source and Sink nodes

1. Processor Node

Performs programmatic transformations on data.

  • May be primitive operations (equal / greater / less than)
  • May be named functions (aggregate, await-multi, summarize, etc)
Code snippet
interface ProcessorNode<
  TInput extends DataValue = DataValue,
  TOutput extends DataValue = DataValue
> {
  type: 'processor';
  id: string;
  config: {
    processorType: string; // TBD, e.g. 'map' | 'filter' | 'transform' | 'aggregate'
    // Function name or operation identifier
    operation: string;
  };
  inputs: { [key: string]: TInput['type'] };
  outputs: { [key: string]: TOutput['type'] };
}
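
As an illustration (the processorType and operation names here are placeholders, not a settled vocabulary), a processor that concatenates a list of summaries into one string might be declared like this:

Code snippet
// Placeholder identifiers: 'aggregate' and 'concat' are examples only.
const aggregateSummaries: ProcessorNode<StringListValue, StringValue> = {
  type: 'processor',
  id: 'aggregate-summaries',
  config: {
    processorType: 'aggregate',
    operation: 'concat',
  },
  inputs: { summaries: 'string_list' },
  outputs: { combined: 'string' },
};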

2. LLM Node

Processes data using language models.

Code snippet
interface LLMNode<
  TInput extends DataValue = DataValue,
  TOutput extends DataValue = DataValue
> {
  type: 'llm';
  id: string;
  config: {
    model: 'gpt-4' | 'claude-3' | 'grok-4';
    prompt: string; // Template with {{input}} placeholders
    maxTokens: number;
  };
  inputs: { [key: string]: TInput['type'] };
  outputs: { [key: string]: TOutput['type'] };
}
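
A concrete instance might look like the following sketch (prompt wording, token limit, and port names are illustrative):

Code snippet
// Illustrative values: summarizes one text input into a shorter string.
const summarizeNode: LLMNode<StringValue, StringValue> = {
  type: 'llm',
  id: 'summarize-result',
  config: {
    model: 'claude-3',
    prompt: 'Summarize the following text in two sentences:\n\n{{input}}',
    maxTokens: 256,
  },
  inputs: { text: 'string' },
  outputs: { summary: 'string' },
};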

3. Conditional Node

Routes data based on conditions.

  • Could support primitive comparisons, regex, or fast LLM evaluation.
  • Serves as a passthrough for input data, only adding the evaluation information to the input.
Code snippet
interface ConditionalNode<TInput extends DataValue = DataValue> {
  type: 'conditional';
  id: string;
  config: {
    condition: {
      operator: 'equals' | 'contains' | 'gt' | 'lt' | 'matches'; // ... prob more?
      field: string;
      value: string | number;
    };
  };
  inputs: { data: TInput['type'] };
  outputs: {
    true: TInput['type'];
    false: TInput['type'];
  };
}
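
For example, routing a document by a metadata field could look like this sketch (the field name and threshold are made up):

Code snippet
// Made-up field and threshold: sends a document down the true/false branch
// depending on its relevance score.
const relevanceGate: ConditionalNode<DocumentValue> = {
  type: 'conditional',
  id: 'relevance-gate',
  config: {
    condition: {
      operator: 'gt',
      field: 'metadata.relevanceScore',
      value: 0.5,
    },
  },
  inputs: { data: 'document' },
  outputs: { true: 'document', false: 'document' },
};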

4. Source/Sink Nodes

Entry and exit points for workflow data.

  • Each workflow has exactly one source and one sink.
  • Inputs may be user-generated, hardcoded, or from some other source.
  • Output is universal and guaranteed for every graph execution.
  • Every node has an implicit edge to the Sink node (to handle the error case); graph traversal to the Sink node on error is explicit.
Code snippet
interface SourceNode<TData extends DataValue = DataValue> {
  type: 'source';
  id: string;
  config: {
    name: string;
    description?: string;
    defaultValue?: TData['value'];
  };
  outputs: { data: TData['type'] }; // Could be user query, hardcoded, cron trigger, etc
}

interface SinkNode<TData extends DataValue = DataValue> {
  type: 'sink';
  id: string;
  config: {
    name: string;
    description?: string;
  };
  inputs: { data: TData['type'] };
  output: ExecutionResult; // a single universal output type regardless of workflow
}
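
A minimal entry/exit pair might be declared like this (names and ids are illustrative; the sink's output field is shown with an empty placeholder since the runtime populates it):

Code snippet
// Illustrative: a user-query source and a single sink for the final answer.
const querySource: SourceNode<StringValue> = {
  type: 'source',
  id: 'user-query',
  config: {
    name: 'User query',
    description: 'Free-text query entered by the user',
    defaultValue: '',
  },
  outputs: { data: 'string' },
};

const answerSink: SinkNode<StringValue> = {
  type: 'sink',
  id: 'answer',
  config: { name: 'Answer' },
  inputs: { data: 'string' },
  // Populated by the runtime after execution; empty placeholder here.
  output: { outputs: {}, executionTime: 0, nodeExecutions: [] },
};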

Workflow Structure

  • Nodes have named and typed input/output ports. The edges specify exactly which output port connects to which input port.
  • TBD whether layout data (e.g. x/y coords of nodes) is embedded in the nodes and edges or kept in a separate metadata map, since this data is irrelevant to the processors.
Code snippet
interface WorkflowEdge {
  id: string;
  source: string; // source node id
  sourceOutput: string; // source output key / ID
  target: string; // target node id
  targetInput: string; // target input key / ID
}

interface Workflow {
  id: string;
  name: string;
  description: string;
  nodes: (ProcessorNode | LLMNode | ConditionalNode | SourceNode | SinkNode)[];
  edges: WorkflowEdge[];
  metadata: {
    version: string;
    createdAt: string;
    updatedAt: string;
  };
}
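
Putting it together, a tiny query → summarize → answer workflow could be wired like this (it reuses the illustrative querySource, summarizeNode, and answerSink sketches above; ids and timestamps are placeholders):

Code snippet
// Sketch: source -> LLM -> sink, with edges connecting named ports.
const miniWorkflow: Workflow = {
  id: 'wf-001',
  name: 'Summarize a query',
  description: 'Minimal example: the user query is summarized by an LLM',
  nodes: [querySource, summarizeNode, answerSink],
  edges: [
    { id: 'e1', source: 'user-query', sourceOutput: 'data', target: 'summarize-result', targetInput: 'text' },
    { id: 'e2', source: 'summarize-result', sourceOutput: 'summary', target: 'answer', targetInput: 'data' },
  ],
  metadata: {
    version: '1',
    createdAt: '2025-01-01T00:00:00Z',
    updatedAt: '2025-01-01T00:00:00Z',
  },
};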

Execution Model / API design

TBD, but some ideas:

  • All processing of the workflow could happen on the backend, with the browser interacting only through the source/sink node interface.
  • OR, alternatively, every operation could be exposed via API and the frontend could step through the workflow (I don't like this quite as much, but can't think of a reason it would be horrible).
  • Execution result is universal, may be an error or a result.
  • Errors contain feedback about location and nature of the error so the UI can be updated accordingly.
  • Validation of the workflow happens on the backend, so there should be a /compile or /dry-run type endpoint that just validates the graph without executing.
  • Execution results can be partial, and can be streamed to the client for live updates.
  • Execution results are guaranteed through use of error boundaries.
Code snippet
interface ExecutionResult {
  outputs: Record<string, DataValue>;
  executionTime: number;
  nodeExecutions: Array<{
    nodeId: string;
    startTime: number;
    endTime: number;
    status: 'pending' | 'success' | 'error';
  }>;
}
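
One way to make the "result or error" idea concrete is a discriminated outcome type. This is purely a sketch; the names ExecutionError and WorkflowOutcome and their fields are not settled:

Code snippet
// Sketch only. An error carries enough context (node id, message) for the UI
// to highlight the failing node; a partial result can accompany it, and the
// same shape works for streamed progress updates.
interface ExecutionError {
  nodeId: string;   // where the failure occurred
  message: string;  // nature of the error
  inputsAtFailure?: Record<string, DataValue>; // optional debugging context
}

type WorkflowOutcome =
  | { status: 'success'; result: ExecutionResult }
  | { status: 'error'; error: ExecutionError; partial?: ExecutionResult };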

Processing Strategy

  1. Topological Sort: Order nodes based on dependencies (see the sketch after this list)
  2. Lazy Evaluation: Only execute nodes when inputs are ready
  3. Type Validation: Runtime validation matches compile-time types
  4. Error Boundaries: Each node execution is isolated; failures don't crash workflow
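
A minimal sketch of the ordering step, assuming the Workflow and WorkflowEdge shapes above (Kahn's algorithm over node ids; a cycle would be surfaced by the /dry-run validation rather than at execution time):

Code snippet
// Kahn's algorithm over node ids. Throws on a cycle, which validation
// (/dry-run) would report before execution.
function topologicalOrder(workflow: Workflow): string[] {
  const inDegree = new Map<string, number>();
  const adjacency = new Map<string, string[]>();
  for (const node of workflow.nodes) {
    inDegree.set(node.id, 0);
    adjacency.set(node.id, []);
  }
  for (const edge of workflow.edges) {
    adjacency.get(edge.source)?.push(edge.target);
    inDegree.set(edge.target, (inDegree.get(edge.target) ?? 0) + 1);
  }
  const ready = [...inDegree.entries()].filter(([, d]) => d === 0).map(([id]) => id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of adjacency.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== workflow.nodes.length) {
    throw new Error('Workflow graph contains a cycle');
  }
  return order;
}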

Serialization

All workflow definitions are JSON-serializable by design:

  • No functions or maps in config (only function references/names)
  • All data types have explicit type discriminators
  • Edges use simple ID references
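
Because configs hold only names and ID references, a workflow survives a plain JSON round trip (using the miniWorkflow sketch from above):

Code snippet
// Plain JSON round trip: no functions or Maps means nothing is lost.
const serialized = JSON.stringify(miniWorkflow);
const restored: Workflow = JSON.parse(serialized);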