Build a simple, type-safe workflow system where users compose basic building blocks into their own search experience.
Simplicity and strong typing are explicit goals, and are prioritized over infinite flexibility.
- All interfaces use discriminated / tagged unions for identity
- Primitive data types that flow between nodes are limited, although internal/intermediate data types within processing steps may diverge.
- All types are serializable (no function or map types, etc.)
- Both structured and unstructured data should be supported as inputs and outputs.
- Unstructured data values are explicitly wrapped in polymorphic interfaces (no `any`, `object`, `unknown`, or `JSON` types as I/O, although these can be wrapped in a well-typed interface that explicitly identifies the unstructured data).
```typescript
// Restricted set of primitives that flow between nodes. I don't like the
// name "Document" here but couldn't think of anything better for now.
type DataValue =
| StringValue // raw text, e.g. LLM output
| StringListValue // list of text, e.g. multi-LLM output, or list of summaries, etc
| DocumentValue // serializable structured data, e.g. a search result
| DocumentListValue; // list of serializable structured data
interface StringValue {
type: 'string';
value: string;
}
interface StringListValue {
type: 'string_list';
value: string[];
}
interface DocumentValue {
type: 'document';
value: {
id: string;
metadata: Record<string, string | number>;
content: Record<string, string | number | boolean>; // serializable structured payload; exact shape TBD
};
}
interface DocumentListValue {
type: 'document_list';
value: DocumentValue['value'][];
}
```
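
The unstructured-data rule above means raw values always enter the graph through one of these typed envelopes. As a minimal sketch (the helper names are hypothetical, not part of the spec):

```typescript
// Hypothetical constructor helpers: raw, unstructured values are always
// wrapped in an explicitly discriminated envelope before flowing between nodes.
function stringValue(value: string): StringValue {
  return { type: 'string', value };
}

function documentValue(
  id: string,
  metadata: Record<string, string | number>,
  content: Record<string, string | number | boolean>
): DocumentValue {
  return { type: 'document', value: { id, metadata, content } };
}

// e.g. wrapping a raw LLM completion before it flows onward:
const completion = stringValue('Paris is the capital of France.');
```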

The workflow graph is built from a small set of node types:
- Transforms via programmatic logic
- Transforms via LLM
- Conditionals for branching
- Source and Sink nodes
Processor nodes perform programmatic transformations on data.
- May be primitive operations (equal / greater / less than)
- May be named functions (aggregate, await-multi, summarize, etc)
```typescript
interface ProcessorNode<
TInput extends DataValue = DataValue,
TOutput extends DataValue = DataValue
> {
type: 'processor';
id: string;
config: {
processorType: 'map' | 'filter' | 'transform' | 'aggregate'; // TBD, not exhaustive
// Function name or operation identifier
operation: string;
};
inputs: { [key: string]: TInput['type'] };
outputs: { [key: string]: TOutput['type'] };
}
```
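
As a rough illustration (the port names and operation identifier below are made up, since the operation set is still TBD):

```typescript
// Hypothetical processor instance: filters a document list down to
// documents that pass some named, registered operation.
const filterDocs: ProcessorNode<DocumentListValue, DocumentListValue> = {
  type: 'processor',
  id: 'proc-1',
  config: {
    processorType: 'filter',
    operation: 'filter_by_score', // looked up in an operation registry
  },
  inputs: { documents: 'document_list' },
  outputs: { filtered: 'document_list' },
};
```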

LLM nodes process data using language models.

```typescript
interface LLMNode<
TInput extends DataValue = DataValue,
TOutput extends DataValue = DataValue
> {
type: 'llm';
id: string;
config: {
model: 'gpt-4' | 'claude-3' | 'grok-4';
prompt: string; // Template with {{input}} placeholders
maxTokens: number;
};
inputs: { [key: string]: TInput['type'] };
outputs: { [key: string]: TOutput['type'] };
}
```
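
A sketch of what an instance could look like (prompt, ports, and token limit are illustrative):

```typescript
// Hypothetical LLM node: summarizes one string into another string.
const summarize: LLMNode<StringValue, StringValue> = {
  type: 'llm',
  id: 'llm-1',
  config: {
    model: 'claude-3',
    prompt: 'Summarize the following in one paragraph:\n\n{{input}}',
    maxTokens: 512,
  },
  inputs: { input: 'string' },
  outputs: { summary: 'string' },
};
```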

Conditional nodes route data based on conditions.
- Could support primitive comparisons, regex, or fast LLM evaluation.
- Serves as a passthrough for input data, only adding the evaluation information to the input.
```typescript
interface ConditionalNode<TInput extends DataValue = DataValue> {
type: 'conditional';
id: string;
config: {
condition: {
operator: 'equals' | 'contains' | 'gt' | 'lt' | 'matches'; // ... prob more?
field: string;
value: string | number;
};
};
inputs: { data: TInput['type'] };
outputs: {
true: TInput['type'];
false: TInput['type'];
};
}
```
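
To make the evaluation concrete, here is an assumed evaluator for the primitive comparison case; how `field` resolves against each DataValue variant is still open, so this sketch only handles document metadata:

```typescript
// Assumed condition evaluation against DocumentValue metadata.
// Routing to the 'true'/'false' output port is done by the engine.
function evaluateCondition(
  condition: ConditionalNode['config']['condition'],
  input: DocumentValue
): boolean {
  const actual = input.value.metadata[condition.field];
  switch (condition.operator) {
    case 'equals':
      return actual === condition.value;
    case 'contains':
      return String(actual).includes(String(condition.value));
    case 'gt':
      return Number(actual) > Number(condition.value);
    case 'lt':
      return Number(actual) < Number(condition.value);
    case 'matches':
      return new RegExp(String(condition.value)).test(String(actual));
  }
}
```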

Source and sink nodes are the entry and exit points for workflow data.
- Each workflow has exactly one source and one sink.
- Inputs may be user-generated, hardcoded, or other.
- Output is universal and guaranteed for every graph execution.
- Every node has an implicit edge to the Sink node (to handle the error case); graph traversal to the Sink node on error is explicit.
```typescript
interface SourceNode<TData extends DataValue = DataValue> {
type: 'source';
id: string;
config: {
name: string;
description?: string;
defaultValue?: TData['value'];
};
outputs: { data: TData['type'] }; // Could be user query, hardcoded, cron trigger, etc
}
interface SinkNode<TData extends DataValue = DataValue> {
type: 'sink';
id: string;
config: {
name: string;
description?: string;
};
inputs: { data: TData['type'] };
output: ExecutionResult; // a single universal output type regardless of workflow
}
```
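
For illustration, a source/sink pair for a simple search workflow might look like this (names and ids are hypothetical):

```typescript
// Hypothetical entry point: a free-text user query.
const userQuery: SourceNode<StringValue> = {
  type: 'source',
  id: 'source-1',
  config: {
    name: 'user_query',
    description: 'Free-text query from the search box',
    defaultValue: '',
  },
  outputs: { data: 'string' },
};

// Hypothetical exit point; `output` is a placeholder until execution.
const answer: SinkNode<StringValue> = {
  type: 'sink',
  id: 'sink-1',
  config: { name: 'answer' },
  inputs: { data: 'string' },
  output: { outputs: {}, executionTime: 0, nodeExecutions: [] }, // filled in by the engine
};
```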

- Nodes have named and typed input/output ports; the edges specify exactly which output port connects to which input port.
- TBD whether graph-layout data (e.g. x/y coordinates of nodes) is embedded in nodes and edges or kept in a separate metadata map, since this data is irrelevant to the processors.
```typescript
interface WorkflowEdge {
id: string;
source: string; // source node id
sourceOutput: string; // source output key / ID
target: string; // target node id
targetInput: string; // target input key / ID
}
interface Workflow {
id: string;
name: string;
description: string;
nodes: (ProcessorNode | LLMNode | ConditionalNode | SourceNode | SinkNode)[];
edges: WorkflowEdge[];
metadata: {
version: string;
createdAt: string;
updatedAt: string;
};
}
```
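
Wiring the earlier sketches together, the edges for a source → LLM → sink pipeline would look like this (node ids and port names taken from the hypothetical examples above):

```typescript
// Hypothetical wiring: user_query -> summarize -> answer.
const edges: WorkflowEdge[] = [
  {
    id: 'e1',
    source: 'source-1', // userQuery
    sourceOutput: 'data',
    target: 'llm-1', // summarize
    targetInput: 'input',
  },
  {
    id: 'e2',
    source: 'llm-1',
    sourceOutput: 'summary',
    target: 'sink-1', // answer
    targetInput: 'data',
  },
];
```

Note that both ends of each edge carry 'string'-typed ports, so this wiring would pass type validation.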

TBD, but some ideas:
- All processing of the workflow could happen on the backend, with the browser interfacing only through the source/sink nodes.
- Alternatively, every operation could be exposed via API and the frontend could step through the workflow (I don't like this quite as much, but can't think of a reason it would be horrible).
- The execution result is universal; it may be an error or a result.
- Errors contain feedback about the location and nature of the error so the UI can be updated accordingly.
- Validation of the workflow happens on the backend, so there should be a `/compile` or `/dry-run` type endpoint that just validates the graph without executing it (a sketch of such a check follows the ExecutionResult definition below).
- Execution results can be partial, and can be streamed to the client for live updates.
- Execution results are guaranteed through the use of error boundaries.
```typescript
interface ExecutionResult {
outputs: Record<string, DataValue>;
executionTime: number;
nodeExecutions: Array<{
nodeId: string;
startTime: number;
endTime: number;
status: 'pending' | 'success' | 'error';
}>;
}
```
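
As referenced above, a `/dry-run` style check could validate the wiring without executing anything. A minimal sketch under the current types (the error-message format is arbitrary):

```typescript
// Assumed dry-run validation: every edge must reference an existing node
// and a declared port on each side. Port *type* compatibility would extend this.
function validateEdges(workflow: Workflow): string[] {
  const errors: string[] = [];
  const byId = new Map(workflow.nodes.map((n) => [n.id, n] as const));
  for (const edge of workflow.edges) {
    const src = byId.get(edge.source);
    const dst = byId.get(edge.target);
    if (!src) errors.push(`${edge.id}: unknown source node ${edge.source}`);
    if (!dst) errors.push(`${edge.id}: unknown target node ${edge.target}`);
    if (src && !('outputs' in src && edge.sourceOutput in src.outputs)) {
      errors.push(`${edge.id}: no output port '${edge.sourceOutput}' on ${edge.source}`);
    }
    if (dst && !('inputs' in dst && edge.targetInput in dst.inputs)) {
      errors.push(`${edge.id}: no input port '${edge.targetInput}' on ${edge.target}`);
    }
  }
  return errors; // empty => wiring is valid
}
```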

Execution strategy:
- Topological Sort: order nodes based on dependencies (sketched below)
- Lazy Evaluation: only execute nodes when inputs are ready
- Type Validation: runtime validation matches compile-time types
- Error Boundaries: each node execution is isolated; failures don't crash the workflow
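
The ordering step is standard; a minimal Kahn's-algorithm sketch over the types above (cycle detection included, everything else elided):

```typescript
// Topological order of node ids via Kahn's algorithm.
function topologicalOrder(workflow: Workflow): string[] {
  const indegree = new Map<string, number>();
  const downstream = new Map<string, string[]>();
  for (const node of workflow.nodes) indegree.set(node.id, 0);
  for (const edge of workflow.edges) {
    indegree.set(edge.target, (indegree.get(edge.target) ?? 0) + 1);
    downstream.set(edge.source, [...(downstream.get(edge.source) ?? []), edge.target]);
  }
  // Start from nodes with no dependencies (typically the Source).
  const ready = workflow.nodes.filter((n) => indegree.get(n.id) === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of downstream.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== workflow.nodes.length) {
    throw new Error('workflow graph contains a cycle');
  }
  return order;
}
```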
All workflow definitions are JSON-serializable by design:
- No functions or maps in config (only function references/names)
- All data types have explicit type discriminators
- Edges use simple ID references
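
Both properties are easy to demonstrate: persistence is a plain JSON round trip, and the discriminators allow exhaustive runtime narrowing:

```typescript
// Definitions are plain data, so cloning/persisting is a JSON round trip.
function cloneWorkflow(w: Workflow): Workflow {
  return JSON.parse(JSON.stringify(w)) as Workflow;
}

// Type discriminators make runtime handling exhaustive.
function describe(value: DataValue): string {
  switch (value.type) {
    case 'string':
      return `text (${value.value.length} chars)`;
    case 'string_list':
      return `${value.value.length} strings`;
    case 'document':
      return `document ${value.value.id}`;
    case 'document_list':
      return `${value.value.length} documents`;
  }
}
```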