Build a simple, type-safe workflow system where users compose basic building blocks into their own search experience.
Simplicity and strong typing are explicit goals, and are prioritized over infinite flexibility.
- All interfaces use discriminated / tagged unions for identity
- Primitive data types that flow between nodes are limited, although internal/intermediate data types within processing steps may diverge.
- All types are serializable (no function or map types, etc.)
- Both structured and unstructured data should be supported as inputs and outputs.
- Unstructured data values are explicitly wrapped in polymorphic interfaces (no `any`, `object`, `unknown`, or `JSON` types as I/O, although these can be wrapped in a well-typed interface that explicitly identifies the unstructured data).
```typescript
// Restricted set of primitives that flow between nodes. I don't like the
// name "Document" here but couldn't think of anything better for now.
type DataValue =
| StringValue // raw text, e.g. LLM output
| StringListValue // list of text, e.g. multi-LLM output, or list of summaries, etc
| DocumentValue // serializable structured data, e.g. a search result
| DocumentListValue; // list of serializable structured data
interface StringValue {
type: 'string';
value: string;
}
interface StringListValue {
type: 'string_list';
value: string[];
}
interface DocumentValue {
type: 'document';
value: {
id: string;
metadata: Record<string, string | number>;
content: Record<string, string | number | boolean>; // serializable structured payload; exact shape TBD
};
}
interface DocumentListValue {
type: 'document_list';
value: DocumentValue['value'][];
}
```
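
The unstructured-data rule above means raw values always enter the graph through one of these typed envelopes. As a minimal sketch (the helper names are hypothetical, not part of the spec):

```typescript
// Hypothetical constructor helpers: raw, unstructured values are always
// wrapped in an explicitly discriminated envelope before flowing between nodes.
function stringValue(value: string): StringValue {
  return { type: 'string', value };
}

function documentValue(
  id: string,
  metadata: Record<string, string | number>,
  content: Record<string, string | number | boolean>
): DocumentValue {
  return { type: 'document', value: { id, metadata, content } };
}

// e.g. wrapping a raw LLM completion before it flows onward:
const completion = stringValue('Paris is the capital of France.');
```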

The workflow graph is built from a small set of node types:
- Transforms via programmatic logic
- Transforms via LLM
- Conditionals for branching
- Source and Sink nodes
Processor nodes perform programmatic transformations on data.
- May be primitive operations (equal / greater / less than)
- May be named functions (aggregate, await-multi, summarize, etc)
```typescript
interface ProcessorNode<
TInput extends DataValue = DataValue,
TOutput extends DataValue = DataValue
> {
type: 'processor';
id: string;
config: {
processorType: 'map' | 'filter' | 'transform' | 'aggregate'; // TBD, not exhaustive
// Function name or operation identifier
operation: string;
};
inputs: { [key: string]: TInput['type'] };
outputs: { [key: string]: TOutput['type'] };
}
```
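
As a rough illustration (the port names and operation identifier below are made up, since the operation set is still TBD):

```typescript
// Hypothetical processor instance: filters a document list down to
// documents that pass some named, registered operation.
const filterDocs: ProcessorNode<DocumentListValue, DocumentListValue> = {
  type: 'processor',
  id: 'proc-1',
  config: {
    processorType: 'filter',
    operation: 'filter_by_score', // looked up in an operation registry
  },
  inputs: { documents: 'document_list' },
  outputs: { filtered: 'document_list' },
};
```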

LLM nodes process data using language models.

```typescript
interface LLMNode<
TInput extends DataValue = DataValue,
TOutput extends DataValue = DataValue
> {
type: 'llm';
id: string;
config: {
model: 'gpt-4' | 'claude-3' | 'grok-4';
prompt: string; // Template with {{input}} placeholders
maxTokens: number;
};
inputs: { [key: string]: TInput['type'] };
outputs: { [key: string]: TOutput['type'] };
}
```
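
A sketch of what an instance could look like (prompt, ports, and token limit are illustrative):

```typescript
// Hypothetical LLM node: summarizes one string into another string.
const summarize: LLMNode<StringValue, StringValue> = {
  type: 'llm',
  id: 'llm-1',
  config: {
    model: 'claude-3',
    prompt: 'Summarize the following in one paragraph:\n\n{{input}}',
    maxTokens: 512,
  },
  inputs: { input: 'string' },
  outputs: { summary: 'string' },
};
```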

Conditional nodes route data based on conditions.
- Could support primitive comparisons, regex, or fast LLM evaluation.
- Serves as a passthrough for input data, only adding the evaluation information to the input.
```typescript
interface ConditionalNode<TInput extends DataValue = DataValue> {
type: 'conditional';
id: string;
config: {
condition: {
operator: 'equals' | 'contains' | 'gt' | 'lt' | 'matches'; // ... prob more?
field: string;
value: string | number;
};
};
inputs: { data: TInput['type'] };
outputs: {
true: TInput['type'];
false: TInput['type'];
};
}
```
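
To make the evaluation concrete, here is an assumed evaluator for the primitive comparison case; how `field` resolves against each DataValue variant is still open, so this sketch only handles document metadata:

```typescript
// Assumed condition evaluation against DocumentValue metadata.
// Routing to the 'true'/'false' output port is done by the engine.
function evaluateCondition(
  condition: ConditionalNode['config']['condition'],
  input: DocumentValue
): boolean {
  const actual = input.value.metadata[condition.field];
  switch (condition.operator) {
    case 'equals':
      return actual === condition.value;
    case 'contains':
      return String(actual).includes(String(condition.value));
    case 'gt':
      return Number(actual) > Number(condition.value);
    case 'lt':
      return Number(actual) < Number(condition.value);
    case 'matches':
      return new RegExp(String(condition.value)).test(String(actual));
  }
}
```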

Source and sink nodes are the entry and exit points for workflow data.
- Each workflow has exactly one source and one sink.
- Inputs may be user-generated, hardcoded, or other.
- Output is universal and guaranteed for every graph execution.
- Every node has an implicit edge to the Sink node (to handle the error case); graph traversal to the Sink node on error is explicit.
```typescript
interface SourceNode<TData extends DataValue = DataValue> {
type: 'source';
id: string;
config: {
name: string;
description?: string;
defaultValue?: TData['value'];
};
outputs: { data: TData['type'] }; // Could be user query, hardcoded, cron trigger, etc
}
interface SinkNode<TData extends DataValue = DataValue> {
type: 'sink';
id: string;
config: {
name: string;
description?: string;
};
inputs: { data: TData['type'] };
output: ExecutionResult; // a single universal output type regardless of workflow
}
```
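
For illustration, a source/sink pair for a simple search workflow might look like this (names and ids are hypothetical):

```typescript
// Hypothetical entry point: a free-text user query.
const userQuery: SourceNode<StringValue> = {
  type: 'source',
  id: 'source-1',
  config: {
    name: 'user_query',
    description: 'Free-text query from the search box',
    defaultValue: '',
  },
  outputs: { data: 'string' },
};

// Hypothetical exit point; `output` is a placeholder until execution.
const answer: SinkNode<StringValue> = {
  type: 'sink',
  id: 'sink-1',
  config: { name: 'answer' },
  inputs: { data: 'string' },
  output: { outputs: {}, executionTime: 0, nodeExecutions: [] }, // filled in by the engine
};
```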

- Nodes have named and typed input/output ports; the edges specify exactly which output port connects to which input port.
- TBD whether graph-layout data (e.g. x/y coordinates of nodes) is embedded in nodes and edges or kept in a separate metadata map, since this data is irrelevant to the processors.
```typescript
interface WorkflowEdge {
id: string;
source: string; // source node id
sourceOutput: string; // source output key / ID
target: string; // target node id
targetInput: string; // target input key / ID
}
interface Workflow {
id: string;
name: string;
description: string;
nodes: (ProcessorNode | LLMNode | ConditionalNode | SourceNode | SinkNode)[];
edges: WorkflowEdge[];
metadata: {
version: string;
createdAt: string;
updatedAt: string;
};
}
```
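
Wiring the earlier sketches together, the edges for a source → LLM → sink pipeline would look like this (node ids and port names taken from the hypothetical examples above):

```typescript
// Hypothetical wiring: user_query -> summarize -> answer.
const edges: WorkflowEdge[] = [
  {
    id: 'e1',
    source: 'source-1', // userQuery
    sourceOutput: 'data',
    target: 'llm-1', // summarize
    targetInput: 'input',
  },
  {
    id: 'e2',
    source: 'llm-1',
    sourceOutput: 'summary',
    target: 'sink-1', // answer
    targetInput: 'data',
  },
];
```

Note that both ends of each edge carry 'string'-typed ports, so this wiring would pass type validation.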

TBD, but some ideas:
- All processing of the workflow could happen on the backend, with the browser interfacing only through the source/sink nodes.
- Alternatively, every operation could be exposed via API and the frontend could step through the workflow (I don't like this quite as much, but can't think of a reason it would be horrible).
- The execution result is universal; it may be an error or a result.
- Errors contain feedback about the location and nature of the error so the UI can be updated accordingly.
- Validation of the workflow happens on the backend, so there should be a `/compile` or `/dry-run` type endpoint that just validates the graph without executing it (a sketch of such a check follows the ExecutionResult definition below).
- Execution results can be partial, and can be streamed to the client for live updates.
- Execution results are guaranteed through the use of error boundaries.
```typescript
interface ExecutionResult {
outputs: Record<string, DataValue>;
executionTime: number;
nodeExecutions: Array<{
nodeId: string;
startTime: number;
endTime: number;
status: 'pending' | 'success' | 'error';
}>;
}
```
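
As referenced above, a `/dry-run` style check could validate the wiring without executing anything. A minimal sketch under the current types (the error-message format is arbitrary):

```typescript
// Assumed dry-run validation: every edge must reference an existing node
// and a declared port on each side. Port *type* compatibility would extend this.
function validateEdges(workflow: Workflow): string[] {
  const errors: string[] = [];
  const byId = new Map(workflow.nodes.map((n) => [n.id, n] as const));
  for (const edge of workflow.edges) {
    const src = byId.get(edge.source);
    const dst = byId.get(edge.target);
    if (!src) errors.push(`${edge.id}: unknown source node ${edge.source}`);
    if (!dst) errors.push(`${edge.id}: unknown target node ${edge.target}`);
    if (src && !('outputs' in src && edge.sourceOutput in src.outputs)) {
      errors.push(`${edge.id}: no output port '${edge.sourceOutput}' on ${edge.source}`);
    }
    if (dst && !('inputs' in dst && edge.targetInput in dst.inputs)) {
      errors.push(`${edge.id}: no input port '${edge.targetInput}' on ${edge.target}`);
    }
  }
  return errors; // empty => wiring is valid
}
```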

Execution strategy:
- Topological Sort: order nodes based on dependencies (sketched below)
- Lazy Evaluation: only execute nodes when inputs are ready
- Type Validation: runtime validation matches compile-time types
- Error Boundaries: each node execution is isolated; failures don't crash the workflow
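
The ordering step is standard; a minimal Kahn's-algorithm sketch over the types above (cycle detection included, everything else elided):

```typescript
// Topological order of node ids via Kahn's algorithm.
function topologicalOrder(workflow: Workflow): string[] {
  const indegree = new Map<string, number>();
  const downstream = new Map<string, string[]>();
  for (const node of workflow.nodes) indegree.set(node.id, 0);
  for (const edge of workflow.edges) {
    indegree.set(edge.target, (indegree.get(edge.target) ?? 0) + 1);
    downstream.set(edge.source, [...(downstream.get(edge.source) ?? []), edge.target]);
  }
  // Start from nodes with no dependencies (typically the Source).
  const ready = workflow.nodes.filter((n) => indegree.get(n.id) === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of downstream.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== workflow.nodes.length) {
    throw new Error('workflow graph contains a cycle');
  }
  return order;
}
```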
All workflow definitions are JSON-serializable by design:
- No functions or maps in config (only function references/names)
- All data types have explicit type discriminators
- Edges use simple ID references
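
Both properties are easy to demonstrate: persistence is a plain JSON round trip, and the discriminators allow exhaustive runtime narrowing:

```typescript
// Definitions are plain data, so cloning/persisting is a JSON round trip.
function cloneWorkflow(w: Workflow): Workflow {
  return JSON.parse(JSON.stringify(w)) as Workflow;
}

// Type discriminators make runtime handling exhaustive.
function describe(value: DataValue): string {
  switch (value.type) {
    case 'string':
      return `text (${value.value.length} chars)`;
    case 'string_list':
      return `${value.value.length} strings`;
    case 'document':
      return `document ${value.value.id}`;
    case 'document_list':
      return `${value.value.length} documents`;
  }
}
```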