LLM agent tools are converging on graph-based workflow solutions to handle deterministic control problems amid the dynamic nature of generative AI.
What does solving this AI Agent orchestration problem look like?
AI agent tools are converging on similar APIs built from abstractions like prompts, templates, actions, conversations, or structured outputs.
These components are often chained together with normal code, sometimes dynamically at inference time by an LLM.
The projects building these AI agent tools are converging on this "workflow of steps" model.
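As a sketch of the common shape: a small graph of named steps, with results flowing along the edges when the graph is run. Everything here is illustrative (the names and structure aren't any particular framework's API), with a stubbed function standing in for the LLM call:

```python
from typing import Callable

# A workflow as a graph of named steps: each step declares which
# upstream steps feed it, and results flow along those edges.
Step = tuple[Callable, list[str]]  # (function, names of upstream steps)

def run_workflow(steps: dict[str, Step], inputs: dict) -> dict:
    results = dict(inputs)
    remaining = dict(steps)
    while remaining:
        progressed = False
        for name, (fn, deps) in list(remaining.items()):
            if all(d in results for d in deps):  # all inputs available?
                results[name] = fn(*(results[d] for d in deps))
                del remaining[name]
                progressed = True
        if not progressed:
            raise ValueError("workflow has unsatisfiable dependencies")
    return results

# Example chain: question -> prompt template -> (stubbed) LLM -> parsed output
workflow = {
    "template": (lambda q: f"Answer concisely: {q}", ["question"]),
    "llm":      (lambda prompt: f"stub completion for: {prompt}", ["template"]),
    "parse":    (lambda text: {"answer": text}, ["llm"]),
}
```

Real tools layer retries, streaming, and persistence on top, but the converging core is roughly this: a graph of steps executed in dependency order.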
But workflows aren't new. They go back to old-school AI techniques like the RETE algorithm and Prolog, and many ETL and data-pipeline tools these days are built on these kinds of dataflow graphs.
There's a lot of deep software engineering going on in building scalable, distributed, serverless workflow systems in the modern cloud.
So is it easier to add AI Agents to a modern workflow engine or a workflow engine to an AI Agent framework?
One factor is how big the workflow engine is and what's involved in using or building on it.
Another angle is how high an abstraction the workflow solution provides.
Third is whether there's any rationale for using an existing system versus building both outright to control the user experience.
Runic is a library designed to add dynamic workflow composition without requiring a whole service with a database or a cloud subscription: just a small functional dependency.
However, to keep that small a surface area, it also doesn't provide persistence or a runtime topology for scheduling tasks and following up on them.
Most workflow engines, serverless systems, low-code platforms, DAG pipelines, actor models, and Durable-Object-esque systems require a lot of extra pieces in practice (queues, databases, web servers, GUIs), so those get bundled in.
Ideally these are small pieces with optional choices, like picking SQLite or DuckDB over Postgres, or Redis over RabbitMQ.
However, this sentiment of modularity is the sort of thing only engineers care about. The majority just want some specific thing done for them (services), or to wield a fleet of AI agent minions.
That's what we're building right? Smart computers to help the humans out?
These workflows are a mechanism of deterministic control made for dynamic problems. During inference an agent might want to choose tools to invoke, perhaps several at once. The inference is dynamic and could produce anything, especially given the user's context, but the workflow, even if its existence is transitory, is a deterministic, runnable artifact. Like an incantation imbued with mana, ready to be cast.
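To make that concrete, here's a small, hypothetical sketch (none of these names come from a real framework): the plan-building function stands in for the dynamic inference step, and its output is a frozen, deterministic plan that can be inspected, replayed, or cast at will.

```python
# Hypothetical tools the agent can choose from.
TOOLS = {
    "search": lambda q: f"results for {q!r}",
    "summarize": lambda text: f"summary: {text}",
}

def plan_from_inference(user_request: str) -> list[tuple[str, str]]:
    """Stand-in for the dynamic part: in practice an LLM would decide
    which tools to invoke and how to wire them. The return value is the
    deterministic artifact: an ordered list of (tool_name, argument) pairs."""
    return [("search", user_request), ("summarize", "search")]

def cast(plan: list[tuple[str, str]]) -> dict:
    # Executing the plan is fully deterministic: the same plan always
    # produces the same calls in the same order.
    results = {}
    for tool, arg in plan:
        value = results.get(arg, arg)  # an arg may name a prior step's result
        results[tool] = TOOLS[tool](value)
    return results
```

The split matters: everything nondeterministic lives in `plan_from_inference`, so the plan itself can be logged, diffed, cached, or re-run without touching the model again.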
The conversational model of LLMs feels like an accident of the first productized interface being chat. But the loop of producing one or more generated messages and waiting on the next user interaction feels fine.
The best we seem to manage is narrowing to the single prompt: the Google search interface, just type and enter.
But we can also see that while everyone is building the same "prompt in, slop out" app, the specific slop differs, and you don't always want to interact with a plucky, verbose assistant that only gives politically safe replies.
The variety of preferred interactions we'll want with these large language models trained on all human knowledge is as varied as the models themselves.
We'd be fools to take a state space as wide as "all human knowledge" and access it in such a narrow way.
So at this point most of these AI tools offer different "agents" and workflows to choose from. Reasoning mode? Search? Research reports? GPT4.5-0RLY-mini (Limited)?
But these AI apps still expose only the workflows the company providing the service has produced internally. It's a walled garden that will inevitably fail to contain the variety of AI.
If you want to make your own, you've got to "use the API." That's fine, but it's a big jump in need-to-know, and what if someone has already made a workflow suited to your weird niche interest?
Thus even if LLMs are commoditized, perhaps the moat defending one of these AI companies will be the network effects of a social network aggregating the workflows that leverage the AI?
The shape of such a product might be:

- A set of modular workflow engine components
- A suite of AI components and developer tools to use the workflow engine
- A content aggregator to collect community contributions
- A managed cloud solution for the wider audience
The first two should simply be open source software. A workflow engine has to be self-hostable, hackable, and extensible, the way a web server is. Getting widespread adoption from such a developer-centric user base is impossible, or at least ill-advised, without being OSS.
The content aggregator and the managed solution are both naturally centralized and the driver of revenue.
Some of the projects and tools in this space:

- LangGraph / LangChain
- Haystack
- Jido
- Ember Framework
- Selvedge
- DSPy: "Declarative Self-Improving Python -- compositional Python code"
- Google ADK (Agent Development Kit)
- PydanticAI
- Huggingface Smolagents
- CrewAI
- links to docs containing their dynamic workflow abstraction
- OpenAI's "Practical guide to building agents"
- Flowise
- Gumloop
- TLDRaw Computer
Dynamic workflows are not a new thing
-
Maybe existing workflow tools add AI features?
-
Will the AI agent framework folks get widespread adoption or will the workflow engine folks get people on their platforms with AI?
-
Or will neither get very far and will developers just use lower level developer tools and build into concrete domains?
-
Probably all of the above