Orchestration is the practice of programming in the large.
In rigorous terms, orchestration consists of higher-order programs that wire up first-order programs. Consider the analogy of wires and components: components connected by wires form a first-order program. A higher-order program is one where the components (along with their connecting wires) can themselves be sent as data through the wires of a "higher" program. This is the same idea as higher-order functions (functions on functions), higher-order logic, and exponential objects in category theory (the concept of "closed categories"). A truly higher-order program reuses the same primitives as the first-order program to program at the higher level. This reuse of components lets you build recursively higher programs, so complexity can be encapsulated at arbitrary levels.
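As a rough illustration of the wires-and-components analogy, the sketch below uses shell functions as components and pipes as wires. The names `shout`, `exclaim` and `pipeline` are made up for this example. `pipeline` is the higher-order part: it receives component names as data and wires them together using the same pipe primitive that first-order programs use.

```shell
#!/usr/bin/env bash
# First-order components: each reads stdin and writes stdout.
shout()   { tr '[:lower:]' '[:upper:]'; }
exclaim() { sed 's/$/!/'; }

# Higher-order program: takes component names as arguments (data) and
# wires them into a pipeline, recursing until no components are left.
pipeline() {
  if [ "$#" -eq 0 ]; then
    cat                      # no components left: pass the data through
    return
  fi
  local first=$1
  shift
  "$first" | pipeline "$@"   # same pipe primitive as a first-order program
}

result=$(printf 'hello\n' | pipeline shout exclaim)
echo "$result"
```

Because `pipeline` is itself a component that reads stdin and writes stdout, it can be passed to another `pipeline` call, which is the recursive encapsulation described above.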
Another common situation with orchestration is that the components are abstracted away from the orchestrator. That is, the orchestrator may work with "black boxes" that have defined interfaces, without the ability to manipulate or introspect their internal workings. This makes type-safety difficult, unless we can trust that the components have been built according to a specification.
Real-life orchestrators rarely live up to the ideal described above. Instead, various trade-offs are made in order to build a practical orchestrator.
Some ideas that you should understand:
- Pull-architecture means the receiver of information initiates communication.
- Push-architecture means the sender of information initiates communication.
- Tight coupling vs loose coupling. Coupling simply means that one component knows about another component. The more it knows about the other component, the more tightly coupled it is. Note that this relationship is not symmetric. That is, `componentA` can know about `componentB` while `componentB` does not know about `componentA`. This means `componentA` is coupled to `componentB`, but `componentB` is not coupled to `componentA`. There is a spectrum from tight coupling to loose coupling depending on how much information is hardcoded. For example, working against standard communication protocols is less coupled than working against a communication protocol that only the other component supports. Read more here: https://en.wikipedia.org/wiki/Coupling_(computer_programming)
- The orchestrator generally knows about the orchestrated components. It has knowledge of them because it needs to know how to start them. The orchestrated components do not, in general, need to know about the orchestrator. This means that if the orchestrator needs information from the orchestrated components, it gets it by polling them. This would be a pull-architecture. If the orchestrator and the orchestrated components know of each other, then they are tightly coupled to each other.
- The orchestrator may only natively support a single language, or support multiple languages. This is really about the boundary of control and what executable format it supports along with the IPC protocols that it supports. Some orchestrator technologies specify a stringent set of executable formats and IPC protocols, while others are far more flexible. The choice of executable format and communication protocol affects how the orchestrator can be used.
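The pull arrangement and the one-way coupling described in the list above can be sketched in shell. This is only a minimal sketch: the `component` function and the status-file convention are assumptions made for this example. The component only knows about a file path it is handed, while the orchestrator knows how to start the component and initiates every read of its status.

```shell
#!/usr/bin/env bash
# Component: has no knowledge of who started it or who reads its status.
# It only writes its state to the path it is given.
component() {
  local status_file=$1
  echo running > "$status_file"
  sleep 0.2                      # stand-in for real work
  echo done > "$status_file"
}

status_file=$(mktemp)

# The orchestrator knows how to start the component...
component "$status_file" &

# ...and pulls information from it by polling.
until [ "$(cat "$status_file")" = "done" ]; do
  sleep 0.05
done
wait

final=$(cat "$status_file")
rm -f "$status_file"
echo "$final"
```

In a push arrangement the roles would be reversed: the component would need an address for the orchestrator and would initiate the status update itself.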
The OS shell is an orchestration language. Consider the shell `bash` or `zsh`. It has the ability to orchestrate Unix executables with a standard communication protocol of FIFO pipes and other IPC mechanisms. It also has the ability to share side-effectful state via the filesystem.
It is generally designed to work on a single local node. However, it is a Turing-complete programming language, and the same language is used both for interactive commands at a REPL prompt (terminal) and for scripts that encapsulate them.
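A small sketch of both mechanisms, under the assumption of a POSIX-like system with `mkfifo` available: two illustrative components (`producer` and `consumer`, names invented here) are wired through a named pipe, and the result is left on the filesystem as shared state.

```shell
#!/usr/bin/env bash
# Two components that know nothing about each other.
producer() { printf 'alpha\nbeta\ngamma\n'; }
consumer() { wc -l | tr -d ' '; }   # count lines, strip any padding

workdir=$(mktemp -d)
mkfifo "$workdir/pipe"

# The orchestrator wires them through a named pipe (FIFO)...
producer > "$workdir/pipe" &
# ...and stores the result on the filesystem as shared state.
consumer < "$workdir/pipe" > "$workdir/count"
wait

count=$(cat "$workdir/count")
rm -r "$workdir"
echo "$count"
```

The same lines can be typed one by one at an interactive prompt or saved as a script, which is the dual use mentioned above.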
AWS Step Functions is an Orchestrator.
AWS Step Functions runs tasks. There are 2 kinds of tasks: AWS Lambda Functions and Activity Tasks.
Lambda Functions are scripts with defined lambda function entrypoints. This is important because there is no binary executable and loadable format defined for AWS Lambda. Their execution environment is also quite limited, for example in maximum runtime (5 minutes at the time of writing).
Activity tasks can be ECS containers, and these can be long-running. AWS Step Functions expects the ECS container process to report its task status to AWS Step Functions through HTTP APIs (provided by language-specific SDKs). The relevant API methods are:
GetActivityTask
SendTaskSuccess
SendTaskFailure
SendTaskHeartbeat
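A worker built on these calls might look roughly like the sketch below, which uses the AWS CLI equivalents (`get-activity-task`, `send-task-success`, `send-task-failure`). It assumes the AWS CLI is installed and configured and that an activity already exists. The function name `run_worker_once` and the handler convention (read input on stdin, print output on stdout) are inventions for this example, not part of AWS.

```shell
#!/usr/bin/env bash
# Hypothetical worker step: fetch one activity task, run a handler, report back.
#   $1: activity ARN (assumed to exist already)
#   $2: name of a handler command that reads the task input on stdin and
#       prints the task output on stdout (illustrative convention)
run_worker_once() {
  local activity_arn=$1 handler=$2
  local tab response token input output
  tab=$(printf '\t')

  # GetActivityTask: the worker, not the orchestrator, initiates the exchange.
  response=$(aws stepfunctions get-activity-task \
    --activity-arn "$activity_arn" \
    --query '[taskToken, input]' --output text) || return 1

  # Text output is tab-separated: token first, then the JSON input.
  token=${response%%"$tab"*}
  input=${response#*"$tab"}

  # A null token (printed as "None") means no task was scheduled.
  if [ -z "$token" ] || [ "$token" = "None" ]; then
    return 0
  fi

  if output=$(printf '%s' "$input" | "$handler"); then
    # SendTaskSuccess: push the result back to Step Functions.
    aws stepfunctions send-task-success \
      --task-token "$token" --task-output "$output"
  else
    # SendTaskFailure: push the failure back instead.
    aws stepfunctions send-task-failure --task-token "$token" \
      --error HandlerFailed --cause "handler exited with a non-zero status"
  fi
}
```

A long-running handler would additionally need to call `aws stepfunctions send-task-heartbeat --task-token "$token"` periodically. That is left out here to keep the sketch short.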
AWS Step Functions is incapable of pushing information down to the tasks. Instead, the tasks are expected to launch and then query AWS Step Functions for their input. (Note that at this point in time, AWS Step Functions cannot directly launch ECS tasks; it can only do so by proxy through an AWS Lambda Function.) This means AWS Step Functions does not have a single language for orchestration: in order to use ECS tasks, you must spread the orchestration logic between the JSON configuration of AWS Step Functions and custom orchestration instructions in the AWS Lambda Function.
AWS Step Functions follows a push architecture from the perspective of the orchestrated components: the components are expected to push information to AWS Step Functions. It is not an orchestrator that polls its component tasks, which would be a pull-architecture.
For example, passing a variable to multiple stages of a pipeline, which is trivial in OS shell orchestrators, is not possible with AWS Step Functions:
foo=bar
command1 $foo | command2 $foo | command3 $foo
The push architecture inverts the coupling relationship between the application level and the infrastructure level. AWS is able to keep its orchestrator implementation generic and modular, because it does not need to hardcode knowledge about your system. Your components, on the other hand, cannot stay generic and modular, because they need to hardcode knowledge about AWS's system.
https://github.com/tweag/funflow
https://www.tweag.io/posts/2016-02-25-hello-sparkle.html
http://docs.dask.org/en/latest/spark.html
That's us!