@emdash
Last active January 2, 2022 05:47
On "Shell Actors" [Working Title]

Overview

I propose extending the existing notion of "structured data over pipes" to a paradigm that embraces explicit message-passing semantics.

I outline a set of complementary ideas, united under the banner of some suitably pithy term of art -- such as: shell actors, record streams, shell sockets, etc (the bikeshedding about which I wholeheartedly encourage).

Central to this paradigm is the abstract notion of a channel, supporting "atomic send and receive" operations through some concrete implementation.

Near the bottom rests a proposed concrete framing protocol with a set of useful properties I suspect will prove essential, and which may take on a life of its own, independent of oil or any other shell.

Some of these concepts may dovetail with, overlap with, or duplicate existing proposals or planned features. Bring these to my attention, and I will update the proposal to reflect them. I have a lot to catch up on.

Terminology

This is a summary of each of the key terms. A few are elaborated further in the Details section.

channel

Of all the notions presented, this is the only one I propose to reify as a first-class shell construct.

A channel is a shell interface providing "atomic" send and receive over some transport layer.

Like a pipe, a channel logically has two ends. A given process generally only sees one end.

Unlike a pipe, a channel only supports sending data in discrete messages.

  • Slogan: "A channel preserves message boundaries".
  • Slogan: "A channel is a record-oriented abstraction".

The fundamental operations a channel provides should be thought of as send and receive, regardless of the surface syntax or API eventually chosen.

A channel will usually wrap an underlying file descriptor with some concrete framing protocol. But this is an implementation detail that is only surfaced when the channel is created. Once created, a channel is an opaque object which only supports send, receive, and close operations of discrete message payloads.
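
To make the interface concrete, here is a minimal Python sketch of a channel end wrapping a pipe file descriptor. Everything in it is an assumption made for illustration: the `Channel` class, the `pipe_channel` helper, and the 4-byte big-endian length prefix used as the framing protocol. None of this is an existing Oil API.

```python
import os
import struct
from typing import Optional, Tuple

# Hypothetical framing: a 4-byte big-endian length prefix before each payload.
_HEADER = struct.Struct(">I")


class Channel:
    """One end of a message channel over a file descriptor (illustrative only)."""

    def __init__(self, fd: int) -> None:
        self._fd = fd

    def _read_exact(self, n: int) -> Optional[bytes]:
        # A read on a pipe may return fewer bytes than requested, so loop.
        chunks = []
        while n > 0:
            chunk = os.read(self._fd, n)
            if not chunk:
                # EOF. This sketch also returns None for a truncated frame.
                return None
            chunks.append(chunk)
            n -= len(chunk)
        return b"".join(chunks)

    def send(self, payload: bytes) -> None:
        data = _HEADER.pack(len(payload)) + payload
        while data:
            written = os.write(self._fd, data)
            data = data[written:]

    def receive(self) -> Optional[bytes]:
        """Return the next whole message, or None once the peer has closed."""
        header = self._read_exact(_HEADER.size)
        if header is None:
            return None
        (length,) = _HEADER.unpack(header)
        return self._read_exact(length)

    def close(self) -> None:
        os.close(self._fd)


def pipe_channel() -> Tuple[Channel, Channel]:
    """Create a (receiving end, sending end) pair backed by a unix pipe."""
    read_fd, write_fd = os.pipe()
    return Channel(read_fd), Channel(write_fd)


if __name__ == "__main__":
    rx, tx = pipe_channel()
    tx.send(b"hello")
    tx.send(b"\x00\xff binary")
    tx.close()
    print(rx.receive(), rx.receive(), rx.receive())
    rx.close()
```

The caller never sees the length prefix: once the pair exists, only whole payloads go in and come out, which is the opacity described above.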

transport layer

A transport layer is some mechanism by which data may be sent and received. It will usually be backed by a file descriptor, but it need not be.

From the perspective of OSH/Oil code, a transport layer is a potentially hidden implementation detail. A transport layer may or may not have some concrete representation observable by OSH/Oil code.

Most often, it will be a unix pipe, but it could be implemented in some more exotic fashion, e.g. shared memory.

message

  • aka record
  • aka frame

A self-contained chunk of data sent or received over a channel.

The contents of a message is referred to as its payload.

In OSH / Oil, a message will have a concrete payload type of either:

  • string, if the payload can be safely presented as "plain text".
  • bytestring, if the payload contains some arbitrary byte sequences.

Any individual message will be either text or binary. Any given channel may support text, binary, or both message types, depending on its concrete format.

A message payload is sent "by value" regardless of the transport layer. In other words: it is "logically copied". No downstream receiver should ever observe non-local mutations to a payload it has received. I.e. transport layers should never simply pass pointers.

There should be absolutely no "spooky action at a distance". Any such observable behavior should be considered a serious bug.

bytestring

A concrete shell data type, the moral equivalent of python's bytes type.

  • initially mutable, can be explicitly frozen.
  • Oil could provide some builtins for marshalling common formats (msgpack, utf8, etc) into native types.
  • or, scripts can manually decode the contents,
  • or, scripts can forward them to an external process in a variety of ways.

This type corresponds to a binary message. The channel format must support binary data to send or receive a bytestring value.

bytestream

An interface which models a uni-directional, linear sequence of bytes, of unknown length. It can be either read from, or written to, but not both.

The semantics are the moral equivalent of posix read and write operations on a pipe. The interface provides no support for random access.

A transport layer will usually be some form of bytestream, but need not be so long as it can emulate message-passing semantics.
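
The reason a bare bytestream is not yet a channel can be seen in a few lines of Python. This sketch (the function name is made up for illustration) writes two "messages" into a unix pipe and then reads once; on a typical POSIX system the single read returns both writes run together, so the boundary between them is gone.

```python
import os


def read_after_two_writes() -> bytes:
    """Write two separate chunks to a pipe, then perform a single read."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"first")
    os.write(write_fd, b"second")
    os.close(write_fd)
    # The pipe only holds a sequence of bytes; it does not remember
    # where one write ended and the next began.
    data = os.read(read_fd, 1024)
    os.close(read_fd)
    return data


if __name__ == "__main__":
    print(read_after_two_writes())
```

Recovering the two messages requires some agreed-upon framing on top of the bytestream, which is the job of a format.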

format

AKA framing protocol.

Any scheme for implementing channel message-passing semantics in terms of bytestreams, as described above.

See Formats in the details section for more.

Oil External Protocol (OEP)

A particular format optimized for maximum flexibility.

See the Details section for more.

Adapter

AKA shim.

Any external tool which can produce or consume structured data via OEP over pipes, or any other supported combination of transport layer and format.

Terms defined elsewhere

  • "actor", as in "the actor model"
  • "process" as in a unix process
  • TBD: scrub this for any other borrowed terms.

Applications / Motivating Examples

The theme is that efficient support for binary records can extend the reach of shell to new niches, and ease the use of shell in its current niches.

In no particular order....

Process Isolation: Communicating with sandboxed processes

A process isolated within a container, VM, or other sandbox environment can be difficult to interact with. Network and filesystem access is generally restricted, forcing developers to specify application-specific configuration of the sandboxed environment.

Unix pipelines would offer a simple method, but this is generally avoided for performance and security reasons, as it is usually restricted to plain-text protocols with all the attendant quoting hassles.

OEP channels would allow efficient communication with sandboxed processes operating on (potentially large) binary payloads through ordinary shell pipelines.

Distributed Systems

Using essentially the same approach as above, we can build distributed systems.

A shell pipeline operating with channel semantics essentially becomes an actor. A pipeline can't tell whether its components are

  • shell functions, running in the same shell process
  • some external process on the same machine
  • some process running on a remote machine
  • a real or virtual hardware device

In addition to shell-centric processing of messages, it's worth pointing out that OEP in particular, and other supported formats in general, may be forwarded directly over SSL, TCP, or HTTP(S), and are trivially adapted to WebSocket, MQTT, RSS, etc.

Media processing

Somewhat outrageously, an entire distributed media processing framework could be constructed in terms of tools that produce and / or consume OEP, all coordinated via simple shell scripts. Consider:

  • a demuxer reads a media stream, and outputs compressed frames via OEP.
  • a decoder consumes the compressed frames, and outputs raw frames via OEP.
  • or, a shell script could distribute a large workload among multiple worker hosts.
  • adapters could be written to support, e.g. hardware codecs, or GPU processing.

Concrete Example: Transcoding

A transcoder reduces to a pipeline of the form demuxer <options> | decoder <options> | encoder <options> | muxer <options>

Shell scripts could be used to provide "dynamic autoplugging" of components (in the style of the GStreamer framework), when the media formats in question are not statically known.

Concrete Example: CGI Render Farm

A CGI rendering process might consume scene data from upstream via OEP, producing rendered frames downstream via OEP.

The workload can be flexibly distributed across available machines.

For example, a render node could be the user's own gaming rig, or an ephemeral GPU instance rented temporarily, or an entire dedicated render farm, just by making small changes to a shell pipeline.

Concrete Example: Webcam Server

A shell script can be used to coordinate existing command-line tools into a custom webcam server.

  • a thin "v4l adapter" captures video frames from a v4l device, and writes them to stdout using OEP.
  • a thin "broadcast adapter" consumes the incoming frames via OEP from stdin, and serves them to network clients on a specified port, using some standard streaming protocol.

Punchline:

  • v4l_adapter $video_device | encoder $enc_opts | muxer $mux_opts | broadcaster $port
  • all this happens in memory

Event-Driven Programming

channel semantics AKA actors enable event-driven programming.

Concrete Example: Rich Interactive Shell

interactive shell hooks can now easily consume arbitrary structured data from:

  • a shell function
  • an external process
  • a kernel event stream
  • any other data source thus-far mentioned.

any interactive shell feature, including command completion, history queries, etc, can now be easily moved out of process if it proves too cumbersome / slow in shell.

punchline: your shell REPL could provide interactive error-checking and syntax highlighting from an external process communicating via LSP.

punchline: your shell prompt can be dynamically refreshed to stay in sync with any external data source you care to include. Eg:

  • a realtime clock
  • weather conditions
  • bus arrival times
  • current network status
  • current removable devices
  • etc....

Reclaiming the "System Layer"

AKA "SysVI Init" (i.e. what comes after "SysV Init").

The "system layer" encompasses what we used to call an "init system", but which now encompasses a whole host of far more dynamic behavior, namely:

  • device discovery at boot
  • service management, at boot and upon user request
  • device hot-plugging
    • including, but not limited to, removable media
  • network configuration
  • hardware configuration and dynamic control
  • tuning the system via kernel interfaces
  • responding to ACPI events

Historically, this was all done with shell scripts. This worked fine when hardware configurations were relatively static; more recently, the dynamic and event-driven nature of modern hardware and kernels has motivated the shift away from shell-based init systems.

By supporting event-driven programming, shell scripts can reclaim their birthright as the implementation language of choice for the "system layer" in modern distros.

All that's required is a suite of adapters to present a uniform, channel-oriented view of the various ad-hoc kernel interfaces (evdev, acpi, udev, etc).

Other Project Ideas (TBD)

  • media tags db example:

    • imagine a database containing large-ish image files, and we want to extract all the EXIF tags.
    • write a shell script to do this.
      • implement a thin SQL client adapter to output each image file as an OEP record on stdout
      • iterate over each image record with a plain shell loop
      • pass each opaque bytestring directly to imagemagick, which in turn prints the tags directly to stdout as text.
      • note that all processing occurs without creating intermediate files.
  • CAN-bus decoder

    • implement a CAN adapter which forwards CAN bus messages to stdout using OEP.
      • requires a system with a CAN bus interface, like Raspberry PI or a USB dongle
    • process the CAN data via shell.
      • can dynamically look up CAN PIDs
    • trouble-shoot your CEL using a shell script!

Details

write builtin

Inverse, or dual, of read builtin.

Some mechanism must be introduced which can transfer the contents of a string or bytestring directly to a bytestream with minimal overhead, and with no undesired or implicit alterations to the payload.

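echo is not adequate for this. Although most shells (bash, dash, zsh) provide it as a builtin, it appends a newline by default and, depending on the shell and its flags, may interpret backslash escapes in the payload.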
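
As a rough model of the semantics such a builtin would need, here is a Python sketch (the name `write_all` is hypothetical). It writes the payload to a file descriptor byte for byte, retrying on partial writes, and adds nothing: no trailing newline, no escape processing.

```python
import os


def write_all(fd: int, payload: bytes) -> None:
    """Write every byte of payload to fd, unmodified."""
    view = memoryview(payload)
    while len(view) > 0:
        # os.write may accept fewer bytes than offered; continue from there.
        written = os.write(fd, view)
        view = view[written:]


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    # Backslashes, NUL bytes, and the absence of a final newline all survive.
    write_all(write_fd, b"a\\nb\x00c")
    os.close(write_fd)
    print(os.read(read_fd, 64))
    os.close(read_fd)
```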

Formats

A format structures a bytestream into messages.

  • encodes a bytestring as a binary frame, if supported.
  • encodes a string as a text frame, if supported.

Formats should also:

  • propagate in-band errors to a shell trap
  • gracefully handle eof.

Formats that should be supported:

  • text:

    • raw, with explicit delimiters and escapes.
      • you must specify at least how the delimiter is escaped.
      • can't support all variants, and that's the point
    • textual record formats: QSN, newline-delimited JSON, etc
    • OEP
  • binary:

    • raw: fixed length packet, no delimiter
    • raw: length-prefixed binary, fixed max length
    • base64: with explicit delimiter (chosen from a white-list)
    • netstring
    • OEP
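
As one example from the list above, netstring framing is simple enough to sketch in full. A netstring is the payload length in ASCII decimal, a colon, the payload bytes, and a trailing comma. The Python function names below are invented for this illustration, and the decoder works on an in-memory buffer rather than a live bytestream to keep it short.

```python
from typing import Iterator


def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstrings(buf: bytes) -> Iterator[bytes]:
    """Yield each payload from a buffer of back-to-back netstrings."""
    pos = 0
    while pos < len(buf):
        colon = buf.index(b":", pos)  # raises ValueError if no colon is found
        length_field = buf[pos:colon]
        if not length_field.isdigit():
            raise ValueError("malformed netstring length")
        length = int(length_field.decode("ascii"))
        start = colon + 1
        end = start + length
        if buf[end:end + 1] != b",":
            raise ValueError("netstring is truncated or missing its comma")
        yield buf[start:end]
        pos = end + 1


if __name__ == "__main__":
    stream = encode_netstring(b"hello world!") + encode_netstring(b"a,b:c\n")
    print(stream)
    print(list(decode_netstrings(stream)))
```

Because the length is known before the payload is read, the payload may contain commas, colons, newlines, or arbitrary binary data without any escaping.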

Oil External Protocol (OEP)

Oil should define a concrete binary framing protocol which allows mixing text and binary frames with low overhead.

Websocket's framing format is the inspiration. See: https://datatracker.ietf.org/doc/html/rfc6455#section-5 for the relevant portion of the spec.

Here's a summary of its useful properties:

  • distinguishes payload type at the frame level, allowing mixed mode channels.
  • imposes no arbitrary limit on payload size.
  • requires minimal processing of the payload data itself.
    • just bitwise XOR
  • efficiently encodes a range of payload sizes:
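    • uses length prefixing throughout: the WebSocket header stores lengths up to 125 bytes in a single byte.
    • longer payloads use an extended length field: 16 bits for up to 65,535 bytes (64k), and 64 bits beyond that.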
    • larger payloads automatically chunked, in which case a flag on the final packet establishes the message boundary.
  • provides some safety against bogus or malicious payloads
    • user cannot inject arbitrary bytes on the underlying bytestream, intentionally or otherwise.
      • the payload can be masked with a platform-chosen 32-bit value. this can be disabled for debugging.
    • an additional checksum helps detect malformed payloads (this would be an OEP addition; RFC 6455 framing does not define one).
  • allows for low-latency (see below)
  • ease of implementation and specification, since it can be derived from some existing work.
  • provides out-of-band signal path for errors, control, or other unanticipated needs.
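
Since OEP itself is not yet specified, the following Python sketch implements the RFC 6455 base framing that inspires it, not OEP. It covers the FIN flag, the text/binary opcode, the three length encodings, and optional XOR masking; the function names are made up, and extensions, control frames, and reassembly of fragmented messages are left out.

```python
import struct
from typing import Optional, Tuple

OP_TEXT = 0x1    # opcode for a text frame in RFC 6455
OP_BINARY = 0x2  # opcode for a binary frame in RFC 6455


def _xor_mask(data: bytes, key: bytes) -> bytes:
    # Masking and unmasking are the same operation: XOR with the 4-byte key.
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


def encode_frame(payload: bytes, opcode: int = OP_BINARY, fin: bool = True,
                 mask_key: Optional[bytes] = None) -> bytes:
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:        # length fits in the 7-bit field
        header = struct.pack(">BB", first, mask_bit | n)
    elif n <= 0xFFFF:   # 126 signals a 16-bit extended length
        header = struct.pack(">BBH", first, mask_bit | 126, n)
    else:               # 127 signals a 64-bit extended length
        header = struct.pack(">BBQ", first, mask_bit | 127, n)
    if mask_key is None:
        return header + payload
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    return header + mask_key + _xor_mask(payload, mask_key)


def decode_frame(buf: bytes) -> Tuple[bool, int, bytes, bytes]:
    """Decode one frame; return (fin, opcode, payload, remaining bytes)."""
    if len(buf) < 2:
        raise ValueError("truncated frame header")
    first, second = buf[0], buf[1]
    fin, opcode = bool(first & 0x80), first & 0x0F
    masked, n = bool(second & 0x80), second & 0x7F
    pos = 2
    try:
        if n == 126:
            (n,) = struct.unpack_from(">H", buf, pos)
            pos += 2
        elif n == 127:
            (n,) = struct.unpack_from(">Q", buf, pos)
            pos += 8
    except struct.error as exc:
        raise ValueError("truncated extended length") from exc
    key = None
    if masked:
        key = buf[pos:pos + 4]
        if len(key) != 4:
            raise ValueError("truncated mask key")
        pos += 4
    payload = buf[pos:pos + n]
    if len(payload) != n:
        raise ValueError("truncated payload")
    if key is not None:
        payload = _xor_mask(payload, key)
    return fin, opcode, payload, buf[pos + n:]


if __name__ == "__main__":
    frame = encode_frame(b"Hello", OP_TEXT)
    print(frame.hex(), decode_frame(frame))
```

The small-payload case costs two bytes of header, which is where the "low overhead" claim comes from. A checksum, if OEP adopts one, would need an extra field that this layout does not have.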

Reference implementation would exist as stand-alone library against which oil links, enabling 3rd party tools to re-use Oil's internal implementation.

It would also be fully specified, to encourage 3rd party implementations (pure Python, Rust, Haskell, etc).

I'm open to renaming this something catchy and lipid-themed.

Support "Low Latency" for Event-Driven Code

Websocket enables low-latency message delivery over HTTP. OEP should enable low-latency message delivery over bytestreams.

Note: this is latency on par with websockets. Good enough for interactive updates, not necessarily good enough for real-time audio.

At minimum, a send on a channel should implicitly flush any buffered output to the underlying file descriptor (the moral equivalent of fflush), so that buffered IO doesn't cause unpredictable delays downstream.

Obviously one can take it further than that, but this would get the ball rolling and work well enough for many uses.

Alternatively, if throughput is paramount, some sort of shell option or channel configuration can suppress this behavior.
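
The effect of the implicit flush can be demonstrated with a buffered writer on a pipe. In this Python sketch, `send` and its `flush` parameter are hypothetical stand-ins for the channel operation and the proposed opt-out; the message stays invisible to the reader until the flush happens.

```python
import os
import select
from typing import BinaryIO, Tuple


def send(out: BinaryIO, payload: bytes, flush: bool = True) -> None:
    """Write one message; flush by default so the receiver sees it promptly."""
    out.write(payload)
    if flush:
        out.flush()


def readable(fd: int) -> bool:
    """Report whether fd has data available right now, without blocking."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


def demo_flush() -> Tuple[bool, bool]:
    read_fd, write_fd = os.pipe()
    out = os.fdopen(write_fd, "wb")  # buffered writer by default
    send(out, b"tick\n", flush=False)
    before = readable(read_fd)  # the bytes are still in the writer's buffer
    send(out, b"tock\n")
    after = readable(read_fd)   # the flush pushed both messages downstream
    out.close()
    os.close(read_fd)
    return before, after


if __name__ == "__main__":
    print(demo_flush())
```

With `flush=False` the small message waits until the buffer fills or the writer closes, which is fine for bulk throughput and bad for interactive updates.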

Adapters

Adapters are small, potentially even trivial external tools which lift and / or lower data to / from OEP.

They are just external tools which sit on either end of a shell pipeline.

Ideally, they perform minimal processing of the payload, besides optionally marshalling well supported formats.

Examples:

  • websocket client adapter.
    • establishes connection, forwards messages to stdin/out.
    • supporting OEP allows routing of even binary messages.
    • any process speaking OEP now also speaks websocket.
  • polling adapter
    • output contents of a file to OEP at a specified rate
    • useful for testing / debugging
  • inotify adapter
    • like above, but receives kernel notifications to avoid polling and lower latency.
  • udev adapter
    • watch for hotplug events
    • useful for system-layer (init scripts).
  • CAN bus adapter
    • useful for hardware hacking on devices with CAN bus interfaces.
    • you can talk to your ECU with a shell script!
  • sql client adapter
    • handles (potentially large) binary database records
    • connects to a local or remote db, and either
      • executes a SELECT query, lifting the result to OEP, or
      • executes an INSERT query, consuming OEP
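
To give a sense of how small an adapter can be, here is a Python sketch of the polling adapter. Because OEP is not yet defined, it stands in a 4-byte big-endian length prefix for the real framing. The function name, its parameters, and the `count` limit (added so the example terminates) are all assumptions.

```python
import io
import struct
import time
from typing import BinaryIO


def polling_adapter(path: str, out: BinaryIO, interval: float, count: int) -> None:
    """Emit the contents of path as one framed message, count times."""
    for i in range(count):
        with open(path, "rb") as f:
            payload = f.read()
        # Stand-in framing (NOT OEP): 4-byte big-endian length, then the payload.
        out.write(struct.pack(">I", len(payload)) + payload)
        out.flush()  # keep latency low for downstream consumers
        if i + 1 < count:
            time.sleep(interval)


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"status: ok\n")
    sink = io.BytesIO()
    polling_adapter(tmp.name, sink, interval=0.0, count=2)
    os.unlink(tmp.name)
    print(sink.getvalue())
```

In a real pipeline the sink would be the process's binary stdout rather than an in-memory buffer, and the adapter would run until interrupted.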

optional: "internal IO" aka "var redirection"

Shorthand for treating contents of variable as stdin/out of a single command. Might be handy. Not strictly necessary. Maybe not all that readable.

  • something like bar <$ foo, with optional bar $> foo,
    • the latter is equivalent to foo=$(bar);

optional: async receive

Oil could provide some mechanism to invoke a shell function, oil process, or oil block whenever a channel receives a message.

It's unclear to me if this feature is truly necessary, but it mirrors the asynchronous API which browsers provide for async websocket message receipt.

Data on the channel would be decoded in the background, while normal script execution continues.

This facility might simplify the implementation of interactive shell conveniences, such as command-completion, while making them more responsive.

However, it potentially requires that Oil/OSH implement an internal event loop to handle the background decoding, a heavy-weight component that shells have not traditionally required.

Moreover, years of working with callbacks in languages like JavaScript suggest that callback-based interfaces are not desirable. I'm not particularly thrilled at the prospect of appending "callback hell" to the already lengthy list of shell-programming pitfalls.

Languages like JavaScript and C use callbacks to work around their inherent single-threadedness and lack of constructs for explicit concurrency. Shell, which has long embraced explicit concurrency, can express the same patterns via communicating processes, each with its own "locally synchronous" view of the world.
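
The "locally synchronous" style can be sketched in Python with the standard library's `multiprocessing.Pipe`, whose connection objects already preserve message boundaries via `send_bytes` and `recv_bytes`. To keep the example self-contained, a thread stands in for the separate process, and the `consumer` and `run_demo` names are made up. The point is only that the consumer is an ordinary blocking loop with no callbacks registered anywhere.

```python
import threading
from multiprocessing import Pipe
from typing import List


def consumer(conn, results: List[bytes]) -> None:
    """Handle messages one at a time with plain, blocking receives."""
    while True:
        try:
            msg = conn.recv_bytes()  # blocks until a whole message arrives
        except EOFError:
            break                    # the sending end was closed
        results.append(msg.upper())


def run_demo() -> List[bytes]:
    rx, tx = Pipe(duplex=False)  # one-way: rx receives, tx sends
    results: List[bytes] = []
    worker = threading.Thread(target=consumer, args=(rx, results))
    worker.start()
    # The sender continues with its own work while the consumer runs.
    for msg in (b"tab", b"completion", b"request"):
        tx.send_bytes(msg)
    tx.close()
    worker.join()
    rx.close()
    return results


if __name__ == "__main__":
    print(run_demo())
```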

I'm not entirely opposed, but I will wait until a convincing argument in favor of this feature -- or some concrete code demonstrably improved by it -- presents itself.
