Skip to content

Instantly share code, notes, and snippets.

@zblanco
Created April 24, 2025 15:41
Show Gist options
  • Save zblanco/ebe78e9ff7f7eb3dd0a7fb151ab255b2 to your computer and use it in GitHub Desktop.
Save zblanco/ebe78e9ff7f7eb3dd0a7fb151ab255b2 to your computer and use it in GitHub Desktop.
What are graphs and why are they everywhere?

What are graphs and why are they everywhere?

Mix.install([
  {:kino, "~> 0.15.3"},
  {:libgraph, "~> 0.16.0"}
])

Graphs not Charts

A graph is

A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges (also called links or lines), and for a directed graph are also known as edges but also sometimes arrows or arcs.

Charts:

charts

Graphs:

network graph

TLDR;

A graph is circles and lines. Nodes and connections. Vertices and edges.

What about DAGs?

DAG stands for Directed Acyclic Graph

i.e. Circles with lines that have an arrow between them.

The "acyclic" means no "loop backs" i.e. there cannot be "cycles" where an arrow takes you back up the graph where you've already been.

DAGs for Dataflow

DAGs are especially good at representing dataflow in a pipeline where each node/vertex represents a step where a computation occurs, transforming the input and producing a new output.

Once a step such as a in the image above completes, we know we can execute a and c in parallel.

All a dataflow graph is, is just data moving following the arrows down.

Dataflow Graphs naturally model parallelism.

Graphs can model programs which can change at runtime; managed dynamically.

DAGs / Dataflow Graphs are everywhere in computing

Flink

flink dataflow visual

Business workflow modeling

bpmn-viz

Streaming Materialized Views

materialize

Rete Network

rete

More graphs in software

Graphs in the Elixir ecosystem

Making your own DAG

Elixir graph libraries:

  • Libgraph

    • Uses consistent hashing :erlang.phash2/1 and vanilla maps and mapsets
  • Erlang STDLib's Digraph

    • Implemented before Maps were in Erlang and uses ETS tables instead
defmodule GraphTools do
  def to_mermaid(%Graph{} = graph) do
    Enum.reduce(
      Graph.edges(graph),
      "graph TD;",
      fn
        %{v1: v1, v2: v2}, acc ->
          acc <>
            "#{vertex_id(v1)}([#{vertex_name(v1)}])-->#{vertex_id(v2)}([#{vertex_name(v2)}]);"
      end
    )
  end

  defp vertex_name(:root), do: :root
  defp vertex_name(%{work: fun}), do: "step_#{fun_name(fun)}"
  defp vertex_name(%{check: fun}), do: "cond_#{fun_name(fun)}"
  defp vertex_name(fun) when is_function(fun), do: "fun_#{fun_name(fun)}"
  defp vertex_name(vertex) when is_atom(vertex), do: to_string(vertex)

  defp vertex_name(otherwise) do
    try do
      to_string(otherwise)
    catch
      _any -> :erlang.phash2(otherwise)
    end
  end

  defp vertex_id(thing) when is_atom(thing), do: to_string(thing)
  defp vertex_id(thing), do: :erlang.phash2(thing)

  defp fun_name(fun), do: Function.info(fun, :name) |> elem(1)
end
g =
  Graph.new(type: :directed)
  |> Graph.add_edges([
    {:a, :b},
    {:a, :c},
    {:b, :d}
  ])

g
|> GraphTools.to_mermaid()
|> Kino.Mermaid.new()

Making a Dataflow DAG

Functions are just data and graphs are data structures, so what if we put functions in our graphs?

fun_1 = fn num -> num * 2 end
fun_2 = fn num -> num + 2 end
fun_3 = fn num -> num * 42 end
fun_4 = fn num -> num + 42 end

fun_dag =
  Graph.new(type: :directed)
  |> Graph.add_edges([
    {:root, fun_1},
    {fun_1, fun_2},
    {fun_1, fun_3},
    {fun_2, fun_4}
  ])

fun_dag
|> GraphTools.to_mermaid()
|> Kino.Mermaid.new()

Now that we've got lambda functions in our dag - how to we evaluate it?

For some input

  1. start at the top and run that function with the input to get a new output
  2. find the next step(s) by traveling down the arrows (i.e. the out_neighbors)
  3. feed the output from the previous step into the next steps
  4. Profit
input = 42

neighbors_of_root = Graph.out_neighbors(fun_dag, :root)
first_runnables = Enum.map(neighbors_of_root, fn fun -> {fun, input} end)
first_result =
  first_runnables
  |> Enum.map(fn {fun, input} -> fun.(input) end)
  |> List.first()
fun_from_first_runnable = first_runnables |> List.first() |> elem(0)

neighbors_of_first_runnable =
  fun_dag
  |> Graph.out_neighbors(fun_from_first_runnable)
  |> Enum.map(fn fun -> {fun, first_result} end)
  |> Enum.map(fn {fun, input} ->
    fun.(input)
  end)

Wow neat - but why couldn't we just do this in normal code?

  1. Parallelization opportunities
  2. Change the dag at runtime

Sprinkling in some concurrency

neighbors_of_first_runnable =
  fun_dag
  |> Graph.out_neighbors(fun_from_first_runnable)
  |> Task.async_stream(fn fun ->
    fun.(first_result)
  end)
  |> Enum.map(fn {:ok, result} -> result end)

We could've also...

used Flow, Genstage, Pools of Genservers, spawn primitives, poolboy, oban, rabbitmq, ..........

Changing our DAG

new_fun_dag = Graph.add_edge(fun_dag, fun_from_first_runnable, fn _num -> 42 end)

new_fun_dag
|> GraphTools.to_mermaid()
|> Kino.Mermaid.new()

Summary

  • Graphs are everywhere because they're useful for modeling relationships between things such as task dependencies (e.g. dataflow)

  • Elixir is well suited for building things with dataflow graphs because the runtime is built for concurrency and distribution

Graphs let us model parallelism in a runtime-modifiable program enabling a "functional-core imperative shell" separation of concerns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment