@PJUllrich
Created May 12, 2026 10:58
The Prompts I use for finding Vulnerabilities in Elixir/Erlang projects
defmodule MyApp.Prompts.Audit do
  @moduledoc """
  Prompts for the audit pipeline. Two entry points:

    * `audit_file/4` — embeds a single source file in the prompt and
      runs `MyApp.CodingAgent` against it. Style is `:simple` or
      `:deep`; the executor picks based on `audit.strategy`.
    * `audit_directory/2` — whole-package audit. Spawns the agent with
      `:cwd` set to the source dir so it can use Read/Grep/Bash.
  """

  alias MyApp.CodingAgent

  @per_file_timeout to_timeout(minute: 10)
  @whole_timeout to_timeout(hour: 1)
  @default_effort "max"
  @sink_classes """
  Sink classes — every place dangerous logic could live, regardless of whether
  the input currently looks hostile. Enumerate first, judge second.

    * Code execution — eval, dynamic dispatch on a computed name (`apply`,
      `Code.eval_*`, `:erlang.apply/3` with computed args), code loaded from a
      computed path, regex with embedded-code constructs.
    * Command execution — `System.cmd`, `:os.cmd`, `Port.open({:spawn, …})`,
      shelling out where args are built by concatenation rather than passed as
      a list.
    * File operations — `File.read/write/rm/cp/ln/chmod` where the path is
      computed; `Code.require_file` / `Code.eval_file` with dynamic paths.
    * Path handling — `Path.join/expand/relative_to`, traversal, symlink
      following, case-fold confusion on case-insensitive filesystems.
    * Archive extraction — `:erl_tar`, `:zip`, any unpack where entry names
      become filesystem paths (zip-slip).
    * Deserialisation — `:erlang.binary_to_term/1` (no `:safe`),
      `Plug.Crypto.non_executable_binary_to_term/2` misuse, YAML/Marshal-style
      formats that instantiate types during parse.
    * Template / interpolation — values reaching another interpreted context
      without escaping for it: HTML, SQL via raw fragments, EEx/Phoenix
      `raw/1`, shell, regex, format strings, log lines.
    * Network — clients that follow redirects, accept URLs from input, resolve
      hostnames from data, TLS verification disabled (`verify: :verify_none`),
      proxy handling.
    * Validation — predicates whose contract is "this is safe": the sink is
      the return value, the danger is returning the wrong answer.
    * Cryptography — KDF parameters, IV reuse, mode/padding, MAC verification,
      `==` on secrets instead of `Plug.Crypto.secure_compare/2`.
    * Memory safety — Rust `unsafe`, raw pointers, unchecked indexing, FFI,
      transmute. For NIFs: lifetime/aliasing across the BEAM boundary.
    * Shared mutable state — `Application.put_env/3` from input, ETS/DETS,
      `:persistent_term`, environment variables, signal handlers, Logger
      backends. One input poisoning what another sees.
    * Concurrency — check-then-act sequences a racer can interleave: file
      existence before open, permission before access, GenServer state read
      then written without serialisation.
    * Resource consumption — atom leaks (`String.to_atom/1` on input),
      unbounded loops/allocs, regex prone to catastrophic backtracking,
      decompression with attacker-controlled ratio.
    * Reflection / metaprogramming gadgets the library installs into the
      caller — `__using__` macros, `@before_compile`, telemetry handler
      attaches, Logger backends, monkeypatched callbacks. The library *chose*
      to install the gadget; consumer wiring is a reach question, not a
      reason to drop the sink.
    * Round-trip integrity — pairs meant to be inverses: `encode`/`decode`,
      `parse`/`serialize`, `marshal`/`unmarshal`. The sink is the pair. The
      danger is asymmetry — if `decode(encode(x)) ≠ x`, or encode emits raw
      what decode interprets, a value can change meaning across a store-and-
      reload cycle and bypass parse-time validation on re-parse.
  """
  @per_file_deep_methodology """
  ## Methodology

  Two phases. Don't skip phase 1 — skipping it is what makes audits miss bugs.

  Phase 1 — inventory. List every sink in this file using the sink classes
  below. Don't judge any of them yet — a sink is dangerous-if-input-is-hostile,
  regardless of whether you currently think the input is hostile. Grep
  exhaustively for the language's primitives in each class.

  Phase 2 — for each sink in your inventory, in order:

    1. Trace — where does the value come from? If it's a hardcoded constant
       or internal data only, write "internal" and stop.
    2. Boundary — does it originate from a function parameter exposed
       publicly, or some other source crossing a trust boundary? The
       library's caller is *not* the attacker — but data the caller
       forwards from the network, from disk, or from deserialisation is.
    3. Validate — sketch a one-paragraph reproduction (input → effect).
       If a guard in the file rules it out, name the guard and stop.
    4. Impact — what is the real-world impact? What can an attacker who
       exploits this actually do? Explain it in simple terms and plain language.
    5. Rate — Critical / High / Medium / Low.

  Every sink ends up either as a finding or in `## Ruled out` with the
  step that disqualified it.
  """
  @whole_methodology """
  ## Methodology

  Two phases. Phase 1 is an inventory — write it down before judging anything.
  Two runs against the same source should produce the same inventory.

  ### Phase 1: Boundaries + inventory

  Before listing sinks, name the trust boundaries. For a small library this
  is one or two lines: who calls it, what they pass, where external data
  enters. Larger codebases get a table — actor, what they control, trusted
  yes/no, where you found it documented. The per-sink boundary check in
  Phase 2 references this list; it does not re-derive boundaries per sink.

  Then enumerate every sink. For each: file, line, sink class, what it
  consumes. Don't judge any of them yet — a sink is dangerous-if-input-is-
  hostile, regardless of whether you currently think the input is hostile.
  Grep exhaustively for the language's primitives in each class.

  ### Phase 2: Per-sink — six steps in order

  Stop when a step rules the sink out and record which step did. Every
  inventory sink ends up either in `findings` or in `ruled_out`.

    1. Trace — backwards from sink to a boundary. Name each hop. If the
       value never crosses a boundary, write "internal" and stop.
    2. Boundary — which boundary from Phase 1 does it cross? The library
       caller is not the attacker; documented config / operator-set values
       are trusted unless the docs say otherwise. Cite the doc. Also: check
       that a precondition does not subsume the conclusion (an attack that
       requires write access to a directory whose contents are documented
       as executable is circular).
    3. Validate — write a reproduction script. For Elixir, a short `.exs`
       under `scripts/{package_name}/{short_description}.exs` runnable via
       `Mix.install` is ideal. DO NOT execute it; the human will. Paste the
       script in the `validation` field. For round-trip pairs, the script
       runs `decode(encode(x))` and `encode(decode(s))` with structural
       characters and shows the asymmetry.
    4. Prior art — `git log --all --grep` and `git log -S` for the function
       name and key strings; read closed issues/PRs; check whether the
       behaviour is required by an RFC. If a maintainer already declined,
       quote the comment.
    5. Reach — for libraries: which kind of consumer would wire hostile
       input here. You don't have dependents data; reason about plausible
       call patterns. "No plausible exposed caller" is data, not a verdict.
    6. Rate — severity + confidence. Critical = works on a fresh install,
       no preconditions. High = realistic preconditions a normal deployment
       satisfies. Medium = significant attacker positioning, unusual config,
       or a chain. Low = unrealistic preconditions or narrow impact.
  """
  @per_file_deep_output """
  ## Output

  Use plain, easy-to-understand, and concise language. Focus on the real-world
  impact of the findings.

  If the file has no sinks at all (truly nothing dangerous-looking to even
  consider), output exactly:

  No findings.

  Otherwise, for each finding output one block in this format:

  ### <Short title>
  **Severity:** Critical | High | Medium | Low
  **Location:** <relative/path>:<line> | <relative/path>:<line_start>-<line_end>
  **Class:** <sink class>
  **Trace:** <one short paragraph backwards from sink to where the
  value enters this file>
  **Boundary:** <which trust boundary the input crosses, or "internal">
  **Impact:** <a short paragraph on the impact of the finding>
  **Validation:** <one short paragraph reproduction sketch — input that
  would trigger the sink and what dangerous behaviour follows. If a
  guard in the file blocks it, name the guard.>
  **Suggested fix:** <one or two sentences>

  Then, if any sinks were considered and dropped, append:

  ## Ruled out
  - `<file>:<line>` (<sink class>, step N) — <one-sentence reason>

  Listing ruled-out sinks is required when phase 1 found any — it's how the
  audit demonstrates it considered them. No preamble, no overall summary.
  """
  @whole_output """
  ## Output

  Always output the full report — boundaries and inventory must be present
  even when nothing rises to a finding. Format:

  ## Trust boundaries

  | Actor | Trusted | Controls | Source |
  |-------|---------|----------|--------|
  | <name> | yes/no/conditional | <what they control> | <doc citation> |

  ## Inventory

  | ID | Location | Class | Consumes |
  |----|----------|-------|----------|
  | S1 | <rel/path>:<line> or <rel/path>:<line_start>-<line_end> | <sink class> | <what it consumes> |

  ## Findings

  ### F1 — <short title>
  **Severity:** Critical | High | Medium | Low
  **CWE:** CWE-NNN
  **Location:** <rel/path>:<line> | <rel/path>:<line_start>-<line_end>
  **Sinks:** S1[, S2…]
  **Trace:** <markdown>
  **Boundary:** <markdown>
  **Validation:** <markdown — include the reproduction script verbatim
  under a fenced code block. Do NOT execute it; the human will.>
  **Prior art:** <markdown — git log / issues / RFC citations>
  **Reach:** <markdown — plausible exposed callers>
  **Rating:** <markdown — severity + confidence rationale>
  **Suggested fix:** <one or two sentences>

  ## Ruled out
  - **S2, S3** (step N) — <one or two sentences>

  Use `## Findings\\n\\n_None._` for a clean report — never omit the section.
  Every inventory sink ID must appear in either `Findings → Sinks:` or in
  the `Ruled out` list. No preamble, no overall summary, no closing notes.
  """
  @always_flag """
  ## Always-flag

  Some sinks are dangerous enough on sight that the trace/boundary check is
  skipped — flag every occurrence as a finding even if you can't trace where
  the input comes from.

    * **`:erlang.binary_to_term/1`, or `:erlang.binary_to_term/2` without
      `:safe` in the options list.** Untrusted-binary deserialisation creates
      arbitrary atoms (atom-table exhaustion DoS), can construct fun /
      reference / pid terms that crash or hijack callers, and bypasses
      parse-time validation entirely. The safe alternatives are
      `:erlang.binary_to_term(bin, [:safe])` and
      `Plug.Crypto.non_executable_binary_to_term/2`. Severity: **Critical**.
      Report once per call site. If the same module also exposes the wrapper
      that reaches the call site, mention the wrapper in the trace, but do
      not skip the finding for lack of a traced caller.
    * **`:erlang.binary_to_term/2` with `:safe`.** `:safe` blocks new atoms
      and funs, but the decoded term is still attacker-shaped: deeply nested
      structures cause memory amplification, existing atoms can still be
      referenced (so any atom the BEAM has loaded is fair game), and callers
      that pattern-match on a specific shape can crash or be confused. Worth
      a note so reviewers can confirm the caller validates the result.
      Severity: **Low**.
  """
  @simple_prompt """
  You are a senior application security engineer auditing one source file from
  an open-source Elixir/Erlang or Rust library. Find real, exploitable
  vulnerabilities only — no style, no speculation.

  You see this one file in isolation. Flag only bugs you can argue from this
  file alone. Skim the file with the vector list below in mind and report
  what's actually dangerous; don't write up an inventory or methodology.

  #{@always_flag}
  #{@sink_classes}

  ## Output

  If the file has no real vulnerabilities, output exactly:

  No findings.

  Otherwise, for each finding output one block in this format:

  ### <Short title>
  **Severity:** Critical | High | Medium | Low
  **Location:** <relative/path>:<line> | <relative/path>:<line_start>-<line_end>
  **Description:** <one short paragraph: what's vulnerable and how it
  could be exploited. If a guard in the file blocks the obvious attack,
  name the guard.>
  **Suggested fix:** <one or two sentences>

  No preamble, no overall summary, no ruled-out section.
  """
  @deep_prompt """
  You are a senior application security engineer auditing one source file from
  an open-source Elixir/Erlang or Rust library. Find real, exploitable bugs
  only — no style, no speculation.

  You see this one file in isolation. You cannot trace inputs across modules
  or check reach. Flag only bugs you can argue from this file alone.

  #{@per_file_deep_methodology}
  #{@always_flag}
  #{@sink_classes}
  #{@per_file_deep_output}
  """
  @whole_prompt """
  You are a senior application security engineer. Audit the open-source
  Elixir/Erlang or Rust library in the current working directory for real,
  exploitable vulnerabilities.

  Use the tools available to you (Read, Grep, Glob, Bash) to explore the
  codebase, follow data flow across modules, inspect call graphs, and check
  commit history (`git log --all --grep`, `git log -S`) for unpatched variants
  of past bugs. Spend effort proportional to the package's risk surface.

  #{@whole_methodology}
  #{@always_flag}
  #{@sink_classes}
  #{@whole_output}
  """
  @doc """
  Audit a single file. `style` is `:simple` or `:deep`; `opts` may
  override `:effort` and `:timeout_ms`.
  """
  def audit_file(rel_path, content, style, opts \\ [])
      when is_binary(rel_path) and is_binary(content) and style in [:simple, :deep] do
    CodingAgent.run(build_for_file(style, rel_path, content),
      effort: Keyword.get(opts, :effort, @default_effort),
      timeout_ms: Keyword.get(opts, :timeout_ms, @per_file_timeout),
      agent: Keyword.get(opts, :agent)
    )
  end

  @doc """
  Audit a whole package. `cwd` is the source directory the agent runs
  in. `opts` may override `:effort` and `:timeout_ms`.
  """
  def audit_directory(cwd, opts \\ []) when is_binary(cwd) do
    CodingAgent.run(@whole_prompt,
      cwd: cwd,
      effort: Keyword.get(opts, :effort, @default_effort),
      timeout_ms: Keyword.get(opts, :timeout_ms, @whole_timeout),
      agent: Keyword.get(opts, :agent)
    )
  end

  defp build_for_file(style, rel_path, content) do
    Enum.join(
      [base_for(style), "", "File path: #{rel_path}", "```", content, "```"],
      "\n"
    )
  end

  defp base_for(:simple), do: @simple_prompt
  defp base_for(:deep), do: @deep_prompt
end