@Dev-iL
Last active May 15, 2026 11:53

Claude skill for creating Airflow-specific Ruff rules (`new-airflow-rule`)

new-airflow-rule

A Claude Code skill that guides the end-to-end implementation of a new Airflow lint rule (the AIR prefix) in the Ruff codebase.

What it does

When invoked, the skill walks the agent through:

  1. Pre-implementation interview — establishes the Airflow version target (2, 3, or both), the DAG-API surface (TaskFlow vs. operator API vs. both), and whether a local Airflow checkout is available for validation.
  2. Rule numbering — picks a code in the right range (AIR0xx for general best-practice, AIR3xx for migration rules).
  3. Code generation — produces a rule file from one of two skeletons, registers it in codes.rs, exports it from rules/mod.rs, and wires up the dispatch in checkers/ast/analyze/{statement,expression}.rs.
  4. Fixture + snapshot workflow — creates a positive/negative test fixture, runs cargo nextest, accepts insta snapshots, regenerates schemas via cargo dev generate-all, and runs clippy + uvx prek run -a.
  5. Real-world validation — runs the new rule against a local Airflow checkout (if available) to surface false positives before finalizing.

The skill encodes conventions specific to this codebase — Dag/DAG terminology, seen_module(Modules::AIRFLOW) guards, is_guarded_by_try_except for migration rules, the dual-path ["airflow", "decorators", ...] | ["airflow", "sdk", ...] matching pattern, any_qualified_base_class for transitive inheritance, and so on — so the agent doesn't have to rediscover them from the source tree on every rule.

When it activates

The skill's frontmatter description triggers it on phrases like "add a new airflow rule", "create an airflow lint rule", "implement an airflow inspection", "new AIR rule", and similar formulations. The agent invokes it via Claude Code's Skill tool before writing any code.

Structure

new-airflow-rule/
├── SKILL.md              ← workflow + numbering scheme + key conventions + reference index
└── references/
    ├── import-paths.md   ← Airflow 2-to-3 module path mapping table
    ├── templates.md      ← Rust skeletons for AIR0xx (best-practice) and AIR3xx (migration) rules
    ├── helpers.md        ← Inventory of `crates/ruff_linter/src/rules/airflow/helpers.rs` utilities
    ├── patterns.md       ← Common code patterns (decorators, dual-dispatch, string scanning, …)
    ├── documentation.md  ← Rule docstring structure, terminology, message vs. fix-title conventions
    └── pitfalls.md       ← Review-informed bugs to avoid

SKILL.md is the entry point and is loaded into the agent's context when the skill is invoked. It contains the workflow, the 12-step checklist, the rule-numbering table, and key conventions — everything the agent always needs.

The references/ files are loaded on demand. Each checklist step points to the relevant reference (e.g., Step 2 → templates.md, Step 6 → pitfalls.md for the deferred-annotation gotcha). This split keeps the always-resident context small while still making the long-tail detail available when a step needs it.

Reference file purposes

When to read each reference file:

  • import-paths.md: the rule must match both Airflow 2 (deprecated) and Airflow 3 (airflow.sdk) import paths.
  • templates.md: creating a new rule file (Step 2 of the checklist).
  • helpers.md: looking for an existing helper before writing a new utility, or unsure which existing one applies.
  • patterns.md: implementing a non-trivial pattern (decorator detection, dual-dispatch for TaskFlow + operator API, recursive return collection, mixed-content string scanning, etc.).
  • documentation.md: writing the rule's ## What it does / ## Why is this bad? / message / fix-title text.
  • pitfalls.md: writing predicates that resolve names, chain fallback checks, or work with annotation expressions.

Maintaining this skill

When a Ruff PR review surfaces a bug, missing convention, or footgun that future rule authors should know about:

  1. If it's a review-informed gotcha (e.g., a bug pattern an agent could repeat), add it to references/pitfalls.md.
  2. If it's a new shared helper or idiomatic API, add it to references/helpers.md (and ensure the helper itself lives in airflow/helpers.rs).
  3. If it's a recurring code pattern, add it to references/patterns.md with a pointer to the rule that exemplifies it.
  4. If it's a docstring/message convention, add it to references/documentation.md.
  5. If it's a workflow correction (the order of steps was wrong, a command needs a different flag), update SKILL.md directly.

Keep SKILL.md short — under ~250 lines is a good target. Detail belongs in references/.

Documentation Guidelines

Rule documentation (the doc comments on the rule struct) must follow these conventions.

Structure

Use this exact section order (include only sections that apply):

  1. ## What it does (required) — One-line description of what the rule checks for
  2. ## Why is this bad? (required) — Explanation of why the pattern is problematic
  3. ## Example (required) — Python code that triggers the violation
  4. Use instead: (required) — Corrected Python code
  5. ## Fix safety (optional) — Notes about fix edge cases or when fixes might be unsafe
  6. ## Options (optional) — Configuration settings that affect the rule
  7. ## References (optional) — Links to external documentation
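The section order above is mechanical enough to check automatically. The sketch below is a hypothetical `check_doc_sections` helper (not part of Ruff or this skill) that scans `///` lines, rejects out-of-order sections, and reports the first missing required one. It assumes each section marker sits alone on its own line, exactly as in the rule templates.

```rust
/// Section markers in the required order, paired with whether each is required.
const SECTIONS: [(&str, bool); 7] = [
    ("## What it does", true),
    ("## Why is this bad?", true),
    ("## Example", true),
    ("Use instead:", true),
    ("## Fix safety", false),
    ("## Options", false),
    ("## References", false),
];

/// Checks a rule doc comment: all required sections present, all sections in order.
fn check_doc_sections(doc: &str) -> Result<(), String> {
    let mut seen = [false; 7];
    let mut last_index: Option<usize> = None;
    for line in doc.lines() {
        // Strip the `///` prefix and surrounding whitespace.
        let text = line.trim().trim_start_matches("///").trim();
        let Some(index) = SECTIONS.iter().position(|(marker, _)| *marker == text) else {
            continue; // ordinary prose or code line
        };
        if last_index.is_some_and(|last| index < last) {
            return Err(format!("`{text}` is out of order"));
        }
        seen[index] = true;
        last_index = Some(index);
    }
    for (index, (marker, required)) in SECTIONS.iter().enumerate() {
        if *required && !seen[index] {
            return Err(format!("missing required section `{marker}`"));
        }
    }
    Ok(())
}
```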

Terminology

  • Dag (capitalization):

    • Use DAG when referring to the class: "The DAG() constructor..."
    • Use @dag when referring to the decorator: "Functions decorated with @dag..."
    • Use "Dag" (proper noun, no backticks) when referring to a workflow as a general concept: "If your Dag does not have...", "the serialized Dag hash"
  • Task terminology:

    • Use @task for the decorator
    • Use "task" (lowercase, no backticks) for the general concept
    • Use specific operator names in backticks: PythonOperator, BranchPythonOperator
  • Airflow versions: Write as "Airflow 2", "Airflow 3", "Airflow 3.0", "Airflow 3.1" (capitalize, no backticks)

  • Code references: Always use backticks for:

    • Class names: DAG, BaseOperator
    • Function/method names: execute(), datetime.now()
    • Decorators: @dag(), @task.branch
    • Parameters/arguments: schedule, task_id, python_callable
    • Module paths: airflow.operators.python, airflow.sdk
    • String literals in explanations: `"schedule"`

Style Guidelines

  • Conciseness: Keep "What it does" to one sentence. Expand details in "Why is this bad?"
  • Active voice: Prefer "Using X causes..." over "X may cause..." or "It is possible that X causes..."
  • Imperative starts: Begin sentences in "Why is this bad?" with action verbs: "Using...", "This leads to...", "These symbols were removed..."
  • Specific consequences: Don't just say "this is bad practice"—explain the actual impact (performance, compatibility, maintainability)
  • Code examples: Keep them minimal but complete enough to demonstrate the issue. Include necessary imports.

Message Guidelines

Error messages (the message() method) should:

  • Be a generic description of the problem — do NOT include context-specific suggestions in the message
  • Use backticks around code symbols: "`{deprecated}` is removed in Airflow 3.0"
  • Avoid starting with "Checks for" or "Detects" (this is for documentation, not messages)

Fix titles (the fix_title() method) should:

  • Contain context-specific suggestions (e.g., "Use Jinja templates" vs "Move into a @task-decorated function")
  • Start with an imperative verb: "Use schedule", "Replace with...", "Remove..."
  • Be very brief (2-5 words when possible)
  • Even without an actual auto-fix, fix_title() is displayed as a separate help: ... line in diagnostics

Pattern: Generic message + context-specific fix title (from AIR003):

fn message(&self) -> String {
    "`Variable.get()` outside of a task".to_string()  // Generic
}

fn fix_title(&self) -> Option<String> {
    if self.in_function {
        Some("Move into a `@task`-decorated function".to_string())  // Context-specific
    } else {
        Some("Use Jinja templates instead".to_string())  // Context-specific
    }
}
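To see how the two methods end up in the output, here is a self-contained model. The `Violation` trait below is a simplified stand-in with only the two methods discussed here, not Ruff's real trait, and `render` only approximates the CLI layout (message first, then a `help:` line).

```rust
/// Simplified, hypothetical stand-in for Ruff's `Violation` trait.
trait Violation {
    fn message(&self) -> String;
    fn fix_title(&self) -> Option<String> {
        None
    }
}

/// Mirrors the example above: generic message, context-specific fix title.
struct VariableGetOutsideTask {
    in_function: bool,
}

impl Violation for VariableGetOutsideTask {
    fn message(&self) -> String {
        "`Variable.get()` outside of a task".to_string()
    }

    fn fix_title(&self) -> Option<String> {
        if self.in_function {
            Some("Move into a `@task`-decorated function".to_string())
        } else {
            Some("Use Jinja templates instead".to_string())
        }
    }
}

/// Approximate rendering: the message, then an optional `help:` line.
fn render(violation: &dyn Violation) -> String {
    match violation.fix_title() {
        Some(title) => format!("{}\nhelp: {title}", violation.message()),
        None => violation.message(),
    }
}
```

The message stays identical in both contexts; only the `help:` line changes.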

Examples from Existing Rules

Good documentation structure (from AIR002):

/// ## What it does
/// Checks for a `DAG()` class or `@dag()` decorator without an explicit
/// `schedule` parameter.
///
/// ## Why is this bad?
/// The default value of the `schedule` parameter on Airflow 2 is
/// `timedelta(days=1)`, which is almost never what a user is looking for.
/// Airflow 3 changed the default value to `None`, which would break
/// existing Dags using the implicit default.
///
/// ## Example
/// ```python
/// from airflow import DAG
///
/// # Using the implicit default schedule.
/// dag = DAG(dag_id="my_dag")
/// ```
///
/// Use instead:
/// ```python
/// from datetime import timedelta
/// from airflow import DAG
///
/// dag = DAG(dag_id="my_dag", schedule=timedelta(days=1))
/// ```

Note the use of:

  • "Dag" (proper noun) in prose: "would break existing Dags"
  • `DAG()` and `@dag()` for class and decorator
  • Backticks for parameters: `schedule`, `timedelta(days=1)`
  • "Airflow 2" and "Airflow 3" (capitalized, no backticks)

Shared Helpers (helpers.rs)

The crates/ruff_linter/src/rules/airflow/helpers.rs module provides shared utilities. Import from crate::rules::airflow::helpers.

Replacement Enums

Used by migration rules to describe what replaces a deprecated symbol:

  • Replacement — for builtin/SDK moves:

    • None — no replacement available
    • Message(&'static str) — custom message, no auto-fix
    • AttrName(&'static str) — attribute renamed (e.g., dataset to asset)
    • Rename { module, name } — moved to new module with new name
    • SourceModuleMoved { module, name } — module changed, name stays
    • SourceModuleMovedToSDK { module, name, version } — moved to SDK
    • SourceModuleMovedWithMessage { module, name, message, suggest_fix } — with custom message
  • ProviderReplacement — for provider migrations (AIR302/312):

    • Rename { module, name, provider, version } — moved to provider with rename
    • SourceModuleMovedToProvider { module, name, provider, version } — module changed
  • FunctionSignatureChange — for AIR303:

    • Message(&'static str) — describes the signature change

Try-Except Guarding

Prevents false positives when symbols are conditionally imported:

use crate::rules::airflow::helpers::is_guarded_by_try_except;

// Skip if the usage is guarded by a try-except that catches ImportError/AttributeError
// and whose try body uses the replacement; pass the *new* module and name.
if is_guarded_by_try_except(expr, "airflow.new_module", "NewName", checker.semantic()) {
    return;
}

This checks whether the expression is in a try-except block that:

  • For imports: catches ImportError or ModuleNotFoundError, and the try block imports from the new location
  • For attributes: catches AttributeError, and the try block accesses the new attribute
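The import case can be modeled without Ruff's AST. In this sketch, `TryBlock` and `is_import_guarded` are hypothetical: handlers are reduced to plain exception names and the try body to a list of `(module, name)` imports.

```rust
/// Hypothetical model of a `try` statement.
struct TryBlock<'a> {
    /// Exception names caught by the `except` handlers.
    handlers: Vec<&'a str>,
    /// `(module, name)` pairs imported in the `try` body.
    imports: Vec<(&'a str, &'a str)>,
}

/// Import case: the handlers catch an import failure AND the `try` body
/// imports the symbol from its new location.
fn is_import_guarded(try_block: &TryBlock, new_module: &str, new_name: &str) -> bool {
    let catches_import_error = try_block
        .handlers
        .iter()
        .any(|h| matches!(*h, "ImportError" | "ModuleNotFoundError"));
    catches_import_error
        && try_block
            .imports
            .iter()
            .any(|&(module, name)| module == new_module && name == new_name)
}
```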

Fix Generation

use crate::rules::airflow::helpers::{generate_import_edit, generate_remove_and_runtime_import_edit};

// When symbol name changes (e.g., Dataset -> Asset):
if let Some(fix) = generate_import_edit(checker, stmt, "old_name", "new_module", "new_name") {
    diagnostic.set_fix(fix);  // Safe edit
}

// When module changes but name stays (provider migration):
if let Some(fix) = generate_remove_and_runtime_import_edit(checker, stmt, "new_module", "name") {
    diagnostic.set_fix(fix);  // Unsafe edit
}

Class/Module Identification

use crate::rules::airflow::helpers::{is_airflow_builtin_or_provider, is_method_in_subclass};

// Check if qualified name matches airflow.<module>.**.*<suffix> or provider equivalent:
is_airflow_builtin_or_provider(segments, "operators", "Operator")
is_airflow_builtin_or_provider(segments, "secrets", "Backend")
is_airflow_builtin_or_provider(segments, "hooks", "Hook")

// Check if a method is defined in a subclass of a specific base:
is_method_in_subclass(function_def, semantic, "execute", |qn| {
    matches!(qn.segments(), ["airflow", "models" | "sdk", .., "BaseOperator"])
})

Note: BaseOperator task-execution-time methods include execute, pre_execute, and post_execute. All three run at task execution time, not at Dag parse time.
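A standalone guess at the matching behavior implied by the `airflow.<module>.**.*<suffix>` description, useful for reasoning about which names will match. This is a hypothetical re-implementation over plain string segments; the real helper's source may differ, for example in how provider paths are handled.

```rust
/// Hypothetical model: `airflow.<module>.**.*<suffix>` for builtins, or
/// `airflow.providers.<...>.<module>.**.*<suffix>` for providers.
fn is_airflow_builtin_or_provider(segments: &[&str], module: &str, suffix: &str) -> bool {
    let symbol_matches = || segments.last().is_some_and(|name| name.ends_with(suffix));
    match segments {
        // Provider path: `module` appears after `providers`, followed by at least one segment.
        ["airflow", "providers", rest @ ..] => {
            rest.iter()
                .position(|&s| s == module)
                .is_some_and(|pos| pos + 1 < rest.len())
                && symbol_matches()
        }
        // Builtin path: `airflow.<module>` followed by at least one segment.
        ["airflow", first, rest @ ..] if *first == module && !rest.is_empty() => symbol_matches(),
        _ => false,
    }
}
```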

Operator Argument Scope (template_fields)

Airflow operators declare a template_fields class attribute that determines which keyword arguments receive Jinja template rendering at runtime. This varies per operator (e.g., BashOperator templates bash_command and env; PythonOperator templates op_args, op_kwargs, templates_dict). When writing rules that inspect operator string arguments for template patterns, check all arguments (both positional and keyword) rather than restricting to known names — the pattern matching is typically specific enough to avoid false positives, and restricting to known names would miss custom operators and providers. Add a code comment explaining the rationale (see AIR201 for an example).

Transitive Inheritance Checks

When checking if a class inherits from a base class, use any_qualified_base_class from ruff_python_semantic::analyze::class instead of directly iterating class_def.bases(). This handles transitive inheritance (e.g., class A(BaseOperator) → class B(A) → class C(B)):

use ruff_python_semantic::analyze::class::any_qualified_base_class;

any_qualified_base_class(class_def, semantic, &|qn| {
    matches!(qn.segments(), ["airflow", "models" | "sdk", .., "BaseOperator"])
})
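Transitive inheritance amounts to a graph walk over base classes. This plain-Rust model uses a hypothetical `HashMap` from class name to direct bases in place of resolved qualified names; the `visited` set guards against cycles in malformed input.

```rust
use std::collections::{HashMap, HashSet};

/// Returns `true` if `class` inherits from `target` directly or transitively.
/// `bases` maps each class name to its direct base classes.
fn inherits_from<'a>(bases: &HashMap<&'a str, Vec<&'a str>>, class: &'a str, target: &str) -> bool {
    fn walk<'a>(
        bases: &HashMap<&'a str, Vec<&'a str>>,
        class: &'a str,
        target: &str,
        visited: &mut HashSet<&'a str>,
    ) -> bool {
        // Already explored (or a cycle): nothing new to find here.
        if !visited.insert(class) {
            return false;
        }
        let Some(direct_bases) = bases.get(class) else {
            return false; // unknown class, e.g. defined in another module
        };
        direct_bases
            .iter()
            .any(|&base| base == target || walk(bases, base, target, visited))
    }
    walk(bases, class, target, &mut HashSet::new())
}
```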

Resolving Named Annotations to Class Definitions

When an annotation is a Name (e.g., def f() -> MyTD:) and the rule needs to inspect the referenced class — e.g., to check whether MyTD is a TypedDict subclass, an Enum, a Pydantic model, etc. — resolve the name to its binding, confirm it's a ClassDefinition, fetch the ClassDef statement, and then run any_qualified_base_class on it:

use ruff_python_ast::{Expr, Stmt};
use ruff_python_semantic::{BindingKind, SemanticModel, analyze};

fn annotation_is_typed_dict_subclass(annotation: &Expr, semantic: &SemanticModel) -> bool {
    let Expr::Name(name) = annotation else { return false };

    // `resolve_name` consults the `resolved_names` map populated by the
    // visitor. With `from __future__ import annotations`, names *inside*
    // annotations are not added to that map (they are treated as forward
    // references and deferred). `lookup_symbol` does a live scope-based
    // lookup that works regardless of deferred-annotation context — use it
    // as a fallback so the rule behaves consistently with and without the
    // `__future__` import.
    let Some(binding_id) = semantic
        .resolve_name(name)
        .or_else(|| semantic.lookup_symbol(&name.id))
    else { return false };

    let binding = semantic.binding(binding_id);
    if !matches!(binding.kind, BindingKind::ClassDefinition(_)) {
        return false;
    }
    let Some(Stmt::ClassDef(class_def)) = binding.statement(semantic) else {
        return false;
    };
    analyze::class::any_qualified_base_class(class_def, semantic, &|qualified_name| {
        semantic.match_typing_qualified_name(&qualified_name, "TypedDict")
    })
}

This pattern catches the in-module case (the class is defined in the same file). Cross-module imports of subclasses would require re-export following and are usually out of scope for a linter — call that out explicitly in ## Fix safety rather than trying to handle it.

See AIR202 (task_implicit_multiple_outputs.rs) for a working example.
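The fallback can be modeled with two maps. In this sketch (all names hypothetical), `resolved` plays the role of the visitor-populated `resolved_names` map, which stays empty for names inside deferred annotations, and `scope` plays the role of a live scope lookup.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BindingKind {
    ClassDefinition,
    Import,
}

/// Looks a name up in `resolved` first, then falls back to the live `scope`.
fn resolve_annotation_name(
    name: &str,
    resolved: &HashMap<&str, BindingKind>,
    scope: &HashMap<&str, BindingKind>,
) -> Option<BindingKind> {
    resolved.get(name).or_else(|| scope.get(name)).copied()
}

/// `true` only when the name resolves to a class definition.
fn names_a_class(
    name: &str,
    resolved: &HashMap<&str, BindingKind>,
    scope: &HashMap<&str, BindingKind>,
) -> bool {
    resolve_annotation_name(name, resolved, scope) == Some(BindingKind::ClassDefinition)
}
```

Without the `or_else` fallback, every lookup below would return `None`.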

Detecting DAG Files

To check if a file is a Dag definition file, check for imports of DAG or dag from airflow. This is simpler and more reliable than checking for actual DAG() calls or @dag decorators, since types must be imported before use:

fn is_dag_file(semantic: &SemanticModel) -> bool {
    semantic.global_scope().binding_ids().any(|binding_id| {
        semantic
            .binding(binding_id)
            .as_any_import()
            .is_some_and(|import| {
                matches!(
                    import.qualified_name().segments(),
                    ["airflow", .., "DAG" | "dag"]
                )
            })
    })
}
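The same check over plain data: the hypothetical `is_dag_file` below takes the qualified names of module-level imports, already split into segments, and applies the identical slice pattern.

```rust
/// `imports` holds the qualified name of each module-level import binding.
fn is_dag_file(imports: &[Vec<&str>]) -> bool {
    imports
        .iter()
        .any(|segments| matches!(segments.as_slice(), ["airflow", .., "DAG" | "dag"]))
}
```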

Scope-Based Context Detection

When determining whether code is at module level vs inside a function (e.g., to vary diagnostic messages), prefer semantic.current_scope().kind over semantic.current_statements().any(...). The scope approach correctly handles nested classes inside functions:

let in_function = matches!(
    checker.semantic().current_scope().kind,
    ScopeKind::Function(_) | ScopeKind::Lambda(_)
);

current_scope() returns the innermost scope, so for a Variable.get() call inside a class body nested in a function, the current scope kind is ScopeKind::Class (not ScopeKind::Function).
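A toy scope stack shows why the innermost-scope check differs from an ancestor check. `ScopeKind` here is a local enum, not Ruff's, and `any_enclosing_function` is a hypothetical stand-in for the ancestor-based approach.

```rust
#[derive(Debug, Clone, Copy)]
enum ScopeKind {
    Module,
    Class,
    Function,
    Lambda,
}

/// Recommended: only the innermost (last) scope matters.
fn in_function(scopes: &[ScopeKind]) -> bool {
    matches!(scopes.last(), Some(ScopeKind::Function | ScopeKind::Lambda))
}

/// Ancestor-based alternative, for contrast: any enclosing function counts.
fn any_enclosing_function(scopes: &[ScopeKind]) -> bool {
    scopes
        .iter()
        .any(|kind| matches!(kind, ScopeKind::Function | ScopeKind::Lambda))
}
```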

Airflow 2-to-3 Import Path Mapping

Rules that target both Airflow 2 and 3 must handle both old (deprecated) and new (airflow.sdk) import paths. Match against both in qualified_name.segments().

Old import path (deprecated) → new import path (airflow.sdk):

  • airflow.decorators.dag → airflow.sdk.dag
  • airflow.decorators.task → airflow.sdk.task
  • airflow.decorators.task_group → airflow.sdk.task_group
  • airflow.decorators.setup → airflow.sdk.setup
  • airflow.decorators.teardown → airflow.sdk.teardown
  • airflow.models.dag.DAG → airflow.sdk.DAG
  • airflow.models.baseoperator.BaseOperator → airflow.sdk.BaseOperator
  • airflow.models.param.Param → airflow.sdk.Param
  • airflow.models.param.ParamsDict → airflow.sdk.ParamsDict
  • airflow.models.baseoperatorlink.BaseOperatorLink → airflow.sdk.BaseOperatorLink
  • airflow.sensors.base.BaseSensorOperator → airflow.sdk.BaseSensorOperator
  • airflow.hooks.base.BaseHook → airflow.sdk.BaseHook
  • airflow.notifications.basenotifier.BaseNotifier → airflow.sdk.BaseNotifier
  • airflow.utils.task_group.TaskGroup → airflow.sdk.TaskGroup
  • airflow.utils.context.Context → airflow.sdk.Context
  • airflow.datasets.Dataset → airflow.sdk.Asset
  • airflow.datasets.DatasetAlias → airflow.sdk.AssetAlias
  • airflow.datasets.DatasetAll → airflow.sdk.AssetAll
  • airflow.datasets.DatasetAny → airflow.sdk.AssetAny
  • airflow.models.connection.Connection → airflow.sdk.Connection
  • airflow.models.variable.Variable → airflow.sdk.Variable
  • airflow.io.* → airflow.sdk.io.*

Example pattern for matching both paths:

match qualified_name.segments() {
    // Match both old and new import paths
    ["airflow", "decorators", "task"] | ["airflow", "sdk", "task"] => { /* ... */ }
    ["airflow", "models", "dag", "DAG"] | ["airflow", "sdk", "DAG"] => { /* ... */ }
    _ => return,
}
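One way to keep the dual-path matching in a single place is to normalize both paths to a canonical symbol early, so the rest of the rule can ignore which import style the user chose. The `canonical_symbol` helper below is hypothetical and covers only a few rows of the mapping above.

```rust
/// Maps either the Airflow 2 path or the `airflow.sdk` path to one canonical name.
fn canonical_symbol(segments: &[&str]) -> Option<&'static str> {
    match segments {
        ["airflow", "decorators", "task"] | ["airflow", "sdk", "task"] => Some("task"),
        ["airflow", "decorators", "dag"] | ["airflow", "sdk", "dag"] => Some("dag"),
        ["airflow", "models", "dag", "DAG"] | ["airflow", "sdk", "DAG"] => Some("DAG"),
        ["airflow", "datasets", "Dataset"] | ["airflow", "sdk", "Asset"] => Some("Asset"),
        _ => None,
    }
}
```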

Common Patterns

Checking Decorators

Important: Before writing a new decorator-checking function, check if one already exists in helpers.rs (e.g., is_airflow_task in AIR301). If a decorator check is needed by multiple rules, extract it to helpers.rs as a shared utility.

Use map_callable to handle both @decorator and @decorator() forms:

use ruff_python_ast::helpers::map_callable;

fn has_some_decorator(function_def: &StmtFunctionDef, checker: &Checker) -> bool {
    function_def.decorator_list.iter().any(|decorator| {
        let expr = map_callable(&decorator.expression);
        checker
            .semantic()
            .resolve_qualified_name(expr)
            .is_some_and(|qn| matches!(qn.segments(), ["airflow", "decorators", "some_name"]))
    })
}

For rules targeting both Airflow 2 and 3, match both old and new decorator paths:

fn is_airflow_task(function_def: &StmtFunctionDef, semantic: &SemanticModel) -> bool {
    function_def.decorator_list.iter().any(|decorator| {
        semantic
            .resolve_qualified_name(map_callable(&decorator.expression))
            .is_some_and(|qn| matches!(qn.segments(),
                ["airflow", "decorators", "task"] | ["airflow", "sdk", "task"]
            ))
    })
}

For attribute-style decorators like @task.branch, check the Expr::Attribute and resolve the value part:

let expr = map_callable(&decorator.expression);
let is_task_branch = if let Expr::Attribute(ast::ExprAttribute { value, attr, .. }) = expr {
    // `value` is the `task` part of `task.branch`; resolve it to a qualified name.
    attr.as_str() == "branch"
        && checker
            .semantic()
            .resolve_qualified_name(value)
            .is_some_and(|qn| matches!(qn.segments(), ["airflow", "decorators", "task"]))
} else {
    false
};

Also see ruff_python_semantic::analyze::visibility for generic decorator utilities: is_staticmethod, is_classmethod, is_overload, is_abstract, is_property, etc.
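A self-contained model of the two decorator shapes. The toy `Expr` enum and the function names are assumptions for illustration, and decorator names are treated as already fully qualified (standing in for `resolve_qualified_name`). Following the dual-target advice, it accepts both the `airflow.decorators` and `airflow.sdk` paths.

```rust
/// Toy decorator expressions: a (fully qualified) name, a call, or an attribute.
enum Expr {
    Name(Vec<&'static str>),
    Call(Box<Expr>),
    Attribute(Box<Expr>, &'static str),
}

/// Analogue of `map_callable`: `@x()` and `@x` both yield `x`.
fn map_callable(expr: &Expr) -> &Expr {
    match expr {
        Expr::Call(func) => &**func,
        other => other,
    }
}

/// `true` for `@task.branch` and `@task.branch()` under either import path.
fn is_task_branch(decorator: &Expr) -> bool {
    let Expr::Attribute(value, attr) = map_callable(decorator) else {
        return false;
    };
    *attr == "branch"
        && matches!(
            &**value,
            Expr::Name(path) if matches!(path.as_slice(), ["airflow", "decorators" | "sdk", "task"])
        )
}
```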

Dual-Dispatch: Targeting Both TaskFlow and Operator APIs

When a rule must detect the same anti-pattern in both @task.<variant> decorated functions and operator callables (e.g., BranchPythonOperator(python_callable=func)), use this architecture:

  1. Shared analysis helper — extract the core logic into a private function that operates on a function body:
fn could_be_short_circuit(body: &[Stmt]) -> bool { /* ... */ }
  2. Violation enum — add a Kind enum to produce context-specific messages:
pub(crate) struct MyRule { kind: MyKind }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MyKind { Decorator, Operator }
  3. Statement-level entry point (for @task.<variant> decorators) — dispatched from statement.rs on StmtFunctionDef:
pub(crate) fn my_rule_decorator(checker: &Checker, function_def: &StmtFunctionDef) {
    // Check decorator, then call shared helper on function_def.body
}
  4. Expression-level entry point (for operator calls) — dispatched from expression.rs on ExprCall. Resolve the python_callable argument to the function definition using the semantic model:
pub(crate) fn my_rule_operator(checker: &Checker, call: &ExprCall) {
    let semantic = checker.semantic();
    // 1. Resolve call.func to qualified name, match operator paths
    // 2. Extract python_callable keyword argument
    let Some(keyword) = call.arguments.find_keyword("python_callable") else { return; };
    // 3. Resolve name to function definition:
    let Expr::Name(name_expr) = &keyword.value else { return; };
    let Some(binding_id) = semantic.only_binding(name_expr) else { return; };
    let BindingKind::FunctionDefinition(scope_id) = semantic.binding(binding_id).kind else { return; };
    let ScopeKind::Function(function_def) = semantic.scopes[scope_id].kind else { return; };
    // 4. Call shared helper on function_def.body
}

Operator import paths to match (example for BranchPythonOperator):

  • ["airflow", "operators", "python", "BranchPythonOperator"]
  • ["airflow", "operators", "python_operator", "BranchPythonOperator"] (legacy)
  • ["airflow", "providers", "standard", "operators", "python", "BranchPythonOperator"]

See AIR003 (task_branch_as_short_circuit.rs) for a complete working example.
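The dispatch shape can be exercised with toy types standing in for the AST and semantic model. Everything here is hypothetical, including `shared_check`, whose placeholder logic (does the body contain a `return`?) replaces whatever the real rule analyzes. The point is that both entry points funnel into one helper, and the operator path bails out on anything that is not a plain function binding.

```rust
use std::collections::HashMap;

/// Toy statements; a real rule would inspect `ruff_python_ast::Stmt`.
enum Stmt {
    Return,
    Pass,
}

/// Toy bindings; `FunctionDefinition` carries an index into `Model::functions`.
enum Binding {
    FunctionDefinition(usize),
    Other,
}

/// Stand-in for the semantic model: names to bindings, plus function bodies.
struct Model {
    bindings: HashMap<&'static str, Binding>,
    functions: Vec<Vec<Stmt>>,
}

/// Placeholder shared analysis used by both entry points.
fn shared_check(body: &[Stmt]) -> bool {
    body.iter().any(|stmt| matches!(stmt, Stmt::Return))
}

/// Decorator path: the function body is already available.
fn check_decorated(body: &[Stmt]) -> bool {
    shared_check(body)
}

/// Operator path: resolve `python_callable=<name>` to a function body first.
fn check_operator(model: &Model, python_callable: &str) -> bool {
    let Some(Binding::FunctionDefinition(scope_id)) = model.bindings.get(python_callable) else {
        return false; // not a plain function binding: bail out
    };
    model
        .functions
        .get(*scope_id)
        .is_some_and(|body| shared_check(body))
}
```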

Checking Function Calls and Arguments

let Expr::Call(ast::ExprCall { func, arguments, .. }) = expr else { return; };

// Resolve the function being called:
checker
    .semantic()
    .resolve_qualified_name(func)
    .is_some_and(|qn| matches!(qn.segments(), ["airflow", .., "SomeClass"]))

// Check keyword arguments:
if let Some(keyword) = arguments.find_keyword("some_arg") {
    // keyword.value is the argument value expression
}

Matching Airflow Operators (builtin + providers)

match qualified_name.segments() {
    // Builtin operators
    ["airflow", "operators", ..] => true,
    // Provider operators (operators must appear somewhere in the middle)
    ["airflow", "providers", rest @ ..] => {
        rest.iter().position(|&s| s == "operators")
            .is_some_and(|pos| pos + 1 < rest.len())
    }
    _ => false,
}
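The match above, wrapped in a standalone function (the name `is_airflow_operator` is hypothetical) so the provider-path condition can be exercised directly.

```rust
fn is_airflow_operator(segments: &[&str]) -> bool {
    match segments {
        // Builtin operators
        ["airflow", "operators", ..] => true,
        // Provider operators: `operators` must appear after `providers`
        // and must not be the last segment.
        ["airflow", "providers", rest @ ..] => rest
            .iter()
            .position(|&s| s == "operators")
            .is_some_and(|pos| pos + 1 < rest.len()),
        _ => false,
    }
}
```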

Collecting Return Statements (recursive)

Use ReturnStatementVisitor to find all returns including those in nested blocks:

use ruff_python_ast::helpers::ReturnStatementVisitor;
use ruff_python_ast::visitor::Visitor;

let mut visitor = ReturnStatementVisitor::default();
for stmt in &function_def.body {
    visitor.visit_stmt(stmt);
}
let returns = &visitor.returns;
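A plain-Rust model of recursive return collection over a hypothetical mini statement tree. This sketch deliberately does not descend into nested function definitions, since their returns belong to the inner function; check how `ReturnStatementVisitor` treats nested definitions before relying on the same behavior.

```rust
/// Minimal statement tree for illustration.
enum Stmt {
    Return(&'static str),
    If { body: Vec<Stmt>, orelse: Vec<Stmt> },
    For { body: Vec<Stmt> },
    FunctionDef { body: Vec<Stmt> },
    Expr,
}

/// Collects every `return` reachable in `body`, descending into compound
/// statements but skipping nested function definitions.
fn collect_returns(body: &[Stmt], out: &mut Vec<&'static str>) {
    for stmt in body {
        match stmt {
            Stmt::Return(value) => out.push(*value),
            Stmt::If { body, orelse } => {
                collect_returns(body, out);
                collect_returns(orelse, out);
            }
            Stmt::For { body } => collect_returns(body, out),
            Stmt::FunctionDef { .. } | Stmt::Expr => {}
        }
    }
}
```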

Expression Type Dispatching (migration rules)

Migration rules typically dispatch on expression type:

match expr {
    Expr::Call(ast::ExprCall { func, arguments, .. }) => {
        check_call_arguments(checker, func, arguments);
    }
    Expr::Attribute(ast::ExprAttribute { attr, .. }) => {
        check_name(checker, expr, attr.range());
    }
    Expr::Name(_) => {
        check_name(checker, expr, expr.range());
    }
    Expr::Subscript(ast::ExprSubscript { value, slice, .. }) => {
        check_subscript_access(checker, value, slice);
    }
    _ => {}
}

Violations with Dynamic Fields

For migration rules, use struct fields to parameterize messages:

pub(crate) struct MyRule {
    deprecated: String,
    replacement: String,
}

impl Violation for MyRule {
    const FIX_AVAILABILITY: FixAvailability = FixAvailability::Sometimes;

    #[derive_message_formats]
    fn message(&self) -> String {
        // Generic message only; the suggestion belongs in `fix_title`.
        let MyRule { deprecated, .. } = self;
        format!("`{deprecated}` is removed in Airflow 3.0")
    }

    fn fix_title(&self) -> Option<String> {
        let MyRule { replacement, .. } = self;
        Some(format!("Use `{replacement}`"))
    }
}

Function ordering

When adding new helper functions or methods, place the higher-level predicates or public-facing helpers above the lower-level utilities they call. That way reviewers see the intent/entry point first, followed by the supporting helpers (e.g., in_airflow_task_function before is_airflow_task).

Mixed-content string scanning

When a rule must detect a pattern within a string argument — not just recognize the string's overall structure — use a raw-source scanner rather than operating on to_str(). This handles all quote styles (single, double, triple) uniformly and produces accurate sub-range TextRange values for in-place fixes.

When to use: the pattern appears inside a larger Jinja template or shell command string that has surrounding content (e.g., "echo {{ ti.xcom_pull('task') }}"). The fix should replace only the detected sub-expression, not the entire string argument.

Pattern (from AIR201):

use ruff_text_size::TextSize;

struct MyMatch {
    start: TextSize, // absolute file position of match start
    end: TextSize,   // absolute file position just past match end
    // ... extracted fields ...
}

fn scan_my_patterns(source: &str, literal_start: TextSize) -> Vec<MyMatch> {
    let mut matches = Vec::new();
    let mut pos = 0;

    while pos < source.len() {
        let remaining = &source[pos..];
        let mut cursor = Cursor::new(remaining);

        // Try to match at this position; on mismatch, advance by one character
        // (not one byte) so that `&source[pos..]` always lands on a UTF-8 char boundary.
        let Some(token) = parse_identifier(&mut cursor) else {
            pos += remaining.chars().next().map_or(1, char::len_utf8);
            continue;
        };
        if token != "expected_receiver" {
            pos += token.len(); // skip entire identifier to avoid suffix re-matches
            continue;
        }

        // ... parse rest of pattern using eat_char, parse_identifier, etc. ...

        let consumed = remaining.len() - cursor.as_str().len();

        if let (Ok(s), Ok(e)) = (u32::try_from(pos), u32::try_from(pos + consumed)) {
            matches.push(MyMatch {
                start: literal_start + TextSize::new(s),
                end: literal_start + TextSize::new(e),
                // ...
            });
            pos += consumed;
            continue;
        }
        pos += remaining.chars().next().map_or(1, char::len_utf8);
    }
    matches
}

In the rule function:

// Pure-template path first (entire string is the expression):
if let Some(result) = parse_pure_template(string_value) {
    // whole-argument replacement fix
    continue;
}

// Mixed-content path: pattern appears within larger string
let raw_source = checker.locator().slice(string_literal.range());
for m in scan_my_patterns(raw_source, string_literal.start()) {
    let mut diagnostic = checker.report_diagnostic(MyViolation { .. }, arg_value.range());
    if in_scope {
        diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement(
            "replacement_text".to_string(),
            TextRange::new(m.start, m.end),
        )));
    }
}

Key invariants:

  • checker.locator().slice(string_literal.range()) returns the raw source starting at string_literal.start(), so adding a within-source byte offset to string_literal.start() gives the correct absolute TextRange.
  • The scanner is quote-agnostic: the quote characters (", ', """) can never form part of the target pattern, so scanning the raw slice with its delimiters included is safe.
  • Advancing pos += token.len() (not just pos += 1) when a non-matching identifier is found prevents false positives from identifier suffixes (e.g., multi_ti matching as ti).
  • Store TextSize in the match struct (not usize) and use u32::try_from to convert — avoids as u32 cast truncation lint.
  • Multiple matches in one string each get their own Fix with IsolationLevel::NonOverlapping (the default), which ruff applies in a single pass when ranges do not overlap.

See crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs (scan_xcom_pull_patterns) for a complete working example.
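A runnable, simplified scanner illustrating the same invariants: absolute offsets built from `literal_start`, a boundary check against identifier suffixes, and checked `u32` conversion. It is a hypothetical `scan_xcom_pull`, not the AIR201 implementation: it swaps the cursor-based parser for `str::find`, uses plain `u32` offsets instead of `TextSize`, and assumes the call's arguments contain no nested parentheses.

```rust
/// Finds `ti.xcom_pull(...)` in raw source; returns absolute `(start, end)`
/// byte offsets, where `end` is just past the closing `)`.
fn scan_xcom_pull(source: &str, literal_start: u32) -> Vec<(u32, u32)> {
    const NEEDLE: &str = "ti.xcom_pull(";
    let mut matches = Vec::new();
    let mut pos = 0;
    while let Some(offset) = source[pos..].find(NEEDLE) {
        let start = pos + offset;
        // Reject identifier suffixes such as `multi_ti.xcom_pull(`.
        let preceded_by_ident = source[..start]
            .chars()
            .next_back()
            .is_some_and(|c| c.is_alphanumeric() || c == '_');
        let after_needle = start + NEEDLE.len();
        let Some(close) = source[after_needle..].find(')') else {
            break; // no closing paren left anywhere: no further matches possible
        };
        let end = after_needle + close + 1;
        if preceded_by_ident {
            pos = after_needle;
            continue;
        }
        if let (Ok(s), Ok(e)) = (u32::try_from(start), u32::try_from(end)) {
            matches.push((literal_start + s, literal_start + e));
        }
        pos = end;
    }
    matches
}
```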

Review-Informed Pitfalls

Bugs and footguns surfaced in past PR reviews. Check this list whenever you write code that resolves names or chains fallback predicates.

Chained return matches!(...) swallows the false branch

When a predicate has multiple fallback strategies (e.g., "is this a known qualified name? if not, is it a TypedDict subclass?"), an early return matches!(qn.segments(), ...) exits the function on false as well as on true, skipping the rest:

// BUG — returns false instead of falling through to the next check.
if let Some(qn) = semantic.resolve_qualified_name(head) {
    return matches!(qn.segments(), ["collections", "abc", "Mapping" /* | ... */]);
}
annotation_is_typed_dict_subclass(head, semantic)  // never reached when qn resolves

Fix: make the early return conditional on the match being true, so the function falls through on a non-match:

if let Some(qn) = semantic.resolve_qualified_name(head)
    && matches!(qn.segments(), ["collections", "abc", "Mapping" /* | ... */])
{
    return true;
}
annotation_is_typed_dict_subclass(head, semantic)
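The difference is easy to demonstrate with plain values. Here `qualified_name` stands in for the result of `resolve_qualified_name` and `typed_dict_subclass` for the fallback's answer; both functions are hypothetical. The fixed version uses a nested `if` rather than a let-chain, which behaves the same.

```rust
fn is_mapping_buggy(qualified_name: Option<&[&str]>, typed_dict_subclass: bool) -> bool {
    if let Some(segments) = qualified_name {
        // BUG: a non-matching name returns `false` without trying the fallback.
        return matches!(segments, ["collections", "abc", "Mapping"]);
    }
    typed_dict_subclass
}

fn is_mapping_fixed(qualified_name: Option<&[&str]>, typed_dict_subclass: bool) -> bool {
    if let Some(segments) = qualified_name {
        if matches!(segments, ["collections", "abc", "Mapping"]) {
            return true;
        }
    }
    // Reached whenever the qualified-name check did not succeed.
    typed_dict_subclass
}
```

A class defined in the same module resolves to a qualified name, so the buggy version never consults the fallback for it.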

resolve_name returns None inside deferred annotations

When the file has from __future__ import annotations (PEP 563), names inside return-type / parameter annotations are forward references and do not appear in the resolved_names map. semantic.resolve_name(&name) returns None. Always pair it with semantic.lookup_symbol(&name.id) as a fallback when resolving annotation-position names — see "Resolving Named Annotations to Class Definitions" in helpers.md for the canonical pattern.

This is easy to miss: a hand-rolled fixture without the __future__ import will pass, but the standard airflow fixtures use it and silently exhibit the bug.

Field naming: semantics, not derivation

A violation field used to drive fix-title selection should be named for what it captures, not how it was derived. A reader of the struct should be able to tell what condition the bool represents without reading the rule body.

// Worse: reader has to find the call site to learn what `inferred` means.
pub(crate) struct AirflowTaskImplicitMultipleOutputs { inferred: bool }

// Better: the field name is the assertion.
pub(crate) struct AirflowTaskImplicitMultipleOutputs { annotation_is_mapping: bool }

Eager let before short-circuit &&

If one of two predicates is much more expensive than the other (e.g., a function-body traversal versus a name lookup), don't bind both to let before combining them — that forces the expensive one to run even when the cheap one would have short-circuited. Inline the expensive call into the && so it only runs when necessary:

// Worse: body traversal runs for every flagged decorator, even when the
// annotation already determined the answer.
let annotation_is_mapping = ...;
let body_returns_dict = body_has_dict_return(function_def, semantic);
if !annotation_is_mapping && !body_returns_dict { return; }

// Better: short-circuits when the annotation suffices.
let annotation_is_mapping = ...;
if !annotation_is_mapping && !body_has_dict_return(function_def, semantic) {
    return;
}
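A counter makes the cost visible. `body_has_dict_return` here is a hypothetical stand-in for the expensive traversal that only records each call; the reporting condition is the negation of the early-return guard above.

```rust
use std::cell::Cell;

/// Stand-in for an expensive body traversal; counts how often it runs.
fn body_has_dict_return(calls: &Cell<u32>) -> bool {
    calls.set(calls.get() + 1);
    true
}

fn should_report_eager(annotation_is_mapping: bool, calls: &Cell<u32>) -> bool {
    let body_returns_dict = body_has_dict_return(calls); // always runs
    annotation_is_mapping || body_returns_dict
}

fn should_report_lazy(annotation_is_mapping: bool, calls: &Cell<u32>) -> bool {
    // `||` short-circuits: the traversal runs only when the annotation is not enough.
    annotation_is_mapping || body_has_dict_return(calls)
}
```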

Single-use const &[&str] allowlists

If a const &[&str] is only consulted at one call site, inline it as a matches! arm at the use site. Named constants pay for themselves only when they have multiple callers or independent documentation value; a single-caller constant just adds a layer of indirection.

Rule File Templates

Choose the template below based on rule category. Place the resulting file at crates/ruff_linter/src/rules/airflow/rules/<snake_case_name>.rs.

Template: General Best-Practice Rule (AIR001-099)

use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::{self as ast};
use ruff_python_semantic::Modules;
use ruff_text_size::Ranged;

use crate::Violation;
use crate::checkers::ast::Checker;

/// ## What it does
/// <One-line description of what the rule checks for.>
///
/// ## Why is this bad?
/// <Explanation of why the flagged pattern is problematic.>
///
/// ## Example
/// ```python
/// <Python code that triggers the violation>
/// ```
///
/// Use instead:
/// ```python
/// <Corrected Python code>
/// ```
#[derive(ViolationMetadata)]
#[violation_metadata(preview_since = "NEXT_RUFF_VERSION")]
pub(crate) struct MyRuleName;

impl Violation for MyRuleName {
    #[derive_message_formats]
    fn message(&self) -> String {
        "<Diagnostic message shown to the user>".to_string()
    }
}

/// AIRxxx
pub(crate) fn my_rule_name(checker: &Checker, node: &ast::Stmt /* replace with the AST node this rule inspects */) {
    if !checker.semantic().seen_module(Modules::AIRFLOW) {
        return;
    }

    // Rule logic here...

    checker.report_diagnostic(MyRuleName, node.range());
}

Template: Airflow 3 Migration Rule (AIR3xx)

For rules that flag removed/moved/renamed symbols, use the existing infrastructure in helpers.rs:

use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_semantic::Modules;
use ruff_text_size::{Ranged, TextRange};

use crate::{FixAvailability, Violation};
use crate::checkers::ast::Checker;
use crate::rules::airflow::helpers::{Replacement, is_guarded_by_try_except};

/// ## What it does
/// Checks for uses of deprecated Airflow symbols removed in Airflow X.Y.
///
/// ## Why is this bad?
/// These symbols were removed/moved in Airflow X.Y and will cause runtime errors.
///
/// ## Example / Use instead blocks...
#[derive(ViolationMetadata)]
#[violation_metadata(preview_since = "NEXT_RUFF_VERSION")]
pub(crate) struct MyMigrationRule {
    deprecated: String,
    replacement: String,
}

impl Violation for MyMigrationRule {
    const FIX_AVAILABILITY: FixAvailability = FixAvailability::Sometimes;

    #[derive_message_formats]
    fn message(&self) -> String {
        let MyMigrationRule { deprecated, replacement } = self;
        format!("`{deprecated}` is removed in Airflow X.Y; use `{replacement}` instead")
    }

    fn fix_title(&self) -> Option<String> {
        let MyMigrationRule { replacement, .. } = self;
        Some(format!("Use `{replacement}`"))
    }
}

pub(crate) fn my_migration_rule(checker: &Checker, expr: &Expr) {
    if !checker.semantic().seen_module(Modules::AIRFLOW) {
        return;
    }

    // Dispatch based on expression type:
    match expr {
        Expr::Attribute(ast::ExprAttribute { attr, .. }) => {
            check_name(checker, expr, attr.range());
        }
        Expr::Name(_) => {
            check_name(checker, expr, expr.range());
        }
        _ => {}
    }
}

fn check_name(checker: &Checker, expr: &Expr, ranged: TextRange) {
    let Some(qualified_name) = checker.semantic().resolve_qualified_name(expr) else {
        return;
    };

    let (replacement, module, name) = match qualified_name.segments() {
        ["airflow", "old_module", "OldName"] => (
            Replacement::Rename { module: "airflow.new_module", name: "NewName" },
            "airflow.old_module",
            "OldName",
        ),
        _ => return,
    };

    // Skip if guarded by try-except (conditional import):
    if is_guarded_by_try_except(expr, module, name, checker.semantic()) {
        return;
    }

    let mut diagnostic = checker.report_diagnostic(
        MyMigrationRule {
            deprecated: name.to_string(),
            replacement: replacement.to_string(),
        },
        ranged,
    );

    // Optionally generate a fix:
    // if let Some(fix) = generate_import_edit(...) {
    //     diagnostic.set_fix(fix);
    // }
}
name: new-airflow-rule
description: This skill should be used when the user asks to "add a new airflow rule", "create an airflow lint rule", "implement an airflow inspection", "new AIR rule", or discusses creating Ruff linter rules in the airflow category.
license: Apache-2.0

Creating a New Airflow Lint Rule in Ruff

This skill guides the creation of new Airflow-specific lint rules (AIR prefix) in the Ruff codebase. Detailed material lives in references/ files. Load these files as needed instead of carrying them in context up front.

Reference Index

Read the relevant file when you reach a step that needs it:

  • references/import-paths.md — Airflow 2-to-3 import path mapping table (use when the rule must match both old and new SDK paths).
  • references/templates.md — Full rule-file skeletons for general best-practice (AIR001-099) and Airflow 3 migration (AIR3xx) rules.
  • references/helpers.md — helpers.rs utilities: Replacement enums, try-except guarding, fix generation, class/module identification, template_fields scope, transitive inheritance, resolving named annotations to class defs, Dag-file detection, scope-based context detection.
  • references/patterns.md — Common code patterns: decorator checks, dual-dispatch (TaskFlow + operator API), call/argument inspection, operator matching, recursive return collection, expression-type dispatch, parameterized violations, function ordering, mixed-content string scanning.
  • references/documentation.md — Rule docstring structure, terminology (Dag/DAG/@dag), style, message vs. fix-title conventions.
  • references/pitfalls.md — Review-informed bugs to avoid: chained return matches!(), resolve_name under from __future__ import annotations, field naming, eager let before &&, single-use allowlists. Skim this before writing predicates that resolve names or chain fallback checks.

Pre-Implementation: Gather Context

Before writing any code, ask the user the following questions (if not already answered in their request):

  1. Airflow version target:

    Which version of Airflow does this rule target?

    • Airflow 2 only
    • Airflow 3 onward
    • Both Airflow 2 and 3
  2. DAG API targeting (for general best-practice rules AIR001-099 only):

    Airflow DAGs can be written using the TaskFlow API (decorators like @task.branch) or the standard operator API (BranchPythonOperator). Both are equally supported. Should this rule target:

    • TaskFlow API only (decorator-based)
    • Operator API only (operator-based)
    • Both (recommended — the rule should be DAG implementation-agnostic)

    If both: see the dual-dispatch pattern in references/patterns.md.

  3. Local Airflow repository for validation:

    Do you have a local clone of the Airflow repository? If so, provide the path (e.g., ~/repositories/airflow). If available, use it after implementation to validate against real-world code (see Step 12 below).

Code prefix validation: If a rule targets Airflow 3 onward, its code MUST be in the AIR3xx range (e.g., AIR301, AIR302, AIR311). If the user specified a code that doesn't follow this pattern, WARN them before proceeding.
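The prefix check is mechanical enough to express as a small predicate. is_airflow3_code and prefix_warning are hypothetical helpers written for illustration; they are not part of Ruff. The sketch assumes codes are spelled like "AIR301", meaning "AIR3" followed by exactly two digits.

```rust
/// Hypothetical check: rules targeting Airflow 3 onward must use an AIR3xx
/// code, i.e. "AIR3" followed by exactly two ASCII digits.
fn is_airflow3_code(code: &str) -> bool {
    matches!(
        code.strip_prefix("AIR3"),
        Some(rest) if rest.len() == 2 && rest.bytes().all(|b| b.is_ascii_digit())
    )
}

/// Returns the warning to show the user when the requested code is out of range.
fn prefix_warning(code: &str, targets_airflow3: bool) -> Option<String> {
    if targets_airflow3 && !is_airflow3_code(code) {
        Some(format!(
            "`{}` targets Airflow 3 but does not follow the AIR3xx pattern",
            code
        ))
    } else {
        None
    }
}
```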

Important distinction for AIR3xx rules: AIR3xx rules are migration rules that flag old-style (Airflow 2) imports/patterns to help migrate to Airflow 3. They should only match deprecated import paths — NOT the new airflow.sdk paths (which are the correct replacements). However, shared helpers that check context (e.g., "is this function decorated with @task?") should match both old and new paths, since deprecated patterns inside a task function need to be flagged regardless of which import style the decorator uses.
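The split between the two kinds of matcher can be sketched with slice patterns over qualified-name segments, similar to matching on qualified_name.segments(). The function names are hypothetical. The sketch assumes the airflow.decorators.task to airflow.sdk.task move; references/import-paths.md remains the authoritative mapping.

```rust
/// Migration-rule matcher (AIR3xx): only the old-style Airflow 2 path counts.
fn is_deprecated_task_path(segments: &[&str]) -> bool {
    matches!(segments, ["airflow", "decorators", "task"])
}

/// Shared context helper: recognises the decorator under either import path,
/// so deprecated code inside the task body is flagged regardless of import style.
fn is_task_decorator(segments: &[&str]) -> bool {
    matches!(segments, ["airflow", "decorators" | "sdk", "task"])
}
```

The migration matcher rejects the new `airflow.sdk` path because that path is the replacement. The context helper accepts both paths.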

Rule Numbering Scheme

| Range | Category | Description |
|---|---|---|
| AIR001-099 | General best-practice | Style, readability, common mistakes (not version-specific) |
| AIR301 | Removed in 3.0 | Symbols/args fully removed in Airflow 3.0 with no compat layer |
| AIR302 | Moved to provider in 3.0 | Symbols moved to external provider packages (required migration) |
| AIR303 | Signature change in 3.0 | Function/method signatures changed (args renamed, reordered, etc.) |
| AIR311 | Suggested update for 3.0 | Deprecated with compat layer; still works but will break later |
| AIR312 | Suggested provider move for 3.0 | Deprecated compat layer for provider migrations |
| AIR321 | Moved in 3.1 | Symbols moved/deprecated in Airflow 3.1 |

Checklist

Follow these steps in order. Each step is mandatory, except step 12, which applies only when a local Airflow checkout is available.

0. Search for Reusable Utilities

Before writing rule logic, search for existing utilities that can be reused:

  • ruff_python_semantic: Check SemanticModel methods (e.g., resolve_qualified_name, match_builtin_expr, match_typing_expr) and analyze/visibility.rs for decorator-checking patterns.
  • ruff_python_ast: Check helpers.rs for AST traversal utilities (e.g., map_callable, ReturnStatementVisitor).
  • ruff_python_trivia::Cursor: For rules that need ad-hoc parsing of string content (e.g., Jinja templates, SQL fragments), use Cursor instead of chaining strip_prefix/strip_suffix/find (see AIR201 for an example).
  • crate::rules::airflow::helpers: Check for existing airflow-specific helpers — full inventory in references/helpers.md.
  • Existing airflow rules: Check rules like AIR301 (removal_in_3.rs) for patterns that may already exist or could be extracted.

If a pattern would be useful in multiple rules, extract it into helpers.rs as a shared utility rather than duplicating code.

1. Choose a Rule Code and Name

  • Check existing codes in crates/ruff_linter/src/codes.rs under the // airflow section.
  • Pick the next available code in the appropriate range.
  • Name the struct following the convention: the name should make sense as "allow ${name}". For example, TaskBranchAsShortCircuit reads as "allow task branch as short circuit".
  • Do NOT use prefixes like Disallow or Banned.
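The naming conventions can be checked mechanically. The following helpers are hypothetical and do not exist in Ruff. They mirror the MyRuleName to my_rule_name correspondence used in the templates and the "allow ${name}" reading. Word splitting happens at every uppercase letter, so acronyms such as DAG would be split letter by letter.

```rust
/// Split a CamelCase struct name into lowercase words at each uppercase letter.
/// Consecutive capitals (acronyms) are split letter by letter.
fn split_words(struct_name: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    for ch in struct_name.chars() {
        if ch.is_uppercase() && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        current.extend(ch.to_lowercase());
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

/// The phrase the name should read as, e.g. "allow task branch as short circuit".
fn allow_phrase(struct_name: &str) -> String {
    format!("allow {}", split_words(struct_name).join(" "))
}

/// The snake_case form used for the rule file and function name.
fn snake_case(struct_name: &str) -> String {
    split_words(struct_name).join("_")
}

/// Names must not start with `Disallow` or `Banned`.
fn has_forbidden_prefix(struct_name: &str) -> bool {
    ["Disallow", "Banned"]
        .iter()
        .any(|prefix| struct_name.starts_with(*prefix))
}
```

For example, `snake_case("TaskBranchAsShortCircuit")` yields `task_branch_as_short_circuit`, which matches the `<snake_case_name>.rs` file created in step 2.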

2. Create the Rule File

Create crates/ruff_linter/src/rules/airflow/rules/<snake_case_name>.rs.

Use the appropriate skeleton from references/templates.md (general best-practice or migration rule). For Airflow-2-and-3 rules, also consult references/import-paths.md for the dual-path matching pattern.

3. Register the Rule Code

In crates/ruff_linter/src/codes.rs, add an entry under the // airflow section:

(Airflow, "xxx") => rules::airflow::rules::MyRuleName,

Keep the entries sorted by code number.

4. Export the Module

In crates/ruff_linter/src/rules/airflow/rules/mod.rs:

  • Add pub(crate) use my_rule_name::*; in the use-declarations block (alphabetical order).
  • Add mod my_rule_name; in the mod-declarations block (alphabetical order).

Do NOT add a duplicate module declaration in crates/ruff_linter/src/rules/airflow/mod.rs — only rules/mod.rs needs it.

5. Add the Rule Dispatch

In crates/ruff_linter/src/checkers/ast/analyze/:

  • For statement-based rules (function defs, assignments, class defs): add to statement.rs
  • For expression-based rules (function calls, attribute access): add to expression.rs
if checker.is_rule_enabled(Rule::MyRuleName) {
    airflow::rules::my_rule_name(checker, node);
}

Place the dispatch near other airflow rule dispatches for consistency.

6. Create the Test Fixture

Create crates/ruff_linter/resources/test/fixtures/airflow/AIRxxx.py.

The fixture must include:

  • Cases that SHOULD trigger the rule (with # AIRxxx comments)
  • Cases that should NOT trigger the rule (edge cases, similar but valid patterns)
  • For migration rules: cases guarded by try-except (should NOT trigger)
  • If the rule resolves names inside annotations: include at least one fixture with from __future__ import annotations to catch the deferred-annotation bug — see references/pitfalls.md.

7. Add the Test Case

In crates/ruff_linter/src/rules/airflow/mod.rs, add a #[test_case] line:

#[test_case(Rule::MyRuleName, Path::new("AIRxxx.py"))]

Keep test cases sorted by rule code.

8. Format Code

Always run cargo fmt before testing or committing. Rust formatting issues will cause CI failures.

cargo fmt -p ruff_linter

9. Run Tests and Accept Snapshots

# Verify output manually first:
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/airflow/AIRxxx.py --no-cache --preview --select AIRxxx

# Run the test (it fails the first time and writes a pending snapshot):
RUFF_UPDATE_SCHEMA=1 cargo nextest run -p ruff_linter -- "airflow::tests"

# Accept the snapshot:
cargo insta accept

# Verify the test passes:
RUFF_UPDATE_SCHEMA=1 cargo nextest run -p ruff_linter -- "airflow::tests"

10. Regenerate Docs and Schemas

cargo dev generate-all

11. Run All Checks

cargo clippy -p ruff_linter --all-targets --all-features -- -D warnings
uvx prek run -a

12. Validate Against Airflow (if local repo available)

If the user provided a path to a local Airflow repository during pre-implementation, run the new rule against it:

cargo run -p ruff -- check <airflow_repo_path> --no-cache --preview --select AIRxxx

Review the output:

  • True positives: Violations that correctly flag the anti-pattern. Report the count and sample locations.
  • False positives: Violations that flag code that is actually correct. If found:
    1. Identify the pattern causing the false positive.
    2. Update the rule logic to exclude it (e.g., add a guard clause).
    3. Add the false-positive pattern as a non-violation test case in the fixture.
    4. Re-run steps 8–11 to update snapshots and verify.

Report findings to the user before finalizing.

Key Conventions

  • Use checker.report_diagnostic(ViolationStruct, range) — NOT Diagnostic::new().
  • Add #[violation_metadata(preview_since = "NEXT_RUFF_VERSION")] for new rules.
  • Always guard with checker.semantic().seen_module(Modules::AIRFLOW).
  • For migration rules, use is_guarded_by_try_except to avoid false positives on conditional imports.
  • For rules targeting both Airflow 2 and 3, match both old and new (airflow.sdk) import paths (see references/import-paths.md).
  • Reuse over duplication: before writing a utility function, search ruff_python_semantic, ruff_python_ast, and airflow/helpers.rs for existing implementations. If a pattern is useful in multiple rules, add it to helpers.rs rather than keeping it local.
  • Follow early-return style (guard clauses) rather than deeply nested if-let chains.
  • Prefer let chains (if let combined with &&) over nested if let when possible.
  • Avoid panic!, unreachable!, or .unwrap().
  • Use #[expect()] over #[allow()] for suppressing clippy lints.
  • For internal (non-public) functions, implementation notes should be /// doc comments, not // comments.
  • Short-circuit expensive checks: when a violation requires either of two predicates and one of them traverses the function body, order the cheap one first in && so the expensive one only runs when needed. Avoid eager let-bindings of expensive traversals.
  • Inline single-use allowlists: if a const &[&str] is only consulted at one call site, inline it as a matches! arm at the use site.
  • Name violation fields by their semantics, not their derivation: a field used to decide between two fix titles is annotation_is_mapping, not inferred.
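Several of the conventions above (guard clauses, no .unwrap(), flat control flow) can be illustrated with a self-contained comparison. The Call struct below is a hypothetical simplification, not ruff's AST, and the operator path is only an example. The sketch uses let-else and ? instead of let chains, because let chains require the Rust 2024 edition and the sketch should not depend on that.

```rust
/// Hypothetical, simplified call node: the callee's qualified-name segments
/// (if they resolved) and its keyword arguments. Not ruff's AST types.
struct Call {
    callee: Option<Vec<&'static str>>,
    keywords: Vec<(&'static str, &'static str)>,
}

/// Nested style: the interesting logic ends up three levels deep.
fn task_id_nested(call: &Call) -> Option<&'static str> {
    if let Some(segments) = &call.callee {
        if segments.last() == Some(&"PythonOperator") {
            if let Some((_, value)) = call.keywords.iter().find(|(key, _)| *key == "task_id") {
                return Some(*value);
            }
        }
    }
    None
}

/// Guard-clause style: bail out early with `let ... else` and `?`, never `.unwrap()`.
fn task_id_guarded(call: &Call) -> Option<&'static str> {
    let Some(segments) = &call.callee else {
        return None;
    };
    if segments.last() != Some(&"PythonOperator") {
        return None;
    }
    let (_, value) = call.keywords.iter().find(|(key, _)| *key == "task_id")?;
    Some(*value)
}
```

Both functions return the same result. In the guarded version, each early exit documents one reason the rule does not apply.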