
@nihalpasham
Last active October 8, 2025 13:58
stable-mir's goal is to provide a stable interface to the Rust compiler, enabling tool developers to perform advanced analyses with lower maintenance costs, without slowing compiler development.

stable-mir

For a video version of this — https://www.youtube.com/watch?v=lfi2pCOaGGk&t=927s

Disclaimer


MIR - Rust’s mid-level IR


  • A simplified, control-flow-oriented representation.
  • Closer to machine code than HIR.
    • This is where borrow checking, optimizations such as ConstProp, CopyProp, and DSE (dead store elimination), and monomorphization happen.
  • Today, MIR is lowered to LLVM IR
    • or to CLIF IR if we're using the Cranelift backend.
  • But what if we could intercept MIR and do cool stuff with it?
    • For example, advanced analyses such as formal verification (the Kani team at AWS is driving this).
    • Or supporting new hardware with different program execution models, i.e. writing regular Rust that runs on accelerators (TPUs, GPUs, etc.).

stable-mir

  • That's where stable-mir comes in.
  • MIR is rustc's internal IR, i.e. it is not meant to be stable and can (more like will) change between compiler versions.

“The goal of the Stable MIR project is to provide a stable interface to the Rust compiler that allow tool developers to develop sophisticated analysis with a reduced maintenance cost without compromising the compiler development speed.”

Stable MIR Design


Two crates were added to the Rust compiler (both have since been renamed):

  • stable_mir, now rustc_public, is the user-facing public API. There's a proposal to have two versions of it:

    • One published on crates.io. This will be the base of any minor update and will be compatible with multiple versions of the compiler, using conditional compilation based on the compiler version.
    • The other developed as part of rustc and kept up to date with the compiler; it will serve as the basis for the next major release of rustc_public. This in-tree rustc_public has no compatibility or stability guarantees.
  • rustc_smir, now rustc_public_bridge, is developed as part of the rustc library and interfaces with rustc's internal APIs. It implements the interface between the public APIs and the compiler's internal APIs.

rustc_public impl

1. Driver Integration via Macros

#[macro_export]
macro_rules! run {
    ($args:expr, $callback_fn:ident) => {
        $crate::run_driver!($args, || $callback_fn())
    };
}

The run! macro (through run_driver!) creates a Callbacks implementation that hooks into rustc's compilation pipeline at the after_analysis phase: after analysis, when MIR is available, but before codegen.

cd demo && cargo expand main 2>&1

The expansion shows that run!(&rustc_args, start_demo) expands to the following:

  1. Defines a RustcPublic struct - Holds the callback and result
  2. Implements Callbacks trait - Hooks into rustc's compilation pipeline via after_analysis
  3. Calls run_compiler - Invokes rustc with the provided arguments
  4. Executes your callback - Runs start_demo() after analysis is complete
  5. Returns the result - Wrapped in Result<C, CompilerError<B>>

The macro instantiates the struct and runs the driver at the end:

RustcPublic::new(|| start_demo()).run(&rustc_args)

This creates the driver, passes the callback, and runs the compiler with the arguments.

2. Thread-Local Context Management

scoped_tls::scoped_thread_local!(static TLV: Cell<*const ()>);

pub(crate) fn run<F, T>(interface: &dyn CompilerInterface, f: F) -> Result<T, Error>
where
    F: FnOnce() -> T,
{
    if TLV.is_set() {
        Err(Error::from("rustc_public already running"))
    } else {
        let ptr: *const () = (&raw const interface) as _;
        TLV.set(&Cell::new(ptr), || Ok(f()))
    }
}

It uses a scoped thread-local (TLV) to make the compiler context available while the analysis runs; the TLV.is_set() check rejects nested invocations with an error.
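To make the mechanism concrete, here is a minimal sketch of the same idea using only the standard library. It is a stand-in under stated assumptions: std's thread_local! replaces scoped_tls, and CompilerInterface and DemoCompiler are hypothetical one-method types, not the real ones. A drop guard restores the previous pointer on exit, which is what makes the storage "scoped".

```rust
use std::cell::Cell;
use std::ptr;

thread_local! {
    // Non-null only while `run` is executing its closure on this thread.
    static TLV: Cell<*const ()> = Cell::new(ptr::null());
}

/// Hypothetical one-method stand-in for the real `CompilerInterface`.
pub trait CompilerInterface {
    fn crate_name(&self) -> String;
}

/// Hypothetical implementation used for the demo.
pub struct DemoCompiler;

impl CompilerInterface for DemoCompiler {
    fn crate_name(&self) -> String {
        "demo".to_string()
    }
}

/// Restores the previous pointer on drop, so the TLV is cleared even if `f` panics.
struct Reset(*const ());

impl Drop for Reset {
    fn drop(&mut self) {
        TLV.with(|tlv| tlv.set(self.0));
    }
}

pub fn run<T>(interface: &dyn CompilerInterface, f: impl FnOnce() -> T) -> Result<T, String> {
    // Same guard as the original: refuse to start while already running.
    if TLV.with(|tlv| !tlv.get().is_null()) {
        return Err("rustc_public already running".to_string());
    }
    // Store a thin pointer to the (fat) `&dyn` reference, as the original does.
    let ptr = &interface as *const &dyn CompilerInterface as *const ();
    let _reset = Reset(TLV.with(|tlv| tlv.replace(ptr)));
    Ok(f())
}

pub fn with<R>(f: impl FnOnce(&dyn CompilerInterface) -> R) -> R {
    TLV.with(|tlv| {
        let ptr = tlv.get();
        assert!(!ptr.is_null(), "`with` called outside of `run`");
        // SAFETY: `run` stored a pointer to a `&dyn CompilerInterface` that lives
        // on its stack frame until `f` (and therefore this call) returns.
        f(unsafe { *(ptr as *const &dyn CompilerInterface) })
    })
}
```

Calling run(&DemoCompiler, || with(|cx| cx.crate_name())) yields Ok("demo"), while calling run again from inside the closure returns the "already running" error.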

3. Stable/Unstable Translation Bridge

pub fn run<F, T>(tcx: TyCtxt<'_>, f: F) -> Result<T, Error>
where
    F: FnOnce() -> T,
{
    let compiler_cx = RefCell::new(CompilerCtxt::new(tcx));
    let container = Container { tables: RefCell::new(Tables::default()), cx: compiler_cx };

    crate::compiler_interface::run(&container, || init(&container, f))
}

The bridge maintains:

  • Tables: Map between stable IDs and internal rustc representations
  • CompilerCtxt: Wrapper around TyCtxt for safe access to compiler internals
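A rough sketch of what such a table does, assuming a simple interning scheme: each internal value receives a dense stable index the first time it is seen, repeated lookups hit the cache, and the index can be mapped back later. IdTable and its methods are hypothetical names for illustration, not the rustc_public_bridge types.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical two-way table: internal values <-> dense stable indices.
pub struct IdTable<K> {
    to_stable: HashMap<K, usize>,
    to_internal: Vec<K>,
}

impl<K: Clone + Eq + Hash> IdTable<K> {
    pub fn new() -> Self {
        IdTable { to_stable: HashMap::new(), to_internal: Vec::new() }
    }

    /// Returns the cached stable ID for `key`, or assigns the next free one.
    pub fn create_or_fetch(&mut self, key: K) -> usize {
        if let Some(&id) = self.to_stable.get(&key) {
            return id;
        }
        let id = self.to_internal.len();
        self.to_internal.push(key.clone());
        self.to_stable.insert(key, id);
        id
    }

    /// Maps a stable ID back to the internal value, if it was ever issued.
    pub fn internal(&self, id: usize) -> Option<&K> {
        self.to_internal.get(id)
    }

    /// Number of distinct internal values seen so far.
    pub fn len(&self) -> usize {
        self.to_internal.len()
    }
}
```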

4. Visitor Pattern for MIR Analysis

cd rustc_public
cargo expand mir::visit
//! For every mir item, the trait has a `visit_<item>` and a `super_<item>` method.
//! - `visit_<item>`, by default, calls `super_<item>`
//! - `super_<item>`, by default, destructures the `<item>` and calls `visit_<sub_item>`

Provides a structured way to traverse and analyze MIR, similar to rustc's internal visitors.
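The visit_/super_ split described in those doc comments can be sketched on a hypothetical miniature MIR (the types below are made up for illustration and are far smaller than rustc_public's). A concrete visitor overrides only the visit_ method it cares about and inherits the default traversal for everything else.

```rust
/// Hypothetical miniature MIR, just big enough to show the traversal.
pub enum Operand {
    Copy(usize),
    Move(usize),
    Constant(i64),
}

pub enum Rvalue {
    Use(Operand),
    BinaryOp(Operand, Operand),
}

pub struct Statement {
    pub place: usize,
    pub rvalue: Rvalue,
}

pub struct Body {
    pub blocks: Vec<Vec<Statement>>,
}

pub trait Visitor {
    // `visit_*`: override these; by default they just call `super_*`.
    fn visit_body(&mut self, body: &Body) { self.super_body(body) }
    fn visit_statement(&mut self, stmt: &Statement) { self.super_statement(stmt) }
    fn visit_rvalue(&mut self, rvalue: &Rvalue) { self.super_rvalue(rvalue) }
    fn visit_operand(&mut self, op: &Operand) { self.super_operand(op) }

    // `super_*`: destructure the item and visit its children.
    fn super_body(&mut self, body: &Body) {
        for block in &body.blocks {
            for stmt in block {
                self.visit_statement(stmt);
            }
        }
    }
    fn super_statement(&mut self, stmt: &Statement) { self.visit_rvalue(&stmt.rvalue) }
    fn super_rvalue(&mut self, rvalue: &Rvalue) {
        match rvalue {
            Rvalue::Use(op) => self.visit_operand(op),
            Rvalue::BinaryOp(lhs, rhs) => {
                self.visit_operand(lhs);
                self.visit_operand(rhs);
            }
        }
    }
    fn super_operand(&mut self, _op: &Operand) {} // leaf: nothing to descend into
}

/// Counts `move` operands by overriding a single `visit_*` method.
#[derive(Default)]
pub struct MoveCounter {
    pub moves: usize,
}

impl Visitor for MoveCounter {
    fn visit_operand(&mut self, op: &Operand) {
        if let Operand::Move(_) = op {
            self.moves += 1;
        }
        self.super_operand(op);
    }
}
```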

Callback Execution Flow

  1. Compilation Phase: rustc compiles the target crate and generates MIR
  2. Hook Activation: after_analysis callback is triggered
  3. Context Setup: Bridge establishes stable/unstable translation tables
  4. User Callback: Your analysis function runs with access to stable APIs
  5. Cleanup: Context is torn down, compilation continues or stops

This design ensures that external tools get a stable, safe interface to rustc's powerful analysis capabilities without directly depending on unstable rustc internals.

Internals:

What does MIR look like

Types in the MIR

Types appear after the colon (:) in variable declarations and expressions:

  • () - unit type (the return type of main)
  • i32 - 32-bit signed integer
  • (i32, bool) - tuple type for overflow checking results
  • &i32 - immutable reference to i32
  • (&i32,) - single-element tuple containing a reference
  • std::fmt::Arguments<'_> - formatting arguments with lifetime
  • [core::fmt::rt::Argument<'_>; 1] - array of 1 element
  • &[&str; 2] - reference to array of 2 string slices
  • &[core::fmt::rt::Argument<'_>; 1] - reference to array

All locals (_0 through _12) have explicit types declared[3].

Operations in the MIR

Operations are the computational actions performed, categorized as Statements and Terminators:

Statements (within basic blocks)
  • Assignments: _2 = 42_i32; - assigns constant to local
  • _3 = CheckedAdd(_2, 1_i32); - CheckedAdd operation that returns (result, overflow_flag) tuple
  • _1 = move (_3.0: i32); - Move operation extracting tuple field
  • _7 = &_1; - Borrow operation creating reference
  • _6 = (move _7); - Aggregate operation constructing tuple
  • _12 = CopyForDeref((_6.0: &i32)); - CopyForDeref operation: copies the reference stored in the tuple field into a temporary so it can be dereferenced
  • _8 = [move _9]; - Array aggregate construction
Terminators (end basic blocks with control flow)
  • assert(!move (_3.1: bool), ...) - Assert terminator checking overflow flag with success/unwind branches[5]
  • _9 = core::fmt::rt::Argument::<'_>::new_display::<i32>(_12) -> [return: bb2, unwind unreachable]; - Call terminator with return destination
  • return; - Return terminator ending function execution
Rvalues (right-hand side expressions)
  • Constants: 42_i32, 1_i32
  • Binary operations: CheckedAdd (other examples would include Sub, Mul, etc.)
  • References: &_1
  • Aggregates: tuples (move _7), arrays [move _9]
  • Projections: (_3.0: i32), (_3.1: bool), (_6.0: &i32) - tuple field accesses
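As an illustration of how the CheckedAdd statement and the assert terminator work together, the function below spells out the same overflow-checked addition in ordinary Rust. The function name is hypothetical; i32::overflowing_add returns the same (result, overflow_flag) shape that _3 holds, and "attempt to add with overflow" is the message the panic path reports.

```rust
/// Hypothetical helper: the MIR sequence
///   _3 = CheckedAdd(_2, 1_i32);
///   assert(!move (_3.1: bool), ...) -> [success: bb1, ...];
///   _1 = move (_3.0: i32);          // in bb1
/// written out as ordinary Rust.
pub fn checked_add_like_mir(a: i32, b: i32) -> Result<i32, &'static str> {
    // `_3`: (wrapped result, overflow flag), the same (i32, bool) tuple type.
    let tmp3: (i32, bool) = a.overflowing_add(b);
    // The assert terminator: if the flag is set, take the panic path.
    if tmp3.1 {
        return Err("attempt to add with overflow");
    }
    // Success edge: project field 0 out of the tuple.
    Ok(tmp3.0)
}
```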

Attributes and Metadata

These provide additional context but don't execute operations:

Debug Information
debug x => 42_i32;
debug y => _1;
debug args => _6;
debug args => _8;

These map source-level variable names (x, y, args) to MIR locals or values, enabling debuggers to show meaningful variable names[3][1].

Source Information
  • {alloc4<imm>: &[&str; 2]} - allocation with immutability attribute
  • Type annotations: (_3.0: i32) includes type information for clarity
  • Unwind attributes: unwind unreachable indicates that unwinding out of this terminator is treated as impossible, so no cleanup path is attached
Control Flow Annotations
  • [success: bb1, unwind unreachable] - branch targets for assert
  • [return: bb2, unwind unreachable] - call return destinations

Structure Summary

Basic Blocks (bb0 through bb4)

Each basic block is a region containing:

  • Zero or more statements (operations without control flow)
  • Exactly one terminator (control flow operation)
Locals (_0 through _12)

Locals are declared at the top of the body. They behave like SSA values in many places, but MIR allows a local to be assigned more than once, so they are closer to registers than to pure SSA values[3].
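A hypothetical, heavily simplified data structure makes the "exactly one terminator" rule explicit: the terminator is a single field, so a block cannot have zero or two. Control-flow edges come only from terminators, which is enough to walk the CFG. These types are illustrative and are not the rustc_public definitions.

```rust
/// Hypothetical terminator kinds; unwind edges are omitted ("unwind unreachable").
#[derive(Debug, Clone, PartialEq)]
pub enum Terminator {
    Goto { target: usize },
    Assert { success: usize },
    Call { ret: Option<usize> }, // None: the callee never returns
    Return,
}

pub struct BasicBlockData {
    pub statements: Vec<String>, // zero or more statements, no control flow
    pub terminator: Terminator,  // exactly one terminator, enforced by the type
}

impl Terminator {
    /// Only terminators create edges between blocks.
    pub fn successors(&self) -> Vec<usize> {
        match self {
            Terminator::Goto { target } => vec![*target],
            Terminator::Assert { success } => vec![*success],
            Terminator::Call { ret } => ret.iter().copied().collect(),
            Terminator::Return => vec![],
        }
    }
}

/// Blocks reachable from bb0, in depth-first preorder.
pub fn reachable(blocks: &[BasicBlockData]) -> Vec<usize> {
    if blocks.is_empty() {
        return vec![];
    }
    let mut seen = vec![false; blocks.len()];
    let mut order = Vec::new();
    let mut stack = vec![0];
    while let Some(bb) = stack.pop() {
        if seen[bb] {
            continue;
        }
        seen[bb] = true;
        order.push(bb);
        // Push in reverse so the first successor is visited first.
        for succ in blocks[bb].terminator.successors().into_iter().rev() {
            stack.push(succ);
        }
    }
    order
}
```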

The demo example in rustc_public

When the run! macro is called, it triggers a chain of function calls that sets up the Rust compiler, runs analysis, and executes our callback with access to compiler internals.


1. Entry Point: The run! Macro

#[macro_export]
macro_rules! run {
    ($args:expr, $callback_fn:ident) => {
        $crate::run_driver!($args, || $callback_fn())
    };
    ($args:expr, $callback:expr) => {
        $crate::run_driver!($args, $callback)
    };
}

What it does: delegates to the run_driver! macro. A function identifier is wrapped in a closure (|| callback_fn()); any other expression is passed through unchanged as the callback.
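The two-arm dispatch can be tried out in isolation. In this sketch run_demo! and run_driver_demo! are hypothetical stand-ins: the driver macro just invokes the callback instead of starting rustc, but the arm selection (an identifier is wrapped in a closure, anything else is forwarded) follows the run! definition.

```rust
/// Hypothetical stand-in for `run_driver!`: ignores the args and calls the callback.
macro_rules! run_driver_demo {
    ($args:expr, $callback:expr) => {{
        let _args = $args;
        ($callback)()
    }};
}

/// Same two-arm shape as `run!`: an identifier is wrapped in a closure,
/// any other expression is forwarded unchanged.
macro_rules! run_demo {
    ($args:expr, $callback_fn:ident) => {
        run_driver_demo!($args, || $callback_fn())
    };
    ($args:expr, $callback:expr) => {
        run_driver_demo!($args, $callback)
    };
}

/// Hypothetical user callback passed by name.
pub fn start_demo() -> &'static str {
    "called by name"
}
```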


2. The run_driver! Macro - Core Driver Setup

macro_rules! run_driver {
    ($args:expr, $callback:expr $(, $with_tcx:ident)?) => {{
        pub struct RustcPublic<B = (), C = (), F = fn(...) -> ControlFlow<B, C>>
        where
            B: Send,
            C: Send,
            F: FnOnce(...) -> ControlFlow<B, C> + Send,
        {
            callback: Option<F>,
            result: Option<ControlFlow<B, C>>,
        }
        ...
Key Type: RustcPublic<B, C, F>

Type Parameters:

  • B: Break value type (when callback returns ControlFlow::Break(B))
  • C: Continue value type (when callback returns ControlFlow::Continue(C))
  • F: The callback function type

Fields:

  • callback: Option<F> - Stores the user's callback (taken once during execution)
  • result: Option<ControlFlow<B, C>> - Stores the callback's return value

3. RustcPublic::run() Method

pub fn run(&mut self, args: &[String]) -> Result<C, CompilerError<B>> {
    let compiler_result = rustc_driver::catch_fatal_errors(|| -> interface::Result::<()> {
        run_compiler(&args, self);
        Ok(())
    });
    ...
}

What it does:

  • Calls rustc_driver::run_compiler() (from the actual Rust compiler), wrapped in rustc_driver::catch_fatal_errors
  • Passes self (which implements the Callbacks trait)
  • The compiler will call back into after_analysis() at the right time

4. The Callbacks Trait Implementation

impl<B, C, F> Callbacks for RustcPublic<B, C, F>
where
    B: Send,
    C: Send,
    F: FnOnce(...) -> ControlFlow<B, C> + Send,
{
    fn after_analysis<'tcx>(
        &mut self,
        _compiler: &interface::Compiler,
        tcx: TyCtxt<'tcx>,
    ) -> Compilation {
        if let Some(callback) = self.callback.take() {
            rustc_internal::run(tcx, || {
                self.result = Some(callback(...));
            })
            .unwrap();
            ...
        }
    }
}

What it does:

  • This is called by rustc after type checking and analysis but before code generation
  • Receives TyCtxt<'tcx> - the compiler's type context with lifetime 'tcx
  • Calls rustc_internal::run() to set up the bridge, then runs the user callback inside it and stores its ControlFlow result in self.result
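The following is a compiler-free sketch of the callback plumbing, assuming hypothetical Driver, Compilation, and CompilerError types with fewer variants than the real ones, and simulating run_compiler with two booleans. It shows the callback being taken at most once, its ControlFlow stored, and run() folding the outcome into Result<C, CompilerError<B>>. Stopping compilation on Break is an assumption about the elided part of after_analysis.

```rust
use std::ops::ControlFlow;

/// Hypothetical mirror of rustc's `Compilation` return value.
#[derive(Debug, PartialEq)]
pub enum Compilation {
    Continue,
    Stop,
}

/// Hypothetical, reduced error type (the real one has more variants).
#[derive(Debug, PartialEq)]
pub enum CompilerError<B> {
    Interrupted(B),
    Skipped,
    CompilationFailed,
}

pub struct Driver<B, C, F>
where
    F: FnOnce() -> ControlFlow<B, C>,
{
    callback: Option<F>,
    result: Option<ControlFlow<B, C>>,
}

impl<B, C, F> Driver<B, C, F>
where
    F: FnOnce() -> ControlFlow<B, C>,
{
    pub fn new(callback: F) -> Self {
        Driver { callback: Some(callback), result: None }
    }

    /// Stand-in for `Callbacks::after_analysis`: runs the callback at most once.
    pub fn after_analysis(&mut self) -> Compilation {
        if let Some(callback) = self.callback.take() {
            self.result = Some(callback());
            // Assumption: keep compiling only if the callback asked to continue.
            if matches!(self.result, Some(ControlFlow::Continue(_))) {
                Compilation::Continue
            } else {
                Compilation::Stop
            }
        } else {
            Compilation::Continue
        }
    }

    /// Stand-in for `RustcPublic::run`; the flags simulate `run_compiler`.
    pub fn run(&mut self, reaches_analysis: bool, compile_ok: bool) -> Result<C, CompilerError<B>> {
        if reaches_analysis {
            let _ = self.after_analysis();
        }
        match (compile_ok, self.result.take()) {
            (false, _) => Err(CompilerError::CompilationFailed),
            (true, Some(ControlFlow::Continue(value))) => Ok(value),
            (true, Some(ControlFlow::Break(value))) => Err(CompilerError::Interrupted(value)),
            (true, None) => Err(CompilerError::Skipped), // callback never ran
        }
    }
}
```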

5. rustc_internal::run() - Bridge Setup

pub fn run<F, T>(tcx: TyCtxt<'_>, f: F) -> Result<T, Error>
where
    F: FnOnce() -> T,
{
    let compiler_cx = RefCell::new(CompilerCtxt::new(tcx));
    let container = Container { 
        tables: RefCell::new(Tables::default()), 
        cx: compiler_cx 
    };

    crate::compiler_interface::run(&container, || init(&container, f))
}
Key Types Created Here:

CompilerCtxt<'tcx> (from rustc_public_bridge)

  • Wraps the TyCtxt<'tcx> from rustc
  • Provides methods to query compiler information
  • Lifetime 'tcx ties it to the compiler's type context

Tables<'tcx, B: Bridge> (from rustc_public_bridge)

  • Bidirectional mapping between rustc internal types and stable API types
  • Caches conversions to avoid redundant work
  • Generic over B: Bridge trait

Container<'tcx, B: Bridge> (from rustc_public_bridge)

pub struct Container<'tcx, B: Bridge> {
    pub tables: RefCell<Tables<'tcx, B>>,
    pub cx: RefCell<CompilerCtxt<'tcx, B>>,
}

Why RefCell?

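  • Allows interior mutability: code that only holds a shared &Container can still update the tables
  • Multiple parts of the code need access to the tables/context throughout the call stack
  • Borrows are checked at runtime, so an incorrect borrow (e.g. a second borrow_mut() while one is active) panics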
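A small sketch of those runtime checks, using a hypothetical MiniContainer with the same two RefCell fields (a HashMap stands in for Tables and a String for CompilerCtxt). The with_parts method mirrors the borrow pattern of with_container: tables mutably, cx immutably.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical reduced container: both fields behind RefCell, like `Container`.
pub struct MiniContainer {
    pub tables: RefCell<HashMap<u32, String>>,
    pub cx: RefCell<String>,
}

impl MiniContainer {
    /// Mutates the tables through `&self`, which interior mutability allows.
    pub fn intern(&self, id: u32, name: &str) {
        self.tables.borrow_mut().insert(id, name.to_string());
    }

    /// Same shape as `with_container`: borrow tables mutably and cx immutably.
    pub fn with_parts<R>(&self, f: impl FnOnce(&mut HashMap<u32, String>, &String) -> R) -> R {
        let mut tables = self.tables.borrow_mut();
        let cx = self.cx.borrow();
        f(&mut *tables, &*cx)
    }
}
```

While a borrow_mut() of tables is alive, try_borrow_mut() on the same cell returns an error (plain borrow_mut() would panic), whereas cx, being a separate RefCell, can still be borrowed.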

6. Two nested thread-local scopes:

demo/src/main.rs
  main()
    └─► run!(&rustc_args, start_demo)                     [macro expands]
         └─► run_driver!(...)                             [creates RustcPublic callback wrapper]
              └─► rustc_driver::run_compiler()            [rustc compiles & analyzes code]
                   └─► after_analysis(tcx)                [callback hook after analysis]
                        └─► rustc_internal::run(tcx, || callback())
                             │
                             ├─ Creates: Container { tables, compiler_cx }
                             │
                             └─► compiler_interface::run(&container, || init(&container, f))
                                  │                                         │
                                  ├─ OUTER: Sets CompilerInterface TLV      │
                                  │                                         │
                                  └─────────────────────────────────────────┤
                                                                            │
                                                                            ├─ INNER: Sets Container TLV
                                                                            │
                                                                            └─► f() → start_demo()

What Happens at rustc_internal::run

pub fn run<F, T>(tcx: TyCtxt<'_>, f: F) -> Result<T, Error>
where
    F: FnOnce() -> T,
{
    let compiler_cx = RefCell::new(CompilerCtxt::new(tcx));
    let container = Container { tables: RefCell::new(Tables::default()), cx: compiler_cx };
    
    crate::compiler_interface::run(&container, || init(&container, f))
    //                              ^^^^^^^^^^      ^^^^^^^^^^^^^^^^^^^^
    //                              OUTER SCOPE     INNER SCOPE
}

Two nested thread-local scopes:

  1. OUTER: compiler_interface::run(&container, ...)

    • Sets TLV = pointer to CompilerInterface
    • Enables high-level API queries
  2. INNER: init(&container, f)

    • Sets TLV = pointer to Container (tables + compiler context)
    • Enables translation between stable ↔ internal types
  3. Finally: User Callback Executes

    • start_demo() runs with both thread-locals set
    • Can call rustc_public::local_crate(), rustc_public::all_local_items(), etc.
    • These APIs use the thread-locals to access compiler state

The two-layer thread-local setup happens in this single line:

compiler_interface::run(&container, || init(&container, f))
//                                      ^^^^^^^^^^^^^^^^^^^^
//                                      Inner scope wraps user callback

Both scopes need the same &container, but they set different thread-local variables to make different parts of the system work!


7. Accessing the Context: with_container() and with()

rustc_internal::with_container()
pub(crate) fn with_container<R, B: Bridge>(
    f: impl for<'tcx> FnOnce(&mut Tables<'tcx, B>, &CompilerCtxt<'tcx, B>) -> R,
) -> R {
    assert!(TLV.is_set());
    TLV.with(|tlv| {
        let ptr = tlv.get();
        assert!(!ptr.is_null());
        let container = ptr as *const Container<'_, B>;
        let mut tables = unsafe { (*container).tables.borrow_mut() };
        let cx = unsafe { (*container).cx.borrow() };
        f(&mut *tables, &*cx)
    })
}

What it does:

  • Retrieves the Container from thread-local storage
  • Borrows tables mutably and cx immutably
  • Calls the provided closure with both
compiler_interface::with()
pub(crate) fn with<R>(f: impl FnOnce(&dyn CompilerInterface) -> R) -> R {
    assert!(TLV.is_set());
    TLV.with(|tlv| {
        let ptr = tlv.get();
        assert!(!ptr.is_null());
        f(unsafe { *(ptr as *const &dyn CompilerInterface) })
    })
}

What it does:

  • Retrieves the CompilerInterface trait object from thread-local storage
  • Calls the provided closure with it
Summary: Two Different TLVs, Two Different Access Patterns

| Function | TLV Used | What It Accesses | When Called |
|---|---|---|---|
| with | OUTER (compiler_interface) | &dyn CompilerInterface (the Container) | High-level API calls like local_crate(), all_local_items() |
| with_container | INNER (rustc_internal) | Tables + CompilerCtxt | Type conversions between stable ↔ internal |

The Flow:
start_demo()
  │
  ├─► rustc_public::local_crate()
  │    └─► with(|cx| cx.local_crate())
  │         └─► Accesses OUTER TLV → gets Container
  │              └─► Container::local_crate() → queries CompilerCtxt
  │
  ├─► rustc_public::all_local_items()
  │    └─► with(|cx| cx.all_local_items())
  │         └─► Accesses OUTER TLV → gets Container
  │              └─► Container::all_local_items() → queries CompilerCtxt
  │                   └─► Internally may call .stable() on items
  │                        └─► with_container(|tables, cx| ...)
  │                             └─► Accesses INNER TLV → gets Tables + CompilerCtxt
  │
  └─► rustc_public::entry_fn()
       └─► with(|cx| cx.entry_fn())
            └─► Accesses OUTER TLV → gets Container

Both TLVs point to the same Container, but they're accessed through different scoped thread-local variables to separate concerns between:

  • High-level queries (via with)
  • Type translation (via with_container)
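The two-TLV arrangement can be modelled with the standard library alone. This sketch assumes std's thread_local! in place of scoped_tls and uses hypothetical Container and CompilerInterface types: run sets both pointers to the same container, with() sees it as a trait object, and with_container() sees the concrete type.

```rust
use std::cell::Cell;
use std::ptr;
use std::thread::LocalKey;

thread_local! {
    // OUTER: pointer to a `&dyn CompilerInterface`.
    static INTERFACE_TLV: Cell<*const ()> = Cell::new(ptr::null());
    // INNER: pointer to the concrete `Container`.
    static CONTAINER_TLV: Cell<*const ()> = Cell::new(ptr::null());
}

/// Hypothetical high-level query trait.
pub trait CompilerInterface {
    fn local_crate_name(&self) -> String;
}

/// Hypothetical container; the real one holds `Tables` and `CompilerCtxt`.
pub struct Container {
    pub crate_name: String,
}

impl CompilerInterface for Container {
    fn local_crate_name(&self) -> String {
        self.crate_name.clone()
    }
}

/// Sets `tlv` to `ptr` while `f` runs, then restores the previous value.
fn set_scoped<R>(tlv: &'static LocalKey<Cell<*const ()>>, ptr: *const (), f: impl FnOnce() -> R) -> R {
    struct Reset(&'static LocalKey<Cell<*const ()>>, *const ());
    impl Drop for Reset {
        fn drop(&mut self) {
            self.0.with(|c| c.set(self.1));
        }
    }
    let _reset = Reset(tlv, tlv.with(|c| c.replace(ptr)));
    f()
}

/// Mirrors `rustc_internal::run`: one container, two nested scopes.
pub fn run<T>(container: &Container, f: impl FnOnce() -> T) -> T {
    let interface: &dyn CompilerInterface = container;
    let outer = &interface as *const &dyn CompilerInterface as *const ();
    let inner = container as *const Container as *const ();
    set_scoped(&INTERFACE_TLV, outer, || set_scoped(&CONTAINER_TLV, inner, f))
}

/// High-level queries go through the trait object (OUTER).
pub fn with<R>(f: impl FnOnce(&dyn CompilerInterface) -> R) -> R {
    INTERFACE_TLV.with(|c| {
        let ptr = c.get();
        assert!(!ptr.is_null(), "`with` called outside of `run`");
        // SAFETY: set by `run` to a `&dyn CompilerInterface` that outlives this call.
        f(unsafe { *(ptr as *const &dyn CompilerInterface) })
    })
}

/// Translation helpers reach the concrete type (INNER).
pub fn with_container<R>(f: impl FnOnce(&Container) -> R) -> R {
    CONTAINER_TLV.with(|c| {
        let ptr = c.get();
        assert!(!ptr.is_null(), "`with_container` called outside of `run`");
        // SAFETY: set by `run` to a `Container` that outlives this call.
        f(unsafe { &*(ptr as *const Container) })
    })
}
```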

8. The CompilerInterface Trait

pub(crate) trait CompilerInterface {
    fn entry_fn(&self) -> Option<CrateItem>;
    fn all_local_items(&self) -> CrateItems;
    fn mir_body(&self, item: DefId) -> mir::Body;
    fn has_body(&self, item: DefId) -> bool;
    // ... many more methods
}

Implemented by: Container<'tcx, BridgeTys>

What it provides:

  • High-level API for querying compiler information
  • All methods internally use tables and cx to convert between internal and stable types

Complete Call Chain Summary

User Code
  ↓
run!(args, callback)
  ↓
run_driver! macro
  ↓
RustcPublic::new(callback)
  ↓
RustcPublic::run(args)
  ↓
rustc_driver::run_compiler(args, self)  ← Enters rustc
  ↓
[Rustc runs parsing, type checking, analysis...]
  ↓
RustcPublic::after_analysis(tcx)  ← Callback from rustc
  ↓
rustc_internal::run(tcx, || { ... })
  ├─ Creates CompilerCtxt::new(tcx)
  ├─ Creates Container { tables, cx }
  └─ Calls compiler_interface::run(&container, ...)
      ├─ Sets TLV #1 (compiler_interface::TLV) → pointer to Container as CompilerInterface
      └─ Calls rustc_internal::init(&container, ...)
          ├─ Sets TLV #2 (rustc_internal::TLV) → pointer to Container
          └─ Executes user callback
              ├─ User calls rustc_public (formerly stable_mir) APIs
              ├─ APIs call compiler_interface::with() → retrieves Container via TLV #1
              ├─ APIs call with_container() → retrieves Container via TLV #2
              └─ Container uses tables + cx to convert types

Key Design Patterns

  1. Double Thread-Local Storage

    • TLV #1 (compiler_interface::TLV): Stores &dyn CompilerInterface
    • TLV #2 (rustc_internal::TLV): Stores &Container<'tcx, B>
    • Both point to the same Container, but provide different access patterns
  2. Interior Mutability with RefCell

    • Container uses RefCell for both tables and cx
    • Lets code anywhere in the call stack borrow them through a shared reference
    • Runtime borrow checking catches conflicting borrows (by panicking)
  3. Lifetime Management

    • 'tcx lifetime ties everything to the compiler's type context
    • Ensures stable API types don't outlive the compiler session
    • Scoped thread locals ensure cleanup
  4. Bridge Pattern

    • Container acts as a bridge between rustc internals and stable API
    • Tables caches conversions
    • CompilerCtxt wraps TyCtxt and provides query methods

Notes on Architecture

  1. Safety: Thread-local storage ensures the compiler context is only accessible during valid compilation
  2. Ergonomics: Users don't need to pass context explicitly everywhere
  3. Flexibility: Two TLVs allow different access patterns (trait object vs concrete type)
  4. Performance: Tables caches conversions to avoid redundant work
  5. Separation: Clear boundary between rustc internals and stable API

This architecture allows rustc_public to provide a stable API while internally working with rustc's unstable internals, all while maintaining safety and ergonomics.

Stable-mir dialect in Pliron

The fundamental structure of stable_mir (now rustc_public) is very similar to unstable MIR, but with key differences focused on stability and API design[1][2]. Things we need to know about stable_mir/rustc_public for creating a dialect in pliron:

Core Structure (Types, Operations, Interfaces)

Stable_mir maintains the same conceptual model as unstable MIR with these key components[2][3]:

Types

  • Body: The IR representation of a single function
  • BasicBlock: Control-flow graph nodes containing statements and terminators
  • Local: Local variables with type information (indexed via Local type alias)
  • Place: Memory locations (variables, fields, derefs) with projections
  • Type system: Full Rust type information (though simplified from HIR)

Operations

  • Statements (StatementKind): Non-control-flow operations like assignments, storage management (StorageLive/StorageDead), and no-ops
  • Terminators (TerminatorKind): Control-flow operations (return, call, switch, goto, drop, etc.)
  • Rvalues: Right-hand side expressions including binary operations (BinOp), unary operations (UnOp), aggregates, casts, and references
  • Operands: Values used in operations (constants, moves, copies)

Additional Elements (Similar to Attributes)

  • ProjectionElem: Field accesses, derefs, array indexing
  • AggregateKind: Tuple, array, ADT construction
  • CastKind: Type conversions
  • BorrowKind, Mutability: Ownership and mutability annotations
  • SourceInfo: Debug and span information
  • VarDebugInfo: Variable debugging metadata

Key Differences from Unstable MIR

Stability Guarantees

The main difference is that stable_mir/rustc_public aims to provide semantic versioning and a stable API surface[1][4][5]. rustc's internal MIR can change arbitrarily between compiler versions, whereas the published rustc_public crate is intended to stay backward compatible.

API Design

  • Context management: The TyCtxt compiler context is hidden from users in stable_mir; it is managed through thread-local storage and accessed via the with() function[1]
  • Cleaner interfaces: Simplified APIs that reduce the need to understand deep compiler internals
  • Conversion layer: The rustc_smir crate (now rustc_public_bridge) handles translation between internal MIR and stable_mir, isolating users from internal changes[4]
  • rustc_internal module: Provides internal() and stable() methods for bidirectional conversion when needed (these are themselves unstable)[1]

Coverage

Stable_mir currently has less coverage than full unstable MIR, focusing on what static analysis tools need[1][4]. Some advanced or compiler-internal features may not yet be exposed.

What do we need for the Pliron MIR Dialect

When modeling this in pliron:

  1. Operations: Create pliron ops for each StatementKind (Assign, StorageLive/Dead, etc.) and TerminatorKind (Return, Call, Assert, Goto, etc.)
  2. Types: Model MIR's type system as pliron types (primitives, tuples, references, arrays, ADTs)
  3. Attributes: Attach debug info (VarDebugInfo), source spans (SourceInfo), mutability/borrow kinds, and allocation metadata as pliron attributes
  4. Blocks/Regions: Map basic blocks to pliron blocks with appropriate control flow
  5. Operands: Model places (locals with projections) and constants as SSA values or special operand types

The key point is that statements and terminators are operations, locals and expressions have types, and debug/source/flow metadata are attributes.
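A rough, plain-Rust sketch of that mapping, with no dependency on pliron: statements and the block's single terminator become ops, and MIR types become a small type enum that renders like the MIR dump. The op names (mir.assign, mir.assert, ...) and all types are hypothetical placeholders rather than pliron's API; attaching debug/source metadata as attributes is not modelled here.

```rust
/// Hypothetical, reduced mirrors of a few stable MIR kinds.
pub enum StatementKind {
    Assign,
    StorageLive(usize),
    StorageDead(usize),
    Nop,
}

pub enum TerminatorKind {
    Goto,
    Return,
    Call,
    Assert,
}

pub struct MirBlock {
    pub statements: Vec<StatementKind>,
    pub terminator: TerminatorKind,
}

/// Placeholder op names for statements.
pub fn statement_op(kind: &StatementKind) -> &'static str {
    match kind {
        StatementKind::Assign => "mir.assign",
        StatementKind::StorageLive(_) => "mir.storage_live",
        StatementKind::StorageDead(_) => "mir.storage_dead",
        StatementKind::Nop => "mir.nop",
    }
}

/// Placeholder op names for terminators.
pub fn terminator_op(kind: &TerminatorKind) -> &'static str {
    match kind {
        TerminatorKind::Goto => "mir.goto",
        TerminatorKind::Return => "mir.return",
        TerminatorKind::Call => "mir.call",
        TerminatorKind::Assert => "mir.assert",
    }
}

/// Lowers one MIR basic block to op names; the terminator op is always last.
pub fn lower_block(bb: &MirBlock) -> Vec<&'static str> {
    let mut ops: Vec<&'static str> = bb.statements.iter().map(statement_op).collect();
    ops.push(terminator_op(&bb.terminator));
    ops
}

/// Hypothetical dialect types for a few MIR types.
pub enum MirTy {
    I32,
    Bool,
    Ref(Box<MirTy>),
    Array(Box<MirTy>, u64),
    Tuple(Vec<MirTy>),
}

impl MirTy {
    /// Renders the type the way MIR dumps print it, e.g. "(i32, bool)".
    pub fn render(&self) -> String {
        match self {
            MirTy::I32 => "i32".to_string(),
            MirTy::Bool => "bool".to_string(),
            MirTy::Ref(t) => format!("&{}", t.render()),
            MirTy::Array(t, n) => format!("[{}; {}]", t.render(), n),
            MirTy::Tuple(ts) if ts.len() == 1 => format!("({},)", ts[0].render()),
            MirTy::Tuple(ts) => {
                let parts: Vec<String> = ts.iter().map(MirTy::render).collect();
                format!("({})", parts.join(", "))
            }
        }
    }
}
```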

The structure is conceptually the same—stable_mir just provides a stable, versioned API surface over the same underlying concepts that unstable MIR exposes[8][2][3].

Sources