Skip to content

Instantly share code, notes, and snippets.

@rikkimax
Last active December 16, 2024 14:44
Show Gist options
  • Save rikkimax/bdb5742b2fe4da0972bdc1dda9338c49 to your computer and use it in GitHub Desktop.
Save rikkimax/bdb5742b2fe4da0972bdc1dda9338c49 to your computer and use it in GitHub Desktop.

TODO: @gcout?

Escape Analysis

Field Value
DIP: (number/id -- assigned by DIP Manager)
Author: Richard (Rikki) Andrew Cattermole [email protected]
Implementation: (links to implementation PR if any)
Status: Draft

Abstract

The movement of pointers within a program graph easily escapes known points that own that memory within the call stack of a given thread. This logic error can result in program termination or undetected corruption of the program state. This proposal is a new attempt at preventing this corruption within the @safe code.

To do this the introduction of an escape set modelling at a function signature offers the ability to set better defaults for relationship strengths. A redesign of the escape set analysis from that of DIP1000, allows the escape set to grow and shrink during a function body enabling more code to work.

Contents

Rationale

In a review of the existing escape analysis solution implemented in D's reference compiler DIP1000, there is one major limitation of what it models and assumption growth to facilitate functionality.

The implementation of DIP1000 models a single output variable per function, this is the return value or if void the first parameter (could be the this pointer). In practice functions typically have more than one output, this includes mutable pointers in, ref and out function parameters.

int* /* output */ func();

struct S {
	int* /* output */ method1();
	void method2() /* output */;
}

The relationship between parameters is modelled using the return ref and return scope attributes. These communicate to the compiler the varying input and how it relates to the output for that parameter.

Needing two different attributes to determine the relationship status between parameters has been highly incommunicable to experienced programmers.

Due to it not being able to model multiple outputs, a lot of typical D code cannot be safely represented using DIP1000. The design does not protect you from extending past the modelled subset of the language.

To resolve both of these core issues in the existing design, an escape set must be modelled per parameter. While this resolves the callee's side, it does not protect the caller from misusing the callee. The design DIP1000 attempts to solve this by modelling the relationship between parameters using the two different attributes.

Describing just the escape set, and then the relationship with good defaults allows for removing the DIP1000 attribute mess. An example of this is inRefOutRef which only needs the escape set to be annotated without any modifiers, but intRefOutPtr would need both the escape set and modifiers annotated.

ref int* inRefOutRef(@escape(return/*&*/) ref int* input) => input;
int** intRefOutPtr(@escape(return&) ref int* input) => &input;

With DIP1000 to do either of these function prototypes you would use the return ref + scope attributes on the parameter. Instead, these are two separate attributes return + ref with an invalid combination of return and scope as return has a larger escape set than scope.

Another solution to this problem is to utilize the information provided by escapes and inverse it, given an output and given the inputs that form it, protect the inputs so that nothing can invalidate the output. This resulted in the proposal that was @live, an opt-in analysis that does not communicate to either the callee or caller any guarantees cross-function, making it functionally irrelevant to the guarantees of DIP1000.

An opt-in solution to ownership does not allow for reference counting to occur safely. To safely do this, the referenced counted owner must be pinned and made effectively read-only so that both a reference to it and the borrowed resource may be passed around. This was a blocker determined by Walter and Timon for adding reference counting to the language.

Furthermore without the entry point to escape analysis having analysis associated with it, there is no differentiation of what can constitute of a safe to borrow from source and what can't be. An example of this is with a global, in the case of a variable thread local storage, it is possible in fully @safe code with DIP1000 turned on to cause a segfault.

import std;

int* tlsGlobal;

@safe:

void main() {
    tlsGlobal = new int(2);
    assert(*tlsGlobal == 2);
    
    toCall();
    assert(*tlsGlobal == 2); // Segfault
}

void toCall() {
    tlsGlobal = null;
}

Prior Work

Escape analysis as a subject matter is primarily an analysis of graphs. How they are mutated and who owns them at what places. Modelling this can be an expensive set of operations in the form of data flow analysis. For sufficient and best experience, a full program analysis is needed with a full graph of manipulation and logic therein analysed.

Native programming languages do not align themselves to full program analysis, due to the separate compilation model. D is a native programming language that uses this model almost exclusively. For this reason, it cannot use a full program analysis and full program graph analysis for modelling escaping. Instead, a flattened view of the graph must be representable inside a function signature.

At the time of this proposal, a solution for escape analysis has been implemented in the D reference compiler that is commonly referred to by its DIP number, DIP1000. This does not cover memory ownership guarantees, instead @live as an opt-in attribute enables some localized to the given function guarantees.

The attributes that DIP1000 describes in its model are the following:

DIP1000 Input-Output Relationship
scope No Return †
return See return scope and return ref
return scope Returns ‡♦
return ref Returns ‡, ref
return ref scope Returns ‡♦, ref

† Cannot include other escapes

‡ May include other escapes, minimum escape set

♦ Escapes must be modellable and not globals or throws

♥ The by-ref value is what is being protected

It uses three keywords to offer five different combinations with only four unique relationships between a given input and its outputs. Of note is that none of the relationships described include the value stored within a by-ref parameter, only the by-ref pointer. Of one return it can be used to denote either return scope or return ref depending on context.

These attributes have led to significant confusion in the usage of DIP1000, and do not model heap memory to a usable level, which has resulted in abandonment and usage of @trusted where it should not have been @trusted.

In Rust ownership is a transfer based system, so that only one variable has any ownership of memory. In contrast to D, where this is modelled and attempting to enforce this would not match how garbage collected memory would be used. Further guarantees are given, in that when a borrow occurs from an owner, only one mutable borrow is allowed in a given scope. This complements the ownership transfer system as it guarantees nobody else has the potential for aliasing.

Description

An escape set is the relationship between an input, and its outputs. Each output can have different reasons for being in the set, therefore each needs to know why it was placed in the set.

The escape set usage can be thought of as existing in two steps:

  1. The optional set definition: @escape(output, ...) input
  2. An action to establish the input having said output output = input;

Whilst placing an output into its appropriete input's escape sets, the relationship strength must go with it. This will take the form of a character such as =, &, or ^ following the output. If a strength matches the default given the input and output variables, then it may be omitted. @escape(output^, ...)

An escape set may be placed upon function parameters, and on a function on the right hand side which applies to the this pointer.

Analysis

The analysis that is run on a function will function as such:

  1. The escape set is always inferred. In the following example, DIP1000 requires that return to be annotated on the function parameter input.
     @safe:
    
    void main() {
        int delta;
        Foo foo = someFunc(delta);
    }
    
    Foo someFunc(/* @escape(return&) */ ref int input) {
        return Foo(&input);
    }
    
    struct Foo {
        int* ptr;
    }
  2. When the escape set is annotated, it is not considered until it is time to converge upon the input variable. It acts as a constraint on the output of the inferration. In the prior example, if we swap the variables foo and delta, it'll cause an error in DIP1000, as well as this proposal.
    Foo foo;
    int delta;
    foo = someFunc(delta);
    As it would look to the compiler as:
    Foo foo;
    
    {
    	int delta;
    	foo = someFunc(delta);
    	// @escape(foo&) delta
    }
  3. The escape set establishes all relationships between a given input, and its outputs. For DIP1000, a lack of annotation may indicate this escape, or it may indicate unannotated and requires inferration. 3.1. This includes outputs that are not modeled such as exceptions.
    void routine(/* @escape(__unknown) */ int* input) {
    	throw new Exception(input);
    }
    3.2. Or globals.
    int* global;
    void routine(/* @escape(__unknown) */ int* input) {
    	global = input;
    }
  4. An implementation is done using a single forward pass, but it is scope aware for each variable.
    int* ptr;
    
    for(;;) {
    	ptr = new int;
    }
    Is perceived during analysis as if it was written:
    int* ptr_outer;
    
    for(;;) {
    	int* ptr_inner = ptr_outer;
    	
    	ptr_inner = new int;
    	
    	ptr_outer = ptr_inner; // Converge here!
    }
  5. The state of a variable changes, over the course of analysis. This applies to metadata on a variable (such as what object is in it), as well as the escapes of the variable to others.
    int* ptr = null;
    // ptr is null
    
    ptr = new int;
    // ptr is non-null with object 1
    
    {
    	int** borrow = &ptr;
    	// @escape(borrow) ptr
    }
    // @escape() ptr
    
    ptr = null;
    // ptr is null

Consequences of Analysis

Consequences of the prior analysis rules are as follows:

  1. Pointer types will be allocated on the stack readily.

    void main() {
    	NotATeaPot myObject = new NotATeaPot;
    	// @escape() myObject
    }
    
    class NotATeaPot {
    	this() /* @escape() */ {
    		import std.stdio;
    		writeln("I'm a tea pot!");
    	}
    }

    The compiler will see at the point of convergance of myObject that it does not escape and will change it from a heap allocation to a stack allocation. Today you can do this explicitly by using scope as a storage class for the variable myObject.

    1.1. This promotion is not offered by DIP1000, and instead functions as a limiter should the scope storage class by applied. An example of this limiter behavior where DIP1000 will error, which this proposal will not do. This is in part due to DIP1000 being limited to modelling outputs that are the this pointer or return value.

    @safe:
        
    void main() {
        scope NotATeaPot myObject = new NotATeaPot;
        // @escape() myObject
        
        {
       	    NotATeaPot myObject2;
    	    myObject.callMe(myObject2);
    	    // @escape(myObject2) myObject
        }
        // @escape() myObject
    }
    
    class NotATeaPot {
    @safe:
        this() /* @escape() */ {
            import std.stdio;
            writeln("I'm a tea pot!");
        }
        
        void callMe(out NotATeaPot self) scope {
            self = this; // Error: scope variable `this` assigned to `ref` variable `self` with longer lifetime
        }
    }
  2. The escape set existing does not imply that any restrictions exist on an output. A modification of an earlier example, show cases that by moving a pointer around, it will never trigger an error. Other actions must take place to cause it. This has the same behavior as DIP1000 should the input parameter be annotated return.

    @safe:
    
    void main() {
    	int* stuff;
    	Foo foo = someFunc(stuff);
    }
    
    Foo someFunc(/* @escape(return) */ int* input) {
    	return Foo(input);
    }
    
    struct Foo {
    	int* ptr;
    }
  3. One level of indirection acquisition and dereferencing are defined behaviors. This behavior is a product of variable based analysis with escape sets from the input to its outputs. Along with individual relationship strengths. Displaying this behavior is the following example:

    struct LockBox {
    	int* field;
    	
    	void callIt() /* @escape() */ {
    		callMe(&this.field); // @escape(__temporary&) this
    		// @escape() this
    	}
    }
    
    void callMe(/* @escape() */ int** ptr) {
    }

    The responsibility of escape analysis in this example, is to ensure that callMe cannot escape either the ptr variable, of the field contents. Escape analysis does not offer lifetime guarantees that it remains in the same object after the call to callMe. For this a head const type qualifier would be required.

    Alternatively DIP1000 does not define this as having valid behavior leading to an error:

    @safe:
    
    void main() {
        LockBox lb;
        lb.callIt;
    }
    
    struct LockBox {
     	int* field;
        
        @safe:
        
        void callIt() scope {
            callMe(&this.field); // Error: cannot take address of `scope` variable `this` since `scope` applies to first indirection only
        }
    }
    
    void callMe(scope int** ptr) {
    }

    Dereferencing the input pointer will result in a relationship to the input. It may then be returned, the analysis and its guarantees would be responsible of the caller:

    void callIt() /* @escape() */ {
    	{
    		int* output = callMe(&this.field);// @escape(__temporary&) this
    		// @escape(output&) this
    	}
    	// @escape() this
    }
    
    int* callMe(/* @escape(return=) */ int** ptr) {
    	return *ptr;
    }

    In this example the lifetime of the object in this.field was not considered. Escape analysis does not model objects outside of a function body, nor once indirection takes place.

Relationship Strengths

Three relationship strengths are defined by this proposal:

  1. Assignment of or into an output, by the input. output = input; or output.field = input;
  2. Taking a pointer to the input, and assigning of or into an output. output = &input; or output.field = &input;
  3. Borrowed, the input is protected from mutation as long as the relationship to output exists. Can come in any form i.e. output = input;

The syntax provided are examples of where a relationship would be established with a given strength, not its exclusive set of options.

Both taking a pointer to the input, and borrowed must have the input outlive the output. Assignment may have this restriction applied to it, it depends upon the caller's state for the input.

This is ok:

int input;

{
	int* output = &input;
}

This is not:

int* output;

{
	int input;
	output = &input;
}

The default relationship strength depends upon the input and output. If both are ref, then it'll be taking a pointer, otherwise it will be assignment. Rarely should code have to annotate the relationship strength given this default.

ref int* takePointer(/* @escape(return&) */ ref int* input) {
	return input;
}

int* assignment(/* @escape(return=) */ ref int* input) {
	return input;
}

This differs from DIP1000, its stronger relationship return ref where the return escape would bind to the ref, and would protect the by-ref pointer, not the int* held within it.

@safe:
int* global;
    
void main() {
	int* ptr;
    global = assignment(ptr); // Error: address of variable `ptr` assigned to `global` with longer lifetime
}

int* assignment(return ref int* input) {
	return input;
}

The addition of scope attribute to turn return ref into return scope + ref is required to make this legal with DIP1000.

Protection of an input from mutation appears as:

void main() {
	int* ptr;
	int* borrowed = borrow(ptr);
	*ptr = 2; // Error: Variable `ptr` has a borrow in variable `borrowed` so it may not be mutated.
}

int* borrow(@escape(return^) ref int* input) {
	return input;
}

The purpose of protecting an input from mutation, is to guarantee safety in some kinds of data structures and when using reference counting.

Extendable

The data flow analysis described in this proposal is meant to be used with other language features and can be extended with new features.

  1. Eliding of reference count additions and subtractions. This requires modelling of objects that are in variables.

  2. Type state analysis, this enables three important states to be modelled in language enabling more code to work in @safe functions (can write to uninitialized memory, but not read from it), and prevent always wrong operations from being available to you such as dereferencing a null pointer.

    2.1. Ownership transfer, i.e. isolated from Midori. This requires modelling of objects that are in variables. It needs this to disallow access on an object after it has been transfered.

  3. Some indirection could be modelled within a function body, tuples, static arrays, dynamic arrays, and structs. This requires an additional index applied to the variable state.

Safety

Errors will be generated for @safe functions. All other functions will not generate errors.

For @trusted functions to have their signature to be inferred, @system functions must also be inferred.

Grammar

AtAttribute:
+    @ EscapeAttribute

ParameterAttributes:
+    @ EscapeAttribute

ParameterStorageClass:
+	return EscapeRelationshipModifier
	return

MemberFunctionAttribute:
+	return EscapeRelationshipModifier
	return

+ EscapeAttribute:
+    escape ( EscapeRelationships )
+    escape ( )
+    escape

+ EscapeRelationships:
+	EscapeRelationship
+	EscapeRelationship , EscapeRelationships

+ EscapeRelationship:
+   Identifier EscapeRelationshipModifier|opt

+ EscapeRelationshipModifier:
+    ^
+    &
+    =

Of note is the existing scope storage class, this may be used in place of the empty escape set. @escape()

The return storage class on a parameter or this pointer, may be used from DIP1000, with an optional relationship strength. return& int* input which translates to: @escape(return&) int* input

No specific mangling changes are provided here. Some will be desirable, however the decision to make them should be dependent upon implementation cost, not what would be required to fully describe the escape set.

Removal of Existing Language Elements

The language design elements that are being removed are DIP1000 and @live. Together these attempted to do this proposal but in a non-integrated way that has shown concerning adoption conclusions.

AtAttribute:
-   @ live

FuncAttr:
-	FuncAttrReturn
-	FuncAttrScope
-	FuncAttrLive

- FuncAttrReturn:
-	Nj

- FuncAttrScope:
-   Nl

- FuncAttrLive:
-	Nm

No timeline is specified for removal.

Breaking Changes and Deprecations

This proposal introduces an attribute @escape. This may conflict with an existing user-defined attribute. If so it could be limited to a given edition and above or take preference over it.

No conflicts with DIP1000 are expected, these proposals can co-exist in syntax. Only one of these proposals can be active at a time.

Any new semantic analysis would only cause errors to be applied to a new edition and would not affect the base D2 language.

Reference

Copyright & License

Copyright (c) 2024 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

History

The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment