Field | Value |
---|---|
DIP: | (number/id -- assigned by DIP Manager) |
Author: | Richard (Rikki) Andrew Cattermole [email protected] |
Implementation: | (links to implementation PR if any) |
Status: | Draft |
Modelling an escape set at the function signature makes it possible to set better defaults for relationship strengths. A redesign of the escape set analysis, relative to that of DIP1000, allows the escape set to grow and shrink during a function body, enabling more code to work.
- Rationale
- Prior Work
- Description
- Breaking Changes and Deprecations
- Reference
- Copyright & License
- History
The purpose of this proposal is to introduce a framework for memory safety analysis in the compiler, with enough optional annotation available that the description can scale with need.
Memory passed into a function can go into any of these four places. Being able to trace this offers accurate borrowing safety for owning representation.
- Into the unknown
- Another parameter
- As the return
- Into the this pointer
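As a rough illustration, the four destinations could be annotated as follows using the `@escape` syntax this proposal introduces. All function and type names here are hypothetical, and naming `this` inside a parameter's escape set is an assumption the text does not confirm. The sketch does not compile with current compilers.

```d
int* global;

// Into the unknown: the pointer is stored somewhere the signature cannot model.
void intoUnknown(@escape(__unknown) int* input) {
    global = input;
}

// Into another parameter: `input` is stored through the by-ref `output`.
void intoParameter(@escape(output) int* input, ref int* output) {
    output = input;
}

// As the return value.
int* intoReturn(@escape(return) int* input) {
    return input;
}

struct Holder {
    int* field;

    // Into the this pointer (assumed spelling: `this` in the escape set).
    void intoThis(@escape(this) int* input) {
        this.field = input;
    }
}
```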
Not all movement of memory from an input to an output is equal. Some movements establish a strong relationship, where the output depends upon the input to be valid. Others only say that the input contributed to the output, and what that means is up to the caller. These relationship strengths are expressed using modifiers. The modifiers specified here are not required to be implemented unless analysis exists to take advantage of them. They can represent DIP1000 behaviour, or other escape analysis proposals, because the flattening at the function signature is fairly limited.
Describing just the escape set, and then the relationship with good defaults, allows for removing the DIP1000 attribute mess. An example of this is `inRefOutRef`, which only needs the escape set to be annotated without any modifiers, whereas `intRefOutPtr` would need both the escape set and modifiers annotated.
ref int* inRefOutRef(@escape(return/*&*/) ref int* input) => input;
int** intRefOutPtr(@escape(return&) ref int* input) => &input;
With DIP1000, to write either of these function prototypes you would use the `return ref` + `scope` attributes on the parameter. Instead, these are two separate attributes, `return` + `ref`, with `return` and `scope` being an invalid combination, as `return` has a larger escape set than `scope`.
Existing analysis in the form of DIP1000 offers both escape analysis and owner escape analysis which is intended for memory owned by a point in the stack.
The attributes that DIP1000 describes in its model are the following:
DIP1000 | Input-Output Relationship |
---|---|
`scope` | No Return † |
`return` | See `return scope` and `return ref` |
`return scope` | Returns ‡♦ |
`return ref` | Returns ‡, ref ♥ |
`return ref scope` | Returns ‡♦, ref ♥ |
† Cannot include other escapes
‡ May include other escapes, minimum escape set
♦ Escapes must be modellable and not globals or throws
♥ The by-ref value is what is being protected
It uses three keywords to offer five different combinations, with only four unique relationships between a given input and its outputs. Of note is that none of the relationships described include the value stored within a by-ref parameter, only the by-ref pointer. A lone `return` can denote either `return scope` or `return ref` depending on context.
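For reference, the existing attributes from the table look like this in current D when compiled with `-preview=dip1000`. This is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```d
// Compile with: -preview=dip1000
@safe:

// `scope` alone: `p` may be read but must not leave the function.
int readOnly(scope int* p) { return *p; }

// `return scope`: the pointer value held in `p` may be returned.
int* byValue(return scope int* p) { return p; }

// `return ref`: a reference to the parameter itself may be returned.
ref int byRef(return ref int x) { return x; }
```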
These attributes have led to significant confusion in the usage of DIP1000, and do not model heap memory to a usable level, which has resulted in abandonment and usage of `@trusted` where it should not have been `@trusted`.
This proposal introduces the new escape set with configurable modifiers per input-output relationship. Subtle changes are made in the analysis compared to DIP1000, to enable growth and shrink during the body analysis, with late catching of errors.
The following grammar changes are made and are non-optional. Optional grammar changes for potential modifiers that could represent DIP1000 behaviour are given under the A Modifier Profile heading.
AtAttribute:
+    @ EscapeAttribute
ParameterAttributes:
+    @ EscapeAttribute
+ EscapeAttribute:
+    escape ( EscapeRelationships )
+    escape ( )
+    escape
+ EscapeRelationships:
+    EscapeRelationship
+    EscapeRelationship , EscapeRelationships
+ EscapeRelationship:
+    Identifier EscapeRelationshipModifiers|opt
+ EscapeRelationshipModifiers:
+    EscapeRelationshipModifier
+    EscapeRelationshipModifier EscapeRelationshipModifiers
+ EscapeRelationshipModifier:
As an analysis, the escape set provides a framework for escape analysis and owner escape analysis to protect memory from leaving its known lifetime and potentially causing program corruption.
To do this it performs data flow analysis in a forward-only pass over a function body to detect the movement of memory into unmodellable locations and establish the relationships between variables for other analysis to work upon. It does not consider any escape set-related annotations when the analysis is started for the function signature. The annotated signature is only considered at the end of the analysis.
When the relationships have been determined at exit points, a process of convergence on the annotated parameters is performed. If a parameter has no user-provided annotation, the determined relationship is stored as inferred. If it was annotated by the user, the annotation is verified against the determined relationship, and a mismatch is an error.
The rule that a signature mismatching the analysis is an error applies to `@safe` functions. `@trusted` functions do not error when verification fails, but inference is still performed where the escape set is not fully annotated. `@system` functions do not have this analysis applied to them.
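The difference between the three safety levels might look like the following sketch, written in the proposed syntax. The function names are hypothetical and the error text is modelled on the messages used elsewhere in this proposal; it does not compile with current compilers.

```d
int* global;

// @safe: the annotation is verified against the body, so the mismatch is reported.
void safeStore(@escape() int* input) @safe {
    global = input; // Error: Variable `input` escapes into an unknown location
}

// @trusted: the same mismatch is not reported.
void trustedStore(@escape() int* input) @trusted {
    global = input;
}

// @trusted without an annotation: the escape set is still inferred,
// here as @escape(__unknown).
void trustedInferred(int* input) @trusted {
    global = input;
}

// @system: the analysis is not applied at all.
void systemStore(int* input) @system {
    global = input;
}
```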
If the annotated signature has a larger escape set or a stronger modifier for a relationship, it is not an error. See the Why Modifiers Are Useful heading for why this relaxation is very useful.
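A sketch of this relaxation in the proposed syntax follows. The function names are hypothetical, and the `&` modifier is one of the optional modifiers described under the A Modifier Profile heading, so the second function assumes that modifier is implemented.

```d
// The body only returns `input`, but the annotation also lists `__unknown`.
// The annotated escape set is larger than what the analysis found: accepted.
int* wider(@escape(return, __unknown) int* input) @safe => input;

// The body needs only the default `=` relationship, but `&` is annotated.
// The annotated modifier is stronger than what the analysis found: accepted.
int* stronger(@escape(return&) int* input) @safe => input;

// The annotated escape set is smaller than what the analysis found: rejected.
// Error: Variable `input` cannot be escaped as its escape set does not include `return`
int* narrower(@escape() int* input) @safe => input;
```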
It is important that any analysis built upon this attempts to do error detection as late as possible.
int** global;

void escapeIt(@escape(__unknown) int** input) {
    global = input;
}

int* escapeOut(@escape(return) int* input) {
    {
        int** val = &input;
        // @escape(val) input
        escapeIt(val);
        // @escape(__unknown, val) input
    } // @escape(__unknown) input, Error: Variable `input` escapes into an unknown location
    // Do another pass and trace WHY `input` escaped into an unknown location!
    // @escape() input
    return input; // @escape(return) input
}
When a type acts in a tuple-like manner or can be modelled as such, each element may have its own lifetime within a function. A function signature cannot model separate lifetimes between elements, so they must be conflated into the containing variable.
int* func(@escape() int* input) {
    int*[3] tuple;
    tuple[0] = new int;
    tuple[1] = input;
    return tuple[0]; // ok
    return tuple[1]; // Error: Variable `input` cannot be escaped as its escape set does not include `return`
    return tuple[2]; // ok
}
An expression sequence functions as a tuple, and so does a static array. A struct can sometimes do this, however it is more involved than a simpler sequence representation. It may have mutable constructors, copy constructors, destructors, or postblit. Otherwise, all methods must be read-only. This is because cross-function graph mutation is not modelled at the function signature level, although it can be modelled within a function body.
Modifiers provide a way to describe to the compiler without a body, that the input-output relationship will have a specified set of characteristics. These characteristics typically come in the form of a strength, to denote what amount of protection is needed after the function call or the amount of protection that should not exist before it.
The goal of giving modifiers a dedicated part of the syntax is to eliminate dedicated keywords, and to avoid new semantic behaviours hiding behind syntax that looks innocuous.
While this proposal does not require any modifier to be implemented, some potential ones are described to map onto existing escape analysis designs.
Each modifier implemented will have an analysis that affects the relationships between variables within a function body. This in turn provides verification of the function signature where it has been annotated, and inference where it has not.
In light of DIP1000 and potential proposals, three core relationship modifiers can establish how an input goes into an output. None of these need to be supported unless there is analysis that can take advantage of them. They are provided to give concrete examples of what a modifier is meant to function as.
- Take a pointer to an input (including to a field or method), into or in part of an output (`&`).
- Copy the value of the input, directly into or as part of the output (`=`).
- Copy a value that came from the input, but not the input itself, into or as part of the output (`.`).
In order of strength, `&` is the strongest, while `=` and `.` share the same lower strength. Both `=` and `.` may be elided if `&` is provided on an input-to-output relationship.
The grammar changes:
EscapeRelationshipModifier:
+    &
+    =
+    .
If none of these modifiers is placed on an input-output relationship, the default relationship is `&` when both the input and the output are by-ref, and `=` otherwise.
If these three modifiers are used, it is recommended that applying the default modifier `=` also applies `.`. This allows more compact forms that require less understanding to use.
The only guarantee provided by this proposal in association with these stated modifiers is that if implemented they will be accurately added to the signature during inference and validated as being accurate if manually annotated when a body is present.
ref int* inRefOutRef(/*@escape(return&)*/ ref int* input) => input;
int* inRefOut(/*@escape(return=)*/ ref int* input) => input;
int* inOut(/*@escape(return=)*/ int* input) => input;
Earlier it was stated that a modifier known to be the default may be elided. All three of these examples use the default modifier, which could therefore have been elided if manually annotated.
An example of a relationship where the modifier `&` would be required and cannot be elided:
int** intRefOutPtr(/*@escape(return&)*/ ref int* input) => &input;
Given these examples, it can be assumed that the default relationships should be good enough for the majority of cases. It is only when you are doing something a bit more advanced that you need to opt into stronger guarantees.
Being able to control the relationship outside of the default can be quite useful. For example with an owning object, we want to establish a strong relationship between the owner, and the borrow.
struct Owner {
    private {
        int* ptr;
    }

    int* borrow() @escape(return&) {
        return this.ptr; // strength of . which is less than &
    }
}
If we did not annotate the `this` pointer explicitly with the `&` modifier, it would have defaulted to `=.`, which under normal circumstances is what would be wanted with GC memory.
This proposal introduces only one attribute, `@escape`. This may conflict with an existing user-defined attribute. If so, it could be limited to a given edition and above, or take precedence over the user-defined attribute.
No conflicts with DIP1000 are expected; the two proposals can co-exist, although syntax would not be reused between them. Only one of these proposals should be active at one time.
Copyright (c) 2024 by the D Language Foundation
Licensed under Creative Commons Zero 1.0
The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.