@tema3210
Created April 28, 2021 12:41
5th revision

Summary

A move reference is a new kind of reference that allows moving the value out of a binding, but not moving the memory the value is stored in.

The proposal also introduces a new implied auto trait called UnwindSound.

Motivation

  • Make Box less special by providing a mechanism (DerefMove) for moving out of a reference.
  • Solve the drop problems caused by temporary moves combined with panics.
  • Provide macro-free, unsafe-free stack pinning.

Guide-level explanation

&move references provide a way to borrow the memory of a binding while preserving the semantics of moving its value.
The type &move T is indeed a reference, but unlike other references it allows moving the initialized value out of the referenced binding.

Core

There are two kinds of move references: plain (&move T) and annotated with ! (&move T!).

About the functionality:

  • &move T: allows moving the value out.
  • &move T!: obligates the holder to keep the binding initialized; allows temporarily moving the value out.

&move T! is a move reference to an initialized binding that allows temporarily moving out of it. In effect, it can be viewed as a mutable reference. The reason for introducing it as a separate kind is simple:

  • It doesn't change the existing behavior of &mut T.

Allowing a value to be moved out implies that it is initialized, so referencing an uninitialized binding with &move T or &move T! is prohibited.

These references can be coerced to other kinds of references. This is how methods are called through them.

Calling a method that takes self by value is also allowed. It deinitializes the referenced binding.
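&move T cannot be written in today's Rust. A rough library-level approximation borrows an `Option<T>` slot and uses `None` to mean "deinitialized". The `MoveRef` type and the `demo` function below are hypothetical names used only for illustration. The sketch shows the three rules above: an empty slot cannot be borrowed, `&self` methods are reached through a shared reference, and a by-value use moves the value out while the slot's memory stays where it is.

```rust
/// Hypothetical stand-in for `&move T`: a unique borrow of a slot whose value
/// may be moved out. `None` marks the binding as deinitialized.
pub struct MoveRef<'a, T> {
    slot: &'a mut Option<T>,
}

impl<'a, T> MoveRef<'a, T> {
    /// Refuses an empty slot, mirroring the rule that a move reference may
    /// not point at an uninitialized binding.
    pub fn new(slot: &'a mut Option<T>) -> Option<Self> {
        if slot.is_some() { Some(MoveRef { slot }) } else { None }
    }

    /// "Coercion" to a shared reference, so `&self` methods can be called.
    pub fn get(&self) -> &T {
        self.slot.as_ref().expect("slot stays initialized while the MoveRef lives")
    }

    /// A by-value use: moves the value out and leaves the referenced
    /// binding deinitialized (`None`). The slot itself is not moved.
    pub fn into_inner(self) -> T {
        self.slot.take().expect("slot stays initialized while the MoveRef lives")
    }
}

pub fn demo() -> (usize, String, bool) {
    let mut binding = Some(String::from("hello"));
    let r = MoveRef::new(&mut binding).unwrap();
    let len = r.get().len();    // `&self` method called through the reference
    let moved = r.into_inner(); // by-value use deinitializes the binding
    (len, moved, binding.is_none())
}
```

The real proposal would need no `Option` wrapper. The compiler would track initialization itself, as it already does for local bindings.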

Reference-level explanation

Creation

There are three ways of creating move references:

  • Reference a local binding with the syntax &move ...
  • Project from another move reference to one of the fields of the binding it refers to: some_reference.some_field produces a move reference to that field.
  • Reborrow from an existing move reference, possibly through the DerefMove traits.
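Field projection extends something Rust already allows for owned locals: moving a single field out of a binding (a partial move). The sketch below uses a hypothetical `Pair` type and shows the current behavior. Through a `&mut Pair` the same move is rejected with E0507, and that is the gap move-reference projection is meant to close.

```rust
/// Hypothetical struct used only for illustration.
pub struct Pair {
    pub name: String,
    pub tag: String,
}

/// Moving one field out of an owned binding leaves only that field
/// deinitialized; the sibling field can still be moved or used.
pub fn split(p: Pair) -> (String, String) {
    let name = p.name; // partial move: `p.name` is now uninitialized
    // let whole = p;  // would not compile: `p` is partially moved
    let tag = p.tag;   // the other field is still initialized
    // With `p: &mut Pair` instead, `let name = p.name;` fails with E0507
    // (cannot move out of a borrow).
    (name, tag)
}
```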

Coercions to other reference kinds

  • For all T: Unpin, &move T! can be coerced to &mut T, and thus further to &T.
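A minimal sketch of &move T! in today's Rust, assuming it is built on top of `&mut T`. `MoveRefInit` and `replace_with` are hypothetical names. Because the wrapper implements `Deref` and `DerefMut`, it coerces to `&mut T` and `&T` as this section describes. Safe Rust has no way to leave the slot empty during a temporary move, so the sketch fills it with `T::default()` in the meantime. The proposal's drop flags are meant to remove that requirement. The sketch also ignores the `Unpin` condition.

```rust
use std::mem;
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for `&move T!`: the binding is initialized again
/// before the borrow ends.
pub struct MoveRefInit<'a, T> {
    inner: &'a mut T,
}

impl<'a, T: Default> MoveRefInit<'a, T> {
    pub fn new(inner: &'a mut T) -> Self {
        MoveRefInit { inner }
    }

    /// Temporarily move the value out, transform it, and move the result back.
    /// While `f` runs, the slot holds `T::default()` as a placeholder, so a
    /// panic inside `f` leaves a valid (if different) value behind.
    pub fn replace_with(&mut self, f: impl FnOnce(T) -> T) {
        let old = mem::take(&mut *self.inner);
        *self.inner = f(old);
    }
}

impl<T> Deref for MoveRefInit<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.inner
    }
}

impl<T> DerefMut for MoveRefInit<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.inner
    }
}

pub fn demo() -> String {
    let mut s = String::from("abc");
    {
        let mut r = MoveRefInit::new(&mut s);
        r.replace_with(|v| v.to_uppercase()); // move out, then back in
        r.push('!');                          // auto-deref to `&mut String`
    }
    s
}
```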

Casting

Casting &move .. references to raw pointers is trivial and works as it does for other kinds of references.

Casting a raw pointer to a move reference may create an ill-formed reference and is therefore unsafe.

Interaction with patterns:

We introduce a new pattern kind, ref move NAME!, which produces NAME of type &move T!. The ! obligation exists because we may not want to leave a binding (partially) deinitialized after a pattern-matching construct finishes.

Another new pattern is ref move NAME (note the absence of the exclamation mark), which produces NAME of type &move T.

DerefMove traits

I also propose the following design for the DerefMove traits:

trait DerefMove: DerefMut {
  fn deref_move(&move self) -> &move <Self as Deref>::Target;
}
trait DerefMoveInit: DerefMut {
  fn deref_move_init(&move self!) -> &move <Self as Deref>::Target!;
}

There are two traits because there are two kinds of move references, with different use cases.

The Box implementation:

struct Box<T>{
  ptr: *mut T,
}
//...
impl<T> DerefMoveInit for Box<T> {
  fn deref_move_init(&move self!) -> &move T! {
    unsafe { self.ptr as &move T! } //just cast the pointer to a reference
  }
}
impl<T> DerefMove for Box<T> {
  fn deref_move(&move self) -> &move T {
    unsafe { self.ptr as &move T } //just cast the pointer to a reference
  }
}
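For comparison, this is how Box is special today. The compiler lets `*b` move out of a `Box`. A user-defined pointer only gets `&T` from `Deref` and has to expose an explicit method to give up its contents. `MyBox` and `into_inner` are hypothetical names used for illustration.

```rust
use std::ops::Deref;

/// A user-defined owning pointer (hypothetical, for illustration).
pub struct MyBox<T>(Box<T>);

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> MyBox<T> {
    /// Without `DerefMove`, a custom pointer needs an explicit method to
    /// hand over ownership of its contents.
    pub fn into_inner(self) -> T {
        *self.0
    }
}

pub fn demo() -> (String, String) {
    let b: Box<String> = Box::new(String::from("from Box"));
    let s1: String = *b; // built-in: moving out through `*` works for `Box`

    let m = MyBox(Box::new(String::from("from MyBox")));
    // let s2: String = *m; // does not compile (E0507): `Deref` only yields `&T`
    let s2 = m.into_inner();
    (s1, s2)
}
```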

Code that currently relies on the unstable box keyword (box patterns) could instead be written as:

...
let b: Box<C> = Box::new(..);
match b {
  ref move smth! => { //this internally calls `deref_move_init`
    //here we have smth: &move C!;
  }
};
match b {
  ref move smth => { //here we consume the box; this internally calls `deref_move`
    //here we have smth: &move C;
  }
};
//b.method() //error since we have consumed box.
...

Aliasing:

Since all move references are intended to modify the referenced binding, they must all be unique, just as &mut T is.

Interaction with panics:

&move .., panics and drops

The representation of a move reference may include not only the pointer itself, but also a bitfield recording whether anything has been moved out through the reference.
This removes the risk of dropping uninitialized data during panics.

It might look like this:

#[repr(C)]
struct MoveRef<T> {
  ptr: *mut T, //pointer.
  flags: MoveFlagsOf<T>, //of course, this is not real type, but a kind of intrinsic rather.
}

Because the reference is unique, the bitfield does not need to be atomic.

Every change to the bitfield happens immediately after the corresponding move, so that a panic cannot leave the flags out of sync with the data.
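Rust already keeps hidden drop flags of this kind for local bindings that are moved out conditionally. The compiler records whether the move happened and drops the value at scope end only if it did not. The proposal would carry the same information alongside the reference. `Noisy` and `maybe_move` are hypothetical helpers that count drops to make the flag visible.

```rust
use std::cell::Cell;

/// Test helper: increments the shared counter each time it is dropped.
pub struct Noisy<'a>(pub &'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Moves `n` away only when `cond` is true. A compiler-managed drop flag
/// ensures `n` is dropped exactly once on either path.
pub fn maybe_move(cond: bool, drops: &Cell<u32>) {
    let n = Noisy(drops);
    if cond {
        drop(n); // moved out here, so the drop flag is cleared
    }
    // scope end: `n` is dropped here only if the flag is still set
}
```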

The issue with panics is that they may interrupt the modification of the referenced binding, leaving it in an inconsistent state. The same is already true for &mut references, so on its own this can only cause logic bugs.

To prevent panics from exposing truly unsound state to the end user, I propose a new implied auto trait: UnwindSound (the name is provisional).

The rules are simple: any type whose parts all implement UnwindSound is also UnwindSound.

This trait is, in fact, a strict version of UnwindSafe and can be relied upon for safety.

catch_unwind would in turn require it.

&move T!, however, doesn't implement this trait.

The motivation is to forbid &move T! references from crossing an unwinding boundary, so that uninitialized bindings can never be observed.
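The sketch below shows why today's UnwindSafe cannot serve this purpose. `catch_unwind` rejects a closure that captures `&mut`, but `AssertUnwindSafe` opts out freely, and a panic can then leave state half-updated. With &mut T the result is a logic bug. With &move T! it would be an uninitialized binding, which is why the proposed UnwindSound must not be assertable in this way. `Log`, `push` and `demo` are hypothetical names, and only real std items are used.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Two pieces of state that are supposed to stay in sync.
pub struct Log {
    pub entries: Vec<String>,
    pub count: usize,
}

/// Updates `entries` and `count` in two steps. A panic in between leaves
/// them out of sync: a logic bug, not memory unsafety.
pub fn push(log: &mut Log, entry: &str, fail_midway: bool) {
    log.entries.push(entry.to_string());
    if fail_midway {
        panic!("interrupted between the two updates");
    }
    log.count += 1;
}

pub fn demo() -> (usize, usize) {
    let mut log = Log { entries: Vec::new(), count: 0 };
    // A closure capturing `&mut Log` is not `UnwindSafe`, so `catch_unwind`
    // accepts it only after an explicit opt-out with `AssertUnwindSafe`.
    let result = panic::catch_unwind(AssertUnwindSafe(|| push(&mut log, "a", true)));
    assert!(result.is_err());
    (log.entries.len(), log.count) // observed: one entry, but count is still 0
}
```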

Interaction with leaks

Since neither kind of move reference can corrupt any state unless it is used, leaking them should be completely fine.

Methods

As a Self type

These references are explicitly intended to refer to a binding of a type, not just to a value. Thus, a method taking &move T! as self can only be called on a mutable binding, not on an arbitrary value.

Calling methods on move references

Methods that take self by value deinitialize the referenced binding.

Calling any other method triggers a coercion to a less strict reference kind.

Reasoning about usage of move references

Restrictions

All move references are unique, they may not be duplicated.

The key point in their lifecycle is the function boundary.
There, all move references passed to a function are assumed to uphold their invariants.

To avoid threading problems, move references are neither Send nor Sync.

As a result, if each consuming function uses its move references correctly, then the overall use of every such reference is correct as well (there is no multi-threaded non-determinism to account for).

The second reason for them not to be Send is that if a thread crashes for whatever reason, we cannot be sure whether something has been initialized from another thread, or whether something we are about to deinitialize is still alive on another thread.

Scopes and analysis

In any scope of the program, move references created as described above must fulfil their obligations, if any. This means that any data structure holding such a reference is required to use it.

In particular, if something was moved out of a &move T!, it must be initialized back in the same scope on all possible branches. The analysis must also take diverging expressions into account: the move reference has to be initialized back before return and before loop {..} expressions that resolve to uninhabited types. break is included in this list only if the move reference was created inside the loop that this particular break exits.

An infinite loop {} is not included, because the uninitialized state can never be observed by anyone.

panic!(), however, is not included either: plenty of operations can panic, and we don't want to require initializing a value back before each of them.
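The compiler already performs this "initialized back on every branch" analysis for owned local bindings. The proposal extends it to &move T! across a reference. In this sketch (`rotate` is a hypothetical function), the value is moved out of a local and must be reassigned on every path before the local is read again.

```rust
/// Moves the value out of `s` and puts a new value back on every path
/// before `s` is read again. The compiler checks this statically.
pub fn rotate(mut s: String, shout: bool) -> String {
    let taken = s; // `s` is now moved-out (uninitialized)
    if shout {
        s = taken.to_uppercase(); // initialized back on this branch...
    } else {
        s = taken + "!"; // ...and on this one
    }
    // Deleting either assignment makes the next line fail to compile
    // (E0382, use of moved value).
    s
}
```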

Pin, DerefMoves and stack pinning

The impls are as follows:

impl<P: DerefMove> DerefMove for Pin<P>
  where <P as Deref>::Target: Unpin,
{
  fn deref_move(&move self) -> &move <Self as Deref>::Target {
    &move self.pointer
  }
}

impl<P: DerefMoveInit> DerefMoveInit for Pin<P>
  where <P as Deref>::Target: Unpin,
{
  fn deref_move_init(&move self!) -> &move <Self as Deref>::Target! {
    &move self.pointer!
  }
}

impl<P> Pin<P>
  where P: DerefMove
{
  pub fn new_move(ptr: P) -> Pin<P> {
    Pin {pointer: ptr}
  }
}

impl<P> Pin<P>
  where P: DerefMoveInit
{
  pub fn new_move_init(ptr: P) -> Pin<P> {
    Pin {pointer: ptr}
  }
}

An example of use:

...
fn main() {
  let g = make_some_not_unpin_gen();
  let pinned = Pin::new_move_init(&move g!);
  //work with it!
}
...
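For comparison, stable Rust currently pins on the stack with the `std::pin::pin!` macro (stable since Rust 1.68). This is the macro the proposal aims to make unnecessary. `NotUnpin`, `read` and `demo` are hypothetical names, and `PhantomPinned` makes the type `!Unpin`.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// A `!Unpin` type (hypothetical, for illustration).
pub struct NotUnpin {
    pub value: u32,
    _pinned: PhantomPinned,
}

impl NotUnpin {
    pub fn new(value: u32) -> Self {
        NotUnpin { value, _pinned: PhantomPinned }
    }

    /// A method that can only be called on a pinned value.
    pub fn read(self: Pin<&mut Self>) -> u32 {
        self.value
    }
}

pub fn demo() -> u32 {
    // Today's stable counterpart of `Pin::new_move_init(&move g!)`:
    // `pin!` pins the value in the current stack frame.
    let mut pinned: Pin<&mut NotUnpin> = pin!(NotUnpin::new(7));
    pinned.as_mut().read() + pinned.as_mut().read()
}
```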

Optimization

General

Another key property of move references is that using them implies moving the value out and back in: this is a perfect case for guaranteed copy elision (GCE).

We perform GCE for &move .. references.
In that case, the move flags mentioned earlier should live on the caller's stack.

Drawbacks

  • This adds an entirely new kind of reference, which will need to be taught.
  • It requires a new implied auto trait.

Rationale and alternatives

The feature serves one need: moving a value but not the memory.

Alternatives are either on the library level or in previous proposals.

Prior art

Unresolved questions

  • How exactly should we balance the use of GCE with respect to panics?
  • Do we want an implicit syntax for creating a move reference, like:
fn a(b: &move B!) {...}
fn main() {
  let b: B = Default::default();
  a(b); //it creates a move reference implicitly.
}

Future possibilities

&move T* kind

This kind of move reference obligates its holder to move a value into the referenced binding, and does not require the binding to be initialized.

Currently, we have no traits to describe mandatory operations on this kind (it's, in fact, a refined type).

Introducing it would also require Leak and !ImplicitDrop auto traits to describe things correctly.

The reason for not introducing it is that its soundness issues could not be fixed just by turning off GCE.

MVP

As an alternative, we could introduce this kind as "not a true type", meaning that it may not appear in data structures or as a generic parameter. Its purpose would then be unconditional deferred initialization.
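The closest analog of an &move T* parameter in today's Rust is a `MaybeUninit` out-parameter that the callee promises to initialize. The compiler does not check that promise, so the caller needs `unsafe` to rely on it. That unchecked promise is the part the proposed kind would make safe. `init_greeting` and `demo` are hypothetical names.

```rust
use std::mem::MaybeUninit;

/// Out-parameter the callee promises to initialize, approximating `&move T*`.
/// The promise is only documented, not enforced by the compiler.
pub fn init_greeting(slot: &mut MaybeUninit<String>) {
    slot.write(String::from("deferred init"));
}

pub fn demo() -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    init_greeting(&mut slot);
    // SAFETY: `init_greeting` always writes the slot before returning.
    unsafe { slot.assume_init() }
}
```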

In patterns

Something like ref move NAME*...

Partial initialization (views)

Partial initialization of a binding of a known type C can be described with the following syntax: &move C(a!,b*,c,...).

An example:

struct C {
  a: String,
  b: String,
  c: String,
  d: u32,
}

/// ...Promises to init `b`, keep `a` and uninit `c`, doesn't touch `d` at all.
fn work(arg: &move C(a!,b*,c,.d)) { //dot prefixed `d` may have been omitted.
  let mut tmp = arg.a; //we moved the String to `tmp`
  tmp.push_str(&arg.c); //we are not required to move `arg.c`, but we gave no promise to initialize it back.

  arg.a = tmp; //we initialized `arg.a` back; removing this line is a hard error.

  arg.b = "init from another function!".into();

  //println!("{:?}",arg.d ); //error: use of possibly uninitialized value.
}

fn main() {
  let trg: C;
  trg.a = "Hello ".into();
  trg.c = " Hola".into();

  work(&move trg);
  println!("{}", trg.b); //legal, as `work` promised to initialize it
  println!("{}", trg.a); //legal
  //println!("{}", trg.c); //error: use of definitely uninitialized value.

}
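A rough emulation of this view in today's Rust, assuming each field is wrapped in `Option`, with `None` standing for "uninitialized". The field names follow the example above. Unlike the proposed &move C(a!,b*,c,.d), the promises to keep a, initialize b and treat c as deinitialized are only documented here, not checked by the compiler.

```rust
/// Hypothetical emulation of `C`, where `None` means "uninitialized".
pub struct C {
    pub a: Option<String>,
    pub b: Option<String>,
    pub c: Option<String>,
    pub d: Option<u32>,
}

/// Keeps `a`, initializes `b`, deinitializes `c`, ignores `d`.
pub fn work(arg: &mut C) {
    let mut tmp = arg.a.take().expect("`a` must be initialized");
    if let Some(c) = &arg.c {
        tmp.push_str(c);
    }
    arg.a = Some(tmp); // keep `a`: put it back
    arg.b = Some("init from another function!".to_string()); // init `b`
    arg.c = None; // the caller must treat `c` as uninitialized
}

pub fn demo() -> C {
    let mut trg = C {
        a: Some("Hello ".to_string()),
        b: None,
        c: Some(" Hola".to_string()),
        d: None,
    };
    work(&mut trg);
    trg
}
```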

Tuples

The syntax for &move references with partial initialization of tuples is as follows:

Given a tuple (u32,i64,String,&str), the move reference looks like &move (.u32,i64,String!,&str*). Note the dot-prefixed u32: it will not be touched by the consumer of the reference, but it is still listed to distinguish different tuple types from one another (for named structures, untouched fields are simply omitted).

Unmovable types and values

GCE and !Unpin values

If we guarantee copy elision for &move .. references, then we can avoid moving a !Unpin value, in other words, we can work with it in place.

The way I imagine working with !Unpin values is simply destructuring a move reference to one, which yields a set of move references to its contents.

The biggest issue, however, is how to deal with the case where one referenced part refers to another part that is also referenced.

My guess is that we could provide a 'self lifetime that tells the compiler a reference points inside the struct that contains it.

During destructuring, observing such references might break aliasing rules, so the &move T* kind would be needed to work with such parts of a value.

This will need GCE to be sound.

!Unpin types behind move references

In principle, we could forbid moving such values through move references. That way, we could safely implement DerefMove for Pin<P> for any P. The only remaining concern is that values can still be moved out of a binding without involving any references, so this would not be backward compatible.
