WTF is Pin

// #Rust

I have glanced at the Pin documentation, read a few blogs, and watched a few videos. They always make a lot of sense at the beginning, before quickly spiraling in my mind and leaving me wondering why I am here. So I’ll try walking through the steps here to see where I get lost.

One of the powerful features of the Rust borrow checker is its guarantee that something being referenced won’t move for the lifetime of the reference, so a reference’s address remains valid. This is why any dereference of a raw pointer is unsafe: the compiler hasn’t been tracking what is referenced, so it can’t know whether the address is still valid. So we want to use references as often as possible.

Using references requires conveying the lifetime of the references to the compiler. But there are some scenarios in which this can be pretty hard. For example, self-referential data structures: a field in a structure references another field in the same structure. What is the simplest example of this?

struct SelfRef {
    data: String,
    ptr: &str, // error: missing lifetime specifier
}

let s = SelfRef {
    data: "hello".into(),
    ptr: &???, // we want &s.data, but s doesn't exist yet
};

This does not compile in Rust, because there is just no way to describe the lifetime of ptr. There is also a bit of a circular dependency issue when creating an instance: ptr needs to point into s before s exists.

It is just not possible with references. Rust’s lifetime system cannot express something like “I have the same lifetime as my container”. Rust could maybe add support for this very specific scenario, but I think it gets complex fast with multiple layers of indirection. And the system works for 99% of use cases now, so it would be a shame to muddy it up for one thing. Instead, we use the escape hatch and drop down into raw pointer land, where we sidestep the borrow checker.

struct SelfRef {
    data: String,
    ptr: *const String,
}

*const is a read-only pointer, vs. *mut which is read/write.

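Nothing in SelfRef enforces the self-reference, but a single line creates one: s.ptr = &s.data as *const String;. Then the question is, what happens if you do let moved = s? A move is a bitwise copy to a potentially new location, and ptr is copied along with everything else. So moved.ptr can end up pointing at where s.data used to be rather than at moved.data. Dereferencing it is undefined behavior.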
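To see the problem without actually triggering undefined behavior, the sketch below only compares addresses and never dereferences the stale pointer. The `new`, `init` and `points_at_self` helpers are hypothetical, added just for illustration. It moves the value into a `Box` to force a different address. A plain `let moved = s` may or may not change the address, depending on what the optimizer does.

```rust
struct SelfRef {
    data: String,
    ptr: *const String,
}

impl SelfRef {
    fn new(text: &str) -> Self {
        SelfRef { data: text.to_string(), ptr: std::ptr::null() }
    }

    // Hypothetical helper: point ptr at this instance's own data field.
    fn init(&mut self) {
        self.ptr = &self.data as *const String;
    }

    // True if ptr still points at this instance's data field.
    // Only compares addresses, never dereferences ptr.
    fn points_at_self(&self) -> bool {
        std::ptr::eq(self.ptr, &self.data)
    }
}

fn main() {
    let mut s = SelfRef::new("hello");
    s.init();
    println!("before move: {}", s.points_at_self()); // true

    // Moving into a Box copies the struct's bytes to the heap.
    // ptr is copied too, so it still holds the old stack address.
    let moved = Box::new(s);
    println!("after move: {}", moved.points_at_self()); // false
}
```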

But we have some use cases for self-referencing structs, so what do we do? Well, before Pin, the answer was just: be careful. Design APIs which avoid moving them. Avoiding moves in an API is pretty hard, though, and you still end up having to trust the caller to play nice. A caller might not even be aware that they are managing a self-referencing struct (ahem, async/await).

pin

So we don’t want to extend the referencing system for this use case, but how can we make it safer? Enter Pin. The goal of Pin is to let the compiler enforce that data can’t move. It does this by leveraging the type system, so there is no runtime cost.

use std::ops::Deref;

// Simplified from the standard library definition.
pub struct Pin<P> {
    pointer: P,
}

impl<P: Deref> Pin<P> {
    // All Pin methods require P: Deref.
}

Pin is a wrapper type around a pointer.

All of Pin’s methods, including the new constructor, require the inner type to implement Deref (be pointer-like). Since the inner field is not exposed, you can only use Pin on pointer-like things. This is a little interesting: isn’t the point to get some help from the compiler to not move data? The indirection is what allows Pin to work with the borrow checker. A Pin<&mut T> holds a borrow of the data, so the borrow checker makes sure that data doesn’t move while the pin is alive.
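Here is a small sketch of that borrow checker interaction. It uses `Pin::new` on a plain `String`, which is `Unpin`; `Pin::new` only accepts pointers to `Unpin` targets, so no `unsafe` is needed. While `pinned` is in use, `name` stays mutably borrowed, so moving it out is rejected at compile time. Reads go through `Deref`.

```rust
use std::pin::Pin;

fn main() {
    let mut name = String::from("ferris");

    // Pin wraps the pointer (&mut String), not the String itself.
    let pinned: Pin<&mut String> = Pin::new(&mut name);

    // Reading goes through Deref on the wrapped pointer.
    println!("len through the pin: {}", pinned.len());

    // Uncommenting this fails to compile: `name` is still mutably
    // borrowed by `pinned`, which is used on the next line.
    // let moved = name;
    println!("pinned value: {}", *pinned);

    // Once the pin is no longer used, the borrow ends and the move is allowed.
    let moved = name;
    println!("moved: {}", moved);
}
```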

I believe a Pin could theoretically be defined which wraps the data instead of a reference to it: Pin<T> instead of Pin<&T>. But you would need some new language feature to keep the pin from moving, since it now owns the data. And you would need a way to delegate method calls to the inner data type. Both of these kinda reinvent borrow checker and reference features. The Pin of today is weird, but it is a much smaller change.

Library APIs built on pinned pointers to emulate the notion of pinned place.

without.boats

Pin has a very narrow interface which helps establish a contract: “the data at this memory location will not move while it is pinned”. This contract is not as satisfying as others provided by the compiler, because a user still needs to be careful when establishing the pin with unsafe. But once the pin is established, the compiler offers protection through the combination of the Pin interface and the borrow checker.

  • Borrow Checker // You can’t access the data directly while a mutable reference is alive.
  • Pin // You can’t move the data through this reference.
  • Combined // The data cannot move by any means…until the pin is dropped :(.

If a reference is pin’d, you know you are operating in a window where the data will not move. Once something is pinned it stays pinned: the contract must still be upheld after the Pin itself is dropped, right up until the data is dropped. Once some data has internal pointers it needs to be treated carefully, with or without the compiler’s pin support.

Interfaces that operate on values which are in an address-sensitive state accept an argument like Pin<&mut T> or Pin<Box<T>> to indicate this contract to the caller.

Pin Module Docs

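use std::pin::Pin;

impl SelfRef {
    fn new() -> Self {
        SelfRef { data: "hello".into(), ptr: std::ptr::null() }
    }

    // Takes Pin<&mut Self> instead of &mut self.
    fn setup_self_reference(self: Pin<&mut Self>) {
        // Safe to create internal pointers: the caller promised this value won't move.
        // get_unchecked_mut is unsafe because it hands back a plain &mut Self.
        let this = unsafe { self.get_unchecked_mut() };
        this.ptr = &this.data as *const String;
    }

    fn use_internal_data(self: Pin<&Self>) -> &str {
        // Safe to use internal pointers, assuming setup_self_reference ran first.
        unsafe { (*self.ptr).as_str() }
    }
}

let mut data = SelfRef::new();

// This won't work, the method needs a Pin<&mut Self>:
// data.setup_self_reference();

let mut pinned = unsafe { Pin::new_unchecked(&mut data) };
pinned.as_mut().setup_self_reference();
assert_eq!(pinned.as_ref().use_internal_data(), "hello");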

“I’m going to do something that requires this data to never move (like create internal pointers), so you must pin it first. Then the compiler will help you to not do anything stupid.”

What is Pin’s interface? If you put something in a pin, you can’t get a mutable reference to it (unless its type is Unpin) without typing unsafe, which is your promise that you won’t break things.

unsafe Pin::new_unchecked() is the low-level way to pin a self-referencing struct (!Unpin). It is unsafe, and “unchecked”, because the compiler is not checking the constraints which make Pin safe.

Anything that wants to interact with the pinned value in a way that has the potential to violate these guarantees must promise that it will not actually violate them, using the unsafe keyword to mark that such a promise is upheld by the user and not the compiler. In this way, we can allow other unsafe code to rely on any pointers that point to the pinned value to be valid to dereference while it is pinned.

Pin Module Docs
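This sketch shows why `new_unchecked` has to be `unsafe`. The `SelfRef` with a `PhantomPinned` field and the `setup` and `points_at_self` helpers are illustrative assumptions. The value is pinned through a `&mut` on the stack, then the pin goes out of scope. After that, nothing stops the value from being moved. That breaks the promise made to `new_unchecked`. The code stays free of undefined behavior only because it compares addresses and never dereferences the stale pointer.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned, // makes SelfRef !Unpin
}

impl SelfRef {
    fn new(text: &str) -> Self {
        SelfRef { data: text.to_string(), ptr: std::ptr::null(), _pin: PhantomPinned }
    }

    // Hypothetical helper: sound only if the caller keeps its pinning promise.
    fn setup(self: Pin<&mut Self>) {
        let this = unsafe { self.get_unchecked_mut() };
        this.ptr = &this.data as *const String;
    }

    // Compares addresses only, never dereferences ptr.
    fn points_at_self(&self) -> bool {
        std::ptr::eq(self.ptr, &self.data)
    }
}

fn main() {
    let mut data = SelfRef::new("hello");
    {
        // We promise `data` will never move again...
        let mut pinned = unsafe { Pin::new_unchecked(&mut data) };
        pinned.as_mut().setup();
    } // ...but the pin only lives until here.

    assert!(data.points_at_self());

    // The compiler happily allows this move: the broken promise is on us.
    let moved = Box::new(data);
    assert!(!moved.points_at_self());
}
```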

A user of something that requires pinning, like polling a future, can do so with almost entirely safe code. Creating the pin with new_unchecked is unsafe since there are still ways to mess with the underlying data, but that unsafety can be wrapped behind safe constructors like the pin+box combo below. It is the implementer who has to handle the unsafety to get stuff done. I am not sure any amount of language features will ever change that burden.
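The standard library already ships two safe ways to pin a `!Unpin` value:

* `Box::pin` pins on the heap.
* The `std::pin::pin!` macro pins on the stack (stable since Rust 1.68).

Both are safe because the value is moved into a place the caller can no longer name, so it can’t be moved out again. `Tracker` is a made-up `!Unpin` stand-in for something like a future. In this sketch, the only `unsafe` lives inside its method, on the implementer’s side.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Hypothetical !Unpin type standing in for something like a future.
struct Tracker {
    count: u32,
    _pin: PhantomPinned,
}

impl Tracker {
    fn new() -> Self {
        Tracker { count: 0, _pin: PhantomPinned }
    }

    // The implementer deals with unsafe once, inside the method.
    fn bump(self: Pin<&mut Self>) -> u32 {
        // Okay: we only touch `count`, we never move the Tracker out.
        let this = unsafe { self.get_unchecked_mut() };
        this.count += 1;
        this.count
    }
}

fn main() {
    // Heap pinning: Pin<Box<Tracker>>, no unsafe for the user.
    let mut boxed = Box::pin(Tracker::new());
    boxed.as_mut().bump();
    println!("boxed: {}", boxed.as_mut().bump());

    // Stack pinning: the macro takes ownership of the value, so it can't be moved again.
    let mut on_stack: Pin<&mut Tracker> = pin!(Tracker::new());
    println!("stack: {}", on_stack.as_mut().bump());
}
```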

unpin

The marker trait Unpin was introduced along with the Pin struct. The naming of Unpin is a little strange, but it means “safe to move around even when pinned, I don’t contain self-referencing pointers”. It is an auto trait, implemented automatically for structs whose fields are all Unpin.

Unpin allows for easier APIs. If a type is Unpin, it doesn’t have to go through the super-restricted Pin interface: a pin of it can be freely deref’d, mutably too. The only ways to get mutable access to a pinned value are for its type to be Unpin, or some unsafe hacking.

This allows getting a mutable reference from a pinned pointer without unsafe code if the type can’t be self-referential.

without.boats
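A struct built from a `String` and a raw pointer, like SelfRef, is actually Unpin by default, because both of those field types are Unpin. A type has to opt out explicitly, typically with a `std::marker::PhantomPinned` field.

The sketch below shows the difference using made-up `Plain` and `Pinned` types and a hypothetical `assert_unpin` compile-time check. For an Unpin type, `Pin::new` and `Pin::get_mut` are safe. For a !Unpin type neither compiles, and mutable access needs unsafe.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Auto-Unpin: String and raw pointers are both Unpin.
#[allow(dead_code)]
struct Plain {
    data: String,
    ptr: *const String,
}

// Opted out of Unpin via the PhantomPinned marker field.
struct Pinned {
    data: String,
    _pin: PhantomPinned,
}

// Hypothetical helper: compiles only when T is Unpin.
fn assert_unpin<T: Unpin>() {}

fn main() {
    assert_unpin::<Plain>();
    // assert_unpin::<Pinned>(); // error: Pinned is not Unpin

    let mut plain = Plain { data: "plain".into(), ptr: std::ptr::null() };
    let pinned_plain = Pin::new(&mut plain); // safe, Plain: Unpin
    let plain_ref: &mut Plain = pinned_plain.get_mut(); // safe, Plain: Unpin
    plain_ref.data.push('!');

    let mut boxed = Box::pin(Pinned { data: "pinned".into(), _pin: PhantomPinned });
    // Pin::new(&mut ...) and get_mut() do not compile for Pinned.
    // Mutable access needs unsafe and a promise not to move the value out.
    let inner: &mut Pinned = unsafe { boxed.as_mut().get_unchecked_mut() };
    inner.data.push('!');

    println!("{} {}", plain.data, boxed.data);
}
```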

deref and drop

These traits were stabilized before Pin. In a perfect world they would take a pin’d self, but that change would not be backwards compatible.
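Because `Drop::drop` takes `&mut self`, drop code for a !Unpin type could in principle move pinned data. The std::pin module docs suggest a workaround:

1. Immediately re-pin `self` with `new_unchecked` inside `drop`.
2. Do all the real work in an inner function that takes `Pin<&mut Self>`.

Below is a sketch of that pattern. The `Guard` type and the `DROPPED_OK` flag are hypothetical, only there to make the effect observable.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag so we can observe what the pinned drop saw.
static DROPPED_OK: AtomicBool = AtomicBool::new(false);

struct Guard {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned,
}

impl Drop for Guard {
    fn drop(&mut self) {
        // new_unchecked is okay here: the value is never used again after drop.
        inner_drop(unsafe { Pin::new_unchecked(self) });

        // All real cleanup goes through a pinned reference, so it can't
        // accidentally move data out of self.
        fn inner_drop(this: Pin<&mut Guard>) {
            let still_valid = std::ptr::eq(this.ptr, &this.data);
            DROPPED_OK.store(still_valid, Ordering::SeqCst);
        }
    }
}

fn main() {
    let mut guard = Box::pin(Guard {
        data: "cleanup".into(),
        ptr: std::ptr::null(),
        _pin: PhantomPinned,
    });
    // Wire up the self-reference now that the Guard has a stable heap address.
    let inner = unsafe { guard.as_mut().get_unchecked_mut() };
    inner.ptr = &inner.data as *const String;

    drop(guard);
    println!("self-pointer valid during drop: {}", DROPPED_OK.load(Ordering::SeqCst));
}
```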

a library type

without.boats makes the case that the struggles of using Pin come from it being just a library type, not built into the language like references. Pin needs to play by the rules, whereas references can take shortcuts. Things like reborrowing and interactions with Drop are super painful, if technically sound. There are some attempts to iron out this pain, like the pin-project crates (project as in projection), which generate correct unsafe code with macros.

projections and structural pinning

The term projection comes from the mathematical concept: taking a complex structure and “projecting” it down to just one component, like casting a shadow that only shows part of the original. Dot notation is a projection operation; it extracts a sub-place from the larger place. That is why you can take a reference to a field: it points to its very own place.

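struct Person { name: String, age: u32 }
let mut person = Person { name: "Ferris".into(), age: 8 };

// Scenario A: Whole struct reference. The field borrow fails to compile
// because person_ref is still used afterwards.
let person_ref = &mut person;
// let name_ref = &mut person.name;
person_ref.age += 1;

// Scenario B: Direct field references, disjoint borrows are fine.
let name_ref = &mut person.name;
let age_ref = &mut person.age;
// let person_ref = &mut person; // Fails: the fields are still borrowed below.
name_ref.push('!');
*age_ref += 1;

// Scenario C: Projection! from a higher level reference.
let person_ref = &mut person;
let name_ref = &mut person_ref.name;
let age_ref = &mut person_ref.age; // disjoint fields project independently
// person_ref itself is unusable while name_ref and age_ref are alive.
name_ref.push('?');
*age_ref += 1;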

Scenario C shows how projection gives fine-grained access to individual fields through a reference to the whole struct.

Now let’s consider projection in terms of pinning, not mutation.

struct SelfRef {
    data: String,
    ptr: *const String, // Points to self.data
}

struct Container {
    self_ref: SelfRef,  // This field must never move
    counter: u32,       // This field can move freely
}

Some simple types.

In this scenario, moving a Container would break its self_ref field, and so would moving self_ref out of the Container on its own. So if you create a projection onto self_ref, it too needs to be pin’d.

Container at address 0x1000:
├── self_ref at 0x1000 (SelfRef)
│   ├── data at 0x1000 (String)
│   └── ptr at 0x1018 -> points to 0x1000
└── counter at 0x1020 (u32)

The layout of the Container type (illustrative; rustc is free to reorder fields).

Mutable places and pinned places both follow the rule that projections only narrow access. You cannot get a mutable reference to a field from a non-mutable reference to a struct. The same generally applies to pinning (although there may be some weird cases, like a parent container that never moves).
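Here is a sketch of a hand-written pin projection, following the pattern shown in the std::pin docs. The accessor names and the `init`/`get` helpers are made up.

* self_ref is structurally pinned, so `Pin<&mut Container>` becomes `Pin<&mut SelfRef>`.
* counter is not, so it becomes a plain `&mut u32`.

Both accessors need unsafe. That unsafe code is the kind of thing pin-project generates for you. Writing it by hand is only sound if Container also follows the structural pinning rules from the docs, for example never moving self_ref out in a Drop impl.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: String,
    ptr: *const String, // points to self.data once initialized
    _pin: PhantomPinned,
}

struct Container {
    self_ref: SelfRef, // structurally pinned
    counter: u32,      // not structurally pinned
}

impl SelfRef {
    fn init(self: Pin<&mut Self>) {
        let this = unsafe { self.get_unchecked_mut() };
        this.ptr = &this.data as *const String;
    }

    fn get(&self) -> &str {
        // Assumes init() ran while pinned and the value never moved since.
        unsafe { (*self.ptr).as_str() }
    }
}

impl Container {
    // Pinned projection: a pinned Container yields a pinned SelfRef.
    fn self_ref(self: Pin<&mut Self>) -> Pin<&mut SelfRef> {
        // Okay because self_ref is never moved out of a pinned Container.
        unsafe { self.map_unchecked_mut(|c| &mut c.self_ref) }
    }

    // Unpinned projection: counter can be freely moved or replaced.
    fn counter(self: Pin<&mut Self>) -> &mut u32 {
        // Okay because counter is never treated as pinned.
        unsafe { &mut self.get_unchecked_mut().counter }
    }
}

fn main() {
    let mut container = Box::pin(Container {
        self_ref: SelfRef { data: "hello".into(), ptr: std::ptr::null(), _pin: PhantomPinned },
        counter: 0,
    });

    container.as_mut().self_ref().init();
    *container.as_mut().counter() += 1;

    println!("{} {}", container.self_ref.get(), container.counter);
}
```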

Pin<Box<T>>

A magic combo. Box gives you a stable heap address that isn’t invalidated when stack frames pop, even if the Box itself is moved around. And Pin takes ownership of that Box and limits access to safe usage only.
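A sketch of why the combo works: moving a `Pin<Box<T>>` moves only the pointer, while the pinned value stays at the same heap address. The `new_boxed`, `points_at_self` and `get` helpers are hypothetical. The constructor pins on the heap before creating the self-pointer, so the pointer never goes stale. Here the box is pushed into and popped out of a `Vec` to move it around.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned,
}

impl SelfRef {
    // Hypothetical constructor: pin first, then create the self-pointer.
    fn new_boxed(text: &str) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            data: text.to_string(),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });
        // Only writes a field, never moves the SelfRef out, so the pin is respected.
        let this = unsafe { boxed.as_mut().get_unchecked_mut() };
        this.ptr = &this.data as *const String;
        boxed
    }

    // Compares addresses only.
    fn points_at_self(&self) -> bool {
        std::ptr::eq(self.ptr, &self.data)
    }

    fn get(&self) -> &str {
        // Valid as long as the value stays pinned, which Pin<Box> guarantees.
        unsafe { (*self.ptr).as_str() }
    }
}

fn main() {
    let a = SelfRef::new_boxed("hello");

    // Moving the Pin<Box> only moves the pointer; the SelfRef stays put on the heap.
    let mut list = Vec::new();
    list.push(a);
    let b = list.pop().unwrap();

    println!("{} {}", b.points_at_self(), b.get());
}
```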

Mental Model

“Being pinned” is best represented as a property of a place, rather than a property of a type, in the same way that being mutable is so represented.

without.boats

I think without.boats brings a lot of clarity to why Pin is hard. Having to be backwards compatible and a library feature makes things tough, given the task it is doing. Imagine if references were designed that way. without.boats proposes a future language feature where pinned is some syntax sugar that acts much like today’s mut does with places: “I am a reference, but one with a very small interface because I am pinning the data I reference.”