Extension Traits vs. Newtype

Rust has two patterns which, at a high level, accomplish roughly the same goal: “add just a touch to this existing type”. They come with subtle tradeoffs though, so it is not always obvious which one to roll with in a given scenario. Let’s dive into extension traits and newtypes, using a u32 as a bitcoin block height as the running scenario.

Extension Traits

An extension trait extends an existing type’s functionality. Rust’s orphan rule requires that either the type or the trait being implemented for it is defined in the current crate, so a locally defined extension trait is the tool for types defined outside of your crate. Since the type lives outside of your crate, you probably can’t change it, but you still want to tweak it a bit for your domain. Extension traits are a zero-cost abstraction: memory layout and runtime performance are not affected at all.

For the bitcoin blockheight scenario we define the BlockHeightExt trait (the Ext suffix is convention) and implement it for the standard library’s u32.

trait BlockHeightExt {
    fn is_genesis(&self) -> bool;
    fn is_main_net_activation_height(&self) -> bool;
    fn blocks_until_halving(&self) -> u32;
}

impl BlockHeightExt for u32 {
    fn is_genesis(&self) -> bool {
        // Explicit dereference required for comparison with the u32 type, no coercion in this case.
        *self == 0
    }
    
    fn is_main_net_activation_height(&self) -> bool {
        *self == 481824
    }
    
    fn blocks_until_halving(&self) -> u32 {
        let next_halving_height = ((*self / 210_000) + 1) * 210_000;
        next_halving_height - *self
    }
}

// Usage
fn process_block(height: u32) {
    // Can use native u32 operations!
    let next_height = height + 1;
    
    // And specialized block height functionality!
    if height.is_genesis() {
        println!("Processing the genesis block!");
    }
}

Extension trait for u32 to add block height functionality.
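
One practical detail worth sketching: extension methods are only callable where the trait is in scope. The example below is a minimal sketch, assuming the trait lives in a hypothetical module named chain (not from the original) and trimmed to a single method. The helper halving_distance is also just an illustrative name.

```rust
mod chain {
    /// Same shape as the trait above, trimmed to one method for brevity.
    pub trait BlockHeightExt {
        fn blocks_until_halving(&self) -> u32;
    }

    impl BlockHeightExt for u32 {
        fn blocks_until_halving(&self) -> u32 {
            let next_halving_height = ((*self / 210_000) + 1) * 210_000;
            next_halving_height - *self
        }
    }
}

// Without this import, `height.blocks_until_halving()` does not compile:
// the method is only visible where the trait is in scope.
use chain::BlockHeightExt;

/// Hypothetical helper, only here to exercise the extension method.
fn halving_distance(height: u32) -> u32 {
    height.blocks_until_halving()
}
```

Working the arithmetic by hand: at height 209_999 the next halving is at 210_000, so one block remains. At height 210_000 itself, the next halving is a full 210_000 blocks away.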

Newtypes

Newtypes are more like encapsulation than extension: the wrapper gets to define the whole interface. There is a possible performance overhead if your usage does a lot of wrapping and unwrapping; however, this often gets optimized away by the compiler.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct BlockHeight(u32);

impl BlockHeight {
    fn new(height: u32) -> Self {
        BlockHeight(height)
    }
    
    // Optional: explicit unwrapping method.
    fn into_inner(self) -> u32 {
        self.0
    }
    
    fn is_genesis(&self) -> bool {
        // Field access will automatically dereference `(*self).0` since it is using the dot operator.
        self.0 == 0
    }
    
    fn is_main_net_activation_height(&self) -> bool {
        self.0 == 481824
    }
    
    fn blocks_until_halving(&self) -> u32 {
        let next_halving_height = ((self.0 / 210_000) + 1) * 210_000;
        next_halving_height - self.0
    }
}

// Optional: Implement arithmetic operations to match the extension trait functionality.
impl std::ops::Add<u32> for BlockHeight {
    type Output = BlockHeight;
    
    fn add(self, rhs: u32) -> Self::Output {
        BlockHeight(self.0 + rhs)
    }
}

// Optional: Implement From/Into for convenient and readable conversions.
impl From<BlockHeight> for u32 {
    fn from(height: BlockHeight) -> u32 {
        height.0
    }
}

impl From<u32> for BlockHeight {
    fn from(height: u32) -> BlockHeight {
        BlockHeight(height)
    }
}

// Usage
fn process_block(height: BlockHeight) {
    // Need to use defined methods or unwrap.
    let next_height = height + 1;
    
    if height.is_genesis() {
        println!("Processing the genesis block!");
    }
}

Newtype wrapping u32 to represent a block height.

Subtleties

Performance concerns are usually not the deciding factor between extension traits and newtypes; it is more about API design.
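
A quick way to sanity check the memory side of that claim is to compare sizes. This is a rough sketch, not a benchmark. The #[repr(transparent)] attribute is an addition that is not in the newtype definition above; it is used here because it guarantees the wrapper has the same layout as its single field. The sizes function is a hypothetical helper.

```rust
use std::mem::size_of;

// Assumption: `repr(transparent)` is added for this sketch. It guarantees
// the wrapper is laid out exactly like its single `u32` field.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockHeight(u32);

/// Hypothetical helper returning (size of the newtype, size of the inner type).
fn sizes() -> (usize, usize) {
    (size_of::<BlockHeight>(), size_of::<u32>())
}
```

Both values come out as 4 bytes, so the wrapper costs nothing in memory.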

Here is a quirky addition which shows how close these two can be: what if you define a type alias, type BlockHeight = u32;, alongside the extension trait above? Then your usage looks more descriptive, like a newtype: process_block(height: BlockHeight). The thing is, a type alias is kinda like a macro which gets swapped for the underlying type at compile time. No type checking ensures that the caller is passing a BlockHeight; it is still just a u32. The alias is completely transparent to the type system.
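
To make the transparency concrete, here is a small sketch. The Timestamp alias and the functions next_height and alias_mixup are hypothetical names introduced for illustration only.

```rust
type BlockHeight = u32;
type Timestamp = u32; // Hypothetical second alias over the same primitive.

fn next_height(height: BlockHeight) -> BlockHeight {
    height + 1
}

/// Passes a timestamp where a height is expected.
fn alias_mixup() -> BlockHeight {
    let timestamp: Timestamp = 1_700_000_000; // A Unix timestamp, not a height.

    // Compiles without complaint: both aliases are just u32 to the type checker.
    next_height(timestamp)
}
```

The aliases document intent, but the compiler happily hands back a nonsensical "height" of 1_700_000_001.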

And this is why you might want the more restrictive newtype in some scenarios, to favor safety over flexibility. There are many u32-based types in bitcoin land, such as heights, timestamps, and versions. We would never want to mix these together by accident (e.g. add a height to a timestamp). A type alias would document that a function expects a height and not a timestamp, but the type system wouldn’t enforce that assumption. A newtype, however, is enforced by the compiler. Sometimes you want the type checker to double check your own thoughts.
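
The same mix-up attempted with newtypes gets rejected at compile time. A minimal sketch, assuming a hypothetical Timestamp newtype next to BlockHeight. The rejected lines are left commented out so the rest still compiles.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockHeight(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Timestamp(u32); // Hypothetical second newtype over the same primitive.

fn next_height(height: BlockHeight) -> BlockHeight {
    BlockHeight(height.0 + 1)
}

fn newtype_guard() -> BlockHeight {
    let height = BlockHeight(840_000);
    let _timestamp = Timestamp(1_700_000_000);

    // next_height(_timestamp);     // Rejected: mismatched types, expected BlockHeight.
    // let _ = height == _timestamp; // Rejected: the two types are not comparable.

    next_height(height)
}
```

Uncommenting either line turns the accident into a compiler error instead of a runtime surprise.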

The downside is the boilerplate needed to explicitly transition from the inner type to the outer, and vice versa. Rust has a general principle that type conversions are always explicit, except for a few cases, like auto dereferencing with the dot operator in the BlockHeight::is_genesis method above. Things like operators (+, -), function arguments, variable assignments, return values, and pattern matching need explicit conversions.

The wheels can be greased a bit though. The explicit type conversion cannot be hidden, but it can be made as convenient and readable as possible. This is generally done by implementing the From trait in both directions, like From<BlockHeight> for u32 and From<u32> for BlockHeight above. Here are some of the benefits we get from the Froms, with the hook into generics being especially powerful.

// Froms not implemented.
let height = BlockHeight(42);

// Froms implemented.
let height2 = BlockHeight::from(42);   // Explicit when clarity helps.
let height3: BlockHeight = 42.into();  // Concise with type inference.

Creation calls, choose the most readable option for each context.

// Froms not implemented.
let inner = height.0;             // Direct field access, only works where the field is visible.

// Froms implemented.
let inner2 = u32::from(height);   // Explicit conversion.
let inner3: u32 = height.into();  // Concise with type inference.

Inner extraction, choose the most readable option for each context.

// Froms not implemented.
fn process_height(height: BlockHeight) {
    // ...
}
process_height(BlockHeight(42));

// Froms implemented.
fn process_height<H: Into<BlockHeight>>(height: H) {
    let height = height.into();
    // ...
}
process_height(BlockHeight(42));  // Pass newtype directly.
process_height(42);               // Or pass raw u32.

Generic function parameters, accept either the newtype or the inner type, making functions more usable while maintaining safety.

The From conversion is infallible and consumes the input value. One can instead use AsRef to just get a reference to the inner type. It serves a similar purpose: allowing functions to be generic over “anything which knows how to give a reference to this type”. But it might not be the best idea to implement this on a newtype.

impl AsRef<u32> for BlockHeight {
    fn as_ref(&self) -> &u32 {
        &self.0
    }
}

Allows BlockHeight to be passed to a function which is generic over types which implement AsRef<u32>…which is a silly type given that u32s are Copy.

Given that BlockHeight’s inner type is Copy, this AsRef probably won’t be used much. A function would rarely take a reference to a Copy type; it would just keep things simple and take the value. AsRef is seen a lot with strings, where it makes sense that a handful of types can all have the same “view” of a string reference.
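
For contrast, here is the string case where AsRef earns its keep. This is a small sketch. The is_hex function is a hypothetical helper, not something from a real library.

```rust
/// Hypothetical helper: true if the text is non-empty and made up
/// only of ASCII hex digits. Generic over anything viewable as a &str.
fn is_hex<S: AsRef<str>>(text: S) -> bool {
    let text = text.as_ref();
    !text.is_empty() && text.chars().all(|c| c.is_ascii_hexdigit())
}
```

The caller can pass a &str, a String, or a &String without converting first, since all of them implement AsRef<str>.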

The next level up from AsRef is a Deref implementation. But it is a whole different beast, which probably shouldn’t be used in this scenario. This one can actually backfire and lose the power of the type checker. Deref is meant for the opposite case from newtypes, where the outer type is a metadata wrapper (e.g. a smart pointer) around the type you actually care about.

use std::ops::Deref;

impl Deref for BlockHeight {
    type Target = u32;
    
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

let height = BlockHeight(42);

let value: u32 = *height;       // Explicitly dereference to get 42.
takes_u32_ref(&height);         // &BlockHeight coerced to &u32 automatically.
let is_pow2 = height.is_power_of_two(); // Calls u32::is_power_of_two() without explicit conversion.

// Here is where it goes sideways.

let height = BlockHeight(42);
let block_count = BlockCount(42);   // *Another* newtype wrapping u32, also implementing Deref.

// With Deref on both, this comparison silently works, defeating the purpose.
// It compiles, but is semantically wrong!
if *height == *block_count { ... }

Deref definitely brings some more magic to the table.
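
For anyone who wants to poke at the magic, here is a self-contained sketch of the snippets above. BlockCount, takes_u32_ref, and deref_pitfall are hypothetical names filled in for illustration, and count_ones stands in as an example of a u32 method reached through auto-deref.

```rust
use std::ops::Deref;

struct BlockHeight(u32);
struct BlockCount(u32); // Hypothetical second newtype wrapping u32.

impl Deref for BlockHeight {
    type Target = u32;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Deref for BlockCount {
    type Target = u32;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

/// Hypothetical function that only knows about plain u32 references.
fn takes_u32_ref(value: &u32) -> u32 {
    *value
}

fn deref_pitfall() -> (u32, u32, bool) {
    let height = BlockHeight(42);
    let block_count = BlockCount(42);

    let coerced = takes_u32_ref(&height); // &BlockHeight coerced to &u32.
    let ones = height.count_ones();       // u32 method found through auto-deref.
    let mixed = *height == *block_count;  // Height compared to a count, no complaint.

    (coerced, ones, mixed)
}
```

Everything compiles, including the comparison between a height and a count, which is exactly the kind of mistake the newtypes were supposed to catch.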