Rust

// Ultra-Violence

They would take their software out and race it in the black desert of the electronic night.

Productivity

Rust is a modern language which packages a closely coupled build tool, cargo, with its compiler rustc. While cargo is not strictly necessary to build Rust code, it sure makes it easier. Cargo does not manage the version of the Rust compiler used in a project; it inherits whatever is available in the environment. Cargo, rustc, and the standard library are versioned together and expected to be used in lockstep though. A package manager, or the rustup tool, can be used to keep things lined up.

To further complicate possibilities, the Rust ecosystem has three development channels: stable, beta, and nightly. Stable and beta are released every six weeks, with beta becoming the new stable. Nightly is special since it is released every night, but it is also where new features are tested out. Possibly for a really long time, since hitting beta would be a train straight to stable.

A stable version of rust has a static tag like 1.68.2. But what about beta and nightly releases? These can be referenced by date beta-YYYY-MM-DD, nightly-YYYY-MM-DD. There isn’t a direct way to trace the lineage of a stable version across the channels, but you can install based on the git commit hash of the rust repository.

toolchain mgmt

The rustup executable is the ecosystem’s way to manage the rustc compiler version along with the toolchain and standard library on a machine. It attempts to fix the lockstep problem.

Toolchains installed via rustup are located in the home directory (~/.rustup/ and ~/.cargo/), separate from a nix store or OS package manager. rustup manages link proxies in ~/.cargo/bin/rustc and tosses the proxies at the front of the PATH.

Toolchain override shorthand looks like cargo +beta (nifty plus instead of the conventional -). You see this shorthand on the cargo tool, but that is actually being intercepted by rustup. This flexes rustup’s greatest power, quick bouncing between toolchains. Cargo must be managed by rustup to use this feature, since rustup intercepts the call.

  • rustup update // update all toolchains.
  • rustup show // show installed toolchains.
  • rustup install $TOOLCHAIN // install toolchain on system.
  • rustup default $TOOLCHAIN // set default toolchain on system.
  • rustup run $TOOLCHAIN cargo ... // run with a certain toolchain, will respect rust-toolchain.toml.
  • rustup override set $TOOLCHAIN // set directory override.

You can manually disable rustup in a project with rustup override set none. This allows the system’s rust version to be used instead of the symlink’d rustup ones. Undo with rustup override unset. There can only be one override per-directory.

rustup has three profiles for the set of tools it installs: minimal, default, and complete. You usually want default plus rust-analyzer, since complete is everything under the sun and breaks more often.

A rust-toolchain.toml file is a TOML configuration file placed in your project’s root directory that tells rustup which toolchain to use for that project. It changes rustup’s behavior, but directory overrides take precedence over rust-toolchain.toml. rust-toolchain.toml complements Cargo.toml’s rust-version field. rust-toolchain.toml focuses on the developer environment whereas rust-version is for dependency management. To be honest, I am not sure how valuable it is to lock to one version of rust. Even if you have an MSRV you generally want to develop on a new channel for the better tools, and rust-toolchain.toml doesn’t expose custom toolchain options.

[toolchain]
channel = "1.68.0"

rust-toolchain.toml

Usually you are building code for your environment (e.g. x86_64). But you might want to test a few more. Rust supports building for different targets, a combo of CPU architecture, vendor, OS, and ABI (Application Binary Interface): <arch>-<vendor>-<os>-<abi>, though there isn’t a ton of consistency. For example, x86_64-unknown-linux-gnu is 64-bit Linux using the GNU ABI (common for most Linux distributions). To cross-compile for a different platform than your own, you grab a new target with rustup and then…target it.
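
A sketch of that flow, assuming the musl target as the example:

  • rustup target add x86_64-unknown-linux-musl // install the target's standard library.
  • cargo build --target x86_64-unknown-linux-musl // cross-compile (assumes a suitable linker is available).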

Another option for toolchain management is nix. This keeps things completely sandboxed from the system. It is also possible to handcraft a toolchain using specific versions for each tool, not something which is easy to do with rustup. A downside is you lose rustup’s quick-switch, per-command ability. There isn’t a great way to get a “best of both worlds” either, since the quick-switch is only available if rustup manages the toolchain. Maybe the best case is a shell hook in the nix development shell which installs the version in rust-toolchain.toml by default. You can just run rustup show since it installs and then shows the status. rustup doesn’t make it very easy to mix and match components from different channels (e.g. I want the 1.74 toolchain, but with the nightly rust-analyzer component). This is way easier to manage in nix if required. It should also be noted that the rustup packaged in nixpkgs is wrapped to help it integrate better with the environment.

lsp

The rust-analyzer component of the toolchain is the flagship Rust LSP. LSPs often pick up on where to find the executable based on the RUSTUP_TOOLCHAIN environment variable. This might be the best way to separate which toolchain an IDE uses vs. the MSRV of a project. The state of the LSP when using Helix can be checked with hx --health rust.

project structure

Rust has its own vocabulary and conventions for splitting up a project.

  • A path names an item such as a struct, function, or module.
  • A module controls scope and privacy of paths, with things private by default. Similar to a filesystem, but lives alongside it.
  • A crate is a tree of modules which produces a library or executable. The smallest amount of code that the compiler will consider at a time. In a “hello world” single file example, the file is the crate.
    • Binary crates must have a main function. Comparable to Golang cmd/ programs.
    • The crate root is the source file which the compiler starts from and is actually a module called crate.
  • A package can contain multiple binary crates and optionally one library crate. It is a cargo-level feature, so it focuses on building, testing, and publishing code.
    • Cargo.toml is at the package root.
    • cargo follows the convention that src/main.rs is a binary crate with the same name as the package, and src/lib.rs is a library crate.
    • Other binary crates should be placed in src/bin/.
  • A workspace shares dependencies across a group of packages.

Seems sensible to start any crate with a src/main.rs and src/lib.rs to take advantage of the standard package/crate conventions. Splitting out the lib right away creates a nice interface for tests.
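
A minimal sketch of that split, assuming a hypothetical package named myapp:

// src/lib.rs
pub fn greeting() -> String {
    "hello".to_string()
}

// src/main.rs
fn main() {
    // The binary crate reaches the library crate through the package name.
    println!("{}", myapp::greeting());
}

The binary crate consuming the library crate of the same package.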

Module-fying code is where things get a bit more complex. This is where some encapsulation choices are being made. Modules need to first be declared before they can be depended on by other modules in a crate. A module is declared with mod $MODULE and that declaration needs to be picked up by a file that the compiler will look at, so the first natural spot to put these would be in one of the roots (main or lib). The compiler will look in three spots for the module definition once it runs into a mod:

  1. Inline, the current file.
  2. ./$MODULE.rs (new) (child modules still go in ./$MODULE/)
  3. ./$MODULE/mod.rs (original)

The module system is like a filesystem, but it lives alongside the filesystem. This means a module is declared with mod garden then defined either in the same file, in ./garden.rs, or ./garden/mod.rs. If the garden module wants its own child modules, they are declared in the garden implementation, like mod vegetable, and the child modules are defined in one of the three possible spots. Style #2 was introduced since it is debatably more searchable than a bunch of mod.rs files, even though it is kinda weird since you might assume a module is contained in a directory like other language hierarchies. It is mostly just a bummer that there are two styles now.
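
A sketch of the garden example using style #2:

// src/lib.rs
mod garden; // definition expected in ./garden.rs (or ./garden/mod.rs).

// src/garden.rs
pub mod vegetable; // child module, defined in ./garden/vegetable.rs.

// src/garden/vegetable.rs
pub struct Asparagus;

Declaring modules in the parent, defining them in sibling files.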

visibility

Rust has a lot of knobs when it comes to visibility…maybe too many.

Basic principles.

  • Items (functions, structs, etc.) are private by default.
  • Private items are only accessible within the current module and descendant modules.
  • pub keyword makes items accessible.

A new module is a new level in the namespace hierarchy. Every file is at least one new module, unlike Golang where files are simply concatenated together. Modules have separate visibility rules from items.

Module hierarchy principles (watch out, asymmetric!).

  • A child module can access its parent module’s private items.
  • A child module can access all of its ancestor modules, not just its immediate parent.
  • A parent module cannot access its child module’s private items.
  • A parent module has access to its direct child modules even if they are private. This is important for re-exporting.

The crate root module (src/lib.rs or src/main.rs) has some special properties.

  • crate:: refers to this root.
  • As an ancestor to all modules in a crate, all modules can access the root module’s private items.
    • This is equivalent to an item marked pub(crate) in any other module.
  • Items marked pub are accessible to external crates. This can be used to define the external API.
    • If a child module of the root is public, its public items are part of the external API. There needs to be a pub chain from the root though if deeply nested.
    • Can keep child modules private and instead rely on re-exports in the root module to have fine grained control. A table of contents.

Re-exporting is a special operation. It leverages the special relationship between parent and child modules.

  1. The parent controls whether the child module exists at all.
  2. The parent controls whether external code can access the child module (by making it pub or not).
  3. But the child controls what the parent can see inside it.

I thought there might be a bit of a loophole. If all modules in a crate have access to the root module’s private items, do they have access to all direct child modules of the root module? Turns out, no, not the case. Access to a module and access to its items are different things. Seems like a good practice could be to create a visibility firewall where the root module’s child modules are private, but further descendants are public.

Re-exporting creates a new path to the item or module. But the export must be made in a module that has visibility into the item or module.
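
A small sketch of a root module acting as the table of contents (names hypothetical):

// src/lib.rs
mod engine; // private child module, not visible externally.

pub use crate::engine::Parser; // re-export creates a new public path: my_crate::Parser.

// src/engine.rs
pub struct Parser; // pub so the parent (root) module can see it.

Re-exporting an item out of a private child module.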

The root module can control the external API, but what about shared internal items which you may not want to be external? I believe there are four options.

  1. Make the item pub and ensure its module and its ancestors up to the root are pub. This allows sibling modules anywhere in the crate to reach over and access it. However, it also exposes it in the external API since there is a pub chain to the root module.
  2. Define the item in the root module. Since the root module is an ancestor to every module, they can all access it. The item doesn’t need to be pub so it is hidden from the external API. But this can make the root module a bit of a dumping ground.
  3. Similar to #2, but re-export the item in the root module. Requires that the root module has visibility to the item, but allows there to be a “visibility firewall” private child module directly from the root to block visibility by default.
  4. If those first options sound kinda weak, well, turns out the rust guys added a feature just for this. You can mark the item as pub(crate). There no longer needs to be a pub chain, any module in the crate can now access it and it remains out of the external API.

Tough to compare these since it probably also depends on project size, but I still have my take. Option #1 is obviously very limited since it puts the items in the public API, so not a go-to solution. #2 is simple, and probably works fine in small projects, but feels like the most likely to get abused. The root module becomes a dumping ground and items might not have clear owners, resulting in a bunch of random logic being attached. #3 is most in line with the initial vision of “child controls interface, parent controls visibility”. However, this does have the catch that all ancestors need to ensure a child is exposed, a bit of boilerplate. Which is what pub(crate) addresses by giving some power back to the child module to declare its visibility intentions. Is this a good thing though? When a module sets its public interface, should it care if the calling modules are in the current crate or not? I think ideally it doesn’t care at all and focuses on making the interface as clear and concise as possible. If you are having to turn to options #2 or #4 a bunch, maybe it’s time to revisit your module interfaces.

While on the topic, there’s also pub(super) (visible to just parent module) and pub(in path::to::module) (visible to a specific module and its descendants), which provide even more fine-grained visibility control to a module.
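
A quick sketch of the knobs side by side (module names hypothetical):

mod outer {
    pub mod inner {
        pub fn for_anyone() {}                  // visible as far as the pub chain above allows.
        pub(crate) fn for_this_crate() {}       // any module in this crate, hidden externally.
        pub(super) fn for_parent_only() {}      // just the outer module.
        pub(in crate::outer) fn for_outer() {}  // outer and its descendants.
    }
}

The visibility modifiers in one place.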

testing

The Rust test tooling makes a distinction between “unit” tests and “integration” tests (these go by different names elsewhere).

  • Unit // Tests are defined in an inline child module of the module they are testing. They have deep access including private methods of the module under test. They are supposed to be fast and small.
  • Integration // Tests live in a tests/ directory which is a sibling directory to src/. They test the external API of the whole crate. Both flavors are sketched below.
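
A minimal sketch of each flavor, assuming a library exposing an add function:

// src/lib.rs
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::*; // unit tests see the parent module, private items included.

    #[test]
    fn adds() {
        assert_eq!(add(2, 2), 4);
    }
}

// tests/api.rs (integration test, only the external API is visible; package name assumed to be mylib)
#[test]
fn adds_externally() {
    assert_eq!(mylib::add(2, 2), 4);
}

A unit test module and an integration test file.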

crate busting

In Rust, a crate is the smallest compilation unit. Kind of like a file in C, but Rust adds in all its type goodness across files, so the unit is “raised” to a crate. Dependencies are compiled before a local crate so that all of a dependency’s metadata and generated code is available in a single pass. Cross-crate optimizations are still possible, so there likely isn’t a massive performance hit busting code up into multiple crates. If link-time optimization (LTO) is enabled there are very few trade-offs other than build time. LTO essentially moves the optimization to the very end instead of just after a single crate.

The cargo tool has a ton of built-in features to optimize the hell out of a workspace and its packages.

Adding [workspace] to a Cargo.toml declares that level as a workspace, a holder of packages, instead of just a package itself (but it can still have a root package). A workspace has a top-level Cargo.lock, so dependency versions are shared by all packages in the workspace. Dependencies can only be used in packages that explicitly call for them though (thank god). Packages (and their crates) can be broken up to keep scope and dependencies as small as possible for consumers. No need to depend on one monolith crate that pulls in the world. But the maintainer can still get the benefits of a monorepo (simpler dependency tree, assurance that the packages will at least work with each other).
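
A sketch of a workspace root Cargo.toml (member names hypothetical):

[workspace]
members = ["app", "units"]
resolver = "2"

Declaring a workspace that holds two packages.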

Internal packages need an explicit dependency on any other internal packages they depend on. This can be a path dependency or a version. Crates that use dependencies specified with only a path cannot be published! But path and version can be combo’d where the path is only used locally.
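
A sketch of the combo (names and layout hypothetical); the path wins locally, the version is what gets published:

[dependencies]
bitcoin-units = { path = "../units", version = "1.0" }

A path-plus-version dependency on an internal package.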

[patch]’s can be supplied at the root of a workspace to override things for all internal packages. This is another way to supply a local path along with a version for intra-dependencies.

TOML is used in Cargo files. The sections (e.g. [patch]) are TOML tables. Dots . in the title keys give namespaces. Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created.

[patch.crates-io.bitcoin-units]
path = "units"

[patch.crates-io]
# Inline table with curly braces.
bitcoin-units = { path = "units" }  

Equivalent in TOML.

lock files and msrv

Rust has an interesting history around dependency lock files. Especially coming from an ecosystem like Golang’s, which was fairly static. Golang’s approach prioritizes simplicity and security. In a go project, the dependency tree is resolved to a “flat” list. One version of each dependency is selected and used everywhere. This sidesteps diamond dependency issues. The resolver follows “minimal version selection” (MVS) to find a version. This is very conservative; a dependency will only update if everything in the tree agrees it is time.

Rust takes a less conservative approach than Golang and uses the maximum version of a dependency in a tree. It still tries to unify on one version; it doesn’t take an NPM strategy where multiple copies of a dependency can be flying around in a project exposing it to the diamond dependency problem. Although cargo does have an (unstable) -Z minimal-versions flag which does what you would think: flip maximum version selection to minimum.

The original Rust guidance was to check in a lock file for applications (ends of the dependency chain), but not libraries. The Rust dependency resolver doesn’t take dependency lock files into account, only the rules in a dependency’s Cargo.toml. So the lock file in a library is only for development. But there was a change in guidance in 2023. Locking a library’s dependencies means only that set of dependencies (including the transitive ones) is tested by the developers, even though consumers will likely use different versions. Not locking dependencies is a more break-fast, rolling approach where new dependencies are quickly inherited. When Rust was first bootstrapping, the rolling approach made sense since it was relatively safe to assume all users were updating dependencies and Rust itself often as APIs were updated and bug fixes rolled out.

As the Rust ecosystem has grown and matured, it has trended to a more conservative approach for dependency management. All users in the ecosystem are starting to value stability as standard APIs settle. Library developers can no longer assume everyone is running the bleeding edge Rust version and need to do some work to make sure their code still works for the old stuff. Rust tooling, like cargo, is slowly offering more support for this conservative approach.

There was the introduction of a Minimum Supported Rust Version setting for crates, which is the oldest Rust compiler version a crate is targeting. It limits language features, syntax compatibility, and standard library APIs. This is set with the rust-version field in Cargo.toml, introduced in Rust 1.56.0. This was initially just informational, but the third version of Cargo’s resolver (stabilized in Rust 1.84.0) now takes it into account.
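
Setting the MSRV is a one-liner in Cargo.toml (values hypothetical):

[package]
name = "mylib"
version = "0.1.0"
edition = "2021"
rust-version = "1.63"

The rust-version field declaring an MSRV.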

Before the v3 resolver, maintaining an MSRV in a project’s dependencies was more involved. A maintainer needed to run their project against the MSRV and iterate on every failure. For direct dependencies, versions can be pinned in Cargo.toml. For transitive dependencies, a cargo update command can set a precise version in the lock file. Updating the lock file is less intrusive than a heavy-handed Cargo.toml rules approach, since transitive dependencies should generally not be mentioned in Cargo.toml. Hand modifying the lock file goes against other cargo commands though, which are going to try and update versions following the maximum allowed version philosophy. So you are swimming upstream and might need some extra tooling to re-modify the versions on every update. In any case, no longer with the v3 resolver!
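
That pre-v3 workaround for a transitive dependency looked something like this (crate name and version hypothetical):

cargo update -p some-transitive-dep --precise 1.2.3

Pinning a transitive dependency in the lock file without touching Cargo.toml.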

MSRV sounds a little like an edition huh?

  • Edition changes: Breaking changes, fundamental shifts, and major language evolutions that require opt-in. More about telling the compiler how to interpret your code.
  • MSRV-gated changes: Backwards-compatible features, incremental improvements, and standard library additions.

You can use an old edition, e.g. 2015, with a new compiler to get the incremental improvements without the breaking changes. Using a newer edition in your library, or depending on a library that uses a newer edition, does raise the theoretical floor of the Minimum Supported Rust Version (MSRV) that your library can offer.

- Edition 2015: Rust 1.0
- Edition 2018: Rust 1.31
- Edition 2021: Rust 1.56
- Edition 2024: Rust 1.85

Editions and their MSRV.

But back to libraries and lock files. Lock files show some history of “at this point, these versions worked”, which is discarded if you don’t check in the file. Some maintainers have adopted an approach of checking in multiple lock files: a “minimal” one which holds the dependency versions for the project’s MSRV, and a “recent” one which holds the most recent versions of its dependencies. While this is not exhaustive of all the possible sets of valid dependency versions, it does give a lot of confidence that many will work if these two extremes work.

dependencies

cargo attempts to simplify the dependency tree of an application by including only one version of a dependency, as long as it fits all requirements. But there are common scenarios where this fails and cargo falls back to keeping two or more versions of a dependency in an app.

* If crate A requires crate C 1.0
* And crate B requires crate C 2.0
* And your project needs both A and B

The classic diamond dependency problem.

Assuming semver constraints, crate C version 1.0 and 2.0 are incompatible; the major version change signals a large break. So if cargo chooses just one or the other, part of the app will probably break. cargo could just throw up its hands and panic, pushing responsibility to the crate maintainer. The maintainer would have to check if they could get away with just using 1.0 or 2.0, but assuming they can’t, they then need to decide if they can upgrade crate A or downgrade crate B. While upgrading crate A is probably the best thing in the very long run, all of these paths are a lot of work.

In such cases, cargo decides to favor a bit of practicality and puts both versions of crate C in the app. But the compiler is still going to help protect against types from 1.0 being used in 2.0 spots and vice versa. It treats the types from each crate version as completely different. This may sound a bit heavy handed, but the alternative is to expose the app to crazy runtime failures where slight tweaks in the types don’t explode until they get passed to different parts of the app which assume different types.

A maintainer can use the SemVer Trick to bear the burden of type versions instead of pushing it on the consumer. The maintainer releases a 1.0 minor release (e.g. 1.1) which depends on the new 2.0 version of itself. The Rust compiler still treats the types between the versions as different, but now the maintainer can stitch together inconsistencies to make the transition easy for consumers. This pattern can’t solve all type issues, but it is generally good enough to use for transitions.
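
A sketch of the trick from the 1.1 release’s side (names hypothetical): depend on the 2.0 version of yourself under a rename and re-export the shared types.

# Cargo.toml of crate-c 1.1.0
[dependencies]
crate_c_v2 = { package = "crate-c", version = "2.0" }

// src/lib.rs of crate-c 1.1.0
pub use crate_c_v2::SharedType; // consumers of 1.1 and 2.0 now see the same type.

The SemVer Trick: one crate, two versions, shared types.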

publishing

  • Registries hold onto actual code artifacts, not just a pointer to a code repository.
    • A .crate file is a big compressed snapshot of the repository.
    • Once published, a crate is independent from the original repository.
    • Authentication with a registry is delegated to system providers.
  • crates.io is the default registry in the ecosystem.
    • Authentication can be done with Github for identity.
    • cargo login ... for credentials and cargo publish... to push.
    • Namespace conflicts on crates.io are primarily avoided through a first-come, first-served naming system.
    • Once a name is taken, it cannot be used by another crate unless the original is transferred or removed.
    • Crate Name Conventions
      • Case insensitive.
      • Must begin with alpha character.
      • Common to have top-level directory name match crate name, but not a rule.
      • Crate names use kebab-case and directories follow this, while rust internals generally use snake_case. These are just conventions, no enforced rules.
    • Ownership
      • Can transfer published crate ownership among crates.io users.
      • You cannot update metadata of published crates, but you can for new versions.

Ownership

Rust has a unique memory management model. The main purpose of ownership is to manage heap (not stack) data. Every value in Rust has an owner and only one owner. When the owner goes out of scope the value can be dropped. It is a little more explicit than languages that make use of a garbage collection runtime and less explicit than direct memory management. But it is at a spot where the compiler can statically analyze code and help us out a bunch.

Data types of known size can be kept on the stack and their scope is easy for the compiler to determine (function calls). They can also be easily copied to other scopes.

Fun fact, variables are immutable by default, let x = 5 versus let mut x = 5.

drop and move are the main memory management tools used by the Rust runtime under the hood. drop is called automatically on a value when it goes out of scope to free the memory. A move is a transfer of ownership.

Passing a variable to a function will move it (transfer ownership). A return value is the final expression in a block or can be returned early with return, this also moves ownership. In Rust lingo, ownership transfer is sometimes referred to as consuming or into.

Data stored on the stack can implement the Copy trait to avoid moves. Instead, the data is copied on the stack, which is relatively low cost.

An option to avoid moving ownership is to clone the value, which clones the data on the heap. Obviously has runtime costs.

References are like pointers, but with more memory guarantees, and they provide some flexibility. Functions can refer to a value without taking ownership, but caller and callee have to agree on that. Creating a reference is called borrowing. You can’t modify something while it is borrowed! The Rust compiler is able to detect “dangling” references where a reference uses memory that is going to be dropped. A reference is created with &.

A reference can be mutable with mut. Sometimes you want to give something the ability to write without giving up ownership.
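
A small sketch of borrowing both ways:

fn length(s: &String) -> usize {
    s.len() // read-only borrow, the caller keeps ownership.
}

fn shout(s: &mut String) {
    s.push('!'); // mutable borrow, can modify without taking ownership.
}

fn main() {
    let mut s = String::from("hello");
    let n = length(&s);
    shout(&mut s);
    println!("{} (was {} chars)", s, n);
}

Borrowing with & and &mut.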

mutability and references and bindings

The mut keyword is used in two distinct spots (which I didn’t realize at first): mutable bindings and mutable references.

y: &i32          // Can't change anything.
mut y: &i32      // Can point y to new memory location, but can't update the contents of the memory.
y: &mut i32      // Can update the memory contents, but not where y is pointed at.
mut y: &mut i32  // Can update both y and memory contents.

Mutability with bindings and references.

The mut is on the binding not the type, but I find that confusing since you can still specify mutable references in generics. I guess lifetimes are specified in generics too, so it is more than just type information. It is any information to help the compiler. It also doesn’t make much sense to specify a “mutable value” in a generic, since it is not a binding to be updated. The situation I have in mind is collections like Vec<T>, where the collection owns the values (like a struct).

Helpful to remember that only places are mutable in Rust, not values.

There are some more complexities in matching (“match ergonomics”) where a binding is occurring.

box

A Box is a smart pointer that owns a heap allocation. The pointer itself is a statically sized stack value; when it points to a dynamically sized type (like Box<dyn Trait> or Box<[T]>) it is a fat pointer carrying extra metadata about the thing it is pointing to.

A box owns its data whereas a reference just borrows.

lifetimes

Lifetimes are more information used by the compiler to determine if borrows are valid. A variable’s lifetime begins when it is created and ends when it is destroyed, but sometimes the compiler needs some help determining when that is exactly.

fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) {
    println!("x is {} and y is {}", x, y);
}   

Showing off the lifetime ' syntax in the generics.

The lifetime annotation can connect an input value to an output value of a function, so the compiler knows the input needs to be around as long as the output. Because these can get verbose fast, Rust has elision rules where it auto-adds them for common scenarios.
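
The classic example is returning one of two input references:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    // The output reference is only valid while both inputs are.
    if x.len() > y.len() { x } else { y }
}

A lifetime connecting the inputs to the output.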

cells and reference counters

There are some tools to push the borrow checker rules from compile time to runtime. There is an obvious cost in performance, but sometimes it is necessary for the task. The easiest to think of is a graph of connected nodes built off of user input. The nodes need references to each other, but their lifetimes are not known until runtime.

Rc, and its concurrent counterpart Arc, allow for “shared ownership”. Instead of following the borrow checker’s “one owner” rule, Rc reference counts are checked at runtime. Every clone ups the tracked reference count instead of deep-copying like usual. The instance isn’t dropped until the reference count goes to 0, multiple owners.

Cell and RefCell (Mutex for concurrency) mark an instance as mutable without requiring a mutable reference. This is often described as interior mutability. The borrow checks for write access are moved to runtime.
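
A small sketch combining the two, shared ownership plus interior mutability:

use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let another_owner = Rc::clone(&shared); // bumps the reference count, no deep copy.

    another_owner.borrow_mut().push(4); // write access checked at runtime, not compile time.
    println!("{:?}, owners: {}", shared.borrow(), Rc::strong_count(&shared));
}

Rc<RefCell<T>> giving multiple owners mutable access.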

Types

Rust is statically typed and has all sorts of bells and whistles.

tuple

let (int_param, bool_param) = pair;

let can be used to bind the members of a tuple to variables, a destructuring.

numbers

The usize primitive size is architecture dependent, either 64 or 32 bits. Seems a little like go’s int type, used a lot for indexes of collections. Probably has to do with how an index is related to a pointer and that is related to the architecture…but I find it confusing!

strings and characters

The Rust char type is four bytes.

The str type is a string slice. A slice is a kind of reference, so it doesn’t have ownership. String literals (e.g. let hello_world = "Hello, World!";) are string slices. They are usually type &str. Since they are references the compiler will help us out.

The String type is a dynamically sized, heap-allocated string.

arrays and slices

Slices reference a contiguous sequence of elements in a collection instead of the whole thing. They don’t own their contents; they are just special references.

let slice = &a[1..3];

The range syntax makes it easy to get new views on a slice or an array.

A slice is a “fat-pointer”, so a pointer and the size of its contents, and it doesn’t know that size at compile time.

  • [T; n] – An array of type T elements with length n, the length is known at compile time and is part of the type.
  • &[T; n] – A reference to an array of length n, a “thin” pointer.
  • [T] – A slice of type T elements, the length is not known at compile time. This form is not commonly coded with directly.
  • &[T] – A slice of sized type T. A “fat-pointer”.

vector

Dynamic sized collections stored on the heap.

iterators

Some iterators consume the collection (into_iter) while others just reference it (iter).
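
A quick sketch of the difference:

fn main() {
    let v = vec![1, 2, 3];

    for x in v.iter() {
        println!("{}", x); // x is a &i32, v is only borrowed.
    }

    for x in v.into_iter() {
        println!("{}", x); // x is an i32, v is moved and gone after this loop.
    }
}

iter borrows the collection, into_iter consumes it.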

orphan rule

The orphan rule is a restriction in the Rust type system. If a trait is implemented for a type, either the type or the trait must be defined in the current crate. This ensures that trait implementations are unique. Without it, two crates could implement the same trait for the same type and it wouldn’t be clear which to use.

newtypes and type aliases

The newtype pattern wraps an existing type in a single-field tuple struct, creating a distinct type that the compiler will actually check.
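
A tiny sketch (types hypothetical):

struct Meters(u64);
struct Feet(u64);

fn climb(_height: Meters) {}

// climb(Feet(100)); // compile error: Feet is not Meters, even though both wrap u64.

A newtype gets real type checking, unlike an alias.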

Type aliases are effectively just documentation, the compiler isn’t checking beyond the underlying type. The two types are synonyms. Generally used to just alias complicated types.

type Point = (u8, u8);

The type keyword declares a new type, in this case a type synonym.

associated types

The type keyword shows up again! It helps define associated types. So it is only going to show up within traits and trait impl blocks.

“When a trait has a generic parameter, it can be implemented for a type multiple times, changing the concrete types of the generic type parameters each time”.

A trait with a generic can be implemented multiple times on a type. This might not be what you want all the time. A trait with an associated type doesn’t know the type ahead of time, but it can only be implemented once on a type. The trait implementation chooses its associated type.

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

The Iterator trait has the associated type Item which is set by types which implement it.

Trait functions can still have type “placeholders” like if the trait was generic, but now the type is set when the trait is implemented vs. used. This is also how only one version of the trait can be applied to a type.

The benefit of setting a type “upstream” is easier usage downstream. There are some scenarios where the caller is only abstracting over the outer, or parent, type and doesn’t really care about the inner types. If the outer type is generic though, all the inner type information needs to be passed along. There is the wildcard _, but that tells the compiler to infer a type and that needs to be deterministic.

If you only want a trait to be set once on a type no matter the trait’s inner type, that is a sign to use an associated type over a generic trait. Kinda shifts freedoms and burdens from caller to implementor.
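
A sketch of an implementor choosing its Item (the Counter type is hypothetical):

struct Counter {
    count: u32,
}

impl Iterator for Counter {
    type Item = u32; // the implementation picks the associated type, exactly once.

    fn next(&mut self) -> Option<Self::Item> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

One Iterator implementation per type, with Item fixed by the implementor.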

type conversion and coercion

The standard library has a std::convert module (looks like it’s also in core) which holds a handful of traits a type can implement. These are “user-defined” type conversions.
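
A sketch of the most common one, From, which also gives you Into for free via a blanket implementation (types hypothetical):

struct Celsius(f64);
struct Fahrenheit(f64);

impl From<Celsius> for Fahrenheit {
    fn from(c: Celsius) -> Self {
        Fahrenheit(c.0 * 9.0 / 5.0 + 32.0)
    }
}

fn main() {
    let f: Fahrenheit = Celsius(100.0).into(); // Into comes along with From.
    println!("{}", f.0);
}

A user-defined conversion with std::convert::From.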

The explicit cast keyword as is pretty limited to pairs of types. Some coercion is done by the compiler and doesn’t require an as, mostly reference and pointer stuff.

There are also some automatic type conversions, coercions, as well. Dereferencing is one of the big ones, referred to as deref coercion.

f.foo();
(&f).foo();
(&&f).foo();
(&&&&&&&&f).foo();

All these method calls are the same since they will be deref’d to the implementation.

enumerations

The built-in Option has some patterns around it.

unwrap is a shortcut method where if the value is the Ok (or Some) variant, it returns the inner value, else it panics. Can use expect to supply your own message, but same concept. Should probably use expect over unwrap though. Messages can be lowercase, no punctuation.
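
A quick sketch of the convention:

use std::fs;

fn main() {
    let config = fs::read_to_string("config.toml") // file name hypothetical.
        .expect("config.toml should exist in the project root");
    println!("{}", config.len());
}

expect over unwrap, with a message explaining the assumption.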

traits

Traits are Rust’s interface abstraction.

How can code consume a trait instead of the implementation (“accept interfaces, produce structs” a.k.a. polymorphism)? There are two ways with semantic differences, but each has a lot of different syntax.

First up, leverage generics at compile time. This is known as a trait bound, like bounding the possibilities of a type.

pub fn notify<T: Summary>(item: &T) {
    println!("Breaking news! {}", item.summarize());
}

// Specify multiple bounds.
pub fn notify<T: Summary + Display>(item: &T) {
    ...
}

The generic type T must implement the Summary trait.

You can shift these definitions into a where clause if they get unwieldy.

fn some_function<T, U>(t: &T, u: &U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    ...
}

where clause for long definitions.

There is also the impl syntax sugar for simple trait bounds. I dunno why this is necessary to be honest, kinda wish there was just one way to list a bound and we dealt with it!

pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize());
}  

Syntax sugar for a trait bound.

trait objects

Ok, finally, the second way, known as trait objects. A trait object is more runtime-y than the generic bounds. Another description is bounds are static while objects are dynamic (the implementation is called “dynamic dispatch”).

// Box for ownership.
fn print(a: Box<dyn Printable>) {
    println!("{}", a.stringify());
} 

// Reference to borrow.
fn print(a: &dyn Printable) {
    println!("{}", a.stringify());
}

A trait object declared with the dyn keyword.

A trait object can be used with any smart pointer (e.g. Box, Arc, etc.) or reference, but it must be a pointer since the concrete size is not known at compile time.

Trait bounds must resolve to a single homogeneous type, but trait objects do not have this limitation. However, not all traits can be made into trait objects (they must be object safe).
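
A sketch of the heterogeneous case, reusing a Printable-style trait:

trait Printable {
    fn stringify(&self) -> String;
}

impl Printable for i32 {
    fn stringify(&self) -> String {
        self.to_string()
    }
}

impl Printable for bool {
    fn stringify(&self) -> String {
        self.to_string()
    }
}

fn main() {
    // Different concrete types behind the same trait object type.
    let items: Vec<Box<dyn Printable>> = vec![Box::new(1), Box::new(true)];
    for item in items {
        println!("{}", item.stringify());
    }
}

Trait objects allowing a heterogeneous collection.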

Under the hood, bounds are implemented by the compiler generating a bunch of versions of a function. Objects require a lookup table (vtable) that is used at runtime to find what to execute.

bounds

The bounding aspect of traits is definitely my favorite quality, giving as much information to the compiler as possible. There are multiple spots to use it, some less obvious than others. The system bounds exclusively by traits, not concrete types. This limits it to what a type can do and not what it is.

function arguments

The most obvious use case is bounding the type consumed by a function. This allows the function to only care about a specific interface of the argument, and not anything else it happens to implement.

conditional implementations

Methods can be conditionally implemented on a generic type depending on trait bounds.

impl<T: Display + PartialOrd> Pair<T> {
  ...
}    

The impl block only applies to Pairs whose inner type implements Display and PartialOrd.

A trait is a set of shared behavior for multiple types, but we have the flexibility to only implement a method based on the inner type. An inner type can be moved to an associated type, but we still have the flexibility to conditionally implement, like a bound on a default method of a trait. Looks like there might be nicer syntax for this “soon”, but in any case, the RFC gives a nice breakdown of all syntax.

blanket implementations

Conditional implementations can be inverted as well and a trait can be blanket implemented for any type restricted by a bound.

impl<T: Display> ToString for T {
  ...
}    

The ToString trait is defined for any type T which implements Display.

marker traits

A marker trait tells you something about a type, but without adding any new methods or features. Some of these are auto-added by the compiler. All auto traits are marker traits but not all marker traits are auto traits.

The Sized trait is an auto-applied marker trait that lets everyone know a type’s size is known at compile time. A type without Sized is a dynamically sized type (DST). Slices and trait objects are DSTs, but for slices, usually dealing with a sized fat-pointer.

All generic type parameters are auto-bound with Sized by default.
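
The implicit bound can be relaxed with ?Sized, which is how a function accepts a DST behind a reference:

use std::fmt::Debug;

// ?Sized opts out of the implicit Sized bound, so str and [i32] qualify.
fn show<T: Debug + ?Sized>(t: &T) {
    println!("{:?}", t);
}

fn main() {
    show("hello");        // T = str, a DST.
    show(&[1, 2, 3][..]); // T = [i32], also a DST.
}

Opting out of the default Sized bound.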

extension traits

Extension traits incorporate additional methods on a type outside the crate it is defined in. This fits within Rust’s orphan rule.
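
A sketch (trait and method hypothetical), legal because the trait is local to the crate:

trait Shout {
    fn shout(&self) -> String;
}

// str is a foreign type, but the trait is ours, so the orphan rule is satisfied.
impl Shout for str {
    fn shout(&self) -> String {
        self.to_uppercase() + "!"
    }
}

fn main() {
    println!("{}", "hello".shout());
}

An extension trait adding a method to a foreign type.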

sealed traits

A sealed trait can only be implemented in the crate it is defined in. This makes it less flexible, but easier for the maintainer to not break downstream code. A small contract with the caller. This isn’t a first class language feature; you have to get tricky with public and private trait combining.
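
The usual trick is a public trait with a private supertrait, something like this sketch (names hypothetical):

mod private {
    pub trait Sealed {} // not reachable outside this crate.
}

pub trait Config: private::Sealed {
    fn timeout(&self) -> u32;
}

pub struct DefaultConfig;
impl private::Sealed for DefaultConfig {}
impl Config for DefaultConfig {
    fn timeout(&self) -> u32 {
        30
    }
}

// Downstream crates can use Config, but cannot implement it,
// because they cannot name (or implement) private::Sealed.

A sealed trait via a private supertrait.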

error handling

Composability is the name of the game when it comes to ergonomic error handling in Rust. The Result and Option enums are at the heart of tying all the conventions together.

The unwrap() and expect() methods on Result and Option are useful when hacking, but are not in the composable family since they panic.
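
The ? operator is the composable workhorse, bubbling errors up instead of panicking:

use std::fs;
use std::io;

fn read_username() -> Result<String, io::Error> {
    // ? returns early with the Err instead of panicking (file name hypothetical).
    let contents = fs::read_to_string("username.txt")?;
    Ok(contents.trim().to_string())
}

Composing fallible calls with ? instead of unwrap.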

visibility

A struct being made public with pub does not mean any of its fields are made public too. One big exception: the variants of a pub enum (and their fields) are public by default.
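
A small sketch of the asymmetry (types hypothetical):

pub struct Config {
    pub name: String, // fields must be marked pub individually.
    timeout: u32,     // still private even though Config is pub.
}

pub enum Status {
    Active(u32),      // variants and their fields are public because Status is pub.
    Failed(String),
}

Struct fields stay private by default, enum variants do not.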

Macros

When, how, why?

compile-time checking

Compile-time evaluation happens in certain contexts.

  1. Inside const and static definitions.
  2. Inside const generics parameters.
  3. In some attribute macros.

A const function can be called at compile-time, but won’t necessarily be run at compile-time if called from a normal function.

const _: () = match validate_hash_compile_time($s) {
    Ok(()) => (),
    Err(e) => panic!("{}", e),
};

Force compile-time evaluation with a const context through an unnamed const.

This is a bit verbose, so you might consider putting it in a normal function like so.

fn validate_and_parse_hash(hash_str: &str) {
    // Try to validate at compile time
    const _: () = match validate_hash_compile_time(hash_str) { // ERROR!
        Ok(()) => (),
        Err(e) => panic!("{}", e),
    };
}

let hash = validate_and_parse_hash("000000000000000012ea0ca9579299ec120e3f57e7c309216884872592b29970");

This does not compile!

The literal becomes a runtime parameter value, not in the const context. And this is where a macro can help out. A macro captures expressions and copy pastes them for you, so that they end up in const contexts.

// The macro definition
macro_rules! check_hash {
    ($s:expr) => {{
        const _: () = match validate_hash_compile_time($s) {
            Ok(()) => (),
            Err(e) => panic!("{}", e),
        };
    }};
}

// Using the macro in code
fn use_hash() {
    let hash = check_hash!("000000000000000012ea0ca9579299ec120e3f57e7c309216884872592b29970");
}

Use a macro to capture the static literal and copy it into a const context.

Concurrency

The borrow checker can help out a bunch to make programs safe for concurrency. Similar to the Golang channel strategy of “only one owner” of a message at a time, the borrow checker is clearly a more general ownership checker. For the common data races (corrupted data) and deadlock (stuck program) issues, the borrow checker guards against data races very well. Deadlocks are still very possible though. Best to follow the Golang mantra, “Do not communicate by sharing memory; instead, share memory by communicating”. In Rust, that means leveraging ownership instead of locks.
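
A sketch of that mantra with a std channel, where sending moves ownership of the message:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let msg = String::from("hello from the worker");
        tx.send(msg).unwrap(); // ownership of msg moves through the channel.
        // msg is no longer usable here.
    });

    println!("{}", rx.recv().unwrap());
}

Sharing memory by communicating: ownership transferred over a channel.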

pin

Pinning keeps a pointer “pinned” in memory. It is used a lot in async-land because futures are self-referential.