Rust

// Ultra-Violence

They would take their software out and race it in the black desert of the electronic night.

Productivity

Patterns to write better code.

system with rustup

The rustup executable is the standard way to manage the Rust version and toolchain on a local machine. I have had the best luck using it directly rather than relying on distribution package managers to install and manage the toolchain and/or rustup itself.

  • rustup update to pull in the new changes.
  • rustup show prints out the local toolchains installed.
  • rustup default $TOOLCHAIN to update the system default.
  • Toolchain override shorthand looks like cargo +beta (nifty plus instead of the conventional -).
  • Overrides can also be set per-directory, the mappings are stored in rustup’s storage.
  • rustup has a funny profiles concept: minimal, default, and complete. Usually want default plus rust-analyzer since complete is everything under the sun and breaks all the time (apparently).

lsp

The rust-analyzer component of the toolchain is the flagship Rust LSP. I haven’t found a need to modify it much. I believe the rust-toolchain.toml file is used to specify the version, including things like the formatter version.

formatting

  • rustfmt.toml should set settings for a project, a version can be defined here for just formatting too.

bootstrap a crate

cargo new guessing_game

The cargo tool sets you up pretty good.

  • The dependencies and project settings are established in Cargo.toml
  • The source code always goes in src/
cargo run 

Build and execute the program.

  • The --manifest-path flag can be used to point it at a different directory if called from a script.
  • The --quiet flag removes all the build jargon.
[dependencies]
num-bigint = "0.4.4" 

Cargo.toml dependencies section.

Add a dependency by manually updating Cargo.toml and then running cargo build to update the lock file.

project structure

Rust has its unique words and conventions to split up a project.

  • A path names an item such as a struct, function, or module.
  • A module controls scope and privacy of paths, with things private by default. Similar to a file system, but lives alongside it.
  • A crate is a tree of modules which produces a library or executable. The smallest amount of code that the compiler will consider at a time. In a “hello world” single file example, the file is the crate.
    • Binary crates must have a main function. Comparable to golang cmd/ programs.
    • The crate root is the source file which the compiler starts from and is actually a module called crate.
  • A package can contain multiple binary crates and optionally one library crate. Is a cargo level feature, so focuses on building, testing, and publishing code.
    • Cargo.toml is at the package root.
    • cargo follows conventions that src/main.rs is a binary crate with the same name as the package. And src/lib.rs is a library.
    • Other binary crates should be placed in src/bin/.
  • A workspace shares dependencies across a group of packages.

Seems sensible to start any crate with a src/main.rs and src/lib.rs to take advantage of the standard package/crate conventions. Splitting off the lib right away creates a nice interface for tests.

Module-fying code is where things get a bit more complex. This is where some encapsulation choices are being made. Modules need to first be declared before they can then be depended on by other modules in a crate. A module is declared with mod $MODULE and that declaration needs to be picked up by a file that the compiler will look at, so the first natural spot to put these would be in one of the roots (main or lib). The compiler will look in three spots for the module definition once it runs into a mod:

  1. Inline, the current file.
  2. ./$MODULE.rs (new)
  3. ./$MODULE/mod.rs (original)

The module system is like a file system, but it is living alongside the file system. This means a module is declared with mod garden then defined either in the same file, in ./garden.rs, or ./garden/mod.rs. If the garden module wants its own submodules, they are declared in the garden implementation, like mod vegetables, and the submodules are defined in one of the three possible spots. Style #2 was introduced since it is debatably more searchable than a bunch of mod.rs files, even though it is kinda weird since you might assume a module is contained in a directory like other language hierarchies. It is mostly just a bummer that there are two styles now.

A new module is a new level in the namespace hierarchy. First though, a bit of inversion: a child module has access to a parent module’s private components, but parents don’t have that with children. If the child module itself is private, it can still be accessed by the parent since it is like a private component of the parent. Modules have visibility rules similar to structs and functions. Where the module is declared is where it is decided whether it is usable by something other than the parent module. There are some quirks to the module hierarchy. The first is the crate module, an internal pseudo-module used to access the root of the crate’s namespace. Any module in the crate can access modules that are direct descendants of the root, even if they are private. Kinda confusing…wonder if it is left over from before pub(crate) was a thing?
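A minimal sketch of those visibility rules using inline modules. The garden and vegetables modules follow the naming above; the function names and numbers are made up for illustration.

```rust
// Inline modules; the same tree could live in garden.rs and garden/vegetables.rs.
mod garden {
    // Private to `garden`, but visible to its descendants.
    fn water_supply() -> u32 {
        10
    }

    // `pub` here lets code outside `garden` reach the child module.
    pub mod vegetables {
        // A child can call its parent's private function via `super`.
        pub fn water_needed() -> u32 {
            super::water_supply() / 2
        }

        // Visible anywhere in this crate, but not to other crates.
        pub(crate) fn count() -> u32 {
            3
        }

        // Private: the parent `garden` cannot call this.
        #[allow(dead_code)]
        fn secret_recipe() -> &'static str {
            "compost"
        }
    }

    pub fn total_water() -> u32 {
        // The parent only sees the child's non-private items.
        vegetables::water_needed() * vegetables::count()
    }
}

// Lives at the crate root, so `garden` is reachable even though it is private.
pub fn garden_report() -> u32 {
    garden::total_water()
}
```

Swapping the call in total_water for vegetables::secret_recipe() should fail to compile with a privacy error, which is the parent/child asymmetry in action.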

testing

The Rust test tooling makes a distinction between “unit” tests and “integration” tests (these go by different names elsewhere). Unit tests live in the module they are testing, as a submodule. They have deep access, including private methods of the module under test. They are supposed to be fast and small. Integration tests live in a tests/ directory which is a sibling directory to src/.

Unit tests by convention are an inline module called tests.
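A sketch of the conventional layout, with a hypothetical add_two helper as the code under test.

```rust
// Private helper: unit tests can reach it, integration tests in tests/ cannot.
fn add_two(n: i32) -> i32 {
    n + 2
}

// Compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    // Pull the parent module's items, private ones included, into scope.
    use super::*;

    #[test]
    fn adds_two() {
        assert_eq!(add_two(2), 4);
    }
}
```

The tests module is a child of the module under test, which is why it gets access to the private function.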

The built-in Rust linting tool is clippy.

crate busting

The cargo tool has a ton of built-in features to optimize the hell out of a workspace and its packages.

Adding [workspace] to a Cargo.toml declares that level as a workspace, a holder of packages, instead of just a package itself (but it can still have a root package). A workspace has a top-level Cargo.lock, so dependency versions are shared by all packages in the workspace. Dependencies can only be used in packages that explicitly call for them though (thank god). Packages (and their crates) can be broken up to keep scope and dependencies as small as possible for consumers. No need to depend on one monolith crate that pulls in the world. But the maintainer can still get the benefits of a monorepo (simpler dependency tree, assurance that the packages will at least work with each other).

Internal packages need an explicit dependency on any other internal packages they depend on. This can be with a path dependency or version. Crates that use dependencies specified with only a path cannot be published! But path and version can be combo’d where the path is only used locally.

[patch]’s can be supplied at the root of a workspace to override things for all internal packages. This is another way to supply a local path along with a version for intra-dependencies.

TOML is used in Cargo files. The sections (e.g. [patch]) are TOML tables. Dots . in the title keys give namespaces. Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created.

[patch.crates-io.bitcoin-units]
path = "units"

[patch.crates-io]
# Inline table with curly braces.
bitcoin-units = { path = "units" }  

Equivalent in TOML.

publishing to registries

  • Registries hold onto actual code artifacts, not just a pointer to a code repository.
    • A .crate file is a big compressed snapshot of the package’s source files.
    • Once published, a crate is independent from the original repository.
  • crates.io is the default registry in the ecosystem.
    • Connected my Github account to crates.io for my identity.
    • cargo login $(pass registry/cargo-io-yonson)
    • Namespace conflicts on crates.io are primarily avoided through a first-come, first-served naming system.
    • Once a name is taken, it cannot be used by another crate unless the original is transferred or removed.
    • Crate Name Conventions
      • Case insensitive.
      • Must begin with alpha character.
      • Common to have top-level directory name match crate name, but not a rule.
      • Crate names use kebab-case and directories follow this, while rust internals generally use snake_case. These are just conventions though, no enforced rules.
    • Ownership
      • Can transfer published crate ownership among crates.io users.
      • You cannot update metadata of published crates, but you can for new versions.
  • Authentication with a registry is delegated to system providers.

error handling

Composability is the name of the game when it comes to ergonomic error handling in Rust. The Result and Option enums are at the heart of tying all the conventions together.

The unwrap() and expect() methods on Result and Option are useful when hacking, but are not in the composable family since they panic.
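The composable alternative is the ? operator, which passes the failure up to the caller instead of panicking. A small sketch with two hypothetical helpers:

```rust
use std::num::ParseIntError;

// `?` returns early with the Err, so the happy path reads straight through.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

// Option composes the same way: `?` returns None early.
fn first_char_upper(input: &str) -> Option<char> {
    let c = input.chars().next()?;
    Some(c.to_ascii_uppercase())
}
```

Callers can keep chaining with ? themselves, or decide at the edge of the program how to report the error.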

visibility

Making a struct public with pub does not make any of its fields public too; each field needs its own pub. One big exception to that pattern is enums: the variants of a pub enum are public by default.
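A sketch of both rules in one hypothetical shapes module:

```rust
mod shapes {
    pub struct Rect {
        pub width: u32, // reachable from outside the module
        height: u32,    // private even though the struct is pub
    }

    impl Rect {
        // A constructor is needed since outside code cannot name `height`.
        pub fn new(width: u32, height: u32) -> Self {
            Rect { width, height }
        }

        pub fn area(&self) -> u32 {
            self.width * self.height
        }
    }

    // Variants of a pub enum are public without further annotation.
    pub enum Kind {
        Square,
        Oblong,
    }

    pub fn kind(rect: &Rect) -> Kind {
        if rect.width == rect.height {
            Kind::Square
        } else {
            Kind::Oblong
        }
    }
}
```

Outside of shapes, a struct literal like Rect { width: 2, height: 3 } should be rejected because height is private, while Kind::Square can be named freely.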

Ownership

Rust has a unique memory management model. The main purpose of ownership is to manage heap (not stack) data. Every value in Rust has an owner and only one owner. When the owner goes out of scope the value can be dropped. It is a little more explicit than languages that make use of a garbage collection runtime and less explicit than direct memory management. But it is at a spot where the compiler can statically analyze code and help us out a bunch.

Data types of known size can be kept on the stack and their scope is easy for the compiler to determine (function calls). They can also be easily copied to other scopes.

Fun fact, variables are immutable by default, let x = 5 versus let mut x = 5.

drop and move are the main memory management tools the Rust compiler uses under the hood. drop is called automatically on a value when it goes out of scope to free the memory. A move is a transfer of ownership.

Passing a variable to a function will move it (transfer ownership). A return value is the final expression in a block or can be returned early with return, this also moves ownership. In Rust lingo, ownership transfer is sometimes referred to as consuming or into.

Data stored on the stack can implement a Copy trait to avoid moves. Instead, the data is copied on the stack which is relatively low cost.

An option to avoid moving ownership is to clone the value, which clones the data on the heap. Obviously has runtime costs.

References are like pointers, but with more memory guarantees and some flexibility. Functions can refer to a value without taking ownership, but caller and callee have to agree on that. Creating a reference is called borrowing. You can’t modify something through a plain (shared) borrow! The Rust compiler is able to detect “dangling” references where a reference uses memory that is going to be dropped. A reference is created with &.

A reference can be made mutable with &mut. Sometimes you want to give something the ability to write without giving up ownership.
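The move, clone, borrow, and mutable borrow options side by side. All function names here are hypothetical.

```rust
// Takes ownership: the caller's binding is moved and unusable afterwards.
fn consume(s: String) -> usize {
    s.len()
} // `s` is dropped here

// Borrows immutably: the caller keeps ownership.
fn peek(s: &str) -> usize {
    s.len()
}

// Borrows mutably: can write without taking ownership.
fn shout(s: &mut String) {
    s.push('!');
}

fn ownership_demo() -> (usize, usize, String) {
    let mut greeting = String::from("hello");
    let copy = greeting.clone(); // deep copy of the heap data
    let moved_len = consume(copy); // `copy` is moved; using it again would not compile
    shout(&mut greeting);
    let borrowed_len = peek(&greeting);
    (moved_len, borrowed_len, greeting)
}
```

The signature is where caller and callee agree: String means "give it to me", &str means "let me look", and &mut String means "let me change it".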

mutability and references and bindings

The mut keyword is used in two distinct spots (which I didn’t realize at first): mutable bindings and mutable references.

y: &i32          // Can't change anything.
mut y: &i32      // Can point y to new memory location, but can't update the contents of the memory.
y: &mut i32      // Can update the memory contents, but not where y is pointed at.
mut y: &mut i32  // Can update both y and memory contents.

Mutability with bindings and references.
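The same combinations as running code. This is only a sketch; the variable names are arbitrary.

```rust
fn mut_demo() -> (i32, i32) {
    let mut a = 1;
    let mut b = 10;

    {
        // Mutable reference, immutable binding: write through it, cannot repoint.
        let y: &mut i32 = &mut a;
        *y += 1;
    }

    {
        // Mutable binding, shared reference: repoint it, cannot write through it.
        let mut y: &i32 = &a;
        assert_eq!(*y, 2);
        y = &b;
        assert_eq!(*y, 10);
    }

    {
        // Both: write through it, then repoint and write again.
        let mut y: &mut i32 = &mut a;
        *y += 1;
        y = &mut b;
        *y += 1;
    }

    (a, b)
}
```

Each block is scoped so that only one kind of borrow of a and b is alive at a time.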

The mut is on the binding not the type, but I find that confusing since you can still specify mutable references in generics. I guess lifetimes are specified in generics too, so it is more than just type information. It is any information to help the compiler. It also doesn’t make much sense to specify a “mutable value” in a generic, since it is not a binding to be updated. The situation I have in mind is collections like Vec<T>, where the collection owns the values (like a struct).

Helpful to remember that only places are mutable in Rust, not values.

There are some more complexities in matching (“match ergonomics”) where a binding is occurring.

box

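A Box is an owning pointer to a heap allocation. The Box itself is still a statically sized value on the stack: a thin pointer when the contents are Sized, and a fat pointer carrying more metadata (a length or a vtable) when it is pointing to a dynamically sized type like a slice or trait object.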

A box owns its data whereas a reference just borrows.
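A sketch of a box owning heap data, using the classic recursive list. The size comparison reflects the usual pointer layout on current compilers and is an observation, not something to build on: a box of a sized type is one pointer wide, a box of a slice carries a length too.

```rust
use std::mem::size_of;

// A recursive type needs indirection, otherwise its size would be infinite.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

// (size of Box<i32>, size of Box<[i32]>) in bytes.
fn box_sizes() -> (usize, usize) {
    (size_of::<Box<i32>>(), size_of::<Box<[i32]>>())
}
```

When the outermost List value goes out of scope, each Box drops the node it owns, so the whole chain is freed without any manual cleanup.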

lifetimes

Lifetimes are more information used by the compiler to determine if borrows are valid. A variable’s lifetime begins when it is created and ends when it is destroyed, but sometimes the compiler needs some help determining when that is exactly.

fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) {
    println!("x is {} and y is {}", x, y);
}   

Showing off the lifetime ' syntax in the generics.

The lifetime annotation can connect an input value to an output value of a function, so the compiler knows the input needs to be around as long as the output. Because these can get verbose fast, Rust has elision rules where it auto-adds them for common scenarios.
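A sketch of both cases: one function where the annotation ties the output to the inputs, and one where elision fills it in. Both function names are hypothetical.

```rust
// The returned reference is only valid while both inputs are.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

// Elision: with a single reference input, the output lifetime is taken from it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```

Removing the annotations from longest should be a compile error, since with two reference inputs the compiler cannot guess which one the output borrows from.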

cells and reference counters

There are some tools to push the borrow checker rules from compile time to runtime. There is an obvious cost in performance, but sometimes it is necessary for the task. The easiest to think of is a graph of connected nodes built off of user input. The nodes need references to each other, but their lifetimes are not known until runtime.

Rc, and its concurrent counterpart Arc, allow for “shared ownership”. Instead of following the borrow checker’s “one owner” rule, Rc reference counts are checked at runtime. Every clone ups the tracked reference count instead of deep-copying like usual. The instance isn’t dropped until the reference count goes to 0, multiple owners.

Cell and RefCell (Mutex for concurrency) mark an instance as mutable without requiring a mutable reference. This is often described as interior mutability. With RefCell and Mutex, the borrow checks for writability are moved to runtime.
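The two are often combined as Rc<RefCell<T>>: shared ownership plus mutation through any of the handles. A minimal single-threaded sketch with a hypothetical shared log:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns (strong count while shared, final contents).
fn shared_log() -> (usize, Vec<String>) {
    let log = Rc::new(RefCell::new(Vec::new()));

    // Cloning an Rc bumps the reference count; the Vec is not copied.
    let writer = Rc::clone(&log);
    let owners = Rc::strong_count(&log);

    // Mutate through either handle; the borrow rules are checked at runtime.
    writer.borrow_mut().push(String::from("first"));
    log.borrow_mut().push(String::from("second"));

    // Holding two `borrow_mut()` guards at once would panic instead of failing to compile.
    let contents = log.borrow().clone();
    (owners, contents)
}
```

For the concurrent version the usual swap is Arc for Rc and Mutex for RefCell.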

Types

Rust is statically typed and has all sorts of bells and whistles.

tuple

let (int_param, bool_param) = pair;

let can be used to bind the members of a tuple to variables, a destructuring.

numbers

The usize primitive size is architecture dependent, either 64 or 32 bits. Seems a little like go’s int type, used a lot for indexes of collections. Probably has to do with how an index is related to a pointer and that is related to the architecture…but I find it confusing!

strings and characters

The Rust char type is four bytes.

The str type is a string slice. A slice is a kind of reference, so it doesn’t have ownership. String literals (e.g. let hello_world = "Hello, World!";) are string slices. They are usually type &str. Since they are references the compiler will help us out.

String type is a dynamic heap type.
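A small sketch of the three types together. The helper names are hypothetical.

```rust
use std::mem::size_of;

// Borrowed view: works for literals and for slices of a String.
fn char_and_byte_len(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len()) // len() counts UTF-8 bytes, not chars
}

// Owned, growable, heap-allocated.
fn build_greeting(name: &str) -> String {
    let mut out = String::from("Hello, ");
    out.push_str(name);
    out
}

fn char_size() -> usize {
    size_of::<char>()
}
```

A char is always four bytes on its own, but inside a str or String the same character is stored as UTF-8 and may take anywhere from one to four bytes.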

arrays and slices

Reference a contiguous sequence of elements in a collection instead of the whole thing. Slices don’t own their contents, just special references.

let slice = &a[1..3];

The range syntax makes it easy to get new views on a slice or an array.

A slice is a “fat-pointer”, so a pointer and the size of its contents, and it doesn’t know that size at compile time.

  • [T; n] – An array of type T elements with length n, the length is known at compile time and is part of the type.
  • &[T; n] – A reference to an array of length n, a “thin” pointer.
  • [T] – A slice of type T elements, the length is not known at compile time. This form is not commonly coded with directly.
  • &[T] – A reference to a slice of type T elements. A “fat-pointer”.
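A sketch of why &[T] is the handy form for function arguments: one signature takes arrays, vectors, and sub-ranges. The total function is hypothetical.

```rust
// Accepts any contiguous run of i32s: arrays, vectors, or sub-ranges.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn slice_demo() -> (i32, i32, usize) {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let slice = &a[1..3]; // elements at index 1 and 2
    // &[i32; 5] coerces to &[i32]; the length travels with the fat pointer.
    (total(&a), total(slice), slice.len())
}
```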

vector

Dynamic sized collections stored on the heap.

iterators

Some iterators consume the collection (into_iter) while others just reference it (iter).
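A sketch of the difference on a Vec<String>, with made-up data:

```rust
fn iter_demo() -> (usize, Vec<String>) {
    let names = vec![String::from("ferris"), String::from("corro")];

    // iter() borrows: yields &String, and `names` is still usable afterwards.
    let total_len: usize = names.iter().map(|n| n.len()).sum();

    // into_iter() consumes: yields String, and `names` is moved.
    let upper: Vec<String> = names.into_iter().map(|n| n.to_uppercase()).collect();

    (total_len, upper)
}
```

Using names after the into_iter() line should be rejected by the compiler as a use of a moved value.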

newtypes and type aliases

The newtype pattern.

Type aliases are effectively just documentation, the compiler isn’t checking beyond the underlying type. The two types are synonyms. Generally used to just alias complicated types.

type Point = (u8, u8);

The type keyword declares a type alias: not a new type, just a synonym for an existing one.
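For contrast, a sketch of an alias next to the newtype pattern, where a tuple struct wraps the underlying type so the compiler treats it as distinct. Meters and Feet are hypothetical names.

```rust
// Alias: just another name for (u8, u8).
type Point = (u8, u8);

// Newtypes: distinct types wrapping a u32, checked by the compiler.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Meters(u32);

#[derive(Debug, PartialEq, Clone, Copy)]
struct Feet(u32);

// Any (u8, u8) is accepted, whether or not it was declared as a Point.
fn x_of(p: Point) -> u8 {
    p.0
}

// Only Meters is accepted; passing Feet would not compile.
fn double(distance: Meters) -> Meters {
    Meters(distance.0 * 2)
}
```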

associated types

The type keyword shows up again! It helps define associated types. So it is only going to show up within traits and trait impl blocks.

“When a trait has a generic parameter, it can be implemented for a type multiple times, changing the concrete types of the generic type parameters each time”.

A trait with a generic can be implemented multiple times on a type. This might not be what you want all the time. A trait with an associated type doesn’t know the type ahead of time, but it can only be implemented once on a type. The trait implementation chooses its associated type.

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

The Iterator trait has the associated type Item which is set by types which implement it.

Trait functions can still have type “placeholders” as if the trait were generic, but now the type is set when the trait is implemented rather than when it is used. This is also why only one version of the trait can be applied to a type.

The benefits to setting a type “upstream” is easier usage downstream. There are some scenarios where the caller is only abstracting over the outer, or parent, type and doesn’t really care about the inner types. If the outer type is generic though, all the inner type information needs to be passed along. There is the wildcard _, but that tells the compiler to infer a type and that needs to be deterministic.

If you only want a trait to be set once on a type no matter the trait’s inner type, that is a sign to use an associated type over a generic trait. Kinda shifts freedoms and burdens from caller to implementor.
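A sketch of the implementor choosing the associated type, using the standard Iterator trait with a hypothetical Counter.

```rust
// Counts from 1 up to `limit`.
struct Counter {
    current: u32,
    limit: u32,
}

impl Iterator for Counter {
    // The implementor picks the associated type, once.
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current < self.limit {
            self.current += 1;
            Some(self.current)
        } else {
            None
        }
    }
}

// Callers abstract over the iterator without an extra type parameter for the item.
fn sum_all(iter: impl Iterator<Item = u32>) -> u32 {
    iter.sum()
}
```

A second impl Iterator for Counter with a different Item should be rejected as a conflicting implementation, which is the "only once" property.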

type conversion and coercion

The standard library has a std::convert module (looks like it’s also in core) which holds a handful of traits a type can implement. These are “user-defined” type conversions.
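A sketch of two of those traits, From and TryFrom. The temperature types are hypothetical.

```rust
use std::convert::TryFrom;

struct Celsius(f64);
struct Fahrenheit(f64);

// Implementing From gives the matching Into for free.
impl From<Celsius> for Fahrenheit {
    fn from(c: Celsius) -> Self {
        Fahrenheit(c.0 * 9.0 / 5.0 + 32.0)
    }
}

// TryFrom covers conversions that can fail, here i32 to u8.
fn to_byte(n: i32) -> Option<u8> {
    u8::try_from(n).ok()
}
```

With that in place both Fahrenheit::from(Celsius(100.0)) and Celsius(100.0).into() work, the latter as long as the target type can be inferred.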

The explicit cast keyword as is limited to specific pairs of types, mostly primitives. Some coercion is done by the compiler and doesn’t require an as, mostly reference and pointer stuff.

There are also some automatic type conversions, called coercions. Dereferencing is one of the big ones, referred to as deref coercion.

f.foo();
(&f).foo();
(&&f).foo();
(&&&&&&&&f).foo();

All these method calls are the same since they will be deref’d to the implementation.

enumerations

The built-in Option has some patterns around it.

unwrap is a shortcut method where if the value is the Some variant (Ok for a Result), it returns the inner value, else it panics. Can use expect to supply your own message, but same concept. Should probably use expect over unwrap though. Messages can be lowercase, no punctuation.
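A few of the non-panicking patterns around Option, sketched with a hypothetical key/value config lookup.

```rust
fn find_port(config: &[(&str, &str)]) -> Option<u16> {
    config
        .iter()
        .find(|(key, _)| *key == "port")
        // and_then chains another step that may also come up empty.
        .and_then(|(_, value)| value.parse::<u16>().ok())
}

fn port_or_default(config: &[(&str, &str)]) -> u16 {
    // unwrap_or supplies a fallback instead of panicking.
    find_port(config).unwrap_or(8080)
}

fn port_or_error(config: &[(&str, &str)]) -> Result<u16, String> {
    // ok_or converts None into an Err so it composes with `?`.
    find_port(config).ok_or(String::from("no port configured"))
}
```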

traits

Traits are Rust’s interface abstraction.

How can code consume a trait instead of the implementation (“accept interfaces, produce structs” a.k.a. polymorphism)? There are two ways with semantic differences, but each has a lot of different syntax.

First up, leverage generics at compile time. This is known as a trait bound, like bounding the possibilities of a type.

pub fn notify<T: Summary>(item: &T) {
    println!("Breaking news! {}", item.summarize());
}

// Specify multiple bounds.
pub fn notify<T: Summary + Display>(item: &T) {
    ...
}

The generic type T must implement the Summary trait.

You can shift these definitions into a where clause if they get unwieldy.

fn some_function<T, U>(t: &T, u: &U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    ...
}

where clause for long definitions.

There is also the impl syntax sugar for simple trait bounds. I dunno why this is necessary to be honest, kinda wish there was just one way to list a bound and we dealt with it!

pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize());
}  

Syntax sugar for a trait bound.

trait objects

Ok, finally, the second type known as trait objects. A trait object is more runtime-y than the generic bounds. Another description is bounds are static while objects are dynamic (implementation is called “dynamic dispatch”).

// Box for ownership.
fn print(a: Box<dyn Printable>) {
    println!("{}", a.stringify());
} 

// Reference to borrow.
fn print(a: &dyn Printable) {
    println!("{}", a.stringify());
}

A trait object declared with the dyn keyword.

A trait object can be used with any smart pointer (e.g. Box, Arc, etc.) or reference, but it must be a pointer since the concrete size is not known at compile time.

Trait bounds must hold a homogeneous type, but trait objects do not have this limitation. However, not all types can be trait objects.

Under the hood, bounds are implemented by the compiler generating a bunch of versions of a function. Objects require a lookup table (vtable) that is used at runtime to find what to execute.
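A sketch of the two dispatch styles side by side. The Printable trait and its stringify method follow the snippet above, but the definition here is my own guess at what it looks like.

```rust
trait Printable {
    fn stringify(&self) -> String;
}

impl Printable for i32 {
    fn stringify(&self) -> String {
        format!("int:{}", self)
    }
}

impl Printable for bool {
    fn stringify(&self) -> String {
        format!("bool:{}", self)
    }
}

// Static dispatch: the compiler generates one copy per concrete T.
fn print_static<T: Printable>(item: &T) -> String {
    item.stringify()
}

// Dynamic dispatch: one function, the method is found through the vtable.
fn print_all(items: &[Box<dyn Printable>]) -> Vec<String> {
    items.iter().map(|item| item.stringify()).collect()
}
```

The Vec<Box<dyn Printable>> can mix i32 and bool values, which a Vec<T> with T: Printable cannot.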

bounds

The bounding aspect of traits is definitely my favorite quality, giving as much information to the compiler as possible. There are multiple spots to use it, some less obvious than others. Maybe captured by anywhere there is a generic parameter?

function arguments

The most obvious use case is bounding the type consumed by a function. This allows the function to only care about a specific interface of the argument, and not anything else it happens to implement.

conditional implementations

Methods can be conditionally implemented on a generic type depending on trait bounds.

impl<T: Display + PartialOrd> Pair<T> {
  ...
}    

The impl block only applies to Pairs whose inner type implements Display and PartialOrd.

A trait is a set of shared behavior for multiple types, but we have the flexibility to only implement a method based on the inner type. An inner type can be moved to an associated type, but we still have the flexibility to conditionally implement, like a bound on a default method of a trait. Looks like there might be nicer syntax for this “soon”, but in any case, the RFC gives a nice breakdown of all syntax.
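A filled-in sketch of the conditional impl block. The largest method is hypothetical.

```rust
use std::fmt::Display;

struct Pair<T> {
    x: T,
    y: T,
}

// Available for every Pair<T>.
impl<T> Pair<T> {
    fn new(x: T, y: T) -> Self {
        Pair { x, y }
    }
}

// Only available when T can be compared and displayed.
impl<T: Display + PartialOrd> Pair<T> {
    fn largest(&self) -> String {
        if self.x >= self.y {
            format!("{}", self.x)
        } else {
            format!("{}", self.y)
        }
    }
}
```

A Pair<Vec<u8>> can still be constructed with new, but calling largest on it should not compile since Vec<u8> does not implement Display.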

blanket implementations

Conditional implementations can be inverted as well and a trait can be blanket implemented for any type restricted by a bound.

impl<T: Display> ToString for T {
  ...
}    

The ToString trait is defined for any type T which implements Display.
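The same shape with a trait of my own, since ToString is already taken by the standard library. Shout is a hypothetical trait; the ?Sized is there so it also covers str directly.

```rust
use std::fmt::Display;

trait Shout {
    fn shout(&self) -> String;
}

// Blanket implementation: every type that implements Display gets Shout.
impl<T: Display + ?Sized> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self).to_uppercase()
    }
}
```

No per-type impl is needed; anything with Display picks up the method as soon as the trait is in scope.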

marker traits

A marker trait tells you something about a type, but without adding any new methods or features. Some of these are auto-added by the compiler. All auto traits are marker traits but not all marker traits are auto traits.

The Sized trait is an auto-applied marker trait that lets everyone know a type’s size is known at compile time. A type without Sized is a dynamically sized type (DST). Slices and trait objects are DSTs, but for slices, usually dealing with a sized fat-pointer.

All generic type parameters are auto-bound with Sized by default.
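The default can be opted out of with the ?Sized bound. A sketch with two hypothetical functions that differ only in that bound:

```rust
use std::fmt::Debug;

// Implicitly `T: Sized`: accepts &i32 or &String, but not &str or &[u8].
fn describe_sized<T: Debug>(value: &T) -> String {
    format!("{:?}", value)
}

// `?Sized` opts out of the default bound, so DSTs behind a reference work too.
fn describe_any<T: Debug + ?Sized>(value: &T) -> String {
    format!("{:?}", value)
}
```

Calling describe_sized("hi") should fail to compile because T would have to be str, which has no compile-time size.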

extension traits

Incorporate additional methods on a type outside the crate it is defined in. This fits in Rust’s orphan rule.
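A sketch of an extension trait on the standard library's str. StrExt and word_count are hypothetical names.

```rust
// A local trait whose only job is to add a method to a foreign type.
trait StrExt {
    fn word_count(&self) -> usize;
}

// Allowed by the orphan rule: the trait is local even though str is foreign.
impl StrExt for str {
    fn word_count(&self) -> usize {
        self.split_whitespace().count()
    }
}
```

The new method is only visible where StrExt is imported, so it can't leak into or clash with code that never asked for it.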

sealed traits

A sealed trait can only be implemented in the crate it is defined. This makes it less flexible, but easier for the maintainer to not break downstream code. A small contract with the caller. This isn’t a first class language thing, you have to get tricky with public and private trait combining.
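One common way to get tricky with it, sketched with hypothetical Unit types: a public trait requires a supertrait that lives in a private module.

```rust
mod sealed {
    // Public trait inside a private module: usable here, unreachable downstream.
    pub trait Sealed {}
}

// Implementing Unit requires Sealed, which only this crate can implement.
pub trait Unit: sealed::Sealed {
    fn symbol(&self) -> &'static str;
}

pub struct Satoshi;
pub struct Bitcoin;

impl sealed::Sealed for Satoshi {}
impl sealed::Sealed for Bitcoin {}

impl Unit for Satoshi {
    fn symbol(&self) -> &'static str {
        "sat"
    }
}

impl Unit for Bitcoin {
    fn symbol(&self) -> &'static str {
        "BTC"
    }
}
```

Downstream crates can call symbol and use Unit as a bound, but they cannot name sealed::Sealed, so they cannot add their own implementors.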

Concurrency

The borrow checker can help out a bunch to make programs safe for concurrency. Similar to the golang channel strategy of “only one owner” of a message at a time, the borrow checker is clearly a more general ownership checker. For the common data races (corrupted data) and deadlock (stuck program) issues, the borrow checker guards against data races very well. Deadlocks are still very possible though. Best to follow the golang mantra, “Do not communicate by sharing memory; instead, share memory by communicating”. In Rust, that means leveraging ownership instead of locks.
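A sketch of sharing by communicating with the standard library's mpsc channel. The sum_via_channel function is hypothetical; the point is that values are moved into the thread and then into the channel, so no lock is needed.

```rust
use std::sync::mpsc;
use std::thread;

fn sum_via_channel(inputs: Vec<u32>) -> u32 {
    let (tx, rx) = mpsc::channel();

    // `move` transfers ownership of `inputs` and `tx` into the worker thread.
    let worker = thread::spawn(move || {
        for n in inputs {
            // Each squared value is moved into the channel; the sender gives it up.
            tx.send(n * n).expect("receiver should still be alive");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // The iterator ends once every sender is gone.
    let total: u32 = rx.iter().sum();
    worker.join().expect("worker thread panicked");
    total
}
```

Touching inputs on the main thread after the spawn should be a compile error, which is the "only one owner" idea enforced statically.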

pin

Pinning keeps the value behind a pointer “pinned” at a fixed location in memory. It is used a lot in async-land because futures can be self-referential.