Rust

Ultra-Violence

They would take their software out and race it in the black desert of the electronic night.

Productivity

Patterns to write better code.

system and rust-analyzer

The rustup executable is the standard way to manage the Rust version and tool chain on a local machine. I have had the best luck just using it directly versus using distribution package managers to install and manage the tool chain and/or rustup itself. Just run rustup update to pull in the new changes.

The rust-analyzer component of the tool chain is the flagship Rust LSP. I haven’t found a need to modify it much.

bootstrap a crate

cargo new guessing_game

The cargo tool sets you up pretty good.

The dependencies and project settings are established in Cargo.toml
The source code always goes in src/

cargo run

Build and execute the program.

The --manifest-path flag can be used to point it at a different directory if called from a script.
The --quiet flag removes all the build jargon.

[dependencies]
num-bigint = "0.4.4"

Cargo.toml dependencies section.

Add a dependency by manually updating Cargo.toml and then running cargo build to update the lock file.

project structure

Rust has its unique words and conventions to split up a project.

A path names an item such as a struct, function, or module.
A module controls scope and privacy of paths, with things private by default. Similar to a file system, but lives alongside it.
A crate is a tree of modules which produces a library or executable. The smallest amount of code that the compiler will consider at a time. In a “hello world” single file example, the file is the crate.
- Binary crates must have a main function. Comparable to golang cmd/ programs.
- The crate root is the source file which the compiler starts from and is actually a module called crate.
A package can contain multiple binary crates and optionally one library crate. Is a cargo level feature, so focuses on building, testing, and publishing code.
- Cargo.toml is at the package root.
- cargo follows conventions that src/main.rs is a binary crate with the same name as the package. And src/lib.rs is a library.
- Other binary crates should be placed in src/bin/.
A workspace shares dependencies across a group of packages.

Seems sensible to start any crate with a src/main.rs and src/lib.rs to take advantage of the standard package/crate conventions. Splitting off the lib right away creates a nice interface for tests.

Module-fying code is where things get a bit more complex. This is where some encapsulation choices are being made. Modules need to first be declared before they can then be depended on by other modules in a crate. A module is declared with mod $MODULE and that declaration needs to be picked up by a file that the compiler will look at, so the first natural spot to put these would be in one of the roots (main or lib). The compiler will look in three spots for the module definition once it runs into a mod:

Inline, the current file.
./$MODULE.rs
./$MODULE/mod.rs (OLD!)

The module system is like a file system, but it is living alongside the file system. Only style 2 listed above should be used to tie in the file system (style 1 is same file and style 3 is deprecated). This means a module is declared in the root with mod garden and could be defined in ./garden.rs. If the garden module wants its own submodules, they are declared in garden.rs, like mod vegetable, and the submodules are defined in the directory like so: ./garden/vegetable.rs. This style is preferred since it is more searchable than a bunch of mod.rs files, even though it is kinda weird since you might assume a module is contained in a directory like other language hierarchies.

A new module is a new level in the namespace hierarchy. First though, a bit of inversion: a child module has access to a parent module’s private components, but parents don’t have that with children. If the child module itself is private, it can still be accessed by the parent since it is like a private component of the parent. Modules have visibility rules similar to structs and functions. Where the module is declared is where it is set whether it is usable by something other than the parent module. There are some quirks to module hierarchy. The first is the crate module, an internal pseudo module used to access the root of the crate’s namespace. Any module in the crate can access modules that are direct descendants of the root, even if they are private, since every module is itself a descendant of the root. Kinda confusing…wonder if left over from before pub(crate) was a thing?
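The visibility rules above are easier to see with inline modules. This is a minimal sketch, not anything from a real project: the garden and vegetable modules and the function names (water, grow, secret, tend, tend_garden) are made up for illustration.

```rust
mod garden {
    // Private to `garden`, but visible to everything nested inside it.
    fn water() -> &'static str {
        "watered"
    }

    // A private child module: the parent can still name it.
    mod vegetable {
        // The child reaches up into the parent's private items.
        pub fn grow() -> String {
            format!("{} and growing", super::water())
        }

        // Private to `vegetable`: the parent cannot call this.
        #[allow(dead_code)]
        fn secret() {}
    }

    // Public entry point that exposes the child's work through the parent.
    pub fn tend() -> String {
        // vegetable::secret(); // would not compile: `secret` is private
        vegetable::grow()
    }
}

// Code at the root can reach `garden` even though the module is private,
// because the root is the module that declared it.
pub fn tend_garden() -> String {
    garden::tend()
}
```

The commented-out call is the interesting part: the parent can see the private child module, but not the private items inside it.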

testing

The Rust test tooling makes a distinction between “unit” tests and “integration” tests (these go by different names elsewhere). Unit tests are in the module they are testing as a submodule. They have deep access including private methods of the module under test. They are supposed to be fast and small. Integration tests live in a tests/ directory which is a sibling directory to src/.

Unit tests by convention are an inline module called tests.
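A rough sketch of that convention, assuming a tiny library with one public function and one private helper (add_doubled and double are hypothetical names). The cfg(test) attribute keeps the module out of normal builds, and cargo test picks up the functions marked with the test attribute.

```rust
/// Public API of the module under test.
pub fn add_doubled(a: i32, b: i32) -> i32 {
    a + double(b)
}

// Private helper: integration tests in tests/ cannot see this,
// but the inline unit-test module below can.
fn double(x: i32) -> i32 {
    x * 2
}

#[cfg(test)]
mod tests {
    // Pull in everything from the parent module, private items included.
    use super::*;

    #[test]
    fn doubles_private_helper() {
        assert_eq!(double(4), 8);
    }

    #[test]
    fn adds_through_public_api() {
        assert_eq!(add_doubled(1, 2), 5);
    }
}
```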

The built-in Rust linting tool is clippy.

crate busting

The cargo tool has a ton of built-in features to optimize the hell out of a workspace and its packages.

Adding [workspace] to a Cargo.toml declares that level as a workspace, a holder of packages, instead of just a package itself (but it can still have a root package). A workspace has a top-level Cargo.lock, so dependency versions are shared by all packages in the workspace. Dependencies can only be used in packages that explicitly call for them though (thank god). Packages (and their crates) can be broken up to keep scope and dependencies as small as possible for consumers. No need to depend on one monolith crate that pulls in the world. But the maintainer can still get the benefits of a monorepo (simpler dependency tree, assurance that the packages will at least work with each other).

Internal packages need an explicit dependency on any other internal packages they depend on. This can be with a path dependency or version. Crates that use dependencies specified with only a path cannot be published! But path and version can be combo’d where the path is only used locally.

[patch]’s can be supplied at the root of a workspace to override things for all internal packages. This is another way to supply a local path along with a version for intra-dependencies.

TOML is used in Cargo files. The sections (e.g. [patch]) are TOML tables. Dots . in the title keys give namespaces. Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created.

[patch.crates-io.bitcoin-units]
path = "units"

[patch.crates-io]
# Inline table with curly braces.
bitcoin-units = { path = "units" }

Equivalent in TOML.

type conversion and coercion

The standard library has a std::convert module (looks like it’s also in core) which holds a handful of traits a type can implement. These are “user-defined” type conversions.

AsRef – Often used to pass a reference to an internal value in a struct. Cheaper than copying or moving it, but requires the explicit connection.
Into/From – Convert type, but might be expensive. The input value is consumed. Rust provides an Into from a From implementation, but not vice versa.
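A small sketch of both traits. The Feet, Meters, and Label types and the shout function are invented for the example; only the From, Into, and AsRef traits come from the standard library.

```rust
pub struct Feet(pub f64);
pub struct Meters(pub f64);

// Implementing From gives us Into<Meters> for Feet for free.
impl From<Feet> for Meters {
    fn from(feet: Feet) -> Self {
        Meters(feet.0 * 0.3048)
    }
}

pub struct Label {
    pub text: String,
}

// AsRef hands out a cheap reference to the inner value.
impl AsRef<str> for Label {
    fn as_ref(&self) -> &str {
        &self.text
    }
}

// Accepts anything that can be viewed as a str: &str, String, Label, ...
pub fn shout<S: AsRef<str>>(input: S) -> String {
    input.as_ref().to_uppercase()
}
```

Calling Feet(10.0).into() with a Meters type annotation consumes the Feet value, while shout only ever looks at a borrowed view of its argument.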

There are also some automatic type conversions, coercions, involved as well. The explicit cast keyword as is limited to certain pairs of types, mostly primitives. Some coercion is done by the compiler and doesn’t require an as, mostly reference and pointer stuff.
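To make the difference concrete, here is a sketch with two implicit coercions and a few explicit casts. The function names are hypothetical. Note that as never fails, it truncates or wraps instead.

```rust
fn byte_len(s: &str) -> usize {
    s.len()
}

fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

pub fn coercions() -> (usize, i32) {
    let owned = String::from("hello");
    let array = [1, 2, 3];
    // No `as` needed: &String coerces to &str, and &[i32; 3] to &[i32].
    (byte_len(&owned), total(&array))
}

pub fn casts() -> (u8, u32, i32) {
    let big: i32 = 300;
    let negative: i32 = -1;
    let fraction: f64 = 3.9;
    // Integer casts wrap around; float to integer casts drop the fraction.
    (big as u8, negative as u32, fraction as i32)
}
```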

error handling

Composability is the name of the game when it comes to ergonomic error handling in Rust. The Result and Option enums are at the heart of tying all the conventions together.

The unwrap() and expect() methods on Result and Option are useful when hacking, but are not in the composable family since they panic.
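The composable alternative is the ? operator, which returns early with the Err or None and otherwise hands back the inner value. A minimal sketch with made-up function names:

```rust
use std::num::ParseIntError;

// `?` on a Result: bail out with the error, or keep going with the value.
pub fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

// `?` works the same way on Option inside a function returning Option.
pub fn first_char_upper(input: &str) -> Option<char> {
    let c = input.chars().next()?;
    Some(c.to_ascii_uppercase())
}
```

The caller decides what to do with a failure instead of the program panicking in the middle of a helper.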

newtypes and type aliases

The newtype pattern.

Type aliases are effectively just documentation, the compiler isn’t checking beyond the underlying type.
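A sketch of the difference, with Kilometers and Miles as hypothetical names. The alias is interchangeable with f64 everywhere, while the newtype is a distinct type the compiler will enforce.

```rust
// Alias: just another name for f64.
pub type Kilometers = f64;

// Newtype: a tuple struct wrapping f64, a distinct type to the compiler.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Miles(pub f64);

pub fn to_miles(distance: Kilometers) -> Miles {
    Miles(distance * 0.621371)
}

pub fn alias_is_transparent() -> Miles {
    let plain: f64 = 100.0;
    // A bare f64 is accepted where Kilometers is expected.
    to_miles(plain)
    // to_miles(Miles(1.0)); // would not compile: expected f64, found Miles
}
```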

Fundamentals

Things I always forget.

ownership

Rust has a unique memory management model. The main purpose of ownership is to manage heap (not stack) data. Every value in Rust has an owner and only one owner. When the owner goes out of scope the value can be dropped. It is a little more explicit than languages that make use of a garbage collection runtime and less explicit than direct memory management. But it is at a spot where the compiler can statically analyze code and help us out a bunch.

Data types of known size can be kept on the stack and their scope is easy for the compiler to determine (function calls). They can also be easily copied to other scopes.

Fun fact, variables are immutable by default, let x = 5 versus let mut x = 5.

drop and move are the main memory management tools Rust uses under the hood. drop is called automatically on a value when it goes out of scope to free the memory. A move is a transfer of ownership.

Passing a variable to a function will move it (transfer ownership). A return value is the final expression in a block or can be returned early with return, this also moves ownership. In Rust lingo, ownership transfer is sometimes referred to as consuming or into.

Types whose data lives entirely on the stack can implement the Copy trait to avoid moves. Instead, the data is copied on the stack which is relatively low cost.

An option to avoid moving ownership is to clone the value, which clones the data on the heap. Obviously has runtime costs.

References are like pointers, but with more memory guarantees and provide some flexibility. Functions can refer to a value without taking ownership, but caller and callee have to agree on that. Creating a reference is called borrowing. You can’t modify something through a plain (shared) borrow! The Rust compiler is able to detect “dangling” references where a reference uses memory that is going to be dropped. A reference is created with &.

A reference can be made mutable with &mut. Sometimes you want to give something the ability to write without giving up ownership.
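Putting move, clone, and the two kinds of borrow side by side. All function names here are invented for the sketch.

```rust
// Takes ownership: the caller's variable is moved in.
fn consume(s: String) -> usize {
    s.len()
}

// Shared borrow: read only, the caller keeps ownership.
fn peek(s: &str) -> usize {
    s.len()
}

// Mutable borrow: can write, the caller still keeps ownership.
fn exclaim(s: &mut String) {
    s.push('!');
}

pub fn ownership_demo() -> (usize, usize, String) {
    let a = String::from("hi");
    let b = a.clone(); // deep copy of the heap data
    let moved_len = consume(a); // `a` is gone after this line
    let borrowed_len = peek(&b); // `b` is still usable afterwards
    let mut c = b; // move into a mutable binding
    exclaim(&mut c);
    (moved_len, borrowed_len, c)
}
```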

mutability and references and bindings

The mut keyword is used in two distinct spots, which I didn’t realize at first: one is mutable variables and the other is mutable references.

y: &i32          // Can't change anything.
mut y: &i32      // Can point y to new memory location, but can't update the contents of the memory.
y: &mut i32      // Can update the memory contents, but not where y is pointed at.
mut y: &mut i32  // Can update both y and memory contents.

Mutability with bindings and references.
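The same forms exercised in a function body, as a sketch with throwaway variable names. Each commented group matches one row of the table above.

```rust
pub fn mut_forms() -> (i32, i32, i32, i32, i32) {
    let a = 1;
    let b = 2;

    // mut y: &i32 : the binding can be re-pointed, the target is read only.
    let mut y: &i32 = &a;
    let first = *y;
    y = &b;
    let second = *y;

    // z: &mut i32 : the target can be written, the binding is fixed.
    let mut c = 10;
    let z: &mut i32 = &mut c;
    *z += 1;

    // mut w: &mut i32 : both the binding and the target can change.
    let mut d = 20;
    let mut e = 30;
    let mut w: &mut i32 = &mut d;
    *w += 1;
    w = &mut e;
    *w += 1;

    (first, second, c, d, e)
}
```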

The mut is on the binding not the type, but I find that confusing since you can still specify mutable references in generics. I guess lifetimes are specified in generics too, so it is more than just type information. It is any information to help the compiler. It also doesn’t make much sense to specify a “mutable value” in a generic, since it is not a binding to be updated. The situation I have in mind is collections like Vec<T>, where the collection owns the values (like a struct).

There are some more complexities in matching (“match ergonomics”) where a binding is occurring.

box

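A Box is an owning pointer to a heap allocation. The Box itself is a statically sized value that can live on the stack: a thin pointer for sized types, or a fat pointer carrying extra metadata (a length or a vtable) when it points to an unsized type like a slice or trait object.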

A box owns its data whereas a reference just borrows.
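One place a Box is required is a recursive type, since the compiler needs a fixed size for each variant. The List type below is the usual textbook sketch, not something from this codebase, and pointer_widths is a hypothetical helper to peek at how wide a Box is for sized versus unsized contents.

```rust
use std::mem::size_of;

// Without the Box, List would contain itself and have no finite size.
pub enum List {
    Cons(i32, Box<List>),
    Nil,
}

pub fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

pub fn sample() -> List {
    // Each Box owns the rest of the list on the heap.
    List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))))
}

// Width of a Box to a sized value, a slice, and a trait object.
pub fn pointer_widths() -> (usize, usize, usize) {
    (
        size_of::<Box<i32>>(),
        size_of::<Box<[i32]>>(),
        size_of::<Box<dyn std::fmt::Debug>>(),
    )
}
```

When sample() goes out of scope, dropping the outer Cons drops each nested Box in turn, which is the ownership part in action.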

lifetimes

Lifetimes are more information used by the compiler to determine if borrows are valid. A variable’s lifetime begins when it is created and ends when it is destroyed, but sometimes the compiler needs some help determining when that is exactly.

fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) {
    println!("x is {} and y is {}", x, y);
}

Showing off the lifetime ' syntax in the generics.

The lifetime annotation can connect an input value to an output value of a function, so the compiler knows the input needs to be around as long as the output. Because these can get verbose fast, Rust has elision rules where it auto-adds them for common scenarios.
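A sketch of both cases, using the well-worn longest example plus a hypothetical first_word helper where elision does the work.

```rust
// Explicit: the result borrows from x or y, so it must not outlive either.
pub fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

// Elided: with a single reference input, the compiler assumes the
// output borrows from it, so no annotation is needed.
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```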

traits

Traits are Rust’s interface abstraction.

How can code consume a trait instead of the implementation (“accept interfaces, produce structs” a.k.a. polymorphism)? There are two ways with semantic differences, but each has a lot of different syntax.

First up, leverage generics at compile time. This is known as a trait bound, like you are bounding the possibilities of a type.

pub fn notify<T: Summary>(item: &T) {
    println!("Breaking news! {}", item.summarize());
}

// Specify multiple traits.
pub fn notify<T: Summary + Display>(item: &T) {
    ...
}

The generic type T must implement the Summary trait.

You can shift these definitions into a where clause if they get unwieldy.

fn some_function<T, U>(t: &T, u: &U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    ...
}

where clause for long definitions.

There is also the impl syntax sugar for simple trait bounds. I dunno why this is necessary to be honest.

pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize());
}

Syntax sugar for a trait bound.

Ok, finally, the second type known as trait objects. A trait object is more runtime-y than the generic bounds. Another description is bounds are static while objects are dynamic (implementation is called “dynamic dispatch”).

// Box for ownership.
fn print(a: Box<dyn Printable>) {
    println!("{}", a.stringify());
} 

// Reference to borrow.
fn print(a: &dyn Printable) {
    println!("{}", a.stringify());
}

A trait object declared with the dyn keyword.

A trait object can be used with any smart pointer (e.g. Box, Arc, etc.) or reference, but it must be a pointer since the concrete size is not known at compile time.

A generic bounded by a trait is filled in with one concrete type at a time, so a collection like Vec<T> stays homogeneous, but trait objects do not have this limitation. However, not all traits can be made into trait objects.

Under the hood, bounds are implemented by the compiler generating a bunch of versions of a function. Objects require a lookup table (vtable) that is used at runtime to find what to execute.
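A sketch contrasting the two. The Printable trait and its stringify method follow the snippet above; the implementations for i32 and String and the function names are assumptions made for the example.

```rust
pub trait Printable {
    fn stringify(&self) -> String;
}

impl Printable for i32 {
    fn stringify(&self) -> String {
        self.to_string()
    }
}

impl Printable for String {
    fn stringify(&self) -> String {
        self.clone()
    }
}

// Trait bound: the compiler generates one copy per concrete T.
pub fn print_static<T: Printable>(item: &T) -> String {
    item.stringify()
}

// Trait objects: one function, the method is looked up at runtime,
// and the slice may mix different concrete types.
pub fn print_all(items: &[Box<dyn Printable>]) -> Vec<String> {
    items.iter().map(|item| item.stringify()).collect()
}
```

A Vec<Box<dyn Printable>> can hold an i32 next to a String, which a Vec<T> with a bound on T cannot do.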

extension

Incorporate additional methods on a type outside the crate it is defined in.
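The usual way to do this is an extension trait: define a trait locally and implement it for the foreign type. A minimal sketch, where the Shout trait and its shout method are hypothetical and str stands in for the type from another crate:

```rust
// A local trait that carries the extra method.
pub trait Shout {
    fn shout(&self) -> String;
}

// Implement the local trait for a type defined elsewhere (here, std's str).
impl Shout for str {
    fn shout(&self) -> String {
        format!("{}!", self.to_uppercase())
    }
}
```

The method only shows up where the trait is in scope, so callers need a use for Shout before "hello".shout() will resolve.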

cells and reference counters

There are some tools to push the borrow checker rules from compile time to runtime. There is an obvious cost in performance, but sometimes it is necessary for the task. The easiest to think of is a graph of connected nodes built off of user input. The nodes need references to each other, but their lifetimes are not known until runtime.

Rc, and its concurrent counterpart Arc, allow for “shared ownership”. Instead of following the borrow checker’s “one owner” rule, Rc reference counts are checked at runtime. Every clone ups the tracked reference count instead of deep-copying like usual. The instance isn’t dropped until the reference count goes to 0, which is how it can have multiple owners.

Cell and RefCell (Mutex for concurrency) mark an instance as mutable without requiring a mutable reference. This is often described as interior mutability. With RefCell, the borrow checker’s rules for writability are moved to runtime.
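The two are often combined as Rc<RefCell<T>>. A sketch with a shared integer (the function name is made up), including what a runtime borrow conflict looks like:

```rust
use std::cell::RefCell;
use std::rc::Rc;

pub fn shared_counter() -> (usize, i32, bool) {
    let shared = Rc::new(RefCell::new(0));
    // Cloning the Rc bumps the reference count; the integer is not copied.
    let other = Rc::clone(&shared);

    // Both handles can mutate through a shared (non-mut) binding.
    *other.borrow_mut() += 5;
    *shared.borrow_mut() += 1;

    // While a read borrow is alive, a write borrow is refused at runtime.
    let reader = shared.borrow();
    let conflict = other.try_borrow_mut().is_err();
    drop(reader);

    let owners = Rc::strong_count(&shared);
    let value = *shared.borrow();
    (owners, value, conflict)
}
```

Using borrow_mut instead of try_borrow_mut in the conflicting spot would panic rather than return an error.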

concurrency

The borrow checker can help out a bunch to make programs safe for concurrency. Similar to the golang channel strategy of “only one owner” of a message at a time, the borrow checker is clearly a more general ownership checker. For the common data races (corrupted data) and deadlock (stuck program) issues, the borrow checker guards against data races very well. Deadlocks are still very possible though. Best to follow the golang mantra, “Do not communicate by sharing memory; instead, share memory by communicating”. In Rust, that means leveraging ownership instead of locks.
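A sketch of the share-by-communicating style using the standard library's mpsc channel. The sum_via_channel function is hypothetical; each worker thread takes ownership of its chunk and sends back a single result, so no locks are involved.

```rust
use std::sync::mpsc;
use std::thread;

pub fn sum_via_channel(chunks: Vec<Vec<i32>>) -> i32 {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for chunk in chunks {
        let tx = tx.clone();
        // `move` transfers ownership of `chunk` and `tx` into the thread.
        handles.push(thread::spawn(move || {
            let partial: i32 = chunk.iter().sum();
            tx.send(partial).unwrap();
        }));
    }

    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);
    let total: i32 = rx.iter().sum();

    for handle in handles {
        handle.join().unwrap();
    }
    total
}
```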

pin

Pinning keeps the value behind a pointer “pinned” in place in memory, so it cannot be moved. It is used a lot in async-land because futures can be self-referential.

types

tuple

let (int_param, bool_param) = pair;

let can be used to bind the members of a tuple to variables, a destructuring.

numbers

The usize primitive size is architecture dependent, either 64 or 32 bits. Seems a little like go’s int type, used a lot for indexes of collections. Probably has to do with how an index is related to a pointer and that is related to the architecture…but I find it confusing!

strings and characters

The Rust char type is four bytes.

The str type is a string slice. It is almost always seen behind a reference, &str, which borrows the data rather than owning it. String literals (e.g. let hello_world = "Hello, World!";) are string slices of type &'static str. Since they are references the compiler will help us out.

String type is a dynamic heap type.
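A few of these facts as a sketch. The function names are invented, and the accented letter is written as a Unicode escape so the byte counts do not depend on how this page is encoded.

```rust
pub fn string_facts() -> (usize, usize, usize) {
    // U+00E9, a single accented character.
    let e_acute = "\u{e9}";
    (
        std::mem::size_of::<char>(), // a char is always four bytes
        e_acute.len(),               // but UTF-8 stores this one in two
        e_acute.chars().count(),     // and it is still one character
    )
}

// String owns a growable heap buffer; &str only borrows a view.
pub fn build_greeting(name: &str) -> String {
    let mut greeting = String::from("Hello, ");
    greeting.push_str(name);
    greeting
}
```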

arrays and slices

Reference a contiguous sequence of elements in a collection instead of the whole thing. Slices don’t own their contents, just special references.

let slice = &a[1..3];

The range syntax makes it easy to get new views on a slice or an array.

A slice reference is a “fat-pointer”: a pointer plus the length of its contents, because that length is not known at compile time.

[T; n] – An array of type T elements with length n, the length is known at compile time and is part of the type.
&[T; n] – A reference to an array of length n, a “thin” pointer.
[T] – A slice of type T elements, the length is not known at compile time. This form is not commonly coded with directly.
&[T] – A slice of sized type T. A “fat-pointer”.
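The thin versus fat distinction can be checked with size_of. A sketch with hypothetical helper names:

```rust
use std::mem::size_of;

// Takes and returns slices, so it works for arrays and vectors of any length.
pub fn middle(xs: &[i32]) -> &[i32] {
    if xs.len() < 2 {
        xs
    } else {
        &xs[1..xs.len() - 1]
    }
}

// Size of a reference to an array versus a reference to a slice.
pub fn reference_widths() -> (usize, usize) {
    (size_of::<&[i32; 5]>(), size_of::<&[i32]>())
}
```

Passing &a where a is a [i32; 5] works because the array reference coerces to a slice reference, picking up the length along the way.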

vector

Dynamic sized collections stored on the heap.

iterators

Some iterators consume the collection (into_iter) while others just reference it (iter).
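A sketch of the difference on a Vec of Strings (the function name is made up). iter yields references and leaves the vector usable, into_iter yields the owned values and uses the vector up.

```rust
pub fn iter_demo() -> (Vec<usize>, Vec<String>) {
    let words = vec![String::from("a"), String::from("bb")];

    // iter(): items are &String, `words` is still usable afterwards.
    let lengths: Vec<usize> = words.iter().map(|w| w.len()).collect();

    // into_iter(): items are String, so each one can be mutated and reused.
    let excited: Vec<String> = words
        .into_iter()
        .map(|mut w| {
            w.push('!');
            w
        })
        .collect();
    // `words` has been moved and cannot be used from here on.

    (lengths, excited)
}
```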

enumerations

The built-in Option has some patterns around it.

unwrap is a shortcut method where if the value is the Some variant (or Ok for a Result), it returns the inner value, else it panics. Can use expect to supply your own message, but same concept. Should probably use expect over unwrap though. Messages can be lowercase, no punctuation.
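Some of the non-panicking patterns around Option, as a sketch with hypothetical function names: match for full control, unwrap_or for a default, and map to transform the inner value while staying in Option.

```rust
pub fn first_even(xs: &[i32]) -> Option<i32> {
    xs.iter().copied().find(|x| x % 2 == 0)
}

// match forces both variants to be handled.
pub fn describe(xs: &[i32]) -> String {
    match first_even(xs) {
        Some(n) => format!("found {}", n),
        None => String::from("nothing even"),
    }
}

// unwrap_or swaps the panic for a fallback value.
pub fn first_even_or_zero(xs: &[i32]) -> i32 {
    first_even(xs).unwrap_or(0)
}

// map transforms a Some and passes a None straight through.
pub fn first_even_squared(xs: &[i32]) -> Option<i32> {
    first_even(xs).map(|n| n * n)
}
```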

visibility

A struct being made public with pub does not mean any of its fields are made public too. One big exception to that is enum variants in a pub enum are also public by default.
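A sketch of both rules, with a made-up shapes module. The struct needs pub on each field it wants to expose, while the enum's variants come along with the enum.

```rust
pub mod shapes {
    pub struct Circle {
        pub radius: f64, // readable and writable from outside
        id: u32,         // private even though Circle is pub
    }

    impl Circle {
        // A constructor is needed because outsiders cannot set `id`.
        pub fn new(radius: f64) -> Self {
            Circle { radius, id: 1 }
        }

        pub fn id(&self) -> u32 {
            self.id
        }
    }

    // Variants of a pub enum are public without any extra keyword.
    #[derive(Debug, PartialEq)]
    pub enum Kind {
        Round,
        Square,
    }
}
```

Outside the module, shapes::Circle { radius: 1.0, id: 2 } would not compile, but shapes::Kind::Round is fine.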

control flow

let odd_even = if x % 2 == 0 {
    println!("EVEN");
    "even"
} else {
    println!("ODD");
    "odd"
};

conditionals are expressions

2024.04.26