// Ultra-Violence
They would take their software out and race it in the black desert of the electronic night.
Productivity
Patterns to write better code.
system with rustup
The rustup
executable is the standard way to manage the Rust version and toolchain on a local machine. I have had the best luck just using it directly versus using distribution package managers to install and manage the tool chain and/or rustup
itself.
- rustup update to pull in the new changes.
- rustup show prints out the local toolchains installed.
- rustup default $TOOLCHAIN to update the system default.
  - Toolchain override shorthand looks like cargo +beta (nifty plus instead of the conventional -).
  - Overrides can also be set per-directory, the mappings are stored in rustup’s storage.
- rustup has a funny profiles concept: minimal, default, and complete. Usually want default plus rust-analyzer since complete is everything under the sun and breaks all the time (apparently).
lsp
The rust-analyzer
component of the toolchain is the flagship Rust LSP. I haven’t found a need to modify it much. I believe the rust-toolchain.toml file is used to specify the version, including things like the formatter version.
formatting
rustfmt.toml
should set settings for a project, a version can be defined here for just formatting too.
bootstrap a crate
cargo new guessing_game
The cargo
tool sets you up pretty good.
- The dependencies and project settings are established in
Cargo.toml
- The source code always goes in
src/
cargo run
Build and execute the program.
- The --manifest-path flag can be used to point it at a different directory if called from a script.
- The --quiet flag removes all the build jargon.
[dependencies]
num-bigint = "0.4.4"
Cargo.toml dependencies section.
Add a dependency by manually updating Cargo.toml
and then running cargo build
to update lock file.
project structure
Rust has its unique words and conventions to split up a project.
- A path names an item such as a struct, function, or module.
- A module controls scope and privacy of paths, with things private by default. Similar to a file system, but lives alongside it.
- A crate is a tree of modules which produces a library or executable. The smallest amount of code that the compiler will consider at a time. In a “hello world” single file example, the file is the crate.
  - Binary crates must have a main function. Comparable to golang cmd/ programs.
  - The crate root is the source file which the compiler starts from and is actually a module called crate.
- A package can contain multiple binary crates and optionally one library crate. Is a cargo level feature, so focuses on building, testing, and publishing code. Cargo.toml is at the package root. cargo follows conventions that src/main.rs is a binary crate with the same name as the package. And src/lib.rs is a library.
  - Other binary crates should be placed in src/bin/.
- A workspace shares dependencies across a group of packages.
Seems sensible to start any crate with a src/main.rs and src/lib.rs to take advantage of the standard package/crate conventions. Splitting off the lib right away creates a nice interface for tests.
Module-fying code is where things get a bit more complex. This is where some encapsulation choices are being made. Modules need to first be declared before they can then be depended on by other modules in a crate. A module is declared with mod $MODULE and that declaration needs to be picked up by a file that the compiler will look at, so the first natural spot to put these would be in one of the roots (main or lib). The compiler will look in three spots for the module definition once it runs into a mod:
1. Inline, the current file.
2. ./$MODULE.rs (new)
3. ./$MODULE/mod.rs (original)
The module system is like a file system, but it is living alongside the file system. This means a module is declared with mod garden then defined either in the same file, in ./garden.rs, or ./garden/mod.rs. If the garden module wants its own submodules, they are declared in the garden implementation, like mod vegetable, and the submodules are defined in one of the three possible spots. Style #2 was introduced since it is debatably more searchable than a bunch of mod.rs files, even though it is kinda weird since you might assume a module is contained in a directory like other language hierarchies. It is mostly just a bummer that there are two styles now.
A new module is a new level in the namespace hierarchy. First though a bit of inversion, a child module has access to a parent module’s private components, but parents don’t have that with children. But if the child module is private, it itself can still be accessed by the parent since it is like a private component of the parent. Modules have visibility rules similar to structs and functions. Where the module is declared is where it can set if it is usable by something other than the parent module. There are some quirks to module hierarchy. The first is the crate module which is an internal pseudo module to access the root of the crate’s namespace. Any module in the crate can access modules that are direct descendants of the root, doesn’t matter if they are private. Kinda confusing…wonder if left over from before pub(crate) was a thing?
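A minimal sketch of those visibility rules, using inline modules so it all fits in one file. The garden and vegetables modules and their functions are made-up names for illustration, nothing more.

```rust
// Inline modules, so the whole hierarchy lives in a single file.
mod garden {
    // Private to `garden`, but visible to its descendants.
    fn water_level() -> u32 {
        3
    }

    // Visible anywhere inside this crate, but not to other crates.
    pub(crate) fn describe() -> String {
        format!("garden with {}", vegetables::count())
    }

    // A private child module: the parent can still use it.
    mod vegetables {
        // `pub` here means "visible to whoever can reach `vegetables`".
        pub fn count() -> u32 {
            // A child reaches its parent's private items through `super`.
            super::water_level() * 2
        }
    }
}

fn main() {
    // Paths can start from the crate root with `crate::`.
    println!("{}", crate::garden::describe());
    // garden::vegetables::count(); // error: module `vegetables` is private
}
```

The parent calls into its private child, the child calls back into the parent's private function, and code outside garden only gets what was explicitly opened up.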
testing
The Rust test tooling makes a distinction between “unit” tests and “integration” tests (these go by different names elsewhere). Unit tests are in the module they are testing as a submodule. They have deep access including private methods of the module under test. They are supposed to be fast and small. Integration tests live in a tests/ directory which is a sibling directory to src/.
Unit tests by convention are an inline module called tests
.
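A small sketch of the unit test convention, assuming two hypothetical functions double and quadruple. The tests module is only compiled under cargo test and can reach the private helper because it is a child module.

```rust
// Private helper: unit tests can still reach it from the child module.
fn double(x: i32) -> i32 {
    x * 2
}

pub fn quadruple(x: i32) -> i32 {
    double(double(x))
}

// Only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    // Pull the parent module's items into scope, private ones included.
    use super::*;

    #[test]
    fn doubles_a_number() {
        assert_eq!(double(2), 4);
    }

    #[test]
    fn quadruples_a_number() {
        assert_eq!(quadruple(2), 8);
    }
}

fn main() {
    println!("{}", quadruple(3));
}
```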
The built-in Rust linting tool is clippy
.
crate busting
The cargo
tool has a ton of built-in features to optimize the hell out of a workspace and its packages.
Adding [workspace] to a Cargo.toml declares that level as a workspace, a holder of packages, instead of just a package itself (but it can still have a root package). A workspace has a top-level Cargo.lock, so dependency versions are shared by all packages in the workspace. Dependencies can only be used in packages that explicitly call for them though (thank god). Packages (and their crates) can be broken up to keep scope and dependencies as small as possible for consumers. No need to depend on one monolith crate that pulls in the world. But the maintainer can still get the benefits of a monorepo (simpler dependency tree, assurance that the packages will at least work with each other).
Internal packages need an explicit dependency on any other internal packages they depend on. This can be with a path
dependency or version
. Crates that use dependencies specified with only a path cannot be published! But path and version can be combo’d where the path is only used locally.
[patch]
’s can be supplied at the root of a workspace to override things for all internal packages. This is another way to supply a local path
along with a version for intra-dependencies.
TOML is used in Cargo files. The sections (e.g. [patch]
) are TOML tables. Dots .
in the title keys give namespaces. Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created.
[patch.crates-io.bitcoin-units]
path = "units"

[patch.crates-io]
# Inline table with curly braces.
bitcoin-units = { path = "units" }
Equivalent in TOML.
publishing to registries
- Registries hold onto actual code artifacts, not just a pointer to a code repository.
  - A .crate file is a big compressed snapshot of the package source.
  - Once published, a crate is independent from the original repository.
- crates.io is the default registry in the ecosystem.
- Connected my Github account to crates.io for my identity.
cargo login $(pass registry/cargo-io-yonson)
- Namespace conflicts on crates.io are primarily avoided through a first-come, first-served naming system.
- Once a name is taken, it cannot be used by another crate unless the original is transferred or removed.
- Crate Name Conventions
- Case insensitive.
- Must begin with alpha character.
- Common to have top-level directory name match crate name, but not a rule.
- Crate names use kebab-case and directories follow this, while rust internals generally use snake_case. These are just conventions though, no enforced rules.
- Ownership
- Can transfer published crate ownership among crates.io users.
- You cannot update metadata of published crates, but you can for new versions.
- Authentication with a registry is delegated to system providers.
error handling
Composability is the name of the game when it comes to ergonomic error handling in Rust. The Result
and Option
enums are at the heart of tying all the conventions together.
The unwrap()
and expect()
methods on Result
and Option
are useful when hacking, but are not in the composable family since they panic.
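For contrast, a minimal sketch of the composable style using the ? operator, which hands the error (or None) back to the caller instead of panicking. The functions parse_and_double and first_char_upper are hypothetical.

```rust
use std::num::ParseIntError;

// Returns the error to the caller instead of panicking.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    // `?` unwraps the Ok value or returns early with the Err.
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

// Option composes the same way: `?` returns early on None.
fn first_char_upper(input: &str) -> Option<char> {
    let c = input.chars().next()?;
    Some(c.to_ascii_uppercase())
}

fn main() {
    match parse_and_double("21") {
        Ok(n) => println!("doubled: {}", n),
        Err(e) => println!("bad input: {}", e),
    }
    println!("{:?}", first_char_upper("rust"));
}
```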
visibility
A struct being made public with pub
does not mean any of its fields are made public too. One big exception to that is enum variants in a pub enum are also public by default.
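A quick sketch of both rules, with a hypothetical shapes module standing in for "some other module".

```rust
mod shapes {
    pub struct Circle {
        pub radius: f64, // explicitly public
        id: u32,         // private: only code inside `shapes` can touch it
    }

    impl Circle {
        // A constructor is needed because outside code cannot set `id`.
        pub fn new(radius: f64) -> Self {
            Circle { radius, id: 1 }
        }

        pub fn id(&self) -> u32 {
            self.id
        }
    }

    // Every variant of a pub enum is public automatically.
    pub enum Kind {
        Round,
        Square,
    }
}

fn main() {
    let c = shapes::Circle::new(2.0);
    println!("radius {} id {}", c.radius, c.id());
    // println!("{}", c.id); // error: field `id` is private
    let _round = shapes::Kind::Round;
    let _square = shapes::Kind::Square;
}
```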
Ownership
Rust has a unique memory management model. The main purpose of ownership is to manage heap (not stack) data. Every value in Rust has an owner and only one owner. When the owner goes out of scope the value can be dropped. It is a little more explicit than languages that make use of a garbage collection runtime and less explicit than direct memory management. But it is at a spot where the compiler can statically analyze code and help us out a bunch.
Data types of known size can be kept on the stack and their scope is easy for the compiler to determine (function calls). They can also be easily copied to other scopes.
Fun fact, variables are immutable by default, let x = 5
versus let mut x = 5
.
drop and move are the main memory management tools used by the Rust compiler under the hood. drop is called automatically on a value when it goes out of scope to free the memory. A move is a transfer of ownership.
Passing a variable to a function will move it (transfer ownership). A return value is the final expression in a block or can be returned early with return
, this also moves ownership. In Rust lingo, ownership transfer is sometimes referred to as consuming or into.
Data stored on the stack can implement a Copy trait to avoid moves. Instead, the data is copied on the stack which is relatively low cost.
An option to avoid moving ownership is to clone the value, which clones the data on the heap. Obviously has runtime costs.
References are like pointers, but with more memory guarantees and provide some flexibility. Functions can refer to a value without taking ownership, but caller and callee have to agree on that. Creating a reference is called borrowing. You can’t modify something through a plain (shared) borrow! The Rust compiler is able to detect “dangling” references where a reference uses memory that is going to be dropped. A reference is created with &.
A reference can be made mutable with &mut. Sometimes you want to give something the ability to write without giving up ownership.
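A minimal sketch pulling move, clone, Copy, and both kinds of borrow together. The function names consume, peek, and append_bang are just illustrative.

```rust
// Takes ownership: the String is moved in and dropped when this returns.
fn consume(s: String) -> usize {
    s.len()
}

// Shared borrow: the caller keeps ownership, callee can only read.
fn peek(s: &str) -> usize {
    s.len()
}

// Mutable borrow: the callee can write, the caller still owns the value.
fn append_bang(s: &mut String) {
    s.push('!');
}

fn main() {
    let mut greeting = String::from("hello");

    let copy = greeting.clone(); // deep copy of the heap data
    assert_eq!(consume(copy), 5); // `copy` is moved into the function
    // println!("{}", copy); // error: borrow of moved value

    assert_eq!(peek(&greeting), 5); // shared borrow
    append_bang(&mut greeting); // mutable borrow
    println!("{}", greeting);

    let a = 5;
    let b = a; // i32 is Copy, so `a` is still usable
    println!("{} {}", a, b);
}
```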
mutability and references and bindings
The mut
keyword is used in two distinct spots (which I didn’t realize at first): mutable bindings and mutable references.
y: &i32 // Can't change anything.
mut y: &i32 // Can point y to new memory location, but can't update the contents of the memory.
y: &mut i32 // Can update the memory contents, but not where y is pointed at.
mut y: &mut i32 // Can update both y and memory contents.
Mutability with bindings and references.
The mut
is on the binding not the type, but I find that confusing since you can still specify mutable references in generics. I guess lifetimes are specified in generics too, so it is more than just type information. It is any information to help the compiler. It also doesn’t make much sense to specify a “mutable value” in a generic, since it is not a binding to be updated. The situation I have in mind is collections like Vec<T>
, where the collection owns the values (like a struct).
Helpful to remember that only places are mutable in Rust, not values.
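A runnable sketch of the binding versus reference distinction. The demo function and its variable names are hypothetical, and the commented-out lines are the ones the compiler rejects.

```rust
fn demo() -> (i32, i32, i32) {
    let mut a = 1;
    let b = 2;

    // Mutable binding, shared reference: can repoint, cannot write through.
    let mut y: &i32 = &a;
    let first = *y;
    y = &b;
    let second = *y;
    // *y = 10; // error: cannot assign through a `&` reference

    // Immutable binding, mutable reference: can write through, cannot repoint.
    let z: &mut i32 = &mut a;
    *z += first + second;
    // z = &mut c; // error: cannot assign twice to immutable variable

    // Mutable binding, mutable reference: can do both.
    let mut c = 5;
    let mut d = 6;
    let mut w: &mut i32 = &mut c;
    *w += 1;
    w = &mut d;
    *w += 1;

    (a, c, d)
}

fn main() {
    println!("{:?}", demo());
}
```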
There are some more complexities in matching (“match ergonomics”) where a binding is occurring.
box
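A Box is a smart pointer that owns a heap allocation. The Box itself is still a statically sized stack value: a thin pointer for sized types, and a fat pointer (pointer plus metadata on the thing it is pointing to) only when the contents are dynamically sized, like a slice or trait object.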
A box owns its data whereas a reference just borrows.
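A small sketch to poke at Box sizes. As far as I can tell a Box of a sized type is one pointer wide, and the extra metadata only shows up for dynamically sized contents. The Shape trait and Square type are made up for the example.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn main() {
    // The i32 lives on the heap; `boxed` owns it and frees it when dropped.
    let boxed: Box<i32> = Box::new(5);
    println!("{}", *boxed + 1);

    // One pointer wide for a sized type.
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());
    // Pointer plus length for a slice, pointer plus vtable for a trait object.
    assert_eq!(size_of::<Box<[i32]>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<dyn Shape>>(), 2 * size_of::<usize>());

    let shape: Box<dyn Shape> = Box::new(Square(2.0));
    println!("{}", shape.area());
}
```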
lifetimes
Lifetimes are more information used by the compiler to determine if borrows are valid. A variable’s lifetime begins when it is created and ends when it is destroyed, but sometimes the compiler needs some help determining when that is exactly.
fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) {
println!("x is {} and y is {}", x, y);
}
Showing off the lifetime ' syntax in the generics.
The lifetime annotation can connect an input value to an output value of a function, so the compiler knows the input needs to be around as long as the output. Because these can get verbose fast, Rust has elision rules where it auto-adds them for common scenarios.
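A sketch of both cases: an explicit annotation tying the output to the inputs, and the elided version where the compiler fills it in. The function names are hypothetical.

```rust
// The returned reference is only valid while both inputs are.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

// Elision: with a single reference input, the output borrows from it.
// Equivalent to fn first_word<'a>(s: &'a str) -> &'a str.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let a = String::from("apple");
    let b = String::from("fig");
    println!("{}", longest(&a, &b));
    println!("{}", first_word("hello world"));
}
```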
cells and reference counters
There are some tools to push the borrow checker rules from compile time to runtime. There is an obvious cost in performance, but sometimes it is necessary for the task. The easiest to think of is a graph of connected nodes built off of user input. The nodes need references to each other, but their lifetimes are not known until runtime.
Rc
, and its concurrent counterpart Arc
, allow for “shared ownership”. Instead of following the borrow checker’s “one owner” rule, Rc
reference counts are checked at runtime. Every clone
ups the tracked reference count instead of deep-copying like usual. The instance isn’t dropped until the reference count goes to 0, multiple owners.
Cell and RefCell (Mutex for concurrency) mark an instance as mutable without requiring a mutable reference. This is often described as interior mutability. For RefCell, the borrow checks for writability are moved to runtime.
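A minimal sketch of shared ownership plus interior mutability with Rc and RefCell. The function names are made up, and try_borrow_mut is used so the runtime check shows up as an Err rather than a panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Three owners of one Vec, each able to write through a shared handle.
fn shared_log() -> (usize, Vec<String>) {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let writer_a = Rc::clone(&log); // bumps the count, no deep copy
    let writer_b = Rc::clone(&log);

    writer_a.borrow_mut().push(String::from("a"));
    writer_b.borrow_mut().push(String::from("b"));

    let owners = Rc::strong_count(&log);
    let contents = log.borrow().clone();
    (owners, contents)
}

// The "one writer at a time" rule still holds, it is just checked at runtime.
fn double_borrow_is_rejected() -> bool {
    let cell = RefCell::new(0);
    let _first = cell.borrow_mut();
    let rejected = cell.try_borrow_mut().is_err();
    rejected
}

fn main() {
    println!("{:?}", shared_log());
    println!("{}", double_borrow_is_rejected());
}
```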
Types
Rust is statically typed and has all sorts of bells and whistles.
tuple
let (int_param, bool_param) = pair;
let can be used to bind the members of a tuple to variables, a destructuring.
numbers
The usize primitive size is architecture dependent, either 64 or 32 bits. Seems a little like go’s int type, used a lot for indexes of collections. Probably has to do with how an index is related to a pointer and that is related to the architecture…but I find it confusing!
strings and characters
The Rust char
type is four bytes.
The str
type is a string slice. A slice is a kind of reference, so it doesn’t have ownership. String literals (e.g. let hello_world = "Hello, World!";
) are string slices. They are usually type &str
. Since they are references the compiler will help us out.
String
type is a dynamic heap type.
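A short sketch of the three types side by side. The shout function is hypothetical. One wrinkle worth seeing: a standalone char is four bytes, but inside a str the characters are UTF-8 encoded at one to four bytes each.

```rust
use std::mem::size_of;

// Borrows a string slice, returns a newly allocated String.
fn shout(s: &str) -> String {
    let mut owned = String::from(s);
    owned.push('!');
    owned
}

fn main() {
    let literal: &'static str = "Hello, World!";
    let owned: String = shout(literal);
    let slice: &str = &owned[0..5]; // borrow part of the String
    println!("{} / {} / {}", literal, owned, slice);

    assert_eq!(size_of::<char>(), 4);
    // U+00E9 is one char, but two bytes once UTF-8 encoded in a str.
    assert_eq!("\u{e9}".len(), 2);
    assert_eq!("\u{e9}".chars().count(), 1);
}
```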
arrays and slices
Reference a contiguous sequence of elements in a collection instead of the whole thing. Slices don’t own their contents, just special references.
let slice = &a[1..3];
The range
syntax makes it easy to get new views on a slice or an array.
A slice is a “fat-pointer”, so a pointer and the size of its contents, and it doesn’t know that size at compile time.
- [T; n] – An array of type T elements with length n, the length is known at compile time and is part of the type.
- &[T; n] – A reference to an array of length n, a “thin” pointer.
- [T] – A slice of type T elements, the length is not known at compile time. This form is not commonly coded with directly.
- &[T] – A reference to a slice of sized type T elements. A “fat-pointer”.
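The thin versus fat distinction can be checked with size_of. A minimal sketch, where sum is a hypothetical helper that accepts any slice.

```rust
use std::mem::size_of;

// Works for arrays of any length and for vectors, since both coerce to a slice.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let slice: &[i32] = &a[1..3];
    assert_eq!(slice, [2, 3]);

    // Array reference: length is in the type, so just a pointer.
    assert_eq!(size_of::<&[i32; 5]>(), size_of::<usize>());
    // Slice reference: pointer plus length carried at runtime.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());

    println!("{} {}", sum(&a), sum(slice));
}
```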
vector
Dynamic sized collections stored on the heap.
iterators
Some iterators consume the collection (into_iter
) while other just reference it (iter
).
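A small sketch of the difference, with two hypothetical functions: one borrows the collection with iter, the other takes ownership with into_iter and can reuse each element's allocation.

```rust
// `iter` yields references, the slice is untouched.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// `into_iter` consumes the Vec and yields owned Strings.
fn shout_all(words: Vec<String>) -> Vec<String> {
    words
        .into_iter()
        .map(|mut w| {
            w.push('!');
            w
        })
        .collect()
}

fn main() {
    let words = vec![String::from("a"), String::from("bc")];
    println!("{}", total_len(&words)); // `words` still usable afterwards
    println!("{:?}", shout_all(words)); // `words` is moved here
}
```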
newtypes and type aliases
The newtype pattern.
Type aliases are effectively just documentation, the compiler isn’t checking beyond the underlying type. The two types are synonyms. Generally used to just alias complicated types.
type Point = (u8, u8);
The type
keyword declares a new type, in this case a type synonym.
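A sketch contrasting the two, using hypothetical Meters and Seconds types. The alias is interchangeable with its underlying type, while the newtype (a tuple struct wrapping one value) is a distinct type the compiler checks.

```rust
// Alias: just another name for f64.
type Meters = f64;

// Newtype: a distinct type wrapping an f64.
struct Seconds(f64);

fn speed(distance: Meters, time: Seconds) -> f64 {
    distance / time.0
}

fn main() {
    let d: f64 = 100.0;
    // A plain f64 is accepted where Meters is expected.
    println!("{}", speed(d, Seconds(9.58)));
    // speed(d, 9.58); // error: expected `Seconds`, found floating-point number
}
```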
associated types
The type
keyword shows up again! It helps define associated types. So it is only going to show up within traits and trait impl blocks.
“When a trait has a generic parameter, it can be implemented for a type multiple times, changing the concrete types of the generic type parameters each time”.
A trait with a generic can be implemented multiple times on a type. This might not be what you want all the time. A trait with an associated type doesn’t know the type ahead of time, but it can only be implemented once on a type. The trait implementation chooses its associated type.
pub trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
The Iterator
trait has the associated type Item
which is set by types which implement it.
Trait functions can still have type “placeholders” like if the trait was generic, but now the type is set when the trait is implemented vs. used. This is also how only one version of the trait can be applied to type.
The benefits to setting a type “upstream” is easier usage downstream. There are some scenarios where the caller is only abstracting over the outer, or parent, type and doesn’t really care about the inner types. If the outer type is generic though, all the inner type information needs to be passed along. There is the wildcard _
, but that tells the compiler to infer a type and that needs to be deterministic.
If you only want a trait to be set once on a type no matter the trait’s inner type, that is a sign to use an associated type over a generic trait. Kinda shifts freedoms and burdens from caller to implementor.
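A minimal sketch of the trade-off. ConvertTo, Container, Celsius, and Numbers are all hypothetical names. The generic trait is implemented twice on one type and the caller has to pick, while the associated type is chosen once by the implementor.

```rust
// Generic trait: can be implemented many times for one type.
trait ConvertTo<T> {
    fn convert(&self) -> T;
}

struct Celsius(f64);

impl ConvertTo<f64> for Celsius {
    fn convert(&self) -> f64 {
        self.0
    }
}

impl ConvertTo<String> for Celsius {
    fn convert(&self) -> String {
        format!("{}C", self.0)
    }
}

// Associated type: one implementation per type, which picks the type.
trait Container {
    type Item;
    fn first(&self) -> Option<Self::Item>;
}

struct Numbers(Vec<i32>);

impl Container for Numbers {
    type Item = i32;
    fn first(&self) -> Option<i32> {
        self.0.first().copied()
    }
}

fn main() {
    let temp = Celsius(21.5);
    // The caller has to say which implementation it wants.
    let as_number: f64 = temp.convert();
    let as_text: String = temp.convert();
    // No annotation needed, the implementor already chose i32.
    let first = Numbers(vec![7, 8]).first();
    println!("{} {} {:?}", as_number, as_text, first);
}
```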
type conversion and coercion
The standard library has a std::convert
module (looks like it’s also in core
) which holds a handful of traits a type can implement. These are “user-defined” type conversions.
- AsRef – Often used to pass a reference to an internal value in a struct. Cheaper than copying or moving it, but requires the explicit connection.
- Into/From – Convert type, but might be expensive. The input value is consumed. Rust provides an Into from a From implementation, but not vice versa.
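A sketch of these traits on a hypothetical Username newtype. Only From is implemented by hand; the into() call works because the standard library supplies Into for any From.

```rust
struct Username(String);

// Consumes the String to build a Username.
impl From<String> for Username {
    fn from(s: String) -> Self {
        Username(s)
    }
}

// Cheap view of the inner value without copying or moving it.
impl AsRef<str> for Username {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

// Accepts anything that can be viewed as a str.
fn byte_len<S: AsRef<str>>(s: S) -> usize {
    s.as_ref().len()
}

fn main() {
    // `into` comes for free from the From implementation.
    let name: Username = String::from("ferris").into();
    let other = Username::from(String::from("corro"));
    println!("{} {}", byte_len(&name), byte_len(other));
    println!("{}", byte_len("plain str"));
}
```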
The explicit cast keyword as is pretty limited, working only between certain pairs of types (mostly primitives). Some coercion is done by the compiler and doesn’t require an as, mostly reference and pointer stuff.
There are also some automatic type conversions, called coercions. Dereferencing is one of the big ones, referred to as deref coercion.
f.foo();
(&f).foo();
(&&f).foo();
(&&&&&&&&f).foo();
All these method calls are the same since they will be deref’d to the implementation.
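A runnable version of the idea, with a hypothetical Foo type. Method calls auto-dereference the receiver until a match is found, and function arguments get deref coercion, which is how a &String or even a &Box<String> can be passed where a &str is expected.

```rust
struct Foo;

impl Foo {
    fn foo(&self) -> &'static str {
        "foo"
    }
}

fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let f = Foo;
    // The compiler peels off references until it finds the method.
    println!("{} {} {}", f.foo(), (&f).foo(), (&&&f).foo());

    let owned = String::from("hello");
    let boxed = Box::new(String::from("hi"));
    println!("{}", takes_str(&owned)); // &String -> &str
    println!("{}", takes_str(&boxed)); // &Box<String> -> &String -> &str
}
```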
enumerations
The built-in Option has some patterns around it.
unwrap is a shortcut method where if the value is the Some variant (or Ok for a Result), it returns the inner value, else it panics. Can use expect to supply your own message, but same concept. Should probably use expect over unwrap though. Messages can be lowercase, no punctuation.
traits
Traits are Rust’s interface abstraction.
How can code consume a trait instead of the implementation (“accept interfaces, produce structs” a.k.a. polymorphism)? There are two ways with semantic differences, but each has a lot of different syntax.
First up, leverage generics at compile time. This is known as a trait bound, like bounding the possibilities of a type.
pub fn notify<T: Summary>(item: &T) {
println!("Breaking news! {}", item.summarize());
}
// Specify multiple bounds.
pub fn notify<T: Summary + Display>(item: &T) {
...
}
The generic type T
must implement the Summary
trait.
You can shift these definitions into a where
clause if they get unwieldy.
fn some_function<T, U>(t: &T, u: &U) -> i32
where
T: Display + Clone,
U: Clone + Debug,
{
...
}
where
clause for long definitions.
There is also the impl
syntax sugar for simple trait bounds. I dunno why this is necessary to be honest, kinda wish there was just one way to list a bound and we dealt with it!
pub fn notify(item: &impl Summary) {
println!("Breaking news! {}", item.summarize());
}
Syntax sugar for a trait bound.
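The three bound syntaxes side by side in one compilable sketch. The Summary trait body, the Article type, and the notify_* function names are assumptions made up for the example.

```rust
use std::fmt::{self, Display};

trait Summary {
    fn summarize(&self) -> String;
}

struct Article {
    title: String,
}

impl Summary for Article {
    fn summarize(&self) -> String {
        format!("{} (article)", self.title)
    }
}

impl Display for Article {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Article<{}>", self.title)
    }
}

// Bound written inline in the generics.
fn notify_generic<T: Summary>(item: &T) -> String {
    format!("Breaking news! {}", item.summarize())
}

// Same bound using the impl sugar.
fn notify_impl(item: &impl Summary) -> String {
    format!("Breaking news! {}", item.summarize())
}

// Multiple bounds moved into a where clause.
fn notify_where<T>(item: &T) -> String
where
    T: Summary + Display,
{
    format!("{}: {}", item, item.summarize())
}

fn main() {
    let article = Article { title: String::from("Rust") };
    println!("{}", notify_generic(&article));
    println!("{}", notify_impl(&article));
    println!("{}", notify_where(&article));
}
```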
trait objects
Ok, finally, the second type known as trait objects. A trait object is more runtime-y than the generic bounds. Another description is bounds are static while objects are dynamic (implementation is called “dynamic dispatch”).
// Box for ownership.
fn print(a: Box<dyn Printable>) {
println!("{}", a.stringify());
}
// Reference to borrow.
fn print(a: &dyn Printable) {
println!("{}", a.stringify());
}
A trait object declared with the dyn
keyword.
A trait object can be used with any smart pointer (e.g. Box
, Arc
, etc.) or reference, but it must be a pointer since the concrete size is not known at compile time.
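A generic bound resolves to one concrete type per use, so a collection like Vec<T> is homogeneous; trait objects do not have this limitation and can mix concrete types. However, not all traits can be used as trait objects (the trait has to be object safe).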
Under the hood, bounds are implemented by the compiler generating a bunch of versions of a function. Objects require a lookup table (vtable) that is used at runtime to find what to execute.
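A minimal sketch of a mixed collection of trait objects. The stringify implementations for i32 and String are assumptions made up for the example.

```rust
trait Printable {
    fn stringify(&self) -> String;
}

impl Printable for i32 {
    fn stringify(&self) -> String {
        format!("i32:{}", self)
    }
}

impl Printable for String {
    fn stringify(&self) -> String {
        format!("string:{}", self)
    }
}

// Each call goes through the vtable stored alongside the pointer.
fn print_all(items: &[Box<dyn Printable>]) -> Vec<String> {
    items.iter().map(|item| item.stringify()).collect()
}

fn main() {
    // Different concrete types in one Vec, which a Vec<T> could not hold.
    let items: Vec<Box<dyn Printable>> =
        vec![Box::new(7_i32), Box::new(String::from("seven"))];
    println!("{:?}", print_all(&items));
}
```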
bounds
The bounding aspect of traits is definitely my favorite quality, giving as much information to the compiler as possible. There are multiple spots to use it, some less obvious than others. Maybe captured by anywhere there is a generic parameter?
function arguments
The most obvious use case is bounding the type consumed by a function. This allows the function to only care about a specific interface of the argument, and not anything else it happens to implement.
conditional implementations
Methods can be conditionally implemented on a generic type depending on trait bounds.
impl<T: Display + PartialOrd> Pair<T> {
...
}
The impl block only applies to Pair
s whose inner type implements Display
and PartialOrd
.
A trait is a set of shared behavior for multiple types, but we have the flexibility to only implement a method based on the inner type. An inner type can be moved to an associated type, but we still have the flexibility to conditionally implement, like a bound on a default method of a trait. Looks like there might be nicer syntax for this “soon”, but in any case, the RFC gives a nice breakdown of all syntax.
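A compilable sketch of a conditional implementation. The Pair fields, the largest method, and the Opaque type are hypothetical. The constructor exists for every T, while largest only exists when T can be compared and displayed.

```rust
use std::fmt::Display;

struct Pair<T> {
    x: T,
    y: T,
}

// Available for every T.
impl<T> Pair<T> {
    fn new(x: T, y: T) -> Self {
        Pair { x, y }
    }
}

// Only available when T can be compared and displayed.
impl<T: Display + PartialOrd> Pair<T> {
    fn largest(&self) -> String {
        if self.x >= self.y {
            format!("x = {}", self.x)
        } else {
            format!("y = {}", self.y)
        }
    }
}

// Implements neither Display nor PartialOrd.
struct Opaque;

fn main() {
    println!("{}", Pair::new(3, 9).largest());
    let _p = Pair::new(Opaque, Opaque); // fine, `new` has no bounds
    // _p.largest(); // error: trait bounds were not satisfied
}
```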
blanket implementations
Conditional implementations can be inverted as well and a trait can be blanket implemented for any type restricted by a bound.
impl<T: Display> ToString for T {
...
}
The ToString
trait is defined for any type T
which implements Display
.
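The same shape as the standard library's ToString, sketched with a hypothetical Shout trait so it compiles on its own.

```rust
use std::fmt::Display;

trait Shout {
    fn shout(&self) -> String;
}

// Blanket implementation: every type that is Display gets `shout`.
impl<T: Display> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self).to_uppercase()
    }
}

fn main() {
    println!("{}", 42.shout());
    println!("{}", String::from("hi").shout());
}
```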
marker traits
A marker trait tells you something about a type, but without adding any new methods or features. Some of these are auto-added by the compiler. All auto traits are marker traits but not all marker traits are auto traits.
The Sized trait is an auto-applied marker trait that lets everyone know a type’s size is known at compile time. A type without Sized is a dynamically sized type (DST). Slices and trait objects are DSTs, but for slices, usually dealing with a sized fat-pointer.
All generic type parameters are auto-bound with Sized
by default.
extension traits
Incorporate additional methods on a type outside the crate it is defined in. This fits in Rust’s orphan rule.
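A minimal sketch, adding a method to the standard library's str through a hypothetical StrExt trait. The orphan rule is satisfied because the trait is local even though the type is not.

```rust
// Local trait, so it can be implemented for a foreign type.
trait StrExt {
    fn word_count(&self) -> usize;
}

impl StrExt for str {
    fn word_count(&self) -> usize {
        self.split_whitespace().count()
    }
}

fn main() {
    // Callable like a built-in method once the trait is in scope.
    println!("{}", "the quick brown fox".word_count());
    // Works on String too, since it auto-derefs to str.
    println!("{}", String::from("one two").word_count());
}
```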
sealed traits
A sealed trait can only be implemented in the crate it is defined in. This makes it less flexible, but easier for the maintainer to not break downstream code. A small contract with the caller. This isn’t a first class language thing, you have to get tricky with public and private trait combining.
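One common way to do the trick, sketched in a single file. The shapes module stands in for the library crate, and all names are hypothetical. The public trait requires a supertrait that outside code cannot name, so outside code cannot implement it.

```rust
mod shapes {
    // Private module: code outside `shapes` cannot name `Sealed`.
    mod private {
        pub trait Sealed {}
    }

    // Public trait, but implementing it requires `Sealed`.
    pub trait Shape: private::Sealed {
        fn sides(&self) -> u32;
    }

    pub struct Triangle;

    impl private::Sealed for Triangle {}

    impl Shape for Triangle {
        fn sides(&self) -> u32 {
            3
        }
    }
}

use shapes::{Shape, Triangle};

fn main() {
    // Callers can still use the trait.
    println!("{}", Triangle.sides());
    // struct Blob;
    // impl Shape for Blob { fn sides(&self) -> u32 { 0 } }
    // error: the trait bound `Blob: Sealed` is not satisfied
}
```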
Concurrency
The borrow checker can help out a bunch to make programs safe for concurrency. Similar to the golang channel strategy of “only one owner” of a message at a time, the borrow checker is clearly a more general ownership checker. For the common data races (corrupted data) and deadlock (stuck program) issues, the borrow checker guards against data races very well. Deadlocks are still very possible though. Best to follow the golang mantra, “Do not communicate by sharing memory; instead, share memory by communicating”. In Rust, that means leveraging ownership instead of locks.
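A small sketch of the "share memory by communicating" approach with the standard library's mpsc channel. The collect_from_workers function is hypothetical. Each message is moved into the channel, so the sending thread cannot touch it afterwards.

```rust
use std::sync::mpsc;
use std::thread;

fn collect_from_workers(count: u32) -> Vec<String> {
    let (tx, rx) = mpsc::channel();

    for id in 0..count {
        // Each worker gets its own sender.
        let tx = tx.clone();
        thread::spawn(move || {
            let message = format!("worker {}", id);
            tx.send(message).expect("receiver hung up");
            // `message` was moved into the channel and is gone from here.
        });
    }

    // Drop the original sender so the receiving loop ends
    // once every worker has finished.
    drop(tx);

    let mut received: Vec<String> = rx.iter().collect();
    received.sort(); // arrival order is not deterministic
    received
}

fn main() {
    println!("{:?}", collect_from_workers(3));
}
```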
pin
Pinning keeps the value behind a pointer “pinned” in place in memory, so it cannot be moved. It is used a lot in async-land because futures can be self-referential.