The Monthly Oxide #5

From error-prone primitives to solid newtypes

Welcome to another edition of the Monthly Oxide, a newsletter that covers something about Rust for you to learn, along with some links and articles that I found interesting in the past month. I took July off from writing because a lot of good personal things going on in my life made me a bit busier than normal. This month I want to talk about making effective use of newtype wrappers and types in general. Newtypes are just declarations that look like pub struct MyType(String), where String might just as well be a primitive like u64. We wrap these lower-level types, which are able to represent many things, so that they represent one concrete idea, as you will see. I'm going to step through how you might build up an API that uses these types effectively and why you might want to use them over primitives alone. With that in mind, let's get to it.

Using Newtype Wrappers/Types Effectively

One thing that I think is easy to do is make an API that uses &str/String and primitives like u8 all over the place. For instance, say you want to convert between seconds and other units of time. You could just use a u64 like so:

fn secs_to_ms(input: u64) -> u128 {
  // Casting is fine since we go up in
  // terms of the amount of available bits
  (input as u128) * 1000
}

While the name itself should tell someone that seconds go in and milliseconds come out, these are still raw numbers whose meaning you have to keep track of yourself. We could instead do something like this:

#[derive(Clone, Copy, Debug, PartialEq, Eq, Ord, PartialOrd)]
pub struct Seconds(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Ord, PartialOrd)]
pub struct Milliseconds(u128);

impl From<Seconds> for Milliseconds {
  fn from(sec: Seconds) -> Self {
    Self((sec.into_inner() as u128) * 1000)
  }
}

impl From<Milliseconds> for Seconds {
  fn from(milli: Milliseconds) -> Self {
    // This might be better off using try_into instead
    // but this is an example bit of code
    Self((milli.into_inner()/1000) as u64)
  }
}

impl Seconds {
  pub(crate) fn into_inner(self) -> u64 {
    self.0
  }
  
  pub fn new(input: u64) -> Self {
    Self(input)
  }
  
  pub fn as_millis(&self) -> Milliseconds {
    (*self).into()
  }
}

impl Milliseconds {
  pub(crate) fn into_inner(self) -> u128 {
    self.0
  }
  
  pub fn new(input: u128) -> Self {
    Self(input)
  }
  
  pub fn as_secs(&self) -> Seconds {
    (*self).into()
  }
}

fn main() {
    let sec = Seconds::new(20);
    let millis = Milliseconds::new(20000);
    
    assert_eq!(sec.as_millis(), millis);    
    assert_eq!(sec, millis.as_secs());
}

Rust Playground Link

Now this might be more involved, but there are quite a few good things we get out of this:

  • We are representing the concept of time as a type itself. This means we can make methods or implement traits that work on the type itself such as conversion methods.

  • We could even implement PartialEq between different units, such as Seconds and Milliseconds, so that comparing them always performs the conversion properly.

  • Since it's a type we can make sure that only the correct type is used as input to a function. This is good if we want to only work with say Seconds.

  • We can make sure that we always handle the conversions properly in one place rather than doing them by hand everywhere we need them. This also means we could make the above code even more robust by making the lossy conversion fallible: going from Milliseconds (a u128) down to Seconds (a u64) can truncate, while going from Seconds to Milliseconds always fits.
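
The comparison and robustness ideas in the bullets above could be sketched roughly like this. It is only a sketch and not the code from the playground: the cross-unit PartialEq impl and the TryFrom impl (standing in for the infallible From<Milliseconds> for Seconds) are assumptions about how you might tighten things up.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Seconds(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Milliseconds(u128);

// Assumption: compare across units by scaling seconds up to milliseconds.
// u64::MAX * 1000 always fits in a u128, so this cannot overflow.
impl PartialEq<Milliseconds> for Seconds {
    fn eq(&self, other: &Milliseconds) -> bool {
        u128::from(self.0) * 1000 == other.0
    }
}

// Assumption: a fallible conversion instead of the `as u64` cast, which
// would silently truncate values that do not fit in a u64.
impl TryFrom<Milliseconds> for Seconds {
    type Error = TryFromIntError;

    fn try_from(ms: Milliseconds) -> Result<Self, Self::Error> {
        // Integer division still drops any leftover milliseconds
        u64::try_from(ms.0 / 1000).map(Seconds)
    }
}
```

With this in place Seconds(20) == Milliseconds(20_000) is true, and Seconds::try_from(Milliseconds(u128::MAX)) returns an error instead of a truncated number. Comparing in the other direction (Milliseconds == Seconds) would need its own impl.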

My point is that using types to model the behavior, and wrapping up primitives in a newtype or as part of a larger type, makes it easier to do the correct thing, accept the right thing, and model states or options properly with enums as well. Now you might find that you want to accept a lot more as input. For instance, say you wanted to print out any input of time as if it were in milliseconds. We can use our conversion traits here again and add some Display impls as well.

use std::fmt;
#[derive(Clone, Copy, Debug, PartialEq, Eq, Ord, PartialOrd)]
pub struct Seconds(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Ord, PartialOrd)]
pub struct Milliseconds(u128);

impl From<Seconds> for Milliseconds {
  fn from(sec: Seconds) -> Self {
    Self((sec.into_inner() as u128) * 1000)
  }
}

impl From<Milliseconds> for Seconds {
  fn from(milli: Milliseconds) -> Self {
    // This might be better off using try_into instead
    // but this is an example bit of code
    Self((milli.into_inner()/1000) as u64)
  }
}

impl Seconds {
  pub(crate) fn into_inner(self) -> u64 {
    self.0
  }
  
  pub fn new(input: u64) -> Self {
    Self(input)
  }
  
  pub fn as_millis(&self) -> Milliseconds {
    (*self).into()
  }
  
  pub fn print(&self) {
    println!("{}", self)
  }
}

impl Milliseconds {
  pub(crate) fn into_inner(self) -> u128 {
    self.0
  }
  
  pub fn new(input: u128) -> Self {
    Self(input)
  }
  
  pub fn as_secs(&self) -> Seconds {
    (*self).into()
  }
  
  pub fn print(&self) {
    println!("{}", self)
  }  
}

impl fmt::Display for Milliseconds {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}ms", self.0)
    }
}

impl fmt::Display for Seconds {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}s", self.0)
    }
}

pub enum Time {
  Seconds(Seconds),
  Milliseconds(Milliseconds),
}

impl Time {
  pub fn print(&self) {
    match self {
      Self::Seconds(s) => s.print(),
      Self::Milliseconds(m) => m.print(),
    }
  }
}

impl From<Time> for Seconds {
  fn from(input: Time) -> Self {
    match input {
      Time::Seconds(s) => s,
      Time::Milliseconds(ms) => ms.into()
    }
  }
}

impl From<Time> for Milliseconds {
  fn from(input: Time) -> Self {
    match input {
      Time::Seconds(s) => s.into(),
      Time::Milliseconds(ms) => ms,
    }
  }
}

impl From<Seconds> for Time {
    fn from(input: Seconds) -> Self {
        Self::Seconds(input)
    }
}

impl From<Milliseconds> for Time {
    fn from(input: Milliseconds) -> Self {
        Self::Milliseconds(input)
    }
}

fn print_out_only_ms(input: impl Into<Milliseconds>) {
  println!("{}", input.into());
}

fn print_time_5_times(time: impl Into<Time>) {
  let time = time.into();
  time.print();
  time.print();
  time.print();
  time.print();
  time.print();
}

fn main() {
  let sec = Seconds::new(20);
  let millis = Milliseconds::new(20000);
    
  assert_eq!(sec.as_millis(), millis);    
  assert_eq!(sec, millis.as_secs());
    
  print_out_only_ms(sec);
    
  print_time_5_times(sec);
  print_time_5_times(millis);
  let time_sec: Time = sec.into();
  let time_millis: Time = millis.into();
  let sec: Seconds = time_millis.into();
  let millis: Milliseconds = time_sec.into();

  assert_eq!(Milliseconds::new(20000), millis);
  assert_eq!(Seconds::new(20), sec);
}

Rust Playground Link

Both work as input, and the benefit of this approach is that we can now convert freely between an enum that represents every unit of time we support and a type that handles one specific unit. We're combining the power of structs and enums here and building on all of our previous work. The enum handles anything that should work with any unit of time, like our function that prints the time with its proper unit, and we know we can convert anything that is some unit of time into one specific unit. All while maintaining a nice type system and avoiding using primitives directly.
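
As a small illustration of converting "any unit" into one specific unit, here is a sketch of a helper that adds up a mix of units. The total_millis function and its saturating behaviour are assumptions made up for this example. The types are re-declared in trimmed-down form so the block stands on its own.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Seconds(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Milliseconds(u128);

#[derive(Clone, Copy, Debug)]
pub enum Time {
    Seconds(Seconds),
    Milliseconds(Milliseconds),
}

impl From<Seconds> for Milliseconds {
    fn from(sec: Seconds) -> Self {
        Self(u128::from(sec.0) * 1000)
    }
}

impl From<Time> for Milliseconds {
    fn from(input: Time) -> Self {
        match input {
            Time::Seconds(s) => s.into(),
            Time::Milliseconds(ms) => ms,
        }
    }
}

impl From<Seconds> for Time {
    fn from(input: Seconds) -> Self {
        Self::Seconds(input)
    }
}

impl From<Milliseconds> for Time {
    fn from(input: Milliseconds) -> Self {
        Self::Milliseconds(input)
    }
}

// Hypothetical helper: add up a mix of units as milliseconds.
// Assumption: saturate at u128::MAX rather than overflow.
pub fn total_millis(times: impl IntoIterator<Item = Time>) -> Milliseconds {
    times.into_iter().fold(Milliseconds(0), |acc, time| {
        let ms: Milliseconds = time.into();
        Milliseconds(acc.0.saturating_add(ms.0))
    })
}
```

The caller never has to care which unit each value started in. The conversion lives in one From impl and the helper only ever sees Milliseconds.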

One of the other benefits I mentioned was that you could use types instead of primitives as input. For instance, consider the bool. How many times, say for the filter method on Iterator, have you had to look up whether returning true or returning false is what filters out the item? Wouldn't it be better if you could return something like this instead?

enum Filter {
  Keep,
  Remove
}

It's effectively a bool, but now you have stated what its two states are for, which makes it easier to know what each state means. It's also useful for situations where you might need to turn two options on or off in a function like so:

fn print_value(&self, bold: bool, italics: bool) {
  // code to print off the value with the options
}

What happens if you swap those values? It will type check and not necessarily do what you want it to do. You might want the following instead so that you know what you're getting for input:

enum Bold {
  On,
  Off,
}

enum Italics {
  On,
  Off,
}

// Example fn call
item.print_value(Bold::On, Italics::Off);
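
To make that call site concrete, here is a minimal sketch. The Text type, the format_value name, and the Markdown-style markers are assumptions made for illustration. Only the Bold and Italics enums come from the example above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Bold {
    On,
    Off,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Italics {
    On,
    Off,
}

// Hypothetical type standing in for whatever `item` is
pub struct Text(pub String);

impl Text {
    // Returns the formatted value instead of printing it so that it is
    // easy to check. The Markdown-style markers are an assumption.
    pub fn format_value(&self, bold: Bold, italics: Italics) -> String {
        let mut out = self.0.clone();
        if italics == Italics::On {
            out = format!("*{}*", out);
        }
        if bold == Bold::On {
            out = format!("**{}**", out);
        }
        out
    }
}
```

Swapping the arguments, as in item.format_value(Italics::Off, Bold::On), is now a mismatched types error at compile time rather than a silent bug.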

This isn't just for bool though. Taking multiple numbers as input, like say xyz coordinates, can be error prone too:

struct Point {
  x: i64,
  y: i64,
  z: i64
}

impl Point {
  pub fn new(x: i64, y: i64, z: i64) -> Self {
    Self { x, y, z }
  }
}

fn main() {
    let point = Point::new(0, -1, -3);
    assert_eq!(point.x, 0);
    assert_eq!(point.y, -1);
    assert_eq!(point.z, -3);
    
    // Compiles! Even if we didn't want the order swapped
    let point = Point::new(-1, 0, -3);
}

Rust Playground Link

If you swap that order you might have data going in all the wrong directions. Instead you could do something like this to make it less error prone for those constructing the type:

pub struct X(pub i64);
pub struct Y(pub i64);
pub struct Z(pub i64);

struct Point {
  x: i64,
  y: i64,
  z: i64
}

impl Point {
  pub fn new(X(x): X, Y(y): Y, Z(z): Z) -> Self {
    Self { x, y, z }
  }
}

fn main() {
    let point = Point::new(X(0), Y(-1), Z(-3));
    assert_eq!(point.x, 0);
    assert_eq!(point.y, -1);
    assert_eq!(point.z, -3);
    
    // Won't compile
    // let point = Point::new(Y(-1), X(0), Z(-3));
}

Rust Playground Link

It might be a bit of extra work, but you can signify what each input is so you can see "yes, this is what I want", and the destructuring in function args (a little-known but excellent trick) makes it easier to put the values into a struct. I tend to use this when a constructor takes multiple inputs of the same primitive that each mean a different thing, but where the field names make it clear what they are once they're assigned in the struct. You can use this for any function though. There's a trade-off here in the sense that some people might find importing extra types just to pass arguments to a function is a lot. Since Rust doesn't have labelled args, though, this is the better choice when you want to show what each input should be and don't want people to mess it up. I think you can get a lot of use out of it, so it's something to consider! At the very least, please stop using bool and start using two-variant enums that mean things for input to a function. Your coworkers/crate users will thank you.
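
Going back to the Filter enum from earlier, here is one way it might be wired into an iterator chain. Iterator::filter itself still takes a closure returning bool, so the filter_with helper below is a hypothetical adapter written for this sketch and not something from the standard library.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Filter {
    Keep,
    Remove,
}

// Hypothetical helper: translate Filter to bool at the boundary.
// Keep maps to true because Iterator::filter keeps the items for
// which its predicate returns true.
pub fn filter_with<I, F>(iter: I, mut decide: F) -> impl Iterator<Item = I::Item>
where
    I: Iterator,
    F: FnMut(&I::Item) -> Filter,
{
    iter.filter(move |item| decide(item) == Filter::Keep)
}
```

A call like filter_with(1..=6u32, |n| if n % 2 == 0 { Filter::Keep } else { Filter::Remove }) reads without a trip to the docs, and the true/false question is answered exactly once inside the helper.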

I have one last thing I want to talk about when it comes to newtypes, and that's deserialization and validation of input! I have seen code where serde's default derive for Deserialize is used to get the data in, and then TryInto is used to validate the input and turn it into the actual type. I'm here to tell you that you can do this with only one type during deserialization itself. Let's consider input on GitHub where we want to make sure the input is two valid SHA1 hashes (we're not worrying about whether they're in the repo or whether they're a short form for this example). Now you might write the code like this:

use serde::Deserialize;
use std::convert::{TryFrom, TryInto};
use thiserror::Error;

#[derive(Deserialize)]
pub struct GitHubDiffInput {
    base: String,
    head: String,
}

pub struct GitHubDiff {
    base: String,
    head: String,
}

#[derive(Debug, Error)]
pub enum DiffConvertError {
    #[error("The base input is not 40 chars. len was: {0}")]
    BaseTooShort(usize),
    #[error("The head input is not 40 chars. len was: {0}")]
    HeadTooShort(usize),
    #[error("The base input is not valid hex. input was: {0}")]
    BaseInvalidChars(String),
    #[error("The head input is not valid hex. input was: {0}")]
    HeadInvalidChars(String),
}

// Implementing TryFrom also gives us try_into() through the blanket impl in std
impl TryFrom<GitHubDiffInput> for GitHubDiff {
    type Error = DiffConvertError;
    fn try_from(input: GitHubDiffInput) -> Result<Self, Self::Error> {
        if input.base.len() != 40 {
            return Err(DiffConvertError::BaseTooShort(input.base.len()));
        }

        if input.head.len() != 40 {
            // Report the length of head here, not base
            return Err(DiffConvertError::HeadTooShort(input.head.len()));
        }

        // Every byte must be an ASCII hex digit. Parsing two-byte windows
        // with from_str_radix would also accept a leading '+' sign.
        if !input.base.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(DiffConvertError::BaseInvalidChars(input.base));
        }

        if !input.head.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(DiffConvertError::HeadInvalidChars(input.head));
        }

        Ok(GitHubDiff {
            base: input.base,
            head: input.head,
        })
    }
}

fn main() -> Result<(),Box<dyn std::error::Error>> {
    let raw_input = r#"{ 
        "base": "6f2487c610f0acbcea3485149e6ebd3479641f96" ,
        "head": "899c44a7ef09bc78d3623d28c0a82fdaba8d7a04" 
    }"#;
    
    let raw_diff: GitHubDiffInput = serde_json::from_str(&raw_input)?;
    let diff: GitHubDiff = raw_diff.try_into()?;
    println!("base: {}", diff.base);
    println!("head: {}", diff.head);
    
    // Do stuff with the diff
    Ok(())
}

Rust Playground Link

Now this is fine, but it would be much nicer if you could just deserialize right into the GitHubDiff type. You also might be building an application where you need a SHA1 string as input all over the place. Let's try this again using newtypes and a custom Deserialize impl:

use serde::{de::{self, Visitor}, Deserialize, Deserializer};
use std::fmt;
use thiserror::Error;

#[derive(Debug, Deserialize)]
pub struct GitHubDiff {
    base: Sha1,
    head: Sha1,
}

#[derive(Debug)]
pub struct Sha1(String);

impl Sha1 {
    pub fn new(value: &str) -> Result<Self, Sha1Error> {
        if value.len() != 40 {
            return Err(Sha1Error::WrongLength(value.len()));
        }

        // Every byte must be an ASCII hex digit. Parsing two-byte windows
        // with from_str_radix would also accept a leading '+' sign.
        if !value.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(Sha1Error::InvalidChars(value.into()));
        }

        Ok(Self(value.into()))
    }
}

impl fmt::Display for Sha1 {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl<'de> Deserialize<'de> for Sha1 {
    fn deserialize<D>(deserializer: D) -> Result<Sha1, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_str(Sha1Visitor)
    }
}

struct Sha1Visitor;
impl<'de> Visitor<'de> for Sha1Visitor {
    type Value = Sha1;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("a valid 40 char hexadecimal string")
    }

    // Validation happens here, so a GitHubDiff can never hold a bad hash
    fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
    where
        E: de::Error,
    {
        Sha1::new(value).map_err(de::Error::custom)
    }
}

// The errors describe a single hash now, so they no longer mention base or head
#[derive(Debug, Error)]
pub enum Sha1Error {
    #[error("The input is not 40 chars. len was: {0}")]
    WrongLength(usize),
    #[error("The input is not valid hex. input was: {0}")]
    InvalidChars(String),
}

fn main() -> Result<(),Box<dyn std::error::Error>> {
    let raw_input = r#"{ 
        "base": "6f2487c610f0acbcea3485149e6ebd3479641f96" ,
        "head": "899c44a7ef09bc78d3623d28c0a82fdaba8d7a04" 
    }"#;
    
    let diff: GitHubDiff = serde_json::from_str(&raw_input)?;
    println!("base: {}", diff.base);
    println!("head: {}", diff.head);
    
    // Do stuff with the diff
    Ok(())
}

Rust Playground Link

Now this might seem a bit more involved because we did have to implement Deserialize for our type by hand, but the benefits here can't be overstated. We get a type that represents a SHA1 sum that we can construct properly in our own code with validation via fn new, and we get the same validation when converting form input from a website. There's no need for intermediate types for deserialization or for error-prone handling of strings once we have the type, since we know it holds valid input that has been checked, and we can add whatever methods we need that work specifically on the type!
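
To give a feel for "methods that work specifically on the type", here is a std-only sketch. The Sha1 type is re-declared without serde so the block stands alone, and the ParseSha1Error type, the FromStr impl, and the short method (with its seven character length) are all assumptions made for this example.

```rust
use std::str::FromStr;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Sha1(String);

// Hypothetical error type for this sketch
#[derive(Debug, PartialEq, Eq)]
pub enum ParseSha1Error {
    WrongLength(usize),
    InvalidChars,
}

// FromStr lets callers write "...".parse::<Sha1>() with the same checks
impl FromStr for Sha1 {
    type Err = ParseSha1Error;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        if value.len() != 40 {
            return Err(ParseSha1Error::WrongLength(value.len()));
        }
        if !value.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(ParseSha1Error::InvalidChars);
        }
        Ok(Self(value.to_owned()))
    }
}

impl Sha1 {
    // Hypothetical method: an abbreviated form of the hash.
    // Slicing by byte index is fine here because construction
    // guarantees 40 ASCII bytes.
    pub fn short(&self) -> &str {
        &self.0[..7]
    }
}
```

Because the only way to get a Sha1 is through the validating constructor, short never has to re-check the length or worry about slicing through a multi-byte character.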

We covered a lot today and this is probably the longest of the newsletters so far, but I do hope you see and think about how you can use the type system to your advantage to protect against invalid inputs, make a less confusing API, and make sure that your types get the right values in the right order!

Interesting Articles this Month

- open and closed universes - My friend Rain wrote another great article about how you might think about and represent choices in your API with enums and traits in Rust and what that means. It's a short and sweet read. Definitely worth reading.

- The push for GATs stabilization - GATs! They're almost here! If you've not needed them it's kinda hard to see what they could do, but they'll unlock all kinds of really nice things to build solid Rust APIs and I can't wait. Want to know what all the fuss is about? Definitely read up if you haven't seen this yet.

Interesting Projects this Month

- ariadne - A pretty error diagnostics library with output and an API to construct errors much like rustc

- hazy - A crate that Esteban, my coworker at the time, worked on. You can derive OpaqueDebug on a type (like, say, a newtype) and it will redact the type when printed to logs and the like, so you don't accidentally log API keys.