The Newtype Pattern in Rust

Abstract

Programming design patterns are patterns that come up in a variety of different situations while programming. In this article I discuss the Newtype design pattern. Specifically, I discuss it in the context of the Rust programming language, and how to solve some of the problems that arise when using the Newtype pattern in Rust.

Design Patterns in Rust

Programming design patterns are patterns that come up in a variety of different situations while programming. That isn't to say that design patterns mean you don't need to think about the problem yourself, but design patterns give you a toolbox of ideas to help you think about solutions.

Different programming languages have different ways of expressing things. The classic book on design patterns, Design Patterns: Elements of Reusable Object-Oriented Software, presented its patterns in terms of object-oriented C++ and Smalltalk. While most of these patterns still apply to other object-oriented programming languages, they might need a tweak here and there to work well.

Rust is an interesting programming language in that its design takes ideas from object-oriented, procedural, and functional programming languages. This means that a different set of patterns is useful, and existing patterns may be better expressed in new ways.

In this article, I'm going to explain a pattern that I've been finding useful in my Rust code: The Newtype Pattern.

The Problem: Primitive Types Aren't Descriptive

Imagine that you're working on a large codebase. Like many projects, your project includes some user information, so you have a struct that looks like this:

pub struct Person {
    pub name: String,
    pub phone_number: String,
    pub id_number: String,
    pub age: u32
}

A few months down the line, you're looking at some code on the other side of the codebase. You want to get a person out of your database. This is the function signature:

pub fn load_person(person: String) -> Result<Person>;

Oh dear. What is that parameter supposed to be? Is it the person's ID number? Their name maybe?

Then there's the uncertainty about what exactly age means. How would you, for example, implement this function?

pub fn time_to_retirement(current_age: u32) -> u32;

Is it age in years? It's common to store timestamps as a number of seconds, so maybe it's age in seconds?

The Newtype Pattern

The Newtype pattern is when you take an existing type, usually a primitive like a number or a string, and wrap it in a struct. This lets us add more information about the data to the type system, so that the compiler can catch some errors for us, and it makes our code more expressive.

Let's see how we would apply it to our person example.

You'd first define your Newtypes. The pattern is just a value, wrapped in a struct.

pub struct Name(String);
pub struct PhoneNumber(String);
pub struct IdNumber(String);
pub struct Years(u32);

If you haven't encountered a struct like this before, where the field isn't named, it's called a tuple struct. A Newtype is a special case of a tuple struct with exactly one field.
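Because the single field has no name, you reach it positionally as .0, or pull it out with a pattern. Here's a minimal sketch; the describe and into_raw functions are hypothetical helpers, and both only work inside the module that defines Years, because the field is private.

```rust
// A Newtype is a tuple struct with exactly one, unnamed field.
pub struct Years(u32);

// Hypothetical helper: positional access, the only field is `0`.
pub fn describe(age: &Years) -> String {
    format!("{} years", age.0)
}

// Hypothetical helper: a pattern binds the wrapped value to `n`.
pub fn into_raw(age: Years) -> u32 {
    let Years(n) = age;
    n
}
```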

Then you can start using your new types in your Person struct.

pub struct Person {
    pub name: Name,
    pub phone_number: PhoneNumber,
    pub id_number: IdNumber,
    pub age: Years
}

As a benefit, our load_person function is much clearer. If the type is IdNumber, rather than String, you know to use the person's ID number.

pub fn load_person(person: IdNumber) -> Result<Person>;

Our age is much clearer now too. The Years type makes it obvious that age is measured in years, not seconds.

pub fn time_to_retirement(current_age: Years) -> Years;
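Having the unit in the signature also makes the body straightforward to write. Here's a rough sketch, assuming a retirement age of 65 years; the RETIREMENT_AGE constant is my own assumption for illustration, not something from the codebase above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Years(u32);

// Assumption for this example only: retirement at 65 years.
const RETIREMENT_AGE: Years = Years(65);

pub fn time_to_retirement(current_age: Years) -> Years {
    // saturating_sub returns 0 rather than underflowing for anyone
    // who is already past retirement age.
    Years(RETIREMENT_AGE.0.saturating_sub(current_age.0))
}
```

Passing in a bare u32, or a value of some other wrapper type such as a hypothetical Seconds, is now a compile error rather than a silent mix-up of units.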

Strings are a common use case for Newtypes, since you can use them to add validation around formatting of the string. For example, in South Africa, ID Numbers have a set format that you can validate against.

Problem 1: How Do I Construct The Newtype?

You may have noticed in the examples above that the Newtype itself is public, but the internal data is private. In its current form, this code won't work:

// Usually other modules would actually be in a different file, but
// this isn't a normal project, it's a blog article! After this example
// we won't explicitly be putting our Newtypes in a different module
// to simplify the examples.
mod some_module {
    pub struct PhoneNumber(String);
}

fn main() {
    // You would be able to access the private inner string directly
    // like this if this code was in the same module as the newtype,
    // but for the rest of the codebase this will fail.
    let num = some_module::PhoneNumber("555-12345".to_string());
    println!("{}", num.0)
}

error[E0603]: tuple struct constructor `PhoneNumber` is private
 --> rust-src-b5wQbx.rs:9:28
  |
2 |     pub struct PhoneNumber(String);
  |                            ------ a constructor is private if any of the fields is private
...
9 |     let num = some_module::PhoneNumber("555-12345".to_string());
  |                            ^^^^^^^^^^^ private tuple struct constructor
  |
note: the tuple struct constructor `PhoneNumber` is defined here
 --> rust-src-b5wQbx.rs:2:5
  |
2 |     pub struct PhoneNumber(String);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0616]: field `0` of struct `some_module::PhoneNumber` is private
  --> rust-src-b5wQbx.rs:10:24
   |
10 |     println!("{}", num.0)
   |                        ^ private field

How exactly you handle this will depend on your type. Generally speaking, you can give your type some functions, like a constructor function and some function to get the data out. This would work:

pub struct PhoneNumber(String);
impl PhoneNumber {
    pub fn new(s: String) -> PhoneNumber {
        PhoneNumber(s)
    }
    pub fn as_str(&self) -> &str {
        // We didn't name the inner type, so it follows the same
        // naming convention as tuples. In other words, the inner
        // field is called `0`.
        &self.0
    }
}

fn main() {
    let num = PhoneNumber::new("555-1234".to_string());
    println!("{}", num.as_str())
}

You can add as many other functions as you want here. It's a great place to put any domain logic you might have around your data. For example, phone numbers might need different standard formats in different contexts, or you might be able to use the start of the phone number to figure out which country it belongs to.
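As a rough illustration of that kind of domain logic, here are two hypothetical methods. The country check is deliberately naive: it only recognises South Africa's +27 calling code on numbers written in international format, rather than attempting real international phone number parsing.

```rust
pub struct PhoneNumber(String);

impl PhoneNumber {
    pub fn new(s: String) -> PhoneNumber {
        PhoneNumber(s)
    }

    // Hypothetical: keep only the digits, dropping spaces, dashes, etc.
    pub fn digits(&self) -> String {
        self.0.chars().filter(|c| c.is_ascii_digit()).collect()
    }

    // Hypothetical and naive: numbers in international format that
    // start with +27 are South African. Other formats aren't handled.
    pub fn is_south_african(&self) -> bool {
        self.0.starts_with("+27")
    }
}
```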

Some Useful Standard Library Traits

The two example functions I used above, constructing your type from a string and getting the data back out as a string, are needs that come up for many different types. In fact, the Rust standard library has a number of traits that are worth implementing for your Newtype. Implementing the standard library traits, rather than only your own functions, makes it easier to use your Newtype together with the standard library and with many other Rust libraries. Let's take a look at some of them.

FromStr and Display

If, like in our phone number example, we're specifically interested in working with strings, then there are two traits from the standard library that we should implement: FromStr and Display.

pub struct PhoneNumber(String);

use std::str::FromStr;
impl FromStr for PhoneNumber {
    type Err = Box<dyn std::error::Error>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PhoneNumber(s.to_string()))
    }
}

use std::fmt;
impl fmt::Display for PhoneNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // parse() uses FromStr
    let num: PhoneNumber = "555-1234".parse()?;

    // you can also call from_str directly
    let num = PhoneNumber::from_str("555-1234")?;

    // Display gives you a to_string function
    let num_as_string = num.to_string();

    // Display can also be called directly by println! or format!
    println!("Phone number is {}", num);
    Ok(())
}

Deref

If you're wrapping a string, implementing Deref can also be useful. It will let you pass your string-wrapping Newtype into functions that require a &str.

More generally, Deref is useful if you want to tell the compiler that, if it needs to, it can take an immutable reference to the data you're wrapping.

pub struct PhoneNumber(String);

use std::str::FromStr;
impl FromStr for PhoneNumber {
    type Err = Box<dyn std::error::Error>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PhoneNumber(s.to_string()))
    }
}

use std::ops::Deref;
impl Deref for PhoneNumber {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let num = PhoneNumber::from_str("555-1234")?;

    // Deref can be called when we take a reference. The function
    // takes a &str and our type can Deref from &PhoneNumber to &str.
    print_strings(&num);
    Ok(())
}

fn print_strings(s: &str) {
    println!("I've been asked to print {}", s);
}

Deref is called implicitly by the compiler, both when coercing references and when resolving method calls, so having an implementation of Deref can change which methods the compiler calls when you're using your type. It's primarily intended for implementing smart pointers. If you want functionality similar to Deref, but don't want the compiler to call it implicitly, a good alternative is to add your own method with a name of your choice, like as_str.

pub struct PhoneNumber(String);

use std::str::FromStr;
impl FromStr for PhoneNumber {
    type Err = Box<dyn std::error::Error>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PhoneNumber(s.to_string()))
    }
}

impl PhoneNumber {
    fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let num = PhoneNumber::from_str("555-1234")?;

    // Since we didn't implement Deref, the compiler won't convert a
    // &PhoneNumber into a &str implicitly, but we can still do that
    // conversion explicitly.
    print_strings(num.as_str());
    Ok(())
}

fn print_strings(s: &str) {
    println!("I've been asked to print {}", s);
}

Both implementing Deref and implementing your own function that returns a reference to your wrapped data are the same in that you're directly exposing the data you're wrapping. In many cases this may be a leaky abstraction, so do it with caution. Only do this if you want the whole internal type to be part of your public API.
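To make the leak concrete, here's a small sketch: once PhoneNumber derefs to str, method calls on it resolve to str methods, whether or not they make sense for a phone number. The leaked_str_methods function is hypothetical, just to show which calls compile.

```rust
use std::ops::Deref;

pub struct PhoneNumber(String);

impl Deref for PhoneNumber {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

// Hypothetical: none of these methods are defined on PhoneNumber, yet
// they all compile because method resolution auto-derefs to `str`.
pub fn leaked_str_methods(num: &PhoneNumber) -> (usize, bool, String) {
    (num.len(), num.contains('-'), num.to_uppercase())
}
```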

From, Into, TryFrom, and TryInto

Implementing FromStr and Display is fine when you're wrapping strings, but what if you're wrapping something else? That's where From and Into come in, along with their fallible cousins TryFrom and TryInto.

If you implement From<T> for your Newtype, then your Newtype can be created from a T. Into<T> is the other side of From, so if you implement Into<T> for your type then your type can be converted into a T. Of these two, you should always implement From, and the standard library will automatically implement the corresponding Into for you.

#[derive(Clone, Copy)]
pub struct Years(u32);

impl From<u32> for Years {
    fn from(val: u32) -> Years {
        Years(val)
    }
}

fn main() {
    // We can call from directly
    let years = Years::from(10);

    // By implementing `From<u32> for Years`, we also get 
    // `Into<Years> for u32` for free!
    let years: Years = 10.into();
}

Since Rust 1.41 (released in January 2020), there's essentially no reason to implement Into by hand. Previously, the orphan rule stopped you from implementing From in certain situations, such as impl<T> From<MyType<T>> for Vec<T>, so you would implement Into instead. Rust 1.41 relaxed that rule. Long story short, implement From, not Into.
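The same advice applies in the other direction, when you want the raw value back out. In this sketch, implementing From<Years> for u32 (a conversion I'm adding for illustration) is enough for years.into() to produce a u32, and it also lets functions accept impl Into<Years>. The total function is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Years(u32);

// Wrapping: u32 -> Years.
impl From<u32> for Years {
    fn from(val: u32) -> Years {
        Years(val)
    }
}

// Unwrapping: Years -> u32. The standard library's blanket impl turns
// this into `Into<u32> for Years`, so `years.into()` works too.
impl From<Years> for u32 {
    fn from(years: Years) -> u32 {
        years.0
    }
}

// Hypothetical: accepts a plain u32 or an existing Years.
pub fn total(a: impl Into<Years>, b: impl Into<Years>) -> Years {
    let a: Years = a.into();
    let b: Years = b.into();
    Years(a.0 + b.0)
}
```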

From and Into are useful when the conversion will always succeed, but this isn't always the case. Sometimes we want a conversion that either succeeds or is rejected with a validation error. That's where TryFrom and TryInto come in. They work the same way, but return a Result.

#[derive(Clone, Copy)]
pub struct Years(u32);

use std::convert::TryFrom;
use std::convert::TryInto;
impl TryFrom<u64> for Years {
    type Error = &'static str;
    fn try_from(val: u64) -> Result<Years, Self::Error> {
        if val > u32::MAX as u64 {
            Err("Number out of range")
        } else {
            Ok(Years(val as u32))
        }
    }
}

fn main() {
    // We can call try_from directly
    let years = Years::try_from(30u64);
    assert!(years.is_ok());

    // By implementing `TryFrom<u64> for Years`, we also get
    // `TryInto<Years> for u64` for free! u64::MAX doesn't fit in
    // a u32, so this conversion fails.
    let error: Result<Years, &'static str> = u64::MAX.try_into();
    assert!(error.is_err());
}

You may be wondering why FromStr exists when we could implement TryFrom<&str>. I think the main reason is legacy. FromStr was part of the original 1.0 release of Rust in 2015. TryFrom, on the other hand, was only stabilized in Rust 1.34, released in April 2019.
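In practice the two are interchangeable, so if you want both you can write the logic once and delegate. Here's a sketch for a Years type parsed from text, reusing the standard library's ParseIntError as the error type; TryFrom<&str> simply forwards to FromStr, so both go through the same code.

```rust
use std::convert::TryFrom;
use std::num::ParseIntError;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Years(u32);

impl FromStr for Years {
    // Reuse the standard library's error for integer parsing.
    type Err = ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<u32>().map(Years)
    }
}

// TryFrom<&str> needs a lifetime parameter, which FromStr avoids.
// Delegating means both conversions share one implementation.
impl<'a> TryFrom<&'a str> for Years {
    type Error = ParseIntError;

    fn try_from(s: &'a str) -> Result<Self, Self::Error> {
        s.parse()
    }
}
```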

Arithmetic Operators

When you're wrapping numbers, you may still want to be able to do math with them. For example, you could have two durations in years and want to add them together. The traits you're probably interested in implementing are Add, Sub, Mul, and Div in std::ops.

#[derive(Clone, Copy)]
pub struct Years(u32);

impl From<u32> for Years {
    fn from(val: u32) -> Years {
        Years(val)
    }
}

use std::fmt;
impl fmt::Display for Years {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} years", self.0)
    }
}

use std::ops::Add;
impl Add for Years {
    type Output = Years;
    fn add(self, rhs: Years) -> Years {
        Years(self.0 + rhs.0)
    }
}

fn main() {
    let age_1 = Years::from(5);
    let age_2 = Years::from(2);

    println!("{} + {} = {}", age_1, age_2, age_1 + age_2);
}
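It's worth thinking about which operators actually make sense. Adding two Years gives you a Years, but multiplying two Years would give you "years squared". In this sketch I implement Mul<u32> instead, so a duration can be scaled by a plain count, plus a checked subtraction method (checked_sub on Years is a hypothetical name, mirroring the method of the same name on u32).

```rust
use std::ops::Mul;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Years(u32);

// Years * u32 -> Years: scaling a duration by a count is meaningful.
// There is deliberately no `Mul<Years> for Years`.
impl Mul<u32> for Years {
    type Output = Years;

    fn mul(self, rhs: u32) -> Years {
        Years(self.0 * rhs)
    }
}

impl Years {
    // Returns None instead of underflowing when rhs is larger.
    pub fn checked_sub(self, rhs: Years) -> Option<Years> {
        self.0.checked_sub(rhs.0).map(Years)
    }
}
```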

Implementing Newtype Traits the Easy Way: Derive More

I can see why you might look at the code snippets above and think this is more work than it's worth. There's an easier way!

My personal favourite crate for making the Newtype pattern easier to implement is Derive More. Derive More uses procedural macros to generate the same boilerplate code that we were writing by hand before.

// cargo-deps: derive_more = "0.99"
extern crate derive_more;

use derive_more::{FromStr, Display};
#[derive(FromStr, Display)]
pub struct PhoneNumber(String);

use std::str::FromStr;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // This is the same usage as before.

    let num: PhoneNumber = "555-1234".parse()?;
    let num = PhoneNumber::from_str("555-1234")?;
    let num_as_string = num.to_string();
    println!("Phone number is {}", num);
    Ok(())
}

Similarly, Derive More can implement the traits useful for numbers too.

// cargo-deps: derive_more = "0.99"
extern crate derive_more;

use derive_more::{From, Display, Add};
#[derive(Clone, Copy, From, Display, Add)]
#[display(fmt = "{} years", _0)]
pub struct Years(u32);

fn main() {
    let age_1 = Years::from(5);
    let age_2 = Years::from(2);

    println!("{} + {} = {}", age_1, age_2, age_1 + age_2);
}
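If you'd rather not take on a dependency, a small declarative macro can cover the simplest cases. To be clear, this is a different technique from Derive More, which uses procedural macros, and the newtype_from_display! macro name is hypothetical. It generates a From implementation for the inner type and a Display implementation with a unit suffix; Seconds is a hypothetical second Newtype to show reuse.

```rust
use std::fmt;

// Hypothetical macro: for a single-field tuple struct, generate
// `From<$inner>` and a `Display` that appends `$suffix`.
macro_rules! newtype_from_display {
    ($name:ident, $inner:ty, $suffix:expr) => {
        impl From<$inner> for $name {
            fn from(val: $inner) -> $name {
                $name(val)
            }
        }

        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                write!(f, "{}{}", self.0, $suffix)
            }
        }
    };
}

#[derive(Clone, Copy)]
pub struct Years(u32);
newtype_from_display!(Years, u32, " years");

#[derive(Clone, Copy)]
pub struct Seconds(u64);
newtype_from_display!(Seconds, u64, " seconds");
```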

Parse, don't validate

So far, my examples have simply wrapped primitives and let you put any value into them. You can take things a step further: have your types validate values as you parse into them, and reject invalid ones.

That is, you can use the type system to indicate that validation has already been done. For example, if you have some condition like "ID Numbers must be exactly 13 digits long", then instead of passing around a string and checking it everywhere, parse it into a special IdNumber type. If the string isn't a valid ID number, the parsing fails. This means that everywhere else in your program, if you're taking in an IdNumber, you know it already holds a valid value.

// cargo-deps: derive_more = "0.99"
extern crate derive_more;

use derive_more::Display;
#[derive(Display, Debug, PartialEq)]
pub struct IdNumber(String);

use std::str::FromStr;
impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Exactly 13 characters, all of them ASCII digits.
        if s.len() != 13 || !s.bytes().all(|b| b.is_ascii_digit()) {
            Err(IdNumberParseError::InvalidFormat)
        } else {
            Ok(IdNumber(s.to_string()))
        }
    }
}

#[derive(Display, Debug, PartialEq)]
pub enum IdNumberParseError {
    InvalidFormat
}
impl std::error::Error for IdNumberParseError {}

fn main() {
    let id = IdNumber::from_str("12345");
    assert_eq!(id, Err(IdNumberParseError::InvalidFormat));

    let id = IdNumber::from_str("1234567890123").unwrap();

    println!("My ID Number is {}", id);
}

This approach of putting more information into your type system is sometimes referred to as type driven development. Alexis King has summed it up into the snappy phrase "Parse, don't validate".

It's even better if you can represent your type such that invalid data is unrepresentable. For example, ID numbers have to be exactly 13 digits long, so we could represent them as struct IdNumber([char; 13]); to make invalid lengths unrepresentable. This usually strays beyond the Newtype pattern, so I'm not going to go down that rabbit hole in this article, but a good example to look at is the Rust standard library's implementation of IpAddr.
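For a taste of what that looks like, here's a rough sketch that stores the ID number as exactly 13 digit values. It uses [u8; 13] holding the numeric value of each digit rather than [char; 13], and it only enforces the length-and-digits rule from above, not any further structure real ID numbers may have. A wrong length can't be expressed at all, but the 0 to 9 range of each element is still enforced by from_str rather than by the type.

```rust
use std::fmt;
use std::str::FromStr;

// The length is fixed by the type. Each element is meant to be a
// digit value 0-9; from_str is what guarantees that part.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdNumber([u8; 13]);

#[derive(Debug, PartialEq, Eq)]
pub enum IdNumberParseError {
    InvalidFormat,
}

impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let bytes = s.as_bytes();
        if bytes.len() != 13 {
            return Err(IdNumberParseError::InvalidFormat);
        }
        let mut digits = [0u8; 13];
        for (slot, &b) in digits.iter_mut().zip(bytes) {
            if !b.is_ascii_digit() {
                return Err(IdNumberParseError::InvalidFormat);
            }
            // Store the numeric value, not the ASCII code.
            *slot = b - b'0';
        }
        Ok(IdNumber(digits))
    }
}

impl fmt::Display for IdNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for d in self.0.iter() {
            write!(f, "{}", d)?;
        }
        Ok(())
    }
}
```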

StructOpt and Serde

Sometimes, the place that the data comes into your program is managed by some other library. If your program is a web server, you're probably receiving data in JSON format and deserializing it with Serde. If you're writing a command line application, you're probably parsing command line arguments using something like StructOpt.

StructOpt uses FromStr to parse command line arguments. If you've already implemented FromStr for your type with any necessary validation, then StructOpt gives you your input validation for free!
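The same goes for plain string input without any argument-parsing library, since str::parse is driven by FromStr. I won't reproduce StructOpt's API here; instead, here's a std-only sketch where a hypothetical parse_id_args function validates every argument through one FromStr implementation, using the same 13-digit rule as before.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
pub struct IdNumber(String);

#[derive(Debug, PartialEq)]
pub enum IdNumberParseError {
    InvalidFormat,
}

impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.len() == 13 && s.bytes().all(|b| b.is_ascii_digit()) {
            Ok(IdNumber(s.to_string()))
        } else {
            Err(IdNumberParseError::InvalidFormat)
        }
    }
}

// Hypothetical helper: in a real program `args` would come from
// `std::env::args().skip(1)`. Collecting into a Result stops at the
// first invalid argument.
pub fn parse_id_args<I>(args: I) -> Result<Vec<IdNumber>, IdNumberParseError>
where
    I: IntoIterator<Item = String>,
{
    args.into_iter().map(|arg| arg.parse()).collect()
}
```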

Serde makes things a little more complicated. Serde has a "transparent" option for Newtypes, #[serde(transparent)], which bypasses your container and works directly with the value inside. While this seems like a good idea for serializing, when you're deserializing it will bypass any validation logic you've put into your FromStr or TryFrom implementation. Luckily, Serde also lets you point it at TryFrom and Into implementations, using the try_from and into container attributes.

For strings, this means that you need to implement TryFrom<String> as well as FromStr, even though the two are basically the same thing. You will also need to implement Into<String>, even though you'd normally just implement Display. Luckily, Derive More can implement Into for us.

Derive More implementing Into is actually a bit of a misnomer. When you ask Derive More to derive Into, it will actually derive From on the type you're converting into. As I mentioned above, this causes the standard library to give you the implementation of Into that you were after anyway, so it works out to the same thing in the end.

//! ```cargo
//! [dependencies]
//! derive_more = "0.99"
//! serde = { version = "1", features = ["derive"] }
//! serde_json = "1"
//! ```
extern crate derive_more;
extern crate serde;
extern crate serde_json;

use derive_more::{Display, Into};
use serde::{Serialize, Deserialize};
// Derive More's Into generates `impl From<IdNumber> for String`, which
// gives Serde the `Into<String>` that `into = "String"` needs.
#[derive(Display, Debug, Clone, Into, Serialize, Deserialize)]
#[serde(try_from = "String", into = "String")]
pub struct IdNumber(String);

use std::str::FromStr;
impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Exactly 13 characters, all of them ASCII digits.
        if s.len() != 13 || !s.bytes().all(|b| b.is_ascii_digit()) {
            Err(IdNumberParseError::InvalidFormat)
        } else {
            Ok(IdNumber(s.to_string()))
        }
    }
}
use std::convert::TryFrom;
impl TryFrom<String> for IdNumber {
    type Error = IdNumberParseError;
    fn try_from(value: String) -> Result<Self, Self::Error> {
        // This is boilerplate, but it's deliberate: delegating to our
        // FromStr implementation means values coming in through Serde
        // follow exactly the same validation rules.
        value.parse()
    }
}

#[derive(Display, Debug, PartialEq)]
pub enum IdNumberParseError {
    InvalidFormat
}
impl std::error::Error for IdNumberParseError {}

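fn main() {
    let json = "\"1234567890123\"".to_string();
    let deserialized: IdNumber = serde_json::from_str(&json).unwrap();
    println!("deserialized = {:?}", deserialized);

    // Invalid input is rejected by our validation during deserialization.
    let invalid: Result<IdNumber, _> = serde_json::from_str("\"12345\"");
    assert!(invalid.is_err());
}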

Summary

Use the Newtype pattern to add more information to the type system.
Implement FromStr or TryFrom<T>, where T is the primitive type your Newtype wraps. FromStr is generally useful even if your type doesn't wrap a string.
Implement Display and Into<T> to make it easier to integrate with external code that doesn't know about your Newtype.
Implement Deref if it's fine for code to (immutably) bypass your Newtype and read the data inside it directly.
For numeric types, consider implementing arithmetic operators like Add. Only do this if it makes sense to actually use those arithmetic operators on your types.
If you use Serde, have it implement Serialize and Deserialize in terms of your TryFrom<T> and Into<T> implementations. This lets you enforce your type's validation through Serde.
If you use StructOpt, enjoy the free validation from implementing FromStr.

Where Have I Been Using The Newtype Pattern?

I'm currently working at Panoptix, where we've gone all in on using Rust end to end. This includes server side backend code, command line utilities, and even a web front end. I've been using the Newtype pattern as a way to encode validation logic into the type system.

The same validation logic is called on all entry points to the system, whether it's front end validation as someone's typing on a form, a CLI parameter, or embedded in an API payload. The best part is that the validation is called exactly once, and you can't forget to call it (or accidentally bypass it while refactoring), because it's baked into the type system.

Refactoring existing code to use the Newtype was also fairly painless: change the type in a few places and follow the compilation errors to find where the data goes.

This is definitely a pattern that I would recommend trying out.
