Abstract
Programming design patterns are patterns that come up in a variety of different situations while programming. In this article I discuss the Newtype design pattern. Specifically, I discuss it in the context of the Rust programming language, and how to solve some of the problems that arise when using the Newtype pattern in Rust.
Design Patterns in Rust
Programming design patterns are patterns that come up in a variety of different situations while programming. That isn't to say that design patterns mean you don't need to think about the problem yourself, but design patterns give you a toolbox of ideas to help you think about solutions.
Different programming languages have different ways of expressing things. The classic book on design patterns, Design Patterns: Elements of Reusable Object-Oriented Software, presented its patterns in terms of object-oriented C++ and Smalltalk. While most of these patterns still apply to other object-oriented programming languages, they might need a tweak here and there to work well.
Rust is an interesting programming language in that the design of the language takes ideas from object oriented, procedural, and functional programming languages. This means that there are different patterns that are useful, and the existing patterns may be better expressed in a new way.
In this article, I'm going to explain a pattern that I've been finding useful in my Rust code: The Newtype Pattern.
The Problem: Primitive Types Aren't Descriptive
Imagine that you're working on a large codebase. Like many projects, your project includes some user information, so you have a struct that looks like this:
```rust
pub struct Person {
    pub name: String,
    pub phone_number: String,
    pub id_number: String,
    pub age: u32,
}
```
A few months down the line, you're looking at some code on the other side of the codebase. You want to get a person out of your database. This is the function signature:
```rust
pub fn load_person(person: String) -> Result<Person>;
```
Oh dear. What is that parameter supposed to be? Is it the person's ID number? Their name maybe?
Then there's the uncertainty about what exactly the age field means. How would you, for example, implement this function?
```rust
pub fn time_to_retirement(current_age: u32) -> u32;
```
Is it age in years? It's common to store timestamps as a number of seconds, so maybe it's age in seconds?
The Newtype Pattern
The Newtype pattern is when you take an existing type, usually a primitive like a number or a string, and wrap it in a struct. This lets us add more information about the data to the type system, which can catch errors and make our code more expressive.
Let's see how we would apply it to our person example.
You'd first define your Newtypes. The pattern is just a value, wrapped in a struct.
```rust
pub struct Name(String);
pub struct PhoneNumber(String);
pub struct IdNumber(String);
pub struct Years(u32);
```
If you haven't encountered a struct like this before, where the fields aren't named, it's called a tuple struct. A Newtype is the special case of a tuple struct with exactly one field.
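Within the module that defines them, tuple structs are built and taken apart by position rather than by field name. The small sketch below uses two of the Newtypes above; the values "Alice" and 42 are made up for illustration.

```rust
// Two of the Newtypes defined above.
pub struct Name(String);
pub struct Years(u32);

fn main() {
    // A tuple struct is constructed like a function call...
    let name = Name("Alice".to_string());
    let age = Years(42);

    // ...its unnamed field is accessed by position, as `.0`...
    println!("{} is {} years old", name.0, age.0);

    // ...and it can be destructured with a pattern.
    let Years(n) = age;
    assert_eq!(n, 42);
}
```

Outside the defining module, the private field makes both of these fail to compile, which is the problem the next sections deal with.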
Then you can start using your new types in your Person struct.
```rust
pub struct Person {
    pub name: Name,
    pub phone_number: PhoneNumber,
    pub id_number: IdNumber,
    pub age: Years,
}
```
As a benefit, our load_person function is much clearer. If the type is IdNumber rather than String, you know to use the person's ID number.
```rust
pub fn load_person(person: IdNumber) -> Result<Person>;
```
Our age is also much clearer now. The Years type makes it obvious that our age is in years, not seconds.
```rust
pub fn time_to_retirement(current_age: Years) -> Years;
```
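With the unit in the type, the function body almost writes itself. Here is a minimal sketch; the retirement age of 65 is a hypothetical constant chosen only for this example, and saturating_sub is one possible choice for people already past that age.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Years(u32);

// Hypothetical retirement age, assumed only for this sketch.
const RETIREMENT_AGE: Years = Years(65);

pub fn time_to_retirement(current_age: Years) -> Years {
    // saturating_sub returns 0 instead of underflowing for anyone
    // who is already past retirement age.
    Years(RETIREMENT_AGE.0.saturating_sub(current_age.0))
}

fn main() {
    assert_eq!(time_to_retirement(Years(40)), Years(25));
    assert_eq!(time_to_retirement(Years(70)), Years(0));
}
```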
Strings are a common use case for Newtypes, since you can use them to add validation around formatting of the string. For example, in South Africa, ID Numbers have a set format that you can validate against.
Problem 1: How Do I Construct The Newtype?
You may have noticed in the examples above that the Newtype itself is public, but the internal data is private. In its current form, this code won't work:
```rust
// Usually other modules would actually be in a different file, but
// this isn't a normal project, it's a blog article! After this example
// we won't explicitly be putting our Newtypes in a different module
// to simplify the examples.
mod some_module {
    pub struct PhoneNumber(String);
}

fn main() {
    // You would be able to access the private inner string directly
    // like this if this code was in the same module as the newtype,
    // but for the rest of the codebase this will fail.
    let num = some_module::PhoneNumber("555-12345".to_string());
    println!("{}", num.0)
}
```
```text
error[E0603]: tuple struct constructor `PhoneNumber` is private
 --> rust-src-b5wQbx.rs:9:28
  |
2 |     pub struct PhoneNumber(String);
  |                            ------ a constructor is private if any of the fields is private
...
9 |     let num = some_module::PhoneNumber("555-12345".to_string());
  |                            ^^^^^^^^^^^ private tuple struct constructor
  |
note: the tuple struct constructor `PhoneNumber` is defined here
 --> rust-src-b5wQbx.rs:2:5
  |
2 |     pub struct PhoneNumber(String);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0616]: field `0` of struct `some_module::PhoneNumber` is private
  --> rust-src-b5wQbx.rs:10:24
   |
10 |     println!("{}", num.0)
   |                        ^ private field
```
How exactly you handle this will depend on your type. Generally speaking, you can give your type some functions, like a constructor function and some function to get the data out. This would work:
```rust
pub struct PhoneNumber(String);

impl PhoneNumber {
    pub fn new(s: String) -> PhoneNumber {
        PhoneNumber(s)
    }

    pub fn as_str(&self) -> &str {
        // We didn't name the inner field, so it follows the same
        // naming convention as tuples. In other words, the inner
        // field is called `0`.
        &self.0
    }
}

fn main() {
    let num = PhoneNumber::new("555-1234".to_string());
    println!("{}", num.as_str())
}
```
You can add as many other functions as you want here. It's a great place to put any domain logic you might have around your data. For example, different contexts might require phone numbers in different standard formats, or you might be able to use the start of the phone number to figure out which country it refers to.
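As a rough illustration of that kind of domain logic, the sketch below adds two hypothetical helpers, digits and is_international. Treating a leading '+' as the marker of an international number is a simplifying assumption; real phone number handling is considerably more involved.

```rust
pub struct PhoneNumber(String);

impl PhoneNumber {
    pub fn new(s: String) -> PhoneNumber {
        PhoneNumber(s)
    }

    /// Hypothetical helper: the number with all formatting
    /// (spaces, dashes, brackets, '+') stripped, keeping only digits.
    pub fn digits(&self) -> String {
        self.0.chars().filter(|c| c.is_ascii_digit()).collect()
    }

    /// Hypothetical helper: assumes a leading '+' marks a number that
    /// starts with an international country code.
    pub fn is_international(&self) -> bool {
        self.0.starts_with('+')
    }
}

fn main() {
    let num = PhoneNumber::new("+27 (21) 555-1234".to_string());
    println!("{} international={}", num.digits(), num.is_international());
}
```

Because the logic lives on the Newtype, every caller gets the same normalisation rules instead of re-implementing them ad hoc.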
Some Useful Standard Library Traits
The two example functions I used above, constructing your type from a string and formatting the data as a string, seem like they would come up when looking at many different types. In fact, the Rust standard library has a number of traits that it makes sense to implement for your Newtype. Implementing the standard library traits rather than just your own functions will make it easier to use your Newtype together with the standard library, and many other Rust libraries. Let's take a look at some of them.
FromStr and Display
If, like in our phone number example, we're specifically interested in working with strings, then there are two traits from the standard library that we should implement: FromStr and Display.
```rust
pub struct PhoneNumber(String);

use std::str::FromStr;

impl FromStr for PhoneNumber {
    type Err = Box<dyn std::error::Error>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PhoneNumber(s.to_string()))
    }
}

use std::fmt;

impl fmt::Display for PhoneNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // parse() uses FromStr
    let num: PhoneNumber = "555-1234".parse()?;
    // you can also call from_str directly
    let num = PhoneNumber::from_str("555-1234")?;
    // Display gives you a to_string function
    let num_as_string = num.to_string();
    // Display can also be called directly by println! or format!
    println!("Phone number is {}", num);
    Ok(())
}
```
Deref
If you're wrapping a string, implementing Deref can also be useful. It will let you pass your string-wrapping Newtype into functions that require a &str.
More generally, Deref is useful if you want to tell the compiler that, if it needs to, it can take an immutable reference to the data you're wrapping.
```rust
pub struct PhoneNumber(String);

use std::str::FromStr;

impl FromStr for PhoneNumber {
    type Err = Box<dyn std::error::Error>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PhoneNumber(s.to_string()))
    }
}

use std::ops::Deref;

impl Deref for PhoneNumber {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let num = PhoneNumber::from_str("555-1234")?;
    // Deref can be called when we take a reference. The function
    // takes a &str and our type can Deref from &PhoneNumber to &str.
    print_strings(&num);
    Ok(())
}

fn print_strings(s: &str) {
    println!("I've been asked to print {}", s);
}
```
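Deref doesn't only kick in for function arguments. Method lookup also auto-dereferences, so once PhoneNumber implements Deref to str, every method on str becomes callable on a PhoneNumber as if it were its own. This minimal sketch shows len and starts_with working even though PhoneNumber defines neither.

```rust
use std::ops::Deref;

pub struct PhoneNumber(String);

impl Deref for PhoneNumber {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

fn main() {
    let num = PhoneNumber("555-1234".to_string());
    // PhoneNumber has no `len` or `starts_with` method of its own.
    // Method lookup auto-derefs &PhoneNumber to &str and finds str's methods.
    println!("{} {}", num.len(), num.starts_with("555"));
}
```

This convenience is also the cost: the whole str API is now effectively part of PhoneNumber's public interface.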
Deref is called implicitly by the compiler, so having an implementation of Deref may affect which functions the compiler calls when you're using your type. It's meant for implementing smart pointers. If you want functionality similar to Deref but don't want the compiler to call it implicitly, a good alternative is to add your own function with a name of your choice, like as_str.
```rust
pub struct PhoneNumber(String);

use std::str::FromStr;

impl FromStr for PhoneNumber {
    type Err = Box<dyn std::error::Error>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PhoneNumber(s.to_string()))
    }
}

impl PhoneNumber {
    fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let num = PhoneNumber::from_str("555-1234")?;
    // Since we didn't implement Deref, the compiler can't convert to
    // a string implicitly, but it's still possible for us to do that
    // dereferencing explicitly.
    print_strings(num.as_str());
    Ok(())
}

fn print_strings(s: &str) {
    println!("I've been asked to print {}", s);
}
```
Implementing Deref and implementing your own function that returns a reference to your wrapped data have one thing in common: both directly expose the data you're wrapping. In many cases this is a leaky abstraction, so do it with caution, and only if you want the whole internal type to be part of your public API.
From, Into, TryFrom, and TryInto
Implementing FromStr and Display is fine when you're wrapping strings, but what if you're wrapping something else? That's where From and Into come in, along with their fallible cousins TryFrom and TryInto.
If you implement From<T> for your Newtype, then your Newtype can be created from a T. Into<T> is the other side of From: if you implement Into<T> for your type, then your type can be converted into a T. Of these two, you should always implement From, and the standard library will automatically implement the corresponding Into for you. For example, implementing From<u32> for Years gives you Into<Years> for u32.
```rust
#[derive(Clone, Copy)]
pub struct Years(u32);

impl From<u32> for Years {
    fn from(val: u32) -> Years {
        Years(val)
    }
}

fn main() {
    // We can call from directly
    let years = Years::from(10);
    // By implementing `From<u32> for Years`, we also get
    // `Into<Years> for u32` for free!
    let years: Years = 10.into();
}
```
Since Rust 1.41 (released in January 2020), you never actually need to implement Into by hand. Previously, you weren't able to implement From in certain situations because of the orphan rule, and so would implement Into instead. This was improved in Rust 1.41. Long story short: implement From, not Into.
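One place where the free Into implementation pays off is in function parameters. The sketch below uses a hypothetical describe_age function that accepts anything convertible into Years, so callers can pass either a Years or a plain u32.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Years(u32);

impl From<u32> for Years {
    fn from(val: u32) -> Years {
        Years(val)
    }
}

// Hypothetical helper: accepts anything that converts into Years.
// u32 qualifies through the Into impl the standard library derives from
// our From<u32>, and Years itself qualifies through the reflexive From<T> for T.
pub fn describe_age(age: impl Into<Years>) -> String {
    let years: Years = age.into();
    format!("{} years", years.0)
}

fn main() {
    println!("{}", describe_age(Years(30)));
    println!("{}", describe_age(30u32));
}
```

Use this sparingly: accepting a raw u32 is convenient, but it also lets callers skip the extra meaning the Newtype was supposed to carry.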
From and Into are useful when the conversion will always succeed, but this isn't always the case. Sometimes we want a conversion that succeeds for some inputs and rejects others with a validation error. That's where TryFrom and TryInto come in. They work the same way, but return a Result.
```rust
#[derive(Clone, Copy)]
pub struct Years(u32);

use std::convert::TryFrom;
use std::convert::TryInto;

impl TryFrom<u64> for Years {
    type Error = &'static str;

    fn try_from(val: u64) -> Result<Years, Self::Error> {
        if val > u32::MAX as u64 {
            Err("Number out of range")
        } else {
            Ok(Years(val as u32))
        }
    }
}

fn main() {
    // We can call try_from directly
    let years = Years::try_from(30u64);
    assert!(years.is_ok());
    // By implementing `TryFrom<u64> for Years`, we also get
    // `TryInto<Years> for u64` for free!
    let error: Result<Years, &'static str> = u64::MAX.try_into();
    assert!(error.is_err());
}
```
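When the validation is just a range check between integer types, a possible alternative is to delegate to the standard library, which already implements TryFrom<u64> for u32. This sketch reuses std's TryFromIntError as the error type instead of a string; that is a design choice for the example, not a requirement.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Years(u32);

impl TryFrom<u64> for Years {
    // Reuse the standard library's error type instead of a string.
    type Error = TryFromIntError;

    fn try_from(val: u64) -> Result<Years, Self::Error> {
        // std's TryFrom<u64> for u32 fails for anything above u32::MAX;
        // on success we wrap the u32 in our Newtype.
        u32::try_from(val).map(Years)
    }
}

fn main() {
    assert_eq!(Years::try_from(30u64).ok(), Some(Years(30)));
    assert!(Years::try_from(u64::MAX).is_err());
}
```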
You may be wondering why FromStr exists when we could implement TryFrom<&str>. I think the only real reason here is legacy. FromStr was part of the original 1.0 release of Rust in 2015. TryFrom, on the other hand, was stabilized in version 1.34 of Rust, released in April 2019.
Arithmetic Operators
When you're wrapping numbers, you may still want to be able to do math with them. For example, you could have two durations in years and want to add them together. The traits you're probably interested in implementing are Add, Sub, Mul, and Div in std::ops.
```rust
#[derive(Clone, Copy)]
pub struct Years(u32);

impl From<u32> for Years {
    fn from(val: u32) -> Years {
        Years(val)
    }
}

use std::fmt;

impl fmt::Display for Years {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} years", self.0)
    }
}

use std::ops::Add;

impl Add for Years {
    type Output = Years;

    fn add(self, rhs: Years) -> Years {
        Years(self.0 + rhs.0)
    }
}

fn main() {
    let age_1 = Years::from(5);
    let age_2 = Years::from(2);
    // Prints "5 years + 2 years = 7 years"
    println!("{} + {} = {}", age_1, age_2, age_1 + age_2);
}
```
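The right-hand side of an operator doesn't have to be the same type as the left. In this sketch, subtracting one Years from another gives a Years, but multiplication is only defined against a plain u32, since "years times years" has no sensible meaning for an age. Which combinations to allow is a per-type judgment call.

```rust
use std::ops::{Mul, Sub};

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Years(u32);

impl Sub for Years {
    type Output = Years;

    fn sub(self, rhs: Years) -> Years {
        // Panics on underflow in debug builds, like plain u32 subtraction.
        Years(self.0 - rhs.0)
    }
}

// Years * Years would be "years squared", which isn't meaningful here,
// so this sketch only allows scaling by a plain number.
impl Mul<u32> for Years {
    type Output = Years;

    fn mul(self, rhs: u32) -> Years {
        Years(self.0 * rhs)
    }
}

fn main() {
    let term = Years(5);
    let elapsed = Years(2);
    assert_eq!(term - elapsed, Years(3));
    assert_eq!(term * 3, Years(15));
}
```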
Implementing Newtype Traits the Easy Way: Derive More
I can see why you might look at the code snippets above and think this is more work than it's worth. There's an easier way!
My personal favourite crate for making the Newtype pattern easier to implement is Derive More. Derive More uses procedural macros to generate the same boilerplate code that we were writing by hand before.
```rust
// cargo-deps: derive_more = "0.99"
extern crate derive_more;

use derive_more::{FromStr, Display};

#[derive(FromStr, Display)]
pub struct PhoneNumber(String);

use std::str::FromStr;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // This is the same usage as before.
    let num: PhoneNumber = "555-1234".parse()?;
    let num = PhoneNumber::from_str("555-1234")?;
    let num_as_string = num.to_string();
    println!("Phone number is {}", num);
    Ok(())
}
```
Similarly, Derive More can implement the traits useful for numbers too.
```rust
// cargo-deps: derive_more = "0.99"
extern crate derive_more;

use derive_more::{From, Display, Add};

#[derive(Clone, Copy, From, Display, Add)]
#[display(fmt = "{} years", _0)]
pub struct Years(u32);

fn main() {
    let age_1 = Years::from(5);
    let age_2 = Years::from(2);
    println!("{} + {} = {}", age_1, age_2, age_1 + age_2);
}
```
Parse, don't validate
So far, my examples have simply wrapped primitives and let you put any value into them. You can take things a step further: have your types validate as you parse into them, and reject invalid values.
That is, you can use the type system to indicate when you've already done validation. For example, if you have a rule like "ID numbers must be exactly 13 digits long", then instead of passing around a string and checking its length everywhere, parse it into a dedicated IdNumber type. If the string isn't a valid ID number, parsing fails. This means that everywhere else in your program, if you're taking in an IdNumber, you know that it already has a valid value.
```rust
// cargo-deps: derive_more = "0.99"
extern crate derive_more;

use derive_more::Display;

#[derive(Display, Debug, PartialEq)]
pub struct IdNumber(String);

use std::str::FromStr;

impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.len() != 13 {
            Err(IdNumberParseError::InvalidFormat)
        } else {
            Ok(IdNumber(s.to_string()))
        }
    }
}

#[derive(Display, Debug, PartialEq)]
pub enum IdNumberParseError {
    InvalidFormat,
}

impl std::error::Error for IdNumberParseError {}

fn main() {
    let id = IdNumber::from_str("12345");
    assert_eq!(id, Err(IdNumberParseError::InvalidFormat));

    let id = IdNumber::from_str("1234567890123").unwrap();
    println!("My ID Number is {}", id);
}
```
This approach of putting more information into your type system is sometimes referred to as type driven development. Alexis King has summed it up into the snappy phrase "Parse, don't validate".
It's even better if you can represent your type such that invalid data is unrepresentable. For example, in our ID number example, ID numbers have to be exactly 13 characters long. We could represent this as struct IdNumber([char; 13]); to make invalid lengths unrepresentable. Usually this strays beyond the Newtype pattern, so I'm not going to go down this rabbit hole in this article, but a good example to look at is the Rust standard library's implementation of IpAddr.
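For the curious, here is a minimal sketch of the fixed-length idea. It assumes a [u8; 13] of ASCII digits rather than the [char; 13] mentioned above, which keeps each digit to one byte. Only the length is enforced by the type itself; the digits-only rule is still checked once, at parse time.

```rust
use std::fmt;
use std::str::FromStr;

// Exactly 13 bytes, so no other length can be represented.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct IdNumber([u8; 13]);

#[derive(Debug, PartialEq)]
pub enum IdNumberParseError {
    InvalidFormat,
}

impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let bytes = s.as_bytes();
        // Reject anything that isn't exactly 13 ASCII digits.
        if bytes.len() != 13 || !bytes.iter().all(|b| b.is_ascii_digit()) {
            return Err(IdNumberParseError::InvalidFormat);
        }
        let mut digits = [0u8; 13];
        digits.copy_from_slice(bytes);
        Ok(IdNumber(digits))
    }
}

impl fmt::Display for IdNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Each stored byte is an ASCII digit, so it maps directly to a char.
        for &b in &self.0 {
            write!(f, "{}", b as char)?;
        }
        Ok(())
    }
}

fn main() {
    assert_eq!("12345".parse::<IdNumber>(), Err(IdNumberParseError::InvalidFormat));
    let id: IdNumber = "1234567890123".parse().unwrap();
    println!("My ID Number is {}", id);
}
```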
StructOpt and Serde
Sometimes, the place that the data comes into your program is managed by some other library. If your program is a web server, you're probably receiving data in JSON format and deserializing it with Serde. If you're writing a command line application, you're probably parsing command line arguments using something like StructOpt.
StructOpt uses FromStr to parse command line arguments. If you've already implemented FromStr for your type with any necessary validation, then StructOpt gives you your input validation for free!
Serde makes things a little more complicated. Serde has a "transparent" option for Newtypes, which bypasses your wrapper and works directly with the value inside. While this seems like a good idea for serializing, when deserializing it bypasses any validation logic you've put into your FromStr or TryFrom implementation. Luckily, Serde also lets you point it at TryFrom and Into implementations instead, using its try_from and into container attributes.
For strings, this means that you need to implement TryFrom<String> as well as FromStr, even though the two are basically the same thing. You will also need Into<String>, even though you'd normally just implement Display, and your type must implement Clone, which Serde's into attribute requires. Luckily, Derive More can implement Into for us.
Derive More implementing Into is actually a bit of a misnomer. When you ask Derive More to derive Into, it actually derives From on the type you're converting into. As I mentioned above, this causes the standard library to give you the implementation of Into that you were after anyway, so it works out the same in the end.
````rust
//! ```cargo
//! [dependencies]
//! derive_more = "0.99"
//! serde = { version = "1", features = ["derive"] }
//! serde_json = "1"
//! ```
extern crate derive_more;
extern crate serde;
extern crate serde_json;

use derive_more::{Display, Into};
use serde::{Serialize, Deserialize};

// Derive More's Into generates `impl From<IdNumber> for String`.
// Serde's `into = "String"` also requires Clone.
#[derive(Display, Debug, Clone, Into, Serialize, Deserialize)]
#[serde(try_from = "String", into = "String")]
pub struct IdNumber(String);

use std::str::FromStr;

impl FromStr for IdNumber {
    type Err = IdNumberParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.len() != 13 {
            Err(IdNumberParseError::InvalidFormat)
        } else {
            Ok(IdNumber(s.to_string()))
        }
    }
}

use std::convert::TryFrom;

impl TryFrom<String> for IdNumber {
    type Error = IdNumberParseError;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        // This is boilerplate code, but don't rely on Derive More's
        // TryFrom for this. We need it to go through our
        // implementation of FromStr so that it follows our validation
        // rules.
        value.parse()
    }
}

#[derive(Display, Debug, PartialEq)]
pub enum IdNumberParseError {
    InvalidFormat,
}

impl std::error::Error for IdNumberParseError {}

fn main() {
    let json = "\"1234567890123\"".to_string();
    let deserialized: IdNumber = serde_json::from_str(&json).unwrap();
    println!("deserialized = {:?}", deserialized);

    // Invalid input is rejected by our FromStr validation during deserialization.
    let invalid: Result<IdNumber, _> = serde_json::from_str("\"12345\"");
    assert!(invalid.is_err());
}
````
Summary
- Use the Newtype pattern to add more information to the type system.
- Implement FromStr or TryFrom<T> for types that are the primitive equivalent of your Newtype. FromStr is generally useful even if your type doesn't wrap a string.
- Implement Display and Into<T> (by implementing From<YourNewtype> for T) to make it easier to integrate with external code that doesn't know about your Newtype.
- Implement Deref if it's fine for code to (immutably) bypass your Newtype and read the data inside it directly.
- For numeric types, consider implementing arithmetic operators like Add. Only do this if it makes sense to actually use those arithmetic operators on your types.
- If you use Serde, have it implement Serialize and Deserialize in terms of your TryFrom<T> and Into<T> implementations. This lets you enforce your type's validation through Serde.
- If you use StructOpt, enjoy the free validation from implementing FromStr.
Where Have I Been Using The Newtype Pattern?
I'm currently working at Panoptix, where we've gone all in on using Rust end to end. This includes server side backend code, command line utilities, and even a web front end. I've been using the Newtype pattern as a way to encode validation logic into the type system.
The same validation logic is called on all entry points to the system, whether it's front end validation as someone's typing on a form, a CLI parameter, or embedded in an API payload. The best part is that the validation is called exactly once, and you can't forget to call it (or accidentally bypass it while refactoring), because it's baked into the type system.
Refactoring existing code to use Newtypes was also fairly painless: change the type in a few places and follow the compilation errors to find where the data goes.
This is definitely a pattern that I would recommend trying out.