Heart for people
Mind for tech

Making illegal states unrepresentable

A powerful pattern with a hidden cost

Nahuel Manterola

Engineering

Terminal with error

Written by

Nahuel Manterola
Software Developer

Being a diverse group of pretty opinionated people, there are very few topics we don't have discussions about at Baseflow. Today, I'd like to tell you about something that has been simmering in my head for a while now: my take on 'making invalid states unrepresentable' or 'how strict should I make my models?'.

I think this subject is best illustrated with examples, and I'll be giving these in Rust. Partly out of familiarity, but also because its algebraic data types are a very natural way of describing this concept. Still, I want to point out that this concept is not at all language-specific, and can be used anywhere you want. For following along though, I'll assume you have a basic understanding of Rust's rich enums.

Where does this concept shine?

The phrase 'Making illegal states unrepresentable' was first coined by Yaron Minsky, who shows us some examples in OCaml. It is no coincidence that OCaml was a major inspiration for Rust's type system. Minsky describes an example where we model the state of a TCP connection:

enum ConnectionState {
    Connecting,
    Connected,
    Disconnected,
}

struct ConnectionInfo {
    state: ConnectionState,
    server: IpAddr,
    last_ping_time: Option<SystemTime>,
    last_ping_id: Option<i32>,
    session_id: Option<String>,
    when_initiated: Option<SystemTime>,
    when_disconnected: Option<SystemTime>,
}

This model has some serious bugs waiting around the corner, though. For instance: it allows state: Connected without having a session_id, and you could have either one of last_ping_time and last_ping_id without having the other. This requires a lot of manual checks at runtime to ensure compliance. As an alternative, he introduces the following model:

enum ConnectionState {
    Connecting {
        when_initiated: SystemTime,
    },
    Connected {
        session_id: String,
        last_ping: Option<(SystemTime, i32)>,
    },
    Disconnected {
        when_disconnected: SystemTime,
    },
}

struct ConnectionInfo {
    server: IpAddr,
    state: ConnectionState,
}

Here, the variables necessary for each state are clearly defined, and cannot be missing. It becomes impossible to compile code that tries to move into a state without supplying these variables. As a result, nobody needs to check whether all necessary information is there afterwards. This is the core essence of 'making illegal states unrepresentable', and clearly showcases its strength.

The hidden cost of strictness

In this case, the updated model is an incredible improvement. But I think it's important to highlight two reasons why this approach is such a great fit here:

  • The domain we're working in is very unlikely to change.
  • The type system has a natural way of describing the domain

This also means I think it could be a bad idea to have such strict types in other situations. We could go into great detail about huge choices that require large refactors, but I think it comes down to this trade-off: you're defining your model based on assumptions about the domain you're modelling. These assumptions are necessary, and make the model a lot easier to reason about than the entire domain. As long as they hold, you're reaping the rewards of structure and predictability. But if an assumption breaks because of new requirements, you're now stuck with technical debt. It's our job as engineers to balance the safety of strict types against the flexibility required by the real world.

Example: The Falsehood of Names

I've tried to come up with an example that's both easy to understand, but still highlights the potentially large refactoring cost of being too specific in your model. This example succeeds in at least one of these requirements:I'm developing an application in the Netherlands and because I know everybody has a last name, I can define my user name like this:

struct UserName {
    first_name: String,
    last_name: String, // Required!
}

This is convenient, as it allows me to sprinkle easy functions all through my code:

let email_header = format!("Dear Mx {},", user.last_name); // Look mom, no checks!
let initials = format!(
	"{}{}", 
	user.first_name.chars().next().unwrap(), // You'll handle this gracefully in production, right?
	user.last_name.chars().next().unwrap()
);

You probably see where this is going: we're going multinational and suddenly I have to deal with users that don't legally have a last name (and tons of other unexpected differences). Because I locked down the last name as required, I have been writing code where I don't check if there is a last name. Granted, fixing this will probably be doable, although it might require a database migration and updating API contracts. But if you make assumptions like these in large objects instead of single fields, a change in business requirements can cause a much larger mismatch between your model and new data.

What does this come down to?

Strict typing is an important tool, but it acts as a cement for your current understanding of the domain. When you know the domain laws are absolute (like the TCP states in Minsky's example), strict types are a powerful way of guaranteeing correctness.

However, when you are still discovering the domain or operating in a fluid environment (like human names or business rules), that same cement can trap you. The goal here isn't to make every invalid state unrepresentable, but to ensure your type system is flexible enough to change along with the requirements of tomorrow.

So whenever you're modeling a new domain, consider which assumptions you are making. Are these hard specifications, or soft business rules of today? If it's the first, go on ahead and pour that cement. If it's the latter, leave yourself some wiggle room.

Curious about how we apply these principles in production? See our engineering approach.