Data Boundaries

By Justin Hewlett - February 28, 2020


Modern applications increasingly interface with many external systems: third-party APIs, databases, message queues, and so on. In some cases we control the shape of the data being transmitted, but in others we are at the mercy of another system. If we’re not careful, implementation details like the shape of the data and the naming of the fields can permeate our code. Domain-driven design (DDD) recommends building an anti-corruption layer to isolate our domain from these outside systems.

Let’s dig into a practical example of how this tension might play out for a back-end web app. The examples are in TypeScript, but the concepts apply to other ecosystems as well.

What Not to Do

Let’s start by looking at a “bad” way of querying a database and returning some data in an API endpoint:

app.get('/users', async (req, res) => {
    const users = await query('SELECT * FROM users');

    res.send({ users })
})

You might think this is a bit under-engineered, and you’d be right. Let’s talk about some concrete limitations of the approach.

Is it just that all the code is in the route handler? That’s certainly a symptom, but just extracting, say, a data access layer for our query doesn’t help much. Why? Notice that we haven’t actually declared any TypeScript types. Further, we did a SELECT * rather than pulling out specific columns. We’re taking whatever is returned from our query and stuffing it in a JSON payload.

What happens when a column is added to our table? Are we going to accidentally expose some data we didn’t intend to, or break a consumer? Will newcomers to the codebase be able to see at a glance which data gets returned? And if we want to remove a column that we think is unused, will we be able to tell whether anything depends on it?

Typing the Database Response

Let’s try again:

type UserRow = {
    name: string,
    date_of_birth: number
}

app.get('/users', async (req, res) => {
    const users: UserRow[] = await query('SELECT * FROM users');

    res.send({ users })
})

This is slightly better in that it gives us an idea of the type of data coming back from the database. It also gives us some type safety if we need to do any processing of the data. It still has some problems though.

One Representation for Many Uses

We’ve been using a single type to represent the user data, from the time it comes out of the database all the way to serializing it to JSON. One problem with this is that everything is coupled to how data is represented in the database. Another problem, somewhat TypeScript-specific, is that our UserRow type gets erased at runtime. If any columns besides name and date_of_birth get returned from the query, they’ll leak through to the JSON response, even though they don’t appear on our UserRow type.
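To see the leak concretely, here is a minimal sketch. The password_hash column and the hard-coded row are assumptions standing in for whatever a real SELECT * might return; they are not part of the example schema.

```typescript
type UserRow = {
    name: string,
    date_of_birth: number
}

// Hypothetical result of `SELECT * FROM users` after someone adds a
// password_hash column. A real driver hands back untyped data like this.
const rawRows: unknown = [
    { name: 'Ada', date_of_birth: 0, password_hash: 'secret' }
]

// The assertion satisfies the compiler, but it is erased at runtime:
// nothing checks the rows or strips the extra field.
const users = rawRows as UserRow[]

// The serialized payload still contains password_hash.
const leakedBody = JSON.stringify({ users })
```

The compiler is happy with this code, which is exactly the problem: the type describes what we hope is there, not what is actually there.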

DTOs and Domain Objects

DDD makes a distinction between Domain objects and Data Transfer Objects (DTOs). DTOs are the objects at the edges of our application for interfacing with other systems. These often use primitives and simple data structures. The shape is often dictated by something else, such as a third-party API or a database schema. Domain objects are the objects that we create in the core of our program, structured how we want them to be, with names that are meaningful to us.

Let’s create separate types for the data coming out of the database, a domain representation, and a serialized (JSON) representation:

// DTO for database table
type UserRow = {
    name: string,
    date_of_birth: number
}

// Domain object
type Name = string
type User = {
    name: Name,
    dateOfBirth: Date
}

// DTO for API response
type UserResponse = {
    name: string,
    dateOfBirth: string
}

We have three distinct types that are similar in structure, but with their own considerations.

For example, UserRow has snake-case keys and uses a number (a Unix timestamp) to read date_of_birth from the database.

On our Domain object, dateOfBirth is camel case and converted to a Date. name has a Name type that is just an alias to string at the moment, but does carry some additional semantic meaning and could evolve to be a distinct type with special validation in the future.

On the API response, dateOfBirth is converted to a string so that we can explicitly decide how to format the date.
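As one possible direction for Name, here is a sketch of a "branded" string with a validating constructor. The __brand property, the parseName function, and the non-empty rule are all hypothetical; the point is only that callers could no longer pass an arbitrary string where a Name is expected.

```typescript
// A branded type: still a string at runtime, but the compiler treats it
// as distinct from plain strings. The __brand property never actually exists.
type Name = string & { readonly __brand: 'Name' }

// Hypothetical validation rule: a name must contain at least one
// non-whitespace character. This is the only place a Name gets created.
const parseName = (raw: string) : Name => {
    const trimmed = raw.trim()
    if (trimmed.length === 0) {
        throw new Error('Name must not be empty')
    }
    return trimmed as Name
}
```

Because the rest of the domain would only ever see Name, a change like this could stay confined to the mapping functions at the edges.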

Even if we had ended up with types that were identical, it’s still useful to create them as separate types and map between them. This will allow the types to diverge as needed in the future. This may seem to fly in the face of Don’t Repeat Yourself (DRY), but the spirit of DRY is to consolidate things that have the same semantic meaning, not to combine things that happen to have the same structure.

Here are the functions we’ll define to map between our types:

const fromUserRow = (row: UserRow) : User => ({
    name: row.name,
    // Date expects milliseconds; multiply by 1000 first if the column stores seconds
    dateOfBirth: new Date(row.date_of_birth)
})

const toUserResponse = (user: User) : UserResponse => ({
    name: user.name,
    dateOfBirth: user.dateOfBirth.toUTCString()
})

Here’s our new route handler:

app.get('/users', async (req, res) => {
    const rows: UserRow[] = await query('SELECT name, date_of_birth FROM users');

    const users: User[] = rows.map(fromUserRow)
    //domain logic here, using our `User` Domain object

    const userResponse: UserResponse[] = users.map(toUserResponse)

    res.send({
        data: userResponse
    })
})

(Note that we’re now explicitly selecting out the columns we need in our query as well, which supports our efforts to be explicit about the data as it flows through the various stages.)
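As a quick check that explicit mapping closes the leak described earlier, the sketch below pushes a row with an unexpected column through both mapping functions. The password_hash field and the hard-coded row are assumptions for illustration only.

```typescript
type UserRow = { name: string, date_of_birth: number }
type User = { name: string, dateOfBirth: Date }
type UserResponse = { name: string, dateOfBirth: string }

const fromUserRow = (row: UserRow) : User => ({
    name: row.name,
    dateOfBirth: new Date(row.date_of_birth)
})

const toUserResponse = (user: User) : UserResponse => ({
    name: user.name,
    dateOfBirth: user.dateOfBirth.toUTCString()
})

// Hypothetical row that carries a column we never asked for.
const rawUserRows: unknown = [
    { name: 'Ada', date_of_birth: 0, password_hash: 'secret' }
]
const userRows = rawUserRows as UserRow[]

// Each mapper copies only the fields it names, so the stray column is dropped.
const responseBody = JSON.stringify({
    data: userRows.map(fromUserRow).map(toUserResponse)
})
// responseBody is '{"data":[{"name":"Ada","dateOfBirth":"Thu, 01 Jan 1970 00:00:00 GMT"}]}'
```

The mappers build new objects field by field rather than spreading the input, which is what keeps unknown properties from riding along.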

Splitting Into Layers

It may seem a little silly seeing all three types used in a single function. In real life, we might split this up into different files based on the concerns — perhaps a layer for the route handler, a layer for the domain and business logic, and a layer for data access:

//users.route.ts

import { getUsers } from './users.db'
import { User, transformUsers } from './users.domain'

type UserResponse = {
    name: string,
    dateOfBirth: string
}

const toUserResponse = (user: User) : UserResponse => ({
    name: user.name,
    dateOfBirth: user.dateOfBirth.toUTCString()
})

app.get('/users', async (req, res) => {
    const usersFromDb: User[] = await getUsers()

    const users: User[] = transformUsers(usersFromDb)

    const userResponse: UserResponse[] = users.map(toUserResponse)

    res.send({
        data: userResponse
    })
})

//users.domain.ts

export type Name = string
export type User = {
    name: Name,
    dateOfBirth: Date
}

export const transformUsers = (users: User[]) : User[] => {
    //domain logic here

    return users
}

//users.db.ts

import { User } from './users.domain'

type UserRow = {
    name: string,
    date_of_birth: number
}

const fromUserRow = (row: UserRow) : User => ({
    name: row.name,
    // Date expects milliseconds; multiply by 1000 first if the column stores seconds
    dateOfBirth: new Date(row.date_of_birth)
})

export const getUsers = async () : Promise<User[]> => {
    const userRows: UserRow[] = await query('SELECT name, date_of_birth FROM users');
    
    return userRows.map(fromUserRow)
}

We’ve isolated UserResponse and UserRow in their own files at the edges of our app. We don’t even export those types: each one is mapped to or from our User Domain type right at the boundary.

transformUsers is a stand-in for some domain logic. Again, it speaks in terms of User and has no concept of how the data is represented elsewhere.
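Purely as an illustration of what such domain logic might look like, here is a sketch of a rule that orders users from oldest to youngest. The oldestFirst function is hypothetical; what matters is that it needs nothing but the User type, so it can be exercised without a database or an HTTP server.

```typescript
type Name = string
type User = {
    name: Name,
    dateOfBirth: Date
}

// Hypothetical domain rule: order users from oldest to youngest.
// slice() copies the array so the caller's list is left untouched.
const oldestFirst = (users: User[]) : User[] =>
    users.slice().sort((a, b) => a.dateOfBirth.getTime() - b.dateOfBirth.getTime())
```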

Think about the flexibility this buys us! We can change the external representation of the data in a single file, without touching the domain code. Within our domain, we can use names that are meaningful to us, and not necessarily the ones used in other systems. We can use the appropriate data structures, e.g. use a Set in our Domain object, even if the JSON representation is an array.
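The Set idea might look something like the sketch below. The roles field and every name here are hypothetical and not part of the users example above; the JSON side keeps an array while the domain side gets a Set with cheap membership checks and no duplicates.

```typescript
// DTO: how the data looks as JSON (hypothetical roles field)
type UserRolesJson = {
    name: string,
    roles: string[]
}

// Domain object: a Set makes duplicates impossible and lookups direct
type UserWithRoles = {
    name: string,
    roles: Set<string>
}

const fromUserRolesJson = (json: UserRolesJson) : UserWithRoles => ({
    name: json.name,
    roles: new Set(json.roles)
})

const toUserRolesJson = (user: UserWithRoles) : UserRolesJson => ({
    name: user.name,
    // Sorted so the serialized output does not depend on insertion order
    roles: Array.from(user.roles).sort()
})

// Domain logic speaks only in terms of the Set
const isAdmin = (user: UserWithRoles) : boolean => user.roles.has('admin')
```

If the JSON contract later changes, say to a comma-separated string, only the two mapping functions would need to move.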

Types and Tests

I’ve been giving examples in TypeScript. While static types are helpful to document and check the separation that I’ve outlined, they’re not strictly required. Whether you use types or not, you’ll want some amount of unit and integration testing as well. For example, we’ve tried to capture our database schema in the UserRow type, but until we actually run the code, we can’t be completely sure we correctly mapped the column names and data types. (See my companion post, Taming Dynamic Data in TypeScript, for patterns to deal with dynamic data like this at the edges.)
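One small, hand-rolled way to narrow that gap is a runtime check at the edge. This is only a sketch, and not necessarily the approach the companion post takes; isUserRow and parseUserRows are hypothetical names.

```typescript
type UserRow = {
    name: string,
    date_of_birth: number
}

// Type guard: inspects the value at runtime and, if it passes,
// lets the compiler treat it as a UserRow from then on.
const isUserRow = (value: unknown) : value is UserRow => {
    if (typeof value !== 'object' || value === null) {
        return false
    }
    const candidate = value as Record<string, unknown>
    return typeof candidate['name'] === 'string'
        && typeof candidate['date_of_birth'] === 'number'
}

// Fails loudly on the first row that does not match, instead of letting
// a misnamed column surface later as undefined or an invalid Date.
const parseUserRows = (rows: unknown[]) : UserRow[] =>
    rows.map((row, index) => {
        if (!isUserRow(row)) {
            throw new Error(`Row ${index} does not match UserRow`)
        }
        return row
    })
```

A check like this complements tests rather than replacing them: it catches a schema mismatch when the code runs, and an integration test makes sure the code runs before it reaches production.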

Conclusion

I encourage you to find a place where you can try out this strict separation. Just like how practicing Test-Driven Development causes us to reflect on the design (in addition to producing a suite of tests), giving some thought to how you want to structure the data within your domain is a useful exercise in itself.
