A Spectrum of Code Reuse
The first rule of code reuse is: don’t.
Ok, that’s (mostly) a joke. But let’s explore some of the tradeoffs you make when deciding how and when to reuse code. Along the way, we’ll see some interesting parallels to branches in version control.
Consider the following project structure:
PS.DataAccess
PS.Web
    -> PS.DataAccess
PS.Listener
    -> PS.DataAccess
Here, PS.Web and PS.Listener are two applications deployed from the same codebase.
PS.DataAccess represents code that is used by both applications for accessing the database.
In our zeal for maximum modularity, we might be tempted to package up PS.DataAccess and reference it through our package manager (such as npm).
Doing so introduces a layer of indirection that we should be aware of.
Indirection Beware
Before, when we had a direct code dependency in the same repository, any time we made an update to PS.DataAccess, both of our applications would immediately get the new update (pending a deploy, of course).
If we’re referencing a package in our package manager, on the other hand, we now have the option to delay upgrading. The longer we wait, the more painful it will be to upgrade.
Further, it’s also possible for each of our applications to reference different versions of the package. That might seem like nice flexibility that lets us do things a little more piecemeal, but left unchecked we can end up with divergent versions (and behavior!) in production. That might be fine for generic utility libraries, but it’s pretty risky for something as domain-specific and critical as our data access code.
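To make the divergence concrete, here is a minimal sketch of a check we could run across our applications. Everything in it is an assumption for illustration: the `AppManifest` shape, the `ps-data-access` package name, and the version numbers are hypothetical, not taken from a real project.

```typescript
// Hypothetical manifest shape: an application name plus its pinned dependency versions.
interface AppManifest {
  name: string;
  dependencies: Record<string, string>;
}

// Groups applications by the version of `pkg` they reference.
function versionsInUse(apps: AppManifest[], pkg: string): Map<string, string[]> {
  const byVersion = new Map<string, string[]>();
  for (const app of apps) {
    const version = app.dependencies[pkg];
    if (version === undefined) continue; // this application does not use the package
    const users = byVersion.get(version) ?? [];
    users.push(app.name);
    byVersion.set(version, users);
  }
  return byVersion;
}

// True when at least two applications reference different versions of `pkg`.
function hasDivergentVersions(apps: AppManifest[], pkg: string): boolean {
  return versionsInUse(apps, pkg).size > 1;
}

// Hypothetical example: the two applications have drifted apart.
const apps: AppManifest[] = [
  { name: "PS.Web", dependencies: { "ps-data-access": "2.3.0" } },
  { name: "PS.Listener", dependencies: { "ps-data-access": "1.9.4" } },
];

console.log(hasDivergentVersions(apps, "ps-data-access")); // true
```

With a direct code reference in the same repository, this check is unnecessary: there is only ever one version.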
(If we’re not comfortable with this level of coupling — or is it cohesion? — between our applications, then perhaps they shouldn’t be sharing a database!)
Responding to Change
Let’s consider a concrete change that we might make in our data access code. Say we want to remove an old column from our database. To do so safely and without downtime, first we must stop reading from the column and deploy both applications. Then, we can stop writing to the column and deploy both applications. Finally, we can drop the column from the database.
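The ordering above can be sketched as a small function that looks at what is currently deployed and tells us the next safe step. The `AppState` shape and the step names are hypothetical; in practice this knowledge usually lives in our heads or in a checklist rather than in code.

```typescript
// Hypothetical snapshot of what each deployed application does with the old column.
interface AppState {
  name: string;
  readsColumn: boolean;
  writesColumn: boolean;
}

type NextStep = "stop-reading" | "stop-writing" | "drop-column";

// Returns the next safe step given what is currently running in production.
function nextStep(deployed: AppState[]): NextStep {
  // Reads must go first: no deployed code may depend on the column's contents.
  if (deployed.some((app) => app.readsColumn)) return "stop-reading";
  // Only once nothing reads the column can we stop writing to it.
  if (deployed.some((app) => app.writesColumn)) return "stop-writing";
  // Nothing touches the column any more, so dropping it is safe.
  return "drop-column";
}

// Hypothetical state: PS.Web has been updated, but PS.Listener has not been deployed yet.
const deployed: AppState[] = [
  { name: "PS.Web", readsColumn: false, writesColumn: true },
  { name: "PS.Listener", readsColumn: true, writesColumn: true },
];

console.log(nextStep(deployed)); // "stop-reading"
```

Note that the answer depends on every deployed application, not just the one we happened to change last.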
If we’re referencing the data access code directly, we’ll be encouraged to do the right thing, from both a code cleanliness and a safety standpoint, and deploy all the affected applications [1] at each step of the way.
If we’re referencing a package, we might make a mistake, such as dropping the column while one application still pins an older version that reads it, or simply leave the old column around longer than we’d like because that was the easy thing to do.
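With direct references, working out which applications to deploy comes down to walking the reference graph. Here is a rough sketch using the project structure from earlier. The hand-written `references` map is an assumption; a real setup would more likely derive it from project files or static analysis tooling.

```typescript
// Hypothetical reference map: each project lists the projects it references directly.
const references: Record<string, string[]> = {
  "PS.DataAccess": [],
  "PS.Web": ["PS.DataAccess"],
  "PS.Listener": ["PS.DataAccess"],
};

// The deployable applications in the codebase.
const applications = ["PS.Web", "PS.Listener"];

// True when `project` is `target` or references it, directly or transitively.
function dependsOn(project: string, target: string, seen = new Set<string>()): boolean {
  if (project === target) return true;
  if (seen.has(project)) return false; // already explored; also guards against cycles
  seen.add(project);
  return (references[project] ?? []).some((dep) => dependsOn(dep, target, seen));
}

// Applications that need a deploy after `changed` is modified.
function affectedApplications(changed: string): string[] {
  return applications.filter((app) => dependsOn(app, changed));
}

console.log(affectedApplications("PS.DataAccess")); // [ 'PS.Web', 'PS.Listener' ]
```

A change to PS.Web alone would yield only PS.Web, while any change to the shared data access code pulls in both applications.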
This Sounds Familiar
Do you see the similarity between this discussion and branches in version control? Think about it for a minute. I’ll wait.
A common phrase you’ll hear in the Continuous Integration world is, “Integrate early and often.” You may have also heard about trunk-based development. The basic idea is that unmerged code is a liability, and any branches that you create from the mainline (“trunk”) should be as short-lived as possible. This helps ensure there are minimal conflicts and that things get tested together early.
Using packages for internal code reuse discourages this early integration of code, just like branches left unmerged in version control.
Just like with branching, this doesn’t mean you should avoid package dependencies entirely. You may have a valid reason to move the code to another repository, such as when it’s shared between multiple teams. Just be aware of the extra indirection and potential maintenance cost.
Tradeoffs
As with most things in engineering, it’s a tradeoff.
There’s a spectrum that goes something like:
- Just copy the code
- Extract to a shared module in the same codebase; reference the code directly
- Share code through a package manager
- Extract a new application, in its own codebase, for the common functionality; interact through network calls or asynchronous communication
I like to start at the second option as a good default and move up or down as I feel it’s warranted.
Conclusion
Dependencies can be a liability. You likely have enough third-party packages to deal with. Think twice about making your own internal packages when a direct code reference (or copy!) will do.
[1] It may be tricky to determine which applications are affected by any given change to shared code. You might be able to use static analysis tooling to determine this; when you’re unsure, you can err on the side of deploying more applications than you think are necessary. This gets harder as the codebase grows, which is why you want to reserve this approach for cohesive applications that belong to the same Bounded Context, in DDD terminology.