bitHound Blog

Why we stopped vendoring our npm dependencies

At bitHound we work with npm dependencies a lot. We analyze yours, and we use and publish our own. But when it came to managing and deploying our projects and all their dependencies (there are a lot!), we came to a crossroads. On a sunny afternoon in June of 2014, a heated debate (understatement much?) took place at bitHound.

The topic? Should we check our dependencies in to our Git repo? (We only later discovered that this practice is called 'vendoring'.)

Why vendor?

At the time, we had a few leading arguments for vendoring our dependencies:

  1. Break away from relying on the registry's uptime. If npm was down, our deploys would bork. At the time, we didn't have appropriate systems in place to do easy rollbacks. As such, a botched deploy was a scary proposition.

  2. Our dependency versions, and those of their sub-dependencies, would effectively be pinned. We would know exactly which versions we were running. We were afraid of semver going awry: breaking changes being introduced, and us having to wade through sub-dependencies to determine what failed.

  3. What you run locally is what runs in production. Environments would actually match, because npm install wouldn't produce different results depending on the time or day it ran.
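To make the pinning worry concrete, here is a minimal sketch that lists which entries in a package.json are version ranges rather than exact versions. The helper name and the package names are made up for illustration, and the pattern only recognizes plain `x.y.z` versions, so treat it as a rough check rather than a full semver parser.

```javascript
'use strict';

// Matches a single exact version such as "1.2.3" or "1.2.3-beta.1".
// Anything else (^, ~, *, x-ranges, tags) is treated as "floating".
const EXACT_VERSION = /^=?v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// Hypothetical helper: returns the names of dependencies whose spec
// can resolve to different versions on different days.
function floatingDependencies(pkg) {
  const deps = pkg.dependencies || {};
  return Object.keys(deps)
    .filter((name) => !EXACT_VERSION.test(deps[name].trim()))
    .sort();
}

// Illustrative data standing in for a real package.json.
const pkg = {
  dependencies: {
    'pinned-lib': '1.4.2',
    'caret-lib': '^1.4.2',
    'tilde-lib': '~0.9.0',
    'any-lib': '*'
  }
};

console.log(floatingDependencies(pkg)); // [ 'any-lib', 'caret-lib', 'tilde-lib' ]
```

Note that even an exact version at the top level says nothing about that package's own dependencies, which is why the sub-dependencies were part of the worry.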

At the time, these arguments outweighed the opposing ones: automatically applied patches keep you up to date, the registry doesn't go down that often, and vendoring just feels messy. We took a vote and opted to try it as an experiment.

It turns out that experiment became our common practice for over a year. We discovered that committing your dependencies isn't as straightforward as committing your own code. For a few months we struggled with how to get ourselves out of a vendoring mess.

Our experience told a different story:

We came to begrudge the checked-in modules so much that we hit a tipping point in November. Looking back, vendoring caused more issues than it helped alleviate.

Reviewing pull requests that involve dependencies was a nightmare

Ever try to review a pull request in GitHub with several hundred files changed? Yeah...
In fact, you can't really do it. GitHub chokes at around the 301-file mark, and rightfully so! We found that any pull request that introduced a new dependency became a mess. Any code changes we made to our own app were either hidden in GitHub (it simply won't show you files past a certain limit) or, at best, hard to find in a sea of node_modules file check-ins. We developed some strategies to mitigate this, but nothing really worked well.

We started applying patches directly to our checked-in dependencies

Our node_modules became a dangerous minefield of hacks that, while necessary for our specific purposes, made it very difficult to maintain our dependencies. Some packages would stop working whenever we overwrote one of our own hacks, and remembering which ones became a problem. While we still issued pull requests to get these fixes and features into the main package, it was just a bit too easy to add hacks and then forget about them.

Managing uninstalls was spotty

Uninstalling a package is rather straightforward, except when the modules are checked in. Sometimes we'd remember to check in the file deletions for a package, and other times we'd just delete its entry from package.json. We started finding orphans: long-forgotten packages we no longer used. Since no one was really sure what was going on in that folder, they'd often stay.
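For a sense of what hunting for orphans looks like, here is a minimal sketch that compares the packages declared in package.json against the folder names found in node_modules. The function name and sample data are hypothetical and this is not the tooling we used. It only looks at top-level folders and does not handle scoped packages.

```javascript
'use strict';

// Hypothetical helper: returns installed top-level packages that
// package.json no longer declares anywhere.
function findOrphans(pkg, installedNames) {
  const declared = new Set([
    ...Object.keys(pkg.dependencies || {}),
    ...Object.keys(pkg.devDependencies || {}),
    ...Object.keys(pkg.optionalDependencies || {})
  ]);
  return installedNames
    .filter((name) => !name.startsWith('.')) // skip .bin and other dot-folders
    .filter((name) => !declared.has(name))
    .sort();
}

// Illustrative data standing in for package.json and a node_modules listing.
const pkg = {
  dependencies: { 'web-lib': '^4.13.0' },
  devDependencies: { 'test-lib': '^2.3.0' }
};
const installed = ['.bin', 'web-lib', 'test-lib', 'left-behind'];

console.log(findOrphans(pkg, installed)); // [ 'left-behind' ]
```

In a real project the inputs could come from `require('./package.json')` and `fs.readdirSync('node_modules')`. When node_modules is not checked in, `npm prune` does this cleanup for you by removing packages that package.json doesn't list.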

It felt weird

Yup – it felt weird. The large node_modules folder was an unwieldy sight in your Git tree. Can't say more than it was just a weird feeling.

Our new approach

With pending upgrades to both Node and Express, and a big upgrade already behind us, we knew it was time to unravel our dependencies and bring back simplicity.

Cleaning up our hacks

Rather than rely on committing hacks, we continue to issue pull requests to get the main packages updated. When that fails, we fork the package and use the fork (temporarily) until the main package is updated. At least with a fork, it becomes obvious where we are using custom versions of packages just by looking in our package.json.
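Because a fork shows up in package.json as a Git URL or a GitHub `user/repo` reference instead of a version range, spotting custom packages can be automated. The sketch below is one rough way to do it. The helper names, the `example-org` fork and the package names are all made up for illustration, and the patterns cover only the common spec forms (they will also flag relative local paths).

```javascript
'use strict';

// Rough check: does this dependency spec point somewhere other than the registry?
// Covers git/http(s) URLs, the "github:" prefix and the "user/repo#ref" shorthand.
function isNonRegistrySpec(spec) {
  return /^(git(\+(ssh|https?|file))?:|https?:|github:)/.test(spec) ||
    /^[\w.-]+\/[\w.-]+(#.+)?$/.test(spec);
}

// Hypothetical helper: lists every dependency installed from a custom source.
function listCustomSources(pkg) {
  const deps = Object.assign({}, pkg.dependencies, pkg.devDependencies);
  return Object.keys(deps)
    .filter((name) => isNonRegistrySpec(deps[name]))
    .sort()
    .map((name) => ({ name, spec: deps[name] }));
}

// Illustrative package.json contents; "example-org" is a made-up fork owner.
const pkg = {
  dependencies: {
    'regular-lib': '^2.1.0',
    'patched-lib': 'example-org/patched-lib#our-fix'
  },
  devDependencies: { 'test-lib': '1.0.0' }
};

console.log(listCustomSources(pkg));
```

With the sample data, only `patched-lib` is reported, which is the kind of list worth revisiting whenever the upstream package ships a new release.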

Using npm shrinkwrap

To alleviate our concerns about version conflicts and unexpected behaviour in our own application, we use npm shrinkwrap to lock our dependency versions. Locking versions helps keep breaking changes from slipping through to production. It also keeps our dev machines matching our production ones and makes debugging issues a little easier.
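Running `npm shrinkwrap` writes an npm-shrinkwrap.json file that records the exact version of every package in the tree, sub-dependencies included, and `npm install` then honours it. As a rough illustration, the sketch below walks a shrinkwrap-style object and prints every pinned version. The helper name and the sample packages and versions are invented for the example.

```javascript
'use strict';

// Hypothetical helper: flattens a shrinkwrap-style tree into
// "parent > child@version" strings, one per pinned package.
function listPinned(node, path = []) {
  const out = [];
  const deps = node.dependencies || {};
  for (const name of Object.keys(deps).sort()) {
    const here = path.concat(name);
    out.push(here.join(' > ') + '@' + deps[name].version);
    out.push(...listPinned(deps[name], here)); // recurse into sub-dependencies
  }
  return out;
}

// Illustrative data shaped like an npm 2 npm-shrinkwrap.json file.
const shrinkwrap = {
  name: 'example-app',
  version: '1.0.0',
  dependencies: {
    'some-framework': {
      version: '4.1.3',
      dependencies: {
        'some-parser': { version: '2.0.0' }
      }
    },
    'some-utils': { version: '3.10.1' }
  }
};

listPinned(shrinkwrap).forEach((line) => console.log(line));
// some-framework@4.1.3
// some-framework > some-parser@2.0.0
// some-utils@3.10.1
```

Diffing this kind of listing between two commits is a quick way to see exactly which versions changed, without several hundred vendored files in the pull request.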

The registry is much more reliable, so npm install on deploy is OK

npm has felt very solid, so we are now much more comfortable relying on it during our deploy process. We can also roll back with a lot more ease in case something does go wrong.

Upgrading to npm 3, eventually

We're still running npm 2.4 for a couple of reasons. For starters, we wanted to clean up our dependencies first, which we've now done. However, performance issues in npm 3 are holding us back at the moment. We will make the jump once wider adoption takes hold and performance has improved. We are excited to take advantage of the much flatter dependency tree npm 3 provides.

As Node.js developers, our reliance on third-party code is huge, and that code deserves just as much attention as our own. What our experiment has taught us is that you cannot manage it the same way you would manage your own code. It's a different animal, but one that proves invaluable to the product building process.

Have you run into similar issues? Or do you have tips on what your organization finds to be helpful? Let us know and we'll be sure to share them with the community!

bitHound identifies risks and priorities in your Node.js projects.