Part 1: How to architect a medium-sized node.js web application
This is the first in a two-part series on how to architect a medium-sized node.js web application, designed for testability and long-term maintainability. Part 1 is a high-level overview of the directory structure and how it fits together. In part 2 (not yet finished) we’ll look at some specific highlights of the architecture.
I started building node.js web applications in 2011. Back then there was still a list of manually installable third-party “node modules” on the node.js GitHub wiki! Over the years I’ve managed to learn a whole lot about how to architect a solid, medium-sized node.js web application. With this article series I’d like to share some of those insights.
There is no shortage of starter kits to choose from when beginning a node.js application. Most of them aim for a specific stack or a specific use case, such as a hackathon or an API. The one I’m about to show aims for long-term maintainability and ease of testing. It is opinionated in that it requires a certain discipline in how things are laid out.
What is “medium-sized”?
Lines of source code is not a particularly good measure of anything, but it is good enough to give an approximation of what I mean by a medium-sized application.
The largest projects I’m maintaining and actively developing come in at around 75 000 - 100 000 lines of code each. The ones built according to the pattern I’m about to show are the only ones where I feel confident they could grow to 200 000 lines of code without getting constrained by the architectural “suit” they were created in. The other ones? Don’t ask how they ever got so big!
It is possible to use this “starter kit” for applications smaller than 20 000 lines of code; every project needs to start somewhere, of course. But some of the decisions only start to make sense once the application grows beyond a certain size, and will likely feel overly verbose until then.
Notes on the stack
This guide is only going to make two assumptions about the stack used: Express.js for the web server, and Mocha for the test runner. Using the same stack is not the most important consideration when following this guide. The one I use is very vanilla in node.js land, but the concepts demonstrated can apply to many other node.js web stacks as well.
We’re only going to focus on the backend server in this guide. The backend server is not going to render any views, just serve up a static directory. The entire “client/frontend” part will be omitted, and possibly covered in another article series.
Considering the directory structure is the first thing to do. We are going to go over every folder in app/server one by one. Here’s an overview:

```
$ tree .
.
└── app
    ├── client
    └── server
        ├── bin
        ├── lib
        ├── node_modules
        │   └── server -> ../
        ├── package.json
        ├── public_html
        ├── routes
        └── services
```
It is important to never require files using relative paths. The node_modules directory was included to demonstrate a simple trick for how this rule can be followed.
The full path of an import should always be stated, and the symlink in node_modules allows a file such as /app/server/services/database.js to be imported from anywhere in the application by using the path require('server/services/database').
What’s the benefit of this? Considerably easier refactoring. Instead of having to play “import-path-detective” every time a file needs to be moved, every reference can be updated with a single non-ambiguous search/replace over the entire project. Auditing becomes similarly simple: every place a file is used can be found by just searching for the same unique import path.
lib might be the most interesting directory of them all. An explicit goal of the architecture is to put as much of the actual code of the repository in this directory as possible. Not everything can go here, though. Code in lib has an important constraint placed on it: no side-effects or dependencies on external services.
Example of some things that might end up here:
- Helper files exporting idempotent functions
- Classes governing external system integrations that can be configured to never “break out” of the running program
- Mock API classes for external systems
- Most of the tests (by amount) of the entire app
Logic bits and their associated code tend to be on a constant journey from the other directories into lib. Elevating parts of the application to lib code pays dividends in overall testability right away. Nothing is as easy and fast to test as methods and classes without side-effects. That is why the majority of tests (by sheer number) end up here as well.
Files in this directory never import anything from the other directories, except sometimes third-party modules. Code in lib that needs configuration also can’t get it from the running environment. Any configuration needed gets passed as option-objects to constructor methods or other functions.
Simple example of a lib file
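To make this concrete, here is a minimal sketch of what such a lib file could look like. The file name and both helpers are made up for illustration; the point is that they are pure functions with no environment access:

```javascript
// server/lib/percentage.js (hypothetical example)
// Pure helpers: no side-effects, no environment access, trivially testable.

// Returns what percentage `part` is of `whole`, rounded to one decimal.
function toPercentage(part, whole) {
  if (whole === 0) return 0;
  return Math.round((part / whole) * 1000) / 10;
}

// Clamps a number into the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

module.exports = { toPercentage, clamp };
```

A test for this file needs no setup at all: import it and assert on return values.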
Every application needs to interact with third-party services. All files governing this go in the services directory. Files here handle setting up and tearing down any connection to an outside service, be that the database, a message queue or an external API.
A service file is stateless in the sense that importing it must not trigger a database connection automatically. It never stores connection handles in module-level variables; the application must store them somewhere else.
There are many ways to write a service file, but I prefer to keep them simple. I always export an init function for setup. If resources need to be freed, there is a teardown function too. The init function always returns the actual service, i.e. the actual object that is passed along whenever the service is needed.
Example of services:
- Database connections
- Application-wide logging
- External API connections
A service file can use the execution environment to set up sane defaults. These can always be overridden by passing option-objects to the relevant functions.
With a dedicated services directory it is simple to tell which services an application can interact with. Related functions are exported alongside init and teardown to take care of more tedious tasks. I use a rabbitmq service in a lot of my projects, and usually export a method for making RPC-style calls as part of the rabbitmq service file. This method takes a handle returned by init and can then be used by the application or the tests with very little effort.
A service can make use of other services. For instance, one might want the database service to use the logging service. Services are not allowed to instantiate each other by themselves; they are instead set up somewhere else and can then be passed into other services or methods as dependencies when required.
There are many ways to inject dependencies, but I prefer following these simple rules:
- The argument is always named deps
- deps is always unpacked (destructured) with the required dependencies at the top of a method
- Every dependency has the same local name (the name of the file in server/services) unless there is a good reason otherwise
- When the receiving method needs to call out to a dependency-requiring method itself, the deps argument is never passed along directly. Instead it must be repacked, just like it was unpacked, in the same method!
Sometimes a method used somewhere deep down in the application needs to interact with a new service. Accomplishing this while following all the above rules can mean editing a lot of files and tests. I’m okay with this trade-off and welcome the extra audit step. It forces me to think hard before introducing new side-effect-causing dependencies to my methods. Sometimes I choose a different solution altogether.
Example of a service file
Comments mark the lines where the rules apply.
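Here is a minimal sketch of such a service file. The path matches the server/services/database.js file mentioned earlier, but the injectable driver option and the migrate helper are illustrative assumptions, not a prescribed API:

```javascript
// server/services/database.js (sketch; `driver` stands in for whatever
// database client library the project actually uses)

// Runs pending migrations; it needs a logger itself, so it takes `deps` too.
async function migrate(connection, deps = {}) {
  const { logger = console } = deps; // rule: deps unpacked at the top
  logger.info('running migrations');
  // ...migration logic elided...
  return connection;
}

async function init(options = {}, deps = {}) {
  const { logger = console } = deps; // rule: argument named deps, unpacked at the top
  const {
    url = process.env.DATABASE_URL,  // sane default from the environment
    driver,                          // injectable, e.g. a real client or a fake in tests
  } = options;

  logger.info(`connecting to ${url}`);
  const connection = await driver.connect(url);

  // rule: deps is repacked for the callee, never forwarded as-is
  await migrate(connection, { logger });

  // The returned handle is the actual service that gets passed around.
  return connection;
}

async function teardown(connection) {
  await connection.close();
}

module.exports = { init, teardown };
```

Because the driver is injected through options, a test can exercise init and teardown with a fake driver and a silent logger, without touching a real database.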
Every application needs to interact with third party services to be useful. Every application also needs to be able to allow outside interaction with itself. In the context of a node.js web application, outside interaction usually means exposing an HTTP API that clients make requests to.
Every endpoint that the server exposes to the client (or another integrating system) goes in the routes directory. The naming scheme for files here should follow the final mounting pattern: a file named server/routes/user.js should expose an endpoint mounted on /user. If the mounting structure has many nesting levels it might be a good idea to use sub-directories or camel-case the file names. An endpoint exposed on /account/subscriptions can then live in server/routes/account/subscriptions.js or server/routes/accountSubscriptions.js.
The big benefit of having the file structure in server/routes reflect the way the routes are mounted is that you can immediately find the file where a request was processed just by looking at the URL that was requested. This helps when there are hundreds or maybe even thousands of routes in the application.
Routes are much like services in the sense that they expose an init function. Instead of returning a service handle, they return an express router that is later mounted on the root router. This gives you a routes directory consisting only of small “endpoint applications” that can be tested and composed independently.
Example of a route file
All static files go in server/public_html. Have express.static serve up this directory, or better yet have a dedicated static web server such as nginx do it instead.
All the client code goes in app/client. This is a great place to put create-react-app or anything else you might want to use, such as Angular or any of the dozens of frontend frameworks available. Make sure the result of the build step is symlinked or copied into server/public_html when deploying.
The root server directory is where things from the other directories are tied together and instantiated. Here you might find files such as server.js and worker.js. Each represents a complete and independent subsystem of the application. I call these “subsystem-files”.
Much like in the other directories, mere inclusion of a file from here is not enough to trigger any side-effects. I prefer to export a flat list of functions, like a service or route file, though other variants such as classes are possible too. Usually these files are only included from the executable files in server/bin, which we will look at in the next section. A subsystem-file initializes its own service dependencies. These can always be overridden, just like configuration options.
Example of a subsystem file
This file sets up a web server. The interesting lines are marked with comments.
We have now gone through all the interesting directories and only one remains: server/bin. This is where all the executable files of the application end up and where things come alive. Inclusion of a file here can trigger side-effects automatically.
The executables in here are usually one of four types:
- Application/subsystem launchers (for the web server, worker etc.)
- Mock API launchers (standalone servers mocking an external API)
- Administrative binaries (one-off CLI tools built with a library like commander to do things like database reindexing and so on)
- Bundled application launchers (many subsystems in one process)
Type four is especially interesting. bin files should be small and simple. The subsystem files in server that they interact with are completely independent in their setup. This means it is easy to initialize many subsystems from a single executable file in server/bin.
Imagine an application consisting of three subsystems: server/server.js, server/worker.js and server/socket.js. In your development environment you might run server/bin/serverAndWorkerAndSocket.js, which launches all three subsystems in the same process. In the production environment you might instead use a process manager such as pm2. Then you can launch two instances of server/bin/server.js, four instances of server/bin/worker.js and a single instance of server/bin/socket.js. You can adjust the exact number of processes/forks to scaling needs.
Using a debugger statement together with node inspect is much easier when everything is running in the same process, which is great when developing. Having all executables in a single folder also makes it obvious what can actually be done with the application.
Example of a subsystem launcher
See earlier example of a subsystem file for context.
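A sketch of what server/bin/server.js could look like. To keep the snippet self-contained, the require of the real subsystem file is replaced by a tiny inline stand-in:

```javascript
#!/usr/bin/env node
// server/bin/server.js (sketch). In the real project the next statement is
// `const server = require('server/server');` — a stand-in is inlined here
// so the snippet runs on its own.
const server = {
  init: async (options = {}) => ({ port: options.port || 3000 }),
};

// A bin file is the one place where merely running the file has side-effects:
// it boots the web server subsystem and reports fatal startup errors.
server
  .init({ port: process.env.PORT })
  .then((handle) => console.log(`server subsystem started on port ${handle.port}`))
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });
```

Everything interesting lives in the subsystem file; the launcher only decides which subsystem to start and with what options.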
Example of a bundled launcher
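A sketch of a bundled launcher such as server/bin/serverAndWorker.js. The requires of server/server, server/worker and the logger service are replaced by inline stand-ins so the snippet is self-contained:

```javascript
// server/bin/serverAndWorker.js (sketch). Stand-ins for
// require('server/server'), require('server/worker') and
// require('server/services/logger') are inlined here.
const logger = { init: () => console };
const server = { init: async (options = {}, deps = {}) => ({ kind: 'server', deps }) };
const worker = { init: async (options = {}, deps = {}) => ({ kind: 'worker', deps }) };

(async () => {
  // Initialize shared services once...
  const log = logger.init();

  // ...and hand the exact same instances to both subsystems via `deps`,
  // so they share connections and logging inside a single process.
  await server.init({}, { logger: log });
  await worker.init({}, { logger: log });

  log.info('server and worker running in one process');
})().catch((err) => {
  console.error(err);
  process.exit(1);
});
```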
The deps argument accepted by the init functions exported by both server and worker allows pre-initialized services to be passed in. This makes it very easy to have the worker and the server share the same database connections, logging and more when they are running in the same process.
Tying it all together with testability
We’ve now seen how a modular architecture can be created by following some simple rules. So far we have not touched much on testability (one of the goals of this guide), but we are now ready to do so. This is where the architecture we’ve prepared starts to shine.
A quick aside first. I’m of the opinion that unit test files should be placed right next to the file they are testing, just as is done in Go. The test for server/lib/mathHelpers.js is named mathHelpers.test.js and sits in the same directory. This saves the trouble of having to go digging around for the correct file in a separate tests directory. The unit test for a file is right next to the file being tested! This goes for every kind of unit test of a small component: a single route, service, library file and so on.
The test environment should be capable of setting everything up by itself. This makes running all the tests a one-command affair, which of course is far superior to having to do manual setup/teardown whenever it is time to run the tests.
The architecture we’ve looked at makes setup of any part of the application simple. We have made a conscious effort to have almost every file only export functions with no automatic side-effects. Every configuration option and every service dependency can be overridden on demand. Many functions also fall back to a sane set of default options.
In my projects I’ve decided to collect all test-related functions in a folder named server/testHelpers. This is a good directory for many test-related things: factory functions for test object generation, test-agent setup or any other type of helper.
Example of a test helper for setting up an app
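A sketch of what such a helper could look like. The paths follow the layout from this article, but the option names and service set are illustrative; note how every service, and even the subsystem init itself, can be overridden, with the real implementations only required when no override is given:

```javascript
// server/testHelpers/app.js (sketch; paths follow this article's layout)

async function app(overrides = {}) {
  // Each service falls back to the real thing but can be swapped per test.
  const logger = overrides.logger ||
    require('server/services/logger').init({ level: 'silent' });
  const database = overrides.database ||
    await require('server/services/database').init({}, { logger });

  // Boot the web server subsystem on a random free port with those services.
  const initServer = overrides.initServer || require('server/server').init;
  const server = await initServer({ port: 0 }, { logger, database });

  return { server, logger, database };
}

module.exports = { app };
```

A test that overrides everything gets a fully wired app without touching a single real service.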
There is no reason to stray from the pattern already established elsewhere. Test helper files also export functions, and every important object they set up can be overridden with something else. This allows fine-grained control of options and services, down to the level of individual tests.
Note: this file is exposed as require('server/testHelpers').app in the other examples.
Example of a service test file
A simple test showing how dependencies are setup.
Example of a route test file
A test for a route usually makes use of the entire web server, so it isn’t really a “unit-test” in the sense of having a small testing surface. In the example below I’m using supertest, which makes it very easy to test APIs.
Supertest is based on superagent and is set up in the not-shown testHelpers.agent(). In many of my projects this function creates a user in the database and then logs that user in. Once you have the returned agent handle, you are ready to make requests exactly like a normal logged-in user of your app would.
Integration tests are sometimes more important than unit tests. Since they are not testing one specific file, they should live in a directory such as tests. There is not much more to say about them: they can make use of as many parts and subsystems in their setup as required, using the setup techniques demonstrated in the examples above.
A complete in-process application ready for testing can be started like this:
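A sketch of that startup code. The two subsystem requires are replaced by inline stand-ins so the snippet is self-contained; the startApp name and override shape are illustrative:

```javascript
// Sketch: boot a complete application (web server + worker) in one process,
// sharing a single set of services. Stand-ins replace require('server/server')
// and require('server/worker') here.
const server = { init: async (options = {}, deps = {}) => ({ kind: 'server', deps }) };
const worker = { init: async (options = {}, deps = {}) => ({ kind: 'worker', deps }) };

async function startApp(overrides = {}) {
  const logger = overrides.logger || console;
  const database = overrides.database || {}; // e.g. a test database handle

  // Both subsystems receive the exact same service instances.
  const web = await server.init({ port: 0 }, { logger, database });
  const jobs = await worker.init({}, { logger, database });
  return { web, jobs };
}
```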
Note: a “complete application” in this sense means one with two independent subsystems: a worker and an (HTTP) server. The above example snippet could be put in a mocha before() handler when setting up a test.
Having the entire application running in the same process as your test runner also enables you to use the node.js debugger right while your tests are running. Compare this with the bundled application launcher that we talked about earlier.
Many of the concepts presented are influenced by other sources, most notably the twelve-factor app. I hope that any reader has gained insight into how essentially exporting functions with overrideable defaults and following some quite basic layout rules enables the creation of a well-balanced node.js web application architecture. The architecture and every component of it can then be set up, tested, deployed and maintained with little effort. A fancy backend framework is mostly unnecessary.
There are some more details related to this architecture and a part two will be coming as soon as I can finish it. In the meantime I might also prepare a github repository and even a CLI application (I’d like to have this myself!) for automatic generation of services, routes, subsystems and launchers. Stay tuned for more info, and please leave your comments below!