Part 1: How to architect a medium-sized node.js web application

This is the first in a two-part series on how to architect a medium-sized node.js web application, designed for testability and long-term maintainability. Part 1 gives a high-level overview of the directory structure and how it fits together. In part 2 (not yet finished) we’re going to look into some specific highlights of the architecture.

I started building node.js web applications in 2011. Back then there was still a list of manually installable third-party “node modules” on the nodejs github wiki! Over the years I’ve managed to learn a whole lot about how to architect a solid, medium-sized node.js web application. With this article series I’d like to share some of those insights.

Background/Goals

There is no shortage of starter kits available when beginning a node.js application. Most of these aim for a specific stack or a specific use case, such as a hackathon or an API. The one I’m about to show aims for long-term maintainability and ease of testing, and it is opinionated in that it requires a certain discipline for laying things out.

What is “medium-sized”?

Lines of source code is not a particularly good measure of anything, but it is good enough to give an approximation of what I mean by a medium-sized application.

The largest projects I’m maintaining and actively developing come in at around 75 000 - 100 000 lines of code each. The ones built according to the pattern I’m about to show are the only ones where I feel confident they could grow to 200 000 lines of code without getting constrained by the architectural “suit” they were created in. The other ones? Don’t ask how they ever got so big!

It is possible to use this “starter kit” for applications with fewer than 20 000 lines of code. Every project needs to start somewhere, of course. But some of the decisions only start to make sense once the application grows beyond a certain size, and they will likely feel overly verbose until then.

For some context on the significance of 20 000 and 200 000 lines of code, read this good article about Norris numbers and the insightful comment thread on Hacker News.

Notes on the stack

This guide makes only two assumptions about the stack: Express.js for the web server and Mocha for the test runner. Using the same stack is not the most important consideration when following this guide. The one I use is very vanilla in node.js land, and the concepts demonstrated can apply to many other node.js web stacks as well.

We’re only going to focus on the backend server in this guide. The backend server is not going to render any views, just serve up a static directory. The entire “client/frontend” part is omitted, possibly deferred to another article series.

Directory structure

The directory structure is the first thing to consider. We are going to go over every folder in app/server one by one. Here’s an overview:

$ tree .
.
└── app
    ├── client
    └── server
        ├── bin
        ├── lib
        ├── node_modules
        │   └── server -> ../
        ├── package.json
        ├── public_html
        ├── routes
        └── services

server/node_modules

It is important to never import or require files using relative paths. The node_modules directory is included here to demonstrate a simple trick for how this rule can be followed.

The full path of an import should always be stated, and the symlink in node_modules allows a file such as /app/server/services/database.js to be imported from anywhere in the application by using the path "server/services/database".

What’s the benefit of this? Considerably easier refactoring. Instead of having to play “import-path-detective” every time a file needs to be moved, every reference can be updated with a single non-ambiguous search/replace over the entire project. Auditing becomes similarly simple: every place a file is used can be found by just searching for the same unique import path.
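To make this concrete, here is a minimal sketch, assuming the symlink shown in the tree above exists (created once with something like ln -s ../ app/server/node_modules/server):

// any file in the application, e.g. server/routes/user.js
// resolves through node_modules/server -> ../ to /app/server/services/database.js
const database = require('server/services/database');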

server/lib

lib might be the most interesting directory of them all. An explicit goal of the architecture is to put as much of the actual code of the repository in this directory as possible. Not everything can, though. Code in lib has an important constraint placed on it: no side-effects or dependencies on external services.

Example of some things that might end up here:

  • Helper files exporting pure, side-effect-free functions
  • Classes governing external system integrations that can be configured to never “break out” of the running program
  • Mock API classes for external systems
  • Most of the tests (by amount) of the entire app

Logic bits and their associated code tend to be on a constant journey from the other directories into lib. Elevating parts of the application to lib code pays dividends in overall testability right away: there is nothing as easy and fast to test as methods and classes without side-effects. That is why the majority of tests (by sheer number) end up here as well.

Files in this directory never import anything from the other directories; third-party modules are sometimes imported, though. Code in lib that needs configuration also can’t get it from the running environment. Any configuration needed gets passed as option-objects to constructor methods or other functions.

Simple example of a lib file

// server/lib/mathHelper.js
exports.add = function(a, b) {
  return a + b;
};
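A slightly larger sketch of a lib file, showing how any configuration is passed in as an option-object instead of being read from the environment (PriceFormatter is a made-up example class):

// server/lib/priceFormatter.js
class PriceFormatter {
  // configuration is always passed in, never read from process.env
  constructor(options = {}) {
    this.locale = options.locale || 'en-US';
    this.currency = options.currency || 'USD';
  }

  format(amount) {
    return new Intl.NumberFormat(this.locale, {
      style: 'currency',
      currency: this.currency,
    }).format(amount);
  }
}

module.exports = PriceFormatter;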

server/services

Every application needs to interact with third-party services. All files governing this go in the services directory. Files here handle setting up and tearing down any connection to an outside service, be it the database, a message queue or an external API.

A service file is stateless in the sense that importing it must not trigger a database connection automatically. It never stores any connection handles in module-level variables; the application must store them somewhere else.

There are many ways to write a service file, but I prefer to keep them simple. I always export an init function for setup. If resources need to be freed there is a teardown function too. The init function always returns the actual service, i.e. the actual object that is passed along whenever the service is needed.

Example of services:

  • Database connections
  • Application-wide logging
  • Authentication/Authorization
  • External API connections

A service file can use the execution environment to set up sane defaults. These can always be overridden by passing option-objects to the relevant functions.

With a dedicated service directory it is simple to tell which services an application can interact with. Related functions that take care of tedious tasks are exported alongside init and teardown. I use a rabbitmq service in a lot of my projects, and usually export a method for making RPC-style calls as part of the rabbitmq service file. This method takes a handle returned by init and can then be used by the application or the tests with very little effort, as sketched below.
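A minimal sketch of what such an exported method could look like; the request method on the rabbitmq handle is hypothetical and stands in for whatever the underlying amqp library provides:

// server/services/rabbitmq.js (sketch)
// "rabbitmq" in deps is the handle previously returned by exports.init
exports.call = async (deps, queue, payload) => {
  const { rabbitmq, log } = deps;
  log.info('rpc call', { queue });
  // publish the payload and await the reply; the actual mechanics
  // (reply queues, correlation ids) depend on the amqp library used
  return rabbitmq.request(queue, payload);
};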

Dependency injection

A service can make use of other services. For instance, one might want the database service to use the logging service. Services are not allowed to instantiate each other by themselves. They are instead set up somewhere else and can then be passed into other services or methods as dependencies when required.

There are many ways to inject dependencies, but I prefer following these five simple rules:

  1. A method or class requiring a service dependency always receives it in the first argument. The argument is always a plain javascript object.
  2. The argument is always named deps
  3. deps is always unpacked (destructured) with the required dependencies at the top of a method
  4. Every dependency has the same local name (name of the file in server/services) unless there is a good reason otherwise
  5. When the receiving method itself needs to call out to a dependency-requiring method, the deps argument is never passed along directly. Instead it must be repacked, just like it was unpacked at the top of the method!

Applying these rules enables super-simple auditing of every service-requiring method in your application. This helps testing (what do I need to set up and tear down to test this?) and gives you an instant grasp of the type of side-effects that calling a particular method might have. Some might opt to create a dedicated class/manager for dependencies. I prefer the explicit verbosity of a simple javascript object.

Sometimes a method used somewhere deep down in the application needs to interact with a new service. Accomplishing this while following all the above rules can mean editing a lot of files and tests. I’m okay with this trade-off and welcome the extra audit-step: it forces me to think hard about introducing new side-effect-causing dependencies to my methods. Sometimes I choose a different solution altogether.

Example of a service file

Comments in the example point out where the five rules apply.

// server/services/db.js
const theDatabaseLib = require('the-database-lib');
const services = require('server/services'); // exports all files in the services directory

// return an options object for exports.init with some sane defaults
exports.options = (extendWith = {}) => ({
  host: process.env.DB_HOST || 'localhost',
  port: process.env.DB_PORT || 5432,
  name: process.env.DB_NAME || 'testDatabase',
  ...extendWith,
});

// a "dependency-requiring" method ALWAYS has a first argument named "deps" (rule 1 and 2)
exports.init = async (deps, options = {}) => {
  // the dependency object is ALWAYS unpacked like this (rule 3 and 4)
  const { log } = deps;
  // "db" is what is passed around to all other dependency-requiring methods
  const db = await theDatabaseLib.connect(options);
  log.info('connected to the database');
  return db;
};

// slightly contrived example showing packing
exports.publishLastTaskToWorker = async (deps) => {
  const { db, amqp, log } = deps;
  const lastTask = await db.table('tasks').orderBy(['createdAt', 'DESC']).one();
  // dependencies are ALWAYS repacked like this when passed on (rule 5)
  await services.amqp.publishTask({amqp, log}, lastTask);
};

exports.teardown = async (deps) => {
  const { db } = deps;
  await db.disconnect();
};

server/routes

Every application needs to interact with third-party services to be useful. Every application also needs to allow outside interaction with itself. In the context of a node.js web application, outside interaction usually means exposing an HTTP API that clients make requests to.

Every endpoint that the server exposes to the client (or another integrating system) goes in the routes directory. The naming scheme for files here should follow the final mounting pattern: a file named server/routes/user.js should expose an endpoint mounted on /user. If the mounting structure has many nesting levels it might be a good idea to use sub-directories or camel-cased file names. An endpoint exposed on /account/subscriptions can then live in account/subscriptions.js or accountSubscriptions.js.

The big benefit of having the file structure in server/routes reflect the way the routes are mounted is that you can immediately find the correct file where a request was processed just by looking at the URL that was requested. This helps when there are hundreds or maybe even thousands of routes in the entire application.

Routes are much like services in the sense that they expose an init function. Instead of returning a service handle, they return an express router that is later mounted on the root router. This gives you a routes directory consisting only of small “endpoint-applications” that can be tested and composed independently.

Example of a route file

// server/routes/user.js
const express = require('express');

exports.init = (deps) => {
  const { db } = deps; // this route depends on the database service

  const router = express.Router();

  router.get('/', async (req, res, next) => {
    try {
      const users = await db.table('users').all();
      res.json(users);
    } catch(err) {
      next(err);
    }
  });

  // the router is always returned so it can be mounted in a parent app
  return router;
};

server/public_html

Have express.static serve up this directory, or better yet have a dedicated static web server such as nginx do it instead.

app/client

All the client code goes in app/client. This is a great place to put create-react-app or anything else you might want to use, such as angular or any of the dozens of frontend frameworks available. Make sure the result of the build step is symlinked or copied into server/public_html when deploying.

server

The root server directory is where things from the other directories are tied together and instantiated. Here you might find files such as server.js or worker.js. Each represents a complete and independent subsystem of the application. I call these “subsystem-files”.

Much like in the other directories, mere inclusion of a file from here is not enough to trigger any side-effects. I prefer to export a flat list of functions, like a service or route file; other variants such as classes are possible too. Usually these files are only included from the executable files in server/bin, which we will look at in the next section. A subsystem-file initializes its own service dependencies, and these can always be overridden, just like configuration options.

Example of a subsystem file

This file sets up a web server. Comments point out the interesting lines.

const express = require('express');
const services = require('server/services');
const routes = require('server/routes');

// setup default options, allowing each option to be overridden
exports.options = (extendWith = {}) => ({
  host: process.env.SERVER_LISTEN_HOST || 'localhost',
  port: parseInt(process.env.SERVER_LISTEN_PORT, 10) || 3000,
  ...extendWith,
});

// setup default service deps, allowing each service to be overridden
exports.deps = async (replaceWith = {}) => {
  const log = replaceWith.log || services.log.init(services.log.options());
  const db = replaceWith.db || await services.db.init({log}, services.db.options());
  const worker = replaceWith.worker || await services.worker.init({log}, services.worker.options());
  return { log, db, worker };
};

// initializes the server
exports.init = async (deps, options = {}) => {
  const { log, db, worker } = deps;

  const app = express();

  const api = express.Router();
  api.use('/user', routes.user.init({log, db}));

  // every route is mounted to /api
  app.use('/api', api);

  // public_html is served statically
  app.use('/', express.static(__dirname + '/public_html'));

  return app;
};

// exposes the server to the outer world, returning the server handle
exports.listen = (deps, app, options) => {
  const { log }  = deps;
  const { host, port } = options;
  const httpServer = app.listen(port, host, () => {
      log.info('server listen', {host, port});
  });
  return httpServer;
};

// closes the server the preferred way
exports.teardown = async (deps, httpServer) => {
  const { log } = deps;
  return new Promise(resolve => {
      httpServer.close(() => {
          log.info('server closed');
          resolve();
      });
  });
};

server/bin

We have now gone through all the interesting directories and only one remains: server/bin. This is where all the executable files of the application end up and where things come alive. Unlike everywhere else, including a file from here can trigger side-effects automatically.

The executables in here are usually one of four types:

  1. Application/subsystem launchers (for the web server, worker etc.)
  2. Mock API launchers (standalone servers mocking an external API)
  3. Administrative binaries (one-off CLI tools built with a library like commander to do things like database reindexing and so on; see the sketch after this list)
  4. Bundled application launchers (many subsystems in one process)
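As an illustration of type three, here is a minimal sketch of an administrative binary, assuming commander for argument parsing; the db.reindex method is hypothetical:

#!/usr/bin/env node

// server/bin/reindex.js (sketch)
const { program } = require('commander');
const services = require('server/services');

program
  .option('--table <name>', 'table to reindex', 'users')
  .parse(process.argv);

(async () => {
  const log = services.log.init(services.log.options());
  const db = await services.db.init({log}, services.db.options());
  // db.reindex is a hypothetical method on the database handle
  await db.reindex(program.opts().table);
  await services.db.teardown({db});
})().catch(err => {
  console.error(err);
  process.exit(-1);
});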

Type four is especially interesting. bin files should be small and simple. The subsystem files in server that they interact with are completely independent in their setup. This means it is easy to initialize many subsystems from a single executable file in bin.

Imagine an application consisting of three subsystems: server/server.js, server/worker.js and server/socket.js. In your development environment you might run server/bin/serverAndWorkerAndSocket.js. This file launches all three subsystems in the same process. In the production environment you might instead use a process manager such as pm2. Then you can launch two instances of server/bin/server.js, four instances of server/bin/worker.js and a single instance of server/bin/socket.js. You can adjust the exact number of processes/forks to scaling needs.
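With pm2, that production split could be described in an ecosystem file along these lines (a sketch; the paths match the subsystems above):

// ecosystem.config.js (sketch)
module.exports = {
  apps: [
    { name: 'server', script: 'app/server/bin/server.js', instances: 2, exec_mode: 'cluster' },
    { name: 'worker', script: 'app/server/bin/worker.js', instances: 4, exec_mode: 'cluster' },
    { name: 'socket', script: 'app/server/bin/socket.js', instances: 1 },
  ],
};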

Using the debugger statement together with node inspect is much easier when everything is running in the same process. This is great when developing. Having all executables in a single folder also makes it obvious what can actually be done with the application.

Example of a subsystem launcher

See earlier example of a subsystem file for context.

#!/usr/bin/env node

const server = require('server/server');

// init all web server deps
server.deps().then(deps => {
  const { log } = deps;

  // get a default set of options
  const options = server.options();

  // init the web server
  return server.init(deps, options).then(app => {

    // expose the web server
    const httpServer = server.listen({log}, app, options);

    // gracefully close the web server on SIGINT
    process.on('SIGINT', () => {
        log.info('caught SIGINT');
        server.teardown({log}, httpServer).then(() => {
            process.exit(0);
        }).catch(err => {
          console.error(err);
          process.exit(-1);
        });
    });
  });
}).catch(err => {
  console.error(err);
  process.exit(-1);
});

Example of bundled launcher

The deps methods exported by both server and worker allow pre-initialized services to be used. When worker and server run in the same process, this makes it very easy to have them share the same database connections, logging and more.

#!/usr/bin/env node

const server = require('server/server');
const worker = require('server/worker');

// init all web server deps
server.deps().then(async serverDeps => {
  // init all the worker deps
  // pre-filled with the server deps so they use the same connections, logger etc
  const workerDeps = await worker.deps(serverDeps);

  // get a default set of options
  const serverOptions = server.options();
  const workerOptions = worker.options();

  // note: destructured names differ from the imported modules to avoid redeclaring them
  const [app, workerHandle] = await Promise.all([
    server.init(serverDeps, serverOptions).then(app => {
      // [...] extra setup etc omitted
      return app;
    }),
    worker.init(workerDeps, workerOptions).then(workerHandle => {
      // [...] extra setup etc omitted
      return workerHandle;
    }),
  ]);

  // [...] teardown logic etc omitted

}).catch(err => {
  console.error(err);
  process.exit(-1);
});

Tying it all together with testability

We’ve now seen how it is possible to create a modular architecture by following some simple rules. So far we have not touched much on the subject of testability (one of the goals of this guide), but we are now ready to do so. This is where the architecture we’ve prepared starts to shine.

A quick aside first. I’m of the opinion that unit test files should be placed right next to the file they are testing, just as is done in Go. The test for server/lib/mathHelper.js is named mathHelper.test.js and lives in the same directory. This saves the trouble of digging around in a tests directory for the correct file: the unit test for a file is right next to the file tested! This goes for every kind of unit test of a small component: a single route, service, library file and so on.
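For the mathHelper.js file shown earlier, such a co-located test might look like this minimal sketch, using mocha and node’s built-in assert module:

// server/lib/mathHelper.test.js
const assert = require('assert');
const mathHelper = require('server/lib/mathHelper');

describe('lib mathHelper', () => {
  describe('add', () => {
    it('adds two numbers', () => {
      assert.strictEqual(mathHelper.add(1, 2), 3);
    });
  });
});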

Test setup

The test running environment should be capable of running all by itself. This makes running all the tests a one-command affair, which of course is way superior to having to do manual setup/teardown whenever it is time to run the tests.
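One way to get there, assuming the co-located *.test.js naming described above, is a single npm script in app/server/package.json; a sketch (the globs deliberately skip node_modules):

{
  "scripts": {
    "test": "mocha 'lib/**/*.test.js' 'services/**/*.test.js' 'routes/**/*.test.js' 'integrationTests/**/*.test.js'"
  }
}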

The architecture we’ve looked at makes setup of any part of the application simple. We have made a conscious effort to have almost every file only export functions with no automatic side-effects. Every configuration option and every service dependency can be overridden on demand. Many functions also fall back to a sane set of default options.

In my projects I’ve decided to collect all test-related functions in a folder named testHelpers within server. This is a good directory for many test-related things: factory functions for test object generation, test-agent setup or any other type of helper.

Example of a test helper for setting up an app

There is no reason to stray from the pattern already established elsewhere. Test helper files also export functions that let every important object being set up be overridden with something else. This allows fine-grained control of options and services, even down to the level of individual tests.

Note: this file is exposed as require('server/testHelpers').app in the other examples.

const services = require('server/services');

// setup everything required to test the "server" subsystem
exports.server = async (replaceWith = {}) => {
  const log = replaceWith.log || exports.log();
  const db = replaceWith.db || await exports.db({log});
  const amqp = replaceWith.amqp || await exports.amqp({log});

  // deps for server.init
  // we should declare every dependency required here - no pre-filling!
  const deps = {
    log,
    db,
    amqp,
  };

  const serverClass = require('server/server');
  const server = await serverClass.init(deps, serverClass.options());

  return {
    ...deps,
    server,
  };
};

// setup the logging service
exports.log = () => {
  return services.log.init(services.log.options({
    // overrides the log level 
    // we don't ALWAYS want to be so chatty when testing
    level: process.env.TEST_LOG_LEVEL,
  }));
};

// setup the db service 
exports.db = async (deps = {}) => {
  let { log } = deps;
  if(!log) log = exports.log();

  const db = await services.db.init({log}, services.db.options({
    // override the database actually used with one dedicated just for tests
    name: process.env.TEST_DB_DATABASE || 'testDatabase',
  }));

  // you can do preparation here such as synchronizing the schema
  // or dropping and recreating an existing database

  return db;
};

// setup the amqp service
exports.amqp = async (deps = {}) => {
  // [...] omitted for brevity
};

// calls teardown on every service
exports.teardown = async (deps = {}) => {
  // [...] omitted for brevity
};

Example of a service test file

A simple test showing how dependencies are setup.

const testHelpers = require('server/testHelpers');

// tests server/services/db.js
describe('service db', () => {

  // db tests need the db dependency
  let db;
  before(async function() {
    db = await testHelpers.app.db();
  });

  after(async () => {
    await testHelpers.app.teardown({db});
  });

  // you may remember this function from an earlier example ...
  describe('publishLastTaskToWorker', () => {

    // ... it also requires the amqp service dependency
    let amqp;
    before(async () => {
      amqp = await testHelpers.app.amqp();
    });

    // [...] actual test and teardown omitted
  });

});

Example of a route test file

A test for a route usually makes use of the entire web server, so it isn’t really a “unit-test” in the sense of having a small testing surface. In the example below I’m using supertest, which makes it very easy to test APIs.

Supertest is based on superagent and is set up in testHelpers.agent(), which is sketched after the example below. In many of my projects this is a function that creates a user in the database and then logs the user in. Once you have the returned agent handle, you are ready to make requests exactly like a normal logged-in user of your app would.

const testHelpers = require('server/testHelpers');

// tests server/routes/user.js, mounted on /api/user by the server subsystem
describe('route user', () => {

  let env, agent;
  before(async function() {
    env = await testHelpers.app.server();
    agent = await testHelpers.agent(env);
  });

  after(async () => {
    await testHelpers.app.teardown(env);
  });

  it('should return 200', async () => {
    return agent.get('/api/user').expect(200);
  });

});
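The agent helper itself is project-specific, but a sketch of its shape might look like this; the /api/user and /api/login endpoints are made-up examples:

// server/testHelpers/agent.js (sketch)
const request = require('supertest');

exports.agent = async (env) => {
  // env.server is the express app returned by the server test helper
  const agent = request.agent(env.server);
  const credentials = { name: 'testUser', password: 'secret' };
  // create a user and log in; the agent keeps the session cookie between requests
  await agent.post('/api/user').send(credentials).expect(200);
  await agent.post('/api/login').send(credentials).expect(200);
  return agent;
};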

Integration tests

Integration tests are sometimes more important than unit tests. They are not testing a specific file, so they should live in a directory such as integrationTests or tests. There is not much more to say about them: they can make use of as many parts and subsystems in their setup as required, using the setup techniques demonstrated in the examples above.

A complete in-process application ready for testing can be started like this:

const testHelpers = require('server/testHelpers');
const serverEnv = await testHelpers.app.server();
const workerEnv = await testHelpers.app.worker(serverEnv); // pre-fill deps again

Note: a “complete application” in this sense means one with two independent subsystems: a worker and a (http) server. The above example snippet could be put in a mocha before() handler when setting up a test.
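Put together, such a before() handler might look like this sketch (the file name and surrounding test are hypothetical):

// server/integrationTests/taskFlow.test.js (sketch)
const testHelpers = require('server/testHelpers');

describe('integration: server and worker', () => {

  let serverEnv, workerEnv;
  before(async () => {
    serverEnv = await testHelpers.app.server();
    workerEnv = await testHelpers.app.worker(serverEnv); // pre-fill deps again
  });

  after(async () => {
    await testHelpers.app.teardown(serverEnv);
  });

  // [...] actual tests omitted
});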

Having the entire application running in the same process as your test runner also enables you to use the node.js debugger right while your tests are running. Compare this with the bundled application launcher that we talked about earlier.

Closing remarks

Many of the concepts presented are influenced by other sources, most notably the twelve-factor app. I hope that any reader has gained insight into how exporting functions with overridable defaults and following some quite basic layout rules enables the creation of a well-balanced node.js web application architecture. The architecture and every component of it can then be set up, tested, deployed and maintained with little effort. A fancy backend framework is mostly unnecessary.

Part two

There are some more details related to this architecture, and part two will be coming as soon as I can finish it. In the meantime I might also prepare a github repository and even a CLI application (I’d like to have this myself!) for automatic generation of services, routes, subsystems and launchers. Stay tuned for more info, and please leave your comments below!