
Serializing Complex Objects in JavaScript

Luke has 12 years of experience as an engineer, team lead, and scrum master.

Website Performance and Data Caching

Modern websites typically retrieve data from a number of different locations, including databases and third-party APIs. For example, when authenticating a user, a website might look up the user record from the database, then embellish it with data from some external services via API calls. Minimizing expensive calls to these data sources, such as disk access for database queries and internet roundtrips for API calls, is essential to maintaining a fast, responsive site. Data caching is a common optimization technique used to achieve this.

Processes store their working data in memory. If a web server runs in a single process (such as Node.js/Express), then this data can easily be cached using a memory cache running in the same process. However, load-balanced web servers span multiple processes, and even when working with a single process, you might want the cache to persist when the server is restarted. This necessitates an out-of-process caching solution such as Redis, which means the data needs to be serialized somehow, and deserialized when read from the cache.
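As a sketch of the in-process approach (a hypothetical helper, not taken from any particular library), a get-or-fetch memory cache in Node.js might look like this:

```javascript
// A minimal in-process memory cache: a hypothetical sketch, not a
// production library. Entries live only as long as this process does,
// which is exactly the limitation that motivates an out-of-process
// cache such as Redis.
const cache = new Map()

async function getOrFetch(key, fetchFn, ttlMs = 60000) {
  const hit = cache.get(key)
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value // served from memory -- no database or API call
  }
  const value = await fetchFn() // expensive lookup (DB query, API call)
  cache.set(key, { value, expiresAt: Date.now() + ttlMs })
  return value
}
```

Because `cache` is plain process memory, every load-balanced worker keeps its own copy, and a restart empties it entirely.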

Serialization and deserialization are relatively straightforward to achieve in statically typed languages such as C#. However, the dynamic nature of JavaScript makes the problem a little trickier. While ECMAScript 6 (ES6) introduced classes, the fields on these classes (and their types) aren’t defined until they are initialized—which may not be when the class is instantiated—and the return types of fields and functions aren’t defined at all in the schema. What’s more, the structure of the class can easily be changed at runtime—fields can be added or removed, types can be changed, etc. While this is possible using reflection in C#, reflection represents the “dark arts” of that language, and developers expect it to break functionality.

I was presented with this problem at work a few years ago when working on the Toptal core team. We were building an agile dashboard for our teams, which needed to be fast; otherwise, developers and product owners wouldn’t use it. We pulled data from a number of sources: our work-tracking system, our project management tool, and a database. The site was built in Node.js/Express, and we had a memory cache to minimize calls to these data sources. However, our rapid, iterative development process meant we deployed (and therefore restarted) several times a day, invalidating the cache and thereby losing many of its benefits.

An obvious solution was an out-of-process cache such as Redis. However, after some research, I found that no good serialization library existed for JavaScript. The built-in JSON.stringify/JSON.parse methods return data of the object type, losing any functions on the prototypes of the original classes. This meant the deserialized objects couldn’t simply be used “in-place” within our application, which would therefore require considerable refactoring to work with an alternative design.
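To see the problem concretely, here is a minimal illustration (my own example, not from any library's codebase) of what a plain JSON round trip loses:

```javascript
// JSON.stringify/JSON.parse preserve data but not behavior: the
// deserialized result is a plain Object with no prototype methods.
class User {
  constructor(name) {
    this.name = name
  }
  greet() {
    return `Hello, ${this.name}!`
  }
}

const original = new User('Ada')
const roundTripped = JSON.parse(JSON.stringify(original))

console.log(original.greet())             // 'Hello, Ada!'
console.log(roundTripped instanceof User) // false
console.log(typeof roundTripped.greet)    // 'undefined' -- the method is gone
```

Any code that calls `greet()` on a cached-and-restored `User` breaks, which is why the deserialized objects couldn't be used "in-place."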

Requirements for the Library

In order to support serialization and deserialization of arbitrary data in JavaScript, with the deserialized representations and originals usable interchangeably, we needed a serialization library with the following properties:

  • The deserialized representations must have the same prototype (functions, getters, setters) as the original objects.
  • The library should support nested complex types (including arrays and maps), with the prototypes of the nested objects set correctly.
  • It should be possible to serialize and deserialize the same objects multiple times—the process should be idempotent.
  • The serialization format should be easily transmittable over TCP and storable using Redis or a similar service.
  • Minimal code changes should be required to mark a class as serializable.
  • The library routines should be fast.
  • Ideally, there should be some way to support deserialization of old versions of a class, through some sort of mapping/versioning.

Implementation

To plug this gap, I decided to write Tanagra.js, a general-purpose serialization library for JavaScript. The name of the library is a reference to "Darmok," one of my favorite episodes of Star Trek: The Next Generation, where the crew of the Enterprise must learn to communicate with a mysterious alien race whose language is unintelligible. This serialization library supports common data formats to avoid such problems.

Tanagra.js is designed to be simple and lightweight, and it currently supports Node.js (it hasn’t been tested in-browser, but in theory, it should work) and ES6 classes (including Maps). The main implementation supports JSON, and an experimental version supports Google Protocol Buffers. The library requires only standard JavaScript (currently tested with ES6 and Node.js), with no dependency on experimental features, Babel transpiling, or TypeScript.

Serializable classes are marked as such with a method call when the class is exported:

module.exports = serializable(Foo, myUniqueSerialisationKey)

The method returns a proxy to the class, which intercepts the constructor and injects a unique identifier. (If not specified, this defaults to the class name.) This key is serialized with the rest of the data, and the class also exposes it as a static field. If the class contains any nested types (i.e., members with types that need serializing), they are also specified in the method call:

module.exports = serializable(Foo, [Bar, Baz], myUniqueSerialisationKey)

(Nested types for previous versions of the class can also be specified in a similar way, so that, for example, if you serialize a Foo1, it can be deserialized into a Foo2.)

During serialization, the library recursively builds up a global map of keys to classes, and uses this during deserialization. (Remember, the key is serialized with the rest of the data.) In order to know the type of the “top-level” class, the library requires that this be specified in the deserialization call:
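The idea can be sketched as follows. This is an illustrative toy, not Tanagra.js's actual internals: a registry maps serialization keys to classes, the key travels with the data, and deserialization uses the registry to restore the prototype:

```javascript
// Illustrative sketch only -- not the real Tanagra.js implementation,
// and it doesn't handle nested types.
const registry = new Map()

function register(cls, key = cls.name) {
  registry.set(key, cls) // key -> class, used later to restore prototypes
  return cls
}

function encode(obj, key = obj.constructor.name) {
  // The key is serialized alongside the data, as described above.
  return JSON.stringify({ ...obj, _key: key })
}

function decode(serialized) {
  const { _key, ...data } = JSON.parse(serialized)
  const cls = registry.get(_key)
  // Reattach the prototype so methods work on the revived object.
  return Object.assign(Object.create(cls.prototype), data)
}

class Point {
  constructor(x, y) { this.x = x; this.y = y }
  norm() { return Math.sqrt(this.x ** 2 + this.y ** 2) }
}
register(Point)

const revived = decode(encode(new Point(3, 4)))
console.log(revived instanceof Point) // true
console.log(revived.norm())           // 5
```

In this toy, the top-level class is looked up from the embedded key alone; Tanagra.js instead asks you to pass the expected top-level class to `decodeEntity`, as shown above.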

const foo = decodeEntity(serializedFoo, Foo)

An experimental auto-mapping library walks the module tree and generates the mappings from the class names, but this only works for uniquely named classes.

Project Layout

The project is divided into a number of modules:

  • tanagra-core - common functionality required by the different serialization formats, including the function for marking classes as serializable
  • tanagra-json - serializes the data into JSON format
  • tanagra-protobuf - serializes the data into Google protobuffers format (experimental)
  • tanagra-protobuf-redis-cache - a helper library for storing serialized protobufs in Redis
  • tanagra-auto-mapper - walks the module tree in Node.js to build up a map of classes, meaning the user doesn’t have to specify the type to deserialize to (experimental)

Note that the library uses US spelling.

Example Usage

The following example declares a serializable class and uses the tanagra-json module to serialize/deserialize it:

const serializable = require('tanagra-core').serializable
class Foo {
  constructor(bar, baz1, baz2, fooBar1, fooBar2) {
    this.someNumber = 123
    this.someString = 'hello, world!'
    this.bar = bar // a complex object with a prototype
    this.bazArray = [baz1, baz2]
    this.fooBarMap = new Map([
      ['a', fooBar1],
      ['b', fooBar2]
    ])
  }
}

// Mark class `Foo` as serializable and containing sub-types `Bar`, `Baz` and `FooBar`
module.exports = serializable(Foo, [Bar, Baz, FooBar])

...

const json = require('tanagra-json')
json.init()
// or, for the experimental protobuf serializer:
// const json = require('tanagra-protobuf')
// await json.init()

const foo = new Foo(bar, baz1, baz2, fooBar1, fooBar2)
const encoded = json.encodeEntity(foo)

...

const decoded = json.decodeEntity(encoded, Foo)

Performance

I compared the performance of the two serializers (the JSON serializer and experimental protobufs serializer) with a control (native JSON.parse and JSON.stringify). I conducted a total of 10 trials with each.

I tested this on my 2017 Dell XPS15 laptop with 32 GB of memory, running Ubuntu 17.10.

I serialized the following nested object:

foo: {
  "string": "Hello foo",
  "number": 123123,
  "bars": [
    {
      "string": "Complex Bar 1",
      "date": "2019-01-09T18:22:25.663Z",
      "baz": {
        "string": "Simple Baz",
        "number": 456456,
        "map": Map { 'a' => 1, 'b' => 2, 'c' => 2 }
      }
    },
    {
      "string": "Complex Bar 2",
      "date": "2019-01-09T18:22:25.663Z",
      "baz": {
        "string": "Simple Baz",
        "number": 456456,
        "map": Map { 'a' => 1, 'b' => 2, 'c' => 2 }
      }
    }
  ],
  "bazs": Map {
    'baz1' => Baz {
      string: 'baz1',
      number: 111,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    },
    'baz2' => Baz {
      string: 'baz2',
      number: 222,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    },
    'baz3' => Baz {
      string: 'baz3',
      number: 333,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    }
  },
}

Write Performance

| Serialization method | Avg. incl. first trial (ms) | StDev incl. first trial (ms) | Avg. excl. first trial (ms) | StDev excl. first trial (ms) |
|---|---|---|---|---|
| JSON | 0.115 | 0.0903 | 0.0879 | 0.0256 |
| Google Protobufs | 2.00 | 2.748 | 1.13 | 0.278 |
| Control group | 0.0155 | 0.00726 | 0.0139 | 0.00570 |

Read Performance

| Serialization method | Avg. incl. first trial (ms) | StDev incl. first trial (ms) | Avg. excl. first trial (ms) | StDev excl. first trial (ms) |
|---|---|---|---|---|
| JSON | 0.133 | 0.102 | 0.104 | 0.0429 |
| Google Protobufs | 2.62 | 1.12 | 2.28 | 0.364 |
| Control group | 0.0135 | 0.00729 | 0.0115 | 0.00390 |

Summary

The JSON serializer is around 6-7 times slower than native serialization. The experimental protobufs serializer is around 13 times slower than the JSON serializer, or 100 times slower than native serialization.

Additionally, the internal caching of schema/structural information within each serializer clearly has an effect on performance. For the JSON serializer, the first write is about four times slower than the average. For the protobuf serializer, it’s nine times slower. So writing objects whose metadata has already been cached is much quicker in either library.

The same effect was observed for reads. For the JSON library, the first read is around four times slower than the average, and for the protobuf library, it’s around two and a half times slower.

The performance issues of the protobuf serializer mean it’s still in the experimental stage, and I would recommend it only if you need the format for some reason. However, it is worth investing some time in, as the format is much terser than JSON, and therefore better for sending over the wire. Stack Exchange uses the format for its internal caching.

The JSON serializer is clearly much more performant but still significantly slower than the native implementation. For small object trees, this difference is not significant (a few milliseconds on top of a 50ms request will not destroy the performance of your site), but this could become an issue for extremely large object trees, and is one of my development priorities.

Roadmap

The library is still in the beta stage. The JSON serializer is reasonably well-tested and stable. Here is the roadmap for the next few months:

  • Performance improvements for both serializers
  • Better support for pre-ES6 JavaScript
  • Support for ES-Next decorators

I know of no other JavaScript library that supports serializing complex, nested object data, and deserializing to its original type. If you’re implementing functionality that would benefit from the library, please give it a try, get in touch with your feedback, and consider contributing.

Project homepage
GitHub repository

Understanding the basics

How are objects stored in JavaScript?

Generally, objects aren’t stored long-term. They are instantiated when needed, used in processing, and removed from memory when they are no longer needed. If we need to reuse that data elsewhere, we serialize it into another structure and deserialize it later.

What is a JavaScript object?

An object is a piece of code that encapsulates a structure, along with operations that can be performed on that structure. Generally, it is the same as an object in any object-oriented programming language.

What is a data object?

A data object is a piece of code that temporarily contains data from a data store so that it may be read or processed in an application. Unless there is a way to keep that data in memory, it is written back or otherwise not retained when the object goes out of scope or is otherwise not needed.

Why do we need serialization and deserialization?

Serialization enables us to keep data beyond the lifetime of the running process, or to share it with other processes. We serialize and store the data, then deserialize it when it is needed elsewhere.