Luke Wilson
Luke has 12 years of experience as an engineer, team lead, and scrum master.
The Tanagra.js library is designed to be simple and lightweight, and it currently supports Node.js and ES6 classes. The main implementation supports JSON, and an experimental version supports Google Protocol Buffers.
Modern websites typically retrieve data from a number of different locations, including databases and third-party APIs. For example, when authenticating a user, a website might look up the user record from the database, then embellish it with data from some external services via API calls. Minimizing expensive calls to these data sources, such as disk access for database queries and internet roundtrips for API calls, is essential to maintaining a fast, responsive site. Data caching is a common optimization technique used to achieve this.
Processes store their working data in memory. If a web server runs in a single process (such as Node.js/Express), then this data can easily be cached using a memory cache running in the same process. However, load-balanced web servers span multiple processes, and even when working with a single process, you might want the cache to persist when the server is restarted. This necessitates an out-of-process caching solution such as Redis, which means the data needs to be serialized somehow, and deserialized when read from the cache.
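To make the distinction concrete, here is a minimal sketch contrasting the two approaches. The `externalStore` map is only a stand-in for an out-of-process cache such as Redis (a real client would be asynchronous and talk to a server), and all function names are hypothetical, chosen for illustration.

```javascript
// In-process cache: values stay live objects, but vanish when the process restarts.
const memoryCache = new Map()

// Stand-in for an out-of-process store (assumption: it can only hold strings,
// as a networked cache would).
const externalStore = new Map()

function cacheInProcess(key, value) {
  memoryCache.set(key, value)
}

function cacheOutOfProcess(key, value) {
  externalStore.set(key, JSON.stringify(value)) // serialize on write
}

function readOutOfProcess(key) {
  const raw = externalStore.get(key)
  return raw === undefined ? undefined : JSON.parse(raw) // deserialize on read
}

const user = { id: 42, name: 'Ada' }
cacheInProcess('user:42', user)
cacheOutOfProcess('user:42', user)

const fromMemory = memoryCache.get('user:42') // the very same object reference
const fromStore = readOutOfProcess('user:42') // a new object rebuilt from text

console.log(fromMemory === user) // true
console.log(fromStore === user)  // false
console.log(fromStore.name)      // 'Ada'
```

The in-process read hands back the original object, while the out-of-process read always passes through a serialize/deserialize round trip, which is where the problems described next come from.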
Serialization and deserialization are relatively straightforward to achieve in statically typed languages such as C#. However, the dynamic nature of JavaScript makes the problem a little trickier. While ECMAScript 6 (ES6) introduced classes, the fields on these classes (and their types) aren’t defined until they are initialized—which may not be when the class is instantiated—and the return types of fields and functions aren’t defined at all in the schema. What’s more, the structure of the class can easily be changed at runtime—fields can be added or removed, types can be changed, etc. While this is possible using reflection in C#, reflection represents the “dark arts” of that language, and developers expect it to break functionality.
I was presented with this problem at work a few years ago when working on the Toptal core team. We were building an agile dashboard for our teams, which needed to be fast; otherwise, developers and product owners wouldn’t use it. We pulled data from a number of sources: our work-tracking system, our project management tool, and a database. The site was built in Node.js/Express, and we had a memory cache to minimize calls to these data sources. However, our rapid, iterative development process meant we deployed (and therefore restarted) several times a day, invalidating the cache and thereby losing many of its benefits.
An obvious solution was an out-of-process cache such as Redis. However, after some research, I found that no good serialization library existed for JavaScript. The built-in JSON.stringify/JSON.parse methods return data of the object type, losing any functions on the prototypes of the original classes. This meant the deserialized objects couldn’t simply be used “in-place” within our application, which would therefore require considerable refactoring to work with an alternative design.
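The following short example illustrates the problem using only built-in JavaScript. The `User` class is hypothetical and exists purely for this demonstration.

```javascript
class User {
  constructor(firstName, lastName) {
    this.firstName = firstName
    this.lastName = lastName
    this.tags = new Map([['role', 'admin']])
  }

  fullName() {
    return `${this.firstName} ${this.lastName}`
  }
}

const original = new User('Jean-Luc', 'Picard')
const roundTripped = JSON.parse(JSON.stringify(original))

console.log(original instanceof User)         // true
console.log(roundTripped instanceof User)     // false: it is a plain object
console.log(typeof roundTripped.fullName)     // 'undefined': the method is gone
console.log(roundTripped.tags instanceof Map) // false: the Map became {}
```

The plain fields survive, but the prototype (and with it every method) is lost, and the `Map` is flattened to an empty object, so the result cannot stand in for the original.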
In order to support serialization and deserialization of arbitrary data in JavaScript, with the deserialized representations and originals usable interchangeably, we needed a serialization library with the following properties:
To plug this gap, I decided to write Tanagra.js, a general-purpose serialization library for JavaScript. The name of the library is a reference to one of my favorite episodes of Star Trek: The Next Generation, where the crew of the Enterprise must learn to communicate with a mysterious alien race whose language is unintelligible. This serialization library supports common data formats to avoid such problems.
Tanagra.js is designed to be simple and lightweight, and it currently supports Node.js (it hasn’t been tested in-browser, but in theory, it should work) and ES6 classes (including Maps). The main implementation supports JSON, and an experimental version supports Google Protocol Buffers. The library requires only standard JavaScript (currently tested with ES6 and Node.js), with no dependency on experimental features, Babel transpiling, or TypeScript.
Serializable classes are marked as such with a method call when the class is exported:
module.exports = serializable(Foo, myUniqueSerialisationKey)
The method returns a proxy to the class, which intercepts the constructor and injects a unique identifier. (If not specified, this defaults to the class name.) This key is serialized with the rest of the data, and the class also exposes it as a static field. If the class contains any nested types (i.e., members with types that need serializing), they are also specified in the method-call:
module.exports = serializable(Foo, [Bar, Baz], myUniqueSerialisationKey)
(Nested types for previous versions of the class can also be specified in a similar way, so that, for example, if you serialize a Foo1, it can be deserialized into a Foo2.)
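As a rough illustration of the proxy idea described above, the sketch below wraps a class in a standard `Proxy` whose `construct` trap tags each new instance with a key. This is not Tanagra's actual implementation: the helper name `markSerializable` and the property names `_serializationKey` and `serializationKey` are assumptions made up for this example.

```javascript
// Hypothetical helper, not the library's code: wraps a class so that every
// instance carries a serialization key, defaulting to the class name.
function markSerializable(clazz, key = clazz.name) {
  clazz.serializationKey = key // expose the key as a static field

  return new Proxy(clazz, {
    // Intercept `new Wrapped(...)` and inject the key into the instance.
    construct(target, args, newTarget) {
      const instance = Reflect.construct(target, args, newTarget)
      instance._serializationKey = key
      return instance
    }
  })
}

class Foo {
  constructor(value) {
    this.value = value
  }

  double() {
    return this.value * 2
  }
}

const SerializableFoo = markSerializable(Foo)
const foo = new SerializableFoo(21)

console.log(foo._serializationKey)            // 'Foo'
console.log(SerializableFoo.serializationKey) // 'Foo'
console.log(foo instanceof Foo, foo.double()) // true 42
```

Because the key is an ordinary own property in this sketch, it would be written out alongside the rest of the instance's data by any serializer that walks the object's fields.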
During serialization, the library recursively builds up a global map of keys to classes, and uses this during deserialization. (Remember, the key is serialized with the rest of the data.) In order to know the type of the “top-level” class, the library requires that this be specified in the deserialization call:
const foo = decodeEntity(serializedFoo, Foo)
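The key-to-class map can be pictured with the simplified sketch below. It is an assumption-laden illustration rather than the library's code: the names `registry`, `register`, `encode`, `decode`, and the `$key` marker are all hypothetical, classes are registered explicitly instead of being discovered recursively, and the class passed to `decode` is only used to check the result.

```javascript
// Hypothetical registry: serialization key (here, the class name) -> class.
const registry = new Map()

function register(...classes) {
  for (const clazz of classes) registry.set(clazz.name, clazz)
}

// Write the key alongside the data for every instance of a registered class.
function encode(value) {
  return JSON.stringify(value, (prop, val) => {
    const isRegistered = val !== null && typeof val === 'object' &&
      val.constructor && registry.get(val.constructor.name) === val.constructor
    return isRegistered ? { $key: val.constructor.name, ...val } : val
  })
}

// Use the stored key to look up the class and restore its prototype.
function decode(text, topLevelClass) {
  const result = JSON.parse(text, (prop, val) => {
    if (val !== null && typeof val === 'object' && registry.has(val.$key)) {
      const { $key, ...fields } = val
      return Object.assign(Object.create(registry.get($key).prototype), fields)
    }
    return val
  })
  if (!(result instanceof topLevelClass)) {
    throw new TypeError(`Expected an instance of ${topLevelClass.name}`)
  }
  return result
}

class Baz {
  constructor(n) { this.n = n }
  double() { return this.n * 2 }
}

class Foo {
  constructor(baz) { this.baz = baz }
  describe() { return `Foo with baz ${this.baz.double()}` }
}

register(Foo, Baz)
const decoded = decode(encode(new Foo(new Baz(21))), Foo)
console.log(decoded.describe()) // 'Foo with baz 42'
```

Note that `Object.create` restores the prototype without re-running the constructor, which is one possible design choice; whether Tanagra does the same is not stated here.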
An experimental auto-mapping library walks the module tree and generates the mappings from the class names, but this only works for uniquely named classes.
The project is divided into a number of modules:
Note that the library uses US spelling.
The following example declares a serializable class and uses the tanagra-json module to serialize/deserialize it:
const serializable = require('tanagra-core').serializable

class Foo {
  constructor(bar, baz1, baz2, fooBar1, fooBar2) {
    this.someNumber = 123
    this.someString = 'hello, world!'
    this.bar = bar // a complex object with a prototype
    this.bazArray = [baz1, baz2]
    this.fooBarMap = new Map([
      ['a', fooBar1],
      ['b', fooBar2]
    ])
  }
}

// Mark class `Foo` as serializable and containing sub-types `Bar`, `Baz` and `FooBar`
// (these classes are assumed to be defined or imported elsewhere)
module.exports = serializable(Foo, [Bar, Baz, FooBar])

...

const json = require('tanagra-json')
json.init()
// or:
// require('tanagra-protobuf')
// await json.init()

// `bar`, `baz1`, `baz2`, `fooBar1` and `fooBar2` are instances created elsewhere
const foo = new Foo(bar, baz1, baz2, fooBar1, fooBar2)
const encoded = json.encodeEntity(foo)

...

const decoded = json.decodeEntity(encoded, Foo)
I compared the performance of the two serializers (the JSON serializer and experimental protobufs serializer) with a control (native JSON.parse and JSON.stringify). I conducted a total of 10 trials with each.
I tested this on my 2017 Dell XPS 15 laptop with 32 GB of memory, running Ubuntu 17.10.
I serialized the following nested object:
foo: {
  "string": "Hello foo",
  "number": 123123,
  "bars": [
    {
      "string": "Complex Bar 1",
      "date": "2019-01-09T18:22:25.663Z",
      "baz": {
        "string": "Simple Baz",
        "number": 456456,
        "map": Map { 'a' => 1, 'b' => 2, 'c' => 2 }
      }
    },
    {
      "string": "Complex Bar 2",
      "date": "2019-01-09T18:22:25.663Z",
      "baz": {
        "string": "Simple Baz",
        "number": 456456,
        "map": Map { 'a' => 1, 'b' => 2, 'c' => 2 }
      }
    }
  ],
  "bazs": Map {
    'baz1' => Baz {
      string: 'baz1',
      number: 111,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    },
    'baz2' => Baz {
      string: 'baz2',
      number: 222,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    },
    'baz3' => Baz {
      string: 'baz3',
      number: 333,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    }
  },
}
Write
Serialization method | Ave. inc. first trial (ms) | StDev. inc. first trial (ms) | Ave. ex. first trial (ms) | StDev. ex. first trial (ms) |
JSON | 0.115 | 0.0903 | 0.0879 | 0.0256 |
Google Protobufs | 2.00 | 2.748 | 1.13 | 0.278 |
Control group | 0.0155 | 0.00726 | 0.0139 | 0.00570 |
Read
Serialization method | Ave. inc. first trial (ms) | StDev. inc. first trial (ms) | Ave. ex. first trial (ms) | StDev. ex. first trial (ms) |
JSON | 0.133 | 0.102 | 0.104 | 0.0429 |
Google Protobufs | 2.62 | 1.12 | 2.28 | 0.364 |
Control group | 0.0135 | 0.00729 | 0.0115 | 0.00390 |
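For writes, the JSON serializer is around 6-7 times slower than native serialization. The experimental protobufs serializer is around 13 times slower than the JSON serializer, or around 100 times slower than native serialization. For reads, the gaps in the table above are wider: the JSON serializer is around 9-10 times slower than native parsing, and the protobufs serializer is around 20 times slower than the JSON serializer.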
Additionally, the internal caching of schema/structural information within each serializer clearly has an effect on performance. For the JSON serializer, the first write is about four times slower than the average. For the protobuf serializer, it’s nine times slower. So writing objects whose metadata has already been cached is much quicker in either library.
The same effect was observed for reads. For the JSON library, the first read is around four times slower than the average, and for the protobuf library, it’s around two and a half times slower.
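These first-trial figures can be recovered from the tables with a little arithmetic. The sketch below assumes the reported averages are arithmetic means over the 10 trials (including the first) and over the remaining 9 trials (excluding it), and that "the average" in the comparison means the average excluding the first trial. The function name `firstTrialMs` is made up for this example.

```javascript
const TRIALS = 10

// If avgInc is the mean over all trials and avgEx the mean over all but the
// first, then: first = trials * avgInc - (trials - 1) * avgEx.
function firstTrialMs(avgInc, avgEx, trials = TRIALS) {
  return trials * avgInc - (trials - 1) * avgEx
}

// [label, average incl. first trial (ms), average excl. first trial (ms)]
const rows = [
  ['JSON write', 0.115, 0.0879],
  ['Protobuf write', 2.00, 1.13],
  ['JSON read', 0.133, 0.104],
  ['Protobuf read', 2.62, 2.28]
]

const ratios = rows.map(([label, avgInc, avgEx]) => {
  const first = firstTrialMs(avgInc, avgEx)
  const ratio = first / avgEx
  console.log(`${label}: first trial ~${first.toFixed(2)} ms, ${ratio.toFixed(1)}x the later average`)
  return ratio
})
```

Under those assumptions the ratios come out at roughly 4.1, 8.7, 3.8, and 2.5, which matches the "four times", "nine times", "four times", and "two and a half times" figures quoted in the text.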
The performance issues of the protobuf serializer mean it’s still in the experimental stage, and I would recommend it only if you need the format for some reason. However, it is worth investing some time in, as the format is much terser than JSON, and therefore better for sending over the wire. Stack Exchange uses the format for its internal caching.
The JSON serializer is clearly much more performant but still significantly slower than the native implementation. For small object trees, this difference is not significant (a few milliseconds on top of a 50 ms request will not destroy the performance of your site), but this could become an issue for extremely large object trees, and is one of my development priorities.
The library is still in the beta stage. The JSON serializer is reasonably well-tested and stable. Here is the roadmap for the next few months:
I know of no other JavaScript library that supports serializing complex, nested object data, and deserializing to its original type. If you’re implementing functionality that would benefit from the library, please give it a try, get in touch with your feedback, and consider contributing.
Generally, objects aren’t stored. They are instantiated when needed, used in processing, then removed from memory when they are no longer needed. If we need to temporarily use data elsewhere, we serialize and deserialize that data into another structure.
An object is a piece of code that encapsulates a structure, along with operations that can be performed on that structure. Generally, it is the same as an object in any object-oriented programming language.
A data object is a piece of code that temporarily contains data from a data store so that it may be read or processed in an application. Unless there is a way to keep that data in memory, it is written back or otherwise not retained when the object goes out of scope or is otherwise not needed.
Serialization enables us to keep data for use in the running process or other processes if needed. We store the data, then deserialize when it is used elsewhere.
Located in London, United Kingdom
Member since March 24, 2020