In this guide, we’ll zoom in on both kinds of frameworks, considering API usability and performance. While I’m sure you’ll find plenty of value in examining this juxtaposition, make no mistake: we are comparing apples to oranges.
For our benchmark, we’ll use this relatively simple data structure (please don’t use this for anything in production). Serde, the incumbent serialization/deserialization library, is elegant, flexible, fast to run, and slow to compile.
You rarely need to implement those traits manually, since serde comes with powerful derive macros that allow many tweaks to the output format. For example, you can rename fields, define defaults for missing values on deserialization, and more.
Once you have derived or implemented those traits, you can use the format crates, such as serde_json or bincode, to (de)serialize your data. This reduces the problem of serializing N data structures to M formats from M × N implementations to M + N.
Because serde relies heavily on monomorphization to facilitate great performance despite its famous flexibility, compile time has been an issue from the beginning. To counter this, multiple crates have appeared, from miniserde, to tinyserde, to nanoserde.
The idea behind these tools is to use runtime dispatch to reduce the code bloat caused by monomorphization. The serde_json crate allows serialization to and deserialization from JSON, which is plain text and thus (somewhat) readable, at the cost of some overhead during parsing and formatting.
Serializing can be done with to_string, to_vec, or to_writer, with _pretty variants to write out nicely formatted instead of minified JSON. Serializing our benchmark data takes about a quarter of a microsecond on my machine.
The overhead will vary depending on the serialized types and values. This is another textual format with multiple language bindings, like JSON, but it’s very idiosyncratic. It’s a wee bit terser than JSON at 91 bytes, but slower to work with. Like serde_json, bincode also works with serde to serialize or deserialize any types that implement the respective traits.
Because Write is implemented for a good deal of types (notably &mut Vec&lt;u8&gt;), serializing to an in-memory buffer is straightforward.
Serializing took roughly 35 nanoseconds and deserializing a bit less than an eighth of a microsecond. It prides itself on being very terse on the wire, which our benchmark case validated: the data serialized to just 24 bytes.
The Rust implementation of the MessagePack protocol is called rmp, which also works with serde. Its thriftiness when it comes to space comes with a small performance overhead compared to bincode.
Note that this will only construct a serde_json::Value, which is pretty fast (to the tune of only a few nanoseconds), but not exactly a serialized object. Serialization was speedy enough at roughly 140 nanoseconds, but deserialization was, unexpectedly, slow at almost half a millisecond.
At 41 bytes, it’s a good compromise between size and speed: at 60 nanoseconds to serialize and 180 nanoseconds to deserialize, it’s roughly 1.5x slower than bincode, at roughly 70 percent of the message size. The relatively fast serialization and the thrifty format are a natural fit for embedded systems.
Deserializing MessagePack might overtax the embedded CPU, but we often have a beefier machine on the receiving end to deserialize the data. For a freely chosen polyglot format, both JSON and MessagePack best it in every respect.
From Google comes a polyglot serialization format with bindings to C, C++, Java, C#, Go, Lua, and Rust, among others. FlatBuffers appears to lack a direct representation of pointer-sized integers (e.g., usize) or of Ranges, so in this example, I just picked uint64 and an array of length 2 to represent them.
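A FlatBuffers schema fragment for such a workaround might look like the following (a sketch with invented field names, not the article’s actual schema):

```
// example.fbs -- hypothetical schema fragment
table Example {
  // usize has no direct FlatBuffers equivalent; widen it to uint64.
  len: uint64;
  // A Range<usize> flattened into a two-element vector [start, end].
  span: [uint64];
}

root_type Example;
```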
Compiling this to Rust code requires the flatc compiler, which is available as a Windows binary. Obviously, this is not our original data structure, but for the sake of comparability, we’ll benchmark the serialization and deserialization via this slightly modified type.
After we published this post, people asked why I left out Cap’n Proto, so I added it to the benchmark. It works similarly to FlatBuffers, but the interface is somewhat impenetrable, so I cannot guarantee the results.
UPDATE, Sept. 25, 2020: One of the Cap’n Proto crate maintainers sent a PR my way that showed I did do something wrong: I used a nested struct to represent an Option.
If you find problems or improvements, feel free to send me an issue or PR.
I got some great comments on Reddit, so I wanted to do another post to update my numbers.
I’ve added bincode, which serializes values as raw bytes. I’ve changed the C++ and Rust JSON tests to serialize enums as uints.
I’m betting the reason the Rust numbers are so high is that we’re allocating strings. Back when I started this project a number of months ago, I wanted to benchmark to see how we compared to some other languages.
I went with Goser, mainly because it was using a complex, real-world log structure, and they had done the hard work of implementing benchmarks for encoding/json, goprotobuf, gogoprotobuf, and go-capnproto. I also included the Go ffjson and C++ rapidjson libraries, which both claim to be the fastest JSON libraries for those languages.
DOM-style means deserializing into a generic object and from there into the final object; SAX-style means a callback approach, where a handler is called for each JSON token. Ffjson uses code generation to get its serialization speed, but it doesn’t implement deserialization.
Between then and now, Patrick Walton, Huon Wilson, myself, and probably lots of others found and fixed a number of bugs across serialize::json, std::io, generic function calls, and more. While having finished unboxed closures might finally let us break through this performance bottleneck, it’s not guaranteed.
All in all, this, together with the representational problems from post 1, makes it pretty obvious that we have some fundamental issues here and need an alternative solution. In the next post, I’ll start getting into the details of the design of serde.
The serialize crate was an internal part of the standard Rust distribution. The rustc-serialize crate used to be that serialize crate, but it was moved out to a separate repository and uploaded to crates.io so that it can evolve on its own.
This was done because the utility of rustc-serialize is enormous, but it was not realistic to get it stabilized in time for Rust 1.0. Since the Rust distribution prohibits unstable features on the stable channel, the only ways to continue using the serialization infrastructure were to 1) stabilize what we have, or 2) move it to crates.io, where the unstable restrictions don’t apply.
It’s a bit weird, but it leaves the Decodable/Encodable names available for a future backwards-compatible version of a serialize crate that is better than what we have now (perhaps this is what serde2 will become, per the aforementioned link).