What's the fun of writing fast code if you cannot show it off?
The comparables, sorted from fastest to slowest: HYPERLUMINAL-MEM, CL-CONSPACK, and CL-STORE.
In terms of features, cl-binary-store (like cl-store) has the best out-of-the-box behavior, requiring nothing of the user. cl-conspack is really nice too, but has somewhat different goals, and it requires writing a specialized serializer for each structure or object you use. It is faster than cl-store, which is nice, but not fast enough, and as noted earlier it doesn't work well with data containing many instances of structures or classes. hyperluminal-mem is a very optimized binary serializer made to write directly to SAP buffers. It too requires writing code for every struct or object you want to serialize, and it does no reference or circularity tracking at all. hyperluminal-mem is currently broken in quicklisp, but you can still load it by accepting the compilation errors a couple of times (the problems are in the stm library). It's my benchmark for zippy raw-data serialization: we are almost as fast as it in many cases, faster in some, slower in others.
All the tests here use SBCL, as it is (by far) the fastest implementation. There is a section at the end where I run some of them on Allegro, LispWorks, CCL, and ECL.
There are three modes we benchmark in. The first is with *track-references* nil, which means every piece of data is considered unique and no circular or multiple referencing is supported. This is the only mode hyperluminal-mem runs in; conspack and cl-store can be run in a mode like this as well, so the comparisons are fair. The second is with *track-references* t and *support-shared-list-structures* nil, which is comparable to the default behavior of cl-store. (Conspack, when not tracking refs, does determine lengths of lists in advance, but will explode if you give it a circular list; it does either full circularity tracking or none, so it's hard to do apples to apples here.) The third is with *track-references* t and *support-shared-list-structures* t, which is comparable to the behavior of conspack with conspack:tracking-refs enabled.
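Concretely, the three modes look something like this (a sketch, assuming the top-level store function and a file target; data stands in for whatever you are serializing):

;; Mode 1: no reference tracking; fastest, but the data must not
;; contain shared or circular structure.
(let ((cl-binary-store:*track-references* nil))
  (cl-binary-store:store "blah.bin" data))

;; Mode 2 (the default): multiply-referenced objects and normal
;; circularity are handled, but list tails are assumed unshared.
(let ((cl-binary-store:*track-references* t)
      (cl-binary-store:*support-shared-list-structures* nil))
  (cl-binary-store:store "blah.bin" data))

;; Mode 3: full tracking; any shared or circular list structure works.
(let ((cl-binary-store:*track-references* t)
      (cl-binary-store:*support-shared-list-structures* t))
  (cl-binary-store:store "blah.bin" data))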
Note that here we are writing out 1M conses and 1M small integers in < 10 ms; that's roughly 200M objects per second. Not too shabby! The MB/sec numbers below are a bit skewed because hyperluminal-mem writes out 8 bytes per tiny integer. On this test reference tracking is disabled in cl-binary-store (*track-references* nil), in cl-store (*check-for-circs* nil), and in cl-conspack.
(defun long-list-of-tiny-integers (&optional (n 1000000))
  (loop repeat n collect (- (random 33) 16)))
CL-BINARY-STORE> (test-on-data (long-list-of-tiny-integers))
HYPERLUMINAL-MEM
OUTPUT SIZE: 8.00 MB
WRITE: 2.28 ms at 3509 MB/sec
READ : 5.36 ms at 1493 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 1.00 MB <-- tiny
WRITE: 10.28 ms at 97 MB/sec
READ : 5.20 ms at 192 MB/sec
CL-CONSPACK
OUTPUT SIZE: 2.00MB
WRITE: 38.80 ms at 52 MB/sec
READ : 28.40 ms at 70 MB/sec
CL-STORE
OUTPUT SIZE: 5.00MB
WRITE: 83.60 ms at 60 MB/sec
READ : 56.00 ms at 89 MB/sec
Let's go further into the synthetic zone and turn on reference tracking, *track-references* t (the default behavior anyway). This rules out hyperluminal-mem as a comparable, and conspack does somewhat heavier reference tracking, so it shouldn't be compared in this mode either.
CL-BINARY-STORE
OUTPUT SIZE: 1.00 MB
WRITE: 13.88 ms at 72 MB/sec
READ : 4.92 ms at 203 MB/sec
CL-STORE
OUTPUT SIZE: 5.00MB
WRITE: 72.79 ms at 69 MB/sec
READ : 76.39 ms at 65 MB/sec
Now let's turn on all the features, which adds support for any type of circularity you want (this is *support-shared-list-structures* t and *track-references* t). We cannot elide cons markers in this case (at least not in a way that makes any sense to me from a performance view). Here we are dominated by #'eq hash-table speed. Nominally cl-store supports this mode too with precise-list-storage, but it blows the stack for any data set that is not tiny (so I had to scale the data set down for it). We are now comparing with conspack:tracking-refs, which supports the same behavior as cl-binary-store:
CL-BINARY-STORE
OUTPUT SIZE: 2.00 MB <-- larger file because user has asserted complex list circularity
WRITE: 90.71 ms at 22 MB/sec
READ : 5.68 ms at 352 MB/sec
CL-CONSPACK
OUTPUT SIZE: 2.00MB
WRITE: 171.58 ms at 12 MB/sec
READ : 29.60 ms at 68 MB/sec
CL-STORE (run on a 50x smaller data set; results scaled up)
OUTPUT SIZE: 5.00MB
WRITE: 200.00 ms at 25 MB/sec
READ : 180.00 ms at 28 MB/sec
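As a concrete example of the kind of data this mode exists for, here are two lists sharing a tail (a sketch; store is as in the mode examples above):

;; Without *support-shared-list-structures* t, the shared tail would
;; be serialized twice and restored as two separate lists.
(let* ((tail (list 3 4 5))
       (a (list* 1 2 tail))
       (b (list* 'x 'y tail)))
  (let ((cl-binary-store:*track-references* t)
        (cl-binary-store:*support-shared-list-structures* t))
    (cl-binary-store:store "shared.bin" (list a b))))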
We can win back some performance by hinting at the amount of data we are looking at: binding *eq-refs-table-size* to 1.3 million helps, as otherwise we spend a lot of time in grow-hash-table / rehash. We can do the same thing for cl-store.
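That binding looks something like this (a sketch; store and long-list-of-tiny-integers are as above):

;; Pre-size the #'eq reference table so it is not repeatedly grown
;; and rehashed; 1.3 million covers the ~1M conses being tracked.
(let ((cl-binary-store:*eq-refs-table-size* 1300000)
      (cl-binary-store:*track-references* t)
      (cl-binary-store:*support-shared-list-structures* t))
  (cl-binary-store:store "blah.bin" (long-list-of-tiny-integers)))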
CL-BINARY-STORE
OUTPUT SIZE: 2.00 MB
WRITE: 55.59 ms at 36 MB/sec
READ : 5.32 ms at 376 MB/sec
CL-STORE
OUTPUT SIZE: 5.00MB
WRITE: 80.79 ms at 62 MB/sec
READ : 69.99 ms at 71 MB/sec
In this case, though, the lists are simple, so using just *track-references* t is the right choice: it handles normal circularity, its cost is small, and it is generally the right default.
Of course, who the heck would use a list for this sort of thing? Here's a simple-array of (signed-byte 8) integers... obviously having a specialized serializer helps:
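The data here is generated by something like the following (a hypothetical sketch mirroring long-list-of-tiny-integers above; the actual generator is in benchmarks.lisp):

;; One million tiny integers, in a specialized vector this time.
(defun long-vector-of-tiny-integers (&optional (n 1000000))
  (let ((v (make-array n :element-type '(signed-byte 8))))
    (dotimes (i n v)
      (setf (aref v i) (- (random 33) 16)))))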
HYPERLUMINAL-MEM
OUTPUT SIZE: 8.00 MB
WRITE: 3.12 ms at 2564 MB/sec
READ : 3.20 ms at 2500 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 1.00 MB
WRITE: 0.04 ms at 25000 MB/sec
READ : 0.24 ms at 4167 MB/sec
CL-CONSPACK
OUTPUT SIZE: 1.00MB
WRITE: 26.80 ms at 37 MB/sec
READ : 20.80 ms at 48 MB/sec
CL-STORE
OUTPUT SIZE: 4.00MB
WRITE: 56.79 ms at 70 MB/sec
READ : 52.79 ms at 76 MB/sec
OK, now on to fixnums. These are pretty good numbers. Here is a simple-vector of random fixnums... (we've seen that simple-arrays are no fun to benchmark with because cl-binary-store wins too easily):
HYPERLUMINAL-MEM
OUTPUT SIZE: 8.00 MB
WRITE: 1.84 ms at 4348 MB/sec
READ : 3.60 ms at 2222 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 9.00 MB
WRITE: 2.32 ms at 3880 MB/sec
READ : 3.36 ms at 2679 MB/sec
CL-CONSPACK
OUTPUT SIZE: 9.00MB
WRITE: 52.79 ms at 170 MB/sec
READ : 46.80 ms at 192 MB/sec
CL-STORE
OUTPUT SIZE: 36.75MB
WRITE: 416.36 ms at 88 MB/sec
READ : 289.17 ms at 127 MB/sec
A simple-vector of double floats:
HYPERLUMINAL-MEM
OUTPUT SIZE: 16.00 MB
WRITE: 31.72 ms at 504 MB/sec
READ : 15.08 ms at 1061 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 9.00 MB
WRITE: 3.28 ms at 2744 MB/sec
READ : 7.60 ms at 1184 MB/sec
CL-BINARY-STORE <--- just to show what a (simple-array double-float (*)) gets
OUTPUT SIZE: 8.00 MB
WRITE: 0.56 ms at 14287 MB/sec
READ : 2.04 ms at 3922 MB/sec
CL-CONSPACK
OUTPUT SIZE: 9.00MB
WRITE: 93.19 ms at 97 MB/sec
READ : 51.99 ms at 173 MB/sec
CL-STORE
OUTPUT SIZE: 47.00MB
WRITE: 553.55 ms at 85 MB/sec
READ : 860.72 ms at 55 MB/sec
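The difference between the two CL-BINARY-STORE rows above is just the array's element type (a sketch):

;; A simple-vector holds boxed double-floats, serialized one by one:
(make-array 1000000 :initial-element 1.0d0)
;; A (simple-array double-float (*)) stores raw doubles, so
;; cl-binary-store can blit the whole thing at once:
(make-array 1000000 :element-type 'double-float :initial-element 1.0d0)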
Here's a list of random single-floats. Again, for long lists we trade a little speed for smaller output (as conspack does), but it is still very, very fast.
CL-BINARY-STORE> (test-on-data (long-list-of-random-single-floats))
HYPERLUMINAL-MEM
OUTPUT SIZE: 8.00 MB
WRITE: 3.00 ms at 2667 MB/sec
READ : 6.04 ms at 1325 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 5.00 MB
WRITE: 8.20 ms at 610 MB/sec
READ : 6.28 ms at 796 MB/sec
CL-CONSPACK
OUTPUT SIZE: 5.00MB
WRITE: 66.79 ms at 75 MB/sec
READ : 34.00 ms at 147 MB/sec
CL-STORE
OUTPUT SIZE: 22.00MB
WRITE: 310.77 ms at 71 MB/sec
READ : 499.55 ms at 44 MB/sec
Complex double-float numbers: cl-binary-store goes fast. Same for complex single-floats. Pretty niche until you start doing heterodyning and mixing, and then you have a large pile of them.
CL-BINARY-STORE> (test-on-data (long-list-of-random-complex-double-floats))
HYPERLUMINAL-MEM
OUTPUT SIZE: 24.00 MB
WRITE: 38.40 ms at 625 MB/sec
READ : 21.92 ms at 1095 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 17.00 MB
WRITE: 15.12 ms at 1124 MB/sec
READ : 15.56 ms at 1093 MB/sec
CL-CONSPACK
OUTPUT SIZE: 19.00MB
WRITE: 214.80 ms at 88 MB/sec
READ : 136.40 ms at 139 MB/sec
CL-STORE
OUTPUT SIZE: 96.00MB
WRITE: 1537.60 ms at 62 MB/sec
READ : 1882.00 ms at 51 MB/sec
cl-binary-store does good work on specialized vectors. It's just blitting the underlying storage to whatever your target is, so it's hard to beat. This is very much a reason to use cl-binary-store: for specialized arrays it is way faster than anything else.
CL-BINARY-STORE> (test-on-data (long-list-of-big-ub8-vectors))
HYPERLUMINAL-MEM
OUTPUT SIZE: 80.02 MB
WRITE: 32.48 ms at 2464 MB/sec
READ : 39.04 ms at 2050 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 10.01 MB
WRITE: 1.20 ms at 8338 MB/sec
READ : 2.52 ms at 3971 MB/sec
CL-CONSPACK
OUTPUT SIZE: 10.00MB
WRITE: 236.00 ms at 42 MB/sec
READ : 148.40 ms at 67 MB/sec
CL-STORE
OUTPUT SIZE: 10.01MB
WRITE: 112.00 ms at 89 MB/sec
READ : 36.80 ms at 272 MB/sec
Here are simple-bit-vectors, one of my favorite data structures in Common Lisp:
CL-BINARY-STORE> (test-on-data (long-list-of-big-simple-bit-vectors))
HYPERLUMINAL-MEM
OUTPUT SIZE: 1.27 MB
WRITE: 8.00 ms at 159 MB/sec
READ : 14.44 ms at 88 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 1.26 MB
WRITE: 0.12 ms at 10467 MB/sec
READ : 0.16 ms at 7850 MB/sec
CL-CONSPACK
OUTPUT SIZE: 20.00MB
WRITE: 282.00 ms at 71 MB/sec
READ : 243.20 ms at 82 MB/sec
CL-STORE
OUTPUT SIZE: 40.05MB
WRITE: 664.40 ms at 60 MB/sec
READ : 504.80 ms at 79 MB/sec
Just all the specialized vectors I can think of. It goes fast on SBCL.
CL-BINARY-STORE> (test-on-data (a-bunch-of-specialized-arrays 100000))
HYPERLUMINAL-MEM
OUTPUT SIZE: 12.81 MB
WRITE: 11.04 ms at 1161 MB/sec
READ : 9.64 ms at 1329 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 5.09 MB
WRITE: 0.32 ms at 15899 MB/sec
READ : 0.64 ms at 7949 MB/sec
CL-CONSPACK
OUTPUT SIZE: 5.70MB
WRITE: 58.00 ms at 98 MB/sec
READ : 47.20 ms at 121 MB/sec
CL-STORE
OUTPUT SIZE: 20.10MB
WRITE: 338.80 ms at 59 MB/sec
READ : 222.00 ms at 91 MB/sec
The rest of the story is the same here: for specialized vectors and specialized arrays we are very fast because we are just blitting them. This means cl-binary-store is a good choice for serializing double-float matrices, for example:
CL-BINARY-STORE> (test-on-data (list-of-double-float-matrices))
HYPERLUMINAL-MEM
OUTPUT SIZE: 16.00 MB
WRITE: 39.48 ms at 405 MB/sec
READ : 16.48 ms at 971 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 8.00 MB
WRITE: 0.72 ms at 11112 MB/sec
READ : 2.00 ms at 4000 MB/sec
CL-CONSPACK
OUTPUT SIZE: 8.01MB
WRITE: 89.20 ms at 90 MB/sec
READ : 62.00 ms at 129 MB/sec
CL-STORE
OUTPUT SIZE: 47.01MB
WRITE: 772.80 ms at 61 MB/sec
READ : 873.60 ms at 54 MB/sec
simple-base-strings: we win again, since a simple-base-string is just another specialized vector:
CL-BINARY-STORE> (test-on-data (simple-base-strings))
HYPERLUMINAL-MEM
OUTPUT SIZE: 2.40 MB
WRITE: 4.04 ms at 594 MB/sec
READ : 2.56 ms at 938 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 0.79 MB
WRITE: 2.44 ms at 323 MB/sec
READ : 2.24 ms at 352 MB/sec
CL-CONSPACK
OUTPUT SIZE: 0.79MB
WRITE: 13.60 ms at 58 MB/sec
READ : 6.00 ms at 131 MB/sec
CL-STORE
OUTPUT SIZE: 0.99MB
WRITE: 16.00 ms at 62 MB/sec
READ : 7.60 ms at 130 MB/sec
Unicode strings. Here cl-binary-store is doing UTF-8 encoding, and you can see our decoder isn't amazing.
CL-BINARY-STORE> (test-on-data (simple-strings))
HYPERLUMINAL-MEM
OUTPUT SIZE: 2.40 MB
WRITE: 4.64 ms at 517 MB/sec
READ : 3.68 ms at 652 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 0.99 MB
WRITE: 4.36 ms at 227 MB/sec
READ : 8.76 ms at 113 MB/sec
CL-CONSPACK
OUTPUT SIZE: 0.99MB
WRITE: 14.40 ms at 69 MB/sec
READ : 5.20 ms at 190 MB/sec <-- Using package trivial-utf8, fast!
CL-STORE
OUTPUT SIZE: 2.08MB
WRITE: 32.40 ms at 64 MB/sec
READ : 13.20 ms at 157 MB/sec
Just a bunch of random crud in a long list, to average out dispatch overhead.
CL-BINARY-STORE> (test-on-data (long-complex-list))
HYPERLUMINAL-MEM
OUTPUT SIZE: 20.49 MB
WRITE: 33.68 ms at 608 MB/sec
READ : 29.40 ms at 697 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 7.14 MB
WRITE: 24.00 ms at 297 MB/sec
READ : 39.32 ms at 182 MB/sec
CL-CONSPACK
OUTPUT SIZE: 7.46MB
WRITE: 133.19 ms at 56 MB/sec
READ : 86.79 ms at 86 MB/sec
CL-STORE
OUTPUT SIZE: 33.43MB
WRITE: 446.76 ms at 75 MB/sec
READ : 501.55 ms at 67 MB/sec
Both hyperluminal-mem and cl-conspack require writing code to support each new struct. First we test without reference tracking. For cl-conspack and cl-store this means writing information about the structures out every time (at least, I think that's how they work); hyperluminal-mem avoids that by having you write per-struct code, so it doesn't have to figure structures out (cl-conspack could do the same, but doesn't). cl-binary-store never disables object type tracking, so it switches to implicit referencing for object slot and type information. You can see a big difference here. (This is #'lots-of-structure-objects in benchmarks.lisp.)
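For scale, the per-type code cl-conspack wants is small, something like this (a sketch based on its documented defencoding macro; bench-blarg and its slots a and b are taken from the explain output shown further below):

;; Tell conspack how to encode/decode a struct by listing its slots.
(conspack:defencoding bench-blarg
  a b)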
HYPERLUMINAL-MEM
OUTPUT SIZE: 10.40 MB
WRITE: 26.04 ms at 399 MB/sec
READ : 25.12 ms at 414 MB/sec
CL-BINARY-STORE
OUTPUT SIZE: 1.59 MB <--- much smaller
WRITE: 8.96 ms at 177 MB/sec
READ : 11.92 ms at 133 MB/sec
CL-CONSPACK
OUTPUT SIZE: 9.09MB
WRITE: 183.98 ms at 49 MB/sec
READ : 129.59 ms at 70 MB/sec
CL-STORE
OUTPUT SIZE: 13.69MB
WRITE: 232.78 ms at 59 MB/sec
READ : 231.98 ms at 59 MB/sec
Now with reference tracking. cl-conspack's output size improves, but not by a huge amount.
CL-BINARY-STORE
OUTPUT SIZE: 1.49 MB <--- (there are some repeated strings in the structs, so even smaller)
WRITE: 41.28 ms at 36 MB/sec
READ : 10.60 ms at 141 MB/sec
CL-CONSPACK
OUTPUT SIZE: 4.89MB
WRITE: 209.98 ms at 23 MB/sec
READ : 74.39 ms at 66 MB/sec
CL-STORE
OUTPUT SIZE: 6.59MB
WRITE: 169.98 ms at 39 MB/sec
READ : 179.98 ms at 37 MB/sec
cl-conspack comes with a cool explain tool that lets you analyze the output. For each structure it sees, it repeats the structure symbol rather than writing a reference, even if it has seen that structure before. Here is a bit of the output; it's a typed-map. (:ref 0) is the slot name of the first struct slot and (:ref 1) is the slot name of the second. So, yeah, it just isn't aggressive enough about reference tracking.
((:LIST 3
((:TMAP 2
((:SYMBOL BENCH-BLARG) (:TAG 0 (:SYMBOL B)) (:STRING "54")
(:TAG 1 (:SYMBOL A)) (:NUMBER :DOUBLE-FLOAT 0.8779502726820398d0)
(:TMAP 2
((:SYMBOL BENCH-BLARG) (:REF 0) (:STRING "74") (:REF 1)
(:NUMBER :DOUBLE-FLOAT 0.42889587902623627d0)))
(:BOOLEAN NIL)))
END-OF-FILE))
END-OF-FILE)
Because cl-conspack behaved badly on structures above, here I am making sure that its reference tracking does OK on a simpler case (the same string repeated many times). It does.
CL-BINARY-STORE> (test-on-data (lots-of-the-same-string) :hlmem nil)
CL-BINARY-STORE
OUTPUT SIZE: 1.00 MB
WRITE: 28.80 ms at 35 MB/sec
READ : 8.20 ms at 122 MB/sec
CL-CONSPACK
OUTPUT SIZE: 1.00MB
WRITE: 208.40 ms at 5 MB/sec
READ : 24.00 ms at 42 MB/sec
CL-STORE
OUTPUT SIZE: 4.00MB
WRITE: 90.80 ms at 44 MB/sec
READ : 82.00 ms at 49 MB/sec
Standard objects now. Again I did not feel like writing an extension for hyperluminal-mem or cl-conspack for this, though it is straightforward to do. Small files, good speeds. This is using reference tracking.
CL-BINARY-STORE> (test-on-data (lots-of-standard-objects) :hlmem nil)
CL-BINARY-STORE
OUTPUT SIZE: 0.59 MB
WRITE: 32.12 ms at 18 MB/sec
READ : 9.08 ms at 65 MB/sec
CL-CONSPACK
OUTPUT SIZE: 3.50MB
WRITE: 197.58 ms at 18 MB/sec
READ : 66.39 ms at 53 MB/sec
CL-STORE
OUTPUT SIZE: 2.20MB
WRITE: 85.20 ms at 26 MB/sec
READ : 64.00 ms at 34 MB/sec
Here hyperluminal-mem explodes because of the list circularity, so for cl-binary-store we enable list circularity detection on top of the normal reference tracking. Notice the asymmetry between store and restore (and also the nice small file); the faster restore is nice. Note that restore will eventually blow the stack if yours is not big enough: we follow cdrs without recursion, but we follow cars with recursive calls, so...
CL-BINARY-STORE> (defparameter *blarg* (a-pile-of-tangled-conses))
CL-BINARY-STORE> (test-cl-binary-store-on-data *blarg* :support-shared-list-structures t :track-references t)
CL-BINARY-STORE
OUTPUT SIZE: 0.12 MB
WRITE: 4.08 ms at 29 MB/sec
READ : 1.20 ms at 99 MB/sec
CL-BINARY-STORE> (test-cl-store-on-data *blarg* :check-for-circs t :precise-list-storage t)
CL-STORE
OUTPUT SIZE: 0.26MB
WRITE: 9.20 ms at 28 MB/sec
READ : 6.40 ms at 40 MB/sec
CL-BINARY-STORE> (test-conspack-on-data *blarg* :track-references t)
CL-CONSPACK
OUTPUT SIZE: 0.21MB
WRITE: 15.20 ms at 14 MB/sec
READ : 2218.00 ms at 0 MB/sec <-- I don't know what is going on here
Allegro, LispWorks, and CCL are not terrible from a speed point of view, about 4-5x slower than SBCL. ECL is about 10x slower, but this is due to some failures in structure inlining that are currently being fixed. ABCL is 100x slower, but I am sure someone can fix that up. On all of these, cl-binary-store is still faster than cl-store and cl-conspack. I would do more optimization on Allegro and LispWorks, but their heap/time limitations make it near impossible to do any real work.
;; Using a 100x shorter list for this test because these implementations are slow
CL-USER> (test-on-data (long-list-of-tiny-integers 10000))
ALLEGRO CL-BINARY-STORE <-- about 4x slower than sbcl
OUTPUT SIZE: 0.02 MB
WRITE: 0.24 ms at 83 MB/sec
READ : 0.89 ms at 22 MB/sec
LISPWORKS CL-BINARY-STORE <-- about 5x slower than sbcl
OUTPUT SIZE: 0.02 MB
WRITE: 0.32 ms at 63 MB/sec
READ : 1.17 ms at 17 MB/sec
CCL CL-BINARY-STORE <-- about 4x slower than sbcl
OUTPUT SIZE: 0.02 MB
WRITE: 0.27 ms at 74 MB/sec
READ : 1.58 ms at 13 MB/sec
ECL CL-BINARY-STORE <-- about 10x slower than sbcl
OUTPUT SIZE: 0.02 MB
WRITE: 4.76 ms at 4 MB/sec
READ : 7.96 ms at 3 MB/sec
Here are double-floats too:
CL-USER> (test-on-data (long-list-of-random-double-floats 10000))
ALLEGRO CL-BINARY-STORE
OUTPUT SIZE: 0.10 MB
WRITE: 1.58 ms at 63 MB/sec
READ : 4.13 ms at 24 MB/sec
ECL CL-BINARY-STORE
OUTPUT SIZE: 0.10 MB
WRITE: 10.07 ms at 10 MB/sec
READ : 22.61 ms at 4 MB/sec
CCL CL-BINARY-STORE
OUTPUT SIZE: 0.10 MB
WRITE: 0.93 ms at 108 MB/sec
READ : 6.98 ms at 14 MB/sec