This library is a native Clojure implementation of the Concise Binary Object Representation format specified in RFC 7049.
CBOR is a binary encoding with the goal of small code size, compact messages, and extensibility without the need for version negotiation. This makes it a good alternative to EDN for storing and transmitting Clojure data in a more compact form.
Library releases are published on Clojars. To use the latest version with Leiningen, add the following dependency to your project definition:
The clj-cbor.core
namespace contains the high-level encoding and decoding
functions. The simplest way to use this library is to require it and call them
directly with data:
=> (require '[clj-cbor.core :as cbor])
=> (cbor/encode [0 :foo/bar true {:x 'y} #{1/3} #"foo"])
; 0x8600D827683A666F6F2F626172F5A1D827623A78D8276179CD81D81E820103D82363666F6F
=> (cbor/decode *1)
[0 :foo/bar true {:x y} #{1/3} #"foo"]
With no extra arguments, encode
and decode
will make use of the
default-codec
, which comes loaded with read and write handler support for many
Java and Clojure types (see the type extensions section
below). Both functions accept an additional argument to specify the codec,
should different behavior be desired.
=> (def codec (cbor/cbor-codec :canonical true))
=> (cbor/encode codec {:foo "bar", :baz 123})
; 0xA2D827643A666F6F63626172D827643A62617A187B
=> (cbor/decode codec *1)
{:foo "bar", :baz 123}
So far we haven't specified any outputs when encoding, so we've gotten a byte
array back. The full form of encode
takes three arguments: the codec, the
output stream, and the value to encode.
=> (def out (java.io.ByteArrayOutputStream.))
=> (cbor/encode codec out :a)
5
=> (cbor/encode codec out 123)
2
=> (cbor/encode codec out true)
1
=> (cbor/encode codec out "foo")
4
=> (.toByteArray out))
; 0xD827623A61187BF563666F6F
=> (with-open [input (java.io.ByteArrayInputStream. *1)]
(doall (cbor/decode-seq codec input)))
(:a 123 true "foo")
In this mode, encode
returns the number of bytes written instead of a byte
array. We can read multiple items from an input stream using decode-seq
, which
returns a lazy sequence. If the input is a file you must realize the values
before closing the input. Similarly, encode-seq
will write a sequence of
multiple values to an output stream.
As a convenience, the library also provides the spit
, spit-all
, slurp
, and
slurp-all
functions, which operate on files:
=> (cbor/spit "data.cbor" {:abc 123, :foo "qux", :bar true})
29
=> (cbor/spit-all "data.cbor" [[0.0 'x] #{-42}] :append true)
12
=> (.length (io/file "data.cbor"))
41
=> (cbor/slurp "data.cbor")
{:abc 123, :bar true, :foo "qux"}
=> (cbor/slurp-all "data.cbor")
({:abc 123, :bar true, :foo "qux"} [0.0 x] #{-42})
In order to support types of values outside the ones which are a native to CBOR, the format uses tagged values, similar to EDN. In CBOR, the tags are integer numbers instead of symbols, but the purpose is the same: the tags convey new semantics about the following value.
The most common example of a need for this kind of type extension is
representing an instant in time. In EDN, this is represented by the #inst
tag
on an ISO-8601 timestamp string. CBOR offers two tags to represent instants -
tag 0 codes a timestamp string, while tag 1 codes a number in epoch seconds. The
former is more human-friendly, but the latter is more efficient.
New types are implemented by using read and write handlers - functions which map from typed value to representation and back. Currently, the library comes with support for the following types:
Tag | Representation | Type | Semantics |
---|---|---|---|
0 | Text string | Date /Instant |
Standard date/time string |
1 | Number | Date /Instant |
Epoch-based date/time |
2 | Byte string | BigInt |
Positive bignum |
3 | Byte string | BigInt |
Negative bignum |
4 | Array(2) | BigDecimal |
Decimal fraction |
27 | Array(2) | TaggedLiteral |
Constructor support for Clojure tagged literal values |
30 | Array(2) | Ratio |
Rational fractions, represented as numerator and denominator numbers |
32 | Text string | URI |
Uniform Resource Identifier strings |
35 | Text string | Pattern |
Regular expression strings |
37 | Byte string | UUID |
Binary-encoded UUID values |
39 | Text string | Symbol /Keyword |
Identifiers |
100 | Integer | LocalDate |
Epoch-based local calendar date |
258 | Array | Set |
Sets of unique entries |
1004 | Text string | LocalDate |
String-based local calendar date |
55799 | Varies | N/A | Self-describe CBOR |
For further information about registered tag semantics, consult the IANA Registry.
A write handler is a function which takes a typed value and returns an
encodable representation. In most cases, the representation is a CBOR tagged
value. The tag conveys the type semantic and generally the expected form that
the representation takes. Write handlers are selected by a dispatch function,
which defaults to class
. The clj-cbor.core/dispatch-superclasses
function
can be used to construct an inheritance-based dispatcher.
In some cases, multiple types will map to the same tag. For example, by default
this library maps both java.util.Date
and the newer java.time.Instant
types
to the same representation.
A read handler is a function which takes the representation from a tagged value and returns an appropriately typed value. The choice of function to parse the values thus determines the 'preferred type' to represent values of that kind.
Continuing the example, the library comes with read handlers for both Date
and
Instant
types, allowing the user to choose their preferred time type.
As of 0.7.1
, this library is competitive with many other comparable
serialization formats. Some benchmarking results can be found in
this spreadsheet.
For small and medium data sizes CBOR is more compact than most formats, while at larger sizes (above 512 bytes) all formats are fairly close in size (within 10%, generally). Other than Nippy, which was by far the fastest codec, clj-cbor was one of the fastest encoders and is in the middle of the pack in decoding times.
To give some concrete performance numbers, here are a few samples from the dataset:
Size | Encode | Decode |
---|---|---|
4 | 6.09 µs | 2.31 µs |
55 | 20.87 µs | 7.58 µs |
173 | 12.64 µs | 5.74 µs |
388 | 15.38 µs | 11.60 µs |
882 | 31.55 µs | 14.24 µs |
1632 | 54.82 µs | 33.52 µs |
3127 | 92.14 µs | 64.66 µs |
4918 | 104.92 µs | 59.67 µs |
7328 | 108.37 µs | 82.16 µs |
A few things to keep in mind while using the library:
- Streaming CBOR data can be parsed from input, but there is currently no way to generate streaming output data.
- Decoding half-precision (16-bit) floating-point numbers is supported, but the
values are promoted to single-precision (32-bit) as the JVM does not have
native support for them. There is currently no support for writing
half-precision floats except for the special values
0.0
,+Inf
,-Inf
, andNaN
, which are always written as two bytes for efficiency. - CBOR does not have a type for bare characters, so they will be converted to single-character strings when written.
- Regular expressions are supported using tag 35, but beware that Java
Pattern
objects do not compare equal or have the same hash code for otherwise identical regexes. Using them in sets or as map keys is discouraged.
This is free and unencumbered software released into the public domain. See the UNLICENSE file for more information.