Skip to content
This repository has been archived by the owner on Aug 15, 2021. It is now read-only.

packed regression for enum variants with data? #173

Closed
ctm opened this issue Jan 9, 2020 · 7 comments
Closed

packed regression for enum variants with data? #173

ctm opened this issue Jan 9, 2020 · 7 comments
Labels

Comments

@ctm
Copy link
Contributor

ctm commented Jan 9, 2020

With cbor 0.10, enum variant names are serialized if the variant carries data. Here's an example:

use {
    serde::{Deserialize, Serialize},
    serde_cbor::ser,
};

#[derive(Serialize, Deserialize)]
enum Foo {
    Bar,
    Quux(u8),
    Wtf(f64),
}

fn main() {
    let cbor = ser::to_vec(&Foo::Bar).unwrap();
    let cbor_packed = ser::to_vec_packed(&Foo::Bar).unwrap();
    println!("       cbor: {:?}", cbor);
    println!("cbor_packed: {:?}", cbor_packed);
    let cbor = ser::to_vec(&Foo::Quux(42)).unwrap();
    let cbor_packed = ser::to_vec_packed(&Foo::Quux(42)).unwrap();
    println!("       cbor: {:?}", cbor);
    println!("cbor_packed: {:?}", cbor_packed);
    let cbor = ser::to_vec(&Foo::Wtf(3.1415)).unwrap();
    let cbor_packed = ser::to_vec_packed(&Foo::Wtf(3.1415)).unwrap();
    println!("       cbor: {:?}", cbor);
    println!("cbor_packed: {:?}", cbor_packed);
}

Under 0.9 the output is:

       cbor: [99, 66, 97, 114]
cbor_packed: [0]
       cbor: [130, 100, 81, 117, 117, 120, 24, 42]
cbor_packed: [130, 1, 24, 42]
       cbor: [130, 99, 87, 116, 102, 251, 64, 9, 33, 202, 192, 131, 18, 111]
cbor_packed: [130, 2, 251, 64, 9, 33, 202, 192, 131, 18, 111]

Under 0.10:

       cbor: [99, 66, 97, 114]
cbor_packed: [0]
       cbor: [161, 100, 81, 117, 117, 120, 24, 42]
cbor_packed: [161, 100, 81, 117, 117, 120, 24, 42]
       cbor: [161, 99, 87, 116, 102, 251, 64, 9, 33, 202, 192, 131, 18, 111]
cbor_packed: [161, 99, 87, 116, 102, 251, 64, 9, 33, 202, 192, 131, 18, 111]
@pyfisch
Copy link
Owner

pyfisch commented Jan 9, 2020

See release notes: https://github.com/pyfisch/cbor/releases/tag/v0.10.0

The default enum format was changed. It now uses the externally tagged format recommended by serde.

@ctm
Copy link
Contributor Author

ctm commented Jan 9, 2020

I don't see a way to tell serde to use anything other than the default, although there's a lot about serde that I don't understand, so I could be overlooking something obvious to everyone else.

If it's not possible to do that, would you be interested in a PR to making it possible to change the default so that serde can write more compact enums when using to_vec_packed? I believe I could easily add that as a feature.

If there's already a way to do that, it would be handy (to me at least!) to have that info in the Packed Encoding portion of the documentation. If not, and you want a PR, I'll be sure to update that section of the documentation explaining how (and why) to use the new feature. If there's not a way to do it and you don't want a PR to add that capability, I'm happy to write a doc-only PR that updates that section of the documentation so that others won't be surprised by the behavior.

@pyfisch
Copy link
Owner

pyfisch commented Jan 10, 2020

I now understand what you mean.
That enum variants with data are now serialized as strings is indeed a bug. But due to the design of serde it is most likely a wontfix issue as you don't receive all information in the serializer.

If there's not a way to do it and you don't want a PR to add that capability, I'm happy to write a doc-only PR that updates that section of the documentation so that others won't be surprised by the behavior.

I'll accept both a PR fixing the issue or a PR containing documentation for the current behavior.

@pyfisch pyfisch added bug and removed question labels Jan 10, 2020
@ctm
Copy link
Contributor Author

ctm commented Jan 10, 2020

I hope I'm not wasting anyone's time, but it seems to me that if to_vec_packed's definition included a call to legacy_enums, the enum tag would be omitted from the encoding. I.e.:

    value.serialize(&mut Serializer::new(&mut IoWrite::new(&mut vec)).packed_format().legacy_enums())?;

but I can't reconcile that fix with:

But due to the design of serde it is most likely a wontfix issue as you don't receive all information in the serializer.

My backstory is that I'm using yew with actix-web and had been using json to pass messages through a websocket with the idea that I'd eventually move to a more compact binary representation. I assumed this would be trivial since both the client and server are using Rust.

When I started pursuing the binary format, yew was using serde_cbor 0.9.0 and my cursory exploration of to_vec_packed showed that it did what I wanted it to (which was to omit the names of the enum variants). When yew upgraded to serde_cbor 0.10.0 I didn't notice the change until I was debugging some of my own code and happened to see the variant name present in the encoding.

Since I wasn't around when a lot of this ground was already covered, I'm guessing that I'm missing something, but I don't know what. AFAICT, the internal machinations of IoWrite aren't exposed, so an end-user can't write something equivalent to my above code, so assuming there's no desire to change to_vec_packed, one solution would be to introduce some other method, e.g., to_vec_minimal that uses both packed_format and legacy_enums.

@ctm
Copy link
Contributor Author

ctm commented Jan 10, 2020

Of course shortly after I wrote the above, I realized I ahd made a mistake concerning IoWrite and it looks like the end-user can write a version that calls legacy_enums in addition to packed_format. I need to take a break (and should have before sending that last comment!), but I'll give that a go in an hour or two. Assuming I can get it to go, I'll then submit a doc patch that explains why people might want to use something more than to_vec_packed and how to do it.

@ctm
Copy link
Contributor Author

ctm commented Jan 10, 2020

So, indeed any user who is paying attention can create a routine comparable to to_vec_packed that will also use the legacy enum encoding to save space when using enums whose variants have data.

I think I understand the situation well enough to update the documentation, but I'm wondering if you'd be willing to introduce a to_slice_minimal method, so people who want the most compact representation wouldn't need to roll their own?

As it is, I introduced a CborPacked encoding for yew that I think is probably a bad idea in light of what I know now. I think a CborMinimal would be a better replacement and I'll submit a PR for that to the yew team once I know if you'd like the tiny bit of code in serde-cbor.

I'll prepare a patch which includes to_slice_minimal, but take no offense if you don't want it. If you'd prefer it not be in serde-cbor, I'll write a separate PR-only patch that explains why to_slice_vec might not be what people want and how to get a more compact representation.

ctm added a commit to ctm/cbor that referenced this issue Jan 10, 2020
@pyfisch
Copy link
Owner

pyfisch commented Jan 10, 2020

I think I understand the situation well enough to update the documentation, but I'm wondering if you'd be willing to introduce a to_slice_minimal method, so people who want the most compact representation wouldn't need to roll their own?

No, please do not add more methods to serde-cbor. As you see the interactions between different features already cause problems and originally I planned to remove the old encoding in a future version.

ctm added a commit to ctm/cbor that referenced this issue Jan 10, 2020
npmccallum pushed a commit to npmccallum/cbor that referenced this issue Jul 6, 2020
matix2267 added a commit to matix2267/cbor that referenced this issue Jun 14, 2021
Legacy enum format supported packed encoding pretty well but with the
switch to new enum format there was a regression (pyfisch#173).
This commit fixes the new enum format in packed encoding.

For example consider the following structure:
```rust
enum Enum {
    Unit,
    NewType(i32),
    Tuple(i32, i32),
    Struct { x: i32, y: i32 },
}
```
Legacy enum packed encodings are:
```
Empty: <variant number>
NewType: [<variant number>, value]
Tuple: [<variant number>, values..]
Struct: [<variant number>, {<struct>}]
```

Non-legacy enum packed encodings before this commit:
```
Empty: <variant number>
NewType: {"<variant>": value}
Tuple: {"<variant>": [values..]}
Struct: {<variant number>: {<struct>}}
```
Notice how NewType and Tuple store the name instead of variant number.

Fixed after this commit:
```
Empty: <variant number>
NewType: {<variant number>: value}
Tuple: {<variant number>: [values..]}
Struct: {<variant number>: {<struct>}}
```

After this commit the packed encoding can be briefly described as:
All structs fields and enum variants are identified by their number
rather than name.
This applies to all types of members (unit, newtype, tuple and struct).
matix2267 added a commit to matix2267/cbor that referenced this issue Jun 14, 2021
Fixes pyfisch#187

Legacy enum format supported packed encoding pretty well but with the
switch to new enum format there was a regression (pyfisch#173).
This commit fixes the new enum format in packed encoding.

For example consider the following structure:
```rust
enum Enum {
    Unit,
    NewType(i32),
    Tuple(i32, i32),
    Struct { x: i32, y: i32 },
}
```

Legacy enum packed encodings are:
```
Empty: <variant number>
NewType: [<variant number>, value]
Tuple: [<variant number>, values..]
Struct: [<variant number>, {<struct>}]
```

Non-legacy enum packed encodings before this commit:
```
Empty: <variant number>
NewType: {"<variant>": value}
Tuple: {"<variant>": [values..]}
Struct: {<variant number>: {<struct>}}
```
Notice how NewType and Tuple store the name instead of variant number.

Fixed after this commit:
```
Empty: <variant number>
NewType: {<variant number>: value}
Tuple: {<variant number>: [values..]}
Struct: {<variant number>: {<struct>}}
```

After this commit the packed encoding can be briefly described as:
All struct fields and enum variants are encoded as their field number
rather than name.
This applies to all types of members (unit, newtype, tuple and struct).
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants