Messages ruby: DTOs serialization and deserialization capabilities #1605

aurelien-reeves · 2021-06-10T09:04:25Z

Add serialization and deserialization capabilities to ruby DTOs

Current status: ready to be reviewed

aurelien-reeves · 2021-06-10T14:40:19Z

Deserialization of messages with arrays of other types of messages won't be easy without the addition of a complex layer which would analysis the DTOs and the schemas to find appropriate classes

aslakhellesoy · 2021-06-11T17:25:14Z

Nah, here is a simple solution:

require 'json'

class Envelope
  attr_reader :meta

  def initialize(
    meta: nil
  )
    @meta = meta
  end

  def self.from_hash(hash)
    self.new(
      meta: Meta.from_hash(hash[:meta])
    )
  end
end

class Meta
  attr_reader :protocol_version
  attr_reader :implementation

  def initialize(
    protocol_version: nil,
    implementation: nil
  )
    @protocol_version = protocol_version
    @implementation = implementation
  end

  def self.from_hash(hash)
    self.new(
      protocol_version: hash[:protocolVersion],
      implementation: Product.from_hash(hash[:implementation])
    )
  end
end

class Product
  attr_reader :name
  attr_reader :version

  def initialize(
    name: nil,
    version: nil
  )
    @name = name
    @version = version
  end

  def self.from_hash(hash)
    self.new(
      name: hash[:name],
      version: hash[:version]
    )
  end
end

describe '.from_hash' do
  it 'reconstructs a deep object' do
    # Normally we'd construct the hash with JSON.parse(json, {symbolize_names: true})
    hash = {
      meta: {
        protocolVersion: '10.2.3',
        implementation: {
          name: 'cucumis',
          version: '55.7'
        }
      }
    }

    envelope = Envelope.from_hash(hash)
    expect(envelope.class).to eq Envelope
    expect(envelope.meta.class).to eq Meta
    expect(envelope.meta.implementation.class).to eq Product
    puts envelope.inspect
  end
end

We just need to generate a .from_hash class method on each class.

aslakhellesoy · 2021-06-11T17:44:36Z

And for arrays, something like this:

def self.from_hash(hash)
  self.new(
    object_array: (hash[:object_array] || []).map {|element| SomeClass.from_hash(element)},
    primitive_array: (hash[:primitive_array] || [])
  )
end

mattwynne · 2021-06-11T20:20:13Z

I like the idea of adding (manually-written) specs for this behaviour in the DTO, even if the implementation of the solution is generated code.

aslakhellesoy · 2021-06-11T22:40:03Z

Yeah we need specs for the deserialisation. I didn't modify the code generator to add those .from_hash class methods, the code I pasted above if just a proof of concept of what we could generate.

aurelien-reeves · 2021-06-14T06:10:08Z

Thanks @aslakhellesoy
I'll study that today 👍

aurelien-reeves · 2021-06-14T14:14:56Z

Here's a first draft of a generic solution.
The idea remain to generate as less code as possible when generating DTOs.

This is still draft. Some refactoring is needed.

mattwynne · 2021-06-23T17:08:02Z

IMO, if we want to keep the deserialization "clutter" out of the generated DTO classes, we'd be better off putting that code in a module and using extend to add it to the metaclass. That way the DTOs are simple to read, but the user doesn't have to think about requiring extra files to get the deserialization behaviour.

aurelien-reeves · 2021-06-24T09:44:05Z

IMO, if we want to keep the deserialization "clutter" out of the generated DTO classes, we'd be better off putting that code in a module and using extend to add it to the metaclass. That way the DTOs are simple to read, but the user doesn't have to think about requiring extra files to get the deserialization behaviour.

Done.
Deserialization are now part of messages per default.
require 'cucumber/messages' now allows to use the DTOs, but also the deserializers.

aslakhellesoy · 2021-06-24T11:50:11Z

Serialization is working very well with reflection, but yes, deserialization is very difficult that way.

It's worse than difficult - it's impossible.

Knowledge of the schema is the only way we can infer the tight types to instantiate.

Good to see the deserialisation logic extracted to separate files! I would have preferred to do the same for the serialization logic (generate a separate file) rather than using reflection. Reflection adds performance overhead and complexity to the library, whereas with the statically generated version, there is no penalty for using reflection and regexp evaluation.

It's a tradeoff between complexity in the code generator and the runtime code, and I think I prefer to keep the complexity out of the runtime code.

messages/ruby/lib/cucumber/messages/message/serialization.rb

aurelien-reeves · 2021-06-24T13:56:26Z

It's a tradeoff between complexity in the code generator and the runtime code, and I think I prefer to keep the complexity out of the runtime code.

At the moment I stick to my idea for the serialization using reflection over code generation.

Indeed the runtime code may be a little bit more complex, but it has unit tests / specs that are easy to maintain, and I still find more easy to maintain something like this (that could still be improved in term of readibility):

##
# Returns a new Hash formed from the message attributes
# If +camelize:+ keyword parameter is set to true, then keys will be camelized
# If +reject_nil_values:+ keyword parameter is set to true, resulting hash won't include nil values
#
#   Cucumber::Messages::Duration.new(seconds: 1, nanos: 42).to_h                                 # => { seconds: 1, nanos: 42 }
#   Cucumber::Messages::PickleTag.new(name: 'foo', ast_node_id: 'abc-def').to_h(camelize: true)  # => { name: 'foo', astNodeId: 'abc-def' }
#   Cucumber::Messages::PickleTag.new(name: 'foo', ast_node_id: nil).to_h(reject_nil_values: true)  # => { name: 'foo' }
#
# It is recursive so embedded messages are also processed
#
#   location = Cucumber::Messages::Location.new(line: 2)
#   Cucumber::Messages::Comment.new(location: location, text: 'comment').to_h  # => { location: { line: 2, :column: nil }, text: "comment" }
#

def to_h(camelize: false, reject_nil_values: false)
  resulting_hash = self.instance_variables.to_h do |variable_name|
    h_key = variable_name[1..-1]
    h_key = Cucumber::Messages::Message.camelize(h_key) if camelize

    h_value = prepare_value(
      self.instance_variable_get(variable_name),
      camelize: camelize,
      reject_nil_values: reject_nil_values
    )

    [ h_key.to_sym, h_value ]
  end

  resulting_hash.reject! { |_, value| value.nil? } if reject_nil_values
  resulting_hash
end

Than something that would look like this:

require 'cucumber/messages.dtos'
require 'json'

# The code was auto-generated by {this script}[https://github.com/cucumber/common/blob/main/messages/jsonschema/scripts/codegen.rb]
#

module Cucumber
  module Messages
  <%- @schemas.sort.each do |key, schema| -%>

    class <%= class_name(key) %>

      ##
      # Returns a new <%= class_name(key) %> from the given hash.
      # If the hash keys are camelCased, they are properly assigned to the
      # corresponding snake_cased attributes.
      #
      #   Cucumber::Messages::<%= class_name(key) %>.from_h(some_hash) # => #<Cucumber::Messages::<%= class_name(key) %>:0x... ...>
      #

      def self.from_h(hash)
        return nil if hash.nil?

        self.new(
        <%-
          schema['properties'].each do |property_name, property|
            ref = property['$ref']
            items_ref = property.dig('items', '$ref')
        -%>
          <%= "#{underscore(property_name)}: " -%>
          <%- if items_ref -%>hash[:<%= property_name -%>]&.map { |item| <%= class_name(items_ref) %>.from_h(item) },
          <%- elsif ref -%><%= class_name(ref) %>.from_h(hash[:<%= property_name %>]),
          <%- else -%>hash[:<%= property_name %>],
          <%- end -%>
        <%- end -%>
        )
      end
    end
  <%- end -%>
  end
end

And regarding the performance overhead, at the moment cucumber-ruby is still based on an implementation of messages which embeds protobuf. I think the implementation here would already be an improvement.

luke-hill · 2021-07-01T08:52:45Z

I'll get onto this properly on friday afternoon. So I can dedicate a bit of time to it.

aurelien-reeves added 3 commits June 10, 2021 10:39

Add templates into make target

18a27e1

Add serialization capabilities to ruby DTOs

bd35fcb

Camelize result of to_json

5ae1764

aurelien-reeves added 🏦 debt Tech debt language: ruby library: cucumber-messages labels Jun 10, 2021

aurelien-reeves added 2 commits June 10, 2021 11:07

Remove redundant comments in generated DTOs file

8c7e1ef

Add some deserialization capabilities

70124b3

aurelien-reeves changed the title ~~Messages ruby: DTOs serialization~~ Messages ruby: DTOs serialization and deserialization capabilities Jun 10, 2021

Merge branch 'main' into messages-ruby-dtos-serialization

7a070e6

aurelien-reeves added 3 commits June 14, 2021 08:10

Merge branch 'main' into messages-ruby-dtos-serialization

f9b4463

Generate deserializers for object_arrays attributes

a53f414

Add deserialization capabilities for object arrays

ad76c47

aurelien-reeves added 8 commits June 14, 2021 16:59

Fix DTO generation of private methods

e421556

Extract #underscore and #camelize into a dedicated mixin

d77a0aa

Extract #to_h and #to_json into a dedicated mixin

52a959d

Fix #camelize which would return nil if term has not been transformed

ac3bfa4

Add some rdoc to Messages::Message::Serialization

9f7cea8

Refactor messages serialization

a8eff8c

Extract #from_h and #from_json into a dedicated mixin

13d3e77

Fix rdoc in Messages::Message::Utils

0c20ffe

aurelien-reeves requested a review from mattwynne June 15, 2021 13:48

Merge branch 'main' into messages-ruby-dtos-serialization

d68ef74

Provide serialization and deserialization per default

a9a015e

aurelien-reeves and others added 3 commits June 24, 2021 11:48

Fix the demo formatter

fc461f5

Clean-up the codegen script

c75476f

Merge branch 'main' into messages-ruby-dtos-serialization

6f813be

aslakhellesoy reviewed Jun 24, 2021

View reviewed changes

messages/ruby/lib/cucumber/messages/message/serialization.rb Show resolved Hide resolved

aurelien-reeves added 4 commits June 24, 2021 14:45

[skip ci]Fix a typo

6344b84

[skip ci] Remove useless empty file

63fb7ea

[skip ci] Update some RDoc

ce9af63

Add some RDoc to DTO's from_h methods

ef38ad1

aurelien-reeves marked this pull request as ready for review June 24, 2021 12:51

aurelien-reeves added 3 commits June 25, 2021 09:04

Merge branch 'main' into messages-ruby-dtos-serialization

ec232bd

Merge branch 'main' into messages-ruby-dtos-serialization

d36121a

Merge branch 'main' into messages-ruby-dtos-serialization

3c334ec

aurelien-reeves requested review from aslakhellesoy and luke-hill June 28, 2021 14:38

aurelien-reeves added 2 commits June 28, 2021 16:56

Merge branch 'main' into messages-ruby-dtos-serialization

fc02f75

Merge branch 'main' into messages-ruby-dtos-serialization

a317abb

Merge branch 'main' into messages-ruby-dtos-serialization

0d4bcd3

aslakhellesoy approved these changes Jul 1, 2021

View reviewed changes

aslakhellesoy and others added 3 commits July 1, 2021 15:42

Merge branch 'main' into messages-ruby-dtos-serialization

f0520b5

Update messages ruby deserializers

417013b

Merge branch 'main' into messages-ruby-dtos-serialization

94dad8c

aurelien-reeves mentioned this pull request Jul 5, 2021

Update Gherkin to use Messages DTOs instead of plain ruby hashes #1603

Merged

aurelien-reeves merged commit 1dff595 into main Jul 5, 2021

aurelien-reeves deleted the messages-ruby-dtos-serialization branch July 5, 2021 08:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Messages ruby: DTOs serialization and deserialization capabilities #1605

Messages ruby: DTOs serialization and deserialization capabilities #1605

aurelien-reeves commented Jun 10, 2021 •

edited

Loading

aurelien-reeves commented Jun 10, 2021

aslakhellesoy commented Jun 11, 2021 •

edited

Loading

aslakhellesoy commented Jun 11, 2021

mattwynne commented Jun 11, 2021

aslakhellesoy commented Jun 11, 2021

aurelien-reeves commented Jun 14, 2021

aurelien-reeves commented Jun 14, 2021

mattwynne commented Jun 23, 2021

aurelien-reeves commented Jun 24, 2021

aslakhellesoy commented Jun 24, 2021

aurelien-reeves commented Jun 24, 2021 •

edited

Loading

luke-hill commented Jul 1, 2021

Messages ruby: DTOs serialization and deserialization capabilities #1605

Messages ruby: DTOs serialization and deserialization capabilities #1605

Conversation

aurelien-reeves commented Jun 10, 2021 • edited Loading

aurelien-reeves commented Jun 10, 2021

aslakhellesoy commented Jun 11, 2021 • edited Loading

aslakhellesoy commented Jun 11, 2021

mattwynne commented Jun 11, 2021

aslakhellesoy commented Jun 11, 2021

aurelien-reeves commented Jun 14, 2021

aurelien-reeves commented Jun 14, 2021

mattwynne commented Jun 23, 2021

aurelien-reeves commented Jun 24, 2021

aslakhellesoy commented Jun 24, 2021

aurelien-reeves commented Jun 24, 2021 • edited Loading

luke-hill commented Jul 1, 2021

aurelien-reeves commented Jun 10, 2021 •

edited

Loading

aslakhellesoy commented Jun 11, 2021 •

edited

Loading

aurelien-reeves commented Jun 24, 2021 •

edited

Loading