Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Messages ruby: DTOs serialization and deserialization capabilities #1605

Merged
merged 47 commits into from
Jul 5, 2021

Conversation

aurelien-reeves
Copy link
Contributor

@aurelien-reeves aurelien-reeves commented Jun 10, 2021

Add serialization and deserialization capabilities to ruby DTOs

Current status: ready to be reviewed

@aurelien-reeves aurelien-reeves changed the title Messages ruby: DTOs serialization Messages ruby: DTOs serialization and deserialization capabilities Jun 10, 2021
@aurelien-reeves
Copy link
Contributor Author

Deserialization of messages with arrays of other types of messages won't be easy without the addition of a complex layer which would analysis the DTOs and the schemas to find appropriate classes

@aslakhellesoy
Copy link
Contributor

aslakhellesoy commented Jun 11, 2021

Nah, here is a simple solution:

require 'json'

class Envelope
  attr_reader :meta

  def initialize(
    meta: nil
  )
    @meta = meta
  end

  def self.from_hash(hash)
    self.new(
      meta: Meta.from_hash(hash[:meta])
    )
  end
end

class Meta
  attr_reader :protocol_version
  attr_reader :implementation

  def initialize(
    protocol_version: nil,
    implementation: nil
  )
    @protocol_version = protocol_version
    @implementation = implementation
  end

  def self.from_hash(hash)
    self.new(
      protocol_version: hash[:protocolVersion],
      implementation: Product.from_hash(hash[:implementation])
    )
  end
end

class Product
  attr_reader :name
  attr_reader :version

  def initialize(
    name: nil,
    version: nil
  )
    @name = name
    @version = version
  end

  def self.from_hash(hash)
    self.new(
      name: hash[:name],
      version: hash[:version]
    )
  end
end

describe '.from_hash' do
  it 'reconstructs a deep object' do
    # Normally we'd construct the hash with JSON.parse(json, {symbolize_names: true})
    hash = {
      meta: {
        protocolVersion: '10.2.3',
        implementation: {
          name: 'cucumis',
          version: '55.7'
        }
      }
    }

    envelope = Envelope.from_hash(hash)
    expect(envelope.class).to eq Envelope
    expect(envelope.meta.class).to eq Meta
    expect(envelope.meta.implementation.class).to eq Product
    puts envelope.inspect
  end
end

We just need to generate a .from_hash class method on each class.

@aslakhellesoy
Copy link
Contributor

And for arrays, something like this:

def self.from_hash(hash)
  self.new(
    object_array: (hash[:object_array] || []).map {|element| SomeClass.from_hash(element)},
    primitive_array: (hash[:primitive_array] || [])
  )
end

@mattwynne
Copy link
Member

I like the idea of adding (manually-written) specs for this behaviour in the DTO, even if the implementation of the solution is generated code.

@aslakhellesoy
Copy link
Contributor

Yeah we need specs for the deserialisation. I didn't modify the code generator to add those .from_hash class methods, the code I pasted above if just a proof of concept of what we could generate.

@aurelien-reeves
Copy link
Contributor Author

Thanks @aslakhellesoy
I'll study that today 👍

@aurelien-reeves
Copy link
Contributor Author

Here's a first draft of a generic solution.
The idea remain to generate as less code as possible when generating DTOs.

This is still draft. Some refactoring is needed.

@mattwynne
Copy link
Member

IMO, if we want to keep the deserialization "clutter" out of the generated DTO classes, we'd be better off putting that code in a module and using extend to add it to the metaclass. That way the DTOs are simple to read, but the user doesn't have to think about requiring extra files to get the deserialization behaviour.

@aurelien-reeves
Copy link
Contributor Author

IMO, if we want to keep the deserialization "clutter" out of the generated DTO classes, we'd be better off putting that code in a module and using extend to add it to the metaclass. That way the DTOs are simple to read, but the user doesn't have to think about requiring extra files to get the deserialization behaviour.

Done.
Deserialization are now part of messages per default.
require 'cucumber/messages' now allows to use the DTOs, but also the deserializers.

@aslakhellesoy
Copy link
Contributor

Serialization is working very well with reflection, but yes, deserialization is very difficult that way.

It's worse than difficult - it's impossible.

Knowledge of the schema is the only way we can infer the tight types to instantiate.

Good to see the deserialisation logic extracted to separate files! I would have preferred to do the same for the serialization logic (generate a separate file) rather than using reflection. Reflection adds performance overhead and complexity to the library, whereas with the statically generated version, there is no penalty for using reflection and regexp evaluation.

It's a tradeoff between complexity in the code generator and the runtime code, and I think I prefer to keep the complexity out of the runtime code.

@aurelien-reeves aurelien-reeves marked this pull request as ready for review June 24, 2021 12:51
@aurelien-reeves
Copy link
Contributor Author

aurelien-reeves commented Jun 24, 2021

It's a tradeoff between complexity in the code generator and the runtime code, and I think I prefer to keep the complexity out of the runtime code.

At the moment I stick to my idea for the serialization using reflection over code generation.

Indeed the runtime code may be a little bit more complex, but it has unit tests / specs that are easy to maintain, and I still find more easy to maintain something like this (that could still be improved in term of readibility):

##
# Returns a new Hash formed from the message attributes
# If +camelize:+ keyword parameter is set to true, then keys will be camelized
# If +reject_nil_values:+ keyword parameter is set to true, resulting hash won't include nil values
#
#   Cucumber::Messages::Duration.new(seconds: 1, nanos: 42).to_h                                 # => { seconds: 1, nanos: 42 }
#   Cucumber::Messages::PickleTag.new(name: 'foo', ast_node_id: 'abc-def').to_h(camelize: true)  # => { name: 'foo', astNodeId: 'abc-def' }
#   Cucumber::Messages::PickleTag.new(name: 'foo', ast_node_id: nil).to_h(reject_nil_values: true)  # => { name: 'foo' }
#
# It is recursive so embedded messages are also processed
#
#   location = Cucumber::Messages::Location.new(line: 2)
#   Cucumber::Messages::Comment.new(location: location, text: 'comment').to_h  # => { location: { line: 2, :column: nil }, text: "comment" }
#

def to_h(camelize: false, reject_nil_values: false)
  resulting_hash = self.instance_variables.to_h do |variable_name|
    h_key = variable_name[1..-1]
    h_key = Cucumber::Messages::Message.camelize(h_key) if camelize

    h_value = prepare_value(
      self.instance_variable_get(variable_name),
      camelize: camelize,
      reject_nil_values: reject_nil_values
    )

    [ h_key.to_sym, h_value ]
  end

  resulting_hash.reject! { |_, value| value.nil? } if reject_nil_values
  resulting_hash
end

Than something that would look like this:

require 'cucumber/messages.dtos'
require 'json'

# The code was auto-generated by {this script}[https://github.com/cucumber/common/blob/main/messages/jsonschema/scripts/codegen.rb]
#

module Cucumber
  module Messages
  <%- @schemas.sort.each do |key, schema| -%>

    class <%= class_name(key) %>

      ##
      # Returns a new <%= class_name(key) %> from the given hash.
      # If the hash keys are camelCased, they are properly assigned to the
      # corresponding snake_cased attributes.
      #
      #   Cucumber::Messages::<%= class_name(key) %>.from_h(some_hash) # => #<Cucumber::Messages::<%= class_name(key) %>:0x... ...>
      #

      def self.from_h(hash)
        return nil if hash.nil?

        self.new(
        <%-
          schema['properties'].each do |property_name, property|
            ref = property['$ref']
            items_ref = property.dig('items', '$ref')
        -%>
          <%= "#{underscore(property_name)}: " -%>
          <%- if items_ref -%>hash[:<%= property_name -%>]&.map { |item| <%= class_name(items_ref) %>.from_h(item) },
          <%- elsif ref -%><%= class_name(ref) %>.from_h(hash[:<%= property_name %>]),
          <%- else -%>hash[:<%= property_name %>],
          <%- end -%>
        <%- end -%>
        )
      end
    end
  <%- end -%>
  end
end

And regarding the performance overhead, at the moment cucumber-ruby is still based on an implementation of messages which embeds protobuf. I think the implementation here would already be an improvement.

@luke-hill
Copy link
Contributor

I'll get onto this properly on friday afternoon. So I can dedicate a bit of time to it.

@aurelien-reeves aurelien-reeves merged commit 1dff595 into main Jul 5, 2021
@aurelien-reeves aurelien-reeves deleted the messages-ruby-dtos-serialization branch July 5, 2021 08:54
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants