Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Ability to parse cross depedencies among Avro schema files #9

Closed
wants to merge 2 commits into from

Conversation

batconjurer
Copy link
Contributor

This partially resolves Issue #6 . Given types defined in different files, the code now scans the directory, builds up a dependency tree, and uses this to aid the code generation.

Known Limitations:

  • Does not support cyclic dependencies
  • Do not search recursively through the provided directory (assumes all files are in the top level)

@lerouxrgd
Copy link
Owner

Thank you for trying to solve this, I'll review it in the coming days but it will take some time as it is quite big.

But after just a quick glance at it, why did you choose to work with schemas at the json level whereas all the library is working at the parsed avros_rs::Schema level (which is simpler I think) ?

@batconjurer
Copy link
Contributor Author

Thank you for trying to solve this, I'll review it in the coming days but it will take some time as it is quite big.

But after just a quick glance at it, why did you choose to work with schemas at the json level whereas all the library is working at the parsed avros_rs::Schema level (which is simpler I think) ?

No worries. I wanted to get review before it got any bigger!
So at first I thought of using the avro_rs::Schema s as well, but of course they cannot be parsed (as far as I can tell) if the type is not completely specified. For example parsing

{"name": "B", "type": "record", "fields" : [{"name": "field", "type": "float"}]}

and then

 { "name": "A", "type": "array", "values": "B"}   

will fail, because it does not know what B is. So my solution was to substitute the definition of B into A as a json before parsing. Maybe there is a way to avoid json altogether, but I could not figure it out.

@lerouxrgd
Copy link
Owner

lerouxrgd commented Jan 13, 2021

I had a look again at the PR, my feeling is that a lot of the json parsing is duplicate work with what avro-rs does. As a consequence, I find that it mixes json object manipulation and algorithm.

Ideally if avro_rs::Schema::parse could take an optional HashMap of known types so far, this problem could be solved much more simply. It would be possible to determine file order similarly to what you did but in a recursive manner.

So I think it would be better to see how to integrate such schema parsing feature into avro-rs. I'll try to think about a more concrete plan for that, however I am lacking time for this project (which I am more maintaining than actively developing). So if you have good ideas about all this I encourage you to get in touch with the avro-rs team.

Besides, walking through nested dir to find schema files could be added later simply by using glob.

@batconjurer
Copy link
Contributor Author

That sounds reasonable. I was taking a look at the avro-rs repo and it certainly seems doable over there. I will reach out to them. In the meantime, I will close this PR.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants