-
Notifications
You must be signed in to change notification settings - Fork 290
allow sorting keys on to_json and to_python by passing in sort_keys #1637
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
CodSpeed Performance ReportMerging #1637 will not alter performanceComparing Summary
|
07a31f5
to
7222c8d
Compare
please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main issue is perf regression
I separated the test out, I also refactor the functions so we can reuse when Let me know what else I need to improve. Thanks |
please review, not sure how I can take it from here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted']
or something like that.
0cf4b6d
to
95f9329
Compare
Added different sort mode as above, updated the PR description. |
please review 👍 |
95f9329
to
c83a212
Compare
please review 👍 |
c83a212
to
0075a4b
Compare
please review 👍 There is this test that is failing, unclear if its related to my change. Let me know any other places i can optimize. Thanks! |
dce2689
to
5ce696c
Compare
please review 👍 |
dfe7380
to
1b8d824
Compare
please review 👍 |
1 similar comment
please review 👍 |
dc0ec59
to
11501f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this @aezomz! I'm a Rust newbie as well, but I have a feeling we can significantly simplify this code by dropping the recursive dict sorting and moving this to the serializer for the dict type specifically. Can you give that a try please?
I also think it's worth seeing if we can reduce the duplication between the if sort_keys
/else
branches, but I know Rust typing may make that tricky...
Let me know if you get stuck, we can bring in some of our Rust experts!
11501f6
to
200865b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.
I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.
@davidhewitt Would you mind having a Rusty eyed look for us? :)
if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? { | ||
// Get potentially sorted value | ||
if extra.sort_keys { | ||
let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still need to sort_dict_recursive
here if we're already doing that in the dict serialization itself? (Same question for the 2 calls below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried ur recommendation and it doesn't work.
It will only work when converting to Python type (to_python
). Since we wrote the serializing logic.
As for to_json
we use the serde package if am not wrong.
They are separate into defined field processing and non defined fields processing. Cause we need to sort before using the serde package.
This is based on my high level understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unconvinced we need the recursive sort here either. serde
will serialize the content in the order we call .serialize_entry
, I think we just need to make sure that everywhere we do that we're sorting first.
let value = | ||
value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?; | ||
let value = if extra.sort_keys { | ||
let sorted_value = sort_dict_recursive(py, &value)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not do a recursive sort here, but sort new_dict
specifically after we've built it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be less efficient for sorting new_dict at the end? It would require an extra iteration over all items. The current approach sorts values recursively as they're being processed, which should be more optimal since we are already iterating and setting key values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick scan, I think the implementation here would have exponential blowup, because for a dict-of-dicts, when serializing the top-level dict we'll call sort_dict_recursive
, and then again we'll call it for each dict when we start serializing them below.
); | ||
map.serialize_entry(&key, &value_serialize)?; | ||
if extra.sort_keys { | ||
let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above -- I'd like to get rid of recursive sorting and do this after we build the map
here. But I'm not 100% sure the data types here allow that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have serialize_pairs_python
and serialize_pairs_json
functions in infer.rs
which are probably good starting points for code re-use which might lead to a solution.
Fully agree the recursive sort is questionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.
I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.
@davidhewitt Would you mind having a Rusty eyed look for us? :)
53186da
to
66c7f53
Compare
..*extra | ||
}; | ||
|
||
let filter = self.filter.key_filter(key, include, exclude).map_err(py_err_se_err)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check for include and exclude defined in the parameters. Those fields are definied in pydantic schema.
So this function handles the serialization or extra handling of those fields.
@@ -444,8 +546,19 @@ impl TypeSerializer for GeneralFieldsSerializer { | |||
let filter = self.filter.key_filter(&key, include, exclude).map_err(py_err_se_err)?; | |||
if let Some((next_include, next_exclude)) = filter { | |||
let output_key = infer_json_key(&key, extra).map_err(py_err_se_err)?; | |||
let s = SerializeInfer::new(&value, next_include.as_ref(), next_exclude.as_ref(), extra); | |||
map.serialize_entry(&output_key, &s)?; | |||
if extra.sort_keys { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this block is cater for extra fields (not defined in pydantic schema)
So we need to sort recursively for that as well.
Both of them doesn't seem to use dict.rs serializer . As the objective is to_json
and using serde package directly...
…ields count when exclude_none is set
66c7f53
to
91e98f9
Compare
@DouweM can you help to review again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the very slow review by me. Thank you for working on this.
I think this is a good starting point however there's a fair bit of work still needed to make this implementation both efficient and correct.
sorted_dict.set_item(k, sorted_v)?; | ||
} | ||
Ok(sorted_dict.into_any()) | ||
} else if let Ok(list) = value.downcast::<PyList>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a branch for lists here?
let value = | ||
value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?; | ||
let value = if extra.sort_keys { | ||
let sorted_value = sort_dict_recursive(py, &value)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick scan, I think the implementation here would have exponential blowup, because for a dict-of-dicts, when serializing the top-level dict we'll call sort_dict_recursive
, and then again we'll call it for each dict when we start serializing them below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's now a lot of code duplication in this file. It feels like what we really need is a for_all_fields
function which takes a closure and calls it for each field to serialize in turn.
That function can take responsibility for, if sorting requested, collecting all fields into a vec and sorting them first. Otherwise can just do the serializing in a streaming fashion.
if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? { | ||
// Get potentially sorted value | ||
if extra.sort_keys { | ||
let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unconvinced we need the recursive sort here either. serde
will serialize the content in the order we call .serialize_entry
, I think we just need to make sure that everywhere we do that we're sorting first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are places in this file where we serialize dicts, those presumably need sorting added (and tests). Best way to test those code paths is by using the serialize_as_any=True
runtime flag.
); | ||
map.serialize_entry(&key, &value_serialize)?; | ||
if extra.sort_keys { | ||
let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have serialize_pairs_python
and serialize_pairs_json
functions in infer.rs
which are probably good starting points for code re-use which might lead to a solution.
Fully agree the recursive sort is questionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add tests for computed fields; at the moment the implementation neither supports them nor tests them.
Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense?
Since I have been trying to do model_dump_json with sort keys too.
This feature should simulate the same as how we use
json.dumps(data, sort_keys=True)
Take note that
field_d
is extra and still manage to sortPlease let me know if I miss out any other features that
sort_keys=True
is suppose to do!Thanks!
Change Summary
allow sorting keys on to_json and to_python by passing in sort_keys
Related issue number
should fix pydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.
Checklist
pydantic-core
(except for expected changes)Selected Reviewer: @davidhewitt