Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

perf: Improve enum matching performance #81

Closed
wants to merge 1 commit into from

Conversation

Stranger6667
Copy link
Owner

PoC for #80

Currently, for the enum bench, it gives 15% slower compilation, but 35-55% faster validation for valid numeric values and up to 50% improvement for validation of invalid values. It also gives some slight improvements for small schema bench, because of enum: ["hello", "world"] in it.

There are definitely possibilities to improve this PR


impl Validate for NumberEnumValidator {
fn validate<'a>(&self, schema: &'a JSONSchema, instance: &'a Value) -> ErrorIterator<'a> {
if !self.is_valid(schema, instance) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid duplicating this method.

}
}

fn name(&self) -> String {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is also could be without duplication

let mut strings = vec![];
for item in items {
match item {
Value::Array(_) => types |= PrimitiveType::Array,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible to exit this loop somehow early if a different type occured

Some(EnumValidator::compile(schema))
if let Value::Array(items) = schema {
let mut types = PrimitiveTypesBitMap::new();
let mut numbers = vec![];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can avoid creating all these vectors

@codecov
Copy link

codecov bot commented May 23, 2020

Codecov Report

Merging #81 into master will decrease coverage by 0.21%.
The diff coverage is 83.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
- Coverage   81.87%   81.65%   -0.22%     
==========================================
  Files          45       46       +1     
  Lines        3227     3320      +93     
==========================================
+ Hits         2642     2711      +69     
- Misses        585      609      +24     
Impacted Files Coverage Δ
src/lib.rs 17.77% <ø> (ø)
src/error.rs 59.80% <50.00%> (-0.11%) ⬇️
src/primitive_type.rs 71.15% <71.15%> (ø)
src/keywords/enum_.rs 84.44% <89.55%> (-11.39%) ⬇️
src/keywords/type_.rs 96.22% <91.66%> (-0.14%) ⬇️
src/keywords/legacy/type_draft_4.rs 92.00% <92.00%> (-0.60%) ⬇️
draft/src/lib.rs 95.45% <0.00%> (-0.06%) ⬇️
src/keywords/one_of.rs 97.72% <0.00%> (-0.06%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52b033c...8209503. Read the comment docs.

@macisamuele
Copy link
Contributor

I'm not entirely sure about this, the specifications mentions

An instance validates successfully against this keyword if its value is equal to one of the elements in this keyword's array value.
Elements in the array might be of any value, including null.

I don't see any specification of the values in the array should be of the same type and actually the tests provided by JSON-Schema-Test-Suite reflect this: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft7/enum.json#L41

I would say that in order to improve enum validation performances we might do something like

// This is dummy code ... ;)
struct EnumValidator {
    array: Vec<Value>,
    string: Vec<String>,
    ....
}

for all the types and then during validation we do iterate on the "correct" array only according to the instance to validate ... this would save us some runtime and be compliant with the specificatons

@Stranger6667
Copy link
Owner Author

I don't see any specification of the values in the array should be of the same type

You are right, I probably didn't reflect my intentions properly. The idea is that we can optimize check only for the case when all enum variants are of the same type and fallback to the current implementation otherwise so that the invariant stated in the spec still holds. I.e. if we have a string input value and we already know that there are no strings in the enum, then we can skip checking every element and immediately return false because all comparison operations will return false anyway because of a different type.

or all the types and then during validation we do iterate on the "correct" array only according to the instance to validate

Indeed it should improve the performance in general, I'll explore this option as well! :)

@macisamuele
Copy link
Contributor

Indeed it should improve the performance in general, I'll explore this option as well! :)

I mentioned that type of approach because if we're going to end merging #86 then for every instance to validate we don't even need to identify the type :). It might be a bit heavier in terms of bit in memory but I don't think that it would be a very high overhead

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants