Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

move clean and finalize to after filters #1654

Closed
wants to merge 4 commits into from
Closed

move clean and finalize to after filters #1654

wants to merge 4 commits into from

Conversation

djptek
Copy link
Contributor

@djptek djptek commented Nov 9, 2021

  • Have you signed the contributor license agreement? Y
  • Have you followed the contributor guidelines? Y
  • For proposing substantial changes or additions to the schema, have you reviewed the [RFC process] (https://github.com/elastic/ecs/blob/master/rfcs/README.md)? Y
  • If submitting code/script changes, have you verified all tests pass locally using make test? Y
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes? Y
  • Is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed. Y
  • Have you added an entry to the CHANGELOG.next.md? Y

@djptek djptek requested a review from a team November 9, 2021 10:19
@ebeahan
Copy link
Member

ebeahan commented Nov 9, 2021

This change would be significant, and the motivation to make it isn't included in the description. @djptek, can you add some more context?

I have concerns about switching the processing order without considering the full impact of the change.

The current implementation does the following before any filtering (--subset, --exclude) happens:

  • loads the entire schema (fields = loader.load_schemas(ref=args.ref, included_files=args.include))
  • sets field reuse locations, checks attributes, and makes defaults explicit (cleaner.clean(fields, strict=args.strict))
  • fleshing out those reuses, and finalizes field details (finalizer.finalize(fields))

Switching the ordering will affect later processing downstream. For example, I run the generator with --subset, and an exception is now thrown because the flat_name attribute doesn't exist yet:

$ python scripts/generator.py --ref v1.10.0 --include usage-example/fields/custom --subset usage-example/fields/subset.yml usage-example/fields/process.yml --template-settings usage-example/fields/template-settings.json --out usage-example
Loading schemas from git ref v1.10.0
Running generator. ECS version 1.10.0
Loading user defined schemas: ['usage-example/fields/custom']
Traceback (most recent call last):
  File "/Users/user/dev/ecs/scripts/generator.py", line 107, in <module>
    main()
  File "/Users/user/dev/ecs/scripts/generator.py", line 49, in main
    fields = subset_filter.filter(fields, args.subset, out_dir)
  File "/Users/user/dev/ecs/scripts/schema/subset_filter.py", line 13, in filter
    intermediate_files.generate(subfields, os.path.join(out_dir, 'ecs', 'subset', subset['name']), False)
  File "/Users/user/dev/ecs/scripts/generators/intermediate_files.py", line 14, in generate
    flat = generate_flat_fields(fields)
  File "/Users/user/dev/ecs/scripts/generators/intermediate_files.py", line 26, in generate_flat_fields
    visitor.visit_fields_with_memo(filtered, accumulate_field, flattened)
  File "/Users/user/dev/ecs/scripts/schema/visitor.py", line 59, in visit_fields_with_memo
    visit_fields_with_memo(details['fields'], func, memo)
  File "/Users/user/dev/ecs/scripts/schema/visitor.py", line 59, in visit_fields_with_memo
    visit_fields_with_memo(details['fields'], func, memo)
  File "/Users/user/dev/ecs/scripts/schema/visitor.py", line 57, in visit_fields_with_memo
    func(details, memo)
  File "/Users/user/dev/ecs/scripts/generators/intermediate_files.py", line 37, in accumulate_field
    flat_name = field_details['flat_name']
KeyError: 'flat_name'

I also expect subtle and unexpected changes to --subset now that we're not fully generating all the nestings in advance of subset filtering.

@djptek
Copy link
Contributor Author

djptek commented Nov 10, 2021

While I was adding tests for #1519 I realised the check for e.g. @timestamp was happening before subset, exclude etc.

Therefore, it is possible to pass the check for @timestamp and then go create a fieldset that does not include this required field

The motivation was to move the check until after subset, exclude etc.

@ebeahan How about if rollback this change, so the logic is restored, then extract the checks and move only these downstream?

@djptek
Copy link
Contributor Author

djptek commented Nov 17, 2021

Hi @ebeahan how about if I rollback this change, so the logic is restored, then extract the checks and move only these downstream?

@ebeahan
Copy link
Member

ebeahan commented Nov 17, 2021

Hi @ebeahan how about if I rollback this change, so the logic is restored, then extract the checks and move only these downstream?

Sounds good!

@github-actions
Copy link

This PR is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale Stale issues and pull requests label Feb 23, 2022
@djptek djptek closed this by deleting the head repository Sep 27, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
stale Stale issues and pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants