Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Parse, don’t validate #610

Open
wismill opened this issue Jan 24, 2025 · 3 comments
Open

Parse, don’t validate #610

wismill opened this issue Jan 24, 2025 · 3 comments
Labels
compile-keymap Indicates a need for improvements or additions to keymap compilation enhancement Indicates new feature requests needs info

Comments

@wismill
Copy link
Member

wismill commented Jan 24, 2025

As a general note, if I had to rewrite the parser, I would opt for the “parse, don’t validate” approach.

The article is really good. But a yacc grammar is pretty much "parse" no? I'm curious which part you consider the "validate".

@bluetech My point is that we use a bunch of Expr that require further check on their type after the yacc parsing. So Expr is too general and we may end up with error with no reference to erroneous file/line. Consider the silly:

xkb_keymap { };
    xkb_keycodes { };
    xkb_types {
        type "XXX" {
        modifiers[1234] = Shift;
        map[Shift] = Level2;
        level_name[Level1] = "Base";
        level_name[Level2] = "Number";
        };
    };
    xkb_compat { };
    xkb_symbols { };
};

You would just get the following warning, without any reference to the erroneous line:

xkbcommon: WARNING: The modifiers field of a key type is not an array; Illegal array subscript ignored

Imagine now having this in an included key type file: really not ideal for debugging. But at this stage we do not have parser context anymore!

On the other hand, the following file will fail with a better (but still incomplete1) error2:

xkb_keymap {
    xkb_keycodes { 
      <};> = 123;
      <🦆> = 456;
    };
    xkb_types { };
    xkb_compat { };
    xkb_symbols { };
};

That said, I guess that fully parsing a section is slower than the current approach so any include would get slower with the parse, don’t validate approach applied strictly.

Originally posted by @wismill in #598 (comment)

Footnotes

  1. (unknown file) instead of the path of the relevant file.

  2. Guess what line is erroneous? The <🦆> one, not the <};> one…

@wismill wismill added enhancement Indicates new feature requests compile-keymap Indicates a need for improvements or additions to keymap compilation needs info labels Jan 24, 2025
@bluetech
Copy link
Member

I understand what you mean now. But it's a bit hard for me to imagine how it will look.

Do you mean, get rid of the intermediate AST layer, and construct the semantic types (e.g. SymbolsInfo) directly from the parser? This is basically a "one-pass compiler", but I don't know if the XKB text format is amenable to this, unless you defer validations to a second pass.

Or do you mean, make the parser more specific, so we don't use e.g. Expr so much but e.g. more specific expressions in the grammar rules. This would blow up the amount of grammar rules but it should be possible I think.

@wismill
Copy link
Member Author

wismill commented Jan 24, 2025

I consider 4 main aspects:

  1. Correctness and security (no leak, no NULL deference, etc.)
  2. Easy code maintenance
  3. Good error messages
  4. Runtime efficiency

Observations:

  • 1 is obviously the most important but difficult to achieve with the current general data types.
  • 1 and 2 somehow go hand in hand.
  • 3 and 4 may be competing: good logging is costly.
  • 1 and 3 would benefit from a more specific parser (parse, don’t validate), but it may affect 4 with the blow-up of grammar rules?.
  • 4 may benefit from a one-pass compiler, but what if we want a section that come after a lot of others in the same file? What if the keymap components do not come in order?

I would combine more specific rules (e.g. no array nor string where we expect a number) with maybe the immediate computation of whatever we can: masks, indexes. We want to have the error in context (file + position), not lost in postprocessing. This would also maybe help with 4 by allocating less, with as much as possible results directly usable by the semantic types.

@bluetech
Copy link
Member

For fun, I did a quick test for one expr type, action/action list:
bluetech@211ac59
This does indeed make one place error early on invalid input - actions in symbols.
But we still need a check for interpret.actions = ... in xkb_compat files. If we want to make this safe, this when the "explosion" begins. We'd need to split the XkbMapConfig rule such that xkb_compat defines its own rule. Then we need to add CompatDeclList so it can define its own specific rule for this interpret global. Etc.

It should be possible but I can see the appeal of how it's currently done. Possibly with a custom recursive descent parser it would be more natural.

Arguably this small change is already an improvement.

I'll play with this more if I have some time.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
compile-keymap Indicates a need for improvements or additions to keymap compilation enhancement Indicates new feature requests needs info
Projects
None yet
Development

No branches or pull requests

2 participants