Skip to content

Use state machine to parse directives #3243

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Use state machine to parse directives #3243

wants to merge 1 commit into from

Conversation

djc
Copy link
Contributor

@djc djc commented Mar 25, 2025

Intended to fix #3174. Have not done any benchmarks yet -- suggestions on how best to approach that are welcome.

This is a little bit more code, but IMO maybe slightly more readable?

It's correct enough that it passes the tests, there might be edge cases that aren't covered but those should be easy to solve.

@dpc
Copy link
Contributor

dpc commented Mar 26, 2025

> hyperfine --warmup 30 -N ./target/release/dotr-before ./target/release/dotr-after
Benchmark 1: ./target/release/dotr-before
  Time (mean ± σ):       1.2 ms ±   0.1 ms    [User: 0.4 ms, System: 0.7 ms]
  Range (min … max):     1.1 ms …   1.9 ms    2044 runs

Benchmark 2: ./target/release/dotr-after
  Time (mean ± σ):     613.8 µs ±  79.6 µs    [User: 289.1 µs, System: 254.0 µs]
  Range (min … max):   511.5 µs … 979.8 µs    5264 runs

Summary
  ./target/release/dotr-after ran
    2.03 ± 0.34 times faster than ./target/release/dotr-before 

Added true just to have a baseline:

Benchmark 1: ./target/release/dotr-before
  Time (mean ± σ):       1.2 ms ±   0.1 ms    [User: 0.4 ms, System: 0.7 ms]
  Range (min … max):     1.1 ms …   1.8 ms    2535 runs

Benchmark 2: ./target/release/dotr-after
  Time (mean ± σ):     611.7 µs ±  80.3 µs    [User: 295.6 µs, System: 244.1 µs]
  Range (min … max):   518.8 µs … 1663.5 µs    3825 runs

Benchmark 3: true
  Time (mean ± σ):     595.8 µs ±  74.1 µs    [User: 311.4 µs, System: 215.4 µs]
  Range (min … max):   512.3 µs … 1035.5 µs    5362 runs

This app sets RUST_LOG internally, so setting it from the outside doesn't make a difference.

AFAICT you fixed it and it's nearly free now.

@klensy
Copy link
Contributor

klensy commented Mar 27, 2025

After this, regex can be removed from Cargo.toml (for tracing-subscriber), only left in dev-deps?

@djc djc force-pushed the parse-directive branch from 12445d7 to b298124 Compare March 27, 2025 12:21
@djc
Copy link
Contributor Author

djc commented Mar 27, 2025

After this, regex can be removed from Cargo.toml (for tracing-subscriber), only left in dev-deps?

Nice catch, yes! Added that into the first commit.

@djc djc force-pushed the parse-directive branch 2 times, most recently from d533cdf to a4484c3 Compare March 29, 2025 10:38
Copy link
Contributor

@hds hds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rebase this against master. We merge everything in there first and then David or I will handle backporting to v0.1.x.

Before making this change, I think we need more tests to ensure that the behavior isn't changing. Especially because there are things I'm seeing in the code that are different to what the docs say.

I'm happy to have a look at writing some of those tests, but it may take me a little bit to get to it.

hds added a commit that referenced this pull request Apr 30, 2025
With a view to replacing the env filter parsing in #3243, this change
adds some additional tests to improve our confidence in not breaking
existing behavior. Tests for empty and invalid directives is added, as
well as tests for directives overriding less specific directives.

One of the latter tests (`more_specific_dynamic_filter_less_verbose`)
currently fails, which is a known issue reported in #1388. The test is
in place with `#[should_panic]` and can be reverted to a normal test
when that behavior is fixed.

The documentation on `parse`, `parse_lossy`, `from_env_lossy` and
`try_from_env` has also been made more explicit as to the result of an
empty directive (the default directive is used). This was already
documented on `with_default_directive`.
@djc djc force-pushed the parse-directive branch from a4484c3 to bab5257 Compare April 30, 2025 16:07
@djc djc requested a review from yaahc as a code owner April 30, 2025 16:07
@djc djc changed the base branch from v0.1.x to master April 30, 2025 16:07
@djc djc requested a review from davidbarsky as a code owner April 30, 2025 16:07
@djc djc force-pushed the parse-directive branch from bab5257 to 2007997 Compare April 30, 2025 16:21
@djc djc force-pushed the parse-directive branch from 2007997 to 06e5695 Compare April 30, 2025 16:26
for (i, c) in from.trim().char_indices() {
state = match (state, c) {
(Start, '[') => Span { span_start: i + 1 },
(Start, c) if !c.is_alphanumeric() => return Err(ParseError::new()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this arm to pass the new test cases from #3262. This seems like a reasonable way to weed out invalid stuff while allowing non-ASCII target names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to match the previous regex here, and is_alphanumeric is missing a couple of characters.

In the current implementation, a target accepts [\w:-] so as well as : and -, we need everything specified by \w, which the regex docs specify as: word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}).

Is there some way we can get those character classes in Rust?

Otherwise, the real thing that made that test invalid is that it started with a comma. I'm happy to modify the test if we want to accept starting with a comma (and ignore empty directives). I think it would be acceptable to be more permissive about what characters we accept for the target (or a span name for example), but we should not be more restrictive.

@djc
Copy link
Contributor Author

djc commented Apr 30, 2025

@djc djc requested a review from hds April 30, 2025 16:28
for (i, c) in from.trim().char_indices() {
state = match (state, c) {
(Start, '[') => Span { span_start: i + 1 },
(Start, c) if !c.is_alphanumeric() => return Err(ParseError::new()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to match the previous regex here, and is_alphanumeric is missing a couple of characters.

In the current implementation, a target accepts [\w:-] so as well as : and -, we need everything specified by \w, which the regex docs specify as: word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}).

Is there some way we can get those character classes in Rust?

Otherwise, the real thing that made that test invalid is that it started with a comma. I'm happy to modify the test if we want to accept starting with a comma (and ignore empty directives). I think it would be acceptable to be more permissive about what characters we accept for the target (or a span name for example), but we should not be more restrictive.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tracing-subscriber's Directive parsing adds 600us to startup time
5 participants