Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve wc #243

Merged
merged 2 commits into from
Sep 23, 2024
Merged

Improve wc #243

merged 2 commits into from
Sep 23, 2024

Conversation

gcanat
Copy link
Contributor

@gcanat gcanat commented Sep 23, 2024

As suggested in #137, I took inspiration from wc2 to improve wc_file_bytes.

In my benchmarks (tested with 1G utf8.txt file generated with wctool), it is a bit faster than wc2, and on par with wz, except when passing multiple files as argument because wz will parallelize computation, but I guess we don't want to pull rayon as a dependency here :)

@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Looks good at a quick glance.

The table initialization can be improved further with a static table, yes? e.g. https://www.reddit.com/r/rust/comments/1dokkh2/recipe_for_compiletime_sparse_array_initialization/

@gcanat
Copy link
Contributor Author

gcanat commented Sep 23, 2024

OK I made the change.
For what it's worth here's the result on 1G utf8.txt

tool branch time
posixutils-rs main 2.66s
posixutils-rs wc2 1.50s
wc2.c main 2.10s
wz main 1.57s
wc v8.32 15.33

@jgarzik jgarzik merged commit 1b1ab55 into rustcoreutils:main Sep 23, 2024
2 checks passed
@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Thanks, merged.

Interested in tackling wc_file_chars()? :)

@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Post-merge comment: was_space need not appear in CountInfo at all, as far as I can tell.

@gcanat
Copy link
Contributor Author

gcanat commented Sep 23, 2024

Oh yeah, I can have a look at wc_file_chars() in the coming days. And will try to handle was_space differently.

@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Oh yeah, I can have a look at wc_file_chars() in the coming days. And will try to handle was_space differently.

Thanks.

re was_space, it appears just a local, since it is not accumulated, and should not count across different files.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants