ignore crate: Fix reference cycle for compiled matchers #2692

Merged 1 commit on Jan 6, 2024

fe9lix (Contributor) commented Dec 20, 2023

This attempts to fix the unbounded memory growth in the ignore crate that occurs when ignore flags are enabled; see #2690.

I don't have a full understanding of the ignore crate codebase, but it looks like there is a reference cycle caused by the compiled matchers: the compiled HashMap holds a reference to Ignore, and Ignore holds a reference to the HashMap. Using weak references fixes issue #2690 in my test project. I also confirmed this by profiling the code before and after the change; see the attached screenshots.

[Screenshots: CleanShot 2023-12-20 at 16 30 56@2x, CleanShot 2023-12-20 at 16 26 02@2x]
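
For readers unfamiliar with this kind of leak, here is a minimal sketch of the pattern described above. It is not the ignore crate's actual code: `Node`, `Entry`, `Cache`, and `node_freed_after_drop` are hypothetical names chosen for illustration, and the real types are more involved. The sketch only shows why a shared map that holds strong references back to its owners keeps them alive, and how storing `Weak` references instead breaks the cycle.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// How a cache entry refers back to its node (hypothetical, for illustration).
#[allow(dead_code)]
enum Entry {
    /// Strong back-reference: keeps the node alive and forms a cycle.
    Strong(Arc<Node>),
    /// Weak back-reference: does not keep the node alive.
    Weak(Weak<Node>),
}

/// Shared cache of "compiled" entries, keyed by path.
type Cache = Arc<RwLock<HashMap<String, Entry>>>;

/// Hypothetical stand-in for a matcher node that holds the shared cache.
#[allow(dead_code)]
struct Node {
    cache: Cache,
}

/// Builds one node, registers it in the shared cache, drops every outside
/// handle, and reports whether the node was actually freed.
fn node_freed_after_drop(use_weak: bool) -> bool {
    let cache: Cache = Arc::new(RwLock::new(HashMap::new()));
    let node = Arc::new(Node { cache: Arc::clone(&cache) });

    let entry = if use_weak {
        Entry::Weak(Arc::downgrade(&node))
    } else {
        Entry::Strong(Arc::clone(&node))
    };
    cache.write().unwrap().insert("some/dir".to_string(), entry);

    // A weak probe lets us observe the node without keeping it alive.
    let probe = Arc::downgrade(&node);
    drop(node);
    drop(cache);

    // If the node is gone, the probe can no longer be upgraded.
    probe.upgrade().is_none()
}

fn main() {
    println!("strong entry: node freed = {}", node_freed_after_drop(false));
    println!("weak entry:   node freed = {}", node_freed_after_drop(true));
}
```

With a strong entry the node and the map keep each other alive after all outside handles are dropped, so the node is never freed. With a weak entry the node is freed as soon as its last outside handle goes away, at the cost of having to call `upgrade()` on lookup and handle the `None` case.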

BurntSushi (Owner) left a comment

Very nice, thank you.

It looks like there is a reference cycle caused by the compiled
matchers (compiled HashMap holds ref to Ignore and Ignore holds ref
to HashMap). Using weak refs fixes issue BurntSushi#2690 in my test project.
Also confirmed via before and after when profiling the code, see the
attached screenshots in BurntSushi#2692.

Fixes BurntSushi#2690
BurntSushi merged commit b9c7749 into BurntSushi:master on Jan 6, 2024
klensy added a commit to klensy/rust that referenced this pull request Jan 16, 2024
 $ cargo update  -p ignore --precise=0.4.22
    Updating crates.io index
    Updating aho-corasick v1.0.2 -> v1.1.2
    Updating bstr v1.5.0 -> v1.9.0
    Updating globset v0.4.10 -> v0.4.14
    Updating ignore v0.4.20 -> v0.4.22
    Updating log v0.4.19 -> v0.4.20
    Updating memchr v2.5.0 -> v2.7.1
      Adding regex-automata v0.4.3
    Updating walkdir v2.3.3 -> v2.4.0

some notable change is BurntSushi/ripgrep#2692

reduces memory usage from

==47796== Total:     821,467,407 bytes in 3,955,595 blocks
==47796== At t-gmax: 10,976,209 bytes in 66,100 blocks
==47796== At t-end:  2,944,016 bytes in 12,490 blocks
==47796== Reads:     4,788,959,023 bytes
==47796== Writes:    975,493,639 bytes

to

==66633== Total:     791,565,538 bytes in 3,503,144 blocks
==66633== At t-gmax: 10,914,511 bytes in 65,997 blocks
==66633== At t-end:  395,531 bytes in 941 blocks
==66633== Reads:     4,249,388,949 bytes
==66633== Writes:    814,119,580 bytes
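
To put the two profiler summaries above side by side, the following sketch computes the relative drop in allocated bytes for each row. The figures are copied from the output quoted above; the helper name `percent_reduction` is only an illustration. The bytes still live at exit ("At t-end") fall by roughly 87%, while the peak ("At t-gmax") is almost unchanged, which is consistent with a leak being fixed rather than peak usage being reduced.

```rust
/// Percentage by which `after` is smaller than `before`.
fn percent_reduction(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

fn main() {
    // (label, bytes before the update, bytes after the update),
    // copied from the profiler output quoted above.
    let rows = [
        ("Total", 821_467_407u64, 791_565_538u64),
        ("At t-gmax", 10_976_209, 10_914_511),
        ("At t-end", 2_944_016, 395_531),
    ];

    for &(label, before, after) in rows.iter() {
        println!(
            "{:<10} {:>6.2}% fewer bytes",
            label,
            percent_reduction(before, after)
        );
    }
}
```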
klensy added a commit to klensy/rust that referenced this pull request Jan 16, 2024
 $ cargo update  -p ignore --precise=0.4.22
    Updating crates.io index
    Updating aho-corasick v1.0.2 -> v1.1.2
    Updating bstr v1.5.0 -> v1.9.0
    Updating globset v0.4.10 -> v0.4.14
    Updating ignore v0.4.20 -> v0.4.22
    Updating log v0.4.19 -> v0.4.20
    Updating memchr v2.5.0 -> v2.7.1
      Adding regex-automata v0.4.3
    Updating walkdir v2.3.3 -> v2.4.0

some notable change is BurntSushi/ripgrep#2692

reduces memory usage from

==47796== Total:     821,467,407 bytes in 3,955,595 blocks
==47796== At t-gmax: 10,976,209 bytes in 66,100 blocks
==47796== At t-end:  2,944,016 bytes in 12,490 blocks
==47796== Reads:     4,788,959,023 bytes
==47796== Writes:    975,493,639 bytes

to

==66633== Total:     791,565,538 bytes in 3,503,144 blocks
==66633== At t-gmax: 10,914,511 bytes in 65,997 blocks
==66633== At t-end:  395,531 bytes in 941 blocks
==66633== Reads:     4,249,388,949 bytes
==66633== Writes:    814,119,580 bytes

bump regex to dedupe one regex-syntax

$ cargo update -p regex
    Updating crates.io index
    Updating regex v1.8.4 -> v1.10.2
    Removing regex-syntax v0.7.2
klensy added a commit to klensy/rust that referenced this pull request Jan 22, 2024
klensy added a commit to klensy/rust that referenced this pull request Jan 30, 2024