Fix performance issue caused by using repeated > characters inside <?xml #170

Merged on Jul 16, 2024 (2 commits)

Watson1978 commented Jul 16, 2024

A `>` is treated as a string delimiter.
In certain cases, if `>` is repeated in succession, the read and match steps run again for each one, which slows down parsing.
Therefore, this change reads ahead to a specific part of the string in advance.
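
For illustration only, here is a minimal sketch of why reading up to each delimiter and retrying the match gets slow, and how reading ahead to a terminator avoids it. This is not REXML's actual code: `match_by_delimiter`, `match_with_read_ahead`, `PATTERN`, and `PAYLOAD` are hypothetical names, and using `"?>"` as the read-ahead terminator is an assumption made for the example.

```ruby
require "stringio"

# Hypothetical pattern for an XML declaration; not REXML's real pattern.
PATTERN = /\A<\?xml.*?\?>/m

# Small stand-in for the payloads used in the proof of concept.
PAYLOAD = '<?xml version="1.0" ' + ">" * 1000 + " ?>"

# Reads up to the next ">" and retries the match on the growing buffer.
# Every failed attempt rescans the whole buffer, so the total work grows
# roughly quadratically with the number of ">" characters.
def match_by_delimiter(io, pattern)
  buffer = +""
  attempts = 0
  while (chunk = io.gets(">"))
    buffer << chunk
    attempts += 1
    match = pattern.match(buffer)
    return [match[0], attempts] if match
  end
  [nil, attempts]
end

# Reads ahead to the terminator in a single call, then matches once.
def match_with_read_ahead(io, pattern, term)
  buffer = io.gets(term)
  return [nil, 0] if buffer.nil?
  match = pattern.match(buffer)
  [match && match[0], 1]
end

_, slow_attempts = match_by_delimiter(StringIO.new(PAYLOAD), PATTERN)
_, fast_attempts = match_with_read_ahead(StringIO.new(PAYLOAD), PATTERN, "?>")
puts "match attempts per delimiter: #{slow_attempts}"
puts "match attempts with read-ahead: #{fast_attempts}"
```

In this sketch the first helper makes one match attempt per `>` in the payload, while the second makes a single attempt regardless of how many `>` characters come before the terminator.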

## Proof of Concept
```ruby
require "rexml"
require "benchmark"

def test(benchmark, payload)
  benchmark.report { begin REXML::Document.new(payload) rescue Exception end }
end

Benchmark.bm do |x|
  test(x, '<?xml version="1.0" ' + ">" * 20000 + ' ?>')
  test(x, '<?xml version="1.0" ' + ">" * 40000 + '?>')
  test(x, '<?xml version="1.0" ' + ">" * 60000 + '?>')
  test(x, '<?xml version="1.0" ' + ">" * 80000 + '/>')
  test(x, '<?xml version="1.0" ' + ">" * 100000 + '/>')
end
```
kou merged commit b8a5f4c into ruby:master on Jul 16, 2024
61 checks passed
kou commented Jul 16, 2024

Thanks.

Watson1978 deleted the fix-performance-1 branch on July 17, 2024 09:34