Fix performance issue caused by using repeated `>` characters inside `<xml><!-- --></xml>` (#177)

A `>` is treated as a string delimiter when the source is read.
In certain cases, if `>` is used in succession, read and match are
repeated for each occurrence, which slows down parsing. Therefore, this change
passes a terminator (`term:`) to the match so that the source reads ahead to a
specific part of the string in advance.
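
The effect described above can be illustrated with a minimal sketch. It is not REXML's implementation: `scan_comment` is a hypothetical helper, `StringIO#gets` stands in for the parser's source reads, and the terminator `"-->"` is an assumption made for illustration (the value of `Private::COMMENT_TERM` is not shown in this diff). The sketch counts how many times the comment pattern is retried when the input is read in chunks ending at `>` versus chunks ending at the assumed terminator.

```ruby
require "stringio"

# Hypothetical helper (not part of REXML): reads chunks ending at `term`,
# retries the comment pattern after each read, and returns the match data
# together with the number of match attempts.
def scan_comment(io, term)
  buffer = String.new
  attempts = 0
  pattern = /--(.*?)-->/m
  loop do
    chunk = io.gets(term)          # read up to and including `term`
    return [nil, attempts] if chunk.nil?
    buffer << chunk
    attempts += 1
    md = pattern.match(buffer)     # rescans the whole buffer each time
    return [md, attempts] if md
  end
end

input = "-- " + ">" * 1000 + " -->"

# Reading up to each ">" retries the match once per ">" in the input.
_, slow_attempts = scan_comment(StringIO.new(input), ">")

# Reading up to the assumed terminator needs a single attempt.
md, fast_attempts = scan_comment(StringIO.new(input), "-->")

puts "attempts with '>' as delimiter: #{slow_attempts}"
puts "attempts with '-->' as terminator: #{fast_attempts}"
```

Because every retry rescans a buffer that keeps growing, the per-`>` strategy does work that grows roughly quadratically with the number of `>` characters, while the read-ahead strategy stays linear.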
Watson1978 authored Jul 16, 2024
1 parent 1f1e6e9 commit 910e5a2
Showing 2 changed files with 8 additions and 1 deletion.
2 changes: 1 addition & 1 deletion lib/rexml/parsers/baseparser.rb
@@ -430,7 +430,7 @@ def pull_event
 #STDERR.puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}"
 raise REXML::ParseException.new("Malformed node", @source) unless md
 if md[0][0] == ?-
-  md = @source.match(/--(.*?)-->/um, true)
+  md = @source.match(/--(.*?)-->/um, true, term: Private::COMMENT_TERM)

   if md.nil? || /--|-\z/.match?(md[1])
     raise REXML::ParseException.new("Malformed comment", @source)
7 changes: 7 additions & 0 deletions test/parse/test_comment.rb
@@ -128,5 +128,12 @@ def test_gt_linear_performance
         REXML::Document.new('<!-- ' + ">" * n + ' -->')
       end
     end
+
+    def test_gt_linear_performance_in_element
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<xml><!-- ' + '>' * n + ' --></xml>')
+      end
+    end
   end
 end