
Fix performance issue caused by using repeated `>` characters inside `<!DOCTYPE root [<!-- PAYLOAD -->]>` (#174)

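A `>` is treated as a string delimiter when reading the source.
In certain cases, if `>` is used in succession (for example, inside a
comment in the internal subset), the source is read and matched again for
every `>`, which makes parsing slow down sharply instead of scaling linearly.
Therefore, `match` is now given `term: Private::COMMENT_TERM`, so that when
parsing a comment the source reads ahead to the comment terminator instead
of stopping at each `>`.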
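To make the mechanism concrete, here is a minimal Ruby sketch. It assumes a reader that pulls input in chunks ending at a delimiter and retries an anchored regexp after each chunk. `ToySource`, its `reads` and `work` counters, and the `term:` keyword on its `match` method are hypothetical names used only for illustration. The sketch mirrors the idea behind the `term:` argument in the diff, but it is not REXML's `IOSource` implementation.

```ruby
require "stringio"
require "strscan"

# Hypothetical, simplified model of a delimiter-driven reader. This is NOT
# REXML's IOSource; it only mimics the read-then-retry loop described above.
class ToySource
  attr_reader :reads, :work

  def initialize(xml)
    @io = StringIO.new(xml)
    @scanner = StringScanner.new(+"")
    @reads = 0 # number of chunks pulled from the input
    @work = 0  # bytes offered to the regexp across all attempts (rough cost proxy)
  end

  # Retries +pattern+ (anchored at the scanner position, which never moves)
  # against the buffered text, pulling one more chunk after every failed
  # attempt. Returns the matched text, or nil at end of input.
  def match(pattern, term: nil)
    loop do
      @work += @scanner.rest_size
      matched = @scanner.check(pattern)
      return matched if matched
      return nil unless read(term)
    end
  end

  private

  # Appends the next chunk, ending at +term+ (">" when no term is given).
  def read(term)
    chunk = @io.gets(term || ">")
    return false if chunk.nil?
    @reads += 1
    @scanner << chunk
    true
  end
end

comment = "<!-- " + ">" * 1_000 + " -->"
pattern = /<!--(.*?)-->/m

by_gt = ToySource.new(comment)
by_gt.match(pattern)                # one chunk per ">" in the comment
by_term = ToySource.new(comment)
by_term.match(pattern, term: "-->") # one chunk up to the terminator

puts "term \">\":   reads=#{by_gt.reads} work=#{by_gt.work}"
puts "term \"-->\": reads=#{by_term.reads} work=#{by_term.work}"
```

With 1,000 `>` characters in the comment, the `>`-delimited reader needs 1,001 reads and hands the ever-growing buffer back to the regexp after each one, so `work` grows roughly with the square of the comment length. Reading ahead to `-->` needs a single read.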
Watson1978 authored Jul 16, 2024
1 parent c33ea49 commit a79ac8b
Showing 2 changed files with 8 additions and 1 deletion.
2 changes: 1 addition & 1 deletion lib/rexml/parsers/baseparser.rb
@@ -378,7 +378,7 @@ def pull_event
       raise REXML::ParseException.new(message, @source)
     end
     return [:notationdecl, name, *id]
-  elsif md = @source.match(/--(.*?)-->/um, true)
+  elsif md = @source.match(/--(.*?)-->/um, true, term: Private::COMMENT_TERM)
     case md[1]
     when /--/, /-\z/
       raise REXML::ParseException.new("Malformed comment", @source)
7 changes: 7 additions & 0 deletions test/parse/test_document_type_declaration.rb
@@ -290,6 +290,13 @@ def test_gt_linear_performance_malformed_entity
       end
     end
 
+    def test_gt_linear_performance_comment
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<!DOCTYPE root [<!-- ' + ">" * n + ' -->]>')
+      end
+    end
+
     private
     def parse(internal_subset)
       super(<<-DOCTYPE)