Commit 78ef23e

Parse unicode characters above \uFFFF
The regular expression matching identifiers was incomplete for Unicode characters: its range stopped at \uFFFF. Now 𝖒 can be parsed in an identifier. Ruby Bug #7524
1 parent 5544853 commit 78ef23e

2 files changed, +11 -1 lines changed
lib/rdoc/ruby_lex.rb

+1 -1
@@ -857,7 +857,7 @@ def identify_gvar
   end

   IDENT_RE = if defined? Encoding then
-               /[\w\u0080-\uFFFF]/u
+               eval '/[\w\u{0080}-\u{FFFFF}]/u' # 1.8 can't parse \u{}
              else
                /[\w\x80-\xFF]/
              end
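
The effect of widening the character class can be checked in isolation. The following is a minimal sketch, not part of the commit: the constant names are hypothetical, both patterns are written with \u{...} escapes so they parse on a current Ruby, and it assumes Ruby 2.4 or later for Regexp#match?.

```ruby
# Illustrative only: these constants are not part of rdoc.
# Old character class from the diff (upper bound U+FFFF).
OLD_IDENT_RE = /[\w\u{0080}-\u{FFFF}]/u
# New character class from the diff (upper bound U+FFFFF).
NEW_IDENT_RE = /[\w\u{0080}-\u{FFFFF}]/u

BMP_CHAR     = "\u{00E9}"    # inside the old range
HIGH_CHAR    = "\u{1D592}"   # above U+FFFF, inside the new range
PLANE16_CHAR = "\u{100000}"  # above U+FFFFF, outside both ranges

[BMP_CHAR, HIGH_CHAR, PLANE16_CHAR].each do |ch|
  code_point = format('U+%04X', ch.ord)
  puts "#{code_point}: old=#{OLD_IDENT_RE.match?(ch)} new=#{NEW_IDENT_RE.match?(ch)}"
end
```

Note that the new upper bound is U+FFFFF, so code points from U+100000 to U+10FFFF still fall outside the class.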

test/test_rdoc_ruby_lex.rb

+10
@@ -1,3 +1,5 @@
+# coding: UTF-8
+
 require 'rdoc/test_case'

 class TestRDocRubyLex < RDoc::TestCase
@@ -133,6 +135,14 @@ def test_class_tokenize_heredoc_percent_N
     assert_equal expected, tokens
   end

+  def test_class_tokenize_identifier_high_unicode
+    tokens = RDoc::RubyLex.tokenize '𝖒', nil
+
+    expected = @TK::TkIDENTIFIER.new(0, 1, 0, '𝖒')
+
+    assert_equal expected, tokens.first
+  end
+
   def test_class_tokenize_percent_1
     tokens = RDoc::RubyLex.tokenize 'v%10==10', nil
