Commit a60f7a1

anishathalye authored and ingydotnet committed Jan 13, 2021
Fix compatibility with Jython
This patch was taken from #369 (comment), authored by Pekka Klärck <peke@iki.fi>.

In short, Jython doesn't support lone surrogates, so importing yaml (and in particular, loading `reader.py`) caused a UnicodeDecodeError. This patch works around this through a clever use of `eval` to defer evaluation of the string containing the lone surrogates, only doing it on non-Jython platforms.

This is only done in `lib/yaml/reader.py` and not `lib3/yaml/reader.py` because Jython does not support Python 3.

With this patch, Jython's behavior with respect to Unicode code points over 0xFFFF becomes as it was before 0716ae2.

It still does not pass all the unit tests on Jython (passes 1275, fails 3, errors on 1); all the failing tests are related to Unicode. Still, this is better than simply crashing upon `import yaml`.

With this patch, all tests continue to pass on Python 2 / Python 3.
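
The deferral described above can be sketched in isolation. This is a minimal illustration, not the patched code itself: `DEFERRED_SOURCE` and `build_high_surrogate_pattern` are hypothetical names, and the pattern is a shortened stand-in for the real one. The point is that a raw string keeps the `\u` escapes as plain text, so no string containing lone surrogates exists until `eval` runs, and `eval` is skipped when the platform string starts with `java`.

```python
import re
import sys

# Raw string: the \u escapes stay as plain backslash text, so the module can be
# parsed without any lone surrogate ever appearing in a string literal.
DEFERRED_SOURCE = r"u'[\uD800-\uDBFF]'"


def build_high_surrogate_pattern(platform=sys.platform):
    """Hypothetical helper: compile the pattern, or return None on Jython."""
    if platform.startswith('java'):
        # Same platform check as the patch; skip eval so the surrogate
        # literal is never evaluated on Jython.
        return None
    # Everywhere else, turn the text into a real unicode string at run time.
    return re.compile(eval(DEFERRED_SOURCE))
```

The patch itself substitutes a surrogate-free pattern on Jython rather than returning `None`; the `None` here only keeps the sketch short.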
1 parent ee98abd commit a60f7a1

1 file changed, +7 -2 lines changed

‎lib/yaml/reader.py

@@ -137,9 +137,14 @@ def determine_encoding(self):
         self.update(1)
 
     if has_ucs4:
-        NON_PRINTABLE = re.compile(u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010ffff]')
+        NON_PRINTABLE = u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010ffff]'
+    elif sys.platform.startswith('java'):
+        # Jython doesn't support lone surrogates https://bugs.jython.org/issue2048
+        NON_PRINTABLE = u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]'
     else:
-        NON_PRINTABLE = re.compile(u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uFFFD]|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)')
+        # Need to use eval here due to the above Jython issue
+        NON_PRINTABLE = eval(r"u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uFFFD]|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)'")
+    NON_PRINTABLE = re.compile(NON_PRINTABLE)
     def check_printable(self, data):
         match = self.NON_PRINTABLE.search(data)
         if match:
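
To see what the `else` and Jython patterns accept, the sketch below compiles both and runs them on strings made of UTF-16 code units. It is written for Python 3, where a string can hold individual surrogates, so it only approximates a narrow Python 2 build or Jython. `NARROW`, `JYTHON`, and `as_utf16_units` are hypothetical names introduced for this illustration; the pattern text is copied from the diff.

```python
import re

# Pattern text copied from the final `else` branch (narrow builds).
NARROW = re.compile(
    u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uFFFD]'
    u'|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]'
    u'|[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)'
)
# Pattern text copied from the Jython branch: no surrogates in the allowed set.
JYTHON = re.compile(u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')


def as_utf16_units(text):
    """Hypothetical helper: split characters above U+FFFF into surrogate pairs."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return ''.join(out)


pair = as_utf16_units(chr(0x1F600))  # U+1F600 as two code units

# A well-formed pair passes the narrow pattern; lone surrogates do not.
pair_ok_on_narrow = NARROW.search(pair) is None
lone_low_rejected = NARROW.search(chr(0xDE00)) is not None
lone_high_rejected = NARROW.search('a' + chr(0xD83D)) is not None

# The Jython pattern flags any surrogate code unit, paired or not.
pair_rejected_on_jython = JYTHON.search(pair) is not None
```

Under these assumptions the Jython pattern treats every surrogate code unit as non-printable, which fits the commit message's note that code points over 0xFFFF behave on Jython as they did before 0716ae2.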
