pcre pattern does not match unicode characters #738

Closed
jakubmisek opened this issue May 2, 2020 · 2 comments

@jakubmisek
Member

The following pattern is supposed to match valid UTF-8 byte sequences:

$regex = '/
	(
	(?: [\x00-\x7F]                  # single-byte sequences   0xxxxxxx
	|   [\xC2-\xDF][\x80-\xBF]       # double-byte sequences   110xxxxx 10xxxxxx
	|   \xE0[\xA0-\xBF][\x80-\xBF]   # triple-byte sequences   1110xxxx 10xxxxxx * 2
	|   [\xE1-\xEC][\x80-\xBF]{2}
	|   \xED[\x80-\x9F][\x80-\xBF]
	|   [\xEE-\xEF][\x80-\xBF]{2}
	|   \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences   11110xxx 10xxxxxx * 3
	|   [\xF1-\xF3][\x80-\xBF]{3}
	|   \xF4[\x80-\x8F][\x80-\xBF]{2}
	){1,40}                          # ...one or more times
	) | .                            # anything else
	/x';

The preg_replace call sanitizes the string by keeping only the matched byte sequences:

$value = preg_replace( $regex, '$1', $value );

For $value = "rřsšcč":
Expected: "rřsšcč"
Actual: "rsc"
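A minimal reproduction sketch of the report above, in case it helps to verify the fix. The helper name keep_valid_utf8 is hypothetical and exists only for this example. The input is written with explicit byte escapes so the result does not depend on the encoding of the source file. It assumes the reference PHP behaviour, where a pattern without the /u modifier is matched byte by byte.

```php
<?php
// Pattern copied from the report. There is no /u modifier, so PCRE
// is expected to treat \xNN escapes and ranges as raw bytes.
$regex = '/
	(
	(?: [\x00-\x7F]                  # single-byte sequences   0xxxxxxx
	|   [\xC2-\xDF][\x80-\xBF]       # double-byte sequences   110xxxxx 10xxxxxx
	|   \xE0[\xA0-\xBF][\x80-\xBF]   # triple-byte sequences   1110xxxx 10xxxxxx * 2
	|   [\xE1-\xEC][\x80-\xBF]{2}
	|   \xED[\x80-\x9F][\x80-\xBF]
	|   [\xEE-\xEF][\x80-\xBF]{2}
	|   \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences   11110xxx 10xxxxxx * 3
	|   [\xF1-\xF3][\x80-\xBF]{3}
	|   \xF4[\x80-\x8F][\x80-\xBF]{2}
	){1,40}                          # ...one or more times
	) | .                            # anything else
	/x';

// Hypothetical helper for this sketch: keeps only the byte runs captured
// by group 1 and drops every byte matched by the "." fallback.
function keep_valid_utf8(string $value, string $regex): string
{
    $result = preg_replace($regex, '$1', $value);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// "rřsšcč" as UTF-8 bytes: ř = C5 99, š = C5 A1, č = C4 8D.
$input  = "r\xC5\x99s\xC5\xA1c\xC4\x8D";
$output = keep_valid_utf8($input, $regex);

// Comparing hex dumps avoids any terminal encoding ambiguity.
echo bin2hex($input), "\n";
echo bin2hex($output), "\n";
var_dump($output === $input);
```

With the reference PHP implementation the two hex dumps should be identical. On the affected build described in this issue, the multi-byte characters were dropped, leaving only "rsc".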

@jakubmisek
Member Author

Note: this causes WordPress to fail to save any options containing Unicode characters.

@roberthusak roberthusak self-assigned this May 3, 2020
@jakubmisek jakubmisek added this to the 1.0.0 milestone May 4, 2020
@roberthusak
Contributor

Fixed in peachpiecompiler/peachpie-perlregex@25d7e15 and peachpiecompiler/peachpie-perlregex@1174816. Bumping the version of Peachpie.Library.RegularExpressions and referencing it should solve this.
