Issue with the json file? Confusables for characters 'm' and 'w' #9

Open
michaelbutler opened this issue Dec 15, 2021 · 3 comments


@michaelbutler

Something seems off with the entries for just those two characters ('m' and 'w') in the JSON file.

Repro:

<?php declare(strict_types=1);

$all = file_get_contents('confusables.json');
$all = json_decode($all, true);

print_r($all['a']); // this is fine, prints out array of 23 confusables

print_r($all['m']); // PROBLEM: Only prints one confusable, {"c":"rn","n":"LATIN SMALL LETTER R, LATIN SMALL LETTER N"}

print_r($all['w']); // PROBLEM: Only prints one confusable, {"c":"vv","n":"LATIN SMALL LETTER V, LATIN SMALL LETTER V"}

What these two have in common is that their only listed confusable is a two-character sequence: m has rn and w has vv. Maybe the code that generates this file has a bug and doesn't handle multi-character confusables correctly?

Here's a link showing actual confusables for M and W, which I would expect to be in this JSON file:

https://util.unicode.org/UnicodeJsps/confusables.jsp?a=manwe&r=None
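One way to probe that hypothesis is to scan the decoded file for every key whose confusables are all multi-character sequences. The following is only a minimal sketch: the function names `codePointCount` and `keysWithOnlyMultiCharConfusables` are hypothetical, and the `$sample` array is illustrative data in the same `c`/`n` shape as the output above, not the real contents of confusables.json.

```php
<?php declare(strict_types=1);

/** Count Unicode code points in a UTF-8 string using PCRE (no mbstring needed). */
function codePointCount(string $s): int
{
    return (int) preg_match_all('/./su', $s);
}

/**
 * Return the keys whose confusable lists are non-empty and contain
 * only multi-character entries (hypothetical helper, not part of the library).
 *
 * @param array<string, array<int, array{c: string, n: string}>> $map
 * @return string[]
 */
function keysWithOnlyMultiCharConfusables(array $map): array
{
    $suspects = [];
    foreach ($map as $key => $entries) {
        if ($entries === []) {
            continue; // nothing listed at all, a different problem
        }
        $onlyMulti = true;
        foreach ($entries as $entry) {
            if (codePointCount($entry['c']) === 1) {
                $onlyMulti = false;
                break;
            }
        }
        if ($onlyMulti) {
            $suspects[] = (string) $key; // PHP turns numeric string keys into ints
        }
    }
    return $suspects;
}

// Illustrative data only; the 'm' and 'w' entries mirror the output reported above,
// the 'a' entry is an assumed example of a single-character confusable.
$sample = [
    'a' => [['c' => "\u{0430}", 'n' => 'CYRILLIC SMALL LETTER A']],
    'm' => [['c' => 'rn', 'n' => 'LATIN SMALL LETTER R, LATIN SMALL LETTER N']],
    'w' => [['c' => 'vv', 'n' => 'LATIN SMALL LETTER V, LATIN SMALL LETTER V']],
];

print_r(keysWithOnlyMultiCharConfusables($sample));
```

Running the same function on the array decoded from confusables.json would show whether 'm' and 'w' are the only keys affected or whether the pattern is more widespread.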

@michaelbutler
Author

Update: I realized I wasn't checking the uppercase versions (M and W) as well, but some confusables still appear to be missing.
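To check both cases at once, something like the sketch below could be used. It assumes, as the update above suggests, that the file keys lowercase and uppercase letters separately. The function name `confusablesForBothCases` is hypothetical, and the `$sample` data is illustrative rather than taken from confusables.json.

```php
<?php declare(strict_types=1);

/**
 * Collect the confusable strings listed under the lowercase and uppercase
 * form of an ASCII letter (hypothetical helper, not part of the library).
 *
 * @param array<string, array<int, array{c: string, n: string}>> $map
 * @return array<string, string[]> case variant => confusable strings found
 */
function confusablesForBothCases(array $map, string $letter): array
{
    $found = [];
    foreach (array_unique([strtolower($letter), strtoupper($letter)]) as $key) {
        foreach ($map[$key] ?? [] as $entry) {
            $found[$key][] = $entry['c'];
        }
    }
    return $found;
}

// Illustrative data only: the 'm' entry mirrors the reported output,
// the 'M' entry is an assumed example of an uppercase confusable.
$sample = [
    'm' => [['c' => 'rn', 'n' => 'LATIN SMALL LETTER R, LATIN SMALL LETTER N']],
    'M' => [['c' => "\u{039C}", 'n' => 'GREEK CAPITAL LETTER MU']],
];

print_r(confusablesForBothCases($sample, 'm'));
```

A case variant with no entries in the map is simply absent from the result, which makes a missing key easy to spot.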

@carbontwelve
Contributor

Interesting, I'll see if I can investigate further.

carbontwelve added a commit that referenced this issue Feb 17, 2022
@carbontwelve
Contributor

Interesting: after an update to the JSON files, w now returns the correct values, but m still returns just one:

Array
(
    [0] => Array
        (
            [c] => rn
            [n] => LATIN SMALL LETTER R, LATIN SMALL LETTER N
        )

)

I have added a breaking test on an issue branch here: https://github.com/photogabble/php-confusable-homoglyphs/blob/issue/9-checking-missing-confusables/tests/ConfusableTest.php#L123-L131


2 participants