OpenCC "conversion_chain" not fully working #652

Closed
3 tasks done
amorphobia opened this issue May 17, 2023 · 1 comment · Fixed by #688 or #715
amorphobia commented May 17, 2023

Describe the bug
Sometimes the OpenCC conversion applies only the first dict in the JSON file and skips the following ones.

To Reproduce
Steps to reproduce the bug:

  1. Create the following files in the Rime/opencc directory:
  • t.json
{
  "name": "Test Conversion",
  "segmentation": {
    "type": "mmseg",
    "dict": {
      "type": "text",
      "file": "t1.txt"
    }
  },
  "conversion_chain": [{
    "dict": {
      "type": "text",
      "file": "t1.txt"
    }
  }, {
    "dict": {
      "type": "text",
      "file": "t2.txt"
    }
  }]
}
  • t1.txt (tab separated dictionary)
三	二
  • t2.txt (tab separated dictionary)
二	一
  2. Create a custom patch for luna pinyin:
  • luna_pinyin.custom.yaml
patch:
  test_conversion:
    opencc_config: t.json
    option_name: test
    tips: all
  engine/filters/@next: simplifier@test_conversion
  switches/@next: { name: test, reset: 1, states: [ "off", "on" ] }
  3. Deploy Rime, activate luna pinyin, and type 「三」 and 「三人」
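
For quicker reproduction, the three files from the first step could be generated with a short script. This is only a convenience sketch: the function name `write_repro_files` and its `opencc_dir` parameter are hypothetical, and the location of the Rime/opencc directory depends on the platform and flavor, so it is passed in rather than guessed.

```python
import json
from pathlib import Path

# Same content as t.json in the steps above.
T_JSON = {
    "name": "Test Conversion",
    "segmentation": {
        "type": "mmseg",
        "dict": {"type": "text", "file": "t1.txt"},
    },
    "conversion_chain": [
        {"dict": {"type": "text", "file": "t1.txt"}},
        {"dict": {"type": "text", "file": "t2.txt"}},
    ],
}


def write_repro_files(opencc_dir):
    """Write t.json, t1.txt and t2.txt into opencc_dir (hypothetical helper)."""
    target = Path(opencc_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "t.json").write_text(
        json.dumps(T_JSON, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    # Text dictionaries are tab separated: key<TAB>value.
    (target / "t1.txt").write_text("三\t二\n", encoding="utf-8")
    (target / "t2.txt").write_text("二\t一\n", encoding="utf-8")
```

Calling `write_repro_files(...)` with the path of the local Rime/opencc directory writes the files; the luna_pinyin.custom.yaml patch still has to be created by hand.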

Expected behavior
Every 「三」 should ultimately be converted to 「一」. However, the single character 「三」 is converted to 「二」, which is only an intermediate result, while 「三人」 is correctly converted to 「一人」.

I also tested with the OpenCC command-line tool and got correct results.

$ echo 三 | opencc -c ./t.json
一
$ echo 三人 | opencc -c ./t.json
一人

Screenshots
三 to 二
三人 to 一人

Flavor(please complete the following information):
Select your flavor:

  • Squirrel
  • Weasel
  • Hamster

Package:

Additional context

I found the logic in simplifier.cc:

When the original candidate as a whole can be converted by Opencc::ConvertWord, it is not converted any further. Otherwise, the candidate is converted with Opencc::ConvertText.

Opencc::ConvertWord looks up the value in Opencc::dict_, which uses only the first conversion of the chain. By contrast, Opencc::ConvertText calls the OpenCC converter's Convert method, which uses the whole chain.
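
The two code paths can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not librime's actual code: the function names are illustrative stand-ins for Opencc::ConvertWord and Opencc::ConvertText, and the per-character lookup is an assumption that only works because the test dictionaries contain single-character entries.

```python
# Simplified model of the behaviour described above; not librime's code.
CHAIN = [{"三": "二"}, {"二": "一"}]  # t1.txt, then t2.txt


def convert_text(text, chain=CHAIN):
    """Run every dict in the chain in order (models Opencc::ConvertText)."""
    for table in chain:
        # Per-character lookup suffices for these one-character entries;
        # real OpenCC segments the text and matches the longest prefix.
        text = "".join(table.get(ch, ch) for ch in text)
    return text


def convert_word(word, chain=CHAIN):
    """Look the whole word up in the first dict only (models Opencc::ConvertWord)."""
    return chain[0].get(word)  # None when the whole word has no entry


def simplifier_convert(candidate, chain=CHAIN):
    """Try the whole-candidate lookup first, else convert the full text."""
    hit = convert_word(candidate, chain)
    if hit is not None:
        return hit  # later dicts in the chain are never consulted
    return convert_text(candidate, chain)


print(simplifier_convert("三"))    # 二  (stops after the first dict)
print(simplifier_convert("三人"))  # 一人 (no whole-word hit, full chain runs)
print(convert_text("三"))          # 一  (what the opencc CLI produces)
```

In this model 「三」 has an exact entry in the first dict, so the lookup returns early with the intermediate result, whereas 「三人」 has no exact entry and therefore goes through the whole chain.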

@amorphobia amorphobia added the bug label May 17, 2023
amorphobia (author) commented

A real conversion case

Use OpenCC command line:

$ echo 抬价 | opencc -c s2tw.json
抬價

Use Rime:
Take any recipe whose dict.yaml uses simplified Chinese (e.g., 四叶草拼音), change opencc_config to s2tw.json, and type 「抬价」:

抬价 to 擡價

amorphobia added a commit to amorphobia/librime that referenced this issue May 26, 2023
* continue to process whole conversion chain for whole candidate
matching except for the candidate original form
@groverlynn groverlynn mentioned this issue Sep 18, 2023
2 tasks