[User] CJK Character Output Broken on Windows After 41654ef #1377


Closed
44670 opened this issue May 9, 2023 · 7 comments

@44670
Contributor

44670 commented May 9, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [Yes] I carefully followed the README.md.
  • [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I was trying to use the llama.cpp project to output CJK characters (Chinese, Japanese, Korean) in a Windows environment. I expected the characters to be correctly displayed in the console.

Current Behavior

Unfortunately, the output of CJK characters becomes garbled, and they are not correctly displayed. Instead of the expected characters, I see random characters or symbols, as shown in the attached screenshot.
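
For what it's worth, the garbling pattern looks like classic Windows mojibake: UTF-8 bytes being rendered under the console's legacy code page (CP936/GBK on Chinese-locale systems). Below is a rough standalone illustration of that mechanism, a minimal sketch rather than llama.cpp code, and the actual cause here may differ; the string literal is spelled out as explicit UTF-8 bytes to avoid source-encoding issues:

```cpp
// Sketch: the same UTF-8 bytes render differently depending on the
// console's output code page.
#include <windows.h>
#include <cstdio>

int main() {
    // UTF-8 encoding of "您好", spelled out byte by byte
    const char utf8[] = "\xE6\x82\xA8\xE5\xA5\xBD\n";
    printf("%s", utf8);           // garbled under a legacy code page such as 936 (GBK)
    fflush(stdout);               // make sure the bytes hit the console before switching
    SetConsoleOutputCP(CP_UTF8);  // switch the console to UTF-8 (code page 65001)
    printf("%s", utf8);           // now renders as 您好
    return 0;
}
```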

Environment and Context

I have done extensive testing on various commits of the project and have narrowed down the problem. The last commit that worked correctly was f9a6364. However, from commit 41654ef onwards, the CJK character output no longer works correctly.

If there is any additional information you need from my end to assist with resolving this issue, I am more than willing to provide it. I can perform a more detailed analysis if necessary.

Thank you for your attention to this matter. I look forward to the resolution of this issue so I can continue to use this excellent project.

Screenshot

[screenshot: garbled CJK output in the Windows console]

@DannyDaemonic
Contributor

This is related to a change I made. Could you include the full text output so that I can recreate the bug?

@44670
Contributor Author

44670 commented May 9, 2023

Hi, here is the text output of both versions; -s 1 has been set to ensure the seeds are the same.

From 41654ef:
鎮ㄥソ锛屾垜鏄偍鐨凙I鍔╂墜銆傛湁浠€涔堝彲浠ュ府鍒版偍鐨勫悧锛?

From f9a6364:
您好,我是您的AI助手。有什么可以帮到您的吗?
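The two strings are consistent with each other: taking the UTF-8 bytes of the correct output and decoding them as GBK reproduces the garbled text. Here is a quick standalone check (a sketch covering only the first two characters):

```cpp
// Sketch: misinterpret the UTF-8 bytes of the correct output as GBK
// (code page 936) and compare with the garbled 41654ef output.
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <cstdio>

int main() {
    // UTF-8 encoding of "您好" (start of the correct f9a6364 output)
    const char utf8[] = "\xE6\x82\xA8\xE5\xA5\xBD";
    wchar_t wide[8] = {0};
    // Decode those bytes as if they were GBK text
    MultiByteToWideChar(936, 0, utf8, -1, wide, 8);
    _setmode(_fileno(stdout), _O_U16TEXT);  // let wprintf write UTF-16 to the console
    wprintf(L"%ls\n", wide);                // prints 鎮ㄥソ, the start of the garbled output
    return 0;
}
```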

The script can be found at: https://github.com/OpenBuddy/OpenBuddy/blob/main/chat-llamacpp.bat

The model is: https://huggingface.co/OpenBuddy/openbuddy-7b-v1.1-q4_0-enc

Thank you for your attention to this matter.

@DannyDaemonic
Contributor

So I'm understanding this correctly: the input is showing up fine, and it's the output, the generated text, that becomes garbled?

@44670
Contributor Author

44670 commented May 9, 2023

Yes! In both versions the input is fine.

@DannyDaemonic
Contributor

@44670 Are you able to try out the fix in #1379? If you don't know how to check it out you can get the patch from here to apply it manually: https://patch-diff.githubusercontent.com/raw/ggerganov/llama.cpp/pull/1379.patch

@44670
Contributor Author

44670 commented May 9, 2023

Yes, I think it's fixed! Thank you for the fast response!

@44670 44670 closed this as completed May 9, 2023
@DannyDaemonic
Contributor

I can't review my own code, so I'm going to leave this open for visibility until someone reviews it.
