Items in toc are not clickable if those items contain some CJK characters. #481

Closed
llouislu opened this issue Jul 18, 2016 · 9 comments

@llouislu

I rendered this test document with the command asciidoctor-pdf -r asciidoctor-pdf-cjk-kai_gen_gothic -a pdf-style=KaiGenGothicCN test.asc, using the asciidoctor-pdf-cjk gem, which provides a good theme and font solution.

Version info:

Asciidoctor PDF 1.5.0.alpha.11 using Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-cygwin]) (lc:UTF-8 fs:GBK in:- ex:UTF-8)

test.asc

= test
:doctype:   article
:docinfo:
:toc:
:toclevels: 3
:toc-title: 目录(Table of Contents)


== English clickable

== 中文 Chinese unclickable

== 한국어 Korean clickable

== カタカナ 片仮名 Japanese unclickable

You can also get my rendered PDF here.

@mojavelinux
Member

This appears to be the same problem as #308.

@mojavelinux
Member

When I click the two unclickable links, I get the following warnings in the console:

failed to look up _​中文_chinese_unclickable
failed to look up _​カタカナ_​片仮名_japanese_unclickable

@mojavelinux
Member

As you can see, PDF cannot reference internal IDs that contain characters beyond a certain range (I'm not sure what that range is). As I mention in #308, we'll need to generate internal IDs using a different strategy that adheres to the PDF specification.

@mojavelinux
Member

mojavelinux commented Aug 7, 2016

Here's the document I'm using for testing:

= CJK References
:doctype: book
:toc:
:toclevels: 3
:toc-title: 目录

== English

content

== 中文 Chinese

content

== 한국어 Korean

content

== カタカナ 片仮名 Japanese

content

See <<_中文_chinese>>.

@mojavelinux mojavelinux added this to the v1.5.0.alpha.13 milestone Aug 7, 2016
@mojavelinux mojavelinux self-assigned this Aug 7, 2016
@mojavelinux
Member

I think the solution is to convert the anchor to hex if a call to ascii_only? returns false. This must be done both where the anchor is defined and where the anchor is referenced. I suppose we could do this for all anchors, but let's start by doing it only when necessary.

In the case of 中文 Chinese, the anchor will become:

0x5fe4b8ade696875f6368696e657365

I decided to prefix the hex value with 0x to distinguish it from an ASCII ID.

This change will also squash the following warning:

warning: regexp match /.../n against to UTF-8 string

The regexp in pdf-core assumes it's working with an ASCII string, so we have to be sure to supply it with one.
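
The hex-encoding idea described above can be sketched roughly as follows. This is a minimal illustration only: the helper name encode_anchor is hypothetical and is not taken from the asciidoctor-pdf source, and the real implementation may differ.

```ruby
# Hypothetical helper illustrating the approach; the name and structure are
# assumptions, not the actual asciidoctor-pdf implementation.
def encode_anchor(id)
  # ASCII-only IDs are left untouched.
  return id if id.ascii_only?
  # unpack('H*') yields the bytes of the string as one lowercase hex string.
  # The 0x prefix distinguishes the encoded form from a plain ASCII ID.
  "0x#{id.unpack('H*').first}"
end

puts encode_anchor('_english')      # ASCII-only, returned unchanged
puts encode_anchor('_中文_chinese') # prints 0x5fe4b8ade696875f6368696e657365
```

The same function would have to be applied both where the anchor is defined and where it is referenced, so that the two names still match after encoding.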

mojavelinux added a commit to mojavelinux/asciidoctor-pdf that referenced this issue Aug 7, 2016
…haracters outside ASCII range in hex

- fix cross references for IDs that contain characters outside the ASCII range
- squelch the following warning: regexp match /.../n against to UTF-8 string
mojavelinux added a commit to mojavelinux/asciidoctor-pdf that referenced this issue Aug 8, 2016
…haracters outside ASCII range in hex

- hex encode anchors that contain characters outside the ASCII range
- squelch the following warning: regexp match /.../n against to UTF-8 string (pdf-core < 0.6.1)
@llouislu
Author

@mojavelinux Thanks for your effort! By the way, how did you debug the reference issue that produced the following messages?

failed to look up _​中文_chinese_unclickable
failed to look up _​カタカナ_​片仮名_japanese_unclickable

@mojavelinux
Member

Do you mean how did I test it?

@llouislu
Author

@mojavelinux Yep. Which tool gave you the error message?

@mojavelinux
Member

Those error messages come from evince (the PDF viewer in Gnome).

fapdash pushed a commit to vogellacompany/asciidoctor-pdf that referenced this issue Dec 13, 2016
…haracters outside ASCII range in hex (PR asciidoctor#499)

- hex encode anchors that contain characters outside the ASCII range
- squelch the following warning: regexp match /.../n against to UTF-8 string (pdf-core < 0.6.1)