Items in toc are not clickable if those items contain some CJK characters. #481

llouislu · 2016-07-18T14:29:16Z

I rendered this test document with the help of the asciidoctor-pdf-cjk gem, which provides a theme and font solution, using the following command:

asciidoctor-pdf -r asciidoctor-pdf-cjk-kai_gen_gothic -a pdf-style=KaiGenGothicCN test.asc

Version info:

Asciidoctor PDF 1.5.0.alpha.11 using Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-cygwin]) (lc:UTF-8 fs:GBK in:- ex:UTF-8)

test.asc

= test
:doctype:   article
:docinfo:
:toc:
:toclevels: 3
:toc-title: 目录(Table of Contents)


== English clickable

== 中文 Chinese unclickable

== 한국어 Korean clickable

== カタカナ 片仮名 Japanese unclickable

You can also get my rendered PDF here.

mojavelinux · 2016-08-07T20:58:04Z

This appears to be the same problem as #308.

mojavelinux · 2016-08-07T20:58:46Z

When I click the two unclickable links, I get the following warnings in the console:

failed to look up _中文_chinese_unclickable
failed to look up _カタカナ_片仮名_japanese_unclickable

mojavelinux · 2016-08-07T21:00:44Z

As you can see, PDF cannot reference internal IDs that contain characters beyond a certain range (I'm not sure what that range is). As I mention in #308, we'll need to generate internal IDs using a different strategy that adheres to the PDF specification.

mojavelinux · 2016-08-07T21:50:27Z

Here's the document I'm using for testing:

= CJK References
:doctype: book
:toc:
:toclevels: 3
:toc-title: 目录

== English

content

== 中文 Chinese

content

== 한국어 Korean

content

== カタカナ 片仮名 Japanese

content

See <<_中文_chinese>>.

mojavelinux · 2016-08-07T22:25:23Z

I think the solution is to convert the anchor to hex if a call to ascii_only? returns false. This must be done both where the anchor is defined and where the anchor is referenced. I suppose we could do this for all anchors, but let's start by just doing it when necessary.

In the case of 中文 Chinese, the anchor will become:

0x5fe4b8ade696875f6368696e657365

I decided to prefix the hex value with 0x to distinguish it from an ASCII ID.
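The encoding described above can be sketched in a few lines of Ruby. This is only an illustration, assuming the anchor is a UTF-8 string; the helper name pdf_anchor_name is hypothetical and is not the actual asciidoctor-pdf API.

```ruby
# Hypothetical helper (not the actual asciidoctor-pdf API): returns an
# ASCII-only anchor name suitable for use as a PDF destination.
def pdf_anchor_name(id)
  # ASCII-only IDs are passed through untouched.
  return id if id.ascii_only?
  # Otherwise hex-encode the UTF-8 bytes and prefix the result with 0x
  # so it can be told apart from a plain ASCII ID.
  '0x' + id.unpack('H*').first
end

puts pdf_anchor_name('_english')      # prints _english
puts pdf_anchor_name('_中文_chinese') # prints 0x5fe4b8ade696875f6368696e657365
```

The same helper would have to be applied both where the anchor is defined and where it is referenced, otherwise the two names no longer match.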

This change will also squash the following warning:

warning: regexp match /.../n against to UTF-8 string

The regexp in pdf-core assumes it's working with an ASCII string, so we have to be sure to supply it with one.

…haracters outside ASCII range in hex - fix cross references for IDs that contain characters outside the ASCII range - squelch the following warning: regexp match /.../n against to UTF-8 string

…haracters outside ASCII range in hex - hex encode anchors that contain characters outside the ASCII range - squelch the following warning: regexp match /.../n against to UTF-8 string (pdf-core < 0.6.1)

llouislu · 2016-08-17T07:53:28Z

@mojavelinux Thanks for your effort! By the way, how did you debug the reference issue and get the following messages?

failed to look up _中文_chinese_unclickable
failed to look up _カタカナ_片仮名_japanese_unclickable

mojavelinux · 2016-08-17T08:17:50Z

Do you mean how did I test it?

llouislu · 2016-08-17T10:10:48Z

@mojavelinux Yep. Which tool gave you the error messages?

mojavelinux · 2016-08-17T10:22:23Z

Those error messages come from evince (the PDF viewer in Gnome).

…haracters outside ASCII range in hex (PR asciidoctor#499) - hex encode anchors that contain characters outside the ASCII range - squelch the following warning: regexp match /.../n against to UTF-8 string (pdf-core < 0.6.1)

mojavelinux added this to the v1.5.0.alpha.13 milestone Aug 7, 2016

mojavelinux self-assigned this Aug 7, 2016

mojavelinux added compliance bug labels Aug 7, 2016

mojavelinux added the in progress label Aug 7, 2016

mojavelinux closed this as completed in c3955c5 Aug 8, 2016

mojavelinux removed the in progress label Aug 8, 2016

mojavelinux mentioned this issue Jul 12, 2017

Non-Latin internal cross references doesn't work #833

Closed
