problem file encoding (Umlaute) for external PlantUML diagrams #586

Open
2 of 3 tasks
wumpz opened this issue Jul 15, 2022 · 4 comments

Comments

@wumpz

wumpz commented Jul 15, 2022

  • Bug report
  • Feature request
  • Question

I am not sure whether this belongs here or in the asciidoctor-diagram project, so hopefully this is the right place.

My Maven project's source code is (or should be) entirely UTF-8. I want to build a Maven site whose pages are Asciidoctor files, and to embed a PlantUML diagram that comes from an external file. The diagram is generated, but it always seems to be read with the wrong encoding, whereas the inline diagrams are rendered correctly.

So how do I tell Asciidoctor that these diagram files are UTF-8?

What I did / tried so far:

  1. changed file.encoding when starting Maven (-Dfile.encoding=UTF-8)
  2. defined the project source encoding in Maven
  3. defined the project reporting encoding in Maven
  4. tried different Java versions
  5. tried to configure the default_external parameter, which had no effect
  6. changed the defined project encodings, just to see whether anything changes
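To check whether a setting such as the one in step 1 actually reaches the JVM, it can help to print what the JVM itself reports. The following is a minimal sketch and not part of the attached project; the class name `DefaultCharsetCheck` is hypothetical. It only describes the JVM it runs in, so it says nothing about how asciidoctor-diagram or PlantUML read their input files.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Returns a one-line summary of the encoding settings this JVM reports.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run once with and once without -Dfile.encoding=UTF-8 and compare.
        System.out.println(describe());
    }
}
```

Running it with and without `-Dfile.encoding=UTF-8` shows whether the flag changes the default charset on a given Java version.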

BTW my environment is Windows 11, Java 8, 11, 17, Maven 3.6, 3.8.

I attached a minimal Maven project (asciidoctor1.zip). Just run site:site or look into the target directory I included.

Look into the target/site directory:

  • diag-....png is correct. It is defined using UTF-8 in overview.adoc.

  • test_class_utf8.png is wrong. It is defined using UTF-8 in test_class_utf8.puml.

  • test_class_cp1252.png is correct. It is defined using CP1252 in test_class_cp1252.puml.

So it seems that Asciidoctor (or asciidoctor-diagram) always uses Cp1252 for external PlantUML files, which is strange, since I already set the file encoding to UTF-8.

So what did I do wrong?
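If the suspicion above is right and the external UTF-8 file is decoded as Cp1252, each umlaut would turn into a two-character sequence. The sketch below reproduces that effect in plain Java. It is an illustration under that assumption, not a confirmed diagnosis of what asciidoctor-diagram does; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes with another charset.
    public static String decodeUtf8BytesAs(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    public static void main(String[] args) {
        // "Mueller" written with a real u-umlaut (U+00FC); Unicode escapes keep
        // this source file independent of the compiler's own encoding.
        String label = "M\u00fcller";
        String garbled = decodeUtf8BytesAs(label, Charset.forName("windows-1252"));
        // The UTF-8 bytes C3 BC are read as two characters, U+00C3 and U+00BC.
        System.out.println(garbled.equals("M\u00c3\u00bcller")); // prints "true"
        System.out.println(garbled.length());                    // prints "7"
    }
}
```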

@abelsromero
Member

There's something here, but I need to set up a Windows VM, so it may take some extra time to answer.

Files should already be UTF-8; Asciidoctor does not understand other encodings, and on non-Windows OSs the example just crashes when processing the cp1252 file. Why cp1252 works on Windows and UTF-8 does not is what I need to research. We only use project.build.sourceEncoding to copy resources, which you don't do in the example.

I understand that the end goal is to have all files in UTF-8, right? Mixing encodings is never going to work.

@wumpz
Author

wumpz commented Jul 15, 2022

Right. Everything should be UTF-8. I only included the cp1252 file as a test and got lucky. Using ISO-8859-1 works as well, though; it is the same encoding, at least for those characters.

If you remove the cp1252 file, does a non-Windows machine render the UTF-8 .puml files correctly?
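The observation that ISO-8859-1 and cp1252 behave the same for these characters can be checked with a short sketch. The class name is hypothetical; the charsets are the standard JDK ones. German umlauts encode to identical bytes in both, while a character such as the euro sign does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1VsCp1252 {
    // True if the text encodes to the same bytes in ISO-8859-1 and windows-1252.
    public static boolean sameBytes(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] cp1252 = text.getBytes(Charset.forName("windows-1252"));
        return Arrays.equals(latin1, cp1252);
    }

    public static void main(String[] args) {
        // Lower- and upper-case a/o/u umlauts plus sharp s, written as escapes.
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";
        System.out.println(sameBytes(umlauts));  // prints "true"
        // The euro sign (U+20AC) exists in windows-1252 but not in ISO-8859-1.
        System.out.println(sameBytes("\u20ac")); // prints "false"
    }
}
```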

@abelsromero
Member

> If you remove this cp1252 stuff does a non Windows machine render the utf pumls right?

Yes.
In fact, non-Windows systems (testing on macOS now) crash outright on the cp1252 file with org.jruby.exceptions.ArgumentError: (ArgumentError) asciidoctor: FAILED: <stdin>: Failed to load AsciiDoc document - invalid byte sequence in UTF-8. That is a common question about Asciidoctor; you can find several reports by searching for it.

That's why I am puzzled that you get the opposite effect, and why I need to do some research. I know Windows does not crash, but using cp1252 as the default 🤔
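The "invalid byte sequence in UTF-8" error is raised by the Ruby code running under JRuby, but the underlying reason can be shown in plain Java: a cp1252-encoded umlaut is not a valid UTF-8 byte sequence. The following is a sketch of a Java analogue with a hypothetical class name, not the check Asciidoctor itself performs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // True if the bytes decode as UTF-8 without any malformed sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String label = "M\u00fcller"; // contains a u-umlaut (U+00FC)
        byte[] utf8 = label.getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = label.getBytes(Charset.forName("windows-1252"));
        System.out.println(isValidUtf8(utf8));   // prints "true"
        // In cp1252 the umlaut is the single byte 0xFC, which UTF-8 never allows.
        System.out.println(isValidUtf8(cp1252)); // prints "false"
    }
}
```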

@wumpz
Author

wumpz commented Jul 15, 2022

Strange. This should be the same as starting Java with -Dfile.encoding=UTF-8. Is another JVM instance started somewhere in the rendering process? At the moment, Cp1252 is Java's default encoding on Windows, whereas on Linux and macOS it is UTF-8.
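Because the platform default differs between operating systems, code that reads a file can avoid the problem by naming the charset explicitly instead of relying on the default. The sketch below shows that general technique in Java 8 compatible form. It is not the code used by asciidoctor-maven-plugin, asciidoctor-diagram, or PlantUML; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Reads a file as UTF-8 regardless of the JVM's default charset.
    public static String readUtf8(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file as UTF-8 and reads it back.
    public static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("diagram", ".puml");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String source = "class M\u00fcller"; // u-umlaut written as an escape
        System.out.println(source.equals(roundTrip(source))); // prints "true"
    }
}
```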
