Add encoding detection callback #2788
        .findFirst()
        .get();

    assertEquals("\"Привет мир\"", utf8Type.getField("s1").getAssignment().toString());
I love this expected value :-)
If you don't read Cyrillic, it means "Hello World" :)
I would like to have the file name and path available in the encoding detection callback.
encoding//src/main/java/spoon/support/visitor/replace/ReplacementVisitor.java=ISO-8859-1
So I can imagine that many clients might be able to select the encoding depending on the file name or file path. Of course, the bytes of the file might be good input too! WDYT?

But before we design the final API for encoding detection, please note this bug in Spoon:

    class VirtualFile implements SpoonFile {
        ...
        @Override
        public InputStream getContent() {
            return new ByteArrayInputStream(content.getBytes());
        }
        ...
    }

Here we convert the String value to bytes using a system-dependent encoding ... and later we convert the bytes back to chars using another encoding. ... But how do we mix that with the encoding detector?? ;-)

@surli @monperrus Do you agree to refactor the related file API here?
    interface SpoonFile {
        default char[] getContentChars(Environment env) {
            // ... move loading code from `FileCompilerConfig#initializeCompiler` here...
        }
    }
    class VirtualFile implements SpoonFile {
        ...
        @Override
        public char[] getContentChars(Environment env) {
            return content.toCharArray();
        }
        ...
    }

WDYT?
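The round-trip problem described above can be reproduced outside Spoon. The sketch below is only an illustration: `EncodingRoundTrip` and `roundTrip` are hypothetical names, and the two charsets stand in for "platform default" and "configured source encoding". It encodes a string with one charset and decodes it with another, which is what happens when `content.getBytes()` and the later decoding step disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Spoon.
final class EncodingRoundTrip {

    /** Encodes text with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "Привет мир", written with escapes so the demo does not depend on the source file encoding.
        String source = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440";

        // Same charset on both sides: the text survives.
        String same = roundTrip(source, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(source));

        // Different charsets: the text is corrupted.
        String mixed = roundTrip(source, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.equals(source));

        // Going String -> char[] -> String involves no charset at all.
        System.out.println(new String(source.toCharArray()).equals(source));
    }
}
```

The last case is why returning `content.toCharArray()` from a virtual file sidesteps the issue: no byte encoding is involved, so there is nothing for a detector to guess.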
We can use
The And I personally would prefer new interface
So, as far as I understand you want me to add
Am I right? As you can see WDYT?
if you add "default" implementation in
yes
I believe that this encoding is ignored (the compilation unit has no access to the original byte[], so the encoding is useless), so we can simply provide the default encoding from the environment.
@pvojtechovsky, I refactored the API as you suggested.
    byte[] bytes;
    try {
        InputStream contentStream = getContent();
        bytes = new byte[contentStream.available()];
There is no reliable way to get the size of an input stream; available() only returns an estimate.
Use a ByteArrayOutputStream, copy the input to it with IOUtils.copy, and then call ByteArrayOutputStream#toByteArray.
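A minimal sketch of that suggestion, using only the JDK: the read loop below does by hand what `IOUtils.copy` from Commons IO does. `StreamBytes` and `readFully` are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not part of Spoon.
final class StreamBytes {

    /** Reads the stream until end of input, without relying on available(). */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        // read() returns -1 only at end of stream, regardless of what available() reports.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, `InputStream#readAllBytes()` gives the same result without a helper.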
@@ -54,8 +56,9 @@
    byte[] bytes;
    try {
        InputStream contentStream = getContent();
The contentStream must be closed. The best is:
try (InputStream contentStream = getContent()) {
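To illustrate why try-with-resources is the safer form, here is a small sketch with a stream that records whether `close()` was called. `ClosingDemo`, `TrackingStream` and `firstByte` are hypothetical names used only for this demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of Spoon.
final class ClosingDemo {

    /** In-memory stream that remembers whether close() was called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads one byte; the stream is closed on exit, whether read() returns or throws. */
    static int firstByte(InputStream source) throws IOException {
        try (InputStream in = source) {
            return in.read();
        }
    }
}
```

After `firstByte(stream)` returns, `stream.closed` is true without any explicit `finally` block.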
Ooops, my bad, thanks.
Looks good to me. Thank you, Egor!
This PR adds the ability to specify a user-defined callback that detects the encoding of each file separately. See #2781.
For example, I use the juniversalchardet library:
Spoon API:
It helps to produce a correct AST in projects with mixed encodings.
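As a rough, JDK-only illustration of a per-file detection callback, consider the sketch below. Everything in it is an assumption made for the example: the `Detector` interface, its `detect(path, bytes)` signature and the `utf8OrElse` helper are hypothetical and are not the API added by this PR, and the strict UTF-8 validity check merely stands in for a real detector such as juniversalchardet.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Spoon.
final class EncodingDetection {

    /** Hypothetical callback shape: receives the file path and its raw bytes. */
    interface Detector {
        Charset detect(String path, byte[] bytes);
    }

    /** Returns true if the bytes decode as UTF-8 without malformed or unmappable input. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Detector that picks UTF-8 when the bytes are valid UTF-8, otherwise the fallback. */
    static Detector utf8OrElse(Charset fallback) {
        return (path, bytes) -> isValidUtf8(bytes) ? StandardCharsets.UTF_8 : fallback;
    }

    /** Decodes one file's bytes with whatever charset the detector chooses for it. */
    static String decode(String path, byte[] bytes, Detector detector) {
        return new String(bytes, detector.detect(path, bytes));
    }
}
```

With `utf8OrElse(StandardCharsets.ISO_8859_1)`, a UTF-8 file and an ISO-8859-1 file in the same project each decode with their own charset. A path-based rule, as discussed in the thread, could be written against the same hypothetical interface by inspecting `path` instead of `bytes`.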