Hidden unicode characters (Trojan Source) #4576

Closed
chschommer opened this issue Nov 4, 2021 · 3 comments

Comments


chschommer commented Nov 4, 2021

Hi,

In light of the recent "Trojan Source" publication about hidden Unicode control characters being used maliciously, I checked how the Ace Editor handles them.
[Screenshot (2021-11-04): Trojan Source sample code displayed in Ace]

As you can see, you really cannot tell that anything is wrong. Yes, the comment looks a little strange, but someone who just copies and pastes scripts might not be familiar with this.

We tried to show the hidden characters, but this does not really improve the situation:
[Screenshot (2021-11-03): the same code with hidden characters shown]

I wonder if it is possible to show the hidden Unicode characters in Ace, e.g., the way Atlassian now handles this:
[Screenshot (2021-11-04): how Atlassian displays hidden Unicode characters]

Is there any way to achieve this with the current version of Ace?

Thanks in advance!

chschommer changed the title from "Hidden unicode characters" to "Hidden unicode characters (Trojan Source)" on Nov 4, 2021
@nightwing (Member)

Do you have a list of such characters? Displaying them as red dots would be very easy. Displaying code the way Atlassian does will be harder but should be possible as well.

chschommer (Author) commented Nov 5, 2021

Yes, there is a list. Red dots or something similar is an interesting idea: basically an obvious hint that something is wrong or strange with the script. Obviously, there are legitimate uses for these characters in code. What I am more concerned about is copying and pasting code from the Internet, or someone sending you code that has been altered for malicious purposes.

| Abbreviation | Code point | Name | Description |
|---|---|---|---|
| LRE | U+202A | Left-to-Right Embedding | Try treating following text as left-to-right. |
| RLE | U+202B | Right-to-Left Embedding | Try treating following text as right-to-left. |
| LRO | U+202D | Left-to-Right Override | Force treating following text as left-to-right. |
| RLO | U+202E | Right-to-Left Override | Force treating following text as right-to-left. |
| LRI | U+2066 | Left-to-Right Isolate | Force treating following text as left-to-right without affecting adjacent text. |
| RLI | U+2067 | Right-to-Left Isolate | Force treating following text as right-to-left without affecting adjacent text. |
| FSI | U+2068 | First Strong Isolate | Force treating following text in the direction indicated by its first strong directional character. |
| PDF | U+202C | Pop Directional Formatting | Terminate nearest LRE, RLE, LRO, or RLO. |
| PDI | U+2069 | Pop Directional Isolate | Terminate nearest LRI, RLI, or FSI. |
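A minimal TypeScript sketch of the kind of check discussed here, assuming plain-string input and no editor APIs: it lists every occurrence of the nine characters in the table and can also render them as visible escapes, loosely in the spirit of the Atlassian-style display mentioned above. The names `findBidiControls` and `revealBidiControls` and the `<U+202E RLO>` escape format are hypothetical illustrations, not part of Ace, and this is independent of the fix that later shipped in Ace.

```typescript
// The nine bidirectional control characters from the table (Trojan Source).
const BIDI_CONTROLS: ReadonlyMap<number, string> = new Map([
  [0x202a, "LRE"],
  [0x202b, "RLE"],
  [0x202c, "PDF"],
  [0x202d, "LRO"],
  [0x202e, "RLO"],
  [0x2066, "LRI"],
  [0x2067, "RLI"],
  [0x2068, "FSI"],
  [0x2069, "PDI"],
]);

// Hypothetical result shape: one entry per control character found.
interface BidiHit {
  row: number; // 0-based line index
  column: number; // 0-based UTF-16 offset within the line
  abbr: string; // abbreviation from the table, e.g. "RLO"
  codePoint: number;
}

// Scan text line by line and report where each control character sits.
// charCodeAt is sufficient because all nine characters are in the BMP.
function findBidiControls(text: string): BidiHit[] {
  const hits: BidiHit[] = [];
  const lines = text.split(/\r\n|\r|\n/);
  lines.forEach((line, row) => {
    for (let column = 0; column < line.length; column++) {
      const codePoint = line.charCodeAt(column);
      const abbr = BIDI_CONTROLS.get(codePoint);
      if (abbr !== undefined) {
        hits.push({ row, column, abbr, codePoint });
      }
    }
  });
  return hits;
}

// Replace each control character with a visible escape such as "<U+202E RLO>",
// leaving every other character untouched.
function revealBidiControls(text: string): string {
  let out = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0) ?? 0;
    const abbr = BIDI_CONTROLS.get(codePoint);
    out += abbr !== undefined
      ? `<U+${codePoint.toString(16).toUpperCase()} ${abbr}>`
      : ch;
  }
  return out;
}
```

For example, `revealBidiControls("x\u202Ey")` yields `x<U+202E RLO>y`. An editor integration could turn each hit from `findBidiControls` into a highlight (the "red dots" idea), but that wiring depends on the editor's own API and is not shown here.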

@andrewnester (Contributor)

Fixed by PR #4693 and released in version 1.15.0.
