
URL scheme characters admitted by DECODE-URL more restrictive than those admitted by TRANSCODE #1327

Open
Siskin-Bot opened this issue Feb 15, 2020 · 2 comments

Comments

@Siskin-Bot
Collaborator

Siskin-Bot commented Feb 15, 2020

Submitted by: meijeru

DECODE-URL parses the scheme part of a URL (the part before the :) with the following charset:
A - Z a - z 0 - 9 + - . (this is in accordance with RFC 1738).
It then applies TO-LIT-WORD to the scheme, which rules out a scheme starting with a digit, + or -, even though RFC 1738 seems to allow that.
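
The two steps described above can be modelled roughly as follows. This is a minimal Python sketch, not Rebol's implementation: `rfc1738_scheme_ok` encodes the charset quoted above, and `lit_word_start_ok` is a hypothetical stand-in for the TO-LIT-WORD step. It assumes, as this report states, that the step's only extra effect on a scheme is to reject a leading digit, `+` or `-`.

```python
import string

# Scheme charset quoted above (RFC 1738, section 2.1): letters, digits, "+", "-", ".".
RFC1738_SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def rfc1738_scheme_ok(scheme: str) -> bool:
    """True if the scheme is non-empty and uses only the RFC 1738 charset."""
    return bool(scheme) and all(c in RFC1738_SCHEME_CHARS for c in scheme)


def lit_word_start_ok(scheme: str) -> bool:
    """Hypothetical model of the TO-LIT-WORD step: reject a leading digit, '+' or '-'."""
    return bool(scheme) and scheme[0] not in string.digits + "+-"


def decode_url_scheme_ok(scheme: str) -> bool:
    """Model of DECODE-URL: charset check first, then the TO-LIT-WORD restriction."""
    return rfc1738_scheme_ok(scheme) and lit_word_start_ok(scheme)


if __name__ == "__main__":
    for s in ["http", "svn+ssh", "1abc", "+x", "a_b"]:
        print(f"{s!r}: rfc1738={rfc1738_scheme_ok(s)}, decode-url={decode_url_scheme_ok(s)}")
```

In this model `1abc` and `+x` pass the RFC 1738 charset but are rejected by the TO-LIT-WORD step, which is the gap the report points out.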
TRANSCODE (i.e. the lexical scanner) admits the following characters before the characteristic : of a url! literal:

In initial position: A - Z a - z ! & = ? * . ^ _ ` | ~ (note the absence of digits and + -).

In subsequent positions: anything from ! to ~ except [ ] { } ( ) " / :
Thus TRANSCODE is much more permissive than either RFC 1738 or DECODE-URL.
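
To make the comparison concrete, the sketch below encodes the TRANSCODE rules exactly as listed above and checks them against the RFC 1738 scheme charset. It is a hypothetical Python model built from this report, not the actual lexer source; the names `TRANSCODE_FIRST`, `TRANSCODE_REST` and `transcode_scheme_ok` are illustrative only.

```python
import string

# Model of the TRANSCODE rules quoted above (an assumption based on this report).
TRANSCODE_FIRST = frozenset(string.ascii_letters + "!&=?*.^_`|~")
# Any printable ASCII character from "!" (0x21) to "~" (0x7E), minus the listed delimiters.
TRANSCODE_REST = frozenset(chr(c) for c in range(ord("!"), ord("~") + 1)) - set('[]{}()"/:')

# RFC 1738 scheme charset: letters, digits, "+", "-", ".".
RFC1738_SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def transcode_scheme_ok(prefix: str) -> bool:
    """True if `prefix` could precede the ':' of a url! literal under the rules above."""
    return (
        bool(prefix)
        and prefix[0] in TRANSCODE_FIRST
        and all(c in TRANSCODE_REST for c in prefix[1:])
    )


def rfc1738_scheme_ok(scheme: str) -> bool:
    """True if the scheme is non-empty and uses only the RFC 1738 charset."""
    return bool(scheme) and all(c in RFC1738_SCHEME_CHARS for c in scheme)


if __name__ == "__main__":
    # Characters TRANSCODE accepts after the first position that RFC 1738 does not.
    print("".join(sorted(TRANSCODE_REST - RFC1738_SCHEME_CHARS)))
    for s in ["http", "a!b", "x;y", "1abc", "+x"]:
        print(f"{s!r}: transcode={transcode_scheme_ok(s)}, rfc1738={rfc1738_scheme_ok(s)}")
```

In this model the two predicates disagree in both directions: `a!b` and `x;y` are accepted only by the TRANSCODE rules, while `1abc` and `+x` are accepted only by the RFC 1738 charset.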

These restrictions deserve to be documented, I think.


Imported from: CureCode [ Version: alpha 94 Type: Issue Platform: All Category: Datatype Reproduce: Always Fixed-in:none ]
Imported from: metaeducation#1327

Comments:


Rebolbot mentioned this issue on Jan 12, 2016:
DECODE-URL and url! syntax don't obey the url encoding rules
TO-URL returns incorrect value for % symbol
LOAD does not handle correctly URLs containing encoded delimiters
MOLD does not handle correctly URLs containing encoded characters
wrong error on load "1abcde"


IngoHohmann mentioned this issue on Feb 5, 2020:
Semicolons should be legal in URL


@Oldes
Owner

Oldes commented Apr 13, 2022

@meijeru, what do you recommend?

  1. Limit transcode? I think that would be a shame.
  2. Make decode-url more permissive?

@meijeru

meijeru commented Apr 13, 2022

What about making both transcode and decode-url conform to RFC 1738?
