Semicolons should be legal in URL #2382

Closed
Siskin-Bot opened this issue Feb 15, 2020 · 1 comment

Submitted by: Hostilefork

Semicolons in URLs are apparently legal:

https://stackoverflow.com/questions/1178024/can-a-url-contain-a-semi-colon

However, Rebol doesn't consider them to be part of a URL! when LOAD-ing, because the semicolon acts as a to-end-of-line comment.

r3-alpha>> u: http://example.com/foo;bar
== http://example.com/foo

The proposal is that, since a URL is delimited at its end by whitespace, all characters up to the next whitespace are considered part of its content. This would match how a string doesn't treat a semicolon inside its delimiters as a comment, e.g. {foo ; not a comment}
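To make the difference between the two rules concrete, here is a rough sketch in Python (it is not Rebol's actual lexer). The function name scan_url and the delimiter set are assumptions chosen only for illustration.

```python
# Hypothetical sketch: compares two ways a lexer could find the end of a URL token.
# DELIMITERS is an assumed set, not the real list used by any Rebol lexer.
DELIMITERS = ';[](){}"'


def scan_url(source, start=0, stop_at_delimiters=True):
    """Return the URL token that begins at `start`.

    stop_at_delimiters=True  -> behavior reported in this issue: the token
                                ends at whitespace OR at a delimiter char.
    stop_at_delimiters=False -> proposed behavior: only whitespace ends it.
    """
    end = start
    while end < len(source):
        ch = source[end]
        if ch.isspace():
            break
        if stop_at_delimiters and ch in DELIMITERS:
            break
        end += 1
    return source[start:end]


line = "http://example.com/foo;bar"
print(scan_url(line))                            # http://example.com/foo
print(scan_url(line, stop_at_delimiters=False))  # http://example.com/foo;bar
```

Under the whitespace-only rule the semicolon stays inside the token, which is the outcome the proposal asks for.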


Imported from: metaeducation#2381

Comments:

Oldes commented on Jun 14, 2019:

It's not just the semicolon... Rebol also stops at any of the delimiter chars, like [ and (

>> load {http://httpbin.org/get?q=foo()boo}
== [http://httpbin.org/get?q=foo () boo]

But I'm not quite sure I like this proposal, because the semicolon and the other chars mentioned should be URL-encoded when you want to load the URL, and if you have input from other sources, you should validate it anyway. One can always use this:

>> u: to-url {http://httpbin.org/get?q=foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> u: append http://httpbin.org/get?q= {foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> form u
== "http://httpbin.org/get?q=foo;[()]boo"

On the other hand, the change may not break much existing data/code. But it is still a change in the lexer, which I personally try to avoid.
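For readers outside Rebol, the same percent-encoding round trip can be sketched in Python with the standard library. This is only an analogy for the to-url / form pair shown above, not a claim about how Rebol implements it; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

base = "http://httpbin.org/get?q="
raw_value = "foo;[()]boo"

# safe="" asks quote() to percent-encode every character outside the
# always-safe set (ASCII letters, digits and "_.-~").
encoded = base + quote(raw_value, safe="")
print(encoded)           # http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

# Decoding restores the original characters, similar to FORM above.
print(unquote(encoded))  # http://httpbin.org/get?q=foo;[()]boo
```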


Hostilefork commented on Jun 14, 2019:

Good point about the brackets...although with the plan that most of those working on R3-Alpha code had agreed on, only ] and ) would be able to terminate a token. There would be 4 exceptions: ][, )(, ](, and )[. The idea is that it would provide more lexical expansion possibilities in the future, if you could someday define xy"abc" to mean something different from xy "abc".

metaeducation#2094

But we don't want to sacrifice [1 2 http://3] as meaning the expected thing.

I feel like the other compromises in Rebol, like saying {a {b} c} is a legal string, may justify something like [1 2 http://httpbin.org/get?q=foo;[()]boo] working as a 3-element block, with a complete URL.

But one thing we can do to punt on the question is to just make the sequence illegal for now. If you see http://foo[ or http://foo; then make that an error. We are planning errors on things like [3()4] anyway.
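A rough Python sketch of that "make it an error for now" rule might look like the following. It is not taken from any Rebol or Ren-C source; scan_url_strict and the exact character sets are assumptions based only on the description above (closing ] and ) end the token, while [, ( and ; inside it are rejected).

```python
# Hypothetical sketch of the interim rule: closing brackets terminate a URL,
# but an opening bracket or semicolon glued to it is reported as an error.
TERMINATORS = "])"   # assumed: these end the URL token
FORBIDDEN = "[(;"    # assumed: these are illegal inside a URL for now


def scan_url_strict(source, start=0):
    """Return the URL token at `start`, or raise ValueError if ambiguous."""
    end = start
    while end < len(source):
        ch = source[end]
        if ch.isspace() or ch in TERMINATORS:
            break
        if ch in FORBIDDEN:
            raise ValueError(
                f"ambiguous {ch!r} at offset {end} in URL; percent-encode it"
            )
        end += 1
    return source[start:end]


print(scan_url_strict("http://3]"))  # http://3
try:
    scan_url_strict("http://foo;bar")
except ValueError as err:
    print("error:", err)
```

This keeps [1 2 http://3] meaning the expected thing while leaving room to decide later what http://foo; should mean.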


IngoHohmann commented on Feb 5, 2020:

>> to text! read to url! {https://httpbin.org/anything?a={"x":"y"}} 
== {{
  "args": {
    "a": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13d1-1b00ba3c41d343bcd6626578"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a="
}
}

It works if it is run through ENHEX.

>> to text! read to url! enhex {https://httpbin.org/anything?a={"x":"y"}}
== {{
  "args": {
    "a": "{\"x\":\"y\"}"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13ea-462eb19c983b8d68a906994c"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a={\"x\":\"y\"}"
}
}

According to https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding braces do not need to be encoded.
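As a point of comparison, here is what encoding only the query value looks like in Python. This is a sketch, not what ENHEX does internally: with safe="" Python's quote() is deliberately strict and encodes the braces, quotes and colon regardless of whether a given server needs that, so ENHEX output may differ.

```python
from urllib.parse import quote, unquote

base = "https://httpbin.org/anything?a="
value = '{"x":"y"}'

# Encode only the value, so the "?" and "=" of the query stay intact.
encoded_value = quote(value, safe="")
url = base + encoded_value
print(url)  # https://httpbin.org/anything?a=%7B%22x%22%3A%22y%22%7D

# The server side would decode it back to the original JSON text.
print(unquote(encoded_value))  # {"x":"y"}
```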

Copied here from: metaeducation/ren-c#1046
See also #2207, #1644
#2012 also mentions #1327, #1333 and #1644.


IngoHohmann mentioned this issue on Feb 5, 2020:
url!s are cut at curly braces when reading


Oldes commented on Feb 6, 2020:

@IngoHohmann your example seems to be working in my branch (image not preserved).


Oldes commented on Feb 6, 2020:

@IngoHohmann btw... I would use something like this:

read join https://httpbin.org/anything? {a={"x":"y"}}

instead of:

read to url! {https://httpbin.org/anything?a={"x":"y"}}

And when posting issues here, you could try to use Rebol code... to text! is a Ren-C feature.



Oldes commented Feb 17, 2020

With the recent addition of the new AS native (Oldes/Rebol3@d27e4b1), this is now also possible:

>> as url! "http://example.com/foo;bar"
== http://example.com/foo%3Bbar
