Semicolons should be legal in URL #2381

Open
hostilefork opened this issue Jun 14, 2019 · 5 comments

@hostilefork
Member

Semicolons in URLs are apparently legal:

https://stackoverflow.com/questions/1178024/can-a-url-contain-a-semi-colon

However, Rebol doesn't consider them to be part of a URL! when LOAD-ing, because the semicolon acts as a to-end-of-line comment.

r3-alpha>> u: http://example.com/foo;bar
== http://example.com/foo

The proposal: since a URL is delimited at its end by whitespace, all characters up to the next whitespace would be considered part of its content. This would match how a string doesn't treat a semicolon as a comment when it is inside the string's delimiters, e.g. {foo ; not a comment}
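To make the difference concrete, here is a minimal sketch of the two scanning rules. It is written in Python purely as a language-neutral illustration and is not the actual lexer. The function names and the `DELIMITERS` set are hypothetical, chosen only to mimic the behavior shown in the console output above.

```python
# Hypothetical delimiter set, for illustration only; the real lexer's set may differ.
DELIMITERS = set(';[](){}"')


def scan_url_current(source: str, start: int = 0) -> str:
    """Stop at whitespace OR at any delimiter (roughly the reported behavior)."""
    end = start
    while (
        end < len(source)
        and not source[end].isspace()
        and source[end] not in DELIMITERS
    ):
        end += 1
    return source[start:end]


def scan_url_proposed(source: str, start: int = 0) -> str:
    """Stop only at whitespace (the proposed rule)."""
    end = start
    while end < len(source) and not source[end].isspace():
        end += 1
    return source[start:end]


line = "http://example.com/foo;bar 10"
print(scan_url_current(line))   # http://example.com/foo
print(scan_url_proposed(line))  # http://example.com/foo;bar
```

Note that under this simple whitespace-only rule, a closing bracket written directly after a URL (as in `http://3]`) would also be absorbed into the URL, so a real implementation would need to decide how to treat closing delimiters.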

@Oldes

Oldes commented Jun 14, 2019

It's not just a semicolon... Rebol also stops at any of the delimiter chars, like [ and (

>> load {http://httpbin.org/get?q=foo()boo}
== [http://httpbin.org/get?q=foo () boo]

But I'm not quite sure I like this proposal, because the semicolon and the other chars mentioned should be URL-encoded when you want to load the URL, and if you have input from other sources, you should validate it anyway. One can always use this:

>> u: to-url {http://httpbin.org/get?q=foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> u: append http://httpbin.org/get?q= {foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> form u
== "http://httpbin.org/get?q=foo;[()]boo"

On the other hand, the change might not break much existing data/code. But it is still a change in the lexer, which I personally try to avoid.
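For comparison, the same percent-encoding round trip can be sketched outside Rebol. This is only an illustration using Python's standard `urllib.parse` module, not a description of how TO-URL or APPEND work internally. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

base = "http://httpbin.org/get?q="
raw_value = "foo;[()]boo"

# safe="" asks quote() to escape every reserved character in the value,
# including the semicolon, brackets and parentheses.
encoded = base + quote(raw_value, safe="")
print(encoded)           # http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

# Decoding restores the original characters, similar to FORM above.
print(unquote(encoded))  # http://httpbin.org/get?q=foo;[()]boo
```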

@hostilefork
Member Author

hostilefork commented Jun 14, 2019

Good point about the brackets...although with the plan that most people working on R3-Alpha code had agreed on, only ] and ) would be able to terminate a token. There would be 4 exceptions: ][, )(, ](, and )[. The idea is that it would provide more lexical expansion possibilities in the future, if you could someday define what xy"abc" meant as being different from xy "abc".

#2094

But we don't want to sacrifice [1 2 http://3] as meaning the expected thing.

I feel like the other compromises in Rebol, like saying {a {b} c} is a legal string, may justify something like [1 2 http://httpbin.org/get?q=foo;[()]boo] working as a 3-element block, with a complete URL.

But one thing we can do to punt on the question is to just make such sequences illegal for now. If the lexer sees http://foo[ or http://foo; then it raises an error. We are planning errors on things like [3()4] anyway.
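A rough sketch of that "punt" option, again in Python as a neutral illustration and not the real lexer: closing ] and ) end the URL, while an opening bracket or semicolon inside the token is reported as an error. The names `ScanError`, `TERMINATORS` and `REJECTED` are hypothetical, and including ( in the rejected set is an assumption extrapolated from the [3()4] example.

```python
class ScanError(ValueError):
    """Hypothetical error type for a character not allowed inside a URL token."""


TERMINATORS = "])"  # closing delimiters end the token
REJECTED = ";[("    # assumed set of characters that trigger an error for now


def scan_url_strict(source: str, start: int = 0) -> str:
    """Scan a URL token, erroring on ambiguous characters instead of guessing."""
    end = start
    while end < len(source):
        ch = source[end]
        if ch.isspace() or ch in TERMINATORS:
            break
        if ch in REJECTED:
            raise ScanError(f"illegal {ch!r} in URL at position {end}")
        end += 1
    return source[start:end]


print(scan_url_strict("http://3]"))  # http://3

try:
    scan_url_strict("http://foo;bar")
except ScanError as err:
    print("error:", err)
```

With this rule [1 2 http://3] keeps its expected meaning, and the question of whether ; or [ should become part of a URL stays open without silently changing how existing input loads.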

@IngoHohmann

>> to text! read to url! {https://httpbin.org/anything?a={"x":"y"}} 
== {{
  "args": {
    "a": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13d1-1b00ba3c41d343bcd6626578"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a="
}
}

It works if the string is run through ENHEX first.

>> to text! read to url! enhex {https://httpbin.org/anything?a={"x":"y"}}
== {{
  "args": {
    "a": "{\"x\":\"y\"}"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13ea-462eb19c983b8d68a906994c"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a={\"x\":\"y\"}"
}
}

According to https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding braces do not need to be encoded.

Copied here from: metaeducation/ren-c#1046
See also #2207, #1644
#2012 also mentions #1327, #1333 and #1644.

@Oldes

Oldes commented Feb 6, 2020

@IngoHohmann your example seems to be working in my branch:
[screenshot not preserved]

@Oldes

Oldes commented Feb 6, 2020

@IngoHohmann btw... I would use something like this:

read join https://httpbin.org/anything? {a={"x":"y"}}

instead of:

read to url! {https://httpbin.org/anything?a={"x":"y"}}

And when posting issues here, you could try to use Rebol code... to text! is a Ren-C feature.
