Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Use WHATWG URL parsing, if available, instead of url.parse #2564

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dresende
Copy link
Collaborator

@dresende dresende commented Nov 21, 2023

The strategy:

  1. Check if URL is a function
  2. Check if URL.prototype is an object

If the checks are true, use new URL instead of url.parse.

The only 2 changes I know of, are:

  1. URL has username and password already parsed, no need to check for auth
  2. URL has searchParams instead of query

Closes #2563

@dougwilson
Copy link
Member

Make sure to add a new test case for what you are trying to fix with this. Also curious on the change to the existing test: would this change now break anyone who has a colon in thier password?

@dresende
Copy link
Collaborator Author

Make sure to add a new test case for what you are trying to fix with this. Also curious on the change to the existing test: would this change now break anyone who has a colon in thier password?

That's an interesting question. I did this in kind of a hurry but then figured I could just pass an object and avoid the problem I was having. I'm not in a hurry now, so I'll sure add a test for that. About people using and it breaking, I think I need to change the parser to decode the password when using WHATWG URL as it seems it is always encoded when parsed.

@dresende
Copy link
Collaborator Author

> new URL("foo://user:pa:ss@host")
URL {
  href: 'foo://user:pa%3Ass@host',
  origin: 'null',
  protocol: 'foo:',
  username: 'user',
  password: 'pa%3Ass',
  host: 'host',
  hostname: 'host',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
> new URL("foo://user:pa%3Ass@host")
URL {
  href: 'foo://user:pa%3Ass@host',
  origin: 'null',
  protocol: 'foo:',
  username: 'user',
  password: 'pa%3Ass',
  host: 'host',
  hostname: 'host',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

It always encodes, whether it's already encoded or not.

@dresende
Copy link
Collaborator Author

dresende commented Nov 21, 2023

What was failing for me was that I had a % on the password and the URL was not encoded, and because after that character there wasn't necessarily 2 hex characters, url.parse fails because it tries to decode and fails.

In the following example, assume password is pass% (not encoded).

> url.parse("foo://user:pass%@host")
Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at Url.parse (node:url:368:19)
    at Object.urlParse [as parse] (node:url:156:13)
> new URL("foo://user:pass%@host")
URL {
  href: 'foo://user:pass%@host',
  origin: 'null',
  protocol: 'foo:',
  username: 'user',
  password: 'pass%',
  host: 'host',
  hostname: 'host',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

What's failing:

> decodeURIComponent("pass%")
Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)

@dougwilson
Copy link
Member

dougwilson commented Nov 21, 2023

But shouldn't you always need to url encodure your password? What if your password is pa%0foo ?

@dresende
Copy link
Collaborator Author

But shouldn't you always need to url encodure your password? Whay is your password is pa%0foo ?

In that case you should, but according to the spec you don't need to unless it causes ambiguity (like your case). Password must be encoded if it has % and 2 hex chars next to it.

@dougwilson
Copy link
Member

Gotcha. What are we expecting the outcome of mysql://foo:bar%fzo%25ab@localhost to be? I know you said that encoding the % is not required unless there are two hex chars after it, but this one seems weird when put through the code here, as it seems that %25 is ending up in the password instead of it being just a % character, though new URL does parse it now. I think the way decodeURIComponent is used may need to be changed to match how the new URL is working so they are supporting the same semantics.

@dresende
Copy link
Collaborator Author

I think the way decodeURIComponent is used may need to be changed to match how the new URL is working so they are supporting the same semantics.

I don't know the implementation. Do you have any hints? As a funny note, your example URL is parsed in a different way on Chrome. I thought the implementation was the same.

Screenshot 2023-11-21 at 14 59 37

@dresende
Copy link
Collaborator Author

dresende commented Nov 21, 2023

@dougwilson you meant something like this?

function decodeUriComponent(str) {
	return str.replace(/\%([a-f0-9]{2})/ig, (_, hex) => (String.fromCharCode(parseInt(hex, 16))));
}

(but converted to run on older node versions)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Development

Successfully merging this pull request may close these issues.

Updating URL parsing to WHATWG URL
2 participants