Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: add allowedCharactersInPath whitelist #39

Merged
merged 11 commits into from
Jan 21, 2025

Conversation

zhongliang02
Copy link
Contributor

@zhongliang02 zhongliang02 commented Jan 14, 2025

Context

In anticipation of future open redirects, we want to increase strictness of URL validation.

Approach

We implement a default pathname character whitelist where the pathname can only consist of [A-Za-z0-9.-_/].
Users are then given the option to adjust or remove this whitelist via an option allowedCharactersInPath.

We also ensure that the pathname can only start with a single slash to prevent schema-relative URLs.

Risks

This may break existing links that contain special characters. Especially mailto links since they contain @.
However, a quick scan of the repos indicate that no one will be affected by this change.
Regardless, the upgrade advisory will include a warning on breaking links with special characters.

Additional changes

The behaviour of parsing invalid URLs with the schema has been changed to throw a ZodError instead of a URLValidationError.
This is in line with expected behaviour from Zod schemas, allowing the usage of .catch seamlessly instead of having to wrap the parse in a try-catch.

On the other hand, the behaviour of the URLValidator class has not been changed, but the error message has been changed.

This PR also refactors some of the code to be more organised.

Additionally some dependencies have been updated with pnpm audit --fix

Testing

Tests have been updated to reflect the new behaviours.
Tests have also been added for the pathname whitelist.
Additionally, the care360 open redirect PoC has been added as a test case.

@zhongliang02 zhongliang02 requested a review from a team January 14, 2025 23:22
@zhongliang02 zhongliang02 marked this pull request as ready for review January 14, 2025 23:45
Copy link
Contributor

@spaceraccoon spaceraccoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits and a quick Q on the allowedchars design!

Comment on lines 25 to 37
const schema = z.string().refine(
(str: string) => {
for (const char of str) {
if (!allowed.has(char)) {
return false
}
}
return true
},
{
message: `Every character must be in ${allowedChars}`,
},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing this, would it make sense to just use .regex instead? The downside is that we will need to have allowedRegex instead and accept a Regex arg instead of string. Not the worst thing in the word, but I'd rather have a regex in than construct it dynamically from a string. Would be a bit more conformant IMO. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mulling over this as well. We did initially discuss doing a z.regex() but I thought the interface felt too weird for the user.

allowedPathCharsRegex: /[a-zA-Z0-9]/

What would be the intended behaviour if the user supplies

allowedPathCharsRegex: /[a-zA-Z0-9]/g
allowedPathCharsRegex: /abcdef/
allowedPathCharsRegex: /[a-zA-Z0-9]*/g

I think it is not intuitive for the user that each character will be matched against the regex.

Copy link
Contributor Author

@zhongliang02 zhongliang02 Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more usable interface could be like

validPathRegex: /^[a-zA-Z0-9.-_/]*$/

@zhongliang02
Copy link
Contributor Author

preemptively added validPathRegex so we can visually compare between the two

Copy link
Contributor

@spaceraccoon spaceraccoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the visual comparison, I've going to go with allowedChars approach - Regexes are hard. Even we keep this invisible by default, ejecting may happen in which case devs might have trouble writing a regex or add a reDOS. It's better for dev experience and consequently security to go with chars. Thanks for working on this!

@zhongliang02
Copy link
Contributor Author

zhongliang02 commented Jan 20, 2025

reverted!, need review approval

spaceraccoon
spaceraccoon previously approved these changes Jan 20, 2025
Copy link
Contributor

@spaceraccoon spaceraccoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor code style choice, up to you!

Co-authored-by: Eugene Lim <limzhiweieugene@gmail.com>
@zhongliang02 zhongliang02 merged commit c105c06 into develop Jan 21, 2025
6 checks passed
@zhongliang02 zhongliang02 deleted the feat/add-allowed-characters-in-path branch January 21, 2025 13:55
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants