Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Identity Server Security Bug: Bad HTML filtering regexp #22

Open
4 tasks done
alexhiggins732 opened this issue Feb 16, 2024 · 1 comment · Fixed by #34
Open
4 tasks done

Identity Server Security Bug: Bad HTML filtering regexp #22

alexhiggins732 opened this issue Feb 16, 2024 · 1 comment · Fixed by #34
Assignees
Labels
bug Something isn't working dependencies Pull requests that update a dependency file

Comments

@alexhiggins732
Copy link
Owner

Bad HTML filtering regexp

The original Identity Server 4 code base has several security bugs detected by CodeQL scanning violating this rule.

Description:

It is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well it might be possible to circumvent it, which can lead to cross-site scripting or other security issues.

Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors.

Examples

Tool Rule ID Source
CodeQL js/redos
/dist/jquery.js#L5960-L5960
	// checked="checked" or checked
	rchecked = /checked\s*(?:[^=]|=\s*.checked.)/i,
	rcleanScript = /^\s*<!(?:\[CDATA\[|--)|(?:\]\]|--)>\s*$/g;

Issues

Recommendation

Use a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation.

Example

The following example attempts to filters out all <script> tags.

function filterScript(html) {
var scriptRegex = /<script\b[^>]>([\s\S]?)</script>/gi;
var match;
while ((match = scriptRegex.exec(html)) !== null) {
html = html.replace(match[0], match[1]);
}
return html;
}

The above sanitizer does not filter out all <script> tags. Browsers will not only accept </script> as script end tags, but also tags such as </script foo="bar"> even though it is a parser error. This means that an attack string such as <script>alert(1)</script foo="bar"> will not be filtered by the function, and alert(1) will be executed by a browser if the string is rendered as HTML.

Other corner cases include that HTML comments can end with --!>, and that HTML tag names can contain upper case characters.

References

@alexhiggins732
Copy link
Owner Author

Security bugs have been patched.

@alexhiggins732 alexhiggins732 added bug Something isn't working dependencies Pull requests that update a dependency file labels Feb 16, 2024
@alexhiggins732 alexhiggins732 self-assigned this Feb 16, 2024
@alexhiggins732 alexhiggins732 linked a pull request Feb 17, 2024 that will close this issue
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant