Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Clarify threat model, "XSS first and foremost" #188

Closed
otherdaniel opened this issue Feb 8, 2023 · 20 comments · Fixed by #208
Closed

Clarify threat model, "XSS first and foremost" #188

otherdaniel opened this issue Feb 8, 2023 · 20 comments · Fixed by #208
Labels
write-into-spec resolved discussions that need to be written into the spec

Comments

@otherdaniel
Copy link
Collaborator

This is a follow-up to #183. As I understand that issue, we would like to tighten the Sanitizer threat model to be specifically about XSS / script injection, with anything else being optional. This would have several consequences for the API in general:

Baseline and default would be XSS-focused, and would not be different concepts anymore. The distinction would just drop out. #183 covers this.

Additional cleanup items:

  • There are several config options - allowComments, allowCustomElements, allowUnknownMarkup - which do not seem to defend against XSS-y threats. In this sharpened threat model, they'd default to true?
  • allowing unsafe markup #185 discusses allowShadowRoots. It seems that discussion gravitates towards default-true anyhow, but.. that'd also be default-true?
  • The current spec unconditionally drops some node types, like CDATA or PI, which I think are still relevant to XML documents. In or out?
@annevk
Copy link
Collaborator

annevk commented Feb 14, 2023

The sanitizer should take care of shadow roots. I think it makes sense for it to recurse into them. As there's currently no way to remove a shadow root from a shadow host.

Not dropping PI might allow XSLT (although that wouldn't work in subtrees).

@otherdaniel
Copy link
Collaborator Author

Agree on shadow roots.

Also agree that XSL transformations are XSS-equivalent. As I read it, XSLT wants a PI with a target xml-stylesheet that is a child of the root node before the first element. I'd assume setHTML defined on the Element - or any other entry point that doesn't take a Document - wouldn't be affected.

@annevk
Copy link
Collaborator

annevk commented Feb 17, 2023

Yeah. I was thinking later on that we should "Assert" https://infra.spec.whatwg.org/#assertions that the node is not CDATA or PI as the HTML parser cannot return those and for now we'd only run on HTML parser output. When there is an endpoint that can contain those we can carefully evaluate again as we'd also need to consider some other angles at that point anyway.

That leaves your first bullet point and I'm less clear on that.

allowUnknownMarkup seems like a potential forward compatibility problem and as such probably safer to remove by default.

allowCustomElements sits between safe and unsafe. We don't know what the JS implementation will do and how vulnerable it is. I'd lean towards removing by default still.

allowComments was also safer to remove based on discussion I thought. In particular because they have been a common mutated XSS vector in the past.

@otherdaniel
Copy link
Collaborator Author

allowComments / XSS: I believe this problem only occurrs when re-parsing, which the API discourages. If there's any XSS that involves comments but doesn't involve re-parsing, I'd be eager to learn more.

custom elements: I see the point, and it makes sense to me, but this surely isn't "XSS first and foremost".

unknown markup/compatibility: This one I just don't understand. If browsers add new markup that is inherently script-y, e.g. a new event handler, then -- regardless of compatibility -- the Sanitizer will drop it. There is no way to remain compatible and still maintain the security properties. In all other cases -- new markup that isn't inherently script-y -- it would seem that default-true would be the option that preserves existing behaviour.


What I'm trying to figure out with this issue is the threat model we're trying to uphold. What I understood from issue #183 is that we re-focus on XSS, and drop any of the not-XSS-but-maybe-drop-anyway bits, at least from the default config. (I'll quote from that discussion: "We all know there is a shared understanding of the threat model 'no XSS'. Everything else seems to be quite app-specific with a lot of 'grey-area'." )

What I understand here is that... all the "grey-area" is back. And what I understood from issue #185 is that apparently it's all optional anyhow.

I really think we need to have a shared consensus on the threat model and security requirements for the sanitizer. I increasingly view this as a blocker for this whole project.

@annevk
Copy link
Collaborator

annevk commented Feb 23, 2023

Let's be clear, unless Chrome unships this I don't think you are in a position to talk about blockers.

Personally I think there is some room for nuance here, and erring on the safe side seems better unless there is a clear benefit to not doing so, which I don't think there is here.

@annevk
Copy link
Collaborator

annevk commented Mar 22, 2023

It was clarified to me that allowCustomElements and allowUnknownMarkup on their own have no effect as the relevant elements would still need to be safelisted. I don't think we need separate options for them in that case and can just rely on the safelist directly. So I would be okay with removing those two features provided you still need to explicitly safelist any applicable elements.

allowComments is different as that directly impacts what ends up being sanitized.

@Sora2455
Copy link

Don't allowCustomElements and allowUnknownMarkup change how the sanitizer behaves when an element blacklist is being used?

I agree that they have no effect when a whitelist is being used, but...

@annevk
Copy link
Collaborator

annevk commented Mar 22, 2023

You mean blocklist and allow/safelist? I don't see that in https://wicg.github.io/sanitizer-api/#sanitize-action-for-an-element.

@Sora2455
Copy link

It's not explicitly mentioned in the steps, but currently (assuming a blocklist is being used):

  1. If the element is unknown and/or custom and the settings are false, it is dropped
  2. Then elements on the blocklist are dropped

If the settings are removed or defaulted to true, then custom/unknown elements not on the blocklist will then be allowed through, which is a behaviour change. If that's acceptable, then fine, I just thought your comment of "allowCustomElements and allowUnknownMarkup on their own have no effect" needed clarification.

(Though admittedly on review Chrome/Edge use an allowlist by default, so if those are the only config options you are changing then indeed they don't matter.)

@annevk
Copy link
Collaborator

annevk commented May 17, 2023

In the meeting it was clarified that the issue around XSS is essentially whether we overall have a safelist or a blocklist by default (for both elements and attributes alike).

I very much think we should have a safelist.

  1. This is safer.
    1. If websites store the result of filtering, their past filtering might have become insecure in newer user agents that support a new feature.
    2. If websites have semantics associated with data-* or custom elements and forget to filter them, they will run into issues.
  2. This generally nudges people to avoid patterns that end up being incompatible with the evolution of the web.

I think it's okay to have an opt-in to allow custom elements, unknown elements, and unknown attributes, although I'm less keen on unknown elements and attributes given the forward extensibility risk they present.

I suggest we offer tri-states to configure each of them: not allowed, only allowed when explicitly safelisted, and all allowed. If you set all of these to "all allowed" you'd essentially end up with the blocklist. Given the safety concerns with that perhaps the naming should reflect that as well.


I think the same logic applies to comments, though a tri-state doesn't make sense for them. (Although maybe if we end up adding node parts...)

@koto
Copy link

koto commented May 17, 2023

I think it's okay to have an opt-in to allow custom elements, unknown elements, and unknown attributes

In your opinion, is it ok to have an opt-it to allow all custom elements, without needing to specify them? Same question for the other groups.

@annevk
Copy link
Collaborator

annevk commented May 17, 2023

That's what the tri-state I mentioned offers, but I'm less convinced it's a good idea for unknown elements/attributes. Seems very reasonable for custom elements though.

@koto
Copy link

koto commented May 18, 2023

I think that makes sense, thanks for clarifying. The important part for me is that it is possible to specify a config that e.g. enables all unknown attributes, or unknown elements while still being able to use the safe variant of the entry point (because that's not directly enabling XSS via platform features), which would now imply that default != baseline.

To bikeshed a bit more - I'm not sure if we shouldn't have a separate tri-state group for data- attributes (vs controlling them together with all unknown attributes) - or just allow data-* always, as there is are no extensibility constraints for them, and for XSS they are just script gadgety in a way any known element or attribute can be.

@annevk
Copy link
Collaborator

annevk commented May 18, 2023

I think being able to control data-* makes sense given their special place in the language.

@otherdaniel
Copy link
Collaborator Author

IIRC, we mentioned this in one of the recent meetings as still having several open sub-issues. My current take is:

  • The core issue is resolved, by having two versions of each operation (safe + unsafe).
  • XML-ish things (PIs, CDATA, etc) are, I believe, resolved by parsing as HTML, even in XML documents.

I think open issues are:

  • config options to allow all of data-*, or all unknown elements, or all custom elements.
  • allowShadowRoots: The unsafe versions allow shadow roots by default; I think that is resolved (in the HTML spec). I think we dropped allowShadowRoots option from the examples at some point, but I don't quite remember why.
  • allowComments, default for the safe versions. (I don't have a strong argument either way.)

I changed my opinion on allowCustomElements / allowUnknownMarkup a little:

A while ago, colleagues analyzed sanitizer usage "in the wild", and I circled back with them: There were several usages for explicitly allowing any data-* attributes. We did have one case that allowed any custom element. (That app had a separate mechanism to make sure that only "good" custom elements were allowed. That is, they didn't fold custom-element checking into the sanitizer; they did that elsewhere. But that required the elements to be passed through in the sanitizer step.) We didn't see a use-case that relied on allowing any unknown element.

I do think the dual safe/unsafe methods make this issue much easier, in that users or libraries can still build the functionality they want (using the unsafe variant; but then need to take the responsibilities for the security properties themselves. This didn't exist when we originally discussed this.) This IMHO makes it much easier to do the simple thing first, and potentially expand the API later when we have more insight into real-world usage.

@mozfreddyb
Copy link
Collaborator

Looks like we discussed this in a synchronous meeting but never wrote it down here. Sigh.

data-
These attributes can be harmful. But it's application specific ("script gadgets"). We can't know and decide a priori. We don't want to disallow/allow all by default without a useful switch. The compromise is to support a boolean toggle for all data- attributes rather than some wildcard matching.

comments
Disallowed by default. Prevents some cases of mutation XSS (aka MXSS), but OK to allow them in the safe setHTML() function.

Unknown markup (elements and attributes)
Authors need to explicitly list your custom and unknown elements/attrs in your config.

Custom Elements
Same as above.

Another note: Ergonomics of (dis)allowing non-finite sets (e.g., data- and custom elements)
data- attributes are the only thing requiring a wildcard.
-> We want to have a global toggle

@mozfreddyb mozfreddyb added the write-into-spec resolved discussions that need to be written into the spec label Jan 23, 2024
@mozfreddyb
Copy link
Collaborator

With the summary above, I think we can we label this issue as "ready to be written in the spec".

@annevk
Copy link
Collaborator

annevk commented Jan 23, 2024

I'm pretty sure the reason we decided to add a toggle for data-* attributes in particular was because they were a common use case in existing sanitizers. So we felt like we couldn't require web developers to enumerate all data-* attributes their application would use, as they are required to do for custom and unknown markup.

(I'm writing this down here as I was again questioning why we ended up where we ended up.)

@annevk
Copy link
Collaborator

annevk commented Jan 23, 2024

@mozfreddyb the one thing we haven't settled on is the exact API (IDL) to use, but I suppose that can be discussed as part of a PR.

@mozfreddyb
Copy link
Collaborator

mozfreddyb commented Feb 29, 2024

You're right. Some of thus discussed this again in our Feb 21st meeting, and aligned on dataAttributes and the upcoming PR is going to use that.

The reasoning being:

  • The spec calls them "custom data attribute" in https://html.spec.whatwg.org/multipage/dom.html#attr-data-*.
  • MDN and other dev docs we found just call them "data attributes"
  • There are no "non-custom" data attributes. The custom is basically applied.
  • We do not use block/allow verbs in our config language.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
write-into-spec resolved discussions that need to be written into the spec
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants