Clarify threat model, "XSS first and foremost" #188

otherdaniel · 2023-02-08T15:47:35Z

This is a follow-up to #183. As I understand that issue, we would like to tighten the Sanitizer threat model to be specifically about XSS / script injection, with anything else being optional. This would have several consequences for the API in general:

Baseline and default would be XSS-focused, and would not be different concepts anymore. The distinction would just drop out. #183 covers this.

Additional cleanup items:

There are several config options - allowComments, allowCustomElements, allowUnknownMarkup - which do not seem to defend against XSS-y threats. In this sharpened threat model, they'd default to true?
allowing unsafe markup #185 discusses allowShadowRoots. It seems that discussion gravitates towards default-true anyhow, but.. that'd also be default-true?
The current spec unconditionally drops some node types, like CDATA or PI, which I think are still relevant to XML documents. In or out?

The text was updated successfully, but these errors were encountered:

annevk · 2023-02-14T12:50:20Z

The sanitizer should take care of shadow roots. I think it makes sense for it to recurse into them. As there's currently no way to remove a shadow root from a shadow host.

Not dropping PI might allow XSLT (although that wouldn't work in subtrees).

otherdaniel · 2023-02-17T13:17:59Z

Agree on shadow roots.

Also agree that XSL transformations are XSS-equivalent. As I read it, XSLT wants a PI with a target xml-stylesheet that is a child of the root node before the first element. I'd assume setHTML defined on the Element - or any other entry point that doesn't take a Document - wouldn't be affected.

annevk · 2023-02-17T13:30:48Z

Yeah. I was thinking later on that we should "Assert" https://infra.spec.whatwg.org/#assertions that the node is not CDATA or PI as the HTML parser cannot return those and for now we'd only run on HTML parser output. When there is an endpoint that can contain those we can carefully evaluate again as we'd also need to consider some other angles at that point anyway.

That leaves your first bullet point and I'm less clear on that.

allowUnknownMarkup seems like a potential forward compatibility problem and as such probably safer to remove by default.

allowCustomElements sits between safe and unsafe. We don't know what the JS implementation will do and how vulnerable it is. I'd lean towards removing by default still.

allowComments was also safer to remove based on discussion I thought. In particular because they have been a common mutated XSS vector in the past.

otherdaniel · 2023-02-23T17:29:36Z

allowComments / XSS: I believe this problem only occurrs when re-parsing, which the API discourages. If there's any XSS that involves comments but doesn't involve re-parsing, I'd be eager to learn more.

custom elements: I see the point, and it makes sense to me, but this surely isn't "XSS first and foremost".

unknown markup/compatibility: This one I just don't understand. If browsers add new markup that is inherently script-y, e.g. a new event handler, then -- regardless of compatibility -- the Sanitizer will drop it. There is no way to remain compatible and still maintain the security properties. In all other cases -- new markup that isn't inherently script-y -- it would seem that default-true would be the option that preserves existing behaviour.

What I'm trying to figure out with this issue is the threat model we're trying to uphold. What I understood from issue #183 is that we re-focus on XSS, and drop any of the not-XSS-but-maybe-drop-anyway bits, at least from the default config. (I'll quote from that discussion: "We all know there is a shared understanding of the threat model 'no XSS'. Everything else seems to be quite app-specific with a lot of 'grey-area'." )

What I understand here is that... all the "grey-area" is back. And what I understood from issue #185 is that apparently it's all optional anyhow.

I really think we need to have a shared consensus on the threat model and security requirements for the sanitizer. I increasingly view this as a blocker for this whole project.

annevk · 2023-02-23T17:38:55Z

Let's be clear, unless Chrome unships this I don't think you are in a position to talk about blockers.

Personally I think there is some room for nuance here, and erring on the safe side seems better unless there is a clear benefit to not doing so, which I don't think there is here.

annevk · 2023-03-22T09:45:09Z

It was clarified to me that allowCustomElements and allowUnknownMarkup on their own have no effect as the relevant elements would still need to be safelisted. I don't think we need separate options for them in that case and can just rely on the safelist directly. So I would be okay with removing those two features provided you still need to explicitly safelist any applicable elements.

allowComments is different as that directly impacts what ends up being sanitized.

Sora2455 · 2023-03-22T10:07:28Z

Don't allowCustomElements and allowUnknownMarkup change how the sanitizer behaves when an element blacklist is being used?

I agree that they have no effect when a whitelist is being used, but...

annevk · 2023-03-22T10:31:01Z

You mean blocklist and allow/safelist? I don't see that in https://wicg.github.io/sanitizer-api/#sanitize-action-for-an-element.

Sora2455 · 2023-03-22T11:03:14Z

It's not explicitly mentioned in the steps, but currently (assuming a blocklist is being used):

If the element is unknown and/or custom and the settings are false, it is dropped
Then elements on the blocklist are dropped

If the settings are removed or defaulted to true, then custom/unknown elements not on the blocklist will then be allowed through, which is a behaviour change. If that's acceptable, then fine, I just thought your comment of "allowCustomElements and allowUnknownMarkup on their own have no effect" needed clarification.

(Though admittedly on review Chrome/Edge use an allowlist by default, so if those are the only config options you are changing then indeed they don't matter.)

annevk · 2023-05-17T09:23:11Z

In the meeting it was clarified that the issue around XSS is essentially whether we overall have a safelist or a blocklist by default (for both elements and attributes alike).

I very much think we should have a safelist.

This is safer.
1. If websites store the result of filtering, their past filtering might have become insecure in newer user agents that support a new feature.
2. If websites have semantics associated with data-* or custom elements and forget to filter them, they will run into issues.
This generally nudges people to avoid patterns that end up being incompatible with the evolution of the web.

I think it's okay to have an opt-in to allow custom elements, unknown elements, and unknown attributes, although I'm less keen on unknown elements and attributes given the forward extensibility risk they present.

I suggest we offer tri-states to configure each of them: not allowed, only allowed when explicitly safelisted, and all allowed. If you set all of these to "all allowed" you'd essentially end up with the blocklist. Given the safety concerns with that perhaps the naming should reflect that as well.

I think the same logic applies to comments, though a tri-state doesn't make sense for them. (Although maybe if we end up adding node parts...)

koto · 2023-05-17T09:28:12Z

I think it's okay to have an opt-in to allow custom elements, unknown elements, and unknown attributes

In your opinion, is it ok to have an opt-it to allow all custom elements, without needing to specify them? Same question for the other groups.

annevk · 2023-05-17T10:52:34Z

That's what the tri-state I mentioned offers, but I'm less convinced it's a good idea for unknown elements/attributes. Seems very reasonable for custom elements though.

koto · 2023-05-18T05:41:56Z

I think that makes sense, thanks for clarifying. The important part for me is that it is possible to specify a config that e.g. enables all unknown attributes, or unknown elements while still being able to use the safe variant of the entry point (because that's not directly enabling XSS via platform features), which would now imply that default != baseline.

To bikeshed a bit more - I'm not sure if we shouldn't have a separate tri-state group for data- attributes (vs controlling them together with all unknown attributes) - or just allow data-* always, as there is are no extensibility constraints for them, and for XSS they are just script gadgety in a way any known element or attribute can be.

annevk · 2023-05-18T07:51:14Z

I think being able to control data-* makes sense given their special place in the language.

otherdaniel · 2023-11-28T17:16:40Z

IIRC, we mentioned this in one of the recent meetings as still having several open sub-issues. My current take is:

The core issue is resolved, by having two versions of each operation (safe + unsafe).
XML-ish things (PIs, CDATA, etc) are, I believe, resolved by parsing as HTML, even in XML documents.

I think open issues are:

config options to allow all of data-*, or all unknown elements, or all custom elements.
allowShadowRoots: The unsafe versions allow shadow roots by default; I think that is resolved (in the HTML spec). I think we dropped allowShadowRoots option from the examples at some point, but I don't quite remember why.
allowComments, default for the safe versions. (I don't have a strong argument either way.)

I changed my opinion on allowCustomElements / allowUnknownMarkup a little:

A while ago, colleagues analyzed sanitizer usage "in the wild", and I circled back with them: There were several usages for explicitly allowing any data-* attributes. We did have one case that allowed any custom element. (That app had a separate mechanism to make sure that only "good" custom elements were allowed. That is, they didn't fold custom-element checking into the sanitizer; they did that elsewhere. But that required the elements to be passed through in the sanitizer step.) We didn't see a use-case that relied on allowing any unknown element.

I do think the dual safe/unsafe methods make this issue much easier, in that users or libraries can still build the functionality they want (using the unsafe variant; but then need to take the responsibilities for the security properties themselves. This didn't exist when we originally discussed this.) This IMHO makes it much easier to do the simple thing first, and potentially expand the API later when we have more insight into real-world usage.

mozfreddyb · 2024-01-23T11:55:32Z

Looks like we discussed this in a synchronous meeting but never wrote it down here. Sigh.

data-
These attributes can be harmful. But it's application specific ("script gadgets"). We can't know and decide a priori. We don't want to disallow/allow all by default without a useful switch. The compromise is to support a boolean toggle for all data- attributes rather than some wildcard matching.

comments
Disallowed by default. Prevents some cases of mutation XSS (aka MXSS), but OK to allow them in the safe setHTML() function.

Unknown markup (elements and attributes)
Authors need to explicitly list your custom and unknown elements/attrs in your config.

Custom Elements
Same as above.

Another note: Ergonomics of (dis)allowing non-finite sets (e.g., data- and custom elements)
data- attributes are the only thing requiring a wildcard.
-> We want to have a global toggle

mozfreddyb · 2024-01-23T11:56:51Z

With the summary above, I think we can we label this issue as "ready to be written in the spec".

annevk · 2024-01-23T13:39:11Z

I'm pretty sure the reason we decided to add a toggle for data-* attributes in particular was because they were a common use case in existing sanitizers. So we felt like we couldn't require web developers to enumerate all data-* attributes their application would use, as they are required to do for custom and unknown markup.

(I'm writing this down here as I was again questioning why we ended up where we ended up.)

annevk · 2024-01-23T13:40:18Z

@mozfreddyb the one thing we haven't settled on is the exact API (IDL) to use, but I suppose that can be discussed as part of a PR.

mozfreddyb · 2024-02-29T08:54:46Z

You're right. Some of thus discussed this again in our Feb 21st meeting, and aligned on dataAttributes and the upcoming PR is going to use that.

The reasoning being:

The spec calls them "custom data attribute" in https://html.spec.whatwg.org/multipage/dom.html#attr-data-*.
MDN and other dev docs we found just call them "data attributes"
There are no "non-custom" data attributes. The custom is basically applied.
We do not use block/allow verbs in our config language.

annevk mentioned this issue May 5, 2023

First draft: explainer-new. #193

Merged

This was referenced Oct 9, 2023

Add more examples. #196

Merged

New WebIDL syntax with explicit name/namespace #181

Closed

annevk mentioned this issue Oct 18, 2023

Custom data attributes (dataset / data-*) support #191

Closed

mozfreddyb added the write-into-spec resolved discussions that need to be written into the spec label Jan 23, 2024

otherdaniel mentioned this issue Mar 15, 2024

Update the spec to match the current group consensus. #208

Merged

otherdaniel closed this as completed in #208 Mar 19, 2024

otherdaniel closed this as completed in db21b50 Mar 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify threat model, "XSS first and foremost" #188

Clarify threat model, "XSS first and foremost" #188

otherdaniel commented Feb 8, 2023

annevk commented Feb 14, 2023

otherdaniel commented Feb 17, 2023

annevk commented Feb 17, 2023

otherdaniel commented Feb 23, 2023

annevk commented Feb 23, 2023

annevk commented Mar 22, 2023

Sora2455 commented Mar 22, 2023

annevk commented Mar 22, 2023

Sora2455 commented Mar 22, 2023

annevk commented May 17, 2023

koto commented May 17, 2023

annevk commented May 17, 2023 •

edited

Loading

koto commented May 18, 2023

annevk commented May 18, 2023

otherdaniel commented Nov 28, 2023

mozfreddyb commented Jan 23, 2024

mozfreddyb commented Jan 23, 2024

annevk commented Jan 23, 2024

annevk commented Jan 23, 2024

mozfreddyb commented Feb 29, 2024 •

edited

Loading

Clarify threat model, "XSS first and foremost" #188

Clarify threat model, "XSS first and foremost" #188

Comments

otherdaniel commented Feb 8, 2023

annevk commented Feb 14, 2023

otherdaniel commented Feb 17, 2023

annevk commented Feb 17, 2023

otherdaniel commented Feb 23, 2023

annevk commented Feb 23, 2023

annevk commented Mar 22, 2023

Sora2455 commented Mar 22, 2023

annevk commented Mar 22, 2023

Sora2455 commented Mar 22, 2023

annevk commented May 17, 2023

koto commented May 17, 2023

annevk commented May 17, 2023 • edited Loading

koto commented May 18, 2023

annevk commented May 18, 2023

otherdaniel commented Nov 28, 2023

mozfreddyb commented Jan 23, 2024

mozfreddyb commented Jan 23, 2024

annevk commented Jan 23, 2024

annevk commented Jan 23, 2024

mozfreddyb commented Feb 29, 2024 • edited Loading

annevk commented May 17, 2023 •

edited

Loading

mozfreddyb commented Feb 29, 2024 •

edited

Loading