Goodbye InnerHTML, Hello SetHTML: Stronger XSS Protection in Firefox 148 (hacks.mozilla.org)

316 points by todsacerdoti 9 hours ago

entuno 9 hours ago

This kind of thing always makes me nervous, because you end up with a mix of methods where you can (supposedly) pass arbitrary user input to them and they'll safely handle it, and methods where you can't do that without introducing vulnerabilities - but it's not at all clear which is which from the names. Ideally you design that in from the start, so any dangerous functions are very clearly dangerous from the name. But you can't easily do that down the line.

I'm also rather sceptical of things that "sanitise" HTML, both because there's a long history of them having holes, and because it's not immediately clear what that means, and what exactly is considered "safe".

jncraton 9 hours ago

You are right that the concept of "safe" is nebulous, but the goal here is specifically to be XSS-safe [1]. Elements or properties that could allow scripts to execute are removed. This functionality lives in the user agent and prevents adding unsafe elements to the DOM itself, so it should be easier to get correct than a string-to-string sanitizer. The logic of "is the element currently being added to the DOM a <script>" is fundamentally easier to get right than "does this HTML string include a script tag".

[1] https://developer.mozilla.org/en-US/docs/Web/API/Element/set...

entuno 5 hours ago

It's certainly an improvement over people trying to homebrew their own sanitisers. But that distinction of being XSS-safe is a potentially subtle one, and could end up being dangerous if people don't carefully consider whether XSS-safe is good enough when they're handling arbitrary user input like that.

intrasight 5 hours ago

Also has made me nervous for years that there's been no schema against which one can validate HTML. "You want to validate? Paste your URL into the online validation tool."

snowhale 8 hours ago

the browser-native Sanitizer API has one advantage the library approaches don't: it uses the same HTML parser the browser uses to render. libraries like DOMPurify parse in a separate context then re-serialize, and historically that round-trip is where most bypasses came from. when the sanitizer and the renderer share the same parser, mutation XSS attacks have nowhere to hide.

pornel 8 hours ago

BTW, HTML allows inline SVG with an XML-flavored syntax that interprets <script/> and <title> differently. It's a goldmine for sanitizer escapes. There are completely bonkers syntax switching and error recovery rules that interact with parsing modes (there's even an edge case where a particular attribute value switches between HTML and XML-ish parsing rules).

Don't even try to allow inline <svg> from untrusted sources! (and then you still must sanitise any svg files you host)

cxr 2 hours ago

It may be using some of the same deserialization machinery, but "parsing" is a broad term that includes things that the sanitizer is doing and that the browser's ordinary content-processing → rendering path does not.

Even with this being a native API, there are still two parsers that need to be maintained. What a native API achieves is to shift the onus for maintaining synchronicity between the two onto the browser makers. That's not nothing, but it's also not the sort of free lunch that some people naively believe it is.

cxr 2 hours ago

> it's not at all clear which is which from the names. Ideally you design that in from the [start]

It was, and there is: setting elementNode.textContent is safe for untrusted inputs, and setting elementNode.innerHTML is unsafe for untrusted inputs. The former will escape everything, and the latter won't escape anything.

You are right that these "sanitizers" are fundamentally confused:

> "HTML sanitization" is never going to be solved because it's not solvable.¶ There's no getting around knowing whether any arbitrary string is legitimate markup from a trusted source or some untrusted input that needs to be treated like text. This is a hard requirement.

<https://news.ycombinator.com/item?id=46222923>

The Web platform folks who are responsible for getting fundamental APIs standardized and implemented natively are in a position to know better, and they should know better. This API should not have made it past proposal stage and should not have been added to browsers.

Dylan16807 an hour ago

> There's no getting around knowing whether any arbitrary string is legitimate markup from a trusted source or some untrusted input that needs to be treated like text. This is a hard requirement.

It is not a hard requirement that untrusted input is "treated like text". And this API lets you customize exactly what tags/attributes are allowed in the untrusted input. That's way better than telling everyone to write their own; it's not trivial.

Cthulhu_ 7 hours ago

Ideally you should be able to set a global property somewhere (as a web developer) that disallows outdated APIs like `innerHTML`, but with the Big Caveat that your website will not work on browsers older than X. But maybe there are web standards for that already, such as backup content if a browser is considered outdated.

cxr 2 hours ago

It's not an "outdated API". It's still good for what it was always meant for: parsing trusted, application-generated markup and atomically inserting it into the content tree as a replacement for a given element's existing children.

> set a global property somewhere (as a web developer) that disallows[…] `innerHTML`

    Object.defineProperty(Element.prototype, "innerHTML", {
      set: (() => { throw Error("No!") })
    });
(Not that you should actually do this—anyone who has to resort to it in their codebase has deeper problems.)

staticassertion 7 hours ago

Doesn't using TrustedTypes basically do that? I'm not really web-y, someone please correct me if I'm off.

afavour 7 hours ago

I like the idea of that. But I imagine linting rules are a much more immediate answer in a lot of projects.

voxic11 9 hours ago

The idea is you wouldn't mix innerHTML and setHTML, you would eliminate all usage of innerHTML and use the new setHTMLUnsafe if you needed the old functionality.

extraduder_ire 8 hours ago

I looked up setHTMLUnsafe on MDN, and it looks like it's been in every notable browser since last year.

Good idea to ship that one first, when it's easier to implement and is going to be the unsafe fallback going forward.

croes 8 hours ago

If I need the old functionality why not stick to innerHTML?

reddalo 8 hours ago

You can't rename an existing method. It would break compatibility with existing websites.

post-it 9 hours ago

> you would eliminate all usage of innerHTML

The mythical refactor where all deprecated code is replaced with modern code. I'm not sure it has ever happened.

I don't have an alternative of course, adding new methods while keeping the old ones is the only way to edit an append-only standard like the web.

jaffathecake 8 hours ago

fwiw, if you serve your page with:

Content-Security-Policy: require-trusted-types-for 'script'

…then it blocks you from passing regular strings to the methods that don't sanitize.
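
With that header in place, the sinks that don't sanitize only accept values produced by a Trusted Types policy. A minimal sketch of such a policy follows; the helper name `createEscapingPolicy` and the policy name "escape-only" are made up for illustration, and if the page also sends a `trusted-types` directive the policy name would need to be listed there.

```javascript
// Hypothetical helper (not from the thread): builds a Trusted Types policy
// whose createHTML escapes its input instead of passing markup through.
// `tt` defaults to the browser's global factory; it is a parameter so the
// sketch can also run where Trusted Types is unavailable.
function createEscapingPolicy(tt = globalThis.trustedTypes) {
  if (!tt || typeof tt.createPolicy !== "function") {
    return null; // Trusted Types not supported in this environment
  }
  return tt.createPolicy("escape-only", {
    createHTML: (input) =>
      String(input)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;"),
  });
}
```

In a supporting browser the result would be used as `el.innerHTML = policy.createHTML(userInput)`, while a plain string assignment would be rejected under the header above.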

DoctorOW 9 hours ago

They do link the default configuration for "safe": https://wicg.github.io/sanitizer-api/#built-in-safe-default-...

But I agree, my default approach has usually been to only use innerText if it has untrusted content:

So if their demo is this:

    container.setHTML(`<h1>Hello, ${name}</h1>`);
Mine would be:

    let greetingHeader = document.createElement("h1");
    greetingHeader.innerText = `Hello, ${name}`;
    container.replaceChildren(greetingHeader);

itishappy 8 hours ago

What if I wanted an <h2>?

Edit: I don't mean this flippantly. If I want to render, say, my blog entry on your site, will I need to select every markup element from a dropdown list of custom elements that only accept text a la Wordpress?

HWR_14 5 hours ago

That's why I only allow user input of alphanumeric ASCII characters. No need to worry about sanitization then, and you can just remove all the characters that don't match.

(It's a joke, but it is also 100% XSS, SQL injection, etc. safe and future proof)

thaumasiotes 2 hours ago

> I'm also rather sceptical of things that "sanitise" HTML, both because there's a long history of them having holes, and because it's not immediately clear what that means, and what exactly is considered "safe".

What is safe depends on where the sanitized HTML is going, on what you're doing with it.

It isn't possible to "sanitize HTML" after collecting it so that, when you use it in the future, it will be safe. "Safe" is defined by the use.

But it is possible to sanitize it before using it, when you know what the use will be.

noduerme 8 hours ago

Some sanitization is better than none? If you're relying on the browser to handle it for you, you're already in a lot of trouble.

post-it 9 hours ago

realSetSafeHTML()

onion2k 6 hours ago

> it's not at all clear which is which from the names

There's setHTML and setHTMLUnsafe. That seems about as clear as you can get.

entuno 5 hours ago

If that'd been the design from the start, then sure. But it's not at all obvious that setHTML is safe with arbitrary user input (for a given value of "safe") and innerHTML is dangerous.

hahn-kev 6 hours ago

But you can use innerHTML to set HTML and that's not safe.

simonw 9 hours ago

Great to see this start to show up, but it looks like it will be a while before browser support is widely distributed enough to rely on it being present: https://caniuse.com/mdn-api_element_sethtml
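
Until support is widespread, one conservative option is to feature-detect and degrade to plain text. The sketch below assumes that showing the raw markup as text is an acceptable fallback; `setUntrustedMarkup` is a hypothetical helper name, not an existing API.

```javascript
// Hypothetical helper: use the sanitizing setter when the browser has it,
// otherwise fall back to textContent, which never parses markup.
function setUntrustedMarkup(element, markup) {
  if (typeof element.setHTML === "function") {
    element.setHTML(markup); // default sanitizer configuration
    return "sanitized";
  }
  element.textContent = markup; // tags are shown literally, not rendered
  return "text";
}
```

The return value only reports which path was taken, so callers can log how often the fallback is hit.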

jraph 9 hours ago

Indeed, as with any browser API, it might be usable in a few years (months, if you're happy supporting only the most recent versions), and we may have polyfills in the meantime.

tuyiown 9 hours ago

I wouldn't advise polyfills on this one; it depends entirely on the browser's ability to evaluate cross-site scripting and cross-origin rules on an arbitrary snippet. This is not a convenience API.

Aachen 8 hours ago

So you can still inject <h1> or <br><br><br>... etc into your username, in the given example

Preventing one bug class (script execution) is good, but this still allows arbitrary markup to the page (even <style> CSS rules) if I'm reading the docs correctly. You could give Paypal a fresh look for anyone who opens your profile page, if they use this. Who would ever want this?

cogman10 8 hours ago

> Who would ever want this?

The main case I can think of is wanting some forum functionality. Perhaps you want to allow your users to be able to write in markdown. This would provide an extra layer of protection as you could take the HTML generated from the markdown and further lock it down to only an allowed set of elements like `h1`. Just in case someone tried some of the markdown escape hatches that you didn't expect.

Aachen 8 hours ago

> This would provide an extra layer of protection

I think this might be the answer. There's no point to it by itself (either you separate data and code or you don't and let the user do anything to your page), but if you're already using a sanitiser and you can't use `textContent` because (such as with Markdown) there'll be HTML tags in the output, then this could be extra hardening. Thanks!

iLoveOncall 4 hours ago

You'd never want to store the processed HTML anyway, this is website building 101.

piccirello 7 hours ago

`setHTML` is meant as a replacement for `innerHTML`. In the use case you describe, you would have never wanted `innerHTML` anyway. You'd want `innerText` or `textContent`.

iLoveOncall 4 hours ago

But that's why setHTML isn't at all a replacement for innerHTML.

You still need innerHTML when you want to inject HTML tags in the page, and you could already use innerText when you didn't want to.

Having something in between is seriously useless.

itishappy 8 hours ago

> If the default configuration of setHTML( ) is too strict (or not strict enough) for a given use case, developers can provide a custom configuration that defines which HTML elements and attributes should be kept or removed.

Aachen 7 hours ago

Injecting markup into someone else's website isn't what I'd call too strict a default configuration

If you mean to convey that it's possible to configure it to filter properly, let me introduce you to `textContent` which is older than Firefox (I'm struggling to find a date it's so old)

byproxy 8 hours ago

> but this still allows arbitrary markup to the page (even <style> CSS rules) if I'm reading the docs correctly.

If that's true, seems like it's still a security risk given what you can do with CSS these days: https://news.ycombinator.com/item?id=47132102

circuit10 7 hours ago

You can use selectors to gain some information about things like input fields, e.g. https://www.invicti.com/blog/web-security/private-data-stole...

Or I guess you could completely restyle and change the text of UI elements so it looks like the user is doing one thing when they're actually doing something completely different like sending you money

jerf 7 hours ago

If I'm reading this right,

    .setHTML("<h1>Hello</h1>", { sanitizer: new Sanitizer({}) })
will strip all elements out. That's not too difficult.

Plus this is defense-in-depth. Backends will still need to sanitize usernames on some standard anyhow (there's not a lot of systems out there that should take arbitrary Unicode input as usernames), and backends SHOULD (in the RFC sense [1]) still HTML-escape anything they output that they don't want to be raw HTML.

[1]: https://www.rfc-editor.org/rfc/rfc2119

evilpie 5 hours ago

You aren't reading it right.

  new Sanitizer({})
This Sanitizer will allow everything by default, but setHTML will still block elements/attributes that can lead to XSS.

You might want something like:

  new Sanitizer({ replaceWithChildrenElements: ["h1"], elements: [], attributes: [] })
This will replace <h1> elements with their children (i.e. text in this case), but disallow all other elements and attributes.

benmmurphy 7 hours ago

i think the use case for setHTML is for user content that contains rich text and to display that safely. so this is not an alternative for escaping text or inserting text into the DOM but rather a method for displaying rich text. for example maybe you have an editor that produces em, and strong tags so now you can just whitelist those tags and use setHTML to safely put that rich text into the DOM without worrying about all the possible HTML parsing edge cases.
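
A rough sketch of that rich-text case, assuming the editor only ever emits `p`, `em` and `strong`. The names `RICH_TEXT_CONFIG` and `renderRichText` are invented for this example, and the call assumes a browser that ships `setHTML`.

```javascript
// Allowlist for rich text: only these elements survive, and no attributes.
// Element and attribute choices here are an assumption for illustration.
const RICH_TEXT_CONFIG = {
  elements: ["p", "em", "strong"],
  attributes: [],
};

// Hypothetical helper: hands the allowlist to setHTML via the `sanitizer`
// option. XSS-unsafe content is still removed by setHTML regardless of
// what the configuration allows.
function renderRichText(element, markup) {
  element.setHTML(markup, { sanitizer: RICH_TEXT_CONFIG });
}
```

Keeping the allowlist in one constant makes it easy to review what user content is permitted to contain.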

lelanthran 4 hours ago

> Who would ever want this?

Your lack of imagination is disturbing :-)

https://github.com/lelanthran/ZjsComponent

embedding-shape 8 hours ago

> So you can still inject <h1> or <br><br><br>... etc into your username, in the given example

How exactly, given that setHTML sanitizes the input? If you don't want to have any HTML tags allowed, seems you can configure that already? https://wicg.github.io/sanitizer-api/#built-in-safe-default-...

Aachen 8 hours ago

> How exactly, given that setHTML sanitizes the input?

The article says that the output is:

    <h1>Hello my name is</h1>
So it keeps (non-script) html tags (and presumably also attributes) in the input. Idk how you're asking "how" since it's the default behavior

Stripping HTML tags completely has always been possible with the drop-in replacement `textContent`. Making a custom configuration object for that is much more roundabout

kccqzy 7 hours ago

There’s innerText if you don’t want markup. Or more verbosely, document.createTextNode followed by whatever.appendChild.

afavour 7 hours ago

> Who would ever want this?

Anyone who wants to provide some level of flexibility but within bounds. Say, you want to allow <strong> and <em> in a forum post but not <script>. It's not too difficult to imagine uses.

goatlover 4 hours ago

Forums would already have code that sanitizes user input when it's submitted. Users aren't directly setting html elements.

dheera 6 hours ago

> So you can still inject <h1> or <br><br><br>... etc into your username

Are we taking out all the fun of the web? I absolutely loved the <marquee> names people had in the early days of Facebook, it was all harmless fun.

If injection of frontend code takes down your backend, your backend sucks, fix it.

jjcm an hour ago

What I really want is a <sandbox> element that can safely run dangerous code, not something that modifies dangerous code.

Iframes have significant restrictions as they can’t flow with the DOM. With AI and the increase in dynamic content, there’s going to be even more situations where you run untrusted code. I want configurable encapsulation.

dogtimeimmortal 5 hours ago

Title was a bit rage-baity. And I think you can already do sanitization by writing a function to check input before passing it to innerHTML?

This really just seems like another attempt at reinventing the wheel. Somewhat related, I find it ironic how I cannot browse hacks.mozilla.org in my old version of Firefox ("Browser not supported"). Also, developer.mozilla.org loads mangled to various degrees in current versions of Pale Moon, Basilisk, and SeaMonkey.

It's like there is some sort of "browser cartel" trying to screw up The Web.

Retr0id 5 hours ago

> you can already do sanitization by writing a function to check input before passing it to innerHTML

This is like saying C is memory safe as long as your code doesn't have any bugs.

More saliently, it does not consider parser differentials.

cogman10 8 hours ago

Seems like this has a bunch of footguns. Particularly if you interact with the Sanitizer api, and particularly if you use the "remove" sanitizer api.

Don't get me wrong, better than nothing, but also really really consider just using `textContent` instead and never allow the user to add any sort of HTML to the document.

lelanthran 4 hours ago

> never allow the user to add any sort of HTML to the document.

What about when the author of the page wants to add large html fragments to the page?

Are you saying that you cannot think of a single use for this, considering how often innerHTML is being used?

evilpie 8 hours ago

Using an allowlist based Sanitizer you are definitely less likely to shoot yourself in the foot, but as long as you use setHTML you can't introduce XSS at least.

GalaxyNova 5 hours ago

It's worse than nothing, since inevitably people will use this thinking it's 100% safe when it's not.

pyrolistical 3 hours ago

And for those who want a better innerHTML, use insertAdjacentHTML https://developer.mozilla.org/en-US/docs/Web/API/Element/ins...

I don’t ever use it with user input, but use it often when building SPA without frameworks

tuyiown 9 hours ago

This is nice. The best part is that all aspects of network access are now properly controlled so that security transitioned from a chain of trusted code to a chain of trusted security setup on hosts, with existing workable safe defaults.

kevincloudsec 6 hours ago

naming the old behavior setHTMLUnsafe is what did it for me. security features that require developers to opt in don't work. making the unsafe path feel unsafe does.

shevy-java 6 hours ago

Well, the name SetHTML, or let's say:

    .set_html()
Makes objectively more sense than:

    .inner_html()
    .inner_html =
    .set_inner_html()
It is a fairly small thing, but ... really. One day someone should clean up the mess that is JavaScript. Guess it will never happen, but JavaScript has so many strange things ...

I understand that this here is about protection against attacks rather than a better API design, but really - APIs should ideally be as great as possible the moment they are introduced and shown to the public.

lloydatkinson 6 hours ago

To be pedantic that’s the DOM API, which is exposed to JavaScript.

The DOM API has always felt like, and still does, it was written by people that have never made an API.

pier25 4 hours ago

I don't think that's pedantic. Seems like a valid objection to me.

So many issues in the client JS world originate from insufficient or bad browser APIs.

pier25 4 hours ago

Tangential but it's amazing in 2026 browsers still don't ship a native DOM morph/merge API like morphdom or idiomorph.

dvh 5 hours ago

Kids in the '90s:

  SQL("select * from user where name = " + name);
Kids in the '20s:

  div.innerHTML = "Hello " + user.name;

Legend2440 an hour ago

Kids in the '30s:

  "Summarize this email:  " + email.contents
Prompt injection is just the same problem on a new technology. We didn't learn anything from the 90s.

austin-cheney 3 hours ago

Another solution is just use this at the start of your code:

    delete Element.prototype.innerHTML;
Then assignments to innerHTML do not modify the element's textContent or child node list and assignments to it will not throw an error.
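
The difference between a throwing setter and deleting the accessor can be tried outside a browser with a stand-in class. `FakeElement` below is purely illustrative and only mimics how an accessor property on a prototype behaves; it is not the DOM.

```javascript
// Stand-in for Element: innerHTML is an accessor on the prototype,
// backed by private state that represents the element's children.
class FakeElement {
  #markup = "";
  get innerHTML() { return this.#markup; }
  set innerHTML(value) { this.#markup = String(value); }
  get renderedMarkup() { return this.#markup; }
}

// Variant 1: replace the setter with one that throws, so misuse is loud.
Object.defineProperty(FakeElement.prototype, "innerHTML", {
  set() { throw new Error("innerHTML is disabled"); },
});
const loud = new FakeElement();
let threw = false;
try { loud.innerHTML = "<b>hi</b>"; } catch { threw = true; }

// Variant 2: delete the accessor. Assignment now just creates an ordinary
// own property on the instance and never reaches the backing state.
delete FakeElement.prototype.innerHTML;
const quiet = new FakeElement();
quiet.innerHTML = "<b>hi</b>";
const silentlyIgnored =
  quiet.renderedMarkup === "" &&
  Object.prototype.hasOwnProperty.call(quiet, "innerHTML");
```

The silent variant hides mistakes that the throwing variant would surface, which may matter when auditing an existing codebase.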

bryanrasmussen 6 hours ago

is there any situation where innerHTML would be preferable? I could suppose it might be more performant and so if you were constructing something that was not open to XSS it might theoretically be better (with the usual caveat that people always make mistakes about this kind of thing)

dbvn 8 hours ago

at what point can we consider the development of "set this element's text/html" to be done?

Aachen 8 hours ago

When browsers implement a variant that lets you separate data and code perhaps. That's what I expected when reading the headline: setHtml(code, data, data, ...), just like parameterised SQL works: prepare("select rowid from %s where time < %n", tablename, mynumber)

This new method they've cooked up would be called eval(code,options) if html was anything other than a markup language
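
Something close to that code/data split can be approximated today with a tagged template that treats the literal parts as code and escapes every interpolated value. This is a sketch under the assumption that values only land in element content or quoted attribute values; `html` and `escapeHtml` are names invented here, not a browser API.

```javascript
// Escape the five characters that matter in text and quoted attributes.
function escapeHtml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Tagged template: literal chunks are trusted markup written by the
// developer; interpolated values are data and are always escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const name = "<img src=x onerror=alert(1)>";
const markup = html`<h1>Hello, ${name}</h1>`;
// markup is "<h1>Hello, &lt;img src=x onerror=alert(1)&gt;</h1>"
```

Unlike a prepared SQL statement, the result is still a string that has to be parsed again, so this is only as safe as the contexts the values are placed in.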

antonyh 9 hours ago

A rather deceptive title, given that 'innerHTML' isn't going away.

jandrese 7 hours ago

I think the title is trying to convince you to switch from InnerHTML to SetHTML.

bingemaker 9 hours ago

Nice one. Will there be any impact on dangerouslySetInnerHTML (React)?

shadowgovt 7 hours ago

Oh, that's nice-to-have. Good work, Mozilla.

It would close the loop better if you could also use policy to switch off innerHTML in a given page, but definitely a step in the right direction for plain-JavaScript applications.

giancarlostoro 5 hours ago

My corporate firewall blocks it due to the "hacks" in the subdomain / url. This is silly.

ok123456 3 hours ago

That's why the DNS for hackernews is news.ycombinator.com and not hackernews.org