Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Performance issues when escaping looooong value as html_attr #4322

Closed
Kocal opened this issue Sep 18, 2024 · 7 comments
Closed

Performance issues when escaping looooong value as html_attr #4322

Kocal opened this issue Sep 18, 2024 · 7 comments

Comments

@Kocal
Copy link
Contributor

Kocal commented Sep 18, 2024

Hi everyone, sorry if this issue is a duplicate, but I didn't find anything here.

For reference: symfony/ux#2178

With Symfony UX Map, when you create a Map with 1000 Marker and InfoWindow, the following code takes ~62% of the response time (near ~220 ms) to escape $string (here, a big JSON object which have been stringified):

case 'html_attr':
if ('UTF-8' !== $charset) {
$string = $this->convertEncoding($string, 'UTF-8', $charset);
}
if (!preg_match('//u', $string)) {
throw new RuntimeError('The string to escape is not a valid UTF-8 string.');
}
$string = preg_replace_callback('#[^a-zA-Z0-9,\.\-_]#Su', function ($matches) {
/**
* This function is adapted from code coming from Zend Framework.
*
* @copyright Copyright (c) 2005-2012 Zend Technologies USA Inc. (https://www.zend.com)
* @license https://framework.zend.com/license/new-bsd New BSD License
*/
$chr = $matches[0];
$ord = \ord($chr);
/*
* The following replaces characters undefined in HTML with the
* hex entity for the Unicode replacement character.
*/
if (($ord <= 0x1F && "\t" != $chr && "\n" != $chr && "\r" != $chr) || ($ord >= 0x7F && $ord <= 0x9F)) {
return '&#xFFFD;';
}
/*
* Check if the current character to escape has a name entity we should
* replace it with while grabbing the hex value of the character.
*/
if (1 === \strlen($chr)) {
/*
* While HTML supports far more named entities, the lowest common denominator
* has become HTML5's XML Serialisation which is restricted to the those named
* entities that XML supports. Using HTML entities would result in this error:
* XML Parsing Error: undefined entity
*/
static $entityMap = [
34 => '&quot;', /* quotation mark */
38 => '&amp;', /* ampersand */
60 => '&lt;', /* less-than sign */
62 => '&gt;', /* greater-than sign */
];
if (isset($entityMap[$ord])) {
return $entityMap[$ord];
}
return \sprintf('&#x%02X;', $ord);
}
/*
* Per OWASP recommendations, we'll use hex entities for any other
* characters where a named entity does not exist.
*/
return \sprintf('&#x%04X;', mb_ord($chr, 'UTF-8'));
}, $string);
if ('UTF-8' !== $charset) {
$string = iconv('UTF-8', $charset, $string);
}
return $string;

I.. don't know if its more an issue from Twig or a skill-issue from our StimulusAttributes usage.

But do you think something is optimizable here? is it safe to replace 'a very long stringified json object'|e('html_attr') with htmlentities(json_encode('...')? Because this combo improved performances by 62%!

Thanks!

@stof
Copy link
Member

stof commented Sep 18, 2024

inside quoted attribute values, it is safe to use the html strategy.

The html_attr strategy is specifically about the extra rules needed when escaping things in attribute names and unquoted values.

@Kocal
Copy link
Contributor Author

Kocal commented Sep 18, 2024

Indeed, I've reverted my code and used html as escaping strategy, the performances are a lot better!
image

Thanks @stof!

@Kocal Kocal closed this as completed Sep 18, 2024
@javiereguiluz
Copy link
Contributor

@stof could you please explain a bit more what do you mean by "unquoted values"? Maybe show an example?

The Twig docs (https://twig.symfony.com/doc/3.x/filters/escape.html) say this:

html_attr: escapes a string for the HTML attribute context.

@stof
Copy link
Member

stof commented Sep 18, 2024

@javiereguiluz HTML does not force you to put quotes around values of attributes. <input type=text> is valid HTML. When you don't have quotes around the value, it requires a lot more escaping (for instance, any whitespace has to be escaped so that it does not end the attribute value).

The attribute context is the context where you write those attributes.

Looking at the HTML parsing state, it has several state related to parsing attribute pairs: https://html.spec.whatwg.org/#before-attribute-name-state (sections 13.2.5.32 to 13.2.5.39)
As you can see there, the Attribute value (double-quoted) state and Attribute value (single-quoted) state states have very few special characters (basically, only & and the corresponding quote), which are already escaped by the html strategy of Twig. Other attribute-related states (related to attribute names and unquoted attribute values) have much more special characters, which is what the html_attr escaping strategy is about

@fabpot
Copy link
Contributor

fabpot commented Sep 18, 2024

We should maybe clarify the docs? Who wants to give it a try?

@javiereguiluz
Copy link
Contributor

Thanks a lot for the details @stof 🙏

@Kocal
Copy link
Contributor Author

Kocal commented Sep 19, 2024

Thanks Google, in fact that's a duplicate of #2615.

I will try to send a doc PR asap :)

EDIT: opened #4324

fabpot added a commit that referenced this issue Sep 21, 2024
…ocal)

This PR was merged into the 3.x branch.

Discussion
----------

[Doc] Document about `html` and `html_attr` strategies

Ref #4322

Commits
-------

ac88c9a [Doc] Document about `html` and `html_attr` strategies
javiereguiluz added a commit to symfony/ux that referenced this issue Sep 24, 2024
…erformances by switching to `html` escaping strategy (Kocal)

This PR was squashed before being merged into the 2.x branch.

Discussion
----------

[StimulusBundle] Improve `StimulusAttributes` rendering performances by switching to `html` escaping strategy

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | no <!-- please update src/**/CHANGELOG.md files -->
| Issues        | Fix #... <!-- prefix each issue number with "Fix #", no need to create an issue if none exist, explain below instead -->
| License       | MIT

<!--
Replace this notice by a description of your feature/bugfix.
This will help reviewers and should be a good start for the documentation.

Additionally (see https://symfony.com/releases):
 - Always add tests and ensure they pass.
 - For new features, provide some code snippets to help understand usage.
 - Features and deprecations must be submitted against branch main.
 - Changelog entry should follow https://symfony.com/doc/current/contributing/code/conventions.html#writing-a-changelog-entry
 - Never break backward compatibility (see https://symfony.com/bc).
-->

A better alternative to #2178, by changing the escaping strategy for HTML attributes value from `html_attr` to `html`, as indicated by `@stof` in twigphp/Twig#4322 (comment).

On my use case (a `Map` with 1000 `Marker` and `InfoWindow`), [I have a performance gain of ~68%](https://blackfire.io/profiles/compare/56f4d7d2-ee56-487f-8a78-420f4037165f/graph):
<img width="1057" alt="image" src="https://github.com/user-attachments/assets/8b7ff48d-8e6e-4dae-b09e-2c323bc2449d">

This PR should also improve performances of our packages using the `StimulusAttributes` DTO, like Chart.js, LiveComponent, ...

---

> [!IMPORTANT]
> The initial PR changed a bit, the default rendering strategy does not change, but instead we introduce a new configuration to use the new (and optimized) rendering strategy, see  #2180 (comment).

Commits
-------

647532a [StimulusBundle] Improve `StimulusAttributes` rendering performances by switching to `html` escaping strategy
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

No branches or pull requests

4 participants