Suggestions #59

Open
superpoincare opened this issue Jan 2, 2024 · 0 comments
superpoincare commented Jan 2, 2024

Nice work. I have some suggestions.

$html = rtrim($html, "\n");

I think this unintentionally trims trailing newlines that should be kept. I assume the intent is to remove a newline added earlier by the code, but rtrim strips every trailing newline, including any that were part of the original input.
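To illustrate the difference, here is a minimal sketch. The helper name `stripOneTrailingNewline` is hypothetical and not part of the library; it assumes the code before this point adds exactly one "\n".

```php
<?php
// Hypothetical helper (not from the library): remove at most one trailing "\n".
function stripOneTrailingNewline(string $html): string
{
    return substr($html, -1) === "\n" ? substr($html, 0, -1) : $html;
}

$html = "<p>text</p>\n\n\n";

// rtrim removes every trailing newline, including those from the input.
$all = rtrim($html, "\n");

// The helper removes only the last one and keeps the rest.
$one = stripOneTrailingNewline($html);
```

If the earlier code can add more than one newline, the helper would need to know how many were added, so treat this only as a starting point.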

Another observation on this part:

// Preserve html entities
$source = preg_replace('/&([a-zA-Z]*);/', 'html5-dom-document-internal-entity1-$1-end', $source);
$source = preg_replace('/&#([0-9]*);/', 'html5-dom-document-internal-entity2-$1-end', $source);

There are also hexadecimal entities (the &#x form), which neither pattern matches. I am not sure about the following, but with preg_replace_callback you could check whether a match is a genuine entity or a fake one by doing something like

html_entity_decode($matches[0], ENT_QUOTES, 'UTF-8') === $matches[0]

which is true when nothing was decoded, i.e. when the match is not a real entity. Maybe this is not needed.
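A rough sketch of that idea, under some assumptions: the function name `protectEntities`, the combined pattern, and the hex-encoded placeholder format are all mine for illustration and are not the library's code. I also added ENT_HTML5 so that HTML5 named entities are recognized.

```php
<?php
// Hypothetical sketch: replace only entities that html_entity_decode recognizes.
function protectEntities(string $source): string
{
    // Matches hexadecimal (&#x26;), decimal (&#38;) and named (&amp;) forms.
    $pattern = '/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/';

    $result = preg_replace_callback($pattern, function (array $matches): string {
        $decoded = html_entity_decode($matches[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if ($decoded === $matches[0]) {
            // Nothing was decoded, so this is not a genuine entity; leave it alone.
            return $matches[0];
        }
        // Assumed placeholder format: the entity body is hex-encoded so that
        // '#' and other characters cannot clash with the surrounding markers.
        return 'html5-dom-document-internal-entity-' . bin2hex($matches[1]) . '-end';
    }, $source);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $source;
}

$out = protectEntities('a &amp; b &#x26; c &#38; d &bogus; e');
```

Here `&bogus;` is left untouched because it does not decode to anything, while the three genuine forms are swapped for placeholders.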

You could also add a freshly generated random string to the "internal" placeholder on every call, for security purposes, so that the placeholder text cannot be predicted. Maybe I am saying something silly.
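Something along these lines is what I have in mind. This is only a sketch: `makeEntityPrefix` is a hypothetical helper, and the placeholder layout is an assumption rather than the library's actual format.

```php
<?php
// Hypothetical helper: build a placeholder prefix with a per-call random token.
function makeEntityPrefix(): string
{
    // random_bytes(8) gives 8 cryptographically secure bytes (16 hex characters).
    return 'html5-dom-document-internal-entity-' . bin2hex(random_bytes(8)) . '-';
}

$prefix = makeEntityPrefix();
$source = 'Tom &amp; Jerry';

// Protect named entities using the randomized prefix.
$protected = preg_replace('/&([a-zA-Z]+);/', $prefix . '$1-end', $source);

// Restore them later; preg_quote guards the prefix inside the pattern.
$restorePattern = '/' . preg_quote($prefix, '/') . '([a-zA-Z]+)-end/';
$restored = preg_replace($restorePattern, '&$1;', $protected);
```

Since the token changes on every call, text in the input that merely looks like a placeholder would not match the restore pattern.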
