Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Extend images regex to also match the appendix after file extension #409

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Thaodan
Copy link
Contributor

@Thaodan Thaodan commented Jun 4, 2022

Some images have an appendix after the file extension making the
builtin regex fail.
An example would be something like https://example.com/foo.svg?size=800x600.

@wasamasa
Copy link
Collaborator

wasamasa commented Jun 4, 2022

The original code tries to match several URLs in a buffer, with this change that no longer works:

(let ((haystack "foo http://xxx.jpg <https://bar.jpg?>http://baz.jpg")
      (needle "\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\).*\\)"))
  (with-temp-buffer
    (insert haystack)
    (goto-char (point-min))
    (while (re-search-forward needle nil t)
      (message "%S" (match-string-no-properties 1)))))
;; "http://xxx.jpg <https://bar.jpg?>http://baz.jpg"

Generally, .* patterns should be avoided whenever possible for this reason because they're only terminated by a newline. Instead, it should be delimited by URL delimiters (whitespace, double quote, angular brackets). The existing regex could be rewritten using rx if you prefer.

@Thaodan
Copy link
Contributor Author

Thaodan commented Jun 4, 2022

The original code tries to match several URLs in a buffer, with this change that no longer works:

(let ((haystack "foo http://xxx.jpg <https://bar.jpg?>http://baz.jpg")
      (needle "\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\).*\\)"))
  (with-temp-buffer
    (insert haystack)
    (goto-char (point-min))
    (while (re-search-forward needle nil t)
      (message "%S" (match-string-no-properties 1)))))
;; "http://xxx.jpg <https://bar.jpg?>http://baz.jpg"

Generally, .* patterns should be avoided whenever possible for this reason because they're only terminated by a newline.

That's what I was wondering to but I thought it only works per line so how can white space by an issue.

Instead, it should be delimited by URL delimiters (whitespace, double quote, angular brackets). The existing regex could be rewritten using rx if you prefer.

That sounds like it make sense.

@Thaodan
Copy link
Contributor Author

Thaodan commented Jun 4, 2022 via email

@wasamasa
Copy link
Collaborator

wasamasa commented Jun 4, 2022

S-expression notation for regular expressions. Try M-x find-library RET rx RET for more information.

Some images have an appendix after the file extension making the
builtin regex fail.
An example would be something like https://example.com/foo.svg?size=800x600.

A fix for this is to match anything until a delimiters like
"'>) or :space:tab:
@Thaodan Thaodan force-pushed the display_images_with_appendix branch from aa569ac to e6efcd6 Compare June 4, 2022 22:05
@@ -74,7 +74,7 @@ See `enable-circe-display-images'."
:group 'circe)

(defcustom circe-display-images-image-regex
"\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\)\\)"
"\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\)[^\]\)>\"\' \n]*\\)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason you use rx is because it generates optimal regex and avoids typical errors, such as needlessly escaping characters that are not considered special in a character alternative. The URL delimiters would be matched by (rx (* (not (any "<>\"\t\r\n ")))) for example. Parentheses, square brackets and single quotes are not supposed to be there.

Additionally, that regex should be used after the "https?://" part. rx allows avoiding duplication here if the non-macro form is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I pushed my fixes to the existing regex that I already saved before working on replacing it with rx.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants