
Fix 504 errors for crawlers #7


Open: wants to merge 2 commits into master


maximmax42

When a crawler tries to index a folder on a website, e.g. example.com/folder, Prerender tries to fetch example.comfolder instead, so the bot gets a 504. Adding a / between %{HTTP_HOST} and $2 in the .htaccess fixes this.

Real-life example (screenshot of crawler requests, not preserved): the last 2 entries are from before the fix, the first 2 are from after it.
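A minimal Python sketch of the symptom, assuming an https scheme and a captured $2 of "folder" (the helper build_target is hypothetical and not part of prerender-apache). Without a separator, the host and path fuse into the hostname example.comfolder. That host presumably does not exist, which would be consistent with Prerender timing out and the crawler receiving a 504.

```python
from urllib.parse import urlsplit


def build_target(host: str, captured: str, with_slash: bool) -> str:
    """Hypothetical helper mimicking %{HTTP_HOST}$2 vs %{HTTP_HOST}/$2 substitution."""
    sep = "/" if with_slash else ""
    return f"https://{host}{sep}{captured}"


# In .htaccess context, $2 for a request to example.com/folder is "folder" (no leading slash).
broken = build_target("example.com", "folder", with_slash=False)
fixed = build_target("example.com", "folder", with_slash=True)

# The broken URL's network location is the bogus host "example.comfolder".
print(broken, "->", urlsplit(broken).netloc)
# The fixed URL splits into host "example.com" and path "/folder".
print(fixed, "->", urlsplit(fixed).netloc, urlsplit(fixed).path)
```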

varrocs (Contributor) commented Oct 14, 2022

Hi,
It used to be that way, but the slash was deliberately removed because users were getting // in their URLs.
See: https://github.com/prerender/prerender-apache/pull/5/files

maximmax42 (Author) commented Oct 14, 2022

From https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule:

  • (from What is matched?) In per-directory context (Directory and .htaccess), the Pattern is matched against only a partial path, for example a request of "/app1/index.html" may result in comparison against "app1/index.html" or "index.html" depending on where the RewriteRule is defined.
    The directory path where the rule is defined is stripped from the currently mapped filesystem path before comparison (up to and including a trailing slash). The net result of this per-directory prefix stripping is that rules in this context only match against the portion of the currently mapped filesystem path "below" where the rule is defined.
  • (from Per-directory Rewrites) The removed prefix always ends with a slash, meaning the matching occurs against a string which never has a leading slash. Therefore, a Pattern with ^/ never matches in per-directory context.

If I understand this correctly, in .htaccess the string the pattern is matched against (and therefore $2) never has a leading slash. So %{HTTP_HOST}$2 will always produce something like example.comfolder, and the / between the host and $2 is required. It wouldn't be if the RewriteRule were defined in VirtualHost context, which is, I assume, where people were getting double slashes with the %{HTTP_HOST}/$2 rule. Alternatively, they may have been running a very old Apache version.
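A rough Python model of the context dependence described in the quoted docs. It is a sketch under stated assumptions, not mod_rewrite's actual logic. The helper names are hypothetical, the https scheme is illustrative, and $2 is assumed to capture everything left after prefix stripping. Real mod_rewrite strips a filesystem path; here that is simplified to a URL-path relative to the document root. The model shows why "/" is needed in .htaccess but would double the slash in VirtualHost context.

```python
from typing import Optional


def match_subject(request_path: str, rule_dir: Optional[str]) -> str:
    """Approximate the string a RewriteRule Pattern is matched against.

    rule_dir=None models server/VirtualHost context: the pattern sees the full
    URL-path, including its leading slash. Otherwise rule_dir models the
    directory holding the .htaccess (simplified to a URL-path); its prefix, up
    to and including the trailing slash, is stripped, so the result never
    starts with "/".
    """
    if rule_dir is None:
        return request_path
    prefix = rule_dir.rstrip("/") + "/"
    if not request_path.startswith(prefix):
        raise ValueError(f"{request_path!r} is not below {rule_dir!r}")
    return request_path[len(prefix):]


def prerender_target(host: str, request_path: str,
                     rule_dir: Optional[str], with_slash: bool) -> str:
    """Hypothetical: build the URL handed to Prerender, assuming $2 is the whole subject."""
    captured = match_subject(request_path, rule_dir)  # stands in for $2
    sep = "/" if with_slash else ""
    return f"https://{host}{sep}{captured}"


# The docs' example: /app1/index.html seen from .htaccess at the root vs. in app1/.
print(match_subject("/app1/index.html", "/"))      # app1/index.html
print(match_subject("/app1/index.html", "/app1"))  # index.html

# .htaccess at the document root: $2 has no leading slash, so the "/" is needed.
print(prerender_target("example.com", "/folder", "/", with_slash=False))  # https://example.comfolder
print(prerender_target("example.com", "/folder", "/", with_slash=True))   # https://example.com/folder

# VirtualHost context: $2 keeps its leading slash, so an extra "/" doubles it.
print(prerender_target("example.com", "/folder", None, with_slash=True))  # https://example.com//folder
```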
