Suggest .htaccess rules to prevent some erroneous cache directories #101
Comments
@raamdev I think this behavior is correct on the part of QC. If the host changes, a separate cache should be kept for it, since it's always possible that the host name would impact the content generated server-side. Even though a default WP install might do fine against its configured host name, you never know what else might be running on that server and/or via custom themes/plugins, any of which might alter the final output based on the host name in the request. You mentioned to me before that you thought a possible solution might be to offer site owners an This should resolve...
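A minimal sketch of why distinct host names end up in distinct cache directories. It assumes, hypothetically, that the cache path is derived from the raw `Host` header by turning every run of non-alphanumeric characters into a hyphen while leaving the case alone; `host_cache_dir()` is an illustrative name, not Quick Cache's actual code.

```php
<?php
declare(strict_types=1);

// HYPOTHETICAL helper for illustration only; not Quick Cache's real code.
// Maps a raw Host header to a cache sub-directory name by replacing every
// run of non-alphanumeric characters with a single hyphen. Case is kept as-is.
function host_cache_dir(string $host): string
{
    return (string) preg_replace('/[^A-Za-z0-9]+/', '-', $host);
}

// Three different Host header values, three different cache directories.
foreach (['raamdev.com', 'RAAMDEV.COM', 'raamdev.comhttp'] as $host) {
    echo $host, ' => ', host_cache_dir($host), PHP_EOL;
}
```

Under that assumption, an uppercase `RAAMDEV.COM` or a malformed `raamdev.comhttp` each get a directory of their own, separate from `raamdev-com`.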
Thanks! I had actually forgotten that we discussed that. Yes, I agree that's the best approach.
No, it wasn't nested. It was at the same level as
Inside which I have:
And every cache file in those sub-directories is, as would be expected, a 404 (symlinked back to the default 404 file). I tried searching my Apache logs for any requests matching some of the 404s, but came up empty. I'll leave this open for now and do some more testing on my live site. I'll also defer this issue to a future release, as I don't feel it's important enough to get out right away.
Just a quick update on this: I've been running Quick Cache Pro (from April 16th) for the past two weeks, along with the
I'm installing the latest Quick Cache Pro as of today and will continue testing. I realized that this issue with erroneous directories might also have something to do with 404 Caching: if an invalid URL is requested, Quick Cache will create the necessary subdirectories in to make the symlink to the 404 cache file. With 404 Caching disabled (the default), I bet these erroneous directories would go away. I'll let it run for a few days and then test again with 404 Caching disabled.
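The suspected mechanism can be modeled in a few lines of PHP. This is a hypothetical sketch, assuming a layout of one shared `404.html` plus a per-path symlink under each host directory; `cache_404_symlink()` and the directory layout are illustrative, not Quick Cache internals. It shows how a single bad request is enough to leave a whole directory tree behind.

```php
<?php
declare(strict_types=1);

// HYPOTHETICAL sketch of 404 caching, for illustration only. The layout
// (<root>/<host-dir>/<path>.html, one shared 404.html) is an assumption,
// not Quick Cache's real directory structure.
function cache_404_symlink(string $cacheRoot, string $hostDir, string $path): string
{
    $shared404 = $cacheRoot . '/404.html';
    if (!is_file($shared404)) {
        file_put_contents($shared404, '<h1>404 Not Found</h1>');
    }

    // Each request path that produced a 404 gets its own leaf under the host directory.
    $leaf = $cacheRoot . '/' . $hostDir . '/' . trim($path, '/') . '.html';
    $dir = dirname($leaf);
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true); // creates every missing sub-directory on the way
    }
    if (!is_link($leaf)) {
        symlink($shared404, $leaf); // the leaf just points back at the shared 404 file
    }
    return $leaf;
}

$root = sys_get_temp_dir() . '/qc-404-demo-' . getmypid();
if (!is_dir($root)) {
    mkdir($root, 0777, true);
}
$leaf = cache_404_symlink($root, 'raamdev-comhttp', '/2014/some-post/');
echo $leaf, ' -> ', readlink($leaf), PHP_EOL;
```

Under this model, with 404 Caching disabled no symlink, and therefore no directory tree, is created for the bad request.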
I've had the latest dev version running for the past few days (404 Caching enabled). I have the following subdirectories in
Inside
The actual working URLs are here: I dug through my Apache access logs looking for the 404 errors to see if there was something funky about the
(I'm assuming these are the corresponding requests, since the dates and timestamps match up with when the 404 symlinks were created.) What's odd to me is that Apache returned a 404 when the request looks like it should go through. I mean, if you copy and paste those two URLs into your browser, they won't return a 404 but rather the post they're supposed to return. @JasWSInc Any idea what might be going on here? Or any thoughts about how else I can attempt to figure it out? I'm going to disable 404 Caching now and let it run for a few more days just to verify that this issue goes away with 404 Caching disabled.
Regarding these two log entries in your Apache log...
These actually look wrong to me, but it might just be the Apache log format you're using. Could you check on this? Ordinarily, an HTTP request includes a The
In short, when I see
Ah, yes, you're absolutely right. I was looking at way too many log entries and didn't catch that. The So, this looks like it's just an invalid request and there's not a whole lot that we can do about that, correct? I just tried reproducing this, both in a browser and via the command line using I'm curious how such a request ever made it through to WordPress, where Quick Cache picked it up.
Right. I'm not aware of a way to stop this. It's just a 404 error really.
Here's how you can reproduce it. These requests are most likely coming from a bot; it would be very difficult to reproduce this in a browser. Instead of building a URL, think about the underlying HTTP communication that would occur if you made this request without using a URL, and instead simply opened a socket that sends an invalid GET request with the correct

```php
<?php
error_reporting(-1);
ini_set('display_errors', TRUE);

$raamdev_ip = gethostbyname('raamdev.com'); // Resolve to an IP address.
$connection = fsockopen($raamdev_ip, 80, $errno, $errstr, 30); // Open connection.

if(!$connection) echo $errstr.' ('.$errno.')<br />'."\n";
else // We have a connection to `$raamdev_ip:80`. We're good so far.
{
    /*
     * Builds a GET request that is intentionally invalid in this case.
     */
    $request = 'GET http://raamdev.com/2014/linkedin-spam-black-girl-birthday-os-x-contacts-app/ HTTP/1.1'."\r\n";
    // ↑ this is intentionally invalid; it should be `/2014/linkedin-spam-black-girl-birthday-os-x-contacts-app/`.
    $request .= 'Host: raamdev.com'."\r\n"; // Apache virtual host @ `$raamdev_ip:80`.
    $request .= 'Connection: Close'."\r\n\r\n";
    /*
     * Talk to the IP handling `raamdev.com`.
     */
    fwrite($connection, $request);
    /*
     * Get the response; a 404 in this case.
     */
    while(!feof($connection))
        echo fgets($connection, 128);
    /*
     * Close the connection.
     */
    fclose($connection);
}
```
One thing you could do is investigate any reports from Google Webmaster Tools for For example, if you have an That said, this can happen even if you don't have any invalid links on the site. Some spiders just don't function properly; they get things wrong when they crawl your site. You could scan your log files and try to find a bot that is consistently doing this to you, then ban it using a
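A rough way to do that log scan, sketched in PHP 8 (it relies on `str_starts_with()`): parse lines in Apache's standard "combined" log format and count, per user agent, the 404 responses whose request target is not an ordinary path beginning with `/`. The function name, the sample log lines, and the `ExampleBot/1.0` agent are made up for illustration.

```php
<?php
declare(strict_types=1);

// HYPOTHETICAL helper: counts, per user agent, 404 responses to requests whose
// target does not start with "/". Expects Apache "combined" format lines:
// %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
function suspicious_404_agents(iterable $lines): array
{
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    $counts = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines in some other format
        }
        [, $target, $status, $agent] = $m;
        if ($status === '404' && !str_starts_with($target, '/')) {
            $counts[$agent] = ($counts[$agent] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent offenders first
    return $counts;
}

// Made-up sample lines (documentation IP ranges).
$log = [
    '203.0.113.5 - - [01/May/2014:10:00:00 -0400] "GET http://raamdev.com/2014/x/ HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [01/May/2014:10:00:05 -0400] "GET /2014/x/ HTTP/1.1" 200 9000 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/May/2014:10:01:00 -0400] "GET /missing/ HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
];
print_r(suspicious_404_agents($log));
```

On a real server you would feed it the lines of your access log, e.g. from `file('/path/to/access.log', FILE_IGNORE_NEW_LINES)`, where the path is a placeholder.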
Thanks for explaining that and for the sample code. That really helped clarify a few things for me. :) I tested that script and it does exactly as you said; it recreates the
Got it. We'll just offer an I think it will also be a good idea to explain that, with 404 caching enabled, any invalid request will result in the cache file symlink being created, just so that there's no confusion about why there are cache directories for seemingly invalid hosts. In fact, I can probably turn a lot of what we've talked about here into a wiki article and reference that right from the inline docs.
Punting this to the Future Release milestone.
Those are the result of a slightly misconfigured web server. Quick Cache uses the PHP Where do these requests come from? Well, a search engine bot that scans large numbers of sites could itself be misconfigured and make bad requests, which Quick Cache picks up and attempts to cache. The best way around this issue is to create an
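The idea behind enforcing a single host name can also be sketched at the PHP level. This is a hypothetical illustration, not the `.htaccess` rules the plugin actually ships: `CANONICAL_HOST` is a placeholder (`example.com`) and `host_redirect_target()` is an illustrative name. Any request whose `Host` header is not an exact match gets a 301 to the canonical host, so no page (and no cache directory) is generated for it.

```php
<?php
declare(strict_types=1);

// HYPOTHETICAL PHP-level host enforcement, for illustration only.
const CANONICAL_HOST = 'example.com'; // placeholder: use your own host name

// Returns the URL to redirect to, or null when the host already matches exactly.
function host_redirect_target(string $host, string $requestUri, bool $https): ?string
{
    if ($host === CANONICAL_HOST) {
        return null;
    }
    $scheme = $https ? 'https' : 'http';
    // Keep the original path only when it is an ordinary path; otherwise use "/".
    $path = str_starts_with($requestUri, '/') ? $requestUri : '/';
    return $scheme . '://' . CANONICAL_HOST . $path;
}

// Only meaningful under a web server; skipped when run from the command line.
if (PHP_SAPI !== 'cli') {
    $target = host_redirect_target(
        $_SERVER['HTTP_HOST'] ?? '',
        $_SERVER['REQUEST_URI'] ?? '/',
        !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off'
    );
    if ($target !== null) {
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

One design caveat: `HTTP_HOST` can include a port (e.g. `example.com:8080`), which this exact-match sketch would treat as a mismatch and redirect.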
Ever since I installed Zen Cache at http://alcohol-abuse-and-addictions-agency.co.uk, the whole site is down unless the .htaccess file is absent from the File Manager. Even with the .htaccess deleted and the site up, none of the page or post links work. Permalinks are set to %postnames% at the bottom of the list. The .htaccess regularly reappears, and when it does, the site goes down. I cannot just delete Zen Cache, as I don't want things to get messed up further. I have an AWS account with a distribution set up correctly as per your excellent video, with a CNAME in cPanel for the CDN, etc. I used the Host Enforcer rules (`BEGIN Host Enforcer`, `RewriteEngine on`, `RewriteBase /`, through `END Host Enforcer`), replacing `example` with alcohol-abuse-and-addictions-agency.co.uk and being careful that it was exactly the same bar the href, but it didn't work, and the only way I could get the site to show again was by deleting the .htaccess file again completely. Still, no links work, but AWS CloudFront is very fast at rendering links that don't work.
@sallyfarmer It sounds like you may have an error in your
@raamdev I'm just noting that this is another candidate for our new
- **New Feature:** Comet Cache can now be configured to automatically clear the cache for date-based archive views whenever any single post is cleared due to changes in content, title, etc. See: **Dashboard → Comet Cache → Plugin Options → Automatic Cache Clearing → Auto-Clear "Date-Based Archives" Too?**. See also: [Issue #724](#724).
- **New Pro Feature:** Apache Optimizations now include a new option that allows site owners to enforce an exact host name for all requests. See: **Dashboard → Comet Cache Pro → Plugin Options → Apache Optimizations → Enforce an Exact Host Name?**. See also: [Issue #101](#101).
- **Bug Fix:** Apache detection was sometimes inaccurate. Instead of using the default WP core globals for server detection, Comet Cache now uses its own set of Apache/Nginx/IIS detection functions. This release also enhances the Apache and Nginx detection routines, making them smart enough to catch additional edge cases and further reducing the likelihood of a false positive. See [Issue #748](#748).
- **Bug Fix:** Some XML-RPC and REST API requests were being cached inadvertently. See [Issue #855](#855).
- **Bug Fix:** Broken textarea field due to `white-space:nowrap` in Firefox. See [Issue #866](#866).
- **Bug Fix:** This release resolves empty directories being left in the cache folder in some scenarios. See [Thread #866](https://forums.wpsharks.com/t/cache-folders-not-removed-during-clean-up-process/866).
- **Bug Fix** (Pro): Some REST requests were being redirected incorrectly whenever Apache Optimizations were enabled. See [Issue #855](#855).
- **Compatibility Bug Fix:** Some Jetpack API calls were being cached inadvertently. See [Issue #855](#855).
- **Enhancement:** Notes in the HTML source now indicate fully functional on first load for improved clarity. See [Issue #860](#860).
- **Code Cleanup:** Enhanced security by removing `basename(__FILE__)` from direct access notices.
Tested on a site using NGINX. Also tried adding the following manually for sites that use NGINX:
Was unable to continue testing, as there were problems with server detection; please see comment here.
Confirmed Working
Tested Using: WordPress Version: 4.7.2
Tested using different incorrect web addresses/made-up subdomains, such as
No erroneous cache directories:
Comet Cache v170220 has been released and includes changes from this GitHub Issue. See the v170220 announcement for further details. This issue will now be locked to further updates. If you have something to add related to this GitHub Issue, please open a new GitHub Issue and reference this one (#101).
During my testing of the new branched cache structure on a live site, I found that after several days my cache had the following directories:
Some of these should not exist, notably the uppercase `RAAMDEV-COM` and `raamdev-comhttp`.