
Enforce stable collectstatic/staticfiles.json output #277

Open
PeterJCLaw opened this issue Mar 1, 2021 · 2 comments

Comments

@PeterJCLaw
Contributor

Thanks for this project :)

I've been exploring ways to reduce deployment times, in particular by not re-deploying static assets if they haven't changed. As a lot of things go into our builds, my approach has been to compare the build outputs against those of the previous build. This ensures that if the build process itself changes, the change detection stays correct (without anyone needing to remember to update it).
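A minimal sketch of that output-comparison approach, assuming the collectstatic output lives in a single directory and that the digest from the previous build is stored somewhere between deploys. tree_digest and static_assets_changed are hypothetical helpers, not part of Django or Whitenoise. A check like this only works if identical inputs produce byte-identical outputs, which is exactly what this issue is about.

```python
import hashlib
import os
from typing import Optional


def tree_digest(root: str) -> str:
    """Return a SHA-256 digest of every relative path and file content under root.

    Directories and files are visited in sorted order, so the digest depends
    only on what is in the tree, not on the order the filesystem lists it.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk descend in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # NUL-terminate the path so path/content boundaries are unambiguous.
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as fh:
                digest.update(hashlib.sha256(fh.read()).digest())
    return digest.hexdigest()


def static_assets_changed(root: str, previous_digest: Optional[str]) -> bool:
    """True if there is no recorded digest or the tree no longer matches it."""
    return previous_digest is None or tree_digest(root) != previous_digest
```

Hashing each file's content to a fixed-length digest after its NUL-terminated path keeps the combined digest unambiguous without holding the whole tree in memory at once.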

Unfortunately, while Django itself seems to keep the output of collectstatic stable, I believe that using the CompressedManifestStaticFilesStorage storage causes it to vary.
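For reference, this is roughly how the storage backend in question is enabled in a Django settings module, following the Whitenoise documentation. Which form applies depends on the Django version: the STORAGES setting was added in Django 4.2 and replaces STATICFILES_STORAGE.

```python
# settings.py (fragment)

# Django 4.2 and later. Overriding STORAGES replaces Django's default,
# so the "default" entry has to be listed explicitly as well.
STORAGES = {
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}

# Older Django versions (current when this issue was filed) use the
# single-string setting instead:
# STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
```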

I believe the cause is in the post_process_with_compression method, where the determination of the compressed files is not order-preserving.

I think the fix could be as simple as adding sorted to the call to self.compress_files, but I'm not sure if there's any reason not to do that.
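A small illustration of why iterating a set is not order-preserving and how sorted() fixes it. process_in_stable_order is a hypothetical stand-in for the code that feeds file paths to self.compress_files, not Whitenoise's actual implementation. For str elements, set iteration order can change between interpreter runs because string hashing is randomized (see PYTHONHASHSEED), whereas sorted() always yields the same sequence for the same inputs.

```python
from typing import Iterable, List


def process_in_stable_order(paths: Iterable[str]) -> List[str]:
    """Hypothetical stand-in: return the order in which files would be compressed.

    Collecting paths into a set removes duplicates but gives no ordering
    guarantee; sorting the set yields the same order on every run.
    """
    unique_paths = set(paths)
    return sorted(unique_paths)


# The same inputs, presented in different orders (and with a duplicate),
# always come out in one canonical order.
first = process_in_stable_order(["js/app.js", "css/site.css", "img/logo.svg"])
second = process_in_stable_order(
    ["img/logo.svg", "js/app.js", "css/site.css", "js/app.js"]
)
```

Sorting costs O(n log n) over the collected paths, which should be negligible next to the compression work itself.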

Would you be interested in a PR to fix this?

@evansd
Owner

evansd commented Mar 2, 2021

Thanks, yes, I'd definitely welcome a PR which made things more deterministic. (Assuming it didn't add loads of complexity, which I can't imagine it would.)

I've already hit one issue with gzip embedding timestamps, which led to non-deterministic builds, but was able to work around it:

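```python
import gzip
from io import BytesIO
data = b"example file contents"  # illustrative input, not from the original snippet
output = BytesIO()
# Explicitly set mtime to 0 so gzip content is fully determined
# by file content (0 = "no timestamp" according to gzip spec)
with gzip.GzipFile(
    filename="", mode="wb", fileobj=output, compresslevel=9, mtime=0
) as gz_file:
    gz_file.write(data)
```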
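To see why mtime=0 matters: the gzip header stores a 4-byte modification time (bytes 4 to 7), and if mtime is omitted or None, GzipFile writes the current time there, so compressing identical content at different moments yields different bytes. The sketch below wraps the workaround in a hypothetical compress_gzip helper that takes the timestamp as a parameter, so the outputs can be compared directly.

```python
import gzip
from io import BytesIO


def compress_gzip(data: bytes, mtime: int = 0) -> bytes:
    """Hypothetical helper: gzip `data` with a fixed header timestamp."""
    output = BytesIO()
    with gzip.GzipFile(
        filename="", mode="wb", fileobj=output, compresslevel=9, mtime=mtime
    ) as gz_file:
        gz_file.write(data)
    return output.getvalue()


data = b"body { color: red; }\n" * 100

# With mtime=0 the output depends only on the input bytes.
a = compress_gzip(data)
b = compress_gzip(data)

# An arbitrary non-zero timestamp changes the compressed bytes, even though
# both decompress to the same content. Leaving mtime unset behaves like this
# too, except the timestamp is the current time.
c = compress_gzip(data, mtime=1_614_556_800)
```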

As you say, it probably just needs a sorted somewhere.

@PeterJCLaw
Contributor Author

Hrm, now that I'm trying to fix this, I think I may have been wrong about the underlying issue. While Whitenoise does use sets in one place (which I'd thought was the cause of what I was seeing), I now can't see how data from them ends up in the manifest file, if it does at all. When stepping through, I can only see the manifest file being saved before that logic runs, and adding a random.shuffle at the place I suspected doesn't seem to break the test I'd put together for this.
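The kind of test described here can be sketched as follows: build the output from the same inputs presented in shuffled orders and check that the serialized result never changes. build_manifest is a hypothetical stand-in for whatever code writes staticfiles.json, and the simplified manifest layout ("paths" mapping plus "version") is an assumption for illustration.

```python
import json
import random
from typing import Dict, List, Tuple


def build_manifest(entries: List[Tuple[str, str]]) -> str:
    """Hypothetical stand-in for the code that writes staticfiles.json.

    `entries` pairs original paths with hashed names. Sorting before
    serializing makes the output independent of the input order.
    """
    paths: Dict[str, str] = dict(sorted(entries))
    return json.dumps({"paths": paths, "version": "1.0"})


def check_order_independent(entries: List[Tuple[str, str]], trials: int = 20) -> bool:
    """Shuffle the inputs repeatedly and confirm the serialized manifest never changes."""
    expected = build_manifest(entries)
    rng = random.Random(0)  # seeded so any failure is reproducible
    for _ in range(trials):
        shuffled = entries[:]
        rng.shuffle(shuffled)
        if build_manifest(shuffled) != expected:
            return False
    return True
```

If the shuffle is injected at a point whose data never reaches the serialized file, a test like this passes regardless, which matches what is observed above.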

I have, however, spotted a place in Django where a plain dict is passed to json.dumps, which could definitely be a cause of non-reproducibility.
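Since Python 3.7, dicts preserve insertion order, so json.dumps output for a plain dict is only as stable as the order in which its keys were inserted. If that order comes from something unordered (a set, or filesystem listing order), the serialized file can differ between runs with identical content. Passing sort_keys=True to json.dumps is one way to make the output canonical. The dictionaries below are illustrative, not Django's actual manifest data.

```python
import json

# Same mapping, built by inserting keys in two different orders.
built_one_way = {"js/app.js": "js/app.def456.js", "css/site.css": "css/site.abc123.css"}
built_other_way = {"css/site.css": "css/site.abc123.css", "js/app.js": "js/app.def456.js"}

# The dicts compare equal, but default serialization follows insertion order...
default_one = json.dumps(built_one_way)
default_two = json.dumps(built_other_way)

# ...while sort_keys=True produces one canonical string for both.
sorted_one = json.dumps(built_one_way, sort_keys=True)
sorted_two = json.dumps(built_other_way, sort_keys=True)
```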

I'll re-examine the original case I had at some point.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants