Add LRU eviction mechanisms for streamed file chunks #11304
Looks like my tests are assuming certain things about the file system (and I forgot to update the tests for the small tweak to server.py).
… base process bus.
1b209f5
to
ba62b17
OK, hopefully I've fixed the test issues now - but I wouldn't bet against the GitHub Actions environment still doing something a little weird!
ba62b17
to
8ca431d
This also ignores the small diskcache that is used inside each ChunkedFile for the purposes of counting how much is used - should I include this instead?
Does this mean we'll always be undercounting by some handful of bytes because the diskcache isn't accounted for? Looks like diskcache has a mechanism for eviction that defaults to "least recently stored" as the method, but maybe we can just set it to LRU and not have to worry about it beyond that?
Yes - all we are using diskcache for is caching a few things about the file: the total file size and other relevant information we may have previously received in the HEAD request. We are also using it to coordinate download locks for chunks, so that multiple threads/processes can each gain an exclusive lock to download a specific chunk of a file (and hence prevent duplicate downloads). Importantly, these diskcache instances are created on a per-ChunkedFile basis, so the eviction mechanisms aren't terribly useful to us for these purposes. I think I should probably just count the diskcache file size into the total size, as it will give us a more accurate count of how much space we are freeing up by evicting the file.
Include diskcache directory in file size counts.
Have updated this to include the diskcache in the overall file size.
for dirpath, _, filenames in os.walk(chunked_file_dir):
    for file in filenames:
        file_path = os.path.join(dirpath, file)
Curious how this is accounting for the diskcache differently?
It's now doing an os.walk of the entire ChunkedFile dir (including the directory that diskcache is using) so it's enumerating all the files.
Previously, it was only listing the files in the top directory, which was all the chunk files, plus the directory used for diskcache (which it was ignoring because it was a directory).
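To make the difference concrete, here is a minimal sketch of the two counting approaches. The function names `top_level_size` and `recursive_size` are hypothetical and are not taken from the PR; only the `os.walk` loop mirrors the diff above.

```python
import os


def top_level_size(directory):
    """Sum only the regular files directly inside `directory`.

    Subdirectories (such as one used by diskcache) are skipped,
    which is the undercounting behaviour described above.
    """
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def recursive_size(directory):
    """Sum every file under `directory`, including nested directories."""
    total = 0
    for dirpath, _, filenames in os.walk(directory):
        for file in filenames:
            file_path = os.path.join(dirpath, file)
            total += os.path.getsize(file_path)
    return total
```

For a directory holding chunk files plus a nested cache directory, `recursive_size` returns the larger, more accurate figure, while `top_level_size` leaves the nested files out.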
I've checked the code and tested it and everything works correctly.
However, I think this implementation could be problematic on servers with a small disk (for example, Raspberry Pi servers using an SD card). The scheduled task to clean the cache might not run often enough to free the space cached for users remotely browsing Studio.
I think we should either provide an option to disable this caching or at least run the cleaning task more frequently. Given that the cache has no limit on how much disk space it can use between runs, a 24-hour interval could be too long in some cases.
Have updated now to run the task hourly.
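For readers following the thread, the kind of cleanup pass being scheduled can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's implementation: the names `scan` and `evict_least_recently_used` and the `max_bytes` parameter are hypothetical, and the sketch uses each file's `st_atime` as the recency signal, which some file systems do not update reliably.

```python
import os
import shutil


def scan(directory):
    """Return (total_bytes, latest_access_time) for all files under `directory`."""
    total = 0
    latest = 0.0
    for dirpath, _, filenames in os.walk(directory):
        for file in filenames:
            stat = os.stat(os.path.join(dirpath, file))
            total += stat.st_size
            latest = max(latest, stat.st_atime)
    return total, latest


def evict_least_recently_used(cache_root, max_bytes):
    """Delete subdirectories of `cache_root`, least recently accessed first,
    until the combined size is at most `max_bytes`.

    Assumption: each cached file lives in its own subdirectory.
    Returns the names of the removed subdirectories, in removal order.
    """
    entries = []
    for name in os.listdir(cache_root):
        path = os.path.join(cache_root, name)
        if os.path.isdir(path):
            size, accessed = scan(path)
            entries.append((accessed, size, path))

    total = sum(size for _, size, _ in entries)
    removed = []
    # Sorting the tuples orders entries by access time, oldest first.
    for _, size, path in sorted(entries):
        if total <= max_bytes:
            break
        shutil.rmtree(path)
        total -= size
        removed.append(os.path.basename(path))
    return removed
```

Between runs nothing bounds the cache, so the interval at which such a pass is scheduled determines how far usage can overshoot `max_bytes`, which is the concern raised above about small disks.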
Summary
References
Fixes #9389
Reviewer guidance
Any concerns about the new option? Should the default value be lower?
I haven't benchmarked the os.walk performance on a large installed system, so I am unsure how slow it would get; there may be room for optimization there.
This also ignores the small diskcache that is used inside each ChunkedFile for the purposes of counting how much is used - should I include this instead?