Update unarchive to use `tar` crate instead of `tokio_tar` #127
Closes #103
This PR updates the `unarchive` recipe to use the `tar` crate (actively maintained, but sync-only) instead of the `tokio-tar` crate (async, but apparently unmaintained). This was specifically done to address the bug seen in #103, where tarfiles with long names weren't getting handled properly.

I referenced #117 as inspiration, but ended up making a few different decisions in the implementation. Namely, the sync work happens in a `spawn_blocking` Tokio task, and uses a channel for the async parts (actually building the artifacts, which boil down to database writes). I also used `tokio_util::io::SyncIoBridge` so that we could keep doing async decompression (which I believe is sound and won't block the executor). This also has the advantage of not needing to buffer the tarfile into memory.

One disadvantage with this implementation is that the blob creation is all synchronous (there's a new `save_blob_from_reader_sync` function for this purpose). And, since there's no way to block to acquire a semaphore, we have to acquire a permit on the semaphore outside the `spawn_blocking` section, meaning we have to keep a semaphore permit for the entire duration that the archive is unpacked. I couldn't figure out a good way around this without either buffering stuff in memory or having a separate lock specifically for synchronous code.