[RFC] Add internal API for converting ZSTD_Sequence into seqStore #2715

Merged
merged 1 commit into from
Jun 24, 2021


senhuang42

Actually, it turns out this functionality already exists, so this PR just adds a small wrapper around it to provide a clean interface.
Currently, on silesia.tar with sequences generated by ZSTD_generateSequences() at compression level 3, ZSTD_convertBlockSequencesToSeqStore() runs at around 2100 MB/s averaged across all blocks. Most of this time seems to be spent in ZSTD_finalizeOffCode().

More importantly though, I'll specify a general set of guidelines for how hardware-accelerated matchfinders should be integrated into the library at a per-block level. Suggestions are welcome here.

Generally, a hardware-accelerated matchfinder must adhere to the function signature below, storing its result in an array of ZSTD_Sequence.

// Generic function signature for hardware matchfinders.
// Accepts a void* pointer for a "bag" of parameters that the matchfinder may use,
// possibly derived from the ZSTD_CCtx parameters.
//
// As an example, one could define a function
// size_t ZSTD_accelerated_findMatches(ZSTD_Sequence* sequences, size_t sequencesCapacity,
//                               void* params, const void* src, size_t srcSize);
//
// Returns number of sequences generated, storing the result in `sequences`, or a zstd error.
typedef size_t (*ZSTD_hardwareMatchFinder) 
     (ZSTD_Sequence* sequences, size_t sequencesCapacity, void* params,
      const void* src, size_t srcSize);
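To make the contract concrete, here is a minimal software stand-in that conforms to the same signature shape. It is only a sketch under stated assumptions: `Toy_Sequence` is a local mirror of the `ZSTD_Sequence` fields so the block compiles without zstd.h, `TOY_ERROR` is a placeholder for a real zstd error code, and `Toy_findRunMatches` is a hypothetical toy that only finds runs of a repeated byte (offset 1). It also assumes the convention that trailing literals are reported as a final sequence with `offset == 0` and `matchLength == 0`.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the fields of ZSTD_Sequence (assumption: lets the
 * sketch compile without zstd.h). */
typedef struct {
    unsigned int offset;
    unsigned int litLength;
    unsigned int matchLength;
    unsigned int rep;
} Toy_Sequence;

/* Placeholder for a zstd error code (hypothetical; not a real zstd value). */
#define TOY_ERROR ((size_t)-1)
#define TOY_MIN_MATCH 4

/* Same shape as the proposed ZSTD_hardwareMatchFinder typedef. */
typedef size_t (*Toy_hardwareMatchFinder)
     (Toy_Sequence* sequences, size_t sequencesCapacity, void* params,
      const void* src, size_t srcSize);

/* Toy matchfinder: emits an offset-1 match for every run of a repeated byte
 * that is at least TOY_MIN_MATCH bytes long, and a final literals-only
 * sequence for whatever remains. Returns the number of sequences written,
 * or TOY_ERROR if `sequences` is too small. */
static size_t Toy_findRunMatches(Toy_Sequence* sequences, size_t sequencesCapacity,
                                 void* params, const void* src, size_t srcSize)
{
    const unsigned char* const in = (const unsigned char*)src;
    size_t nbSeqs = 0;
    size_t anchor = 0;   /* start of the literals not yet emitted */
    size_t pos = 1;      /* candidate match start; compares against pos-1 */
    (void)params;        /* unused in this toy */
    assert(sequences != NULL || sequencesCapacity == 0);

    while (pos < srcSize) {
        size_t len = 0;
        while (pos + len < srcSize && in[pos + len] == in[pos + len - 1]) len++;
        if (len >= TOY_MIN_MATCH) {
            if (nbSeqs == sequencesCapacity) return TOY_ERROR;
            sequences[nbSeqs].offset = 1;
            sequences[nbSeqs].litLength = (unsigned int)(pos - anchor);
            sequences[nbSeqs].matchLength = (unsigned int)len;
            sequences[nbSeqs].rep = 0;
            nbSeqs++;
            anchor = pos + len;
        }
        pos += len + 1;  /* the byte at pos+len broke the run, so skip past it */
    }

    if (anchor < srcSize) {  /* trailing literals */
        if (nbSeqs == sequencesCapacity) return TOY_ERROR;
        sequences[nbSeqs].offset = 0;
        sequences[nbSeqs].litLength = (unsigned int)(srcSize - anchor);
        sequences[nbSeqs].matchLength = 0;
        sequences[nbSeqs].rep = 0;
        nbSeqs++;
    }
    return nbSeqs;
}
```

A real accelerator would replace the body with a call into the device driver; the point is only that the caller sees a filled `sequences` array and a count whose literal and match lengths add up to `srcSize`.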

The reasoning is that, down the line, we can define the following function, which could select between multiple accelerated matchfinders:

// This function selects the final hardware match finder used, depending on the
// parameters in the ZSTD_CCtx. 
//
// With the example above, ZSTD_selectHardwareMatchFinder() would return ZSTD_accelerated_findMatches.
ZSTD_hardwareMatchFinder ZSTD_selectHardwareMatchFinder(const ZSTD_CCtx* zc);
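One way such a selector could look is a plain switch over a backend identifier stored in the context. This is a sketch, not the library's implementation: `Demo_CCtx`, `Demo_hwBackend`, and the two `Demo_device*_findMatches` stubs are all hypothetical names invented for illustration, and `Demo_Sequence` again mirrors the `ZSTD_Sequence` fields locally.

```c
#include <stddef.h>

/* Local stand-ins (hypothetical) so the sketch compiles without zstd.h. */
typedef struct {
    unsigned int offset, litLength, matchLength, rep;
} Demo_Sequence;

typedef size_t (*Demo_hardwareMatchFinder)
     (Demo_Sequence* sequences, size_t sequencesCapacity, void* params,
      const void* src, size_t srcSize);

/* Hypothetical backend identifier, assumed to be decided once during
 * parameter finalization and stored in the context. */
typedef enum { Demo_hw_none = 0, Demo_hw_deviceA, Demo_hw_deviceB } Demo_hwBackend;

typedef struct {
    Demo_hwBackend backend;
} Demo_CCtx;

/* Stub matchfinders standing in for real device bindings. */
static size_t Demo_deviceA_findMatches(Demo_Sequence* sequences, size_t sequencesCapacity,
                                       void* params, const void* src, size_t srcSize)
{
    (void)sequences; (void)sequencesCapacity; (void)params; (void)src; (void)srcSize;
    return 0;
}

static size_t Demo_deviceB_findMatches(Demo_Sequence* sequences, size_t sequencesCapacity,
                                       void* params, const void* src, size_t srcSize)
{
    (void)sequences; (void)sequencesCapacity; (void)params; (void)src; (void)srcSize;
    return 0;
}

/* Returns the matchfinder for the configured backend, or NULL when no
 * accelerator is selected so the caller can fall back to software. */
static Demo_hardwareMatchFinder Demo_selectHardwareMatchFinder(const Demo_CCtx* zc)
{
    switch (zc->backend) {
    case Demo_hw_deviceA: return Demo_deviceA_findMatches;
    case Demo_hw_deviceB: return Demo_deviceB_findMatches;
    case Demo_hw_none:
    default:              return NULL;
    }
}
```

Returning NULL for the "no accelerator" case is a design choice of this sketch; the proposal instead gates the call behind a separate ZSTD_useHardwareAccelerator() check.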

And finally, the code could be integrated along these lines in ZSTD_compressBlock_internal(). Of course, a first implementation can hard-code a lot of these dynamic decisions for the purposes of testing.

static size_t ZSTD_compressBlock_internal(ZSTD_CCtx* zc,
                                        void* dst, size_t dstCapacity,
                                        const void* src, size_t srcSize, U32 frame)
{
    /* This is the upper bound for the length of an rle block.
     * This isn't the actual upper bound. Finding the real threshold
     * needs further investigation.
     */
    const U32 rleMaxLength = 25;
    size_t cSize;
    const BYTE* ip = (const BYTE*)src;
    BYTE* op = (BYTE*)dst;
    DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)",
                (unsigned)dstCapacity, (unsigned)zc->blockState.matchState.window.dictLimit,
                (unsigned)zc->blockState.matchState.nextToUpdate);
                
    // HARDWARE ACCELERATED MATCHFINDING PATH HERE
    // ZSTD_useHardwareAccelerator() is a hypothetical function that determines
    // whether we use a hardware-accelerated approach for matchfinder, depending
    // on factors such as compression parameters and whatnot. The decision to use a hardware accelerator
    // could be predetermined/finalized during parameter initialization, and stored as a variable in the cctx.
    if (ZSTD_useHardwareAccelerator(zc)) {
        // Now, select a hardware matchfinder, based on parameters in ZSTD_CCtx
        ZSTD_hardwareMatchFinder matchFinder = ZSTD_selectHardwareMatchFinder(zc);
        
        // Reset the existing seqStore
        ZSTD_resetSeqStore(&zc->seqStore);

        // Function pointer that delegates to the accelerated matchfinder to generate sequences.
        // `params` can be a custom struct of all required parameters for the particular matchfinder
        // `zc->hardwareSequences` is presumed already allocated and `zc->hardwareSequencesCapacity` is
        //  already determined, likely when the decision to use hardware-accelerated match-finding
        //  is made during parameter finalization.
        size_t const nbSeqs = matchFinder(zc->hardwareSequences, zc->hardwareSequencesCapacity, &params, src, srcSize);
        // The matchfinder returns either a sequence count or a zstd error, so check it before use.
        FORWARD_IF_ERROR(nbSeqs, "hardware matchfinder failed");
        
        // Generated sequences passed to new API, which gives us our final `zc->seqStore`
        FORWARD_IF_ERROR(ZSTD_convertBlockSequencesToSeqStore(...), "");
    } else {
        const size_t bss = ZSTD_buildSeqStore(zc, src, srcSize);
        FORWARD_IF_ERROR(bss, "ZSTD_buildSeqStore failed");
        if (bss == ZSTDbss_noCompress) { cSize = 0; goto out; }
    }
    ...

@senhuang42 senhuang42 merged commit 45d707e into facebook:dev Jun 24, 2021