
Performance Issue in 64-bit Builds Compared to 32-bit (Filter Decision Suspected) #274

Open
yesilcimenahmet opened this issue Dec 22, 2024 · 3 comments
Labels
bug Something isn't working

@yesilcimenahmet

Description:
There is a significant performance difference between 32-bit and 64-bit builds when using libspng. After detailed analysis and discussion in the zlib-ng repository, we observed that the 64-bit build performs considerably worse than the 32-bit build.

Observed Behavior:
In our tests, encoding a PNG takes ~131 ms with the 32-bit build, while the same image takes ~400 ms with the 64-bit build.
Both tests were conducted on the same machine with the same configuration, using libspng and zlib-ng.
Detailed Analysis:
Filter Decision as the Cause:

Based on profiling, it appears the 64-bit build takes significantly longer because of differences in the filter decision logic.
The 32-bit build appears to take a more optimized path (SIMD-vectorized loops), while the 64-bit build relies on scalar operations.
Forcing the filter choice to SPNG_FILTER_CHOICE_NONE resolves the performance issue and brings the 64-bit performance in line with the 32-bit results. However, this is a manual workaround.
Filter Logic Differences:

libspng dynamically selects filters during the encoding process.
It seems the heuristic for choosing filters differs between 32-bit and 64-bit builds, potentially due to underlying differences in how zlib-ng operates in these environments.
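For reference, the PNG specification suggests a simple heuristic for choosing a filter per scanline: filter the row with all five filter types, read each output byte as a signed value, and keep the filter with the smallest sum of absolute values. The sketch below implements that heuristic in plain C++. It is not libspng's code; `FilterType`, `apply_filter` and `choose_filter` are hypothetical names, and libspng's actual selection logic may differ in detail. Because the sketch uses only integer arithmetic on the pixel bytes, the filter it picks for a given scanline does not depend on whether the build is 32-bit or 64-bit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// The five PNG filter types, numbered as in the PNG specification.
enum FilterType : uint8_t { None = 0, Sub = 1, Up = 2, Average = 3, Paeth = 4 };

// Paeth predictor as defined by the PNG specification.
static int paeth(int a, int b, int c) {
    int p = a + b - c;
    int pa = std::abs(p - a);
    int pb = std::abs(p - b);
    int pc = std::abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

// Filters one scanline. `prev` is the previous *unfiltered* scanline (all zeros
// for the first row); `bpp` is bytes per complete pixel (4 for 8-bit RGBA).
static void apply_filter(FilterType f, const uint8_t* row, const uint8_t* prev,
                         std::size_t len, std::size_t bpp, uint8_t* out) {
    for (std::size_t i = 0; i < len; ++i) {
        int a = i >= bpp ? row[i - bpp] : 0;   // byte to the left
        int b = prev[i];                       // byte above
        int c = i >= bpp ? prev[i - bpp] : 0;  // byte above-left
        int pred = 0;
        switch (f) {
            case None:    pred = 0; break;
            case Sub:     pred = a; break;
            case Up:      pred = b; break;
            case Average: pred = (a + b) / 2; break;
            case Paeth:   pred = paeth(a, b, c); break;
        }
        out[i] = static_cast<uint8_t>(row[i] - pred);  // arithmetic modulo 256
    }
}

// Tries every filter and returns the one whose output, read as signed bytes,
// has the smallest sum of absolute values. Ties go to the lower filter number.
static FilterType choose_filter(const uint8_t* row, const uint8_t* prev,
                                std::size_t len, std::size_t bpp) {
    std::vector<uint8_t> tmp(len);
    FilterType best = None;
    uint64_t best_sum = UINT64_MAX;
    for (int f = None; f <= Paeth; ++f) {
        apply_filter(static_cast<FilterType>(f), row, prev, len, bpp, tmp.data());
        uint64_t sum = 0;
        for (uint8_t v : tmp)
            sum += static_cast<uint64_t>(std::abs(static_cast<int>(static_cast<int8_t>(v))));
        if (sum < best_sum) {
            best_sum = sum;
            best = static_cast<FilterType>(f);
        }
    }
    return best;
}
```

For the first scanline pass a zero-filled `prev`; for 8-bit RGBA, `bpp` is 4. Running five full filter passes per row is one reason the cost of this step depends so much on whether the compiler vectorizes the inner loops.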
zlib-ng Findings:

The analysis in the zlib-ng repository suggested that the 64-bit build may behave suboptimally in encode_scanline.
Scalar loops dominate the 64-bit profile, while the 32-bit build makes effective use of SIMD-vectorized loops.

Steps to Reproduce:
Use the provided C++ example to encode a raw image into a PNG with libspng.
Compare the encoding times between 32-bit and 64-bit builds.
Optionally, set the filter choice manually to SPNG_FILTER_CHOICE_NONE to see how it affects 64-bit performance:

```cpp
resultCode = spng_set_option(ctx, SPNG_FILTER_CHOICE, SPNG_FILTER_CHOICE_NONE);
```
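A single `steady_clock` measurement, as in the full example below, can be skewed by cold caches or scheduling noise. When comparing the 32-bit and 64-bit builds, a helper that runs the work several times and reports the median may give steadier numbers. This is a sketch; `median_ms` is a hypothetical helper, not part of libspng.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock time in
// milliseconds (0.0 if `runs` is 0). The median is less sensitive than a
// single run to a cold first iteration or a one-off scheduling hiccup.
double median_ms(const std::function<void()>& work, std::size_t runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

With the names from the full example, a call would look like `median_ms([&] { EncodeRawImageToPNG(raw_fname, out_fname, w, h, 300); }, 10)`. Like the original measurement, this still includes file I/O in the timed region.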

Expected Behavior:
Both 32-bit and 64-bit builds should perform similarly, with comparable encoding times and efficient use of filters.

Links to Related Issues:
zlib-ng Performance Analysis

Request:
Could you investigate the filter decision logic in libspng? Specifically:

Why the 64-bit build seems to perform worse in selecting filters.
Whether this is related to differences in how zlib-ng interacts with libspng in 32-bit vs. 64-bit environments.
How the default filter heuristic could be improved for 64-bit builds to align with 32-bit behavior.

Full Example:

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

extern "C" {
#include <spng.h>  // adjust the include path to your libspng install
}

// Write callback for spng_set_png_stream(): forwards encoded bytes to an std::ofstream.
int WritePNGCallback(spng_ctx *ctx, void *user, void *src, size_t length)
{
    (void)ctx;
    std::ofstream* out = static_cast<std::ofstream*>(user);
    if(!out->write(static_cast<const char*>(src), static_cast<std::streamsize>(length)))
    {
        return SPNG_IO_ERROR;
    }
    return SPNG_OK;
}

void EncodeRawImageToPNG(const std::string& RawFileName,
                         const std::string& PngFileName,
                         uint32_t Width,
                         uint32_t Height,
                         int DPI)
{
    spng_ctx* ctx = nullptr;
    spng_ihdr ihdr;

    // Read the whole raw RGBA file into memory.
    std::ifstream rawFile(RawFileName, std::ios::binary);
    if(!rawFile.is_open()) throw std::runtime_error("Failed to open raw file.");
    rawFile.seekg(0, std::ios::end);
    std::streamoff fileSize = rawFile.tellg();
    if(fileSize < 0) throw std::runtime_error("Failed to determine raw file size.");
    rawFile.seekg(0, std::ios::beg);

    std::vector<unsigned char> rawBuffer(static_cast<size_t>(fileSize));
    if(!rawFile.read(reinterpret_cast<char*>(rawBuffer.data()), fileSize))
        throw std::runtime_error("Failed to read raw file into memory.");
    rawFile.close();

    std::ofstream pngFile(PngFileName, std::ios::binary);
    if(!pngFile.is_open()) throw std::runtime_error("Failed to create/open PNG output file.");

    ctx = spng_ctx_new(SPNG_CTX_ENCODER);
    if(ctx == nullptr) throw std::runtime_error("Failed to create spng context.");

    try
    {
        // 8-bit RGBA, non-interlaced.
        std::memset(&ihdr, 0, sizeof(ihdr));
        ihdr.width = Width;
        ihdr.height = Height;
        ihdr.bit_depth = 8;
        ihdr.color_type = SPNG_COLOR_TYPE_TRUECOLOR_ALPHA;
        ihdr.compression_method = 0;
        ihdr.filter_method = 0;
        ihdr.interlace_method = SPNG_INTERLACE_NONE;

        int resultCode = spng_set_ihdr(ctx, &ihdr);
        if(resultCode != SPNG_OK)
            throw std::runtime_error(std::string("Failed to set IHDR: ") + spng_strerror(resultCode));

        resultCode = spng_set_option(ctx, SPNG_IMG_COMPRESSION_LEVEL, 1);
        if(resultCode != SPNG_OK)
            throw std::runtime_error("Failed to set compression level to 1.");

        // Workaround for the 64-bit slowdown: uncomment to disable filtering.
        //resultCode = spng_set_option(ctx, SPNG_FILTER_CHOICE, SPNG_FILTER_CHOICE_NONE);
        //if(resultCode != SPNG_OK)
        //    throw std::runtime_error("Failed to set filter choice.");

        resultCode = spng_set_png_stream(ctx, WritePNGCallback, &pngFile);
        if(resultCode != SPNG_OK)
            throw std::runtime_error("Failed to set PNG stream callback.");

        // pHYs: convert DPI to pixels per metre (unit_specifier 1 = metre).
        double ppm = static_cast<double>(DPI) * 39.37;
        int ippm = static_cast<int>(std::round(ppm));
        spng_phys phys;
        std::memset(&phys, 0, sizeof(phys));
        phys.ppu_x = ippm;
        phys.ppu_y = ippm;
        phys.unit_specifier = 1;
        resultCode = spng_set_phys(ctx, &phys);
        if(resultCode != SPNG_OK)
            throw std::runtime_error("Failed to set pHYs chunk.");

        // 4 bytes per pixel for 8-bit RGBA.
        size_t imageSize = static_cast<size_t>(Width) * static_cast<size_t>(Height) * 4;
        if(rawBuffer.size() < imageSize)
            throw std::runtime_error("RAW buffer is smaller than the expected image size.");

        resultCode = spng_encode_image(ctx, rawBuffer.data(), imageSize, SPNG_FMT_RAW, SPNG_ENCODE_FINALIZE);
        if(resultCode != SPNG_OK)
            throw std::runtime_error(std::string("Failed to encode image: ") + spng_strerror(resultCode));
    }
    catch(...)
    {
        spng_ctx_free(ctx);
        pngFile.close();
        throw;
    }

    spng_ctx_free(ctx);
    pngFile.close();
}

int main(int argc, char *argv[])
{
    if(argc < 3)
    {
        std::fprintf(stderr, "usage: %s <input.raw> <output.png>\n", argv[0]);
        return 1;
    }
    // 2480 x 3508 pixels is an A4 page at 300 DPI.
    const uint32_t w = 2480;
    const uint32_t h = 3508;
    const std::string raw_fname(argv[1]);
    const std::string out_fname(argv[2]);
    try
    {
        auto t0 = std::chrono::steady_clock::now();
        EncodeRawImageToPNG(raw_fname, out_fname, w, h, 300);
        auto t1 = std::chrono::steady_clock::now();
        double total_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("image encode took %f ms\n", total_ms);
    }
    catch(const std::exception& e)
    {
        std::fprintf(stderr, "error: %s\n", e.what());
        return 1;
    }
    return 0;
}
```

Raw RGBA image
raw-rgba.zip
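The example derives the pHYs density and the expected buffer size with two small calculations: pixels per metre = DPI x 39.37, and width x height x 4 bytes for 8-bit RGBA. Shown in isolation as a sketch, using the exact 0.0254 m per inch (`dpi_to_ppm` and `rgba8_size` are hypothetical helper names):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// pHYs with unit_specifier == 1 stores pixels per metre.
// One inch is exactly 0.0254 m, so pixels/metre = DPI / 0.0254, rounded.
uint32_t dpi_to_ppm(double dpi) {
    return static_cast<uint32_t>(std::lround(dpi / 0.0254));
}

// Expected size in bytes of an 8-bit RGBA image (4 bytes per pixel).
// Widening to size_t before multiplying avoids 32-bit overflow.
std::size_t rgba8_size(uint32_t width, uint32_t height) {
    return static_cast<std::size_t>(width) * height * 4;
}
```

For the 2480 x 3508 image at 300 DPI this gives 11811 pixels per metre, the same value the 39.37 factor produces, and an expected buffer of 34,799,360 bytes.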

@randy408
Owner

> Why the 64-bit build seems to perform worse in selecting filters.

Filtering performance for encode depends entirely on compiler optimizations at this point; there is no SIMD code used there. Filtering behavior should be identical on 32-bit and 64-bit, i.e. zlib-ng gets the same data in both cases.

If you suspect it's actually selecting different filters, you can check that by printing the selected filter here: https://github.com/randy408/libspng/blob/v0.7.4/spng/spng.c#L4594
Alternatively, decode the encoded PNG progressively; spng_get_row_info() gives you the filter used for each scanline.
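Once the filter type of each scanline has been collected (for example from a debug print at the line linked above, or from the filter that spng_get_row_info() reports during a progressive decode), a small tally makes it easy to diff the 32-bit and 64-bit results. This is a sketch with hypothetical helper names; the collection step itself is assumed and not shown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Counts how many scanlines used each PNG filter type (0 = None ... 4 = Paeth).
// `filters` holds one filter type per scanline; values outside 0..4 are ignored.
std::array<std::size_t, 5> filter_histogram(const std::vector<uint8_t>& filters) {
    std::array<std::size_t, 5> counts{};
    for (uint8_t f : filters)
        if (f < counts.size()) ++counts[f];
    return counts;
}

// Formats the tally on one line so the output of two builds can be diffed.
std::string format_histogram(const std::array<std::size_t, 5>& counts) {
    static const char* const names[5] = {"None", "Sub", "Up", "Average", "Paeth"};
    std::ostringstream os;
    for (std::size_t i = 0; i < counts.size(); ++i)
        os << (i ? " " : "") << names[i] << '=' << counts[i];
    return os.str();
}
```

Identical lines from both builds would support the view that the filter choice is the same and only its speed differs.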

You could also try something different with the example code: set SPNG_IMG_COMPRESSION_LEVEL to 0 to minimize potential zlib-ng weirdness, and set SPNG_FILTER_CHOICE to SPNG_FILTER_CHOICE_ALL (otherwise filtering is disabled automatically when the compression level is 0). If the slowdown is similar on 64-bit, then it's probably not zlib-ng or some complex interaction between the two.

I think it comes down to the compiler not vectorizing code for the 64-bit build, which is a known issue.
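To illustrate the auto-vectorization point: on the encode side, the Sub and Up filters compute each output byte from the input rows only, so their loops carry no dependency between iterations and are typical candidates for a compiler's auto-vectorizer. The sketch below shows them as plain loops; `filter_sub` and `filter_up` are hypothetical names, not libspng functions. Whether a given compiler actually vectorizes them depends on the compiler, optimization flags and target, so it is worth checking the vectorization report for each build, e.g. `-fopt-info-vec` with GCC or `-Rpass=loop-vectorize` with Clang.

```cpp
#include <cstddef>
#include <cstdint>

// Encode-side Sub filter: out[i] = row[i] - row[i - bpp], where bytes left of
// the first pixel count as 0. Each output reads only the input row, so the
// main loop has no loop-carried dependency.
void filter_sub(const uint8_t* row, uint8_t* out, std::size_t len, std::size_t bpp) {
    std::size_t head = bpp < len ? bpp : len;
    for (std::size_t i = 0; i < head; ++i) out[i] = row[i];
    for (std::size_t i = head; i < len; ++i)
        out[i] = static_cast<uint8_t>(row[i] - row[i - bpp]);
}

// Encode-side Up filter: out[i] = row[i] - prev[i] (modulo 256).
void filter_up(const uint8_t* row, const uint8_t* prev, uint8_t* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<uint8_t>(row[i] - prev[i]);
}
```

Because `out` could alias `row` or `prev`, a compiler may insert a runtime overlap check before the vector path; the non-standard `__restrict` qualifier, accepted by GCC, Clang and MSVC, is one way to rule that out.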

@yesilcimenahmet
Author

@randy408 I implemented the suggested changes: SPNG_IMG_COMPRESSION_LEVEL set to 0 and SPNG_FILTER_CHOICE set to SPNG_FILTER_CHOICE_ALL. With these adjustments, 64-bit performance improved significantly and now ranges between 58 and 60 ms.

This seems to indicate an issue with libspng itself. Are there any plans to address this, or to improve how such cases are handled in 64-bit builds?

@randy408
Owner

randy408 commented Jan 9, 2025

SIMD optimizations are the obvious fix; with them it won't matter whether the compiler optimizes the code the way it does on 32-bit. Issue #37 is the one to subscribe to.
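As an illustration of what such SIMD code could look like (this is not code from libspng or from issue #37), here is an SSE2 sketch of the per-filter cost, i.e. the sum of absolute values of the filtered bytes read as signed, with a scalar fallback for other targets. It processes 16 bytes per iteration: `_mm_min_epu8(v, 0 - v)` yields |x| for each signed byte as an unsigned value, and `_mm_sad_epu8` against zero adds those bytes horizontally into two 64-bit lanes. `filter_cost`, `filter_cost_scalar` and the `HAVE_SSE2_SKETCH` macro are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

// Scalar reference: sum of |(int8_t)b| over a filtered scanline.
uint64_t filter_cost_scalar(const uint8_t* p, std::size_t len) {
    uint64_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        int v = static_cast<int8_t>(p[i]);
        sum += static_cast<uint64_t>(v < 0 ? -v : v);
    }
    return sum;
}

// Same result; uses SSE2 for 16-byte blocks when available.
uint64_t filter_cost(const uint8_t* p, std::size_t len) {
#ifdef HAVE_SSE2_SKETCH
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;  // two 64-bit partial sums
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        // |x| of a signed byte equals min(x, -x) compared as unsigned bytes.
        __m128i a = _mm_min_epu8(v, _mm_sub_epi8(zero, v));
        // Sum 8 bytes into each 64-bit lane and accumulate.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    // Handle the tail (fewer than 16 bytes) with the scalar loop.
    return lanes[0] + lanes[1] + filter_cost_scalar(p + i, len - i);
#else
    return filter_cost_scalar(p, len);
#endif
}
```

Because the vector path is written out explicitly, its speed no longer hinges on whether the auto-vectorizer handles the scalar loop in a particular build.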
