Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

audio: Resampling in chunks produces different results than resampling complete buffer #7358

Closed
icculus opened this issue Feb 21, 2023 · 5 comments · Fixed by #7571
Closed

Comments

@icculus
Copy link
Collaborator

icculus commented Feb 21, 2023

This applies to SDL2 and SDL3, but this test program happens to use SDL2 function names:

#include <stdio.h>
#include "SDL.h"

int main(void)
{
    SDL_AudioSpec spec;
    Uint8 *buf;
    Uint32 buflen;
    SDL_Init(SDL_INIT_AUDIO);
    SDL_LoadWAV("sound.wav", &spec, &buf, &buflen);
    SDL_AudioStream *s = SDL_NewAudioStream(spec.format, spec.channels, spec.freq, spec.format, spec.channels, 48000);
    #if 0
    SDL_AudioStreamPut(s, buf, buflen);
    #else
    Uint32 remaining = buflen;
    for (int i = 0; remaining > 0; i += 8192) {
        const Uint32 cpy = SDL_min(remaining, 8192);
        SDL_AudioStreamPut(s, buf + i, cpy);
        remaining -= cpy;
    }
    #endif
    SDL_FreeWAV(buf);
    SDL_AudioStreamFlush(s);
    const int avail = SDL_AudioStreamAvailable(s);
    void *cvtbuf = SDL_malloc(avail);
    SDL_AudioStreamGet(s, cvtbuf, avail);
    SDL_FreeAudioStream(s);
    SDL_RWops *rw = SDL_RWFromFile("output.raw", "wb");
    SDL_RWwrite(rw, cvtbuf, 1, avail);
    SDL_free(cvtbuf);
    SDL_RWclose(rw);
    SDL_Quit();
    return 0;
}

Basically, it loads a .wav file, resamples it to 48000Hz, and writes it to output.raw.

If you run the #if 0 block, you'll get all the audio resampled in a single operation. If you recompile with the #else block, it'll resample the audio in chunks.

In the latter case, we produce slightly less data. It's drifting by around 60 bytes per megabyte of audio provided (s16, mono, 44100Hz, resampled to 48000Hz).

I suspect the culprit is this line:

const int wantedoutframes = (int)(inframes * ratio); /* outbuflen isn't total to write, it's total available. */

When it has the entire buffer to work from, it comes up with a reasonable output buffer size as a one-time operation, but when it resamples in chunks, we're losing a few bytes here and there for each operation (and maybe accumulating extra bytes in the other direction, too, perhaps).

Not only is this inconsistent, I believe it's causing audio artifacts at these edges.

We need a better way to calculate output size here.

(and while it won't fix this problem in a case where we are actually streaming, maybe we should only convert the audio when the data is requested from the stream, instead of as it is pushed into the stream.)

@icculus icculus changed the title audio: Resampling in chunks produces different results than resampling in batch... audio: Resampling in chunks produces different results than resampling complete buffer Feb 21, 2023
@icculus
Copy link
Collaborator Author

icculus commented Feb 21, 2023

Also the usual chorus of "the libsamplerate path doesn't have this problem" applies, which makes we wonder if our time is better spent asking @erikd if we can use it under the zlib license and just integrate it into SDL's code. :/

@slouken
Copy link
Collaborator

slouken commented Feb 21, 2023

At this point, maybe that's a good idea...

@erikd
Copy link

erikd commented Feb 21, 2023

I am the original author of libsamplerate but I am no longer the maintainer and almost certainly not the sole copyright owner. I therefore do not have the authority to change the license or provide exceptions.

@1bsyl
Copy link
Contributor

1bsyl commented Feb 28, 2023

@icculus, I've looked a little bit.
and comparing using my own sound.wav:

libsamplerate:

  • produce less frames. for me, it generates: 60903 frames.
  • resample a big block vs small chunks, gives the same size but, hexdump is different at each chunk junction (it differs with 1 byte).

SDL:

  • big block, it gives 61004 frames.
  • with chunks, it gives 61001 frames.

maybe the solution, would be simply to clamp more what has to be generated

@1bsyl
Copy link
Contributor

1bsyl commented Mar 1, 2023

Looking again, and maybe here's a suggestion to improve.

In chunk mode:
we shouldn't hit the case when we use the right-padding. because, in chunk mode, it would be zero, and then the sampling will give wrong data in between chunks.
It's easy to detect and break at the end, and returns an effective 'outframes' size.

Then, the right padding is copied back to the left. but we should start to process more on the left padding. (maybe that just move the issue of calculation the number of frame to another place ...).

Maybe the API, to get the converted data should tell if we should never reach the right-padding, or we have to process it.
eg:
in chunk:

- resample without using rpadding
- resample without using rpadding
- resample without using rpadding
- resample without using rpadding
- resample without using rpadding
- resample using rpadding and stream if finished

in 1 shot:
- resample using rpadding and stream if finished

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants