audio: Resampling in chunks produces different results than resampling complete buffer #7358

icculus · 2023-02-21T18:16:57Z

This applies to SDL2 and SDL3, but this test program happens to use SDL2 function names:

#include <stdio.h>
#include "SDL.h"

int main(void)
{
    SDL_AudioSpec spec;
    Uint8 *buf;
    Uint32 buflen;
    SDL_Init(SDL_INIT_AUDIO);
    SDL_LoadWAV("sound.wav", &spec, &buf, &buflen);
    SDL_AudioStream *s = SDL_NewAudioStream(spec.format, spec.channels, spec.freq, spec.format, spec.channels, 48000);
    #if 0
    SDL_AudioStreamPut(s, buf, buflen);
    #else
    Uint32 remaining = buflen;
    for (int i = 0; remaining > 0; i += 8192) {
        const Uint32 cpy = SDL_min(remaining, 8192);
        SDL_AudioStreamPut(s, buf + i, cpy);
        remaining -= cpy;
    }
    #endif
    SDL_FreeWAV(buf);
    SDL_AudioStreamFlush(s);
    const int avail = SDL_AudioStreamAvailable(s);
    void *cvtbuf = SDL_malloc(avail);
    SDL_AudioStreamGet(s, cvtbuf, avail);
    SDL_FreeAudioStream(s);
    SDL_RWops *rw = SDL_RWFromFile("output.raw", "wb");
    SDL_RWwrite(rw, cvtbuf, 1, avail);
    SDL_free(cvtbuf);
    SDL_RWclose(rw);
    SDL_Quit();
    return 0;
}

Basically, it loads a .wav file, resamples it to 48000Hz, and writes it to output.raw.

If you run the #if 0 block, you'll get all the audio resampled in a single operation. If you recompile with the #else block, it'll resample the audio in chunks.

In the latter case, we produce slightly less data. It's drifting by around 60 bytes per megabyte of audio provided (s16, mono, 44100Hz, resampled to 48000Hz).

I suspect the culprit is this line:

SDL/src/audio/SDL_audiocvt.c

Line 299 in 1e5e8e2

    
           const int wantedoutframes = (int)(inframes * ratio); /* outbuflen isn't total to write, it's total available. */

When it has the entire buffer to work from, it comes up with a reasonable output buffer size as a one-time operation, but when it resamples in chunks, we're losing a few bytes here and there for each operation (and maybe accumulating extra bytes in the other direction, too, perhaps).

Not only is this inconsistent, I believe it's causing audio artifacts at these edges.

We need a better way to calculate output size here.

(and while it won't fix this problem in a case where we are actually streaming, maybe we should only convert the audio when the data is requested from the stream, instead of as it is pushed into the stream.)

The text was updated successfully, but these errors were encountered:

icculus · 2023-02-21T18:23:03Z

Also the usual chorus of "the libsamplerate path doesn't have this problem" applies, which makes we wonder if our time is better spent asking @erikd if we can use it under the zlib license and just integrate it into SDL's code. :/

slouken · 2023-02-21T19:33:44Z

At this point, maybe that's a good idea...

erikd · 2023-02-21T20:25:51Z

I am the original author of libsamplerate but I am no longer the maintainer and almost certainly not the sole copyright owner. I therefore do not have the authority to change the license or provide exceptions.

1bsyl · 2023-02-28T15:48:00Z

@icculus, I've looked a little bit.
and comparing using my own sound.wav:

libsamplerate:

produce less frames. for me, it generates: 60903 frames.
resample a big block vs small chunks, gives the same size but, hexdump is different at each chunk junction (it differs with 1 byte).

SDL:

big block, it gives 61004 frames.
with chunks, it gives 61001 frames.

maybe the solution, would be simply to clamp more what has to be generated

1bsyl · 2023-03-01T08:07:47Z

Looking again, and maybe here's a suggestion to improve.

In chunk mode:
we shouldn't hit the case when we use the right-padding. because, in chunk mode, it would be zero, and then the sampling will give wrong data in between chunks.
It's easy to detect and break at the end, and returns an effective 'outframes' size.

Then, the right padding is copied back to the left. but we should start to process more on the left padding. (maybe that just move the issue of calculation the number of frame to another place ...).

Maybe the API, to get the converted data should tell if we should never reach the right-padding, or we have to process it.
eg:
in chunk:

- resample without using rpadding
- resample without using rpadding
- resample without using rpadding
- resample without using rpadding
- resample without using rpadding
- resample using rpadding and stream if finished

in 1 shot:
- resample using rpadding and stream if finished

icculus changed the title ~~audio: Resampling in chunks produces different results than resampling in batch...~~ audio: Resampling in chunks produces different results than resampling complete buffer Feb 21, 2023

ak-988 mentioned this issue Mar 4, 2023

apply a 5ms fade in/out envelope to each sound fabiangreffrath/woof#925

Closed

icculus mentioned this issue Apr 2, 2023

SDL3 audio conversion redesign #7571

Merged

icculus closed this as completed in #7571 Apr 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

audio: Resampling in chunks produces different results than resampling complete buffer #7358

audio: Resampling in chunks produces different results than resampling complete buffer #7358

icculus commented Feb 21, 2023

icculus commented Feb 21, 2023

slouken commented Feb 21, 2023

erikd commented Feb 21, 2023

1bsyl commented Feb 28, 2023

1bsyl commented Mar 1, 2023 •

edited

Loading

audio: Resampling in chunks produces different results than resampling complete buffer #7358

audio: Resampling in chunks produces different results than resampling complete buffer #7358

Comments

icculus commented Feb 21, 2023

icculus commented Feb 21, 2023

slouken commented Feb 21, 2023

erikd commented Feb 21, 2023

1bsyl commented Feb 28, 2023

1bsyl commented Mar 1, 2023 • edited Loading

1bsyl commented Mar 1, 2023 •

edited

Loading