Rendering to loopback w/ simultaneous playback on hardware device #182

Closed
kunaljathal opened this issue Mar 27, 2018 · 7 comments

Comments

@kunaljathal

kunaljathal commented Mar 27, 2018

I have the following situation, and I'm curious to know what the 'correct' way to implement a solution is:

I have a video object that streams (outputs) video and audio data - I am able to grab the audio data (samples) from this video object. I would like to spatialize this audio. To that end, I create an OpenAL loopback device, fill an OpenAL buffer and a source with the audio data (from the video object), play the source, render it on the loopback device, and transfer the rendered (i.e. now spatialized) samples back to the video object. The video object then plays this audio, spatialized, with no issues on my default playback hardware device. Cool.
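
For reference, a minimal sketch of what this loopback flow can look like with the ALC_SOFT_loopback calls. The 48kHz stereo-float render format, the buffer sizes, and the placeholder input samples are illustrative assumptions, not part of the setup described above, and error checking is omitted:

```c
#include <AL/al.h>
#include <AL/alc.h>
#include <AL/alext.h>   /* ALC_SOFT_loopback attributes and enums */

/* The ALC_SOFT_loopback entry points are loaded by name; these typedefs just
 * mirror the extension's documented signatures. */
typedef ALCdevice* (ALC_APIENTRY *LoopbackOpenFn)(const ALCchar*);
typedef void       (ALC_APIENTRY *RenderSamplesFn)(ALCdevice*, ALCvoid*, ALCsizei);

int main(void)
{
    LoopbackOpenFn  loopbackOpen =
        (LoopbackOpenFn)alcGetProcAddress(NULL, "alcLoopbackOpenDeviceSOFT");
    RenderSamplesFn renderSamples =
        (RenderSamplesFn)alcGetProcAddress(NULL, "alcRenderSamplesSOFT");

    /* A loopback device plus a context that renders 48kHz stereo float32. */
    ALCint attrs[] = { ALC_FORMAT_CHANNELS_SOFT, ALC_STEREO_SOFT,
                       ALC_FORMAT_TYPE_SOFT,     ALC_FLOAT_SOFT,
                       ALC_FREQUENCY,            48000,
                       0 };
    ALCdevice  *dev = loopbackOpen(NULL);
    ALCcontext *ctx = alcCreateContext(dev, attrs);
    alcMakeContextCurrent(ctx);

    /* Samples grabbed from the video object go onto a positioned source
     * (placeholder silence here). */
    static ALshort videoSamples[4800 * 2];  /* 100ms of 48kHz stereo */
    ALuint buf, src;
    alGenBuffers(1, &buf);
    alBufferData(buf, AL_FORMAT_STEREO16, videoSamples, sizeof(videoSamples), 48000);
    alGenSources(1, &src);
    alSourcei(src, AL_BUFFER, buf);
    alSource3f(src, AL_POSITION, 1.0f, 0.0f, -1.0f);
    alSourcePlay(src);

    /* Instead of the mix going to hardware, pull it back out block by block
     * and hand it to the video object's audio track. */
    ALfloat rendered[1024 * 2];             /* 1024 frames of stereo float */
    renderSamples(dev, rendered, 1024);

    alcMakeContextCurrent(NULL);
    alcDestroyContext(ctx);
    alcCloseDevice(dev);
    return 0;
}
```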

Now: I also have an audio engine that uses OpenAL, but is completely separate from the video object. The audio engine uses the default playback hardware device directly, with regular "non-streaming" files (.wav, etc.) to create buffers and sources and to spatialize and play them. Note that the audio engine handles multiple buffers and multiple sources. Cool.

Here's my eventual goal: I will have multiple video objects. I would like the audio from each to be spatialized, and for all objects to play back without sync issues, etc. At the same time, the audio engine should also concurrently be able to play its own sources & buffers and not 'interfere' with the audio being output from all the video objects. All audio, from the audio engine and the video objects, is eventually output via the same and only available hardware device (i.e. my laptop speakers, with a single sound card).

This is achievable, right?

My main confusion lies with the number of contexts -- or loopback devices -- I should be creating. Here's my current understanding, and I would love some clarity on it:

The audio engine opens and initializes the default device. It has a single context. Let's call this Context A. So any time the audio engine needs to create a buffer or play a source etc., it makes Context A the active context, creates the buffer/calls play on the source etc.

For the video objects, I need to either:

  1. Create a single loopback device, and create multiple contexts within it -- one context for each video object. So for 3 video objects, I will have Contexts V1, V2, and V3. Each context will have a single source-buffer pair. Whenever I need to spatialize the audio captured from a video object, I make its context active, play its source, render the source on the loopback device, and grab the rendered samples. [ Question: Will the rendered samples be exclusive to the context, given that all video objects share the loopback device? That is to say, if I play the source associated with Context V1 and then call alcRenderSamplesSOFT on the loopback device, will I get only the samples associated with Context V1 (and hence Video Object # 1)? Or will I get samples associated with all 3 video objects since they share the same device? ]

OR

  2. Create multiple loopback devices, one for each video object. Each loopback device will have a single context associated with it. Whenever I need to spatialize the audio captured from a video object, I make its context active, play its source, render the source on its respective loopback device, and grab the rendered samples. [ Question: Can I create multiple loopback devices? I will likely have several video objects -- are there any memory/performance or synchronization issues I need to worry about in creating multiple loopback devices? Is it "okay" to do so? Will I be able to render samples across all devices simultaneously and not run into latency of any kind? etc. ]

So my main question is whether I should go ahead with solution # 1 or # 2 above. Again, I'm assuming either can be implemented concurrently with the audio engine doing its thing on the playback hardware directly. Please let me know if my understanding of all this makes sense!

Thanks

@kcat
Owner

kcat commented Mar 30, 2018

Here's my eventual goal: I will have multiple video objects. I would like the audio from each to be spatialized, and for all objects to play back without sync issues, etc. At the same time, the audio engine should also concurrently be able to play its own sources & buffers and not 'interfere' with the audio being output from all the video objects. All audio, from the audio engine and the video objects, is eventually output via the same and only available hardware device (i.e. my laptop speakers, with a single sound card).

So if I'm understanding right, you want to play and spatialize the audio from video objects completely separate from the audio logic of the rest of the app, just using the same audio output?

Most audio systems are capable of opening the same device multiple times and mixing what they're given to the output, even if the hardware itself isn't capable of mixing. In that case, you should be able to simply call alcOpenDevice with the same device name to get two independent ALCdevice* handles for the same audio device. No need to bother with loopback devices to reroute one output to another, just create two playback devices that use the same audio output. Each device can create its own context, which are independent of each other.
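
A minimal sketch of that approach, where NULL requests the default output and error checking is omitted (the helper name is just illustrative):

```c
#include <AL/alc.h>

/* Sketch only: open the same output twice so each side gets its own device
 * handle and context, fully independent of the other. */
void open_two_outputs(ALCdevice **engineDev, ALCcontext **engineCtx,
                      ALCdevice **videoDev,  ALCcontext **videoCtx)
{
    *engineDev = alcOpenDevice(NULL);
    *videoDev  = alcOpenDevice(NULL);   /* second, independent handle to the same output */
    *engineCtx = alcCreateContext(*engineDev, NULL);
    *videoCtx  = alcCreateContext(*videoDev,  NULL);
}
```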

The main thing to watch out for with this is the current context. If your audio handling is all on one thread (i.e. both the video object audio and the main audio are handled on the same thread), just make sure the correct context is current before making the appropriate al* calls. If it's multithreaded, with the video object audio on one thread and the main audio on another, you need to be careful since the current context is global for the process. You have to use a mutex (or critical section on Windows) to prevent the current context from being changed by another thread while you're making al* calls. If you're using OpenAL Soft directly (that is, not through the router DLL), you can instead use thread-local contexts, so setting the current context on one thread doesn't affect other threads.
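
For the multithreaded case, the mutex approach looks roughly like this (pthread shown; a CRITICAL_SECTION or std::mutex works the same way, and the helper name is illustrative):

```c
#include <pthread.h>
#include <AL/al.h>
#include <AL/alc.h>

/* One process-wide lock guards every al* call, since the current context is
 * global to the process. */
static pthread_mutex_t al_lock = PTHREAD_MUTEX_INITIALIZER;

void set_source_position(ALCcontext *ctx, ALuint source, float x, float y, float z)
{
    pthread_mutex_lock(&al_lock);
    alcMakeContextCurrent(ctx);               /* affects all threads */
    alSource3f(source, AL_POSITION, x, y, z);
    pthread_mutex_unlock(&al_lock);
}
```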

Or instead of opening two independent ALCdevice* handles, you can create two contexts with one ALCdevice*. Simply call alcCreateContext twice with the same device, and use one context for the video object audio and the other for the main audio. This avoids opening the audio device multiple times, as OpenAL Soft will do all the mixing. The same note about the current context applies (protect the current context with a mutex if doing multithreaded, or use the thread-local context extension if you're using OpenAL Soft directly), and there is the caveat that the device properties apply to both contexts. For instance, if you enable HRTF or set a specific output sample rate, both contexts will be affected. This isn't usually a big problem.
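
The shared-device variant is just two alcCreateContext calls on the same handle (again a hedged sketch with error checking omitted):

```c
#include <AL/alc.h>

/* One device, two independent contexts; OpenAL Soft mixes both to the output.
 * Device-wide settings (HRTF, ALC_FREQUENCY, ...) apply to both contexts. */
void open_shared_device(ALCdevice **dev, ALCcontext **engineCtx, ALCcontext **videoCtx)
{
    *dev       = alcOpenDevice(NULL);
    *engineCtx = alcCreateContext(*dev, NULL);
    *videoCtx  = alcCreateContext(*dev, NULL);
}
```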

If I'm misunderstanding, or if you still have questions about what exactly to do, feel free to ask.

@kunaljathal
Author

kunaljathal commented Apr 2, 2018

Thanks for the reply -- yes for the most part we have a common understanding here, but there are a few points of confusion:

  1. I can't avoid creating a loopback device -- the video object has its "own" audio output framework -- it uses an internal ("native") audio track object that is tightly bound to the lower-level audio output logic. So what I'm doing is intercepting the framework (at the level I have access to), grabbing the samples from its audio track, spatializing them -- via the loopback device -- and then placing them back in the video player's audio track object. I cannot simply "mute" the video player altogether (i.e. pass the grabbed samples to OpenAL and have it render them to hardware directly), because the video player provides a specific kind of audio-video-sync functionality (i.e. keeping the audio and video in sync) which would then be lost via this route. So I pretty much have to use loopback. Does that make sense?

  2. OK, I see what you're saying about context switching needing to happen if there are multiple threads, and I pretty much anticipated that happening, so thanks for confirming it. I'm curious, though: if we're going to have to introduce mutexes/critical sections and have one thread potentially 'wait' for the other to free up the context, doesn't that introduce the possibility of glitches/gaps/lags in the audio output?

  3. Can you clarify what it means to use OpenAL Soft "directly", i.e. not through the "router" DLL? This is the first time I'm hearing of these concepts.

Thanks!

@kcat
Owner

kcat commented Apr 2, 2018

I can't avoid creating a loopback device -- the video object has its "own" audio output framework -- it uses an internal ("native") audio track object that is tightly bound to the lower-level audio output logic. So what I'm doing is intercepting the framework (at the level I have access to), grabbing the samples from its audio track, spatializing them -- via the loopback device -- and then placing them back in the video player's audio track object. I cannot simply "mute" the video player altogether (i.e. pass the grabbed samples to OpenAL and have it render them to hardware directly), because the video player provides a specific kind of audio-video-sync functionality (i.e. keeping the audio and video in sync) which would then be lost via this route. So I pretty much have to use loopback. Does that make sense?

I see. I suppose it's not possible to replace/override the video object's audio output component with something that's controlled by the audio engine? Things like DirectShow and GStreamer can do that using a custom sink, which gets the synchronized samples and handles the audio output.

Aside from simplifying the audio engine, it would also avoid another potential issue, of the video's audio format. If the video is stereo and the video object plays it as stereo, then you'll have to use stereo spatialization regardless of what the hardware output actually is. Or worse, if the video is mono and is played back as mono, you won't be able to give it spatialized audio (at most it could have mono effects and distance attenuation, but no panning).

OK, I see what you're saying about context switching needing to happen if there are multiple threads, and I pretty much anticipated that happening, so thanks for confirming it. I'm curious, though: if we're going to have to introduce mutexes/critical sections and have one thread potentially 'wait' for the other to free up the context, doesn't that introduce the possibility of glitches/gaps/lags in the audio output?

It won't cause any glitches or gaps for the audio engine, no. For normal playback devices, OpenAL Soft mixes and feeds samples to the hardware asynchronously, and isn't affected by what the current context is (the current context just influences what the al* calls affect), so even if the video object is currently updating, the audio engine's device can still produce samples for output uninterrupted. It may make the audio engine take a little bit longer to apply its updates if it gets temporarily blocked (you'll have to wait to call alSource* and alListener* to change position, volume, etc), but that's about it.

Without knowing anything about how the video object actually works, I have no idea how it will react if it takes a bit longer than usual for it to get the replaced audio samples.

Can you clarify what it means to use OpenAL Soft "directly", i.e. not through the "router" DLL? This is the first time I'm hearing of these concepts.

When you install OpenAL using Creative's installer (oalinst.exe), it provides two files: OpenAL32.dll and wrap_oal.dll (possibly also ct_oal.dll if you have a Creative device). Here, wrap_oal.dll and ct_oal.dll are the actual OpenAL drivers, and OpenAL Soft can be added as another driver with the name soft_oal.dll. Then OpenAL32.dll is a "router": it looks for and loads those drivers, and basically combines them to appear as a single list of devices the app can see (it routes OpenAL calls from the app to the appropriate driver for a given device/context).

Using OpenAL Soft directly simply means you don't use the router. OpenAL32.dll is actually OpenAL Soft, which the app calls directly without being routed through an extra DLL.

@kunaljathal
Author

kunaljathal commented Apr 3, 2018

I see. I suppose it's not possible to replace/override the video object's audio output component with something that's controlled by the audio engine? Things like DirectShow and GStreamer can do that using a custom sink, which gets the synchronized samples and handles the audio output.

Yeah, that would be ideal. I can't seem to find a way to do that at the moment; the only point at which I seem to be able to access samples is prior to the sync. I'll keep digging, but for now it seems unlikely.

Aside from simplifying the audio engine, it would also avoid another potential issue, of the video's audio format. If the video is stereo and the video object plays it as stereo, then you'll have to use stereo spatialization regardless of what the hardware output actually is. Or worse, if the video is mono and is played back as mono, you won't be able to give it spatialized audio.

Yeah you're absolutely right; I'm currently 'forced' to not support video with mono tracks...

It won't cause any glitches or gaps for the audio engine, no. For normal playback devices, OpenAL Soft mixes and feeds samples to the hardware asynchronously, and isn't affected by what the current context is (the current context just influences what the al* calls affect), so even if the video object is currently updating, the audio engine's device can still produce samples for output uninterrupted. It may make the audio engine take a little bit longer to apply its updates if it gets temporarily blocked (you'll have to wait to call alSource* and alListener* to change position, volume, etc), but that's about it.

OK, along these lines -- I'm also curious/concerned about introducing performance issues, if any. I will likely have multiple loopback devices, and multiple sources playing in the audio engine, so say for instance I have 10 videos and 50 sounds playing from the audio engine. That's 11 devices open and 11 contexts to continually switch between. I have no idea what the memory/performance impact of that would be, if any. Do you have a rough guesstimate of the kind of lag in the audio engine update / perf impact of using this number of devices & contexts?

When you install OpenAL using Creative's installer (oalinst.exe), it provides two files: OpenAL32.dll and wrap_oal.dll (possibly also ct_oal.dll if you have a Creative device). Here, wrap_oal.dll and ct_oal.dll are the actual OpenAL drivers, and OpenAL Soft can be added as another driver with the name soft_oal.dll. Then OpenAL32.dll is a "router": it looks for and loads those drivers, and basically combines them to appear as a single list of devices the app can see (it routes OpenAL calls from the app to the appropriate driver for a given device/context).

OK, great -- so I am not using the router DLL. Does this mean I can use the thread-local context, and then not have to worry about context switching at all? I think I saw somewhere that alcSetThreadContext isn't part of the ALC_SOFT_loopback extension ....

@kunaljathal
Author

Not sure if you're still following this thread, but I had a couple more questions in addition to my last comment here:

  1. What happens if I try to set the position on a source (i.e. via alSource3f) that is NOT part of the currently active context?
  2. With respect to loopback devices, when I call alcRenderSamplesSOFT, does it render samples across ALL contexts associated to the device passed in, or just the currently active context?

@kcat
Owner

kcat commented Apr 12, 2018

Sorry for the lack of response.

That's 11 devices open and 11 contexts to continually switch between. I have no idea what the memory/performance impact of that would be, if any. Do you have a rough guesstimate of the kind of lag in the audio engine update / perf impact of using this number of devices & contexts?

11 devices/contexts will certainly have higher memory/CPU use, since each one maintains a dry buffer (the "master" mixing buffer that all of its contexts' sources and effects write to), which is then processed and written to the output.

How much impact that'll have will depend on your hardware, and how much load the system is already under.

Does this mean I can use the thread-local context, and then not have to worry about context switching at all? I think I saw somewhere that alcSetThreadContext isn't part of the ALC_SOFT_loopback extension ....

alcSetThreadContext is part of the ALC_EXT_thread_local_context extension. As long as each context is handled on its own thread, then no, you don't have to worry about context switching.
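
A hedged sketch of using that extension, with the entry point loaded by name (the typedef and helper names are illustrative; the signature follows the extension):

```c
#include <AL/alc.h>

/* The typedef mirrors alcSetThreadContext's documented signature. */
typedef ALCboolean (ALC_APIENTRY *SetThreadContextFn)(ALCcontext*);

void bind_context_to_this_thread(ALCdevice *dev, ALCcontext *ctx)
{
    if(alcIsExtensionPresent(dev, "ALC_EXT_thread_local_context"))
    {
        SetThreadContextFn setThreadContext =
            (SetThreadContextFn)alcGetProcAddress(dev, "alcSetThreadContext");
        setThreadContext(ctx);   /* only this thread sees ctx as current */
    }
    /* al* calls made on this thread now target ctx, regardless of what other
     * threads set with alcMakeContextCurrent. */
}
```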

What happens if I try to set the position on a source (i.e. via alSource3f) that is NOT part of the currently active context?

It may change the position of some source in whichever context is current, generate an error, or do nothing. In theory it could also crash if there is no current context, but I don't think that will actually happen with any current implementation.

With respect to loopback devices, when I call alcRenderSamplesSOFT, does it render samples across ALL contexts associated to the device passed in, or just the currently active context?

All contexts associated with the device, not just the current context. This mirrors the behavior of normal playback devices, which continue mixing the sources and effects for all of their contexts as well, not just the current ones (for OpenAL Soft, at least).

@kunaljathal
Author

Thanks a lot for the replies, kcat. Will re-open if I have more questions. Cheers.
