OptixPrime_b
This page describes the render core OptixPrime_b. This render core uses NVIDIA's Optix to produce path traced images of the scene supplied by the RenderSystem. For optimal performance on pre-RTX GPUs, a subsystem of Optix, Optix Prime, is used.
Overview
The OptixPrime_b render core consists of host code written in C++, the Optix library, and device code written in CUDA. The CUDA code consists of four main kernels:
- generateEyeRays, in camera.cu, which fills a device-side buffer with primary rays, to be traced by Optix;
- shade, in pathtracer.cu, which processes Optix intersection results;
- finalizeConnections, in connections.cu, which processes Optix shadow ray intersections;
- finalizeRender, in finalize_shared.cu, which averages samples and applies brightness and contrast.
The CUDA kernels are called from rendercore.cpp, as part of the wavefront algorithm.
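In outline, one frame of the wavefront algorithm looks roughly as follows. This is a simplified sketch, not the actual code in rendercore.cpp; MAXPATHLENGTH stands in for whatever path length limit the core uses, and argument lists are elided.
// simplified outline of one frame (illustrative only)
generateEyeRays( ... );                                       // primary rays into the extension ray buffer
for (int pathLength = 1; pathLength <= MAXPATHLENGTH; pathLength++)
{
    CHK_PRIME( rtpQueryExecute( query, RTP_QUERY_HINT_NONE ) ); // trace extension rays
    shade( ... );                                             // emit new extension rays and shadow rays
}
if (counters.shadowRays > 0)
{
    // trace the shadow rays with a second query, then:
    finalizeConnections( ... );                               // add unoccluded light contributions
}
finalizeRender( ... );                                        // average samples, apply brightness/contrast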
Initialization
The render core is initialized in the RenderCore::Init method in rendercore.cpp. The method first initializes CUDA for the most capable device (if any):
uint device = CUDATools::FastestDevice();
cudaSetDevice( device );
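CUDATools::FastestDevice is a small helper; a minimal sketch of this kind of device selection could look as follows (illustrative, not the helper's actual implementation):
// illustrative: pick the CUDA device with the most multiprocessors
uint SelectDeviceSketch()
{
    int count = 0, best = 0, bestSMs = -1;
    cudaGetDeviceCount( &count );
    for (int i = 0; i < count; i++)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties( &prop, i );
        if (prop.multiProcessorCount > bestSMs) { bestSMs = prop.multiProcessorCount; best = i; }
    }
    return (uint)best;
}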
After this, the Optix Prime context is created. Optix Prime will use the CUDA device that was selected. This also enables communication between CUDA and Optix Prime, which lets us prepare Optix Prime ray data using CUDA code, among other things.
CHK_PRIME( rtpContextCreate( RTP_CONTEXT_TYPE_CUDA, &context ) );
CHK_PRIME( rtpContextSetCudaDeviceNumbers( context, 1, &device ) );
The scene that the RenderSystem provides typically consists of one or more instances of one or more meshes. Optix will build an acceleration structure for each mesh, and a top-level structure over the collection of instances. For this, we need an RTPmodel, which we create in anticipation of the scene:
topLevel = new RTPmodel();
CHK_PRIME( rtpModelCreate( context, topLevel ) );
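For the individual meshes, an RTPmodel is created per mesh as well. A hedged sketch of how a single mesh could be handed to Optix Prime follows; the deviceVertices, deviceIndices, vertexCount and triangleCount names are placeholders for the core's own mesh data.
// illustrative: build an acceleration structure for one mesh
RTPmodel meshModel;
CHK_PRIME( rtpModelCreate( context, &meshModel ) );
RTPbufferdesc vertexDesc, indexDesc;
CHK_PRIME( rtpBufferDescCreate( context, RTP_BUFFER_FORMAT_VERTEX_FLOAT3,
    RTP_BUFFER_TYPE_CUDA_LINEAR, deviceVertices, &vertexDesc ) );
CHK_PRIME( rtpBufferDescSetRange( vertexDesc, 0, vertexCount ) );
CHK_PRIME( rtpBufferDescCreate( context, RTP_BUFFER_FORMAT_INDICES_INT3,
    RTP_BUFFER_TYPE_CUDA_LINEAR, deviceIndices, &indexDesc ) );
CHK_PRIME( rtpBufferDescSetRange( indexDesc, 0, triangleCount ) );
CHK_PRIME( rtpModelSetTriangles( meshModel, indexDesc, vertexDesc ) );
CHK_PRIME( rtpModelUpdate( meshModel, RTP_MODEL_HINT_ASYNC ) );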
The remainder of the Init method prepares counters, blue noise data and timing events.
SetTarget
Like other render cores, the OptixPrime_b core renders to an OpenGL texture, provided via the SetTarget method. The core needs several buffers, shared between Optix and CUDA, which are allocated when SetTarget is executed. SetTarget is executed whenever the window size changes. To prevent frequent allocation and deallocation, the SetTarget method reserves some extra space, which lets the render target grow without reallocation.
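A minimal sketch of such a reservation, assuming a raw device allocation for the accumulator (the reservedPixels and accumulatorData names are hypothetical):
// illustrative: reserve 25% extra pixels so modest window growth needs no reallocation
uint pixelCount = scrwidth * scrheight;
if (pixelCount > reservedPixels)
{
    reservedPixels = pixelCount + pixelCount / 4;
    cudaFree( accumulatorData );
    cudaMalloc( &accumulatorData, reservedPixels * sizeof( float4 ) );
}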
UpdateTopLevel
The top level is part of the acceleration structure that Optix uses for efficient ray/scene intersections. It is a bounding volume hierarchy over the (possibly transformed) bounding boxes of the instances in the scene. Since it is small, it can be updated each frame in very little time. Using the top level hierarchy, rigid motion is essentially free. Rigid motion here refers to all animation that can be expressed using a matrix transform: translation, rotation, scaling, shearing, and so on. UpdateTopLevel notifies Optix Prime of the current set of instances and their transforms:
CHK_PRIME( rtpModelSetInstances( *topLevel, instancesBuffer, transformBuffer ) );
CHK_PRIME( rtpModelUpdate( *topLevel, RTP_MODEL_HINT_ASYNC ) );
The instance transforms are also needed in the CUDA code. Therefore, whenever the top level is updated, the instance data in device memory must be synchronized as well.
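A minimal sketch of that synchronization, assuming a host-side array of instance descriptors that mirrors a device allocation (the InstanceDesc type and the buffer names are hypothetical):
// illustrative: push the updated instance descriptors to the GPU
cudaMemcpy( deviceInstanceDescs, hostInstanceDescs,
    instanceCount * sizeof( InstanceDesc ), cudaMemcpyHostToDevice );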
Render
The Render method performs the actual path tracing and writes the result to the OpenGL texture. The first step is to ensure that the render target is no longer in use, e.g. by post-processing of the previous frame:
glFinish();
Next, we check whether convergence needs to restart. This is typically the case when the scene or the camera viewpoint has changed. In all other cases, we add a new sample for each pixel to the accumulator; the average sample value is obtained by dividing the accumulated values by the sample count. When convergence restarts, the random seed is also reset, which keeps the noise pattern static while the camera moves. In practice this is much easier on the eyes.
accumulator->Clear( ON_DEVICE );
samplesTaken = 0;
camRNGseed = 0x12345678;
The wavefront algorithm starts with the setup of primary rays. For this, the generateEyeRays CUDA function is called:
generateEyeRays( SMcount, extensionRayBuffer[inBuffer]->DevPtr(), extensionRayExBuffer[inBuffer]->DevPtr(),
RandomUInt( camRNGseed ), blueNoise->DevPtr(), samplesTaken,
view.aperture, view.pos, right, up, view.p1, GetScreenParams() );
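On the device side, a kernel of this shape boils down to constructing, per pixel, a jittered ray through the view plane spanned by p1, right and up. The sketch below is illustrative: the Ray struct (with O and D members) and the RandomFloat helper (assumed to advance the seed passed by reference) are placeholders, the float3 operators are assumed to come from a shared math header, and depth of field and the blue noise sampler are omitted.
// illustrative device code: one jittered primary ray per pixel
__global__ void generateEyeRaysSketch( Ray* rayBuffer, float3 pos, float3 right, float3 up,
    float3 p1, int scrwidth, int scrheight, uint seed )
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= scrwidth || y >= scrheight) return;
    uint s = seed + y * scrwidth + x;                       // simple per-pixel seed
    const float u = (x + RandomFloat( s )) / scrwidth;      // jitter within the pixel
    const float v = (y + RandomFloat( s )) / scrheight;
    const float3 P = p1 + u * right + v * up;               // point on the view plane
    rayBuffer[y * scrwidth + x].O = pos;                    // ray origin: camera position
    rayBuffer[y * scrwidth + x].D = normalize( P - pos );   // ray direction towards P
}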
Executing the CUDA function fills a buffer with primary rays. This buffer can directly be used by Optix Prime, which will find the nearest intersection for each ray in the buffer. Optix Prime uses a query for this, which we prepare just outside the wavefront loop:
RTPquery query;
CHK_PRIME( rtpQueryCreate( *topLevel, RTP_QUERY_TYPE_CLOSEST, &query ) );
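Before the query can run, Optix Prime also needs to know which buffer holds the rays and where to store the hits; this is done with rtpQuerySetRays and rtpQuerySetHits (the descriptor names below are illustrative):
// illustrative: attach the CUDA ray and hit buffers to the query
CHK_PRIME( rtpQuerySetRays( query, extensionRaysDesc ) );  // RTPbufferdesc over the extension ray buffer
CHK_PRIME( rtpQuerySetHits( query, extensionHitsDesc ) );  // RTPbufferdesc over the hit record buffer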
The wavefront loop starts with tracing the prepared rays:
CHK_PRIME( rtpQueryExecute( query, RTP_QUERY_HINT_NONE ) );
The intersection results are then used in the shade CUDA function. This function determines for each pixel whether a new path segment is to be generated, which will be the input for the next iteration of the wavefront loop. The shade function may also generate one shadow ray per pixel per path segment.
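A hedged sketch of the bookkeeping this implies at the end of each iteration (the counter read-back and the buffer swap are illustrative; names roughly follow the ones used above):
// illustrative: the rays produced by shade become the input of the next iteration
counterBuffer->CopyToHost();                 // assumed helper: read back the ray counters
if (counters.extensionRays == 0) break;      // every path terminated; leave the wavefront loop
std::swap( inBuffer, outBuffer );            // ping-pong the extension ray buffer indices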
Once the wavefront loop completes, the shadow rays are traced in a separate Optix query.
if (counters.shadowRays > 0)
{
...
}
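The elided body is not reproduced here. As a hedged illustration, a shadow pass with Optix Prime typically uses an any-hit query, after which finalizeConnections adds the unoccluded light contributions (the shadow buffer descriptor names are placeholders, and the finalizeConnections arguments are elided):
// illustrative shadow pass
RTPquery shadowQuery;
CHK_PRIME( rtpQueryCreate( *topLevel, RTP_QUERY_TYPE_ANY, &shadowQuery ) );
CHK_PRIME( rtpQuerySetRays( shadowQuery, shadowRaysDesc ) );
CHK_PRIME( rtpQuerySetHits( shadowQuery, shadowHitsDesc ) );
CHK_PRIME( rtpQueryExecute( shadowQuery, RTP_QUERY_HINT_NONE ) );
finalizeConnections( ... );                  // adds unoccluded light samples to the accumulator
CHK_PRIME( rtpQueryDestroy( shadowQuery ) );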
After the wavefront loop, the accumulator contains one or more samples for each pixel. The final image is constructed by the finalizeRender CUDA function, which divides the accumulated color of each pixel by the sample count and applies brightness and contrast. After this, the OpenGL texture is released:
finalizeRender( accumulator->DevPtr(), scrwidth, scrheight, samplesTaken, brightness, contrast );
renderTarget.UnbindSurface();
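Per pixel, the work inside finalizeRender amounts to roughly the following (a sketch; the exact brightness/contrast curve used by the core may differ):
// illustrative per-channel finalization
float r = accumulator[pixelIdx].x / samplesTaken;   // average the accumulated samples
r = (r - 0.5f) * contrast + 0.5f + brightness;      // assumed simple brightness/contrast mapping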
Shade
(TODO)