Friday, May 16, 2025

Path Tracing Optimizations: OMMs & Dynamic BLAS Compaction Insights

Introduction

Alpha-tested geometry, such as the foliage in dense vegetation, can severely impact ray tracing performance. In Indiana Jones and the Great Circle, the developers applied two techniques: Opacity MicroMaps (OMMs), which cut the GPU time of the main path tracing pass by about 55%, and dynamic BLAS compaction, which reduced VRAM usage. Together they make a useful case study for optimizing modern RTX pipelines.

In this detailed article, we dive into the technical specifics behind these optimizations. Whether you are a game developer, graphics engineer, or technical artist, you will find actionable insights to improve scene performance on GPUs such as the RTX 5080. We reference authoritative sources like the Shader Execution Reordering and Live State Reductions post and the official Indiana Jones website for in-depth benchmarks and context.


How Opacity MicroMaps Reduce AHS Overhead

When path tracing complex scenes, alpha testing can quickly become a bottleneck. Traditionally, alpha-tested geometry is handled by Any Hit Shaders (AHS): each time a ray intersects a non-opaque triangle, traversal is interrupted so that a shader can sample the alpha texture and accept or reject the hit. In dense vegetation a single ray can trigger many such invocations, and the overhead adds up.

What Are Opacity MicroMaps?

Opacity MicroMaps (OMMs) are designed to precompute the opacity state for micro-triangles, thereby reducing the number of required AHS invocations. This hardware-accelerated approach ensures that if a micro-triangle is wholly opaque or completely transparent, the expensive shader evaluation can be bypassed completely. In effect, OMMs replace the frequent shader calls with a quick lookup process.
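To make the lookup idea concrete, here is a minimal CPU-side sketch in Python. It is not the hardware or SDK implementation: the function names, the 0.5 alpha cutoff, and the majority rule used to choose between the two "unknown" states are assumptions made for illustration. It classifies each micro-triangle from alpha values sampled inside it and counts how many would still need an AHS invocation.

```python
from enum import Enum


class OmmState(Enum):
    # Four-state opacity encoding; the numeric values here are illustrative.
    TRANSPARENT = 0
    OPAQUE = 1
    UNKNOWN_TRANSPARENT = 2
    UNKNOWN_OPAQUE = 3


def classify_micro_triangle(alpha_samples, cutoff=0.5):
    """Classify one micro-triangle from alpha values sampled inside it."""
    if not alpha_samples:
        raise ValueError("need at least one alpha sample")
    above = [a >= cutoff for a in alpha_samples]
    if all(above):
        return OmmState.OPAQUE
    if not any(above):
        return OmmState.TRANSPARENT
    # Mixed coverage: the alpha test still has to run in the any-hit shader.
    mostly_opaque = sum(above) * 2 >= len(above)
    return OmmState.UNKNOWN_OPAQUE if mostly_opaque else OmmState.UNKNOWN_TRANSPARENT


def needs_any_hit(state):
    """Only the two 'unknown' states fall back to shader evaluation."""
    return state in (OmmState.UNKNOWN_TRANSPARENT, OmmState.UNKNOWN_OPAQUE)


def micro_triangle_count(subdivision_level):
    """Each subdivision level splits every triangle into four."""
    return 4 ** subdivision_level


micro_triangles = [
    [0.9, 1.0, 0.8],  # every sample above the cutoff
    [0.0, 0.1, 0.0],  # every sample below the cutoff
    [0.2, 0.9, 0.7],  # straddles the alpha edge
]
states = [classify_micro_triangle(samples) for samples in micro_triangles]
ahs_calls = sum(needs_any_hit(state) for state in states)
print([state.name for state in states], "AHS invocations:", ahs_calls)
```

In this toy input only the micro-triangle that straddles the alpha edge still reaches the shader; the other two are resolved by the lookup alone.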

OMM Baking: CPU vs. GPU Workflows

Initially, all OMMs in the game were baked on the CPU. The process, although robust, was sometimes time-consuming, especially for models with complex UV mappings and cover-maps. Developers later shifted much of this workload to an offline cooker, where the baking process could run asynchronously after mesh submissions. Key parameters such as maxSubdivisionLevel and dynamicSubdivisionScale were tweaked to balance performance with VRAM usage.
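The trade-off behind those two parameters can be sketched with some simple arithmetic. The Python below is an illustrative model, not the OMM SDK's code: the helper names are hypothetical, and the level-selection heuristic (aim for micro-triangles that each cover roughly dynamicSubdivisionScale squared texels) is an assumption about how such a scale could be applied.

```python
import math


def omm_bytes_per_triangle(subdivision_level, bits_per_micro_triangle=2):
    """Storage for one triangle's opacity data (2 bits models a 4-state format)."""
    micro_triangles = 4 ** subdivision_level
    return math.ceil(micro_triangles * bits_per_micro_triangle / 8)


def pick_subdivision_level(texel_area, dynamic_subdivision_scale, max_subdivision_level):
    """Hypothetical heuristic: each micro-triangle should cover about scale^2 texels."""
    if dynamic_subdivision_scale <= 0:
        # Assumption: a non-positive scale disables the heuristic.
        return max_subdivision_level
    target_texels = dynamic_subdivision_scale ** 2
    if texel_area <= target_texels:
        return 0
    # 4^level micro-triangles of target size tile the triangle's texel area.
    level = round(0.5 * math.log2(texel_area / target_texels))
    return max(0, min(level, max_subdivision_level))


for max_level in (3, 6):
    level = pick_subdivision_level(1024, 2, max_level)
    print(max_level, level, omm_bytes_per_triangle(level))
```

Because storage grows by a factor of four per level, capping the maximum level is the most direct lever on VRAM, while the scale keeps small or low-resolution triangles from being subdivided more than their texture detail justifies.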

For more technical details on OMM integration, refer to the OMM SDK documentation and integration guide.

Debugging and Verification

An effective debugging mode was implemented that visualizes whether primary rays traverse the full hardware-accelerated path. In this mode, rays that complete their traversal with a single iteration are highlighted in green, while those requiring additional iterations are marked in purple. This quick visual check helps ensure that OMMs are active for all intended BLAS segments.
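The colour rule described above is simple enough to sketch directly. The snippet below is a hypothetical Python model of it; the RGB values and function names are assumptions, and in a real renderer the iteration count would come from the traversal loop in the shader.

```python
GREEN = (0, 255, 0)
PURPLE = (128, 0, 128)


def debug_color(traversal_iterations):
    """Green if traversal finished in a single iteration, purple otherwise."""
    return GREEN if traversal_iterations == 1 else PURPLE


def hardware_path_fraction(iteration_counts):
    """Share of primary rays that stayed on the single-iteration path."""
    if not iteration_counts:
        return 0.0
    single = sum(1 for n in iteration_counts if n == 1)
    return single / len(iteration_counts)


per_pixel_iterations = [1, 1, 2, 1]
print([debug_color(n) for n in per_pixel_iterations])
print(hardware_path_fraction(per_pixel_iterations))
```

A summary number like the fraction above can complement the visual overlay when comparing builds, since a drop suggests that some geometry has lost its OMMs.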


Dynamic BLAS Compaction for VRAM Savings

Bottom-Level Acceleration Structures (BLASs) hold the geometry that rays are traced against, and in large scenes they can occupy a significant share of VRAM. BLAS compaction reduces the memory footprint of these structures, allowing more complex scenes to be rendered without exhausting VRAM.

Understanding BLAS Compaction

BLAS compaction is typically employed for static geometry that is built once and never updated. However, in Indiana Jones and the Great Circle, dynamic vegetation models benefit from compaction as well. These models are updated (refitted) periodically but never require a complete rebuild. Compaction thus enables a VRAM reduction of roughly 41%, from 1027 MB down to 606 MB, on an RTX 5080, as measured using Nsight Graphics.

Practical Considerations for BLAS Updates

It is important to note that for BLAS compaction to be applicable, the compacted BLAS must not be rebuilt post-compaction. In scenarios where dynamic models are in constant flux, careful planning is required to decide when to update versus when to rebuild these structures. This balance is essential for maintaining both high quality and low memory usage.
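The update-versus-rebuild constraint can be illustrated with a toy lifecycle model. This Python sketch does not call any graphics API: the class, its methods, and the example sizes are hypothetical, and it only tracks how the allocation size would change under each operation.

```python
class DynamicBlas:
    """Toy model of a BLAS that is refit in place but never rebuilt."""

    def __init__(self, build_size_mb, compacted_size_mb):
        self._build_size_mb = build_size_mb
        self._compacted_size_mb = compacted_size_mb
        self.size_mb = build_size_mb
        self.compacted = False

    def compact(self):
        # Copy the built structure into a smaller allocation.
        self.size_mb = self._compacted_size_mb
        self.compacted = True

    def refit(self):
        # In-place update: vertex positions change, the allocation does not.
        return self.size_mb

    def rebuild(self):
        # A full rebuild needs a full-size destination, so the saving is lost
        # until the structure is compacted again.
        self.size_mb = self._build_size_mb
        self.compacted = False
        return self.size_mb


blas = DynamicBlas(build_size_mb=10.0, compacted_size_mb=6.0)
blas.compact()
for _ in range(3):
    blas.refit()  # e.g. wind animation on vegetation
print(blas.size_mb, blas.compacted)
print(blas.rebuild(), blas.compacted)
```

The model makes the planning question explicit: compaction pays off for a dynamic BLAS only as long as refits are sufficient, because the first rebuild returns it to its full build size.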

Performance Benchmarks and Impact

In comprehensive profiling, the use of OMMs resulted in significant improvements:

  • TraceMain pass reduced from 7.90 ms to 3.58 ms (a 55% reduction).
  • The SharcUpdate passes and shadow ray evaluation saw a similar reduction of about 55%, and even the SunTracing pass saw marginal savings.
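As a quick check of the TraceMain figures quoted above, the following Python snippet recomputes the saving from the two timings. Only the 7.90 ms and 3.58 ms values come from the article; the variable and function names are illustrative.

```python
def percent_reduction(before, after):
    """Relative saving, as a percentage of the original value."""
    return (before - after) / before * 100.0


trace_main_before_ms = 7.90
trace_main_after_ms = 3.58

saved_ms = trace_main_before_ms - trace_main_after_ms
reduction = percent_reduction(trace_main_before_ms, trace_main_after_ms)
speedup = trace_main_before_ms / trace_main_after_ms

print(f"saved {saved_ms:.2f} ms, {reduction:.1f}% reduction, {speedup:.2f}x faster")
# prints "saved 4.32 ms, 54.7% reduction, 2.21x faster"
```

The 54.7% result rounds to the 55% quoted for the pass, which corresponds to the pass running a little over twice as fast.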

Furthermore, the number of periodic shader samples attributed to AHS evaluation dropped markedly. These benchmarks not only demonstrate the efficiency of OMMs and BLAS compaction but also provide a roadmap for future optimizations in other ray tracing implementations.

Conclusion & Next Steps

Combining Opacity MicroMaps with dynamic BLAS compaction offers a highly effective strategy for optimizing ray tracing performance in complex, alpha-tested scenes. The techniques discussed here have already shipped in Indiana Jones and the Great Circle, illustrating their practical application. Such optimizations are critical as we push the boundaries of realism on emerging GPU architectures like the RTX 5080.

Developers and technical artists who want to explore further can download the OMM SDK and use Nsight Graphics for profiling and benchmarking. Stay updated with DXR 1.2 advancements and NVIDIA’s continuous improvements to hardware acceleration in future releases. For additional insights on shader-level optimizations, review our related article on Shader Execution Reordering and Live State Reductions.


Final Thoughts: These cutting-edge optimizations showcase the balance between performance gains and resource efficiency. Embracing such techniques can significantly elevate the visual fidelity and efficiency of next-generation games and applications.


WorldAiStream
