NVIDIA SLI AND STUTTER AVOIDANCE: A Recipe for Smooth Gaming and Perfect Scaling with Multiple GPUs gameworks.nvidia.com | GDC 2015 NVIDIA SLI AND STUTTER AVOIDANCE: Iain Cantlay (Developer Technology Engineer) Lars Nordskog (Developer Technology Engineer) gameworks.nvidia.com | GDC 2015 WHY SLI? “SLI” – Set of multi-GPU technologies Pixel counts increasing at a staggering rate (4K+) Emulating the “hardware of tomorrow” VR – 2 eyes, 2 GPUs gameworks.nvidia.com | GDC 2015 SLI BASICS AFR – Alternate Frame Rendering One frame per GPU in parallel Want linear performance improvements for each GPU added “SLI scaling” AFR SLI abstracts all non-primary GPUs away from the runtime Game sees one GPU Driver does the “magic” gameworks.nvidia.com | GDC 2015 SLI BASICS Single GPU frame rendering Display GPU0 n n n+1 n+1 n+2 n+2 n+3 Time gameworks.nvidia.com | GDC 2015 n+3 n+4 n+4 SLI BASICS 2-way Alternate Frame Rendering Parallelism Display GPU0 GPU1 n n+1 n+2 n+2 n n+1 n+3 n+4 n+4 n+3 n+5 n+6 n+6 n+5 Time gameworks.nvidia.com | GDC 2015 n+7 n+8 n+8 n+7 n+9 n+9 SLI BASICS Allocated resources replicated per AFR GPU Static GPU resource mirrored between GPUs Reading from local memory is optimal Static textures, IBs, VBs, etc. Dynamic GPU resources can diverge RTs, UAVs gameworks.nvidia.com | GDC 2015 SLI BASICS AFR Pros AFR Cons Up to linear performance scaling Non-uniform flip intervals (microstutter) “Frame” provides natural data dependency boundary* *Interframe dependencies Uniform workloads (frames similar) Input latency does not reduce with increased performance gameworks.nvidia.com | GDC 2015 SLI BASICS AFR Pros AFR Cons Up to linear performance scaling Non-uniform flip intervals (microstutter) “Frame” provides natural data dependency boundary* *Interframe dependencies Uniform workloads (frames similar) Input latency does not reduce with increased performance gameworks.nvidia.com | GDC 2015 MICROSTUTTER Naïve parallelism -> non-uniform flip intervals… Reported framerate 2x, but perceived framerate closer to single GPU Display GPU0 GPU1 Time gameworks.nvidia.com | GDC 2015 FLIP METERING … but SLI driver handles frame flip metering, so you don’t have to! Back pressure to application avoids animation stutter Display GPU0 GPU1 Time gameworks.nvidia.com | GDC 2015 INTERFRAME DEPENDENCIES We know static resources replicated to all GPUs Never change, so no problem …but some RTs/UAVs are modified by GPU Correctness! Driver must keep RTs/UAVs in sync between GPUs Sustain “illusion” of single GPU Data transferred GPU->GPU when reference “dirty” gameworks.nvidia.com | GDC 2015 INTERFRAME DEPENDENCIES Transferring data hurts SLI performance Some transfers not necessary Game updates resource entirely each frame …But other transfers are necessary Techniques that need previous frame results as input Temporal feedback (luminance adaptation, TXAA) Compute (simulations) Partial updates (tiled shadowmap, cubemap, atlas textures) Driver transfers entire mip slice/buffer gameworks.nvidia.com | GDC 2015 INTERFRAME DEPENDENCIES SLI Profile skips transfers deemed unnecessary Blunt instrument Prioritize correctness NVIDIA tests, ships official SLI profile with driver Profiles usually more complicated than AFR1/AFR2 gameworks.nvidia.com | GDC 2015 INTERFRAME DEPENDENCIES SLI-enabled in Control Panel without SLI profile = single GPU NVIDIA Control Panel AFR Modes AFR1 transfers all dirty resources -> low scaling, but no corruption AFR2 skips some transfers -> better scaling, but possible corruption gameworks.nvidia.com | GDC 2015 SO WHERE’S THE PROBLEM? Driver doesn’t have all the information about game’s intent Need game to behave well, provide hints to driver, and become “AFR-aware” for optimal performance! gameworks.nvidia.com | GDC 2015 COMMON SCALING PITFALLS (CPU) Queries or APIs preventing queuing of frames (Bad!) Solution: SLI input latency same as single, so allow n+1 frames in flight for n GPUs Readbacks to CPU (Dangerous!) Solution: Avoid, or delay readback by n frames via buffering to avoid CPU stalling Stalling Map() writes (Bad!) Solution: Use WRITE_DISCARD/WRITE_NO_OVERWRITE gameworks.nvidia.com | GDC 2015 COMMON SCALING PITFALLS (GPU) Necessary transfer causing GPU->GPU serialization Solution: Decouple GPUs, look back n frames on n-way config (input local) Mod of per-frame ping-pong buffer -> n-way dependent transfers Solution: Always Discard/Clear dynamic resource before bind, QA SLI with 3-way config GPU-generated data not regenerated every frame Solution: Regenerate data on each GPU, or hint to keep gameworks.nvidia.com | GDC 2015 INITIAL STEPS Renaming EXE to AFR-FriendlyD3D.exe Enables n-way AFR, skips all transfers Corruption, but ideal for checking “speed of light” No scaling with rename -> CPU-GPU serialization or CPU-boundedness Query NVAPI to detect SLI via number of GPUs NvAPI_D3D_GetCurrentSLIState() SLI profile “aware”… no profile returns “1 GPU” gameworks.nvidia.com | GDC 2015 SLI COMPATIBILITY PROCESS 1. Think through your interframe dependent effects/systems 2. Run with exe renamed to “AFR-FriendlyD3D.exe” to skip all GPU->GPU transfers of dirty resources 3. Test thoroughly, looking for corruption 4. Address resources that are not “AFR Friendly” Regenerate data for all GPUs, or hint to driver that data must persist Hint what data can be discarded gameworks.nvidia.com | GDC 2015 TAKING ACTION “How do I regenerate data for each GPU?” “Explicit synchronization” Keep track of which GPUs receive updates Re-issue for each “dirty” GPU Allows discarding of transfers + GPU coherency Regenerate work or hint to driver to keep? Generally regenerating better, but case by case Only so much data practically transferred per frame (performance) gameworks.nvidia.com | GDC 2015 TAKING ACTION “How do I regenerate data for each GPU?” (cont) Frame GPU0 n n n+1 n+1 n+2 n+2 n+3 n+3 n+4 n+4 n+5 n+5 n+6 Simulation steps gameworks.nvidia.com | GDC 2015 n+6 n+7 n+7 n+8 n+8 TAKING ACTION “How do I regenerate data for each GPU?” (cont) Simulation is duplicated for each GPU Still faster than doing a transfer! Frame GPU0 GPU1 n n-1 & nn n+1 n+1 & n+2 n & n+1 n+2 n+3 n+3 & n+4 n+2 & n+3 n+4 n+4 n+5 n+5 & n+6 n+4 & n+5 n+6 n+7 n+7 & n+8 n+6 & n+7 n+8 Simulation steps gameworks.nvidia.com | GDC 2015 n+6 n+8 TAKING ACTION “How do I hint what I *DO* need to persist between frames?” NvAPI_D3D_BeginResourceRendering()/NvAPI_D3D_EndResourceRendering() Wrap update -> driver transfers early/efficiently NVAPI_D3D_RR_FLAG_FORCE_DISCARD_CONTENT works as Discard/Clear for ping-pong case Begin/End assume only next GPU needs data NVAPI_D3D_RR_FLAG_MULTI_FRAME if used for multiple frames USE WITH CAUTION!!! Final update of resource in frame Don’t Begin/End > 1 time per frame, per resource Begin/End takes precedence over profile Entire resource transferred gameworks.nvidia.com | GDC 2015 TAKING ACTION “How do I hint what I *DON’T* need to persist between frames?” ID3D11DeviceContext1::DiscardView()/DiscardResource() Ideal solution Before bind in current frame Only supported in DX11.1 ID3D11DeviceContext::Clear*() Before bind in current frame NvAPI_D3D_SetResourceHint() Driver excludes resource from SLI “dirty” state tracking (never transfers) Sticky through allocation lifetime gameworks.nvidia.com | GDC 2015 TAKEAWAYS SLI excellent for substantially increasing GPU performance Ensure AFR friendly CPU behavior Use AFR-FriendlyD3D.exe Anticipate interframe dependent effects/systems Design them to be AFR friendly At minimum, focus on regenerating data or hinting what to keep between frames BeginResourceRendering/EndResourceRendering hints NVIDIA can remove the rest with profiles, but Discard APIs better gameworks.nvidia.com | GDC 2015 OH YEAH… Getting testing builds to NVIDIA early (Please ) For SLI profiling, identification issues, advice QA SLI on 3-way configuration Needs profile or AFR-FriendlyD3D.exe to scale gameworks.nvidia.com | GDC 2015 OTHER RESOURCES Nsight SLILog Plans to expand functionality and clarity GPUView gameworks.nvidia.com | GDC 2015 QUESTIONS? Thank you! Iain Cantlay (icantlay@nvidia.com) Lars Nordskog (lnordskog@nvidia.com) gameworks.nvidia.com | GDC 2015 GAMEWORKS Get the latest information for developers from NVIDIA and continue the discussion gameworks.nvidia.com gameworks.nvidia.com | GDC 2015
© Copyright 2024