#Screen tracing in a random direction causes great lag


coarse elbow
#

I'm trying to implement screen space reflections and global illumination, but I'm encountering a performance drop when I sample in a random direction. The strange thing is that when I sample with roughness set to 0, the renderer runs smoothly as usual. However, when the roughness is even slightly greater than 0, the program begins to lag significantly.

Here’s the function I’m using to generate random noise:

vec3 hash(vec3 a){
    a = fract((a * 10+randomSeed) * 0.8);
    a += dot(a, a.yxz + 19.19);
    return fract((a.xxy + a.yxx)*a.zyx);
}

I calculate the reflected direction like this:

vec3 specReflected = reflect(normalize(viewPos), viewNormal);
vec3 randDir = normalize(hash(viewPos*12)-0.5);
vec3 reflected = specReflected+linRoughness*randDir;

Finally, I use the screenTrace function to find an intersection with the scene (sorry for the messy code).

By the way, I’m new to graphics programming and coding in general, so I apologize if my code is messy and if my questions are basic.

#

The screenTrace function:

vec4 SSRayMarch(vec3 origin, vec3 dir, vec3 normal, float step, float stepIncrease, float minSteps, float maxSteps, float overshootThreshold, float thickness, float dynamicThicknessStep, float maxThickness){
    float binarySearchSteps = 5;
    vec3 rayPos = origin+dir;
    float depthThreshold = thickness * step;
    for (int i = 0; i < maxSteps; i++){
        vec4 projected = projection * vec4(rayPos, 1);
        projected.xy /= projected.w;
        projected.xy = projected.xy * 0.5 + 0.5;
        
        if (projected.x < 0 || projected.x > 1 || projected.y < 0 || projected.y > 1 || projected.z < 0) {return vec4(-1);}
        
        float depth = texture(g2, projected.xy).w;
        float deltaDepth = projected.z - depth;

        float threshold = min(depthThreshold, maxThickness);
        if (depth > 800){threshold = 100000; step+=stepIncrease;}
        if (deltaDepth >= 0 && i >= minSteps){
            if (deltaDepth < threshold){
                vec4 nsample = texture(g2, projected.xy);
                if (dot(nsample.rgb, normal) < 0.95 || nsample.r == 2){
                    for (int j = 0; j < binarySearchSteps; j++){
                        projected = projection * vec4(rayPos, 1);
                        projected.xy /= projected.w;
                        projected.xy = projected.xy * 0.5 + 0.5;
            
                        deltaDepth = projected.z - texture(g2, projected.xy).w;
                        
                        step *= 0.5;
                        if (deltaDepth >= 0){rayPos -= dir*step;}
                        else {rayPos += dir*step;}
                    } 
                    
                    projected = projection * vec4(rayPos, 1);
                    projected.xy /= projected.w;
                    projected.xy = projected.xy * 0.5 + 0.5;
                                    
                    return vec4(projected.xy, length(origin-rayPos), depth);
                }
            }
        }
        
        rayPos += dir*step;
        if (i > overshootThreshold){stepIncrease *= 4;}
        step += stepIncrease;
        depthThreshold = thickness * (1+step*dynamicThicknessStep);
    }
    
    return vec4(-1, -1, length(origin-rayPos), -1);
}
full widget
#

and set to 0 at runtime or compile time?

coarse elbow
full widget
#

if compiletime it's not surprising, having that set to 0 would make your randDir computation irrelevant so the compiler is just returning specReflected

#

maybe it's doing a lazy check?

#

like if your material is 0 it has no need to run the rest of the code, though I'm not sure if that's an optimization GPUs make

#

try renderdocing it

coarse elbow
#

it's a problem I've encountered even in previous attempts at coding a renderer

#

not only in ssr but also other types of reflections and even path tracing

full widget
#

what's the code for fract

coarse elbow
#

i use the builtin function

full widget
#

oh, is it the fractional part?

coarse elbow
#

yes

coarse elbow
#

if I use the random value in other functions it runs just fine, it lags only when I use it for raymarching for some reason

coarse elbow
full widget
coarse elbow
#

Yes

coarse elbow
#

I don't think it's the actual problem here

full widget
#

it gets hard to read separated like that but a lot of that can be written in one if

coarse elbow
#

Yes

full widget
coarse elbow
#

Wdym?

full widget
#

your best friend for debugging graphics apps

full widget
coarse elbow
#

I'm new to GPU stuff and graphics

full widget
#

like how work gets scheduled across the processors in a GPU

#

ok so

coarse elbow
#

It's one of my first projects

full widget
#

I can't write too much about it rn, maybe later

coarse elbow
#

Ok

full widget
#

try sharing your code in a friendlier format, it gets hard to follow

coarse elbow
#

Np

full widget
#

try to read up on how work gets submitted to a GPU

#

get familiarized with stuff like workgroups and warps / wavefronts

#

google those terms

coarse elbow
#

Ok

#

Btw, do you think I can do something about the random direction problem?

fossil hull
#

Your raymarch is suboptimal: you can trace against raw Z, avoiding any space conversions in the loop
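
A rough sketch of that idea, under assumptions that are not in the code posted above: `rawDepthTex` is a hypothetical texture holding the hardware depth value in [0,1] (default depth range), `nearPlane` is a hypothetical uniform, and `viewToScreen` / `traceScreenSpace` are made-up names. The two end points of the ray are projected once, and the loop then only steps a uv + depth value and compares it with the depth texture. It needs GLSL 1.30 or later for `textureLod`.

```glsl
uniform sampler2D rawDepthTex; // hypothetical: hardware depth in [0,1], stored in .r
uniform mat4 projection;       // the same matrix the posted loop uses (already declared there)
uniform float nearPlane;       // hypothetical: positive near-plane distance

// View space -> (uv.x, uv.y, depth), all in [0,1] for visible points.
vec3 viewToScreen(vec3 viewPos){
    vec4 clip = projection * vec4(viewPos, 1.0);
    return (clip.xyz / clip.w) * 0.5 + 0.5;
}

// Returns the hit uv in .xy and 1.0 in .z, or vec3(-1.0) on a miss.
vec3 traceScreenSpace(vec3 originVS, vec3 dirVS, float maxDist, int steps, float thickness){
    // Clip the ray so its end point stays in front of the near plane
    // (assumes an OpenGL-style view space looking down -Z).
    float rayLen = maxDist;
    if (originVS.z + dirVS.z * maxDist > -nearPlane){
        rayLen = (-nearPlane - originVS.z) / dirVS.z;
    }

    // Project both end points once; no matrix multiply inside the loop.
    vec3 start = viewToScreen(originVS);
    vec3 end   = viewToScreen(originVS + dirVS * rayLen);
    vec3 delta = (end - start) / float(max(steps, 1));

    vec3 p = start;
    for (int i = 0; i < steps; i++){
        p += delta;
        // Stop as soon as the ray leaves the screen or the depth range.
        if (any(lessThan(p, vec3(0.0))) || any(greaterThan(p, vec3(1.0)))) break;

        float sceneDepth = textureLod(rawDepthTex, p.xy, 0.0).r;
        float diff = p.z - sceneDepth;
        // thickness is expressed in raw (non-linear) depth units here.
        if (diff > 0.0 && diff < thickness) return vec3(p.xy, 1.0);
    }
    return vec3(-1.0);
}
```

Note that equal steps in screen space are not equal steps in view space, so the step count and thickness would need retuning compared with the view-space version.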

coarse elbow
coarse elbow
#

Btw I've done the exact same thing in a minecraft shader and there was no performance drop

obsidian raptor
#

Also it's expected that high roughness will lead to worse perf: you'll be thrashing the cache by sampling randomly in a texture

coarse elbow
obsidian raptor
#

Ok that rules out the compiler sneaking in optimizations for when the roughness is 0

#

My leading hypothesis is cache thrashing, but only a profiler can say for sure

coarse elbow
obsidian raptor
#

It's unrelated to OpenGL

#

You're sampling a texture, right?

#

For the trace

coarse elbow
#

yes

obsidian raptor
#

That texture lives in memory and access to it goes through a cache hierarchy

#

Accessing it in a spatially coherent manner (neighboring pixels sample nearby points) means a lot of accesses will be serviced by low-level caches, which are quick but small

#

But when your roughness is high, the access pattern becomes incoherent and the caches will miss more often, leading to more slow loads from main memory
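
One way people sometimes reduce that incoherence is to read coarser mip levels for rough, long rays, so neighboring pixels land on the same texels more often. A minimal sketch, assuming the depth texture has a mip chain whose `.w` channel still holds a usable depth (for example a min-depth pyramid; plain averaged mips would blur depth across edges). The helper name `sampleDepthCoherent`, the `maxLod` parameter, and the LOD heuristic are all made up for illustration and would need tuning.

```glsl
// Hypothetical helper: pick a coarser mip the rougher the surface is and the
// farther the ray has travelled. Assumes depthTex has mipmaps with depth in .w.
float sampleDepthCoherent(sampler2D depthTex, vec2 uv, float linRoughness,
                          float travelled, float maxLod){
    // Heuristic only: the LOD grows with the log of (roughness * distance).
    float lod = clamp(log2(1.0 + linRoughness * travelled), 0.0, maxLod);
    return textureLod(depthTex, uv, lod).w;
}
```

Inside the march loop this would stand in for `texture(g2, projected.xy).w`, with `travelled` being something like `length(rayPos - origin)`. Whether it helps at all depends on whether cache misses are really the bottleneck.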

#

I'm not sure that fully explains the huge 30+ ms drop in perf, but it's all I can think of now

coarse elbow
#

it's weird

#

when I tried the same thing in an MC shader it had just a minimal performance drop

obsidian raptor
#

MC may be bottlenecked by other things already, idk

#

My #1 suggestion is to use a graphics profiler
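
Until a profiler is set up, a quick in-shader check can at least tell whether rough pixels simply run more loop iterations. A minimal sketch, assuming the trace function is modified to also report how many steps it took; `stepHeatmap` is a made-up helper name, not something from the posted code.

```glsl
// Hypothetical debug helper: map a step count to a blue (few) to red (many) colour.
vec3 stepHeatmap(int stepsTaken, int maxSteps){
    float t = clamp(float(stepsTaken) / float(max(maxSteps, 1)), 0.0, 1.0);
    return mix(vec3(0.0, 0.0, 1.0), vec3(1.0, 0.0, 0.0), t);
}
```

Writing `stepHeatmap(i, int(maxSteps))` to the output instead of the reflection colour, once with roughness 0 and once with roughness above 0, shows whether the rough case is mostly red. If it is, the extra cost comes at least partly from longer marches rather than from texture fetches alone.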

coarse elbow
#

I mean, in an MC shader setting the roughness to 1 made almost no difference in performance

#

so maybe it's the MC pipeline that's different

#

or maybe it doesn't use opengò

obsidian raptor
#

opengò

#

It uses OpenGL

#

Anyway yeah it's possible you have some silly goof in your code, kinda hard to say without seeing it

#

And by that I mean the code you didn't show as well

coarse elbow
obsidian raptor
#

🇺🇸