#DX12 Performance

alpine schooner
#

I think I need a different approach for how I've set up rendering. I'm getting serious bottlenecks from having a shit load of GameObjects
About 500-600 simple cubes sharing the same mat/shader/etc will bring the fps down to <30

```cpp
void Batch::Render(ID3D12GraphicsCommandList2* commandList, D3D12_VIEWPORT viewPort, D3D12_RECT scissorRect, D3D12_CPU_DESCRIPTOR_HANDLE rtv, D3D12_CPU_DESCRIPTOR_HANDLE dsv, XMMATRIX viewProj, Frustum& frustum)
{
    for (int i = 0; i < m_gameObjectList.size(); i++)
    {
        XMFLOAT3 pos;
        float radius;
        m_gameObjectList[i]->GetBoundingSphere(pos, radius);

        if (!FRUSTUM_CULLING_ENABLED || frustum.CheckSphere(pos, radius))
            m_gameObjectList[i]->Render(commandList, m_rootSignature.Get(), viewPort, scissorRect, rtv, dsv, viewProj);
    }

    for (int i = 0; i < m_gameObjectListTransparent.size(); i++)
    {
        XMFLOAT3 pos;
        float radius;
        m_gameObjectListTransparent[i]->GetBoundingSphere(pos, radius);

        if (!FRUSTUM_CULLING_ENABLED || frustum.CheckSphere(pos, radius))
            m_gameObjectListTransparent[i]->Render(commandList, m_rootSignature.Get(), viewPort, scissorRect, rtv, dsv, viewProj);
    }
}
```
#
```cpp
void GameObject::Render(ID3D12GraphicsCommandList2* commandListDirect, ID3D12RootSignature* rootSig, D3D12_VIEWPORT viewPort, D3D12_RECT scissorRect, D3D12_CPU_DESCRIPTOR_HANDLE rtv, D3D12_CPU_DESCRIPTOR_HANDLE dsv, XMMATRIX viewProj)
{
    if (!m_model || !m_shader)
        return;

    commandListDirect->SetPipelineState(m_shader->GetPSO().Get());
    commandListDirect->SetGraphicsRootSignature(rootSig);
    commandListDirect->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

    commandListDirect->RSSetViewports(1, &viewPort);
    commandListDirect->RSSetScissorRects(1, &scissorRect);

    UINT numRenderTargets = 1;
    commandListDirect->OMSetRenderTargets(numRenderTargets, &rtv, FALSE, &dsv);

    if (m_material)
    {
        auto texHeap = m_material->GetTextureHeap();
        ID3D12DescriptorHeap* ppHeaps[] = { texHeap };
        commandListDirect->SetDescriptorHeaps(_countof(ppHeaps), ppHeaps);

        commandListDirect->SetGraphicsRootDescriptorTable(1, texHeap->GetGPUDescriptorHandleForHeapStart());
    }

    MatricesCB matricesCB;
    matricesCB.M = m_worldMatrix;
    matricesCB.InverseTransposeM = XMMatrixTranspose(XMMatrixInverse(nullptr, m_worldMatrix));
    matricesCB.VP = viewProj;
    commandListDirect->SetGraphicsRoot32BitConstants(0, sizeof(MatricesCB) / 4, &matricesCB, 0);

    m_model->Render(commandListDirect);
}
```
thick trellis
#

Use a profiler. Visual Studio has a pretty good one built-in, and PIX can profile your application as well

alpine schooner
#

I heard something once that using a vector of pointers can cause performance issues due to needing to refill the cache or something. Should I try refactor to avoid this?

thick trellis
#

Nothing stands out to me in the code you've shown

#

A vector of pointers is usually not the fastest possible thing. However, I strongly doubt it'd hurt your framerate this much
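If a profile later shows that pointer chasing in the culling loop matters, one option is to keep the bounding spheres in a contiguous array and cull them in a single pass. This is only a minimal sketch: `BoundingSphere`, `Plane`, `SphereInFrontOfPlane`, and `CollectVisible` are hypothetical names, and the single-plane test is a simplified stand-in for `Frustum::CheckSphere`, whose implementation is not shown in the thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain-data sphere, stored by value so the array is contiguous.
struct BoundingSphere { float x, y, z, radius; };

// Hypothetical plane (normal nx,ny,nz and offset d); a real frustum has six.
struct Plane { float nx, ny, nz, d; };

// True if any part of the sphere lies on the positive side of the plane.
inline bool SphereInFrontOfPlane(const Plane& p, const BoundingSphere& s)
{
    const float signedDistance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
    return signedDistance >= -s.radius;
}

// Walks the contiguous sphere array once and returns the indices that pass.
// The caller would then render only the objects at those indices.
std::vector<std::size_t> CollectVisible(const std::vector<BoundingSphere>& spheres,
                                        const Plane& plane)
{
    std::vector<std::size_t> visible;
    visible.reserve(spheres.size());
    for (std::size_t i = 0; i < spheres.size(); ++i)
    {
        if (SphereInFrontOfPlane(plane, spheres[i]))
            visible.push_back(i);
    }
    return visible;
}
```

As noted above, this kind of change is unlikely to explain a drop to under 30 fps on its own, so it is worth measuring before refactoring.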

alpine schooner
#

Anyone know what this is about?

#

Both

```cpp
if (m_material)
```
and especially
```cpp
Model* sphereModel = nullptr
```

I feel like these shouldn't take more than a few cycles each, and yet they're eating loads
thick sedge
#

At 30 fps that’s 33ms per frame, which would be about 65 microseconds per object. That’s definitely very high. Have you confirmed that the time spent is even in the function you listed?
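The per-object figure can be reproduced with a small helper. This is just the arithmetic written out; `PerObjectBudgetMicros` is a hypothetical name, and it assumes the whole frame time is spent on the objects.

```cpp
// Average time available per object, in microseconds, at a given frame rate.
// Hypothetical helper for illustration; assumes objectCount > 0 and fps > 0.
constexpr double PerObjectBudgetMicros(double fps, int objectCount)
{
    const double frameMicros = 1'000'000.0 / fps; // 30 fps is about 33,333 us per frame
    return frameMicros / objectCount;
}
```

With the 500 to 600 cubes mentioned earlier, `PerObjectBudgetMicros(30.0, 500)` is roughly 67 microseconds and `PerObjectBudgetMicros(30.0, 600)` is roughly 56, which brackets the estimate above.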

#

For microoptimization you would have to look at the disassembly for optimized code, since there is not a simple one-to-one translation from source to instructions. Likely you just have a cache miss on a load that’s causing a long stall.

alpine schooner
#

Anyone know how I can debug something like this? How do I know what this d3d12sdklayers represents?
Apparently when there's a massive performance hit on the first variable declaration line, what it's really showing is how long it took to enter the function (i.e. copying arguments and stuff). But every argument I use is either a pointer or lvalue, so I'm not sure why else it might be taking long to enter

I have ideas for how to refactor lots of stuff to avoid cache misses but I'm not sure if that's what's actually causing an issue or not

thick trellis
#

d3d12sdklayers is part of the D3D12 driver

#

That data is telling you that your program is spending most of its time in the driver - your code isn't doing very much

#

Do you know if the GPU is taking longer to render a frame than the CPU? How much time is your CPU taking each frame?
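One rough way to get the CPU side of that number without a profiler is to time the command-recording code directly. A minimal sketch, assuming a hypothetical `ScopedCpuTimer` class built on `std::chrono::steady_clock`; it measures wall-clock time for its scope, so it should wrap only the recording work (for example the `Batch::Render` call) and not any fence wait or present call.

```cpp
#include <chrono>

// Hypothetical RAII timer: on destruction, adds the wall-clock time spent in
// its scope (in milliseconds) to an accumulator owned by the caller.
class ScopedCpuTimer
{
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedCpuTimer(double& accumulatedMs)
        : m_accumulatedMs(accumulatedMs), m_start(Clock::now()) {}

    ~ScopedCpuTimer()
    {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - m_start;
        m_accumulatedMs += elapsed.count();
    }

    ScopedCpuTimer(const ScopedCpuTimer&) = delete;
    ScopedCpuTimer& operator=(const ScopedCpuTimer&) = delete;

private:
    double& m_accumulatedMs;   // caller-owned running total
    Clock::time_point m_start; // time at scope entry
};

// Usage idea (names from the thread, shown as comments only):
//   double recordMs = 0.0;
//   { ScopedCpuTimer timer(recordMs); batch.Render(/* ... */); }
//   // recordMs now holds the CPU time spent recording this batch.
```

If the accumulated value is far below the 33ms frame time, the frame is probably limited elsewhere, such as waiting on the GPU.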

#

Hmm, looks like VS doesn't really show you the time spent waiting for the GPU. Its sampling profiler just tells you what percentage of your time is spent in which function - not what your total time is

#

If your GPU is taking longer than your CPU, focusing on CPU optimizations won't help you

alpine schooner
thick trellis
#

I've never used the GPU profiler in VS, idk how good it is

#

Nsight is really good, though

#

It'll tell you if you're CPU-bound or GPU-bound for sure

#

You'll want to use the GPU Trace Profiler

thick sedge
#

d3d12sdklayers is the debug layer
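If d3d12sdklayers dominates the profile, the debug layer is probably enabled in the build being measured, and its validation adds CPU cost to D3D12 calls. A minimal sketch of gating it behind a flag so profiling runs can leave it off: `EnableDebugLayerIfRequested` is a hypothetical helper name, while `D3D12GetDebugInterface` and `ID3D12Debug::EnableDebugLayer` are the real D3D12 calls. It assumes the helper is called before the device is created and that the project links against d3d12.lib.

```cpp
#if defined(_WIN32)
#include <d3d12.h>
#include <wrl/client.h>
#endif

// Hypothetical helper: enables the D3D12 debug layer only when asked to.
// Returns true if the layer was actually enabled.
// Must run before D3D12CreateDevice for the setting to take effect.
bool EnableDebugLayerIfRequested(bool requested)
{
#if defined(_WIN32)
    if (!requested)
        return false;

    Microsoft::WRL::ComPtr<ID3D12Debug> debugController;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
    {
        debugController->EnableDebugLayer();
        return true;
    }
    return false;
#else
    // Non-Windows builds have no D3D12; nothing to enable.
    (void)requested;
    return false;
#endif
}
```

Passing `false` for performance measurements and `true` for day-to-day debugging keeps the validation available without letting it skew the numbers.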