#:frog_gone: martty's mesa misadventures

dark vortex
#

no, actually unrelated to what you said

#

that is a major problem too but not exactly what I meant 🐸

wanton carbon
#

but what is it

dark vortex
#

radv reserves 4096 dwords of cmdbuf space for every draw, always

#

which means we won't be able to fit a lot of draws without chaining in between each of them

wanton carbon
#

mm

#

why does it do that?

#

also how many draws in dad do you need to break radw?

dark vortex
#

i didn't run dad on radw yet

dark vortex
#

we should be able to do better

#

it'd probably be fine code for the amdgpu winsys

#

actually "reserves" is a bit misleading

#

it just makes sure 4096 bytes are free, but it might not consume all of them

#

pretty much a vector::reserve()

wanton carbon
#

ye but that should be fine

dark vortex
#

well it will still waste space in the case that we don't use the 4096 bytes

#

will probably waste around 4090 bytes or so per chained ib
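
A rough way to see where the "around 4090 per chained ib" figure comes from is to model the reserve-then-emit pattern described above. This is only a toy sketch, not radv's code: `CmdStream`, `record_draws`, the 4-unit chain packet cost and the 3-unit draw size are all assumptions made for illustration, and the model uses a single size unit throughout since the chat mentions both dwords and bytes.

```python
class CmdStream:
    """Toy model of a chained command stream. All sizes share one unit."""

    def __init__(self, ib_capacity, chain_cost=4):
        self.ib_capacity = ib_capacity
        self.chain_cost = chain_cost  # space kept back for the chain packet (assumed)
        self.used = 0                 # units written into the current IB
        self.finished_ibs = 0         # IBs closed by chaining to a new one
        self.wasted = 0               # units left unused in closed IBs

    def reserve(self, n):
        """Make sure n units are free, chaining to a fresh IB if they are not."""
        usable = self.ib_capacity - self.chain_cost
        if n > usable:
            raise ValueError("reservation larger than one IB")
        if self.used + n > usable:
            self.wasted += usable - self.used
            self.finished_ibs += 1
            self.used = 0

    def emit(self, n):
        """Consume n units; like vector::reserve(), reserving did not consume."""
        self.used += n


def record_draws(cs, draw_count, reserve_size, draw_size):
    # Every draw reserves the full amount but only emits what it needs.
    for _ in range(draw_count):
        cs.reserve(reserve_size)
        cs.emit(draw_size)


cs = CmdStream(ib_capacity=0x3F00)
record_draws(cs, draw_count=10000, reserve_size=4096, draw_size=3)
print(cs.finished_ibs, cs.wasted)
```

With these assumed numbers every IB that gets chained away strands 4094 units, because the reservation check fails as soon as fewer than 4096 units remain even though the next draw only needs 3.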

wanton carbon
#

NEW SUBMIT: 1
Submit VA: 30027a000
Submit Length: 64
Chunk 0: 0 - 64(va 30027a000)

PKT0 (offset 0)
nbio230.mmBIF_BX_DEV0_EPF0_VF0_MM_INDEX(1)
PKT0 (offset 8)
<unknown>(181c0000)
these are probably not cbs

dark vortex
#

ye

#

the PKT0 with nbio something is just invalid mem I think

#

ah I think I know what's wrong with chaining

#

prop sets IB_SIZE field in the INDIRECT_BUFFER cmd, we don't

#

if this changes anything it's interesting that that works on loonix

wanton carbon
#

hm

dark vortex
#

oh actually I discovered a perhaps more likely bug

#

didn't fix it tho

#

bleh setting proper IB_SIZE bits is going to be annoying

wanton carbon
#

why do the sub IBs start with a dma copying the entire IB to 0x0?

#

wtf does that mean

dark vortex
#

whut

#

perhaps prefetching memes, but I don't have that in my stuff

#

writing the IB_SIZE does indeed fix the hang

#

it doesn't fix the test fully tho

wanton carbon
#

PKT3_INDIRECT_BUFFER (offset fbf0)
IB_BASE_LO(1240000)
SWAP(0)
IB_BASE_HI(3)
IB_SIZE(3f00)
IB_VMID(0)
CHAIN(1)
PRE_ENA(1)
CACHE_POLICY(0)
PRE_RESUME(0)
PRIV(0)
Referenced IB (VA 0x 301240000, size fc00):
PKT3_DMA_DATA (offset 0)
ENGINE_SEL(1)
SRC_CACHE_POLICY(0)
DST_SEL(2)
DST_CACHE_POLICY(0)
SRC_SEL(3)
CP_SYNC(0)
SRC_ADDR_LO_OR_DATA(1240000)
SRC_ADDR_HI(3)
DST_ADDR_LO(0)
DST_ADDR_HI(0)
BYTE_COUNT(fc00)
DIS_WC(0)
SAS(0)
DAS(0)
SAIC(0)
DAIC(0)
RAW_WAIT(0)

#

this is how it looks for me

#

for all of them basically

dark vortex
#

```
PKT3_INDIRECT_BUFFER (opcode 3f, offset fbf0)
    IB_BASE_LO(1470000)
    SWAP(0)
    IB_BASE_HI(3)
    IB_SIZE(3f00)
    IB_VMID(0)
    CHAIN(1)
    PRE_ENA(1)
    CACHE_POLICY(0)
    PRE_RESUME(0)
    PRIV(0)
Referenced IB (VA 0x 301470000, size fc00):
PKT3_DRAW_INDEX_AUTO (opcode 2d, offset 0)
    INDEX_COUNT(3)
    DRAW_INITIATOR(2)
PKT3_DRAW_INDEX_AUTO (opcode 2d, offset c)
    INDEX_COUNT(3)
    DRAW_INITIATOR(2)
```
for all of them as well
#

not completely sure what discord parsed here but it's something
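
Both dumps above show IB_SIZE(3f00) next to a referenced IB of size fc00, which fits IB_SIZE being counted in dwords while the decoder prints the IB size in bytes (0x3f00 * 4 = 0xfc00). A minimal sketch of filling that field when emitting a chain packet could look like the following. The helper name and the bit positions are assumptions for illustration only and should be checked against the real register headers.

```python
# Assumed layout of the INDIRECT_BUFFER control dword (illustrative only).
IB_SIZE_MASK = 0xFFFFF   # assumed: 20-bit size field starting at bit 0
CHAIN_BIT = 1 << 20      # assumed position of CHAIN
PRE_ENA_BIT = 1 << 21    # assumed position of PRE_ENA


def ib_control_dword(size_bytes, chain=True, pre_ena=True):
    """Build the control dword for a chained IB of size_bytes bytes."""
    if size_bytes % 4:
        raise ValueError("IB size must be a whole number of dwords")
    size_dw = size_bytes // 4  # IB_SIZE is expressed in dwords
    if size_dw > IB_SIZE_MASK:
        raise ValueError("IB too large for the size field")
    value = size_dw
    if chain:
        value |= CHAIN_BIT
    if pre_ena:
        value |= PRE_ENA_BIT
    return value


control = ib_control_dword(0xFC00)
print(hex(control & IB_SIZE_MASK))
```

Leaving the size field at zero while setting CHAIN would describe an empty IB to the packet processor under this reading, which is at least consistent with the hang going away once IB_SIZE is written.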

wanton carbon
#

maybe its bc your cbs are in device local, while mine are in gtt

#

and then prefetching memetics

dark vortex
#

well I didn't change anything about cb placement so they should be in gtt in both cases

wanton carbon
#

i mean on prop drv

dark vortex
#

ah ye right

wanton carbon
#

NEW SUBMIT: 1 9
Submit VA: 300048000
Submit Length: 64
Chunk 0: 0 - 64(va 300048000)

PKT0 (offset 0)
nbio230.mmBIF_BX_DEV0_EPF0_VF0_MM_INDEX(1)
PKT0 (offset 8)
<unknown>(181c0000)
this is coming from creategraphicspipes

#

NEW SUBMIT: 1 8
Submit VA: 300270000
Submit Length: 64
Chunk 0: 0 - 64(va 300270000)

PKT0 (offset 0)
nbio230.mmBIF_BX_DEV0_EPF0_VF0_MM_INDEX(1)
PKT0 (offset 8)
dcn300.mmREFCLK_CGTT_BLK_CTRL_REG(181c0000)
this is from createswapchain

#

ah could this be transfer queue packets or something like that?

dark vortex
#

don't think so

#

transfer queue cmds are just a subset of the pkt3 cmds iirc

#

but no special encoding or anythin

#

it's getting a bit late now

#

think the chaining stuff I have might regress some cts tests but the basic non-chained submission still seems to work, so i might just push it regardless

wanton carbon
#

ye do it

#

git has history

dark vortex
#

pushd

#

tomorrow is the only day of the week where i need to get up early aaa

#

also i'm going to a concert in the evening and then visiting parents so it's sorta unlikely I'll be able to investigate more until next week

wanton carbon
#

ye dw

#

there is no rush

#

dis a weird chunk

wanton carbon
#

i just liked the i with accent

#

btw

#

before you leave

#

how do you use VK_DRIVER_FILES?

#

ok i think i get it

#

flerk, i had some local changes from april that i forgot about

#

i put them up, not sure if very useful

dark vortex
#

not sure how that happened

dark vortex
wanton carbon
#

i put it on radv-win32-flerk

#

apologies

dark vortex
#

np

wanton carbon
#

flerk worked for 150k draws, but killed my system at 1.5M KEKW

dark vortex
#

oh wait whut it actually werked

wanton carbon
#

non-flerk could do the 1.5M

#

so thats better him

#

i'll bring over the non-cs stuff from flerk

#

and also set up cts again agonyfrog

wanton carbon
#

ye boii

#

chaining completely broiled my mind smh

dark vortex
#

oh wait tf

#

the primary ones pass for you?

wanton carbon
#

ye

#

the display shits itself slightly during

#

but they pass

#

ah no

#

they are flaky

#

sadness

#

interestingly even the secondary one flakes

#

but that doesn't lose the device

#

we are not waiting or not flushing something?

wanton carbon
#

thats p nice tho

#

ty for the env var tip

void crow
wanton carbon
#

not yet

#

i have a theory tho, so lets see

#

and by lets see i mean once i am done with chores for today, i may or may not have enough energy left to test the theory out

void crow
#

do it like me, ignore the chores until it works

wanton carbon
#

tempting

wanton carbon
#

seems like the driver is materializing some memory out of nowhere frog_think

#

i believe this might be the key

wanton carbon
#

zo i narrowed a quantum amount

#

it seems like the hang is related to the lack of swapchain?

#

@dark vortex did the single tri spam ever hang for you?

#

or just the cts?

dark vortex
#

both hang for me

wanton carbon
#

humm

wanton carbon
#

doesn't seem like semas are working tho, tf

wanton carbon
#

some progress on looking at the IB flags with many dad draws
S - is the present sema signalled by the submit or not
P - preamble flag
M - main chunk flag
R - result

S  P  M    R
0  0  0    good
1  0  0    good
0  c  0    good
1  c  0    good
0  d  0    hang
1  d  0    hang
0  d  204  loss or corrupted
1  d  204  corrupted
0  c  204  corrupted
1  c  204  corrupted
#

seems like sema is not doing anything

wanton carbon
#

ah the sema wait fails when there was no signal submitted

#

good to know

wanton carbon
#

i shrimply do not get it
20000 draw submit: aww you're so sweet
many submits, swept from 1000 to 20000 draws: aww you're so sweet
two submits, 10000 and 20000 draws: hello human resources?

wanton carbon
#

@dark vortex i figured this one out ^

dark vortex
#

noice

wanton carbon
#

waiting for fences is fooked

#

therefore resetting pool was messing with inflight ibs

dark vortex
#

that’s no good

wanton carbon
#

ye but at least i have an idea wtf is happening 😄

wanton carbon
#

the wait is fine, the reset was not
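
The failure described above (a pool reset stomping on IBs that the GPU is still executing) can be illustrated with a small model. This is not the actual fix, which the chat does not show. `CommandPool`, the fence counter and the IB names are hypothetical, and the sketch only demonstrates the invariant that a reset may recycle an IB only once the fence of its submission has completed.

```python
class CommandPool:
    """Toy model: IBs may only be recycled after their fence has completed."""

    def __init__(self):
        self.recorded = []   # IBs recorded but not yet submitted
        self.in_flight = {}  # IB name -> fence value that retires it
        self.recycled = []   # IBs that are safe to overwrite

    def record(self, ib):
        self.recorded.append(ib)

    def submit(self, fence_value):
        # Everything recorded so far is now owned by the GPU until the fence.
        for ib in self.recorded:
            self.in_flight[ib] = fence_value
        self.recorded.clear()

    def reset(self, completed_fence_value):
        """Recycle finished IBs and return the names of the ones still busy."""
        busy = {}
        for ib, fence in self.in_flight.items():
            if fence <= completed_fence_value:
                self.recycled.append(ib)
            else:
                busy[ib] = fence
        self.in_flight = busy
        # Unsubmitted IBs were never seen by the GPU, so they are always safe.
        self.recycled.extend(self.recorded)
        self.recorded.clear()
        return sorted(busy)


pool = CommandPool()
pool.record("ib0")
pool.submit(fence_value=1)
pool.record("ib1")
pool.submit(fence_value=2)
print(pool.reset(completed_fence_value=1))
```

In this model a single submit followed by a reset never misbehaves; it takes a second submission whose fence has not completed yet to expose a reset that recycles too eagerly, which resembles the two-submit case reported earlier.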

hollow pier
#

yeah I am really curious about that branch. very interesting and surprising tbh

midnight shore
#

end of amdvlk nigh? :shocked: :shocked: :shocked:

wanton carbon
#

just what the doctor ordered

onyx vector
#

What the SHIT

#

imagine what faith would have accomplished if she had been freed from intel 10 years ago

dark vortex
#

I mean

#

if you want to play mr. negative all the fuckin time, you could say "what the shit, now radv will suffer from windows users barging in on the issue tracker about random undebuggable bs"

onyx vector
#

what is this, moronix comment section?

dark vortex
#

no

#

I'm excited for it too, but this is bugging me in the back of my mind a bit

onyx vector
#

yeah ok fair, though quite a remote possibility rn I guess

dark vortex
#

radv on windows is cool, but actually exposing it means we should properly support it too

onyx vector
#

RADV_I_WILL_NOT_BE_A_DICK_ON_THE_ISSUE_TRACKER=1

dark vortex
#

and for actual proper support, debuggability and the lack thereof on windows is a huge issue

onyx vector
#

there, fixed it

dark vortex
#

regardless of whether people are going to be stupid on the issue tracker actually (linux users can do that too)

hollow pier
#

I am kinda curious what the endgoal for this is since iirc I remember Faith saying on #nouveau that the windows-y bits in radv (afaik there was some work to make it compile, right?) sometimes made things annoying

dark vortex
#

me too

#

no idea why faith is suddenly interested in windows stuff

onyx vector
#

maybe she's just taking the piss and this is like me writing a glsl 1.10 backend

hollow pier
#

possibly it's some prototyping/testing the waters for nvk on windows. I remember she said once that she'd sooner support windows than OpenRM lmfao

dark vortex
#

would make sense

#

either way, I'm not sure if we can expect her to debug/solve everything about windows issues

onyx vector
#

I guess because WDDM is actually documented?

dark vortex
#

not the interesting parts

#

everything remotely interesting is pDriverPrivateData undocumented bs

hollow pier
onyx vector
#

I wasn't so pleased to see dozen being introduced either actually

#

letting Microsoft in and playing into their dumb wddm-in-wsl games

dark vortex
#

I wasn't exactly jumping with joy either but I don't know of a real reason to refuse them without being kinda ass-backwards

hollow pier
#

hot take: should've just went d3d on linux

dark vortex
#

tbh they would've done this whether it was going to get upstream or not

#

if we'd have been ass-backwards about it it would've just become a downstream fork

onyx vector
#

yeah pls no d3d in linux

#

don't let the trojan horse in

dark vortex
#

and this way, everyone profits from fixes to nir etc on MS time/dime

hollow pier
#

yeah that's a fair point

onyx vector
#

we have actual royalty-free standards, i find it extremely concerning how some people don't care and would happily embrace a proprietary thing from another platform that we have zero control over

hollow pier
#

anyways back to the og point, I wonder how viable it'd be to just do it the "yeah we do support windows, you can use it but we really dont support that" way ala dxvk

onyx vector
#

for the sake of like, a supposedly nicer C++ API over GL

dark vortex
#

it's a bit of a risky move though

onyx vector
#

on the other other hand

hollow pier
#

why though, doesn't this take care of the risk?

dark vortex
#

the risk is still getting the reputation of "broken driver most of the time"

onyx vector
#

it means mesa can single handedly add whatever they want to vulkan on everything

hollow pier
#

ugh. I see

dark vortex
#

yeah that's fair

#

perhaps devsh will be happy to profit off function call work without having to fix his linux build

midnight shore
#

it's bait for amd to kill off amdvlk and move their resources to radv frognant

dark vortex
#

is that a good thing?

#

it might become one but it doesn't have to be

midnight shore
#

it's "frognant" first and foremost

onyx vector
#

anything that kills llvm amdgpu further is good imo

midnight shore
dark vortex
#

well the issue is that radv philosophy is quite different from the usual amd philosophy and I don't think amd will want to change

midnight shore
#

tho I hope they will care about games rather than viewperf vk equiv. xd

onyx vector
dark vortex
#

yes, and the way they care about the games is exactly where the differences show imo

#

mesa has quite a strong aversion to app profile bs, for example

#

but I doubt amd will want to just stop making app profiles

onyx vector
#

amd contributing works out for radeonsi, or does it ?

dark vortex
#

yes but that's a bit of a different thing because it's basically meaningless for the gaming division

midnight shore
#

well they could do viewperf and other cad things profiles instead right frognant

onyx vector
#

isn't linux overall like that for amd

midnight shore
#

but they don't

onyx vector
#

stadia is dead, the deck ships radv, all distros I know of ship radv, why does amd even bother with amdvlk

dark vortex
#

because amdvlk code is shared between linux and windows

#

they're maintaining it as a windows driver first and foremost I'm pretty sure

onyx vector
#

yeah sure but like, does anyone actually care about amdvlk on linux

#

how does amd justify it

dark vortex
#

¯_(ツ)_/¯

dark vortex
midnight shore
#

to be clear I'm not serious either way

#

it's all "frognant"

dark vortex
#

yeah I don't think it's going to happen either way

dark vortex
midnight shore
onyx vector
#

i have some insight here that I can't share, but my opinion is AMD doesn't really care about gaming so much as about weird niche use-cases that rely on their stack

#

that's why they keep supporting their funny stack

#

even though almost nobody uses it

midnight shore
#

and some meme apps that aren't games want to work on amd drivers and if you run them on radv and something breaks unique to radv they won't care even if it's their bug

onyx vector
#

yeah that sorta stuff

onyx vector
#

radeonsi isn't relevant for gaming but neither is amdvlk (anymore, stadia was a thing ig)

midnight shore
#

the entire Linux thing is clearly meaningless for amd judging by the state of amdgpu frognant

onyx vector
#

Terakan on windows?

dark vortex
#

yes triangel is working on that i think

onyx vector
#

tri👼

#

insane (in a way I'm supportive of)

dark vortex
#

sadly I don't think I can really disagree 🐸

wanton carbon
#

i think maybe a bigger question is if they are interested in consumer graphics anymore at all

wanton carbon
#

@ocean sphinx henlo

ocean sphinx
#

Hello

wanton carbon
#

i think my ideal api would be state delta objects

#

basically hw command buffer fragments you can prerecord/compile

#

proper banger, no issues, 10/10

ocean sphinx
#

Uuuhm, how does that work for stateless hw

#

Like, that would map very well on like amd where pipelines I think actually include bits of command buffers to set state iirc, but some GPUs have a huge structure with all the state

wanton carbon
ocean sphinx
wanton carbon
#

well in that case the state delta is just the state

ocean sphinx
#

But that'd be like everything

#

Uhm

wanton carbon
#

maybe i do not follow

ocean sphinx
#

Like, you can't really "update" state because there is no state, you need to create it from scratch and best you can do is reuse some descriptors that haven't changed

#

So that means you need to statically know the state at every draw

wanton carbon
#

sure

#

in that case the state delta is the full state

ocean sphinx
#

I see

#

And SK your idea is that for each vendor you have different granularity?

wanton carbon
#

basically it degenerates to vkPipes

ocean sphinx
#

Yeah

wanton carbon
#

SK?

#

but basically instead of having an api where vkpipe = make_state_bukkit(...); you have vkstatedelta = make_state_delta(state_from, state_to);

#

you pass in undefined or whateves for from_state to get the vkpipe

ocean sphinx
#

Aaaaah I see

#

And so the driver can decide how much state the delta contains based on how much it is capable of changing dynamically

wanton carbon
#

ye

ocean sphinx
#

That makes a lot of sense cool

wanton carbon
#

imagine you are pinging off batches of draws separated by just copies into the cb

ocean sphinx
#

Yeah I see

#

And I suppose you'd have a null state too

#

So the user would create a null to something state then a delta each time a draw wants something different

#

Though the combinatorics are kind of explosive if you require that the previous state exactly matches the current state

wanton carbon
#

thats the state_from = undefined

#

which means you store all the state

ocean sphinx
#

Yeah but not the whole thing

#

You need to be able to say "this particular thing is undefined"

#

Like, I don't care what the previous alpha test function was just change it to this new one

wanton carbon
#

yeah

#

ah I see - sure

#

yes I suppose you could put that in the delta build
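
The idea discussed above can be sketched in a few lines. This is a hypothetical model, not a proposal for a real Vulkan entry point: states are plain dictionaries, `UNDEFINED` is an illustrative sentinel, and `make_state_delta` only borrows the name used in the chat. Passing `UNDEFINED` as the whole from-state yields the full state (the "degenerates to vkPipes" case), and marking a single field as `UNDEFINED` forces that field into the delta regardless of its previous value.

```python
UNDEFINED = object()  # sentinel: "I don't know or care what this was before"


def make_state_delta(state_from, state_to):
    """Return the fields that must be emitted to get from state_from to state_to."""
    if state_from is UNDEFINED:
        # Nothing is known about the previous state: the delta is the full state.
        return dict(state_to)
    delta = {}
    for key, value in state_to.items():
        previous = state_from.get(key, UNDEFINED)
        if previous is UNDEFINED or previous != value:
            delta[key] = value
    return delta


opaque = {"blend": "opaque", "alpha_func": "less"}
cutout = {"blend": "opaque", "alpha_func": "greater"}

full_state = make_state_delta(UNDEFINED, opaque)   # behaves like a pipeline
small_delta = make_state_delta(opaque, cutout)     # only the alpha test changes
forced = make_state_delta({"blend": "opaque", "alpha_func": UNDEFINED}, opaque)
print(full_state, small_delta, forced)
```

A driver for hardware without incremental state would simply treat every from-state as `UNDEFINED` under this model, so each delta carries everything.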

ocean sphinx
#

Sounds cool, I think one problem with this is that the api tells you nothing about what would be fast and what would be slow though

wanton carbon
#

as opposed to.. any current api? 😄

ocean sphinx
#

Which I guess is where much of the friction with GPL comes from

#

Like, if the dev needs to change that state and they have no option then they have to implement logic to cache things

wanton carbon
#

i think that is orthogonal

#

what you want to have is for the driver to inform you for state bucketing

ocean sphinx
#

Uhm

wanton carbon
#

but i think that would be useful for any pipeline approach

#

@ocean sphinx or do you mean what state is needed to be baked into the shader binary?

ocean sphinx
#

Whereas eg. GPL with fast and optimized linking makes it clear what is static, dynamic and possibly dynamic but slower

#

Also, sometimes having control over it makes sense too. Like the driver may not know whether having separate pipelines is better than having slower but dynamic ones

#

And GPL also gives you control over that

wanton carbon
#

this thing is more about going between the states, but the state itself could be defined like GPL