#:frog_gone: martty's mesa misadventures
but what is it
radv reserves 4096 dwords of cmdbuf space for every draw, always
which means we won't be able to fit a lot of draws without chaining in between each of them
i didn't run dad on radw yet
think it's a worst-case assumption for if every state changes
we should be able to do better
it'd probably be fine code for the amdgpu winsys
actually "reserves" is a bit misleading
it just makes sure 4096 bytes are free, but it might not consume all of them
pretty much a vector::reserve()
ye but that should be fine
well it will still waste space in the case that we don't use the 4096 bytes
will probably waste around 4090 bytes or so per chained ib
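A rough way to see where the "around 4090 per chained ib" estimate comes from is to model the reserve-then-emit pattern directly. This is only a sketch and not radv's code: `simulate_chaining` and its parameters are hypothetical names, the 4-unit chain packet is an assumption, and the model does not care whether the units are bytes or dwords (the chat uses both).

```python
def simulate_chaining(num_draws, ib_size, reserve, used_per_draw, chain_size=4):
    """Toy model: before each draw, require `reserve` free units in the IB.

    If fewer are free, the rest of the IB is skipped and a new IB is chained.
    Returns (number_of_ibs, units_skipped_by_chaining). The tail of the last
    IB is not counted as waste. Assumes used_per_draw <= reserve.
    """
    ib_count = 1
    free = ib_size - chain_size  # keep room for the chain packet at the end
    wasted = 0
    for _ in range(num_draws):
        if free < reserve:
            wasted += free       # space that was reserved for but never written
            ib_count += 1
            free = ib_size - chain_size
        free -= used_per_draw    # the draw only consumes what it really needs
    return ib_count, wasted


# Worst-case reservation of 4096 vs. a tight reservation of 16, 3 units per draw.
big_reserve = simulate_chaining(4011, 0x3F00, 4096, 3)
small_reserve = simulate_chaining(5371, 0x3F00, 16, 3)
```

In this model a 0x3f00-unit IB with a 4096-unit reservation chains after 4010 three-unit draws and skips 4094 units, while a 16-unit reservation fits 5370 draws and skips 14.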
NEW SUBMIT: 1
Submit VA: 30027a000
Submit Length: 64
Chunk 0: 0 - 64(va 30027a000)
PKT0 (offset 0)
nbio230.mmBIF_BX_DEV0_EPF0_VF0_MM_INDEX(1)
PKT0 (offset 8)
<unknown>(181c0000)
these are probably not cbs
ye
the PKT0 with nbio something is just invalid mem I think
ah I think I know what's wrong with chaining
prop sets IB_SIZE field in the INDIRECT_BUFFER cmd, we don't
if this changes anything it's interesting that that works on loonix
hm
oh actually I discovered a perhaps more likely bug
didn't fix it tho
bleh setting proper IB_SIZE bits is going to be annoying
why do the sub IBs start with a dma copying the entire IB to 0x0?
wtf does that mean
whut
perhaps prefetching memes, but I don't have that in my stuff
writing the IB_SIZE does indeed fix the hang
it doesn't fix the test fully tho
PKT3_INDIRECT_BUFFER (offset fbf0)
IB_BASE_LO(1240000)
SWAP(0)
IB_BASE_HI(3)
IB_SIZE(3f00)
IB_VMID(0)
CHAIN(1)
PRE_ENA(1)
CACHE_POLICY(0)
PRE_RESUME(0)
PRIV(0)
Referenced IB (VA 0x 301240000, size fc00):
PKT3_DMA_DATA (offset 0)
ENGINE_SEL(1)
SRC_CACHE_POLICY(0)
DST_SEL(2)
DST_CACHE_POLICY(0)
SRC_SEL(3)
CP_SYNC(0)
SRC_ADDR_LO_OR_DATA(1240000)
SRC_ADDR_HI(3)
DST_ADDR_LO(0)
DST_ADDR_HI(0)
BYTE_COUNT(fc00)
DIS_WC(0)
SAS(0)
DAS(0)
SAIC(0)
DAIC(0)
RAW_WAIT(0)
this is how it looks for me
for all of them basically
PKT3_INDIRECT_BUFFER (opcode 3f, offset fbf0)
IB_BASE_LO(1470000)
SWAP(0)
IB_BASE_HI(3)
IB_SIZE(3f00)
IB_VMID(0)
CHAIN(1)
PRE_ENA(1)
CACHE_POLICY(0)
PRE_RESUME(0)
PRIV(0)
Referenced IB (VA 0x 301470000, size fc00):
PKT3_DRAW_INDEX_AUTO (opcode 2d, offset 0)
INDEX_COUNT(3)
DRAW_INITIATOR(2)
PKT3_DRAW_INDEX_AUTO (opcode 2d, offset c)
INDEX_COUNT(3)
DRAW_INITIATOR(2)
for all of them as well
not completely sure what discord parsed here but it's something
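The numbers in the two INDIRECT_BUFFER dumps above can be cross-checked with a bit of arithmetic. The sketch below only uses values from the dumps; the helper names are made up, and it assumes IB_SIZE is counted in dwords while the "size" and "offset" values are bytes, which is what the figures suggest.

```python
DWORD_BYTES = 4


def ib_va(base_hi, base_lo):
    """Combine IB_BASE_HI and IB_BASE_LO into the 64-bit IB address."""
    return (base_hi << 32) | base_lo


def ib_size_dwords(size_bytes):
    """Convert an IB size in bytes to the dword count IB_SIZE appears to hold."""
    if size_bytes % DWORD_BYTES:
        raise ValueError("IB size must be a multiple of 4 bytes")
    return size_bytes // DWORD_BYTES


def packet_end(offset_bytes, dword_count):
    """Byte offset just past a packet of `dword_count` dwords (header included)."""
    return offset_bytes + dword_count * DWORD_BYTES


va = ib_va(0x3, 0x1240000)          # matches "VA 0x 301240000"
size = ib_size_dwords(0xFC00)       # matches IB_SIZE(3f00)
chain_end = packet_end(0xFBF0, 4)   # header + IB_BASE_LO + IB_BASE_HI + control
next_draw = packet_end(0x0, 3)      # header + INDEX_COUNT + DRAW_INITIATOR
```

If the parent IB is also 0xfc00 bytes, as the repeated values suggest, the chain packet at offset fbf0 occupies exactly its last 16 bytes, and the second DRAW_INDEX_AUTO at offset c follows directly after the first.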
maybe its bc your cbs are in device local, while mine are in gtt
and then prefetching memetics
well I didn't change anything about cb placement so they should be in gtt in both cases
i mean on prop drv
ah ye right
NEW SUBMIT: 1 9
Submit VA: 300048000
Submit Length: 64
Chunk 0: 0 - 64(va 300048000)
PKT0 (offset 0)
nbio230.mmBIF_BX_DEV0_EPF0_VF0_MM_INDEX(1)
PKT0 (offset 8)
<unknown>(181c0000)
this is coming from creategraphicspipes
NEW SUBMIT: 1 8
Submit VA: 300270000
Submit Length: 64
Chunk 0: 0 - 64(va 300270000)
PKT0 (offset 0)
nbio230.mmBIF_BX_DEV0_EPF0_VF0_MM_INDEX(1)
PKT0 (offset 8)
dcn300.mmREFCLK_CGTT_BLK_CTRL_REG(181c0000)
this is from createswapchain
ah could this be transfer queue packets or something like that?
don't think so
transfer queue cmds are just a subset of the pkt3 cmds iirc
but no special encoding or anythin
it's getting a bit late now
think the chaining stuff I have might regress some cts tests but the basic non-chained submission still seems to work, so i might just push it regardless
pushd
tomorrow is the only day of the week where i need to get up early aaa
also i'm going to a concert in the evening and then visiting parents so it's sorta unlikely I'll be able to investigate more until next week
i just liked the i with accent
btw
before you leave
how do you use VK_DRIVER_FILES?
ok i think i get it
flerk, i had some local changes from april that i forgot about
i put them up, not sure if very useful
oh didn’t even notice
not sure how that happened
where? radv-win32’s HEAD is still my commit
np
flerk worked for 150k draws, but killed my system at 1.5M 
oh wait whut it actually werked
non-flerk could do the 1.5M
so thats better 
i'll bring over the non-cs stuff from flerk
and also set up cts again 
ye
the display shits itself slightly during
but they pass
ah no
they are flaky
sadness
interestingly even the secondary one flakes
but that doesn't lose the device
we are not waiting or not flushing something?
did you figure it out? 👀
not yet
i have a theory tho, so lets see
and by lets see i mean once i am done with chores for today, i may or may not have enough energy left to test the theory out
tempting
seems like the driver is materializing some memory out of nowhere 
i believe this might be the key
zo i narrowed a quantum amount
it seems like the hang is related to the lack of swapchain?
@dark vortex did the single tri spam ever hang for you?
or just the cts?
both hang for me
humm
single tri spam doesn't hang for me 🤔
doesn't seem like semas are working tho, tf
some progress on looking at the IB flags with many dad draws
S - is the present sema signalled by the submit or not
P - preamble flag
M - main chunk flag
R - result
S  P  M    R
0  0  0    good
1  0  0    good
0  c  0    good
1  c  0    good
0  d  0    hang
1  d  0    hang
0  d  204  loss or corrupted
1  d  204  corrupted
0  c  204  corrupted
1  c  204  corrupted
seems like sema is not doing anything
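One way to sanity-check the "sema is not doing anything" reading is to put the table above into data and compare each flag combination with and without the semaphore. This is just a sketch over the posted results; it assumes the P and M values are hexadecimal, and the helper names are invented.

```python
# (S, P, M, result) rows copied from the table; P and M assumed to be hex.
ROWS = [
    (0, 0x0, 0x000, "good"),
    (1, 0x0, 0x000, "good"),
    (0, 0xC, 0x000, "good"),
    (1, 0xC, 0x000, "good"),
    (0, 0xD, 0x000, "hang"),
    (1, 0xD, 0x000, "hang"),
    (0, 0xD, 0x204, "loss or corrupted"),
    (1, 0xD, 0x204, "corrupted"),
    (0, 0xC, 0x204, "corrupted"),
    (1, 0xC, 0x204, "corrupted"),
]


def sema_sensitive_flags(rows):
    """Return the (P, M) pairs whose result changes between S=0 and S=1."""
    by_flags = {}
    for s, p, m, result in rows:
        by_flags.setdefault((p, m), {})[s] = result
    return sorted(k for k, v in by_flags.items() if v.get(0) != v.get(1))


def flags_with_result(rows, wanted):
    """Return the set of (P, M) pairs that produced `wanted` at least once."""
    return {(p, m) for _, p, m, result in rows if result == wanted}


changed = sema_sensitive_flags(ROWS)
hangs = flags_with_result(ROWS, "hang")
```

Only the P=d, M=204 case differs between S=0 and S=1, and only between "loss or corrupted" and "corrupted"; the hang shows up exclusively with P=d and M=0.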
i shrimply do not get it
20000 draw submit: aww you're so sweet
many submits, swept from 1000 to 20000 draws: aww you're so sweet
two submits, 10000 and 20000 draws: hello human resources?
@dark vortex i figured this one out ^
noice
waiting for fences is fooked
therefore resetting pool was messing with inflight ibs
that’s no good
ye but at least i have an idea wtf is happening 😄
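The failure mode described here (a fence wait that returns early, then a pool reset while the IB is still in flight) can be shown with a toy model. Nothing below is driver code: `FakeGpu`, `wait_fence` and `two_submits` are hypothetical, and the "GPU" simply reads the recorded list by reference when it finally runs.

```python
class FakeGpu:
    """Toy GPU that executes submitted IBs lazily, reading them by reference."""

    def __init__(self):
        self.pending = []    # submitted IBs that have not been executed yet
        self.executed = []   # snapshots of what the GPU actually read

    def submit(self, ib):
        self.pending.append(ib)  # no copy: the GPU reads the memory later

    def wait_fence(self, wait_works):
        # A working wait blocks until all pending work has executed.
        # A broken wait returns immediately and leaves the work in flight.
        if wait_works:
            for ib in self.pending:
                self.executed.append(list(ib))
            self.pending.clear()


def two_submits(wait_works):
    """Record, submit, wait, reset the pool, record again; return draws seen."""
    gpu = FakeGpu()
    ib = ["draw"] * 2            # first recording lives in pool memory
    gpu.submit(ib)
    gpu.wait_fence(wait_works)   # app believes the first submit is done
    ib.clear()                   # pool reset reuses the same memory
    ib.extend(["draw"] * 3)      # second recording overwrites it
    gpu.submit(ib)
    gpu.wait_fence(True)
    return [len(seen) for seen in gpu.executed]


ok = two_submits(wait_works=True)
broken = two_submits(wait_works=False)
```

With a working wait the model executes 2 and then 3 draws. With the broken wait both submissions read the second recording, which is at least consistent with a single big submit behaving and a pair of differently sized submits not.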
yeah I am really curious about that branch. very interesting and surprising tbh
end of amdvlk nigh? :shocked: :shocked: :shocked:
just what the doctor ordered
What the SHIT
imagine what faith would have accomplished if she had been freed from intel 10 years ago
I mean
if you want to play mr. negative all the fuckin time, you could say "what the shit, now radv will suffer from windows users barging in on the issue tracker about random undebuggable bs"
what is this, moronix comment section?
yeah ok fair, though quite a remote possibility rn I guess
radv on windows is cool, but actually exposing it means we should properly support it too
RADV_I_WILL_NOT_BE_A_DICK_ON_THE_ISSUE_TRACKER=1
and for actual proper support, debuggability and the lack thereof on windows is a huge issue
there, fixed it
regardless of whether people are going to be stupid on the issue tracker actually (linux users can do that too)
I am kinda curious what the endgoal for this is since iirc I remember Faith saying on #nouveau that the windows-y bits in radv (afaik there was some work to make it compile, right?) sometimes made things annoying
maybe she's just taking the piss and this is like me writing a glsl 1.10 backend
possibly it's some prototyping/testing the waters for nvk on windows. I remember she said once that she'd sooner support windows than OpenRM lmfao
would make sense
either way, I'm not sure if we can expect her to debug/solve everything about windows issues
I guess because WDDM is actually documented?
not the interesting parts
everything remotely interesting is pDriverPrivateData undocumented bs
there's also a leverage issue where supporting openrm gives nvidia some leverage that could backfire
I wasn't so pleased to see dozen being introduced either actually
letting Microsoft in and playing into their dumb wddm-in-wsl games
I wasn't exactly jumping with joy either but I don't know of a real reason to refuse them without being kinda ass-backwards
hot take: should've just gone d3d on linux
tbh they would've done this whether it was going to get upstream or not
if we'd have been ass-backwards about it it would've just become a downstream fork
and this way, everyone profits from fixes to nir etc on MS time/dime
yeah that's a fair point
we have actual royalty-free standards, i find it extremely concerning how some people don't care and would happily embrace a proprietary thing from another platform that we have zero control over
anyways back to the og point, I wonder how viable it'd be to just do it the "yeah we do support windows, you can use it but we really dont support that" way ala dxvk
for the sake of like, a supposedly nicer C++ API over GL
unless we get proper debugging tools this is going to be the state of it yes
it's a bit of a risky move though
on the other other hand
why though, doesn't this take care of the risk?
the risk is still getting the reputation of "broken driver most of the time"
it means mesa can single handedly add whatever they want to vulkan on everything
ugh. I see
yeah that's fair
perhaps devsh will be happy to profit off function call work without having to fix his linux build
it's bait for amd to kill off amdvlk and move their resources to radv 
it's "
" first and foremost
anything that kills llvm amdgpu further is good imo
why not if it gets radeonsi treatment
well the issue is that radv philosophy is quite different from the usual amd philosophy and I don't think amd will want to change
tho I hope they will care about games rather than viewperf vk equiv. xd
can you clarify what that means
yes, and the way they care about the games is exactly where the differences show imo
mesa has quite a strong aversion to app profile bs, for example
but I doubt amd will want to just stop making app profiles
amd contributing works out for radeonsi, or does it ?
yes but that's a bit of a different thing because it's basically meaningless for the gaming division
well they could do viewperf and other cad things profiles instead right 
isn't linux overall like that for amd
but they don't
stadia is dead, the deck ships radv, all distros I know of ship radv, why does amd even bother with amdvlk
because amdvlk code is shared between linux and windows
they're maintaining it as a windows driver first and foremost I'm pretty sure
yeah sure but like, does anyone actually care about amdvlk on linux
how does amd justify it
¯_(ツ)_/¯
hm
yeah I don't think it's going to happen either way
actually i think it might just be official support™️
iiuc the pal part, which contains all the juicy bits, is also used by their prop old api drivers
i have some insight here that I can't share, but my opinion is AMD doesn't really care about gaming so much as about weird niche use-cases that rely on their stack
that's why they keep supporting their funny stack
even though almost nobody uses it
and some meme apps that aren't games want to work on amd drivers and if you run them on radv and something breaks unique to radv they won't care even if it's their bug
yeah that sorta stuff
i think it's all meaningless for AMD from a gaming persp
radeonsi isn't relevant for gaming but neither is amdvlk (anymore, stadia was a thing ig)
the entire Linux thing is clearly meaningless for amd judging by the state of amdgpu 
yes triangel is working on that i think
🐸
sadly I don't think I can really disagree 🐸
i think maybe a bigger question is if they are interested in consumer graphics anymore at all
@ocean sphinx henlo
Hello
i think my ideal api would be state delta objects
basically hw command buffer fragments you can prerecord/compile
proper banger, no issues, 10/10
Uuuhm, how does that work for stateless hw
Like, that would map very well on like amd where pipelines I think actually include bits of command buffers to set state iirc, but some GPUs have a huge structure with all the state
you mean the pipeline is just big bag of state? or wdym
Yeah, like for mali (and many other tilers) the draw points to a struct that points to a bunch of structs that points to a bunch of structs... With all the state
well in that case the state delta is just the state
maybe i do not follow
Like, you can't really "update" state because there is no state, you need to create it from scratch and best you can do is reuse some descriptors that haven't changed
So that means you need to statically know the state at every draw
basically it degenerates to vkPipes
Yeah
SK?
but basically instead of having an api where vkpipe = make_state_bukkit(...); you have vkstatedelta = make_state_delta(state_from, state_to);
you pass in undefined or whateves for from_state to get the vkpipe
Aaaaah I see
And so the driver can decide how much state the delta contains based on how much it is capable of changing dynamically
ye
That makes a lot of sense cool
imagine you are pinging off batches of draws separated by just copies into the cb
Yeah I see
And I suppose you'd have a null state too
So the user would create a null to something state then a delta each time a draw wants something different
Though the combinatorics are kind of explosive if you require that the previous state exactly matches the current state
Yeah but not the whole thing
You need to be able to say "this particular thing is undefined"
Like, I don't care what the previous alpha test function was just change it to this new one
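The state delta idea as discussed so far could be sketched roughly like this. It is an illustration only: `make_state_delta` is the name used in the chat, but its behaviour here, the `UNDEFINED` marker, `apply_delta` and the state keys are all assumptions, and a real driver would produce command buffer fragments rather than dicts.

```python
UNDEFINED = object()  # "I don't care what the previous value was"


def make_state_delta(state_from, state_to):
    """Return only the settings that must be emitted to reach `state_to`.

    A key that is missing from `state_from`, or explicitly UNDEFINED there,
    is always emitted, so an empty `state_from` yields the full state
    (the plain-pipeline case).
    """
    delta = {}
    for key, new_value in state_to.items():
        old_value = state_from.get(key, UNDEFINED)
        if old_value is UNDEFINED or old_value != new_value:
            delta[key] = new_value
    return delta


def apply_delta(current, delta):
    """Return the state after replaying a delta on top of `current`."""
    result = dict(current)
    result.update(delta)
    return result


opaque = {"blend": "off", "cull": "back", "alpha_func": "always"}
cutout = {"blend": "off", "cull": "back", "alpha_func": "greater"}

full_pipe = make_state_delta({}, opaque)        # null -> something
small_delta = make_state_delta(opaque, cutout)  # only alpha_func changes
dont_care = make_state_delta({"blend": "off", "cull": "back",
                              "alpha_func": UNDEFINED}, cutout)
```

Marking a field as undefined in the from-state is what keeps the combinatorics down: one delta can then be reused after any previous alpha function, at the cost of always re-emitting that field.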
Sounds cool, I think one problem with this is that the api tells you nothing about what would be fast and what would be slow though
as opposed to.. any current api? 😄
Which I guess is where much of the friction with GPL comes from
Like, if the dev needs to change that state and they have no option then they have to implement logic to cache things
i think that is orthogonal
what you want to have is for the driver to inform you for state bucketing
I mean, GPL does encode a lot of hw restrictions on what can be dynamic
Uhm
but i think that would be useful for any pipeline approach
@ocean sphinx or do you mean what state is needed to be baked into the shader binary?
Yeah so, some state is registers you can just change, some is stuff that needs to be hardcoded, some can be dynamic by adding logic to the shader but has a cost at draw time. Your architecture allows the driver to implicitly pick among those things as needed but does nothing to communicate what is happening to the programmer
Whereas eg. GPL with fast and optimized linking makes it clear what is static, dynamic and possibly dynamic but slower
Also, sometimes having control over it makes sense too. Like the driver may not know whether having separate pipelines is better than having slower but dynamic ones
And GPL also gives you control over that
this thing is more about going between the states, but the state itself could be defined like GPL
