#:frog_gone: martty's mesa misadventures
i am fairly confident it is not
i think with the wave state manip working it might be possible to unhalt s_halt-ed waves
but there was some footnote in rocm that that puts the queue into some error state
i don't know where that error state is
@raven vortex moving it here not to disrupt pixelduck's thread
i am dumping the CS of the windows driver
i just mutilated umr until it worked 🙂
it can work on... windows?
well parsing a pm4 stream from a file, yes
oh my god stop before someone files issues about radv on windows not working
oh dear
to be clear to any future windows user i will not waste my time booting up windows to debug shit
unrelated but I wonder if the windows driver supports GWS (I think not?)
lol
I'd guess it doesn't because the only user on linux is hip stuff which doesn't work on windooze
i have no knowledge of the amd kmd on windows
but i'd hazard that indeed it doesn't have any code to make it work
still want me to use GWS in radv? :frog_demon:
that's ambitious
tf windoze
I can't believe Windows would do this
windows has some weird ass sync primitives if you dig for it
@dark vortex apologies sire, did you have a windows build guide for mesa or am i misremembering
I have one for lavapipe
I never found a way that didn't involve building llvm sadly
holy fuck that was only a year ago
my perception of time is absolutely trashed
acceleration structures accelerate time
windows build kinda scuffed :unsurprised-electric-beast:
ye boii
was better than expected tbqh
@dark vortex nb i didn't need to build llvm
i'm gucci 🙂
heh
ye for radv that works ig
i imagine you need llvm for lavapipe tho 
for llvm-based software rasterizers building without llvm isn’t that optimal
but there is deterministic worst case performance and memory requirements
ssshh
i don't wanna attract him
also the driver is not being found
THOMAS
INFO | DRIVER: Located json file "C:\mesa\share\vulkan\icd.d\radeon_icd.x86_64.json" from registry "HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\Drivers"
INFO | DRIVER: Driver C:\mesa\share\vulkan\icd.d\radeon_icd.x86_64.json is not recognized as a known driver. It will be assumed to be active
DRIVER: Found ICD manifest file C:\WINDOWS\System32\DriverStore\FileRepository\u0389188.inf_amd64_cd9701bcd4981eb7\B389045\amd-vulkan64.json, version "1.0.0"
DEBUG: Searching for ICD drivers named .\amdvlk64.dll
DRIVER: Found ICD manifest file C:\mesa\share\vulkan\icd.d\radeon_icd.x86_64.json, version "1.0.0"
DEBUG: Searching for ICD drivers named C:\mesa\bin\vulkan_radeon.dll
well is vulkan_radeon.dll at the correct location?
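the loader log above boils down to "read the manifest, take `ICD.library_path`, try to load that file". a minimal sketch of that first check in python, assuming the standard ICD manifest layout (`ICD` object with a `library_path` string); `resolve_icd_library` is a made-up helper name, not something from mesa or the loader:

```python
import json
from pathlib import Path


def resolve_icd_library(manifest_path):
    """Return the driver library path named by a Vulkan ICD manifest.

    Returns None when the manifest only gives a bare file name, because in
    that case the OS library search decides where the file comes from.
    """
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    raw = manifest["ICD"]["library_path"]
    library = Path(raw)
    if library.is_absolute():
        return library
    # A relative path with a directory separator is resolved against the
    # directory that contains the manifest itself.
    if "/" in raw or "\\" in raw:
        return manifest_path.parent / library
    return None
```

calling `resolve_icd_library(...)` on the radeon manifest and then `.exists()` on the result answers the "is the dll where the json says" question. it does not catch a dll that exists but fails to load because one of its own dependencies is missing.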
oh wait can you try running vkvia
I remember this
or more accurately, the ptsd is back
vkvia might report some random windows error code when you try to load the DLL
I’m not sure exactly what field it was but there was some metadata inside the DLL that was missing and memedows returned the error code then
Whats up with drivers and three letter acronyms
my memory of the 2022 dll loading incident is sparse, the brain tends to suppress memories of traumatic events
i'm getting this
lets see
ok, i figured it out
for some reason radv is compiling a second dll, z.dll
that does not get found and then module loading fails
any meson whisperers in chat
that does not help because it tries to static link all manner garbanzo from windows (which can't be done)
but
dep_zlib = dependency('zlib', version : '>= 1.2.3',
fallback : ['zlib', 'zlib_dep'],
required : get_option('zlib'), static : true)
this worky
gah
vkvia still reports the same
might be the metadata issue now?
any recollection of what that was?
nice
the metadata req was vkvia-only then
so not sure how much applies
if all else fails, dot’s suggestion about windows loader snaps might help 🐸
I remember needing to use windbg and foraging through a lot of logspam tho
fug
vulkaninfo doesn't work either
so i guess i have to solve this
no pastduck meson files hanging around that solve this?
no
iirc I didn’t modify the meson files at all
no idea if anything changed (and if so, what)
haven’t seriously used windows for almost a year I think
:cat_pain:
you did kind of ask for it
that
adding version
i think its a matter of making an rc file to compile with the dll
welp
btw how do you intend to even make a windows winsys
given the kmd api is proprietary and unstable?
it not be
wut
windows has the wddm, which comes with a libdrm equivalent thing
which is a stable interface against the kmd
you don't talk to the kmd, just the same as linux
its in fact more encapsulated
yeah I did know about wddm
well you do talk more or less directly to kmd in linux after the initial bit
but in the wddm model you don't
didn’t know it had a libdrm thing that allows for custom submit etc
wdym?
well to submit your stuff you need to have a way to give the kmd a pointer to a cmdbuf that contains the pkt3 at least
and probably other vendor specific stuff as well?
yeah thats basically the api it has
give a gpuva to the "commands"
if you look at radv, it doesn't talk much to the kmd
seems to be the same on windows
should actually be
and so here we are
I’m still not sure how good an idea this is but you did manage to capture my interest
you mean
in a feasibility way or in a "good idea" way
as in opening pandora's box
idk it feels kinda weird, I guess you could have a pandora’s box moment
but rationally thinking about it I doubt that needs to be a concern rn
do you have sth specific in mind
no
alrighty
well yes actually
[RADV]screen freeze and game crash
unplayable fix immediately
os windows 11
gpu some old ass radeon or smth
<end of issue report>
yeyeye
not that it’s a likely thing to occur
on the other hand testing against amd prop
if there are multiple icds, does the loader pick the first?
i'll switch to asking thomas
he loves this shit 
well wdym picking the first
well, how does it work when you have both radv and amdvlk
do you have to disable one?
no at least not on Linux
you get different phys devices for each one
(sauce: i have all 3 drivers on my desktop)
it should
yeah
wait
probably not gonna be doing stuff before i make stuff with the winsys now is it 😄
null winsys doin its best 
radv won’t really report physical devices unless you RADV_FORCE_FAMILY it I think
i guess thats a nice self contained place to start
you might have to do some phys device detection plumbing
i actually had a moment when i thought that windoze requires driver signing now
but that is only for kernel mode ones
thanks for the help, i imagine the next bit will take me a while
I’ll go sleep and debug hangs more tomorrow, let’s see who’s done first 🐸
enjoy! gn
apparently only vkvia is so picky
for just loading a driver the version is not needed
@dark vortex can I ask for a pair of eyeholes
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/vulkan/radv_device.c#L999 creates some meta shaders
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/vulkan/radv_device.c#L1034 creates the fallback cache
isn't this the wrong way around
radv pipeline caching is a bit weird I think, afaik meta shaders are cached a bit separately
don’t ask me how tho 🐸
well it was crashing for me until i moved the fallback cache creation before the meat shaders
well i mean, it is still crashing, but somewhere else now 😄
ye ok I see
the meta shaders are separately cached, which are stubbed out on win32
so it falls back to the global fallback, which is not created on time
all good projects lead to that gif
win32 fence: implemented
tune in next time to see it blow up!
ngl the common vk code is pretty bonkerific in mesa
owo what is this
D3D
sus
it's got a null terminator
source: trust me bro
ok but zig unironically has a generalized concept of "pointer with sentinel terminator" which I find to be very funny because no one does that except with C strings
TDR is achieve
its great how many names they have misspelled in this single api
hmmmmm, it seems like the device_coherent memories don't differ in creation from the incoherent ones 🤔 how does this work
wym
_D3DKMT_QUERYSTATSTICS_REFERENCE_DMA_BUFFER
statstics
mfw they took statstics in college
by the power of baba and booey I have done it
magnificent driver capable of running a single cmdcopybuffer
we will not enter the fact that i wasted hours on accidentally overwriting part of a CS with nop padding into the annals
tf
hololens
el mao
enum wraps to 32 bits on msvc C
what a classic classic bug
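for anyone who hasn't hit this one: a rough model of the truncation, assuming the bug was a constant wider than 32 bits declared as an enumerator (the log doesn't say which constant it was). MSVC in C mode stores enums as `int`, so the high bits just go away. `as_msvc_enum` and the flag names are hypothetical:

```python
import ctypes


def as_msvc_enum(value):
    """Model storing `value` in a 32-bit signed int (an enum under MSVC C)."""
    # ctypes integer types do no overflow checking, they just keep the low bits.
    return ctypes.c_int32(value).value


FLAG_BIT_33 = 1 << 33          # hypothetical 64-bit flag, all set bits above bit 31
FLAG_BIT_31 = 0x80000000       # hypothetical flag that lands on the sign bit

truncated_high = as_msvc_enum(FLAG_BIT_33)   # the flag silently becomes 0
truncated_sign = as_msvc_enum(FLAG_BIT_31)   # the flag turns negative
```

the usual way around it is a `#define` or a `static const uint64_t` instead of an enumerator.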
@onyx vector moment
forgot to put type, still compiled
C is the best
they see me unrollin'
they haltin'
visitin' and tryna catch me typing dirty
tryna catch me typing dirty
will i get immediate ban
lol
also i now know what acquirenextimage and queuepresent do and.. i didn't want to know
i somehow thought these happen on the gpu 🐸
clearly no mental model whatsoever
does rt work tho
i don't see why it wouldn't 😛
in fact Tear can finally into "hw"RT
well i mean when this is usable
#855613452118130729 message inb4 windows deck
🐸
noice
alright, lets get proper gpu side sync up (fortunately all the primitives seem to map to the same thing on the CP, so it shouldn't be that hard) + timeline semas
then i can try running vuk examples
if that works somewhat then CTS
i wonder why wouldn't I just implement binary semas and fences on top of timeline semas 😈
@midnight shore can meson tell me what command line was used to make a build?
do you mean you want to see list of commands during build (i.e. verbose mode) or..?
i mean i have a dir i have configured
can i ask meson what commandline i used to make it
ah there is
I don't know any options like that and there's nothing in build directory in meson-info or whatever that could be easily translated back to your command
oh what is it
meson-log.txt has
Build started at 2023-04-08T17:59:15.413712
Main binary: C:\Program Files\Meson\meson.exe
Build Options: -Dgallium-drivers= -Dvulkan-drivers=amd -Dbuildtype=debug -Dllvm=disabled '-Dprefix=C:\mesa' -Dlibdir=lib64
Python system: Windows
cool
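if you want that back as an argument list, a small sketch that pulls the `Build Options:` line out of `meson-log.txt`. it assumes the line is quoted shell-style like in the paste above; an unquoted path with backslashes would get mangled by `shlex`. `build_options_from_log` is just a name for this sketch:

```python
import shlex


def build_options_from_log(log_text):
    """Return the option list from the first 'Build Options:' line, or None."""
    prefix = "Build Options:"
    for line in log_text.splitlines():
        if line.startswith(prefix):
            # shlex keeps backslashes inside single quotes literal,
            # so '-Dprefix=C:\mesa' survives intact.
            return shlex.split(line[len(prefix):])
    return None


sample = r"""Build started at 2023-04-08T17:59:15.413712
Main binary: C:\Program Files\Meson\meson.exe
Build Options: -Dgallium-drivers= -Dvulkan-drivers=amd -Dbuildtype=debug -Dllvm=disabled '-Dprefix=C:\mesa' -Dlibdir=lib64
Python system: Windows
"""
options = build_options_from_log(sample)
```

`options` can then be pasted after `meson setup <builddir>` to configure a second directory the same way.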
heheh, this is why the speed is nonexistent
ahh probably because its reading from WC memory 
ah(h)
did that clear it up or
bas has two articles on it
In this article I show how reading from VRAM can be a catastrophe for game performance and why.
ye I know what wc memory is
and in fact I even read that article
it's just that I've never seen it abbreviated that way
like ESO
must've been jaker 1 who did
and apparently my compiler's ability to unroll abbreviations is limited
yeah, currently i am using intel's implementation of a software blit for presentation
but uhh, thats catastrophically slow if the image is not in host_cached
CTS time
rt cts when 
you tell me mr pixel
well they’re there and they should work (mostly)(on loonix that is, you tell me how it is on windows)
lets see soon
for now cpu cores going brr
btw the cts will def fail as i have not implemented the compoot queue
and i have already observed some sparkling corruption when using msaa 
yus sparse binding is optional
had to disable all the device group tests, I think they are broken
why is specifically this test failing to allocate memory?
ah right, thats the thing i put off implementing
most conformant driver
All=229684 Crash=1 Fail=56754 NotSupported=160895 Pass=12032 QualityWarning=2
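quick sanity check on that summary line: a sketch that parses the `Name=count` pairs and looks at the pass rate among tests that actually ran (i.e. leaving out NotSupported). nothing here comes from the CTS tooling itself, it only splits the line as pasted:

```python
def parse_cts_summary(line):
    """Turn 'All=1 Pass=1 ...' into a dict of integer counts."""
    counts = {}
    for token in line.split():
        name, _, value = token.partition("=")
        counts[name] = int(value)
    return counts


summary = parse_cts_summary(
    "All=229684 Crash=1 Fail=56754 NotSupported=160895 Pass=12032 QualityWarning=2"
)
# Tests that were actually executed on the device.
executed = summary["All"] - summary["NotSupported"]
pass_rate = summary["Pass"] / executed
```

the per-status counts add up to `All`, and the pass rate over executed tests comes out a bit over 17%.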
managed to render a cut down version of the xml
jebus how do people do this
probably gonna start running smaller bits
christ almighty
--gtest_filter=...
where do i pass that?
mfw i ran cts for a good hour yesterday after losing the device around minute 5
metal 🤘
what's this
windbg
:frog_gone: martty's mesa misadventures
what does frog_gone mean in this context
how though, isn't the kernel interface entirely alien
and why
and who's paying for it
you
idk why are you running bda on an ancient system
but there are some practical benefits if it starts working nicely
it's just what shady uses
the interface to it is designed around passing data into the shader as-if it was some kind of vararg fn call
ideally culminating at some point with unified addressing/memory, but we're not there yet
you misunderstood my comment
so the way I pass arguments to my kernels is I just yeet them into push constants (with spilling into an UBO mayhaps later)
indeed I did
but it seems like radv on windows is significantly more effort
i didn't think AMD would provide the relevant documentation, but I guess they do
but there is a very localized part of radv that talks to the kernel, so only that bit needs to be written again
what's your patreon
https://shorturl.at/ghivN this is my wallet link
don't be a sussy and click it
good wallet link 
so how is radvindows going along
can't wait to tell r2d2 in the vk server that there is oss amd drivers for windows
oh no
you're lucky it was a joke
i haven't worked on it recently, it seems like large CSs lose the device
so i wanted to see what the submission looks like from amdvlk
gfx11
ah yes, the hw i definitely have
this wouldn't require a trap handler at all
hm
maybe not, you just gotta get the waves to stop once
maybe raise an exception in some other way or something
the point of the trap handler is to support breakpoints
with the amount of concurrency a gpu has i think the trap handler makes a lot of sense
you don't want the host for selection between threads, there are too many
fair yeah
the only thing needed is generous grant of time
ayuuh
an off-topic thing, but i started writing a tut
this is the first half
if someone takes a read and sees how it works, let me know
Grand Prone Ursine
Called Locutus
I cackled
Good read, a bit familiar since I am knowledgeable about this topic from birth but
Top and bop 
Oh that wasn't supposed to be a joke 
hanson is sponsoring the post
Anyways, nice post so far 
@midnight shore here
at first glance, I really like the diagrams, they really convey what happens, but the analogies are uh
delicious
Within a single drawcall, each stage can also finish out-of-order (remember that passing over a work item doesn't mean it is done).
However there is a guarantee that dependent work is correctly ordered: the GPU will not execute a fragment shader which depends on a vertex shader that has not yet been completed.
huh?
I guess you mean that stages within a draw can overlap in time
but that's not what comes to my mind when I read the bit I quoted
ok well
yeah nevermind
perhaps you could try not using "finish out of order" in any context at all
like just say that draw calls that were started execute concurrently or something
and that stages within a single draw call also execute concurrently
yes
there is a bit of fun in there, i think its not super distracting
yeah I didn't read it at normal pace so maybe it's fine
i can tone it up or down ofc
well I said "uh" because I wasn't sure about it
I'm not saying it's over the top
I'll re-read at normal pace once shrotly
We now ramp up the difficulty. We introduce a new draw (#5), but this time, we will have a logical depency between between the vertex shader of this draw and the fragment shader of draw #4.
wy us lot lttrs wh fw lttrs do trk
looks good overall I think (I have not yet read at normal pace)
what I think you could experiment with
if you are willing to spend time and/or force someone else to spend time for you
there are lots of diagrams that change relatively little from one to another
so I'd explore replacing them with an animated svg/css/js/whatever thing
and perhaps instead of queue being just a line with things branching off it, you could also say "vkCmdDraw" at each branch point, changing color of each one as that draw is started
and perhaps transpose the entire diagram (so queue flows top to bottom, and stages run left to right) but that's a very minor thing
I'd also try to use greater distance between hues in your stage palette
because vs and fs look kinda close
in terms of color
specifically when you look at barrier/feeder control rules
are you volunteering 😳
no, I don't actually know how to make animated svgs or such
anyways, good feedback, thanks
but I've seen some on wikipedia
but do you mean just animation between successive states
not a single big video
cause i can see that being nice
yeah
so that you could explore transitions more easily
I think the way you depict diagrams is really good, it's really not overwhelming the reader with details
actually
actually probably ignore this
as it is now, flowing from left to right is good
and to fit labels like vkCmdDraw on top you could just rotate them 60 degrees or smth
yeah that sounds neat
thanks! thats great to hear
still great
they kinda go hand in hand
i think nano is just a fan of monsieur GPU
dango stands for
D
A
N
G
O
maybe you could somehow make stages flow to the right too, hmm...
that is intentional i guess
important to not ascribe too much temporality to concurrent things
yeah
it's at a point where if you add more detail it just becomes cluttered and overwhelming
yea nevermind
I guess dango stands for "DANG, nObody kekwd"
also ye I think the diagrams are really readable rn
also imo you should explicitly spell out the technical names for each analogy name you introduce
e.g. feeder doesn't have a technical googlable term in the article (I guess you could use "command processor"?)
yeah well I guess as it is, it's fine too
the main queue one is I guess on AMD
but it is definitely something that should be elaborated
how to map the mental model to API and GPU things
but haven't written that yet
unfortunately i only have the boring cache model to explain memory deps
I'm sure you can come up with a convoluted story about a pirate and a wizard or something
yes, but i don't want to add it if it doesn't help
Aye
Only thing it would help with is the dwindling attention span of the internet's population
excellent
i wonder if amd pulling the plug on polaris and vega spells radw
do it 
actually I just installed windows too 👀
perhaps one day where I have too much free time I shall Take A Look™️
ye
since i totally haven't just nerdsniped myself how do you figure out the native kmd interface thingy
well you write a vulkan program
then you trace the calls
it calls into the kernel via the d3dkmt interface
then you can dump the arguments
what you do for call tracing would've been my question here
windbg
but i kinda wanted to make sth nicer on that front
i did the dumping just using vs's debugger
but thats not ergonomic
ic
i've actually become really rusty in doing schtuffs in windows I notice
(or maybe I wasn't particularly great in debugging-fu ever before I switched to linux, that's also an option)
btw does that spell "i will probably go back to it soon" or "i wish I could go back to it soon but it ain't happenin"
yea

but let me know if you wanna hacc on it together or sumfin
😳
need to set up all the windows toolchain stuffs but I might get to that later this evening
ye
^
ok nice
oh best part is
you also have the 6700xt
therefore 100% coverage with a single effort
ok gonna start setting things up
(otherwise I’d need to debug rgp segfaulting on a random capture and I don’t really want to do that right now)
i may or may not have fallen off a few tangents
story of me life
(i am stuck in a 640x480 windows update screen badly upscaled to 4k)
i am getting an aneurysm from reading #engine-dev
splendid
hnnnng forgot glslangValidator
though installing vulkan sdk will be benefishial in general
after that I should have it I think
sweet
i installed python 3.12, which removed distutils, which meson still uses
what the actual fuck
i installed python 3.11 now but when I execute python in cmd it launches a windows store instance with the python page
yeah nice, that worked
ok, meson setup went through this time
ah, they messed up the dxheaders dep, splendid
well "messed up" lol
it needs massaging to build radv on main
```
========== Build: 77 succeeded, 0 failed, 6 up-to-date, 4 skipped ==========
========== Build started at 20:24 and took 16,528 seconds ==========
```
splendid
like a brick shithouse
next is making vulkaninfo pick it up
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\Drivers
put a key in here
at least thats what I did
oh i just set VK_DRIVER_FILES
that worked for me
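for scripting the env var route instead of the registry key, a tiny sketch. `VK_DRIVER_FILES` is the loader's variable for an explicit manifest list (older loaders call it `VK_ICD_FILENAMES`); `env_with_driver` is a hypothetical helper and the manifest path is the one from the loader log earlier:

```python
import os


def env_with_driver(manifest_path, base_env=None):
    """Copy an environment and point the Vulkan loader at one ICD manifest."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set, the loader uses only the listed manifests instead of
    # its normal driver search.
    env["VK_DRIVER_FILES"] = str(manifest_path)
    return env


radv_env = env_with_driver(
    r"C:\mesa\share\vulkan\icd.d\radeon_icd.x86_64.json", base_env={}
)
```

pass the result as `env=` to `subprocess.run(["vulkaninfo"], ...)` to run a tool against just that driver. note this hides the prop driver for that process, so it won't help when you want both side by side.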
it doesn't seem to find any devices tho
wdym?
well vulkaninfo outputs nothing
hmm maybe it crashes, isn't it supposed to print an error
with my branch?
yes
trying to figure out how I can get the vs debugger to launch random exes instead of random project names
you can just drag the exe onto the vs window
ah, it needs device->ws->query_value implemented
stub incoming
btw how do we want to do collaboration stuff (as in merging changes etc)
i do merge requests to your branch?
i can give you push rights?
we just pinky promise not to force push until its time to clean up
also works for me
inviteth
fixin' for a fix
vkcube do be runnin too
on the radw?
yes
heh
so where i got to was
cts has a big submit that just loses the device
also I think some things don't really work
like MSAA
hm msaa is a weird one
but i am happy that they didn't muck up the kmd interface
don't think you'd depend on lots of kernel stuff there
so
there is some part of amdgpu that i kind of just copied out
might need a second look if i got it right
if i didn't it might explain things
which test is this?
might be worth looking into d3dkmt interface dumping on the prop drivers with that test
do you have some pointers on how to do the do with dumping call params with windbg
ye, it would be worth
been a long time since I used the thing
not sure tbh
let me attempt to rake my brain
i used windbg to trace, not for the dump
but i imagine i would do the following
which is what i do by hand
bp at win32u!...D3DKMT call... -> when bp is hit, put bp on a frame above on the call -> RCX is the pointer to the struct taken by the call
do you recall where the list is?
wdym the list? list of all tests?
dEQP-VK.api.command_buffers.record_many_draws
found it
this has some variants
not sure which one was the deadly
bleh
did you also encounter
CMake Error at C:/Program Files/CMake/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Python3 (missing: Python3_EXECUTABLE Interpreter)
Reason given by package:
Interpreter: Cannot use the interpreter "C:/Users/fried/.pyenv/pyenv-win/shims/python"
also i love how windows gives me "fried" as username, thank you windows for letting me choose this would definitely have been my choice
wtf
wtf indeed
especially if you consider this earlier message
Found PythonInterp: C:/Users/fried/.pyenv/pyenv-win/shims/python3 (Required is at least version "3")
great
there's about 5 reasons this message can appear
there is no way to distinguish between any of these 5 reasons
splendid
ah I think it just straight up cannot handle softlinks
i'll just try the stupid windows store thing
oh my fucking god my parents got a notification because I downloaded something from the windows store
jesus fucking christ
why does your install have parental controls lmao
I thought you freshly installed windows or something
anyways, I mustn't waste your precious time
yes but through association with my account (because fresh installs aren't possible without signing in anymore, congratulations for choosing win11) the parental controls demons have been awoken from their slumber once more
ebic
more cts build failures

oh stale cmake cache
what will happen sooner, cts build finishes or parental controls cut off my access or my dad disables the parental controls
funny, didn't want to waste ssd space with windows
now spinning rust is the bottleneck on builds
rust ruins the day yet again
the answer is cts build
dEQP-VK.api.command_buffers.record_many_draws_primary_2 repros the issue
hmmm
trying to break all d3dkmt functions only yields some in gdi32
namely D3DKMTOpenAdapterFromHdc and D3DKMTOpenAdapterFromGdiDisplayName
its not called that
hold on
they look like this
d3dkmt is the userland name, but this one is the kernel one or something
win32u!NtGdiDdDDISubmitCommand is the submit for ex
ah yes
those are a few more
hackerman voice I'm in
right as the music beatdrops hard, this is peak programming
g
ops
this is not the debugger window
i haven't begun to peak myself
you said the parameter structs are the same between the D3DKMT and the NtGdiDdDDI functions?
oh right one frame above
ye
just trying to find my way around the very first QueryAdapterInfo call
it's the same value in both frames so /shrug
hmm?
so what i found that worked was that i moved the bp into the caller, on the call
this way rcx is not clobbered
it may be that you can find it in the ntgdi* frame too, but i didn't look enough
not sure what you mean, I just looked at .frame /r 1 to determine the registers of the caller frame, and shrimply r to determine current frame regs
both turn out to the same value, rcx=00000009ca4fdc98
might be a special case tho
but also
so the rcx is not preserved from the calling frame
regs are not like variables when using the debugger
I know
i thought you'd know 😛
hence I'm still a bit confused what switching frames actually achieves here
so thats not what i did
i put a breakpoint on the caller
which will be hit next time
with a good rcx
aaah I see
so in the backtrace I choose the RetAddr of frame 0 or of frame 1 
as in does the RetAddr of frame 0 belong to the next instruction in frame 1 (in which case I'd probably want to break on the one of frame 0)
uhh
(it seems that yes it does)
good
what is going on here
hmm actually
I probably don't want to break at the return address of the NtGdi function
that gives me absolutely nothing because rcx has already been clobbered (since that is right after the NtGdi function executes)
me is very confuse
it is definitely not late or anything
me going clobbin' with the boys
so, when you say you break at the caller, you probably want to break before the function that ends up calling NtGdi* is called, right
(the call conv does not say to preserve rcx or does it)
uuuh do you remember what that was in windbg commands
i mean its a syscall kinda thang
just go one frame up and look for the call? break there?
righto
sorry, didn't do this bit with windbg
what did you do it with, vs debugger?
ye
hm
still not sure if I got the right data, looks weird
am now right at the start of the function that calls NtGdi* stuff
none of the calls in my stack (up to NtGdiDdDDISubmitCommand) look like that
perhaps compiler switcherood some things
ah ye I think I see it
so I'm supposed to break on exactly that call?
oh well
hmm, other idea
since we can call D3DKMTSubmitCommand ourselves we should be able to obtain the location of that as a module offset (which I presume is some random place in their proprietary kmd shim stuff if I understand the driver stack correctly?)
we can break on that too, and then we should always have the actual D3DKMT struct without callstack traversal magic
this is how it looks like in my driver atm
this is just one up from the NtGdiDdDDISubmitCommand
and i checked now that rcx is the correct ptr
one up from NtGdiDdDDISubmitCommand is
00007ffe`56885be7 498b86d0000000 mov rax,qword ptr [r14+0D0h]
00007ffe`56885bee 4c8ba890d30000 mov r13,qword ptr [rax+0D390h]
00007ffe`56885bf5 4d8ba5a8090000 mov r12,qword ptr [r13+9A8h]
00007ffe`56885bfc 498b842458220000 mov rax,qword ptr [r12+2258h]
00007ffe`56885c04 488d8d20010000 lea rcx,[rbp+120h]
00007ffe`56885c0b ffd0 call rax
yeah i just noticed actually
i think the one i have makes a lot of sense too
the rcx I had before that was from a NtGdiDdDDISubmitCommand call from inside CreateDevice
ahhh
maybe that had some weird stuff in it or something
ye
so uh, how do you calculate the driver private data size when submitting stuff
or rather what is it based on
private data is the secret bit
yeah
you can look at what radw does, but honestly i didn't manage yet to decode what is in there
radw does some weird calculations on constants
the result is quite drastically different from what the prop driver gives tho
ye
so ye, next step: figure out what those are ig
maybe with an intermediate step of making a dumper for the privatedata
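a starting point for such a dumper, as a sketch: it makes no claim about what the private data means, it only prints the blob as little-endian dwords with byte offsets so two submits can be diffed. `dump_dwords` is a made-up name and the sample bytes are arbitrary:

```python
import struct


def dump_dwords(blob, per_line=4):
    """Format an opaque byte blob as little-endian 32-bit words with offsets."""
    usable = len(blob) - len(blob) % 4          # ignore a trailing partial dword
    words = struct.unpack(f"<{usable // 4}I", blob[:usable])
    lines = []
    for i in range(0, len(words), per_line):
        chunk = " ".join(f"{w:08x}" for w in words[i:i + per_line])
        lines.append(f"{i * 4:04x}: {chunk}")
    return lines


# Arbitrary sample data, not a real private data blob.
sample_blob = struct.pack("<5I", 0x580000, 3, 0, 0, 0x1E0)
dump = dump_dwords(sample_blob)
```

dumping the blob from radw and from the prop driver for the same test and diffing the text is probably the cheapest way to spot which dwords scale with the submit size.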
well for today, I'll just try to properly decode the submit struct (and try to reconcile the private data with what you found out) (seems there are multiple chunks) and then I think I'll go do the eep
yup
yeah if I had to guess it's pretty much just the IBs referenced by that submit, I think the unknown thingy might be some speshul flags thingy
bit random but I think tomorrow I'll try getting RADV_DEBUG=hang to work, perhaps it can help figure out where things go wrong
fwiw this is what I got out of that submit
ye that was my reversing as well
https://github.com/kkent030315/EvilHooker maybe this could help and it is also named well
My phone actually wanted to autocorrect "kernel" to "Jensen" 
Why do you want to dump d3dkmt calls
Explain like I have a bachelor's in computer science
graphics drivers have two parts, the user mode driver (umd) and the kernel mode driver (kmd)
in a somewhat weird way, even though the kmds are bespoke, they still kind of expose a sort-of-ish unified interface
in loonix this is the libdrm
on windows this is a bit more complex
but there is a direct interface which is the d3dkmt
dx12/dxgi goes through a different path
but anyways, so what the vulkan amd umd driver does is talk to the kmd, via the d3dkmt interface
we have an existing umd, radv
it needs to be taught how to talk to the windows kmd
(this is called the 'winsys')
so we are looking at the amd umd driver comms
@floral viper was that good for ELIHABICS?
It was, ty 
One of u should get a job at AMD so you can look at the code yourself
Alternatively, you can try requesting access
but then you can't write the code yourself
i believe people have repeatedly asked AMD to document these bits
but they did not
so
it is already running for a very limited definition of running
Let's say uh hello triangle
^
Let's say uh passing the entire Vulkan cts
that went 0-100 very fast
ye
current issue is that big submits fail bc we didn't figure out all the bits on the submit (probably)
then there will be an issue of having to port some of device stuff from the linux kmd or reversing device init
How big is big
don't recall, its one of the cts tests
then there will be some smaller bits that might require more massaging, like idk sparse and other randos
sounds good
how do I build radv on Windows
half joking because I won't be at my PC for a few days (at an ISV visit)
are you at geryu's
I am farther north
i have a mesa fork, and then its reasonably straightforward
What does this fork add? The bits that make it worky on Windows?
yea
you can already build mesa on windows
and radv, but its not superbly functional
Me irl
i will leave this explain with a meme, as customary
*this rad boy
@dark vortex am prepping sth nice
noice
didn't get around to do a lot today
watched some halloween (the series) movies instead
also nice
let me put this code up somewhere
so you can revel
inv sent
may weep (in private)
NEW CHUNK
0 - 1e0
PKT3_DMA_DATA ENGINE_SEL(1) SRC_CACHE_POLICY(0) DST_SEL(2) DST_CACHE_POLICY(0) SRC_SEL(3) CP_SYNC(0) SRC_ADDR_LO_OR_DATA(580000) SRC_ADDR_HI(3) DST_ADDR_LO(0) DST_ADDR_HI(0) BYTE_COUNT(1e0) DIS_WC(0) SAS(0) DAS(0) SAIC(0) DAIC(0) RAW_WAIT(0)
PKT3_EVENT_WRITE EVENT_TYPE(19) EVENT_INDEX(0)
PKT3_SET_CONTEXT_REG gfx1030.mmDB_RENDER_OVERRIDE(0)
PKT3_SET_CONTEXT_REG gfx1030.mmPA_SC_AA_CONFIG(0)
PKT3_SET_CONTEXT_REG gfx1030.mmVGT_TESS_DISTRIBUTION(d8181e0c)
PKT3_SET_CONTEXT_REG gfx1030.mmPA_SC_CONSERVATIVE_RASTERIZATION_CNTL(100000)
PKT3_SET_CONTEXT_REG gfx1030.mmVGT_LS_HS_CONFIG(0)
PKT3_SET_CONTEXT_REG gfx1030.mmPA_SC_BINNER_CNTL_0(19fc00a3) gfx1030.mmPA_SC_BINNER_CNTL_1(3ff0000)
PKT3_SET_CONTEXT_REG gfx1030.mmPA_SC_SCREEN_SCISSOR_TL(0) gfx1030.mmPA_SC_SCREEN_SCISSOR_BR(40004000)
PKT3_SET_CONTEXT_REG gfx1030.mmDB_DFSM_CONTROL(2)
PKT3_WRITE_DATA PFP(1) WR_CONFIRM(1) WR_ONE_ADDR(0) memory async(5) DST_ADDR_LO(1b0010) DST_ADDR_HI(1)
PKT3_SET_CONTEXT_REG gfx1030.mmTA_BC_BASE_ADDR(3002810) gfx1030.mmTA_BC_BASE_ADDR_HI(0)
PKT3_SET_CONTEXT_REG gfx1030.mmPA_SU_POINT_SIZE(80008) gfx1030.mmPA_SU_POINT_MINMAX(ffff0000) gfx1030.mmPA_SU_LINE_CNTL(8)
PKT3_SET_CONTEXT_REG gfx1030.mmDB_STENCILREFMASK(1ffff00) gfx1030.mmDB_STENCILREFMASK_BF(1ffff00)
PKT3_SET_CONTEXT_REG gfx1030.mmDB_DEPTH_BOUNDS_MIN(0) gfx1030.mmDB_DEPTH_BOUNDS_MAX(3f800000)
PKT3_SET_CONTEXT_REG gfx1030.mmPA_SC_WINDOW_SCISSOR_TL(80000000) gfx1030.mmPA_SC_WINDOW_SCISSOR_BR(40004000)
i set up dumping of IBs
and decoding
i detoured the d3dkmt stuff so we can bookkeep
it's p nifty
oi nice
gfx1030.mmPA_SU_POINT_MINMAX(ffff0000)
will these be renamed if you change ceo @floral viper
That naming looks fine
alternate timeline: jensen huang, ceo of amd, introduces PA_HUANG_POINT_MINMAX registers
ok now with a bit more eep let's take a look with them noice tools
did git just sarcastically clone this repo
do you have anything against printing the reg fields on separate lines like so
PKT3_ACQUIRE_MEM
ME(1)
COHER_CNTL(0)
CP_COHER_SIZE(ffffffff)
CP_COHER_SIZE_HI(ff)
CP_COHER_BASE(0)
CP_COHER_BASE_HI(0)
POLL_INTERVAL(a)
PKT3_EVENT_WRITE
EVENT_TYPE(f)
EVENT_INDEX(4)
PKT3_EVENT_WRITE
EVENT_TYPE(24)
EVENT_INDEX(0)
esp with lots of fields I think it's moar readable
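(a rough sketch of that reformatting, for anyone post-processing an existing one-line dump instead of changing the decoder: `split_fields` and `FIELD_RE` are made-up names, and it assumes every field looks like `NAME(hex)` as in the log above)

```python
import re

# One "NAME(hexvalue)" field. The name may contain spaces ("memory async(5)")
# or dots ("gfx1030.mmDB_DFSM_CONTROL(2)"), so match everything up to the "(".
FIELD_RE = re.compile(r"\s*([^()]+?)\(([0-9a-fA-F]+)\)")

def split_fields(line, indent="    "):
    """Return the packet name followed by one indented field per line."""
    name, _, rest = line.strip().partition(" ")
    out = [name]
    for field, value in FIELD_RE.findall(rest):
        out.append(f"{indent}{field}({value})")
    return "\n".join(out)

if __name__ == "__main__":
    print(split_fields("PKT3_EVENT_WRITE EVENT_TYPE(19) EVENT_INDEX(0)"))
```

the indent is just there so the packet name stands out from its fields when grepping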
not sure but for me, I get "unmappable", which is probably because the driver already has the stuff mapped so we can't doublemap it
setting up detouring of D3DKMTLock calls and intercepting mapped ptrs as we speak
also wtf is D3DKMTLock (non-2)
ima skip that for now lol
weird
just the older api
ye but they have some weird ass "array of pages to map" shit in there
ah do you have rebar on
suppose I do yea
i don't
then ye with rebar it might keep shit mapped
alternatively just unmap it lol
then map it again
fuck you driver
lmao
this is really weird tho
they have an INDIRECT_BUFFER packet that apparently jumps to 0x30001c000, but none of the cmdbufs in the submit are located there 
it doesn't have to be in the submit data, no?
i think we just need to follow indirects the same way
map and decode
yeah, doin' it rn
it's a bit annoying because I can't write any locals from the umr decode callbacks
right now I just write globals which isn't very clean but it should work ig
it just needs to hold together until we figure it out
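(for reference, a minimal sketch of "follow the indirects, map and decode" with the state passed in explicitly instead of living in globals. This is not the umr code. `read_dwords` is a hypothetical callback standing in for whatever maps the target allocation, and the header layout, the 0x3F opcode and the 20-bit size field are my reading of Mesa's PM4 definitions, so check them before trusting it)

```python
from dataclasses import dataclass, field

PKT3_INDIRECT_BUFFER = 0x3F  # assumed opcode, taken from my reading of Mesa's sid.h

@dataclass
class DecodeState:
    visited: list = field(default_factory=list)  # (address, size in dwords) per IB walked
    opcodes: list = field(default_factory=list)  # type-3 opcodes in the order they were seen

def walk_ib(dwords, read_dwords, state, addr=0, depth=0, max_depth=8):
    """Walk one IB and recurse into every INDIRECT_BUFFER packet it contains."""
    state.visited.append((addr, len(dwords)))
    i = 0
    while i < len(dwords):
        header = dwords[i]
        pkt_type = header >> 30
        if pkt_type == 2:            # type-2 packets are single-dword filler
            i += 1
            continue
        if pkt_type != 3:
            raise ValueError(f"unhandled packet type {pkt_type} at dword {i}")
        count = (header >> 16) & 0x3FFF   # body length minus one
        opcode = (header >> 8) & 0xFF
        body = dwords[i + 1 : i + 2 + count]
        state.opcodes.append(opcode)
        if opcode == PKT3_INDIRECT_BUFFER and len(body) >= 3 and depth < max_depth:
            target = body[0] | ((body[1] & 0xFFFF) << 32)
            size_dw = body[2] & 0xFFFFF   # upper bits carry flags, ignored here
            walk_ib(read_dwords(target, size_dw), read_dwords, state,
                    target, depth + 1, max_depth)
        i += 2 + count
    return state

if __name__ == "__main__":
    memory = {0x30001C000: [0xC0001000, 0]}  # fake mapping: a lone NOP
    main = [0xC0023F00, 0x1C000, 0x3, 2, 0xC0001000, 0]
    state = walk_ib(main, lambda a, n: memory[a][:n], DecodeState())
    print([(hex(a), n) for a, n in state.visited])
```

the sketch treats every INDIRECT_BUFFER like a call and keeps parsing after it, it does not model the chain flag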
looks like it werks
just some more state setup
what I'm not sure about is how all of the IBs are connected
if I had to guess they're probably meant to be executed one after the other but it'd be up to the kmd to actually chain them together then
yeah
i think imma try ramping up the number of draws and seeing if anything changes
even at 100k draws there isn't really a whole lot of difference
i looked for this like 5 times now so ima just pin it so I don't have to look 6 times
the test falls apart at exactly 16384 draws
that number still works but anything above that goes boom
the ib size doesn't change when going from 16384->16385 though, so something else is probably going on
I'm also not completely sure if the GPU actually hangs?
I don't get any device_lost or anything, nor do I see TDR mentioned anywhere in system logs
It Just Fails™️
nice
ah i recall now
i think the device might be lost, it's just never uncovered if you don't run a second test
actually the test starts failing at 16385 draws, but the GPU doesn't start hanging there
it hangs at some other number I guess
tfw installing wsl for grep because windows commandline tools are just that garbage
i also noticed the chunk's cmd size member is less than the actual cmdbuf's size probably
I record the draw 100k times but there are only 6k packets in the log, and the size is also only 0xfc00
it seems like the size indeed maxes out at 0xfc00
not sure how that is supposed to work 
splits ibs at that size or smth?
there are no additional ibs in that chunk
humm
I am pretty sure the ib continues beyond that size, too
I don't think any ib just ends with the nth draw packet, we just stop parsing there
that's not exactly it but ye you might be onto something
hehe yup exactly, they do split the IBs
huh it was always there in the file
ok not questioning it lmao
wonder if it's always safe to split into chunks of 0xfc00 dwords with packet sizes and stuff
don't think it is because there's packets with arbitrary sizes
gonna just put the splitting into radeon_emit logic directly
ah nvm the amdgpu winsys has splitting logic too
yoink
actually can't just yoink
rebasing on top of half a year of changes 
i need to rework submission quite a bit anyway
since the limit is so laughably low (less than 16KB, wtf?) I'm considering just always allocating CS BOs with a size of that limit
then we just write into the mapped ptr directly and chain more stuff in cs_grow if necessary
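(a minimal sketch of the "never cut a packet in half" part, not the amdgpu winsys logic. `split_packets` is a made-up name, sizes are in whatever unit the limit is measured in, and `reserve` is an assumed amount of room kept free at the end of each chunk for the packet that chains to the next one)

```python
def split_packets(packets, limit, reserve=4):
    """Group whole packets into chunks that fit in `limit` minus `reserve`."""
    budget = limit - reserve
    chunks, current, used = [], [], 0
    for pkt in packets:
        if len(pkt) > budget:
            raise ValueError("a single packet is larger than one chunk")
        if used + len(pkt) > budget:
            # The packet would straddle the boundary, so start a new chunk.
            chunks.append(current)
            current, used = [], 0
        current.append(pkt)
        used += len(pkt)
    if current:
        chunks.append(current)
    return chunks

if __name__ == "__main__":
    draws = [[0] * 5 for _ in range(100_000)]  # pretend every draw is 5 units long
    chunks = split_packets(draws, limit=0xFC00)
    print(len(chunks), max(sum(len(p) for p in c) for c in chunks))
```

fixed-size chunks only work if the split happens at a packet boundary, which is why this walks packets instead of slicing the raw buffer every 0xfc00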
not sure which winsys refactor you mean btw
the cs munging can be shared with loonix
he was mentioning sth
the cs munging in loonix is a bit overgrown and contains schtuff I'm not sure we need
it mainly is so cluttered because it has to account for queues where we can't chain shit
which is video stuff
no idea how that is supposed to work with the d3dkmt limits and I don't think it's very useful to worry about it
you had some code in there that dispatches the preamble cs separately, did that ever work?
hm ok
additional 6 mo of bikeshed has degraded my neural matter
but i was trying all manner of random shit :/
it's kinda necessary now because we can't just patch this in
but it doesn't seem like any commands from the non-preamble cmdbuf go through atm
oh we lose the device too lol
interesting, this must be hanging the CP
so there's no soft recovering out of this one
and yet, my session remains completely unharmed
superior os you say
2 things
- messed up size of private data
- forgor to pad
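(if the padding meant here is the usual "round the IB up to an alignment with filler dwords" kind, it could look like the sketch below. The 8-dword alignment and the 0xffff1000 filler are assumptions borrowed from how I remember Mesa padding IBs on linux, nothing in this thread confirms what the windows kmd wants, and `pad_ib` is a made-up name)

```python
PAD_DWORD = 0xFFFF1000  # assumed filler: a type-3 NOP header used as one-dword padding

def pad_ib(dwords, align=8, filler=PAD_DWORD):
    """Return a copy of `dwords` whose length is a multiple of `align`."""
    remainder = len(dwords) % align
    if remainder == 0:
        return list(dwords)
    return list(dwords) + [filler] * (align - remainder)

if __name__ == "__main__":
    ib = pad_ib([0xC0001000, 0, 0xC0001000])
    print(len(ib), [hex(d) for d in ib])
```

whatever size gets reported next to the IB would then need to be the padded length, not the original one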
with that I can at least submit the preamble stuff again without things blowing up
I’m trying to get submitting anything but the preamble working rn
messed up size once more
now I'm hanging the gpu properly
:O
hmmm
ok, so I seem to have figured things out
we shouldn't set any flags for the preambles, both should be 0
with that, I can get the smoke tests to pass again
on some things
perhaps they control some scheduling/sync stuff or w/e
prop does sometimes submit 2 IBs with both flags being 0 I think
gib updated dadachum
you mean the log i got?
i pushed my changes so far too
the draw is now executed 200k times, if that's a problem you can just revert the last commit
also uh the 2 ib with both flags 0 is not where I thought it'd be, it's not generated by the main submit at least
it works for us tho so ¯_(ツ)_/¯
ngl the takeaway will be that the prop driver does various random stuff out of superstition
yesh
