#:frog_gone: martty's mesa misadventures
he will never let you fondle the bits
🥹
what is a mension 
plural of mansion perchance?
@dense totem you are hereby obliged to help martty develop this debugger technology
it's already working, just that the rough bits (which is basically everything) need to be addressed
someone do addr2line in aco 🐸
i shall take a look if I can do some fun stuff as well when RMV is landed
it's kinda overdue but I have other priorities right now
no that's fine there's no deadline
ok
put slight pressure on Daniel as he's blocking the MR 🐸
maybe, as soon as I can get to debugging cases that need debuginfo
I think RT in general would benefit from it
inb4 it completely loses it
oh wait
my current Mr assumes there's one source SPIR-V
when doing byte offsets
I guess that will need to be amended somehow to identify which spv it came from 🐸
perhaps hash (shader identifier) would do?
can you link the mr again? would like to read into it a bit
99% complete, only 99% left
thanks
there's an MR that integrates it into brw (this is incomplete) which would be used as reference
Attached: 1 video
The GPU debugger I have been working on this xmas break!
Below is a proof of concept (running on the deck, gfx10.3).
The pixels changing belong to a single wave that outputs to a fixed location, that is being selected for debugging - I have placed a breakpoint on an instr in the shader, reached by single stepping the instrs f...
vrooooooooooooooooooooooooooooooooooooooooooooooom
gf watched vid and we agreed that i have succeeded in making the most boring video in existence
i hope people read the text 
i didn't really help that much i guess (yet? radvgdb incoming??? 😳😳 😳) but boost is least I can do 
i was thinking of calling it radbg
sauce reveal 😳 😳 😳
i cannot in good faith post something that has
resume_wave(true);
fun fact, this actually stops all the waves
lmao
plus the issue is that i have some code in radv, some in aco, some in umr and some on its own
i want to bring the umr mods over first, then just have a mesa fork
hmm I wonder if we can upstream it someday
i am sure we can
it would be behind a flag i think
but it needs to be written
aco legwork 🐸
yeah
I was also considering making the actual debugger a separate executable with radv exposing some entrypoints for a backend
not sure how much other people would like a whole new private api tho 🐸
i don't follow
well I'm thinking of your usual CPU debugger workflow where you launch/attach your app inside of a debugger
ah this was not clear
this is that
beginning of the video, I "attach"
end of the video, I ctrl-c on the debugger, and the influence ends
well, mostly
ah yeah
I also read your message as "flag" = envvar and that was what I was not sure about, mostly
well yes
you need some cooperation from aco
but this can be just a flag when you load radv
that aco should generate debuggable shaders
hmm thinking about it again envvar actually sounds fine
you don't actually need any more cooperation from radv than that, except that you need to import a bo into the target process
but this can be done via a layer
if you want to launch apps from a debugger, execve exists, and otherwise doing something like RADV_SHADER_DEBUGGING=1 ./app seems like a reasonable requirement for attaching
nice
halting the waves behind radv's back
yep, i implemented that
but its a bit less useful since i wanted the debugger to run on a remote sesh, while the target ran on a local sesh
but if you want to do this kinda "wave edit" stuff i have in the video, you could have them both on the local, ye
maybe useful sometimes
🤫
ah right 🐸
I kinda forgot that your abilities on the local sesh are, uh, limited while there are halted waves on the gpu
dementia gaming
currently running the debugger requires root
coz you are really plugging values left and right on the gpu
but if it works nicely, the used stuff can migrate into the kernel
and then it can work without privs
makes sense
not gonna lie, the output is very boring
it would be cool to see the whole screen
with the setting the breakpoint, you stepping it, etc
it's very cool but what you show of it is not, if that makes sense
you overestimate the capabilities lmao
see i can't have a video of something stepping
by hand
remember that freezes the shit out of the screen
hmm you do it thru code rn?
so it is stepping automatically
hmm
it would've been cool to step in there, change v0 to 1.0 and then the pixel turns red
but its not how it works lol
fair do
i know it be boring
but things freezing up really throws a wrench into making a video
yeah makes sense
it would be neat to see el code, and a bigger explanation
lookin forward to writeup

yeah toot be bad format for that
so i'll write it up
(guess why this thread really exists 🐸)
also i commissioned a banner from gf, hopefully arrives on time to adorn the blagpost
good froge banner 
also post this to LGD
i posted the contents already 🐸
can almost still write his age with a single hex digit and already demented smh
this is what alcohol Mesa dev does to OUR children's brains!!!
Like and share
if I'm not mistaken your age and my age would have the same number of bits
ye when both are represented as floats, duh
wot my 3 character mesa MR was not covered by phoronix, i am shattered
Looking forward to the release of MGD (martty giraffix defrogger)
would it not be possible to have a usable system if you run the debuggee on a secondary GPU ?
possibly, i don't know enough of the inner workings
but if you compile shady to compute, that i think would work on a single gpu too
just launch it on a compute queue
what references did you use to make this happen
is there some comprehensive writeup on how the internals work
or is it mostly gleaned from source and talking to other smellies
notfromajedi.gif
i'm only interested in shady stuff anyways
notfromajaker.dogjiff
i am working on the writeup, which aims to be reasonably comprehensive on the touched bits
are there AMD pdfs explaining how to talk to the command processor, the rings etc ?
mostly looking at isa docs, radv, amdgpu, amdkfd and rocgdb
is that before or after the depth writeup 
no
awesome
dw
i have not forgotten

it is only the isa docs, and there are some quite outdated and limited docs on the 3d registers up to SI
but i kinda went in blind, hoping that it will work, despite not seeing code doing this thing
like gpr writing from the host
but it does work
for what purpose?
i'd like to understand how the GPU is talked to in general
why would you want pdf when you can read source (which is less likely to have errata than written docs) and ask people 
oh not even everyone at amd knows how the CP works
i don't think i'm worth the people's time, i just want to appreciate exactly what martty is doing
(and I don't just mean Jaker in this case
)
amd should get rid of packet format zoo and put a macro expanding thing like NV has
all the experts on AMD HW are employed by Valve so 🐸
most of what the debugger is doing is through banging at mmio registers
working the CP is a much finer art
but like, the KMD is doing all the talking right
you can't talk to the GPU without coordinating with the KMD and expect that to work
for the debugger?
in general
yes, in general, everything goes through the KMD
but for the userspace mmio it is just a relay
the KMD does not understand what is happening
Think of it as martty whispering sweet nothings into the cp's ear
I am looking forward to the write-up though
Maybe I'll learn how our own hardware works finally
the secret knowledge, i want it inserted into my brain through a high-speed cable plugged into the back of my neck
i have chosen the jekyll theme, so i am like 94.3% done
martty.github.io incoming? :frog_demon:
implying the writeup isn't just the correct incantation order to evoke the behaviour out of the magic rock
in fact that link is already live
nice
snazzy anchors
I wonder if gpuopen finally added those after I asked
subscribe for nft tips
newt, frog & toad tips
meta question, how should i refer to Samuel Pitoiset? like that? handle? Sam P? Pitoiset?
I'd probably use Samuel Pitoiset or hakzsam
assuming this is in context of write up?
around 3k words already 🐸
4.5k now, got to the end roughly, ig will be around 5k when i am done
a five parter doorstopper
still have to fill out the code, refs, images, that stuff
broโs writing a book
a bruhk
i hashed out some ideas from what could be done in the future - any more ideas?
Memory violation debugging
No wave filtering required. Once a memory violation is detected, the wave can be stopped and examined from the debugger. Memory violations are generally not precise (the faulting PC is not the one issuing the access), but workarounds could be used to make it so. It might even be possible to stop the context loss from occurring.
Data breakpoints
Just as on the CPU, it seems some registers can be programmed to trap when interesting addresses are accessed.
Fragment debugging
Want to examine a variable written in the FS visually? Just swap out the color VGPRs before they get written!
True compute debugging
Shared memory and nonuniform behaviour are all accessible.
Source level debugging
Needs a significant amount of plumbing for sure, but the GLSL -> SPIRV side is done. addr2line support in NIR is being worked on.
Arbitrary code execution
Additional code could be compiled by the debugger, then called by the shader.
Debugging barrier hangs
Waves can be stopped when stuck in barriers, and released to prevent context loss, while diagnosing issues.
Multi-wave debugging
More waves can be trapped and worked on at the same time.
Assertions in shader code
Can trap when hit with a debugger present.
Conditional breakpoints/tracepoints?
yus
also memory watch points
thats there
no idea about feasibility
that's already under data breakpoints
oh
also you can construct most of these in software probably anyways
<- cannot read
with some varying degrees of annoyance
add time travel debugging

multiverse debugging next pls
also I wouldn't call
Memory violations are generally not precise (the faulting PC is not the one issuing the access), but workarounds could be used to make it so.
workarounds
it should rather be said that the compiler needs to play along for a bit
a software feature
there is a hw bit that makes mem violations precise
ah I see
probably it is just forcing accesses to complete until the next instr is chewed
other than that seems good
the workaround i'd do is saving addr vgprs until accesses complete
so that is a compiler thing
just too many options :S
just say "but needs extra work to make it so"
don't like "workaround"?
changed
call it a workthrough instead of a workaround
also I'd add a paragraph about this vs renderdoc/pix/etc
ah well renderdoc one should cover things
pix is basically the same in that it instruments shader and then replays things
i'll share a preview of the articles here before i post them, so i'll ask for some extra ๐ then
Time travel debugging maybe?
did jaker travel back in time to post this idea sooner 
yus
Now you're thinking with portal 2 RTX
remunged:
Memory violation debugging
No wave filtering required. Once a memory violation is detected, the wave can be stopped and examined from the debugger. Memory violations are generally not precise (the faulting PC is not the one issuing the access), but with extra work they can be made so. It might even be possible to stop the context loss from occurring.
Data breakpoints
Just as on the CPU, it seems some registers can be programmed to trap when interesting addresses are accessed.
Fragment debugging
Want to examine a variable written in the FS visually? Just swap out the color VGPRs before they get written!
Conditional breakpoints
Shader or host can evaluate arbitrary (even memory dependent) expressions to trigger breakpoints.
True compute debugging
Shared memory and nonuniform behaviour are all accessible.
Source level debugging
Needs a significant amount of plumbing for sure, but the GLSL -> SPIRV side is done. addr2line support in NIR is being worked on.
Arbitrary code execution
Additional code could be compiled by the debugger, then called by the shader.
Debugging barrier hangs
Waves can be stopped when stuck in barriers, and released to prevent context loss, while diagnosing issues.
Multi-wave debugging
More waves can be trapped and worked on at the same time.
Assertions in shader code
Can trap when hit with a debugger present.
Time travel debugging
It is possible to record the entire state evolution of a wave, which afterwards the host can replay, freely stepping forwards and backwards.
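A toy model of the time travel bullet above, assuming the recorded "state" is nothing more than a PC and a couple of VGPRs: the wave is stepped once per snapshot while recording, and afterwards the host only walks the recorded list, so stepping backwards costs the same as stepping forwards. `WaveState`, `step`, `record` and `Replay` are hypothetical names for illustration, not part of radbg, umr or any AMD interface, and `step` is a made-up stand-in for executing one instruction.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WaveState:
    # Hypothetical, heavily simplified wave state: program counter + a few VGPRs.
    pc: int
    vgprs: tuple


def step(state: WaveState) -> WaveState:
    # Made-up stand-in for executing one instruction:
    # add the current pc to every VGPR, then advance pc by 4 bytes.
    return replace(state,
                   pc=state.pc + 4,
                   vgprs=tuple(v + state.pc for v in state.vgprs))


def record(initial: WaveState, n_steps: int) -> list:
    # Record the full state evolution: one immutable snapshot per step.
    history = [initial]
    for _ in range(n_steps):
        history.append(step(history[-1]))
    return history


class Replay:
    # Host-side cursor over the recorded history; the GPU is no longer involved.
    def __init__(self, history: list):
        self.history = history
        self.i = 0

    def forward(self) -> WaveState:
        # Clamp at the last recorded snapshot.
        self.i = min(self.i + 1, len(self.history) - 1)
        return self.history[self.i]

    def back(self) -> WaveState:
        # Clamp at the initial snapshot.
        self.i = max(self.i - 1, 0)
        return self.history[self.i]
```

Keeping whole snapshots is the simplest scheme; storing only per-step changes would use less memory at the cost of replaying diffs to reach a given point.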
get booey'd on, baba
the bababooey is you
Maybe you could do something like AOP :0
rapa nui
It's like an onion, this one
imagine playing obscure lil indie games that no one has heard about (except us)
whats AOP? advent ocasio-pointer?
advent of pode
But yes, cursed aspect oriented programming where you inject logging in between stuff you care about or do other cross-cutting concerns like dat
aspect oriented programming
bit more elaboration plox
I looked at two explanations on SO and didn't understand shit 
seems like super OOP from what I can tell
clep sprinkled some confusion crack on us then left
this is nice
MK ULTRA type beat
that sounds like a really weird way to say "tracing"
Well, it's one example of what you can do with it
in this moment, I am enlightened (and euphoric)
Aop is more about defining aspects, cross-cutting concerns in your code like logging, error handling, that kind of stuff. Then you patch these aspects in through reflection or somethin
Anyways it'd be really cursed 'cause you'd basically be patching stuff into your spirv
So it's essentially part of the arbitrary code execution bullet point tbf
Anyways
yeah i am not closer to understanding, but i wouldn't dare to question the fruity one
Tbh I only have a superficial understanding of it too, my dad talked to me about his implementation of it in C# once and a coworker mentioned it for the node-based stuff we were doing, but other than that not much
Essentially you add a prelude or postlude(??) to some unit of work
And it can be good for debugging, logging, synchronising stuff even
Really anything you might want to be handled automagically that would be boilerplate otherwise
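To make the prelude/postlude idea concrete, a minimal sketch using a Python decorator as the "aspect": the logging is woven around a unit of work without the work itself knowing about it. This is only an analogy for the concept being discussed; `logged` and `shade` are made-up names, and doing the same to a shader would mean patching SPIR-V or ISA instead.

```python
import functools


def logged(log):
    """Aspect: wrap a function with a prelude and a postlude that append to `log`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log.append(f"enter {fn.__name__}{args}")       # prelude
            result = fn(*args, **kwargs)
            log.append(f"exit {fn.__name__} -> {result}")  # postlude
            return result
        return wrapper
    return decorate


trace = []


@logged(trace)
def shade(x, y):
    # The "unit of work": contains no logging code of its own.
    return (x + y) / 2
```

Calling `shade(1, 3)` returns `2.0` and leaves an enter/exit pair in `trace`; swapping the decorator for one that validates arguments or takes a lock would be another cross-cutting concern handled the same way.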
aspect oriented programming is when the reflection does stuff
Am I speaking chinese or like, what is not understandable about this 
I'll have you know hangul is the world's best alphabet
The world's most aspect-oriented alphabet 
2023 aspect-oriented incident
i wuz her e
but anyways, its not the concept i have trouble with, its more like how it fits here
Well, maybe I misunderstand what you're actually doing tbqh, I shall await your writeup and then comment more
ok I will write 1 to 0xcafebabe once it is done, please wait kindly
@wanton carbon I hath returned
now my precision-guided munition-based threats work again
an unrelated thought - how about a radv ext for loading hsa/isa?
embedded isa was a thought that came up before
what do you plan to use that for?
rocm is a mess
ah
I mean if it were for loading code you enter in debugger's prompt, I'd rather just experiment and try to maybe come up with a more specialized ext
but otherwise it seems like a question targeted at people who would be involved in implementing this rocm-on-radv thing
what was the conclusion?
that there aren't terribly many use cases for this 🐸
well, yea
but for testing hw, its the best
eg. figuring out hw bugs 🐸
https://martty.github.io/posts/radbg_part_1/ aight part 1! please let me know any feedback, no sharing yet pls kthxbai 
Intro GPUs are complex beasts - and certainly more mysterious in some ways than CPUs which come with ample amounts of documentation and manuals. Aspiring graphics programmers are sometimes left to scrounge old GDC presentations on performance tips and very little is known about the inner workings of some GPUs - too small to have a devrel contact...
you mean you don't want me to send this to my coworkers yet?
this blogpost will explode amd office computers (NOT FAKE)
well, let me know when this doesn't blow up our compoopers
read it on your phone (on the toilet, you pooper)
I feel like this sentence could use a comma after the word 'Linux', or the whole "AMD...Linux" clause could be parenthesized, idk
Furthermore, umr, AMD's user-mode debugging facility for Linux also hinted towards the possibility of manipulating waves for graphics.
I almost forgot that BDA is core now
not something the OpenGL would teach you
this sentence has too many commas (or maybe clauses) methinks. it breaks my parser either way
The most well-known debugger under Linux, GDB, is a flexible software, it has a number of targets, which contain the knowledge on how to translate debugging actions (setting a breakpoint, reading the stackframe…) are implemented on the specific architecture.
I would split it into two sentences and maybe remove the "is a flexible software"
(I hope this is the kind of feedback you are looking for lmao)
ye def
also for higher level stuff like "boring" "too detailed" and "stop smoking meth"
and especially "i have no idea what you are talking about"
this sentence isn't really a question, but I suppose you could keep the question mark as a stylistic choice
But perhaps we can be inspired by what it does to the GPU?
so far I have no suggestions on the content itself. It seems legit
there isn't much content btqh
but i thought some background is useful
difficult to say how much is enough to make the content enjoyable if you are not versed in the dark arts tho
too often do blogs go off into the deep end without proper background, so this is refreshing
people who know the ๐ฆle stuff already can skip it if they want, but it will be vital for the others
the return buttons on the footnotes are nice
can take no credit for that, but i agree
btw how do you determine if something belongs as a footnote vs in parens vs as a hyperlink?
i am also debating including "the worlds most boring video" as a (de)motivation at the end of the introduction
if you do, you better explain what's going on because that was conchfusing as heck the last time you showed us
perhaps then it would be wise to include at the end
when everything has already been explained i guess
otherwise i can't say its a veegooper before explaining later what that is
my vees feel gooped rn
same
having fully read the thing, I can't wait to see what happens in part 2
I have already blocked all memory I have of being in #1053054445518323732
spoiler alert: not much
its just about configuring the deq
i may include the very best memes from this thread to keep people from gagging at too much bash
but then in part 3
how does it feel length-wise?
kinda short, but then again I already know all this stuff
someone who doesn't will have to unpack it more slowly
hmm, not sure how much the read time counter at the top lies
that felt accurate for me
but technical stuff will always depend on the reader's knowledge level
you should send this to someone outside this post and get their opinion
get me mum to read about wave status registers
finally something to read while amdvlk is trying to run control
(it is taking literal minutes to start up jaker pls fix)
I mean someone who perhaps knows about renderdoc and such, but not necessarily how AMD hardware works
which is closer to the average graphics dev methinks
i will move the rest of shitting on amdvlk to #1026957683921797120
speaking of which, brb
looks good from a quick read, nice work 
you forgor to share it in LGD 
now I did already
it means spread it to the whole world immediately, obviously
i cannot read
what is lgd btw
ligma linux gaming dev discord
Pixel has shown me the secret passage
luckily zero people have reacted to it so it effectively did not happen
its not a huge deal, but ye, wanna polish it up first
while we're at reviewing the draft, ishi's mastodon page link is a link to his maud.social page but on mastodon.gamedev.place
maybe it'd be better to link to maud.social directly
ye idk how
i tried
ahh
๐ฆ i
you have a footnote explaining that KMD = kernel mode driver, but you only use KMD once in the entire blog🅱️ost, methinks you could just write "kernel mode driver" here and maybe introduce the abbreviation in a later article if it gets used there more
maybe it'd instead be worth adding a footnote to "vector GPR" that they are unrelated to GLSL/HLSL vectors and are just one 32bit value?
that's what made me bring up the whole footnote thing hehe
gonna do sleep then get back to u all
O neat
"Aspiring graphics programmers are sometimes left to scrounge old GDC presentations on performance tips and very little is known about the inner workings of some GPUs - too small to have a devrel contact to give insight."
The "too small to have a devrel contact to give insight" should be the first thing in that sentence imo, since you shift to talking about people in general
"Aspiring graphics programmers too small to have a devrel contact..." is a crisp way to start
"wealth of insight into the inner workings." You could change it to "said inner workings" or something similar, since you're repeating inner workings
Or "Too small to have a devrel contact, aspiring graphics programmers [...]"
"(unfortunately 3D registers documents are no longer published for new architectures)" feels a bit disconnected from the preceding sentence, you could add a "though" at the start to link between the two
Also is it 3d registers or 3d register
Genuinely unsure
Or "Without a devrel contact, small (in stature) graphics developers..."
Imagine being a homunculus graphics dev. So tiny that even your whole body weight can't press a key on a keyboard. Would be a sad life
I'm not quite there yet but if my buddies keep zapping me with that shrink ray I'll have to file for bankruptcy
but you'd be able to climb inside people to cure their brain worms
by giving them a lobotomy
"[...] I got curious - I have previously seen [...]" I'd put an "as I had previously seen" there instead
"but the conclusion was" probably "but his conclusion"
I'll be honest, everywhere a hyphen appears needs some refactoring
"(how to recover [...])" to "(how would one recover [...])"
RenderDoc also offers shader debugging - this is achieved by a form of CPU/GPU hybrid emulation of the shader code - since shader behaviour has a lot of implementation dependence, for faithful replay Renderdoc compiles small snippets of shader code, that are ran when the program is stepped, while the shader state is maintained in memory.
RenderDoc also offers shader debugging, which is achieved with a form of hybrid CPU/GPU shader code emulation. Since shader behaviour [sic] has a lot of implementation dependence, for faithful replay RenderDoc compiles small snippets of shader code that run when the program is stepped while the shader state is maintained in memory.
You put [sic] after the British spelling 
" and arbitrary computation on them (buffer device address)," could say "arbitrary computation thereof" if u wanna be fancy, just a nitpick tho
:nouk: when
Btw RenderDoc needs to be capitalized in two places
yeah I deliberately didn't wanna remove his tone, but stuff is just a bit shuffled
too much passive voice
Can't tell if that's a jerk or serious
that's like the most annoying english grading comment
I still don't entirely know what passive voice is because I was in the lower level english classes
I assume its just a conspiracy to keep me from hitting word counts
It's when American reporters say "the cop's gun was fired" rather than saying "the cop fired the gun"
oh, then yeah that might actually be something to watch out for in this
the sentence indirection gets confusing when you're trying to describe something technical
"although there is an extension VK_EXT_device_fault" could be rephrased as "although the VK_EXT_device_fault extension exists"
"The debugger waits for a signal to happen, then accesses the state of the inferior (reading / writing memory), then resumes it. " that second "then" is superfluous
what would you replace it with
nothing
it won't make sense then
it's the first one that's superfluous actually
"these trigger when a piece memory is written" I think this is missing an "of"?
not sure, might be a technical term I don't know lmao
this sentence I'd rewrite as "The most well-known Linux debugger, GDB, is a flexible program with a number of targets which know how to translate debugging actions (setting a breakpoint, reading the stackframe...) to a specific architecture"
it says target ... architecture
so I think it means architecture targets, the way compilers have?
maybe
as in there are builds of gdb for different architectures
not sure how relevant that is to anything
oh. I tried interpreting it that way but it confused me
Targets (Debugging with GDB)
I think it's literally just that gdb has a concept called targets
it does explain it tho
it's a mapping from the gdb concepts to the architecture's way of implementing them I guess
I think martty should elaborate on that if he brings it up
I know strictly nothing about gdb ofc
Same
ok imma go to bed I'll continue reading after approximately 8 hours of sleep innit
Certified bruv moment
Bit cheeky
i left in some mistakes for u guys to pick at, as a treat :3
good morning sweetie
i'm glad the homunculus angle was not lost, i am striving to be inclusive
ong
nah all good stuff
ye the gdb point was that it essentially lets you add more targets while keeping the ui and all the other shit separate
this is what rocgdb does
its a piece of code that translates the abstract gdb commands to the specific hw / sw
crazy how it be like that
but dw, this was the primary thing i wanted to learn coz knowledge poisons the mind
its crazy that there be people out there not understanding what a target is, no?
my first thought as to what you meant by "target" in the post was indeed correct, but the way it was brought up conchfused me
rocgdb is AMD's ROCm based gdb + custom target that can debug compute programs,
I think of target as "target platform", not this exact definition
I guess the way you wrote it in the blog was more in line with the GDB meaning of "target", but to someone who doesn't use GDB, it's confusing
i will put some more words around that bit to make it clearer
those are my 3¢ anyways (due to inflation)
so you did end up using harvey
but its never phrased as "the cop was fired"
howdy again
howdy
i am about to upload a renewed version where the hyphens have been murderised
nice
well, there is one hyphen that does make sense so far
I will not say which to keep you on your toes
i inherited the hyphens from my gramgram, how dare you
๐
"with amdkfd, which is a separate KMD from amdgpu, that graphics applications use."
could be rewritten as "with amdkfd, a separate KMD from amdgpu which graphics applications use"
privileges instead of "privilages"
ok updated now
noice
I have now finished reading
good stuff
I like that you take it kinda slow
was understandable even for me, a complete neophyte
thats good news
nono in fact i will add a tag with "difficulty-level:clep"
I'm honored/insulted
the natural state of man
hinsulted (new win32 type dropped)
sometimes when I'm really tired, I'll write typos like that constantly
tbh i find i make most typos because i start thinking of the next word that should be typed and then i start mixing in the letters
sometimes I consider using a spell checker for my blog, but then I don't
then I continue writing raw html (sigma masochistic personality)
the real spell checkers are the friends we made along the way
The disadvantage of such emulation is that due to computational requirements, emulating larger bodies of work is not currently done, essentially providing a single threaded view of the execution (no cross-invocation or cross-subgroup communication).
I didn't know that... Well renderdoc would have never been useful for me then anyways
i think for pixel shader it might run the quad for derivatives
amdkfd and amdgpu can coexist simultaneously?
i thought the former was some kind of headless thing
wdym? that you run two programs that use either?
both can't be the lowest level device driver, one would have to talk to the other
seems like amdkfd does in fact talk to amdgpu
and used to talk to radeon too
some of kfd has been merged into amdgpu, but far from all
posting date is one year behind :)
this begs the question, if radv can run compute shaders on the dedicated compute queue, what does amdkfd even do, just debugger support ?
the post is from Jan 15 though
guess martty intends to release it officially™️ on jan 15?
it's not yet Jan 15 2023 either
eh it requires me to put a date
wdym? it does nothing because radv doesn't talk to amdkfd
how long have you been working on this again 
what does it do for rocm compute that amdgpu doesn't already do for vulkan/gl compoot
so uh
the other 15 secret queues that exist in the hw
how hard could it be to teach rocgdb to tend to VK cs invocations ? surely not that hard at all then
zero hard
on a scale of zero to zero though
NaN hard
oh wait rocgdb
yes
i mean, nothing currently in there does it, so its just putting my code into rocgdb
but why would you wanna strap to the beautiful ecosystem of rocm when you don't have to
why not just put my code into gdb
there isn't really a gain from using rocgdb
the question is, what prevents rocgdb from working with vulkan right now, out of the box
for compoot i mean
it talks to amdkfd which doesn't know anything about your workload i imagine, because that doesn't go through amdkfd
but tbh i couldn't compile rocm so uhhh
who knows
i just get the packages from ayymd
i got the old rocm support in anydsl dusted off this week to let a student work on a topic with it
(they're gonna try to talk to the RT hw from within rocm basically)
and another one in RDNA3
what i mean is that they don't FF bits for RT
it's a 100% improvement, try getting that from a NV card
so there can be no issue doing RT, because it is just software
do we know what nvidia does in detail ?
well everything except the (arguably small) bit the hw does I guess
whereas on a theoretical CUDA bog, it might require a hand cranked register
also the way rocgdb does debugging is not the way i have implemented, afaik
my way is the legacy way
i think
oh right, now i remember
i think rocgdb does the cwsr form of debugging, where it just pulls a wave from the device
that must hurt
but tbh i don't grok what rocdbgapi does fully
but lmk if you try and what happens, afaik at least the documentation talks about PM4 queues
but the conclusion was that graphics waves are most likely too complex to save and restore (how would one recover rasterization and output state?).
we're not save/restoring graphics waves, I'd just say graphics state here (because it's the rasterizer and FS output state and other bits that can't be save/restored, not shaders)
graphics waves cannot be saved and restored (due to additional rasterization and framebuffer state that would also need to be saved and restored)
like this?
almost, but I now have more questions
what is the definition of saving or restoring a wave here
does the wave still exist until it's "restored", or does it need to be "restarted" somehow
because if latter I can see that with graphics we can't do that
and I guess in that case this phrasing is fine
oh well I guess this detail doesn't matter anyway so it's all good
what if you said it like this to avoid the implication
because rasterization and framebuffer state cannot be saved and restored, graphics waves also cannot be saved and restored
in cwsr you quit the wave you have saved the state of, then when you need to restore you relaunch them with the appropriate GPR allocation then trap them immediately to reload the state
I see
let's just say that we can't restore graphics waves then, without note in parens
I misunderstood things initially
what is the implication Jaker? are you gonna hurt waves?
btw what is cwsr
compute wave save/restore
the implication (afaik) is that raster/fbo state cannot be saved and restored
so why not just say that
no
the important bit is that we can't restart graphics waves
not being able to save/restore FF state would be a secondary problem here
I thought the inability to restore that state was what killed it, but I guess not
well it's a dual thing - PS waves get launched by FF hw
and the FF hw waits for those waves to complete
so it needs to somehow identify them
atoclkotamotagi, kinda no way to pull that FF state and put it back
(according to our currently limited knowledge of the arcane machinations of the amd gpu internals)
maybe an extra sentence would help clarify things in that paragraph then
sounds like a plan
btw where did you get that screenshot of renderdoc
it looks almost unlike the renderdoc I've seen
imagine that it is bigger
will do
problem solved
Wew this thread has almost 2k messages already. When did you create it?
Dec 18 ish?
scrolled up, 15
https://martty.github.io/posts/radbg_part_2/ part II draft (attn: @dark vortex DRAFT)
weow
should've told me that before I made a bot that crawls this channel for links and shares them on my linkedin
phew, i was fearing that it was automatically submitting them to be printed on TP as well
good news
I recommend reinstalling all the packages that are already installed.
might be worth mentioning that to use pacman on deck you need to disable readonly mode
if only i did that
ah it's in a different section
guess normal people who read from top to bottom will not be affected by this
actually
more rra
i am actually sadly once again hanging my gpu with bvh builds for no reason
maybe I should apply rra to the code then
from a few more quick looks, looks nice 
shouldn't the withered wojak be the deck
the witheredjak is martty
it's Van Gogh not vangogh 
bruh
van bruh
u mean Van Bruh
van goghment
I finally read the post and it was rather pleasant
I didn't take note of grammatical anomalies, though I did notice some
no diarrhea?
that will come in part 3 I hope
I don't remember when we (you?) adopted that, but 
this thread is inspiring me to start writing that "other" thing
too bad it's 5:30am and I need to schleep rn
schleep now, then we should start other thing too, yesyes
huhh
cool stuff, huh
but it is still mbcp
don't think it interferes with the debooger
shadow regs are used to optimize shadow mapping
yeah I'm just not sure what this is for
wdym?
well what does this MR together with mbcp on achieve that wasn't previously done
I thought it might improve debugging experience in some cases (e.g. debugging non-graphics on GFX queue?) but not sure, just random hypothesis
i think it just enables gfx preemption
the gfx registers were not saved, so switching corrupts the CP state
"just" - i don't mean to downplay, its a cool thing
so you're saying graphics waves can be somehow suspended with this and you'll still get usable desktop?
no
it's command-level preemption
you drain the waves, switch the CP state, run some other tasks, switch CP back, run next cmd
that is my understanding
I see, so it's just fixing some state being lost/corrupted when preempting?
yes
I see
again, that's my understanding
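A toy sketch of command-level preemption as understood above, assuming the waves are already drained so only CP register state has to survive the switch. `struct cp_state`, `run_cmd` and `preempt_and_run` are hypothetical; the `shadow_regs` flag stands in for saving the gfx registers, and turning it off reproduces the corrupted-state problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 4

/* Hypothetical CP-visible register state. */
struct cp_state { uint32_t regs[NUM_REGS]; };

/* Toy command: "write value to register reg". */
struct cmd { int reg; uint32_t value; };

static void run_cmd(struct cp_state *cp, const struct cmd *c)
{
    assert(c->reg >= 0 && c->reg < NUM_REGS);
    cp->regs[c->reg] = c->value;
}

/* Preempt between two commands of the current queue: run another task's
 * commands, then switch back. Only with shadow_regs set is the original
 * register state restored before the next command of the current queue. */
static void preempt_and_run(struct cp_state *cp, const struct cmd *other,
                            int n_other, bool shadow_regs)
{
    struct cp_state saved = *cp;       /* switch out: save the gfx registers */
    for (int i = 0; i < n_other; i++)
        run_cmd(cp, &other[i]);        /* the other task clobbers them */
    if (shadow_regs)
        *cp = saved;                   /* switch back: restore them */
}
```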
ok time to make this code 20% less shit
this meme is from the future, you will need to read part 3 to understand
i managed to pull the debug enable code into the debugger, no longer need to do it before running the app, thats neat
now lets see what code is actually needed, and what is only there bc of superstition
(this was bc i misunderstood what tf that register does and never unstalled the vmid, which just hangs)
I read this and my first thought was "is that some gmod map"
not until you make it
as suspected, it isn't even needed to stall the vmid lol
in fact, practically nothing is required, i guess i had something else fucked up and went on a wild goose chase lol (sad lol)
@dark vortex somewhat degarbo'd code here: https://gitlab.freedesktop.org/martty/radbg-poc
still no chance of you running it locally but you can
I shall take a look within 3-5 business days
if bda so good, explain this
posted part I on the interwebs, thanks all for being my rra
Making an AMDGPU debugger part I is now up: https://martty.github.io/posts/radbg_part_1/
An introduction to the problem, some background on debugging - and a plan on how to make the AMDGPU debugger!
I don't have twitter either
i'm beginning to understand how fwog is so advanced
I can only afford to be addicted to one social media platform
and risk missing a gob meltdown? weird
you frogor lgd for real this time 
you can post it if u want, more forganic that way 
i was waiting for the permission 
ah don't, dw
it was just the draft that i wanted to keep under wraps
@midnight shore or @raven vortex do you have a good description of what a ring is? am i correct that it is a ringbuffer, that contains the PM4 or other type packets?
A ring is a hardware queue that takes PM4, yeah
Hm wait, it doesn't quite take PM4, it takes whatever launch commands to IBs (indirect buffers) that contain PM4
do you know if the queueing is implemented using ring buffers?
queueing?
s/queueing/the hardware queue ishi was talking about/
but it should be more granular i think than entire IBs
there is IB chaining which is basically a jump command you can place at the end of an IB, but not sure what granular is about
umr checks the ring halting, it seems a bit too coarse for it to wait for an entire IB to me, but maybe thats just how it is
OK, so we might want to differentiate the literal ring (the queue containing IBs) and the conceptual ring (which is a synonym of the hardware command processor)
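A minimal model of the literal ring vs. IB distinction, assuming a power-of-two ring indexed by free-running read/write pointers. The packet kinds are made-up stand-ins, not real PM4 opcodes; the point is that one ring entry can launch a (possibly chained) IB containing many commands while `rptr` moves by only one slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet kinds, not real PM4 opcodes. */
enum pkt_type { PKT_DIRECT, PKT_LAUNCH_IB };

struct ib {
    int num_cmds;             /* commands stored in this IB */
    const struct ib *chain;   /* IB chaining: jump to the next IB, or NULL */
};

struct pkt { enum pkt_type type; const struct ib *ib; };

#define RING_SIZE 8u          /* power of two, so indices wrap with a mask */

struct ring {
    struct pkt slots[RING_SIZE];
    uint32_t rptr, wptr;      /* free-running; slot = ptr & (RING_SIZE - 1) */
};

/* Producer side (the kernel driver, in the real thing). */
static int ring_push(struct ring *r, struct pkt p)
{
    if (r->wptr - r->rptr == RING_SIZE)
        return -1;                                   /* ring full */
    r->slots[r->wptr++ & (RING_SIZE - 1)] = p;
    return 0;
}

/* Consumer side (the CP): take one ring entry and return how many commands
 * it amounted to. rptr advances by exactly one slot either way. */
static int ring_step(struct ring *r)
{
    if (r->rptr == r->wptr)
        return 0;                                    /* ring empty */
    struct pkt p = r->slots[r->rptr++ & (RING_SIZE - 1)];
    if (p.type == PKT_DIRECT)
        return 1;
    assert(p.ib != NULL);
    int n = 0;
    for (const struct ib *ib = p.ib; ib != NULL; ib = ib->chain)
        n += ib->num_cmds;                           /* follow the chain */
    return n;
}
```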
where did you see the only IB bit? I see amdgpu writing all manner of pm4 to the rings
hmm
I've never looked at how it works tbh
the userspace is definitely recording IBs but maybe the direct submission format is PM4 too
Hmm
"launch" this IB is def a pm4 packet
i guess having IBs kinda makes ring stopping detection a bit iffy
how is ring halt detection performed by umr?
looks at if the ring pointers are moving
yeah the interaction with IBs might be a bit sussy then
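A sketch of pointer-based halt detection, assuming the tool can sample the ring's read pointer a few times and read the write pointer once. `ring_classify` and the status names are hypothetical, not umr's actual code. A "stalled" verdict only means `rptr` did not move while work was pending; a single long-running IB looks exactly the same from the ring's point of view, which is where the iffiness comes from.

```c
#include <assert.h>
#include <stdint.h>

enum ring_status { RING_IDLE, RING_MOVING, RING_STALLED };

/* Classify a ring from n >= 1 samples of its read pointer plus the current
 * write pointer. Hypothetical helper, not umr's implementation. */
static enum ring_status ring_classify(const uint32_t *rptr_samples, int n,
                                      uint32_t wptr)
{
    assert(n >= 1);
    for (int i = 1; i < n; i++)
        if (rptr_samples[i] != rptr_samples[0])
            return RING_MOVING;               /* the CP is making progress */
    /* rptr never moved: either nothing is queued, or the CP is stuck
     * (or just busy inside one long IB, which looks identical here). */
    return rptr_samples[0] == wptr ? RING_IDLE : RING_STALLED;
}
```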
@wanton carbon I did some messing about with ROCm today and i found that it works just fine without amdkfd at all, only rocgdb cares, also I did manage to bring down the system (well, the desktop really, restarting gdm still fixed it) with a broken kernel, and it failed to recover much like with shady compute shaders on the gfx queue
ofc now that I'm home and I try to repro the problem, the AMD repos go down -_-
wdym with works fine without amdkfd? which part?
running rocm programs without rocgdb
this is what rocgdb has to say about that
/long_pathname_so_that_rpms_can_package_the_debug_info
it's very cursed
their packaging is super brittle too
i'm finding out that having any kernels but the supported one prevents amdgpu-dkms from building
fun times
ah, got the stupid kernel module built, after 3 attempts it caught something instead of hanging
i think this amdkcl module is new
the lack of proper debug info is on us though
nice sleuthing
ye my conclusion was that it may be better to stay out of the mess that is rocm currently
the problem is how amd understands open-source, yes they publish the source, but they package things up themselves and don't mainline stuff properly
ubuntu has repackaged some of the rocm things, and not only is AMD not acknowledging that, their packages conflict
they have different version schemes too, so if you want rocm-cmake from rocm 5.4, the version number for that is some long-ass string like 3.4.15145131.50400 ... and so ubuntu's rocm-cmake gets installed instead, but that's incompatible
endless fun
AMD also don't seem to give easy instructions for building individual components from source
so you either use their automated mystery meat scripts/megapackages, or you're on your own
some would argue that's not really opensource at all, more like source-available
yes
well no it's MIT i think, so it's proper libre or whatever
but the development process is not that open at all
any part 3 reviewers awake? https://martty.github.io/posts/radbg_part_3/
bueno
i will toot this together with part 2 so that the lizard part of my brain doesn't get too much of the drug chemical
noice
shit your initial trap handler mr was more than a month ago???
wtf where is my january, it's like last weekend was new years
Sam Pitoiset
I'd still use hakzsam
shit did a Sam remain
I thought I eradicated them all
fml
name elongenated 
don't really like the SRBM footnote, i'll rework
i should explain VMID instead
yeah that's a bit confoosing
smh the amdgpu doc is wrong saying there can be 16 concurrent processes
it's just 15
check now?
sick burn
my 10 day Linux driver stack experience
wait you had like literally zero experience with anything like that before?
also interesting that you know your way around amdgpu virtual memory management
as part of RMV debugging I tried to follow the codepaths from ioctl entrypoint to where PTEs are updated and I got lost real quick
do I? now that's news to me
well at least somewhat, or your pc is good at grepping kernel source
wouldn't have found the vega interrupt thingies at all lmao
heh heh
Wait did part 2 come out during my biweekly crack binge
Also it keeps telling me a new version of the content is available as soon as I open the page
why would I read any further when I know this is the best part
"However of gfx9 and above", rewrite to "However from gfx9 and above" maybe?
also, I think the hyphen in the first sentence should be a colon
I think you're supposed to use hyphens for important asides or something (like an important version of parens), idk
depending on the assumed knowledge of the reader, it might be useful to explain in slightly greater detail what "banked registers" are
The second sense in which the term banking is used for registers refers to the splitting of a set of registers into groups (banks) each of which can be accessed in parallel. Using four banks increases the maximum number of accesses supported by a factor of four, allowing each bank to support fewer access ports (reducing area and energy use) for a given effective access count.
https://electronics.stackexchange.com/a/102743
I guess my point is that some people might be confused when you just say "The GPU can have many of a certain type of register, organised into banks." without at least a sentence to explain the implication of banks
I dunno if it's a real problem. It all depends on how much you assume the reader knows or how willing they are to look this stuff up
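To illustrate the sense of banking in the quoted answer (not, as noted in the thread, the reason these particular registers are banked): assuming register `r` lives in bank `r % 4` and each bank serves one access per cycle, a batch of accesses needs as many cycles as the busiest bank receives. `cycles_for_accesses` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS 4

/* Register r is assumed to live in bank r % NUM_BANKS, and each bank serves
 * one access per cycle, so a batch takes as many cycles as the busiest bank. */
static int cycles_for_accesses(const uint32_t *regs, int n)
{
    int per_bank[NUM_BANKS] = {0};
    int worst = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int b = (int)(regs[i] % NUM_BANKS);
        if (++per_bank[b] > worst)
            worst = per_bank[b];
    }
    return worst;
}
```

Four accesses spread across the four banks finish in one cycle instead of four, which is the "factor of four" in the quote; three accesses landing in the same bank still take three cycles.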
good point
i'll add a banner
if you don't know what register banks are, turn off your computer and leave. you are not welcome
nah but good points, i'll make some edits
although this is not why they are banked in this case
kek even more reason to explain it
clep you have been in a coma for 25 years
you missed so much
carlo's engine can now render a cube
glad that you figured out what GRBM is, when I was playing with SQTT I looked everywhere but still couldn't get an idea what that means exactly
i cannot figure out what it stands for however
G is probably gfx, RB is register bank i guess, and m is martty
green, red, blue, metallic
yeah it seems like driver devs have not yet escaped the primal urge to make every code concept a vague acronym
I don't think I've seen this many in other places
it's just more efficient to write bbby, instead of you know what
thanks all, parts 2 and 3 are up now 🥳 (https://mastodon.gamedev.place/@martty/109735033572188329)
Making an AMDGPU debugger parts II and III are now up:
https://martty.github.io/posts/radbg_part_2/
https://martty.github.io/posts/radbg_part_3/
I'll show how to turn the Steam Deck into the Steam Devk in part II, and then we dive into the wonderful world of MMIO to install a trap handler in part III!
hype hype hype
the final thing holding back gpgpu
you mean stuff like unhandled page faults? iirc I read somewhere they might go into a trap handler as well
well, yes and no i guess
mem violations indeed go into the trap handler if enabled
but that makes more sense for the debugger
rather that you could have exceptions by triggering the trap handler, which could use the faulting PC to look up an unwind table, etc. etc.
it's not like there is a stack to unwind or something currently, but gob is working on it 
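A sketch of the lookup a trap handler could do for such exceptions, assuming a sorted table of non-overlapping PC ranges, each mapped to a landing pad. The table layout and `find_unwind_entry` are hypothetical; there is no such table in the current setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unwind table entry: PCs in [start, end) resume at landing_pad. */
struct unwind_entry { uint64_t start, end; uint64_t landing_pad; };

/* Binary search over entries sorted by start; returns NULL if no entry
 * covers the faulting PC (i.e. the exception is unhandled). */
static const struct unwind_entry *
find_unwind_entry(const struct unwind_entry *table, size_t n, uint64_t pc)
{
    assert(table != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].start)
            hi = mid;
        else if (pc >= table[mid].end)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}
```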
that'd be a cursed setup
are you really alive if you haven't violated the rule of 5 on the gpu?
sure thing
FWIW not sure if you found it already but I found a trap handler as well
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/trap_handler/trap_handler.s
neat
smh nobody noticed that i forgor an s_rfe_b64 from the isa snippet
sry my integrated Neuralink compiler was busy lowering my dopamine levels so I would satisfy my cravings for a cold and refreshing Coca Cola™
to the gulag with ye
the co-lag
https://www.phoronix.com/news/AMDGPU-Compute-ISA-Debug-Kernel single-handedly forced a megacorp to develop months worth of debugger code in mere days
well done
also yeah these patches show that i am missing some bits for the trap enablement
ironically i had them before, but removed because they didn't seem to do anything, but I guess the current code might hang sometimes
https://martty.github.io/posts/radbg_part_4/ once more unto the breach my dear comrades?
for the emperor
@ Bas
(i added a space so it doesn't ping people but) I think you could just leave the @ out
or say Bas Nieuwenhuizen probably
DERE IS NO TIME TO BE LOAST!
BATUL BROTHAS!
SPEHSS MAHRENS, TODEH THE ENEMEH IS AT OUA DOAR! WE KNOW OUA DUTEH AND WE WILL DO EET. WE FIGHT FOR OUR HONOR AS BLUD REHVENS, AS SPESS MAHRENS, AND WE FIGHT IN THE NEHME OF THE EMPRA!
AND IF WE DIE THIS DEH WE DIE IN GLOAREH, WE DIE HEROES' DEFFS, BUT WE SHALL NOT DIE, NO! IT IS THE ENEMEH WHO WILL...
felt like handles is the right amount of personalness
well the problem is that handles aren't globally coherent the same
@ Bas glc slc dlc then
in most other places his handle is bnieuwenhuizen
idk
the blooper reel is great I shall include that as well, should I ever write blogs
also
ah right, @MaybeBas
You misspelled step here
https://martty.github.io/posts/radbg_part_4/#single-stepp-and-wave-trace
Maybe you meant to write "single-schlepp"
use a piece of memory to communicate to the host (hey host! i am in the trap handler now) - if we want to breakpoint on an instruction instead of just the wave, we can now enable single stepping
what does this mean btw
the steppenwolf
uhh, wdym
actually nevermind, it just stuck out to me as really weird, but I get now that putting s_trap at the beginning of the wave and single stepping to the intended instruction is just this current thing you're implementing, not how full proper breakpoints would work
it's pretty much how full proper breakpoints would work
except you don't need to roundtrip to the host
Why does this heading have an octothorpe at the end of it
https://martty.github.io/posts/radbg_part_4/#busier-waiting
huh? I thought with proper breakpoints we'd just put s_trap on the instruction where we want to break or something
instead of putting it at the beginning of the wave and stepping until breakpoint
perhaps, i just don't know how instruction caches work exactly
so i can guarantee the step one works, but not this one
yeah makes sense
i uhh
dual anchoring, maximum stability
there's something called S_ICACHE_INV so I think that could be interesting future direction, you'd run that on trap return I think
but you set the bp from the host
so you need to find the mmio hook that invalidates icaches
well the halted wave is sitting in trap handler, right?
so you could just put that instruction into the trap handler
now i don't know what you are proposing then
I think what you might have in mind is: when we already have waves running a program and we set a breakpoint, the breakpoint might be ignored by hw because instruction caches won't be invalidated
is that it?
because I thought we were talking about how we'd resume from a breakpoint/after a breakpoint
i gtg now, let's continue later
that meme at the end hits different
overall a pretty good post. I can even comprehend most of it (which is cool, given that I haven't even looked at radv)
The only feedback I have is that hyphens feel a little overused
i removed so many hyphens already
will do another pass
I can excuse it as being part of your writing style
Ye i'm not a garbage writer but a garbage person, please understand
So you mean after a wave gets filtered, use instructions to get to a specific instr?
I think the tradeoff here is that you can have a million threads that will hit that instr now
So you need to filter again
which can be done, sure
smh i had only like 3 left
i guess you are an OG hyphen hater
we can pick this back up, if you want
I don't think that's what I mean
then i might need a bit more explanation
ye sorry
to ensure we're on the same track, the breakpoint construct you have in mind is that s_trap is placed on the start of the program and once it's hit, you single step to the instruction that we're intending to break on
right?
yep
well
more like you select the wave when that first s_trap is hit
then selected waves single step
but yes
yes, right
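A toy version of the breakpoint construct just agreed on: every wave traps at program start, the filter picks the wave of interest, and only that wave is single-stepped until it reaches the breakpoint PC. It assumes straight-line code with fixed 4-byte instructions (real AMD instructions can be 4 or 8 bytes, and real stepping follows branches); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_wave { int id; uint32_t pc; bool halted; };

/* The filter evaluated at the initial trap; here just "is it the wave we want". */
static bool wave_selected(const struct toy_wave *w, int target_id)
{
    return w->id == target_id;
}

/* Runs when a wave hits the s_trap placed at the start of the program.
 * Unselected waves resume immediately; the selected wave is single-stepped
 * (one trap per instruction) until it reaches bp_pc, then halts for the host.
 * Returns the number of single steps taken. */
static int on_initial_trap(struct toy_wave *w, int target_id, uint32_t bp_pc)
{
    int steps = 0;
    if (!wave_selected(w, target_id))
        return 0;
    assert(bp_pc >= w->pc && (bp_pc - w->pc) % 4 == 0);
    while (w->pc != bp_pc) {
        w->pc += 4;   /* toy: straight-line code, 4-byte instructions */
        steps++;
    }
    w->halted = true; /* host debugger takes over from here */
    return steps;
}
```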
why can't we replace the instruction we intend to break on with s_trap?
then inside trap handler we could do extra conditions (filtering) and also before we return from trap handler we'd run the intended instruction
perhaps not in this exact form of course, but the idea should be clear
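A sketch of the patching side of that idea, ignoring instruction caches entirely: the breakpoint's dword is swapped for a trap word with one 32-bit atomic store (so a concurrent fetch sees either the old word or the trap, never a torn mix), and the original word is kept so the handler can execute it before returning. `TOY_TRAP` is a made-up encoding, not the real `s_trap` bits, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TRAP 0xFFFFFFFFu   /* hypothetical trap word for this toy ISA */

struct sw_breakpoint {
    uint32_t index;      /* dword index of the patched instruction */
    uint32_t original;   /* instruction word that was replaced */
    bool     active;
};

/* Patch in the trap with a single 32-bit atomic store. */
static void bp_set(_Atomic uint32_t *code, struct sw_breakpoint *bp, uint32_t index)
{
    assert(!bp->active);
    bp->index = index;
    bp->original = atomic_load(&code[index]);
    atomic_store(&code[index], TOY_TRAP);
    bp->active = true;
}

/* Undo the patch, putting the original instruction word back. */
static void bp_clear(_Atomic uint32_t *code, struct sw_breakpoint *bp)
{
    if (bp->active)
        atomic_store(&code[bp->index], bp->original);
    bp->active = false;
}

/* What the trap handler has to execute in place of the trap before it
 * returns (after any filtering has decided the wave should not stop). */
static uint32_t bp_displaced_instr(const struct sw_breakpoint *bp)
{
    return bp->original;
}
```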
this should be possible as well
ok now let me remember the root of this discussion
but wait
anyway, I initially just misinterpreted the step in your write up as suggesting that as the current best method to construct breakpoints
so how it would work is you'd still trap the wave at the beginning, then replace the instruction with another trap, then in that second trap you need to run the intended instruction for all waves that were originally not trapped, then keep those waves trapped until our originally trapped wave executes the trap, then host debugger time, then back to trap handler, undo the changed instruction, release all waves with icache invl
i think it's a bit complex
if i knew more about icaches it could be made simpler
I don't understand, why do we need to trap the wave at the beginning?
i don't know the mmio hook for icache invl from the host
so any instr manip has to happen in the trap handler
it can't be through CP because then you can't set breakpoints while stopped
why? you could just write s_trap using e.g. SDMA or even directly (if shaders are in visible VRAM) and cache invalidation is covered in two ways:
if we set breakpoint before submitting command buffer, we can invalidate instruction caches (if there's manual action required to begin with) when we submit the command before
if we set breakpoint while shader was halted on a breakpoint, we need to invalidate instruction cache when wave resumes (this would happen inside trap handler)
this is conceptual in my mind of course, and relies on assumption that s_icache_inv does what it says
well, it depends on what cache s_icache_inv dumps, yes, i'd imagine it is for the current WGP
so thats not enough
if it is more, then it can be enough
i guessed right
I assume you have other waves in mind? I kinda felt like it's fine to just let them run unimpeded (for the current draw/dispatch) but maybe that needs more thought
(I'm assuming that if we overwrite the first int32 of an instruction with s_trap it will either see the old instruction or s_trap)
imagine you are stopped in a VS, now you possibly can't set a BP for an FS
riiight
hmm
maybe we should just put s_icache_inv at beginning of all waves then
i imagine that has perf implications
that's not going to be full speed but better than single stepping through each wave
but it's not each wave
well okay
s_icache_inv but gated behind same filtering thingy as the current approach does
it's just the wave you care about
that could be possible
let's just find the mmio hook
yeah I imagine if there's a way to flush TLB from host in the middle of command buffer exec there must also be a way to dump all caches
possibly
problem is there are many units with icaches
but i think SQC is a reasonable guess
that's the thing that drives waves
aah, and I guess CLIENT_INVALIDATE_ALL_VMID drops all TLBs?
or perhaps that's something stronger
anyway beside the point
i think you understand why i did the stepping instead
yes
I certainly do, it's much simpler
I just misinterpreted that line in your write up as suggesting that approach as "the current best one" or something
perhaps not the best way to put it
"what you'd do when building a debugger, that's not a stepping stone thing intended to be replaced later"
no I didn't mean to say that the way you put it in write up is bad
just that my initial confusion is the cause for this discussion
4th and final Making an AMDGPU debugger post is out!
https://martty.github.io/posts/radbg_part_4/
All about trap handlers, fds and busy waiting waves in this final installment which completes the writeup about the proof-of-concept GPU debugger!
Post on Reddit and I'll upboat
They'll be jelly that you have achieved the second-lowest-level of graphics programming
should've written the debugger in assembly too smh
but where, /r/idiotsincars?
I was thinking r/okbuddyphd actually
getting good chuckles out of that sub
I have to confess something
I laughed because of the funny ducks
I don't know what the Brillouin Zone is
I looked at the wikipedia article and instantly experienced ego death
that's the point of the sub
too bad people post high school-tier stuff there frequently
it's like the twilight zone, except in many details that make it somewhat different
they're the same, but different