#:frog_gone: martty's mesa misadventures


void crow
#

jenson needs his 4th mension tho

wanton carbon
#

he will never let you fondle the bits

void crow
#

🥹

floral viper
#

plural of mansion perchance?

wanton carbon
#

its a house full of men for a manly man

#

no girls allowed

midnight shore
#

@dense totem you are hereby obliged to help martty develop this debugger technology

#

it's already working, just that the rough bits (which is basically everything) need to be addressed

dark vortex
#

someone do addr2line in aco 🐸

midnight shore
#

@raven vortex 🤰

#

nir debug info thing has been working for a while now

dark vortex
#

i shall take a look if I can do some fun stuff as well when RMV is landed

raven vortex
#

it's kinda overdue but I have other priorities right now 🫃

midnight shore
#

no that's fine there's no deadline

raven vortex
#

ok

midnight shore
#

put slight pressure on Daniel as he's blocking the MR 🐸

raven vortex
#

maybe, as soon as I can get to debugging cases that need debuginfo

#

I think RT in general would benefit from it

midnight shore
#

inb4 it completely loses it

#

oh wait

#

my current MR assumes there's one source SPIR-V

#

when doing byte offsets

#

I guess that will need to be amended somehow to identify which spv it came from 🐸

#

perhaps hash (shader identifier) would do?
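A minimal sketch of the hash idea above, assuming debug info is looked up by a shader identifier plus a byte offset into that shader's SPIR-V. Every name here (`SourceLoc`, `DebugInfoTable`, `shader_id`, the choice of SHA-1) is hypothetical and not taken from the MR being discussed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLoc:
    file: str
    line: int


def shader_id(spirv_bytes: bytes) -> str:
    # Hypothetical identifier: a hash of the SPIR-V binary itself.
    return hashlib.sha1(spirv_bytes).hexdigest()


class DebugInfoTable:
    """Maps (shader id, byte offset into that SPIR-V) to a source location."""

    def __init__(self):
        self._entries = {}

    def add(self, spirv_bytes, byte_offset, loc):
        self._entries[(shader_id(spirv_bytes), byte_offset)] = loc

    def lookup(self, sid, byte_offset):
        # Returns None when no debug info is known for this offset.
        return self._entries.get((sid, byte_offset))
```

With the identifier in the key, the same byte offset in two different SPIR-V modules no longer collides.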

dark vortex
#

can you link the mr again? would like to read into it a bit

raven vortex
dark vortex
#

thanks

midnight shore
#

there's an MR that integrates it into brw (this is incomplete) which would be used as reference

wanton carbon
#

Attached: 1 video

The GPU debugger I have been working on this xmas break!
Below is a proof of concept (running on the deck, gfx10.3).
The pixels changing belong to a single wave that outputs to a fixed location, that is being selected for debugging - I have placed a breakpoint on an instr in the shader, reached by single stepping the instrs f...

dark vortex
#

oh shid

#

where is my phone

#

need to boost boost boost

wanton carbon
#

vrooooooooooooooooooooooooooooooooooooooooooooooom

#

gf watched vid and we agreed that i have succeeded in making the most boring video in existence

#

i hope people read the text KEKW

dark vortex
#

i didn't really help that much i guess (yet? radvgdb incoming??? 😳😳 😳) but boost is least I can do froge

wanton carbon
#

i was thinking of calling it radbg

dark vortex
#

hmm yeah

#

btw you have da sauce code anywhere

wanton carbon
#

😳 not yet

#

i will i promise

dark vortex
#

sauce reveal 😳 😳 😳

wanton carbon
#

i cannot in good faith post something that has
resume_wave(true);

#

fun fact, this actually stops all the waves

dark vortex
#

lmao

wanton carbon
#

plus the issue is that i have some code in radv, some in aco, some in umr and some on its own

#

i want to bring the umr mods over first, then just have a mesa fork

dark vortex
#

hmm I wonder if we can upstream it someday

wanton carbon
#

i am sure we can

#

it would be behind a flag i think

#

but it needs to be written

#

aco legwork 🐸

dark vortex
#

yeah

#

I was also considering making the actual debugger a separate executable with radv exposing some entrypoints for a backend

#

not sure how much other people would like a whole new private api tho 🐸

wanton carbon
#

i don't follow

dark vortex
#

well I'm thinking of your usual CPU debugger workflow where you launch/attach your app inside of a debugger

wanton carbon
#

ah this was not clear

#

this is that

#

beginning of the video, I "attach"

#

end of the video, I ctrl-c on the debugger, and the influence ends

#

well, mostly

dark vortex
#

ah yeah

#

I also read your message as "flag" = envvar and that was what I was not sure about, mostly

wanton carbon
#

well yes

#

you need some cooperation from aco

#

but this can be just a flag when you load radv

#

that aco should generate debuggable shaders

dark vortex
#

hmm thinking about it again envvar actually sounds fine

wanton carbon
#

you don't actually need any more cooperation from radv than that, except that you need to import a bo into the target process

#

but this can be done via a layer

dark vortex
#

if you want to launch apps from a debugger, execve exists, and otherwise doing something like RADV_SHADER_DEBUGGING=1 ./app seems like a reasonable requirement for attaching
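A rough sketch of the envvar approach described above, assuming the flag is simply checked at driver load time. `RADV_SHADER_DEBUGGING` is only the name floated in this message, not an existing option, and the helper names are made up for illustration.

```python
import os

# Name floated in the chat; treated here as a hypothetical flag.
DEBUG_FLAG = "RADV_SHADER_DEBUGGING"


def build_debug_env(base_env):
    """Return a copy of base_env with the debugging flag switched on."""
    env = dict(base_env)
    env[DEBUG_FLAG] = "1"
    return env


def debugging_requested(env):
    """What the driver side would check when it is loaded."""
    return env.get(DEBUG_FLAG) == "1"


def launch_under_debugger(path, args):
    # Replaces the current process image with the app, flag set.
    # Does not return on success.
    os.execve(path, [path, *args], build_debug_env(os.environ))
```

Attaching to an already running app would still need the app to have been started with the flag, which is the requirement mentioned above.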

dark vortex
#

halting the waves behind radv's back 😈

wanton carbon
#

yep, i implemented that

#

but its a bit less useful since i wanted the debugger to run on a remote sesh, while the target ran on a local sesh

#

but if you want to do this kinda "wave edit" stuff i have in the video, you could have them both on the local, ye

#

maybe useful sometimes

wanton carbon
dark vortex
#

ah right 🐸
I kinda forgot that your abilities on the local sesh are, uh, limited while there are halted waves on the gpu

#

dementia gaming

wanton carbon
#

currently running the debugger requires root

#

coz you are really plugging values left and right on the gpu

#

but if it works nicely, the used stuff can migrate into the kernel

#

and then it can work without privs

dark vortex
#

makes sense

autumn notch
#

not gonna lie, the output is very boring

#

it would be cool to see the whole screen

#

with the setting the breakpoint, you stepping it, etc

#

it's very cool but what you show of it is not, if that makes sense

wanton carbon
autumn notch
#

mayhaps mayhaps

wanton carbon
#

see i can't have a video of something stepping

#

by hand

#

remember that freezes the shit out of the screen

autumn notch
#

hmm you do it thru code rn?

wanton carbon
#

so it is stepping automatically

autumn notch
#

hmm

wanton carbon
#

it would've been cool to step in there, change v0 to 1.0 and then the pixel turns red

#

but its not how it works lol

autumn notch
#

fair do

wanton carbon
#

i know it be boring

#

but things freezing up really throws a wrench into making a video

autumn notch
#

yeah makes sense

#

it would be neat to see el code, and a bigger explanation

#

lookin forward to writeup

wanton carbon
#

yeah toot be bad format for that

#

so i'll write it up

#

(guess why this thread really exists 🐸)

wanton carbon
dark vortex
#

good froge banner froge

wanton carbon
#

i posted the contents already 🐸

dark vortex
#

oh right

#

dementia gaming

wanton carbon
#

can almost still write his age with a single hex digit and already demented smh

dark vortex
#

this is what alcohol Mesa dev does to OUR children’s brains!!!

Like and share 😨

dark vortex
wanton carbon
wanton carbon
#

wot my 3 character mesa MR was not covered by phoronix, i am shattered

floral viper
#

Looking forward to the release of MGD (martty giraffix defrogger)

onyx vector
#

would it not be possible to have a usable system if you run the debuggee on a secondary GPU ?

wanton carbon
#

possibly, i don't know enough of the inner workings

#

but if you compile shady to compute, that i think would work on a single gpu too

#

just launch it on a compute queue

onyx vector
#

what references did you use to make this happen

#

is there some comprehensive writeup on how the internals work

#

or is it mostly gleaned from source and talking to other smellies

floral viper
onyx vector
#

i'm only interested in shady stuff anyways

wanton carbon
#

notfromajaker.dogjiff

#

i am working on the writeup, which aims to be reasonably comprehensive on the touched bits

onyx vector
#

are there AMD pdfs explaining how to talk to the command processor, the rings etc ?

wanton carbon
#

mostly looking at isa docs, radv, amdgpu, amdkfd and rocgdb

floral viper
onyx vector
#

awesome

wanton carbon
#

i have not forgotten

floral viper
#

same, I still have the tab open

#

no one else wrote anything yet so we're still good KEKW

wanton carbon
wanton carbon
# onyx vector awesome

it is only the isa docs, and there are some quite outdated and limited docs on the 3d registers up to SI

onyx vector
#

that's what i feared

#

would you say the 3d registers pdf is useful

#

i never read it

wanton carbon
#

but i kinda went in blind, hoping that it will work, despite not seeing code doing this thing

#

like gpr writing from the host

#

but it does work

wanton carbon
onyx vector
#

i'd like to understand how the GPU is talked to in general

midnight shore
#

why would you want pdf when you can read source (which is less likely to have errata than written docs) and ask people listenyoupieceofshit

onyx vector
#

how the command processor etc work

#

a high level view at least

wanton carbon
#

oh not even everyone at amd knows how the CP works

onyx vector
#

i don't think i'm worth the people's time, i just want to appreciate exactly what martty is doing

wanton carbon
#

(and I don't just mean Jaker in this case KEKW)

midnight shore
#

amd should get rid of packet format zoo and put a macro expanding thing like NV has

onyx vector
#

all the experts on AMD HW are employed by Valve so 🐸

wanton carbon
#

most of what the debugger is doing is through banging at mmio registers

#

working the CP is a much finer art

onyx vector
#

but like, the KMD is doing all the talking right

#

you can't talk to the GPU without coordinating with the KMD and expect that to work

wanton carbon
#

for the debugger?

onyx vector
#

in general

wanton carbon
#

yes, in general, everything goes through the KMD

#

but for the userspace mmio it is just a relay

#

the KMD does not understand what is happening

floral viper
#

Think of it as martty whispering sweet nothings into the cp's ear

#

I am looking forward to the write-up though

#

Maybe I'll learn how our own hardware works finally

onyx vector
#

the secret knowledge, i want it inserted into my brain through a high-speed cable plugged into the back of my neck

wanton carbon
#

i have chosen the jekyll theme, so i am like 94.3% done

midnight shore
wanton carbon
wanton carbon
midnight shore
#

nice

wanton carbon
#

snazzy anchors

floral viper
#

I wonder if gpuopen finally added those after I asked

wanton carbon
#

its open

#

just right click -> inspect and add them yourself

floral viper
wanton carbon
#

subscribe for nft tips

dark vortex
#

newt, frog & toad tips

wanton carbon
#

meta question, how should i refer to Samuel Pitoiset? like that? handle? Sam P? Pitoiset?

dark vortex
#

I’d probably use Samuel Pitoiset or hakzsam

#

assuming this is in context of write up?

wanton carbon
#

yup

#

part I and III are mostly done

wanton carbon
#

around 3k words already 🐸

wanton carbon
#

4.5k now, got to the end roughly, ig will be around 5k when i am done

#

a five parter doorstopper

#

still have to fill out the code, refs, images, that stuff

dark vortex
#

bro’s writing a book

wanton carbon
#

a bruhk

wanton carbon
#

i hashed out some ideas from what could be done in the future - any more ideas?

Memory violation debugging
  No wave filtering required. Once a memory violation is detected, the wave can be stopped and examined from the debugger. Memory violations are generally not precise (the faulting PC is not the one issuing the access), but workarounds could be used to make it so. It might even be possible to stop the context loss from occurring.
Data breakpoints
  Just as on the CPU, seems like some registers can be programmed to trap when interesting addresses are accessed.
Fragment debugging
  Want to examine a variable written in the FS visually? Just swap out the color VGPRs before they get written!
True compute debugging
  Shared memory, nonuniform behaviour is all accessible.
Source level debugging
  Needs a significant amount of plumbing for sure, but the GLSL -> SPIRV side is done. addr2line support in NIR is being worked on.
Arbitrary code execution
  Additional code could be compiled by the debugger, then called by the shader.
Debugging barrier hangs
  Waves can be stopped when stuck in barriers, and released to prevent context loss, while diagnosing issues.
Multi-wave debugging
  More waves can be trapped and worked on at the same time.
Assertions in shader code
  Can trap when hit with a debugger present.
dark vortex
#

Conditional breakpoints/tracepoints?

wanton carbon
#

yus

dark vortex
#

also memory watch points

wanton carbon
#

thats there

dark vortex
#

no idea about feasibility

midnight shore
#

that's already under data breakpoints

dark vortex
#

oh

midnight shore
#

also you can construct most of these in software probably anyways

dark vortex
#

<- cannot read

midnight shore
#

with some varying degrees of annoyance

floral viper
#

add time travel debugging

midnight shore
dark vortex
#

multiverse debugging next pls

midnight shore
#

also I wouldn't call

Memory violations are generally not precise (the faulting PC is not the one issuing the access), but workarounds could be used to make it so.
workarounds

#

it should rather be said that the compiler needs to play along for a bit

#

a software feature

wanton carbon
#

well, there is a hw workaround

#

hence

midnight shore
#

oh

#

wat

#

what is it working around exactly?

wanton carbon
#

there is a hw bit that makes mem violations precise

midnight shore
#

ah I see

wanton carbon
#

probably it is just forcing accesses to complete until the next instr is chewed

midnight shore
#

could you clarify that it's a hw thing then

#

in the post

wanton carbon
#

but its only gfx10

#

so <10, you'd do sw

#

and anyways

midnight shore
#

other than that seems good

wanton carbon
#

the workaround i'd do is saving addr vgprs until accesses complete

#

so that is a compiler thing

#

just too many options :S

midnight shore
#

just say "but needs extra work to make it so"

wanton carbon
#

don't like "workaround"? 😄

midnight shore
#

no 🐸

#

"workaround" is a bit too strong for me, like "hack"

wanton carbon
#

changed

floral viper
#

call it a workthrough instead of a workaround

midnight shore
#

also I'd add a paragraph about this vs renderdoc/pix/etc

wanton carbon
#

i have one about renderdoc

#

dunno much about pix

midnight shore
#

ah well renderdoc one should cover things

#

pix is basically the same in that it instruments shader and then replays things

wanton carbon
#

i'll share a preview of the articles here before i post them, so i'll ask for some extra 👀 then

autumn notch
#

Time travel debugging maybe?

wanton carbon
#

did jaker travel back in time to post this idea sooner thonk

floral viper
#

yus

autumn notch
#

Now you're thinking with portal 2 RTX

wanton carbon
#

remunged:

Memory violation debugging
  No wave filtering required. Once a memory violation is detected, the wave can be stopped and examined from the debugger. Memory violations are generally not precise (the faulting PC is not the one issuing the access), but with extra work they can be made so. It might even be possible to stop the context loss from occurring.
Data breakpoints
  Just as on the CPU, seems like some registers can be programmed to trap when interesting addresses are accessed.
Fragment debugging
  Want to examine a variable written in the FS visually? Just swap out the color VGPRs before they get written!
Conditional breakpoints
  Shader or host can evaluate arbitrary (even memory dependent) expressions to trigger breakpoints.
True compute debugging
  Shared memory, nonuniform behaviour is all accessible.
Source level debugging
  Needs a significant amount of plumbing for sure, but the GLSL -> SPIRV side is done. addr2line support in NIR is being worked on.
Arbitrary code execution
  Additional code could be compiled by the debugger, then called by the shader.
Debugging barrier hangs
  Waves can be stopped when stuck in barriers, and released to prevent context loss, while diagnosing issues.
Multi-wave debugging
  More waves can be trapped and worked on at the same time.
Assertions in shader code
  Can trap when hit with a debugger present.
Time travel debugging
  It is possible to record the entire state evolution of a wave, which afterwards the host can replay, freely stepping forwards and backwards.
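The time travel item at the end of the list can be sketched on the host side roughly like this, assuming a wave's state can be captured as a plain dictionary after every step. `WaveRecording` and the `pc` / `v0` fields are illustrative names only; nothing here reflects how radbg actually stores state.

```python
import copy


class WaveRecording:
    """Stores one snapshot per step and lets the host move a cursor over them."""

    def __init__(self):
        self._snapshots = []
        self._cursor = -1

    def record(self, state):
        # Deep copy so later changes to the live state do not alter history.
        self._snapshots.append(copy.deepcopy(state))
        self._cursor = len(self._snapshots) - 1

    def current(self):
        # Requires at least one recorded snapshot.
        return self._snapshots[self._cursor]

    def step_back(self):
        # Stays on the first snapshot when already at the start.
        if self._cursor > 0:
            self._cursor -= 1
        return self.current()

    def step_forward(self):
        # Stays on the last snapshot when already at the end.
        if self._cursor < len(self._snapshots) - 1:
            self._cursor += 1
        return self.current()
```

Replay here is purely a host-side walk over stored snapshots; the wave itself is not re-executed.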
autumn notch
#

Noice

#

Also I had not seen jakongo's idea before posting mine, bababooey

floral viper
#

get booey'd on, baba

wanton carbon
#

the bababooey is you

autumn notch
#

Maybe you could do something like AOP :0

floral viper
#

rapa nui

autumn notch
floral viper
#

imagine playing obscure lil indie games that no one has heard about (except us)

wanton carbon
#

whats AOP? advent ocasio-pointer?

floral viper
#

advent of pode

autumn notch
#

But yes, cursed aspect oriented programming where you inject logging in between stuff you care about or do other cross-cutting concerns like dat

floral viper
#

watbulb aspect oriented programming

dark vortex
#

I do aspect oriented programming

#

my code is shit in exceptionally many aspects

floral viper
#

I looked at two explanations on SO and didn't understand shit nice

#

seems like super OOP from what I can tell

wanton carbon
#

clep sprinkled some confusion crack on us then left

autumn notch
#

Just a sec, im eatin da pizza

#

Pizza pieee

midnight shore
#

this is nice

floral viper
#

pocket confusion

autumn notch
#

MK ULTRA type beat

midnight shore
autumn notch
#

Well, it's one example of what you can do with it

floral viper
#

in this moment, I am enlightened (and euphoric)

autumn notch
#

Aop is more about defining aspects, cross-cutting concerns in your code like logging, error handling, that kind of stuff. Then you patch these aspects in through reflection or somethin

#

Anyways it'd be really cursed 'cause you'd basically be patching stuff into your spirv

#

So it's essentially part of the arbitrary code execution bullet point tbf

autumn notch
#

Anyways

wanton carbon
#

yeah i am not closer to understanding, but i wouldn't dare to question the fruity one

autumn notch
#

Tbh I only have a superficial understanding of it too, my dad talked to me about his implementation of it in C# once and a coworker mentioned it for the node-based stuff we were doing, but other than that not much

#

Essentially you add a prelude or postlude(??) to some unit of work

#

And it can be good for debugging, logging, synchronising stuff even

#

Really anything you might want to be handled automagically that would be boilerplate otherwise
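For anyone still lost, the prelude/postlude idea described above looks roughly like this as a Python decorator. This is only a sketch of the general pattern with made-up names (`with_logging`, `shade`), not anything shader or SPIR-V specific.

```python
import functools


def with_logging(log):
    """Aspect: wrap a function with a prelude and postlude that log calls."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.append(f"enter {func.__name__}")  # prelude
            try:
                return func(*args, **kwargs)
            finally:
                log.append(f"exit {func.__name__}")  # postlude
        return wrapper

    return decorator


calls = []


@with_logging(calls)
def shade(x):
    # The unit of work itself contains no logging code.
    return x * 2
```

The logging concern lives in one place and gets attached to any function that needs it, which is the "cross-cutting" part.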

floral viper
#

aspect oriented programming is when the reflection does stuff

autumn notch
#

Am I speaking chinese or like, what is not understandable about this nervous

floral viper
#

I'm just shitposting, dw

#

and fwiw it's more like you're speaking korean

autumn notch
#

I'll have you know hangul is the world's best alphabet

#

The world's most aspect-oriented alphabet KEKW

#

2023 aspect-oriented incident

wanton carbon
#

but anyways, its not the concept i have trouble with, its more like how it fits here

autumn notch
#

Well, maybe I misunderstand what you're actually doing tbqh, I shall await your writeup and then comment more

wanton carbon
#

ok I will write 1 to 0xcafebabe once it is done, please wait kindly

dark vortex
#

@wanton carbon I hath returned

wanton carbon
#

now my precision-guided munition-based threats work again 😻

wanton carbon
#

an unrelated thought - how about a radv ext for loading hsa/isa?

dark vortex
#

embedded isa was a thought that came up before

midnight shore
wanton carbon
#

rocm is a mess

midnight shore
#

ah

wanton carbon
#

running isa itself can be useful

#

eg Nemes' microbenchies

midnight shore
#

I mean if it were for loading code you enter in debugger's prompt, I'd rather just experiment and try to maybe come up with a more specialized ext

#

but otherwise it seems like a question targeted at people who would be involved into implementing this rocm-on-radv thing

wanton carbon
#

no, its unrelated

#

i could implement it

#

its not terribly involved

wanton carbon
dark vortex
#

that there aren't terribly many use cases for this 🐸

wanton carbon
#

but for testing hw, its the best

#

eg. figuring out hw bugs 🐸

wanton carbon
#

https://martty.github.io/posts/radbg_part_1/ aight part 1! please let me know any feedback, no sharing yet pls kthxbai frogeheart

floral viper
#

you mean you don't want me to send this to my coworkers yet?

wanton carbon
#

this blogpost will explode amd office computers (NOT FAKE)

floral viper
#

well, let me know when this doesn't blow up our compoopers

wanton carbon
#

read it on your phone (on the toilet, you pooper)

floral viper
#

I feel like this sentence could use a comma after the word 'Linux', or the whole "AMD...Linux" clause could be parenthesized, idk

Furthermore, umr, AMD’s user-mode debugging facility for Linux also hinted towards the possibility of manipulating waves for graphics.

#

I almost forgot that BDA is core now

wanton carbon
#

not something the OpenGL would teach you

floral viper
#

this sentence has too many commas (or maybe clauses) methinks. it breaks my parser either way

The most well-known debugger under Linux, GDB, is a flexible software, it has a number of targets, which contain the knowledge on how to translate debugging actions (setting a breakpoint, reading the stackframe…) are implemented on the specific architecture.

#

I would split it into two sentences and maybe remove the "is a flexible software"

#

(I hope this is the kind of feedback you are looking for lmao)

wanton carbon
#

ye def

#

also for higher level stuff like "boring" "too detailed" and "stop smoking meth"

#

and especially "i have no idea what you are talking about"

floral viper
#

this sentence isn't really a question, but I suppose you could keep the question mark as a stylistic choice

But perhaps we can be inspired by what it does to the GPU?

#

so far I have no suggestions on the content itself. It seems legit

wanton carbon
#

there isn't much content btqh

#

but i thought some background is useful

#

difficult to say how much is enough to make the content enjoyable if you are not versed in the dark arts tho

floral viper
#

too often do blogs go off into the deep end without proper background, so this is refreshing

#

people who know the 🦐le stuff already can skip it if they want, but it will be vital for the others

#

the return buttons on the footnotes are nice

wanton carbon
#

can take no credit for that, but i agree

floral viper
#

btw how do you determine if something belongs as a footnote vs in parens vs as a hyperlink?

wanton carbon
#

i am also debating including "the worlds most boring video" as a (de)motivation at the end of the introduction

floral viper
wanton carbon
#

perhaps then it would be wise to include at the end

#

when everything has already been explained i guess

#

otherwise i can't say its a veegooper before explaining later what that is

floral viper
#

my vees feel gooped rn

wanton carbon
#

same

floral viper
#

having fully read the thing, I can't wait to see what happens in part 2

#

I have already blocked all memory I have of being in #1053054445518323732

wanton carbon
#

spoiler alert: not much

#

its just about configuring the deq

#

i may include the very best memes from this thread to keep people from gagging at too much bash

#

but then in part 3

#

how does it feel length-wise?

floral viper
#

kinda short, but then again I already know all this stuff

#

someone who doesn't will have to unpack it more slowly

wanton carbon
#

hmm, not sure how much the read time counter at the top lies

floral viper
#

that felt accurate for me

#

but technical stuff will always depend on the reader's knowledge level

#

you should send this to someone outside this post and get their opinion

dark vortex
#

oi

#

hype

wanton carbon
#

get me mum to read about wave status registers

dark vortex
#

finally something to read while amdvlk is trying to run control

#

(it is taking literal minutes to start up jaker pls fix)

floral viper
#

I mean someone who perhaps knows about renderdoc and such, but not necessarily how AMD hardware works

#

which is closer to the average graphics dev methinks

dark vortex
floral viper
#

speaking of which, brb

wanton carbon
#

ah jaker is off to fix amdvlk, nice

#

what a champ

dark vortex
#

looks good from a quick read, nice work frogeheart

#

you forgor to share it in LGD listenyoupieceofshit

#

now I did already

wanton carbon
#

wtf

#

bruh its a draft

#

what do you think that means lmao

floral viper
#

it means spread it to the whole world immediately, obviously

dark vortex
#

i cannot read

wanton carbon
floral viper
#

what is lgd btw

dark vortex
#

secret frog server

#

it hath been un-shared, effective immediately

wanton carbon
floral viper
#

Pixel has shown me the secret passage

dark vortex
#

luckily zero people have reacted to it so it effectively did not happen

wanton carbon
#

its not a huge deal, but ye, wanna polish it up first

dark vortex
#

while we're at reviewing the draft, ishi's mastodon page link is a link to his maud.social page but on mastodon.gamedev.place

#

maybe it'd be better to link to maud.social directly

wanton carbon
#

ye idk how

dark vortex
wanton carbon
#

i tried 😿

dark vortex
wanton carbon
#

ahh

wanton carbon
#

🦐 i

dark vortex
#

you have a footnote explaining that KMD = kernel mode driver, but you only use KMD once in the entire blog🅱️ost, methinks you could just write "kernel mode driver" here and maybe introduce the abbreviation in a later article if it gets used there more

#

maybe it'd instead be worth instead adding a footnote to "vector GPR" that they are unrelated to GLSL/HLSL vectors and are just one 32bit value?

floral viper
#

that's what made me bring up the whole footnote thing hehe

wanton carbon
#

gonna do sleep then get back to u all

autumn notch
#

O neat

#

"Aspiring graphics programmers are sometimes left to scrounge old GDC presentations on performance tips and very little is known about the inner workings of some GPUs - too small to have a devrel contact to give insight."
The "too small to have a devrel contact to give insight" should be the first thing in that sentence imo, since you shift to talking about people in general

fallen nest
#

"Aspiring graphics programmers too small to have a devrel contact..." is a crisp way to start

autumn notch
#

"wealth of insight into the inner workings." You could change it to "said inner workings" or something similar, since you're repeating inner workings

autumn notch
#

"(unfortunately 3D registers documents are no longer published for new architectures)" feels a bit disconnected from the preceding sentence, you could add a "though" at the start to link between the two

#

Also is it 3d registers or 3d register

#

Genuinely unsure

floral viper
#

Or "Without a devrel contact, small (in stature) graphics developers..."

#

Imagine being a homunculus graphics dev. So tiny that even your whole body weight can't press a key on a keyboard. Would be a sad life

autumn notch
#

I'm not quite there yet but if my buddies keep zapping me with that shrink ray I'll have to file for bankruptcy

fallen nest
#

but you'd be able to climb inside people to cure their brain worms

floral viper
#

by giving them a lobotomy

autumn notch
#

"[...] I got curious - I have previously seen [...]" I'd put an "as I had previously seen" there instead

#

"but the conclusion was" probably "but his conclusion"

fallen nest
#

I'll be honest, everywhere a hyphen appears needs some refactoring

autumn notch
#

"(how to recover [...])" to "(how would one recover [...])"

fallen nest
#

RenderDoc also offers shader debugging - this is achieved by a form of CPU/GPU hybrid emulation of the shader code - since shader behaviour has a lot of implementation dependence, for faithful replay Renderdoc compiles small snippets of shader code, that are ran when the program is stepped, while the shader state is maintained in memory.

RenderDoc also offers shader debugging, which is achieved with a form of hybrid CPU/GPU shader code emulation. Since shader behaviour [sic] has a lot of implementation dependence, for faithful replay RenderDoc compiles small snippets of shader code that run when the program is stepped while the shader state is maintained in memory.

floral viper
#

You put [sic] after the British spelling KEKW

autumn notch
#

" and arbitrary computation on them (buffer device address)," could say "arbitrary computation thereof" if u wanna be fancy, just a nitpick tho

#

:nouk: when

floral viper
fallen nest
#

yeah I deliberately didn't wanna remove his tone, but stuff is just a bit shuffled

floral viper
#

Martty should put this in a Google doc so we can mark it up

#

We're grading an essay smart

fallen nest
#

too much passive voice

floral viper
#

Can't tell if that's a jerk or serious

fallen nest
#

that's like the most annoying english grading comment

#

I still don't entirely know what passive voice is because I was in the lower level english classes

#

I assume its just a conspiracy to keep me from hitting word counts

floral viper
#

It's when American reporters say "the cop's gun was fired" rather than saying "the cop fired the gun"

fallen nest
#

oh, then yeah that might actually be something to watch out for in this

#

the sentence indirection gets confusing when you're trying to describe something technical

autumn notch
#

"although there is an extension VK_EXT_device_fault" could be rephrased as "although the VK_EXT_device_fault extension exists"

#

"The debugger waits for a signal to happen, then accesses the state of the inferior (reading / writing memory), then resumes it. " that second "then" is superfluous

floral viper
#

what would you replace it with

autumn notch
#

nothing

floral viper
#

it won't make sense then

autumn notch
#

it's the first one that's superfluous actually

fallen nest
#

do you mean the first then

#

yeah

autumn notch
#

"these trigger when a piece memory is written" I think this is missing an "of"?

#

not sure, might be a technical term I don't know lmao

floral viper
#

seems like a missing word fo sho

#

fo rizzle

autumn notch
floral viper
#

what's a target btw

#

I didn't get that part

fallen nest
#

it says target ... architecture

#

so I think it means architecture targets, the way compilers have?

#

maybe

#

as in there are builds of gdb for different architectures

#

not sure how relevant that is to anything

floral viper
#

oh. I tried interpreting it that way but it confused me

autumn notch
#

I think it's literally just that gdb has a concept called targets

#

it does explain it tho

#

it's a mapping from the gdb concepts to the architecture's way of implementing them I guess

floral viper
#

I think martty should elaborate on that if he brings it up

autumn notch
#

I know strictly nothing about gdb ofc

floral viper
#

Same

autumn notch
#

ok imma go to bed I'll continue reading after approximately 8 hours of sleep innit

floral viper
#

Certified bruv moment

autumn notch
#

Bit cheeky

wanton carbon
#

i left in some mistakes for u guys to pick at, as a treat :3

floral viper
#

good morning sweetie

wanton carbon
#

i'm glad the homunculus angle was not lost, i am striving to be inclusive

floral viper
#

ong

wanton carbon
#

nah all good stuff

#

ye the gdb point was that it essentially lets you add more targets while keeping the ui and all the other shit separate

#

this is what rocgdb does

floral viper
#

I actually still don't know what a target is

#

ah, reading the docs explains it

wanton carbon
#

its a piece of code that translates the abstract gdb commands to the specific hw / sw
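One way to picture that separation, as a sketch only: the front end talks to an abstract interface, and each target fills in how the commands map onto real hardware or software. The `Target` class below is hypothetical and much simpler than GDB's real target interface, and `FakeWaveTarget` just keeps state in a dict.

```python
from abc import ABC, abstractmethod


class Target(ABC):
    """Hypothetical back-end interface; not GDB's real target API."""

    @abstractmethod
    def read_register(self, name):
        ...

    @abstractmethod
    def write_register(self, name, value):
        ...

    @abstractmethod
    def resume(self):
        ...


class FakeWaveTarget(Target):
    """Stand-in back end that keeps 'wave' state in a dict instead of hardware."""

    def __init__(self):
        self.regs = {"pc": 0, "v0": 0.0}
        self.halted = True

    def read_register(self, name):
        return self.regs[name]

    def write_register(self, name, value):
        self.regs[name] = value

    def resume(self):
        self.halted = False


def poke_and_continue(target, reg, value):
    # Front-end logic: only talks to the abstract interface.
    old = target.read_register(reg)
    target.write_register(reg, value)
    target.resume()
    return old
```

Adding a new architecture then means writing another `Target` subclass while the UI and command handling stay untouched.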

floral viper
wanton carbon
#

but dw, this was the primary thing i wanted to learn coz knowledge poisons the mind

#

its crazy that there be people out there not understanding what a target is, no?

floral viper
#

my first thought as to what you meant by "target" in the post was indeed correct, but the way it was brought up conchfused me

floral viper
#

I guess the way you wrote it in the blog was more in line with the GDB meaning of "target", but to someone who doesn't use GDB, it's confusing

wanton carbon
#

i will put some more words around that bit to make it clearer

floral viper
#

those are my 3¢ anyways (due to inflation)

wanton carbon
wanton carbon
autumn notch
#

howdy again

wanton carbon
#

howdy

#

i am about to upload a renewed version where the hyphens have been murderised

autumn notch
#

nice

#

well, there is one hyphen that does make sense so far

#

I will not say which to keep you on your toes

wanton carbon
#

i inherited the hyphens from my gramgram, how dare you

autumn notch
#

😔

#

"with amdkfd, which is a separate KMD from amdgpu, that graphics applications use."
could be rewritten as "with amdkfd, a separate KMD from amdgpu which graphics applications use"

#

privileges instead of "privilages"

wanton carbon
#

ok updated now

autumn notch
#

noice

#

I have now finished reading

#

good stuff

#

I like that you take it kinda slow

#

was understandable even for me, a complete neophyte

wanton carbon
#

thats good news

autumn notch
#

no it's not, make it harder to understand, do it now

#

#BringBackHyphens

wanton carbon
#

nono in fact i will add a tag with "difficulty-level:clep"

autumn notch
#

I'm honored/insulted

wanton carbon
#

the natural state of man

floral viper
#

hinsulted (new win32 type dropped)

wanton carbon
#

i have discovered another spicy typo i made

floral viper
#

sometimes when I'm really tired, I'll write typos like that constantly

wanton carbon
#

tbh i find i make most typos because i start thinking of the next word that should be typed and then i start mixing in the letters

floral viper
#

sometimes I consider using a spell checker for my blog, but then I don't

#

then I continue writing raw html (sigma masochistic personality)

wanton carbon
#

the real spell checkers are the friends we made along the way

onyx vector
#

The disadvantage of such emulation is that due to computational requirements, emulating larger bodies of work is not currently done, essentially providing a single threaded view of the execution (no cross-invocation or cross-subgroup communication).
I didn't know that... Well renderdoc would have never been useful for me then anyways

onyx vector
#

amdkfd and amdgpu can coexist simultaneously?

#

i thought the former was some kind of headless thing

wanton carbon
#

wdym? that you run two programs that use either?

onyx vector
#

both can't be the lowest level device driver, one would have to talk to the other

#

seems like amdkfd does in fact talk to amdgpu

#

and used to talk to radeon too

wanton carbon
#

some of kfd has been merged into amdgpu, but far from all

dark vortex
#

posting date is one year behind :)

wanton carbon
onyx vector
onyx vector
dark vortex
#

guess martty intends to release it officially™ on jan 15?

onyx vector
#

it's not yet Jan 15 2023 either

wanton carbon
#

eh it requires me to put a date

wanton carbon
onyx vector
#

how long have you been working on this again nervous

onyx vector
wanton carbon
#

debugger and other rocmy crap support

#

svm

#

xgmi

#

i dunno other bits and bobs

onyx vector
#

so uh

wanton carbon
#

the other 15 secret queues that exist in the hw

onyx vector
#

how hard could it be to teach rocgdb to tend to VK cs invocations ? surely not that hard at all then

wanton carbon
#

zero hard

floral viper
#

on a scale of zero to zero though

wanton carbon
#

ah

#

then NaN

onyx vector
#

NaN hard

wanton carbon
#

oh wait rocgdb

onyx vector
#

yes

wanton carbon
#

i mean, nothing currently in there does it, so its just putting my code into rocgdb

#

but why would you wanna strap to the beautiful ecosystem of rocm when you don't have to

#

why not just put my code into gdb

#

there isn't really a gain from using rocgdb

onyx vector
#

the question is, what prevents rocgdb from working with vulkan right now, out of the box

#

for compoot i mean

wanton carbon
#

it talks to amdkfd which doesn't know anything about your workload i imagine, because that doesn't go through amdkfd

#

but tbh i couldn't compile rocm so uhhh

#

who knows 😄

onyx vector
#

i just get the packages from ayymd

#

i got the old rocm support in anydsl dusted off this week to let a student work on a topic with it

#

(they're gonna try to talk to the RT hw from within rocm basically)

wanton carbon
#

amd doesn't really have RT hw

#

so should be no problem

onyx vector
#

hey

#

that's mean

#

they got that one instruction

dark vortex
#

and another one in RDNA3

wanton carbon
#

what i mean is that they don't FF bits for RT

dark vortex
#

it's a 100% improvement, try getting that from a NV card

wanton carbon
#

so there can be no issue doing RT, because it is just software

onyx vector
#

do we know what nvidia does in detail ?

dark vortex
#

well everything except the (arguably small) bit the hw does I guess

wanton carbon
#

whereas on a theoretical CUDA bog, it might require a hand cranked register

#

also the way rocgdb does debugging is not the way i have implemented, afaik

#

my way is the legacy way

#

i think

#

oh right, now i remember

#

i think rocgdb does the cwsr form of debugging, where it just pulls a wave from the device

onyx vector
#

that must hurt

wanton carbon
#

but tbh i don't grok what rocdbgapi does fully

wanton carbon
midnight shore
wanton carbon
midnight shore
#

almost, but I now have more questions

#

what is the definition of saving or restoring a wave here

#

does the wave still exist until it's "restored" or does it need to be "restarted" somehow

#

because if latter I can see that with graphics we can't do that

#

and I guess in that case this phrasing is fine

#

oh well I guess this detail doesn't matter anyway so it's all good

floral viper
#

what if you said it like this to avoid the implication

because rasterization and framebuffer state cannot be saved and restored, graphics waves also cannot be saved and restored

wanton carbon
#

in cwsr you quit the wave you have saved the state of, then when you need to restore you relaunch them with the appropriate GPR allocation then trap them immediately to reload the state
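
A toy sketch of that save / quit / relaunch / restore sequence, to make the ordering concrete. Everything here is an assumption for illustration: the `Wave` class, its fields, and both function names are hypothetical, and real hardware state is far larger than a PC and two register lists.

```python
from dataclasses import dataclass


@dataclass
class Wave:
    """Toy wave: a program counter plus scalar and vector register files."""
    pc: int
    sgprs: list
    vgprs: list


def save_and_quit(wave):
    """Copy the wave's state out; after this the wave is assumed gone."""
    return {"pc": wave.pc, "sgprs": list(wave.sgprs), "vgprs": list(wave.vgprs)}


def relaunch_and_restore(saved):
    """Launch a fresh wave with the same GPR allocation, then reload state.

    On hardware the reload would happen in the trap handler that the new
    wave enters immediately; here it is just plain assignments.
    """
    # Fresh wave: same register counts, but zeroed contents and pc = 0.
    wave = Wave(pc=0,
                sgprs=[0] * len(saved["sgprs"]),
                vgprs=[0] * len(saved["vgprs"]))
    # "Trap immediately" and reload the saved state.
    wave.pc = saved["pc"]
    wave.sgprs = list(saved["sgprs"])
    wave.vgprs = list(saved["vgprs"])
    return wave
```

The point of the sketch is that the restored wave is a new wave that happens to carry the old state, which is why anything outside the wave that was waiting on the original one (such as fixed-function hardware) is a problem.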

midnight shore
#

let's just say that we can't restore graphics waves then, without note in parens

#

I misunderstood things initially

wanton carbon
floral viper
#

btw what is cwsr

wanton carbon
#

compute wave save/restore

floral viper
#

so why not just say that

midnight shore
#

the important bit is that we can't restart graphics waves

#

not being able to save/restore FF state would be secondary problem here

floral viper
#

I thought the inability to restore that state was what killed it, but I guess not

wanton carbon
#

well its a dual thing - PS waves get launched by FF hw

midnight shore
#

and the FF hw waits for those waves to complete

#

so it needs to somehow identify them

wanton carbon
#

atoclkotamotagi, kinda no way to pull that FF state and put it back

#

(according to our currently limited knowledge of the arcane machinations of the amd gpu internals)

floral viper
#

maybe an extra sentence would help clarify things in that paragraph then

wanton carbon
#

sounds like a plan

floral viper
#

btw where did you get that screenshot of renderdoc

#

it looks almost unlike the renderdoc I've seen

wanton carbon
#

an old old presentation

#

the link is in the 🦶 🎶

floral viper
#

it so smol

wanton carbon
#

imagine that it is bigger

floral viper
#

will do

wanton carbon
#

problem solved 😎

floral viper
#

Wew this thread has almost 2k messages already. When did you create it?

wanton carbon
#

scrolled up, 15

wanton carbon
dark vortex
#

weow

#

should've told me that before I made a bot that crawls this channel for links and shares them on my linkedin

floral viper
#

meme game is on point

#

haven't read anything else yet

wanton carbon
#

phew, i was fearing that it was automatically submitting them to be printed on TP as well

#

good news

dark vortex
#

I recommend reinstalling all the packages that are already installed.
might be worth mentioning that to use pacman on deck you need to disable readonly mode

wanton carbon
#

if only i did that

dark vortex
#

ah it's in different section

#

guess normal people who read from top to bottom will not be affected by this

#

actually

#

guess normal people who read from top to bottom will not be affected by this

wanton carbon
#

more rra

dark vortex
#

i am actually sadly once again hanging my gpu with bvh builds for no reason

#

maybe I should apply rra to the code then

#

from a few more quick looks, looks nice frogapprove

onyx vector
#

shouldn't the withered wojak be the deck

floral viper
#

the witheredjak is martty

wanton carbon
#

bruh

midnight shore
#

van bruh

wanton carbon
#

u mean Van Bruh

floral viper
#

van goghment

#

I finally read the post and it was rather pleasant

#

I didn't take note of grammatical anomalies, though I did notice some

wanton carbon
#

no diarrhea?

floral viper
#

that will come in part 3 I hope

wanton carbon
#

i try to inflict minimum diarrhea

#

this our family motto

floral viper
#

I don't remember when we (you?) adopted that, but nice

#

this thread is inspiring me to start writing that "other" thing

#

too bad it's 5:30am and I need to schleep rn

wanton carbon
#

schleep now, then we should start other thing too, yesyes

midnight shore
#

huhh 👀

wanton carbon
#

cool stuff, huh

#

but it is still mbcp

#

don't think it interferes with the debooger

floral viper
#

shadow regs are used to optimize shadow mapping

midnight shore
wanton carbon
#

wdym?

midnight shore
#

I thought it might improve debugging experience in some cases (e.g. debugging non-graphics on GFX queue?) but not sure, just random hypothesis

wanton carbon
#

i think it just enables gfx preemption

#

the gfx registers were not saved, so switching corrupts the CP state

midnight shore
#

hmm I see

#

actually I don't thonk

wanton carbon
#

"just" - i don't mean to downplay, its a cool thing

midnight shore
#

so you're saying graphics waves can be somehow suspended with this and you'll still get usable desktop?

wanton carbon
#

no

#

its command level preemption

#

you drain the waves, switch the CP state, run some other tasks, switch CP back, run next cmd

#

that is my understanding

midnight shore
#

I see, so it's just fixing some state being lost/corrupted when preempting?

wanton carbon
#

yes

midnight shore
#

I see

wanton carbon
#

again, thats my understanding
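
A toy model of command-level preemption as described in this exchange, purely as a sketch. The `run_queue` function, the `shadow` flag, and the dict standing in for CP state are all hypothetical; the only thing taken from the thread is the ordering (finish a command, switch state, run something else, switch back, run the next command) and the claim that unsaved registers get corrupted by the switch.

```python
import copy


def run_queue(cmds, cp_state, preempt_at=None, other_task=None, shadow=True):
    """Run `cmds` in order, optionally preempting before command `preempt_at`.

    Each command is a callable that takes the CP state dict and returns a
    result. Preemption only ever happens between commands, never inside one.
    """
    results = []
    for i, cmd in enumerate(cmds):
        if i == preempt_at and other_task is not None:
            # Waves from earlier commands are assumed to have drained here.
            saved = copy.deepcopy(cp_state) if shadow else None
            other_task(cp_state)  # the other task may clobber any register
            if shadow:
                cp_state.clear()
                cp_state.update(saved)  # switch the CP state back
        results.append(cmd(cp_state))
    return results
```

With `shadow=False` the second command sees whatever the other task left behind, which is the corruption case mentioned above.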

wanton carbon
#

ok time to make this code 20% less shit

wanton carbon
#

this meme is from the future, you will need to read part 3 to understand

dark vortex
#

vmid
this is the only part I understand lmao

#

a very fun thing

wanton carbon
#

i managed to pull the debug enable code into the debugger, no longer need to do it before running the app, thats neat

#

now lets see what code is actually needed, and what is only there bc of superstition

wanton carbon
fallen nest
wanton carbon
#

not until you make it

#

as suspected, it isn't even needed to stall the vmid lol

#

in fact, practically nothing is required, i guess i had something else fucked up and went on a wild goose chase lol (sad lol)

wanton carbon
#

still no chance of you running it locally but you can 👀

dark vortex
#

I shall take a look within 3-5 business days

wanton carbon
#

need to clean up the isa too

#

yay, logo is now online

wanton carbon
#

if bda so good, explain this

wanton carbon
#

posted part I on the interwebs, thanks all for being my rra

floral viper
#

which web

#

s

wanton carbon
floral viper
#

ah

#

I would interact with this post if, you know, I had a mastodon account

wanton carbon
#

do it

#

spite the elon

floral viper
#

I don't have twitter either

wanton carbon
#

i'm beginning to understand how fwog is so advanced

floral viper
#

I can only afford to be addicted to one social media platform

wanton carbon
#

and risk missing a gob meltdown? weird

floral viper
dark vortex
wanton carbon
dark vortex
#

i was waiting for the permission frogapprove

floral viper
#

that server has a lot of frog emojis

#

this one has a weak frog game in comparison

dark vortex
#

indeed

#

I should pitch for more forg emojis

wanton carbon
#

it was just the draft that i wanted to keep under wraps

dark vortex
#

ye ik

#

wanted to make sure I don't need to do more RRA work

wanton carbon
#

@midnight shore or @raven vortex do you have a good description of what a ring is? am i correct that it is a ringbuffer, that contains the PM4 or other type packets?

raven vortex
#

A ring is a hardware queue that takes PM4, yeah

raven vortex
#

Hm wait, it doesn't quite take PM4, it takes whatever launch commands to IBs (indirect buffers) that contains PM4

dark vortex
#

do you know if the queueing is implemented using ring buffers?

dark vortex
#

s/queueing/the hardware queue ishi was talking about/

wanton carbon
raven vortex
#

there is IB chaining which is basically a jump command you can place at the end of an IB, but not sure what granular is about

wanton carbon
raven vortex
#

OK, so we might want to differentiate the literal ring (the queue containing IBs) and the conceptual ring (which is a synonym of the hardware command processor)

wanton carbon
raven vortex
#

hmm

#

I've never looked at how it works tbh

#

the userspace is definitely recording IBs but maybe the direct submission format is PM4 too

wanton carbon
#

Hmm

#

"launch" this IB is def a pm4 packet

#

i guess having IBs kinda makes ring stopping detection a bit iffy

raven vortex
#

how is ring halt detection performed by umr?

wanton carbon
#

looks at if the ring pointers are moving

raven vortex
#

yeah the interaction with IBs might be a bit sussy then
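
A minimal sketch of the "are the ring pointers moving" check, assuming the ring exposes a read pointer and a write pointer that can be sampled a few times. `RingSample` and `ring_looks_halted` are made-up names for illustration and are not umr's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingSample:
    """One snapshot of a ring's pointers (hypothetical field names)."""
    rptr: int  # read pointer: how far the consumer has got
    wptr: int  # write pointer: how far the producer has written


def ring_looks_halted(samples):
    """Return True if work is pending but the read pointer never moved.

    `samples` is a list of RingSample taken some time apart. An empty
    ring (rptr == wptr) counts as idle, not halted.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to see movement")
    rptr_moved = any(s.rptr != samples[0].rptr for s in samples[1:])
    work_pending = samples[-1].rptr != samples[-1].wptr
    return work_pending and not rptr_moved
```

This also shows why IBs make the heuristic iffy: if the ring only holds small "launch this IB" entries, the read pointer would presumably sit still for as long as one long IB is executing, and a check like this cannot tell that apart from a real hang.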

onyx vector
#

@wanton carbon I did some messing about with ROCm today and i found that it works just fine without amdkfd at all, only rocgdb cares, also I did manage to bring down the system (well, the desktop really, restarting gdm still fixed it) with a broken kernel, and it failed to recover much like with shady compute shaders on the gfx queue

onyx vector
#

ofc now that I'm home and I try to repro the problem, the AMD repos go down -_-

wanton carbon
onyx vector
#

running rocm programs without rocgdb

onyx vector
#

this is what rocgdb has to say about that

dark vortex
#

/long_pathname_so_that_rpms_can_package_the_debug_info 🐸

onyx vector
#

it's very cursed

#

their packaging is super brittle too

#

i'm finding out that having any kernels but the supported one prevents amdgpu-dkms from building

#

fun times

#

ah, got the stupid kernel module built, after 3 attempts it caught something instead of hanging

#

i think this amdkcl module is new

#

the lack of proper debug info is on us though

wanton carbon
#

nice sleuthing

#

ye my conclusion was that it may be better to stay out of the mess that is rocm currently

onyx vector
#

the problem is how amd understands open-source, yes they publish the source, but they package things up themselves and don't mainline stuff properly

#

ubuntu has repackaged some of the rocm things, and not only is AMD not acknowledging that, their packages conflict

#

they have different version schemes too, so if you want rocm-cmake from rocm 5.4, the version number for that is some long-ass string like 3.4.15145131.50400 ... and so ubuntu's rocm-cmake gets installed instead, but that's incompatible

#

endless fun

#

AMD also don't seem to give easy instructions for building individual components from source

#

so you either use their automated mystery meat scripts/megapackages, or you're on your own

dark vortex
#

some would argue that's not really opensource at all, more like source-available

onyx vector
#

yes

#

well no it's MIT i think, so it's proper libre or whatever

#

but the development process is not that open at all

wanton carbon
dark vortex
#

hype hype hype

#

linkedin post published frogapprove

wanton carbon
#

bueno

#

i will toot this together with part 2 so that the lizard part of my brain doesn't get too much of the drug chemical

dark vortex
#

noice

#

shit your initial trap handler mr was more than a month ago???

#

wtf where is my january, it's like last weekend was new years 😭

#

Sam Pitoiset
I'd still use hakzsam

wanton carbon
#

shit did a Sam remain

#

I thought I eradicated them all

#

fml

#

name elongenated frogapprove

#

don't really like the SRBM footnote, i'll rework

#

i should explain VMID instead

dark vortex
#

yeah that's a bit confoosing

wanton carbon
#

smh the amdgpu doc is wrong saying there can be 16 concurrent processes

#

its just 15

wanton carbon
dark vortex
#

nice

#

and SRBM which selects based ME, PIPE, QUEUE and VMID4.
based on what?

wanton carbon
#

sick burn

dark vortex
#

my 10 day Linux driver stack experience
wait you had like literally zero experience with anything like that before?

#

also interesting that you know your way around amdgpu virtual memory management

#

as part of RMV debugging I tried to follow the codepaths from ioctl entrypoint to where PTEs are updated and I got lost real quick

wanton carbon
#

do I? now thats news to me 😄

dark vortex
#

well at least somewhat, or your pc is good at grepping kernel source

#

wouldn't have found the vega interrupt thingies at all lmao

wanton carbon
#

heh heh

autumn notch
#

Wait did part 2 come out during my biweekly crack binge

#

Also it keeps telling me a new version of the content is available as soon as I open the page

#

why would I read any further when I know this is the best part

#

"However of gfx9 and above", rewrite to "However from gfx9 and above" maybe?

floral viper
#

also, I think the hyphen in the first sentence should be a colon

#

I think you're supposed to use hyphens for important asides or something (like an important version of parens), idk

#

depending on the assumed knowledge of the reader, it might be useful to explain in slightly greater detail what "banked registers" are

#

The second sense in which the term banking is used for registers refers to the splitting of a set of registers into groups (banks) each of which can be accessed in parallel. Using four banks increases the maximum number of accesses supported by a factor of four, allowing each bank to support fewer access ports (reducing area and energy use) for a given effective access count.
https://electronics.stackexchange.com/a/102743

#

I guess my point is that some people might be confused when you just say "The GPU can have many of a certain type of register, organised into banks." without at least a sentence to explain the implication of banks

#

I dunno if it's a real problem. It all depends on how much you assume the reader knows or how willing they are to look this stuff up

wanton carbon
#

good point

#

i'll add a banner

#

if you don't know what register banks are, turn off your computer and leave. you are not welcome

#

nah but good points, i'll make some edits

wanton carbon
floral viper
#

kek even more reason to explain it

wanton carbon
#

it is just a form of addressing in this case really, nothin' fancy
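
To illustrate "banking as a form of addressing": a small sketch where one select value decides which copy of a register a read or write lands in. The class and its names are hypothetical, and a single integer select is an assumption; per the post, the real selectors pick by things like ME, PIPE, QUEUE and VMID.

```python
class BankedRegisters:
    """Toy register file where the same register name exists once per bank."""

    def __init__(self, num_banks):
        self.select = 0  # which bank subsequent accesses go to
        self._banks = [dict() for _ in range(num_banks)]

    def write(self, reg, value):
        """Write `reg` in the currently selected bank only."""
        self._banks[self.select][reg] = value

    def read(self, reg):
        """Read `reg` from the currently selected bank (0 if never written)."""
        return self._banks[self.select].get(reg, 0)
```

So the usual access pattern is two steps: set the select, then touch the register. Forgetting the first step means reading or writing some other bank's copy.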

wanton carbon
#

you missed so much

#

carlo's engine can now render a cube

raven vortex
#

glad that you figured out what GRBM is, when I was playing with SQTT I looked everywhere but still couldn't get an idea what that means exactly

wanton carbon
#

i cannot figure out what it stands for however 😄

#

G is probably gfx, RB is register bank i guess, and m is martty

fallen nest
#

green, red, blue, metallic

autumn notch
#

yeah it seems like driver devs have not yet escaped the primal urge to make every code concept a vague acronym

#

I don't think I've seen this many in other places

wanton carbon
wanton carbon
dark vortex
#

hype hype hype

wanton carbon
#

oh also

#

now that we know that trap handlers work, lets implement exceptions KEKW

dark vortex
#

absolutely splendid

#

throw result in glsl when

wanton carbon
#

the final thing holding back gpgpu

autumn notch
dark vortex
#

you mean stuff like unhandled page faults? iirc I read somewhere they might go into a trap handler as well

wanton carbon
#

well, yes and no i guess

#

mem violations indeed go into the trap handler if enabled

#

but that makes more sense for the debugger

#

rather that you could have exceptions by triggering the trap handler, which could use the faulting PC to look up an unwind table, etc. etc.

#

its not like there is a stack to unwind or something currently, but gob is working on it KEKW
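
A sketch of the "use the faulting PC to look up an unwind table" part, assuming the table is a sorted list of non-overlapping PC ranges that each map to a handler address. The table layout and `find_handler` are invented for illustration; nothing like this exists in the trap handler discussed here.

```python
import bisect


def find_handler(unwind_table, faulting_pc):
    """Return the handler PC covering `faulting_pc`, or None.

    `unwind_table` is a list of (start_pc, end_pc, handler_pc) tuples,
    sorted by start_pc, non-overlapping, with end_pc exclusive.
    """
    starts = [row[0] for row in unwind_table]
    # Index of the last range whose start is <= faulting_pc.
    i = bisect.bisect_right(starts, faulting_pc) - 1
    if i >= 0:
        start, end, handler = unwind_table[i]
        if start <= faulting_pc < end:
            return handler
    return None
```

A trap handler doing this would then set the return PC to the handler address instead of the faulting one; a PC that falls in no range means nobody catches the exception.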

dark vortex
#

that'd be a cursed setup

wanton carbon
#

are you really alive if you haven't violated the rule of 5 on the gpu?

midnight shore
#

hmm

#

longjmp in the shader?

wanton carbon
#

sure thing

dark vortex
wanton carbon
#

neat

wanton carbon
#

cwsr is quite similar

#

bit more stuf

wanton carbon
#

smh nobody noticed that i forgor an s_rfe_b64 from the isa snippet

autumn notch
#

sry my integrated Neuralink compiler was busy lowering my dopamine levels so I would satisfy my cravings for a cold and refreshing Coca Cola™

wanton carbon
#

to the gulag with ye

floral viper
wanton carbon
autumn notch
solar ingot
#

well done

wanton carbon
#

also yeah these patches show that i am missing some bits for the trap enablement

#

ironically i had them before, but removed because they didn't seem to do anything, but I guess the current code might hang sometimes

wanton carbon
autumn notch
#

for the emperor

dark vortex
#

@ Bas
(i added a space so it doesn't ping people but) I think you could just leave the @ out

#

or say Bas Nieuwenhuizen probably

wanton carbon
wanton carbon
dark vortex
#

well the problem is that handles aren't globally coherent the same

wanton carbon
#

@ Bas glc slc dlc then

dark vortex
#

in most other places his handle is bnieuwenhuizen

#

idk

#

the blooper reel is great I shall include that as well, should I ever write blogs

wanton carbon
#

ah right, @MaybeBas

floral viper
#

Maybe you meant to write "single-schlepp"

midnight shore
#

use a piece of memory to communicate to the host (👋 hey host! i am in the trap handler now) - if we want to breakpoint on an instruction instead of just the wave, we can now enable single stepping
what does this mean btw

midnight shore
#

actually nevermind, it just stuck out to me as really weird, but I get now that putting s_trap at the beginning of the wave and single stepping to the intended instruction is just this current thing you're implementing, not how full proper breakpoints would work

wanton carbon
#

its pretty much how full proper breakpoints would work

#

except you don't need to roundtrip to the host

floral viper
midnight shore
#

instead of putting it at the beginning of the wave and stepping until breakpoint

wanton carbon
#

perhaps, i just don't know how instruction caches work exactly

#

so i can guarantee the step one works, but not this one

midnight shore
#

yeah makes sense

wanton carbon
#

dual anchoring, maximum stability

midnight shore
#

there's something called S_ICACHE_INV so I think that could be interesting future direction, you'd run that on trap return I think

wanton carbon
#

but you set the bp from the host

#

so you need to find the mmio hook that invalidates icaches

midnight shore
#

well the halted wave is sitting in trap handler, right?

#

so you could just put that instruction into the trap handler

wanton carbon
#

now i don't know what you are proposing then

midnight shore
#

I think what you might have in mind is: when we already have waves running a program and we set a breakpoint, the breakpoint might be ignored by hw because instruction caches won't be invalidated

#

is that it?

#

because I thought we were talking about how we'd resume from a breakpoint/after a breakpoint

wanton carbon
#

i gtg now, lets continue later

floral viper
#

that meme at the end hits different

floral viper
#

overall a pretty good post. I can even comprehend most of it (which is cool, given that I haven't even looked at radv)

#

The only feedback I have is that hyphens feel a little overused

wanton carbon
#

will do another pass

floral viper
#

I can excuse it as being part of your writing style

wanton carbon
#

Ye im not a garbage writer but a garbage person, please understand

wanton carbon
#

I think the tradeoff here is that you can have a million threads that will hit that instr now

#

So you need to filter again

wanton carbon
#

which can be done, sure

wanton carbon
#

i guess you are an OG hyphen hater

wanton carbon
midnight shore
wanton carbon
#

then i might need a bit more explanation

midnight shore
#

ye sorry

#

to ensure we're on the same track, the breakpoint construct you have in mind is that s_trap is placed on the start of the program and once it's hit, you single step to the instruction that we're intending to break on

#

right?

wanton carbon
#

yep

#

well

#

more like you select the wave when that first s_trap is hit

#

then selected waves single step

#

but yes
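
The select-then-single-step scheme confirmed here can be sketched as a tiny simulation. Assumptions: a straight-line program with no branches, instructions represented as strings, and a hypothetical `trace_wave` function; it only models when the trap handler runs and what it would tell the host.

```python
def trace_wave(program, wave_id, selected_wave, break_pc):
    """Simulate one wave; return (host_events, trap_entries).

    Every wave hits the s_trap at the start of the program. Only the
    selected wave turns on single stepping, after which the trap handler
    runs before each instruction and compares the PC to `break_pc`.
    """
    host_events = []
    trap_entries = 0
    stepping = False
    for pc, instr in enumerate(program):
        if stepping:
            trap_entries += 1  # single-step trap before this instruction
            if pc == break_pc:
                host_events.append(("break", wave_id, pc, instr))
        if instr == "s_trap":
            trap_entries += 1  # explicit trap, hit by every wave
            if wave_id == selected_wave:
                stepping = True
                host_events.append(("selected", wave_id, pc, instr))
    return host_events, trap_entries
```

Unselected waves pay for one trap entry and then run at full speed; the selected wave pays one entry per instruction, which is the cost being weighed against patching in the rest of this discussion.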

midnight shore
#

yes, right

#

why can't we replace the instruction we intend to break on with s_trap?

#

then inside trap handler we could do extra conditions (filtering) and also before we return from trap handler we'd run the intended instruction

#

perhaps not in this exact form of course, but the idea should be clear
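
The patching idea from the last few messages, as a toy sketch. It assumes instructions are list entries that can be swapped freely and ignores instruction caches entirely, which is exactly the part nobody here is sure about. `patch_breakpoint` and `run_patched` are hypothetical names.

```python
def patch_breakpoint(code, break_pc):
    """Replace code[break_pc] with s_trap; return (patched_code, displaced)."""
    patched = list(code)
    displaced = patched[break_pc]
    patched[break_pc] = "s_trap"
    return patched, displaced


def run_patched(patched, displaced, break_pc, wave_id, selected_wave):
    """Run one wave over patched code; return (executed, stopped).

    Every wave reaching `break_pc` enters the trap handler. The handler
    filters on the wave id, and in all cases executes the displaced
    instruction before returning so the program still behaves the same.
    """
    executed = []
    stopped = False
    for pc, instr in enumerate(patched):
        if pc == break_pc and instr == "s_trap":
            if wave_id == selected_wave:
                stopped = True  # the host debugger would take over here
            executed.append(displaced)  # handler runs the original instr
        else:
            executed.append(instr)
    return executed, stopped
```

Note that every wave takes the trap at `break_pc`, not just the interesting one, so the filtering moves into the handler rather than going away.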

wanton carbon
#

this should be possible as well

midnight shore
#

ok now let me remember the root of this discussion

wanton carbon
#

but wait

midnight shore
#

anyway, I initially just misinterpreted the step in your write up as suggesting that as the current best method to construct breakpoints

wanton carbon
#

so how it would work is you'd still trap the wave at the beginning, then replace the instruction with another trap, then in that second trap you need to run the intended instruction for all waves that were originally not trapped, then keep those waves trapped until our originally trapped wave executes the trap, then host debugger time, then back to trap handler, undo the changed instruction, release all waves with icache invl

#

i think its a bit complex

#

if i knew more about icaches it could be made simpler

midnight shore
wanton carbon
#

i don't know the mmio hook for icache invl from the host

#

so any instr manip has to happen in the trap handler

#

it can't be through CP because then you can't set breakpoints while stopped

midnight shore
#

if we set breakpoint before submitting command buffer, we can invalidate instruction caches (if there's manual action required to begin with) when we submit the command buffer

#

if we set breakpoint while shader was halted on a breakpoint, we need to invalidate instruction cache when wave resumes (this would happen inside trap handler)

#

this is conceptual in my mind of course, and relies on assumption that s_icache_inv does what it says

wanton carbon
#

well, it depends on what cache s_icache_inv dumps, yes, i'd imagine it is for the current WGP

#

so thats not enough

#

if it is more, then it can be enough

#

i guessed right

midnight shore
#

I assume you have other waves in mind? I kinda felt like it's fine to just let them run impeded (for the current draw/dispatch) but maybe that needs more thought

#

(I'm assuming that if we overwrite the first int32 of an instruction with s_trap it will either see the old instruction or s_trap)

wanton carbon
#

imagine you are stopped in a VS, now you possibly can't set a BP for an FS

midnight shore
#

riiight

#

hmm

#

maybe we should just put s_icache_inv at beginning of all waves then

wanton carbon
#

i imagine that has perf implications 😄

midnight shore
#

that's not going to be full speed but better than single stepping through each wave

wanton carbon
#

but its not each wave

midnight shore
#

well okay

#

s_icache_inv but gated behind same filtering thingy as the current approach does

wanton carbon
#

its just the wave you care about

#

that could be possible

#

lets just find the mmio hook

midnight shore
#

yeah I imagine if there's a way to flush TLB from host in the middle of command buffer exec there must also be a way to dump all caches

wanton carbon
#

possibly

#

problems is there are many units with icaches 😄

#

but i think SQC is a reasonable guess

#

thats the thing that drives waves

midnight shore
#

aah, and I guess CLIENT_INVALIDATE_ALL_VMID drops all TLBs?

#

or perhaps that's something stronger

#

anyway beside the point

wanton carbon
#

i think you understand why i did the stepping instead 😄

midnight shore
#

yes

#

I certainly do, it's much simpler

#

I just misinterpreted that line in your write up as suggesting that approach as "the current best one" or something

#

perhaps not the best way to put it

#

"what you'd do when building a debugger, that's not a stepping stone thing intended to be replaced later"

wanton carbon
#

i'll reword it

#

but we'll see which one is better in the end 😄

midnight shore
#

just that my initial confusion is the cause for this discussion

wanton carbon
floral viper
#

Post on Reddit and I'll upboat

#

They'll be jelly that you have achieved the second-lowest-level of graphics programming

wanton carbon
#

should've written the debugger in assembly too smh

wanton carbon
floral viper
#

I was thinking r/okbuddyphd actually

wanton carbon
#

getting good chuckles out of that sub

autumn notch
#

I have to confess something

#

I laughed because of the funny ducks

#

I don't know what the Brillouin Zone is

#

I looked at the wikipedia article and instantly experienced ego death

floral viper
#

that's the point of the sub

#

too bad people post high school-tier stuff there frequently

wanton carbon
floral viper
#

they're the same, but different