Rosy | Graphics Programming | Page 15

broken fog Sep 4, 2025, 10:24 AM

#

tri count is also not nearly as impactful as pixel count

#

unless you have massive overdraw

elfin cape Sep 4, 2025, 10:26 AM

#

That was an example, but as I'm watching the video, it might be a 4K screen KEKW

cloud rivet Sep 4, 2025, 2:59 PM

#

elfin cape That was an example but as I am watching the video. It might be a 4k screen <:KE...

>wmic desktopmonitor get Caption, MonitorType, ScreenHeight, ScreenWidth
Caption              MonitorType          ScreenHeight  ScreenWidth
Generic PnP Monitor  Generic PnP Monitor  2160          3840
Default Monitor      Default Monitor      1440          2560
Default Monitor      Default Monitor

#

3840x2160 is the display I was recording

#

I'm going to draw the windows first and track those regions to avoid rendering the bg underneath them

#

if I don't draw the bg or the triangle I get double the perf

#

if I only draw fps

#

it's the bg and something totally unrelated to drawing

#

I just need to start profiling

#

a mostly blank screen getting 77 fps is pretty ridiculous

#

this is my first C program

bronze socket Sep 4, 2025, 3:24 PM

#

do you have a 144p monitor

#

144hz

#

not 144p lmao

#

my guess is that you have vsync on and it's locking to 77

cloud rivet Sep 4, 2025, 3:26 PM

#

I did change my driver from studio to the game driver

#

maybe I did something there

#

#

#

yeah if I run rosy I can get way more

#

this is just my application being slow

#

I will start on a profiler using Windows event tracing and the performance tools

#

it'll be fun

brisk chasm Sep 4, 2025, 4:41 PM

#

just saying, they are going to remove wmic with the next win11 release 25h2

cloud rivet Sep 4, 2025, 4:50 PM

#

yeah planning on using win32 event tracing and diagnostic apis

#

those aren't deprecated

#

I will do what Tracy does and use tracing macros in my application that are no-ops unless I #define a profiling macro
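A minimal sketch of that compile-time switch, assuming a hypothetical `PROFILING` define (Tracy gates its own macros on `TRACY_ENABLE` in the same spirit). `TR_START`, `TR_END`, `tr_zones` and `tr_now` are illustrative names, not the author's API, and `timespec_get` stands in for `QueryPerformanceCounter` so the sketch builds on any platform.

```c
#include <stdint.h>
#include <time.h>

/* Build with -DPROFILING to enable tracing. Without it, every TR_* macro
   expands to ((void)0), so instrumented code costs nothing in release. */
#ifdef PROFILING
typedef struct { uint64_t start_ns, total_ns; uint32_t calls; } TrZone;
static TrZone tr_zones[64]; /* one slot per zone id (hypothetical layout) */

static uint64_t tr_now(void) {
  struct timespec ts;
  timespec_get(&ts, TIME_UTC); /* stand-in for QueryPerformanceCounter */
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#define TR_START(id) (tr_zones[(id)].start_ns = tr_now())
#define TR_END(id) \
  (tr_zones[(id)].total_ns += tr_now() - tr_zones[(id)].start_ns, tr_zones[(id)].calls++)
#else
#define TR_START(id) ((void)0)
#define TR_END(id) ((void)0)
#endif

/* Example of instrumented code: identical behaviour in both build modes. */
static int work(int n) {
  TR_START(0);
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += i;
  TR_END(0);
  return sum;
}
```

Compiling with `-DPROFILING` turns the zone bookkeeping on; a plain build reduces both macros to `((void)0)`, so the instrumented function is unchanged.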

#

and how I think I'll make this work is that when profiling is enabled I'll add a keyboard shortcut and it will profile for a short bit and dump a text file when it's done

#

I'll start on that today

#

my goal is to just start with a frame marker, then add support for zones, and then markers, figure out what's slow and maybe add memory profiling and just figure it out

#

I'll just work on it a bit at a time, as things get slow and I need more profiling capabilities

#

to resolve slowness

cloud rivet Sep 4, 2025, 7:45 PM

#

I might start with some UI tbh

#

we'll see

cloud rivet Sep 5, 2025, 5:43 AM

#

#

I need to add a loop sampling thingy I guess

#

I'm also missing something

#

there's a lot of time spent somewhere idk what it is

#

ah

#

#

it's the render graph

#

that feels really slow

astral hinge Sep 5, 2025, 6:02 AM

#

are those numbers milliseconds

#

or seconds frogstare

cloud rivet Sep 5, 2025, 6:04 AM

#

void start_tr(AppContext *actx, Arena *arena, u32 trace_id) {
  if (!actx)
    fatal("no actx");
  if (!arena)
    fatal("no arena");
  if (!actx->trctx)
    fatal("no trctx");
  if (actx->trctx->num_traces <= trace_id)
    fatal("traces overflow");

  QueryPerformanceCounter(&actx->trctx->traces[trace_id].starting_time);
}

void end_tr(AppContext *actx, Arena *arena, u32 trace_id) {
  if (!actx)
    fatal("no actx");
  if (!arena)
    fatal("no arena");
  if (!actx->trctx)
    fatal("no trctx");
  if (actx->trctx->num_traces <= trace_id)
    fatal("traces overflow");
  QueryPerformanceCounter(&actx->trctx->traces[trace_id].ending_time);

  // QPC ticks -> microseconds; multiply before dividing to keep precision
  u64 duration
      = actx->trctx->traces[trace_id].ending_time.QuadPart - actx->trctx->traces[trace_id].starting_time.QuadPart;
  duration *= 1'000'000;
  duration /= actx->trctx->frame_frequency.QuadPart;
  // microseconds -> seconds, so a stored duration of 1.0 means one second
  actx->trctx->traces[trace_id].duration = (f64)duration;
  actx->trctx->traces[trace_id].duration /= 1'000'000;
}

#

whatever this is

#

https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

#

LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;

QueryPerformanceFrequency(&Frequency); 
QueryPerformanceCounter(&StartingTime);

// Activity to be timed

QueryPerformanceCounter(&EndingTime);
ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;


//
// We now have the elapsed number of ticks, along with the
// number of ticks-per-second. We use these values
// to convert to the number of elapsed microseconds.
// To guard against loss-of-precision, we convert
// to microseconds *before* dividing by ticks-per-second.
//

ElapsedMicroseconds.QuadPart *= 1000000;
ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;

astral hinge Sep 5, 2025, 6:05 AM

#

aren't you supposed to multiply the result of QPC by the frequency of the cpu or something

cloud rivet Sep 5, 2025, 6:06 AM

#

duration /= actx->trctx->frame_frequency.QuadPart;

astral hinge Sep 5, 2025, 6:06 AM

#

ah

cloud rivet Sep 5, 2025, 6:06 AM

#

so I think it's ms

astral hinge Sep 5, 2025, 6:08 AM

#

18 us for cpu raster seems quite fast

cloud rivet Sep 5, 2025, 6:08 AM

#

sorry

astral hinge Sep 5, 2025, 6:08 AM

#

idk what it's rasterizing though

cloud rivet Sep 5, 2025, 6:08 AM

#

oh sorry

#

no these are seconds

astral hinge Sep 5, 2025, 6:08 AM

#

ah

cloud rivet Sep 5, 2025, 6:08 AM

#

1 here = 1 second

#

it's really slow

astral hinge Sep 5, 2025, 6:09 AM

#

many opportunities for improvement then

cloud rivet Sep 5, 2025, 6:09 AM

#

ya

astral hinge Sep 5, 2025, 6:09 AM

#

cloud rivet 1 here = 1 second

oh man it should've been obvious when it said frame=0.033 lol

cloud rivet Sep 5, 2025, 6:09 AM

#

yeah I thought maybe you meant is that 33 seconds

#

I'm like no

astral hinge Sep 5, 2025, 6:10 AM

#

lol

cloud rivet Sep 5, 2025, 6:10 AM

#

the render graph is really slow

#

the cpu raster being slow is whatever, I knew that would be slow

astral hinge Sep 5, 2025, 6:11 AM

#

now that you have a profiling thingy, time to spam it

cloud rivet Sep 5, 2025, 6:11 AM

#

ya

astral hinge Sep 5, 2025, 6:11 AM

#

I think you should make a sampling profiler too just to get a quick overview of stuff without having to instrument

cloud rivet Sep 5, 2025, 6:11 AM

#

yes

#

what are you thinking specifically?

astral hinge Sep 5, 2025, 6:13 AM

#

Well, there are Win32 functions for getting the stack pointer and then the call stack from it, which you can call from another thread (so it costs basically nothing on the main thread)

#

Idk what the function is though so you'd have to do a little research

cloud rivet Sep 5, 2025, 6:13 AM

#

I'll check that out

#

what does that do for me?

astral hinge Sep 5, 2025, 6:13 AM

#

Anyway, you can put the call stack info into a map and then do a little math to figure out how long you spent in each place

cloud rivet Sep 5, 2025, 6:14 AM

#

oh I see

astral hinge Sep 5, 2025, 6:14 AM

#

You take a sample of the call stack every millisecond or so
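A sketch of the aggregation half of a sampling profiler: a sampler thread is assumed to record one program counter (PC) per period (on Windows that would typically mean suspending the target thread, reading its context and resuming it, which is outside this sketch). Each PC is counted in a small open-addressing hash map, and the estimated time at a location is simply samples × period. `SampleMap` and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_MAP_BITS 10
#define SAMPLE_MAP_CAP (1u << SAMPLE_MAP_BITS) /* assumed larger than the number of distinct PCs */

typedef struct { uint64_t pc; uint32_t count; } SampleSlot;
typedef struct { SampleSlot slots[SAMPLE_MAP_CAP]; uint32_t total; } SampleMap;

/* Fibonacci hashing: the top SAMPLE_MAP_BITS bits of the product pick a slot. */
static size_t sample_slot(uint64_t pc) {
  return (size_t)((pc * 0x9E3779B97F4A7C15ull) >> (64 - SAMPLE_MAP_BITS));
}

/* Record one sampled PC. pc == 0 is reserved as the empty-slot marker. */
static void sample_map_add(SampleMap *m, uint64_t pc) {
  size_t i = sample_slot(pc);
  while (m->slots[i].pc != 0 && m->slots[i].pc != pc)
    i = (i + 1) & (SAMPLE_MAP_CAP - 1); /* linear probing */
  m->slots[i].pc = pc;
  m->slots[i].count++;
  m->total++;
}

static uint32_t sample_map_count(const SampleMap *m, uint64_t pc) {
  for (size_t i = sample_slot(pc); m->slots[i].pc != 0; i = (i + 1) & (SAMPLE_MAP_CAP - 1))
    if (m->slots[i].pc == pc)
      return m->slots[i].count;
  return 0;
}

/* The "little math": estimated time at a PC = its sample count * sampling period. */
static double sample_map_time_ms(const SampleMap *m, uint64_t pc, double period_ms) {
  return sample_map_count(m, pc) * period_ms;
}
```

After a capture, each recorded PC can be resolved to a function name (for example with a DbgHelp lookup) and the per-PC counts summed per function.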

cloud rivet Sep 5, 2025, 6:14 AM

#

ohhhh

astral hinge Sep 5, 2025, 6:14 AM

#

That's how the profiler works in visual studio

cloud rivet Sep 5, 2025, 6:14 AM

#

that's cool!

#

I'll do that

astral hinge Sep 5, 2025, 6:15 AM

#

It'll be so cool to have your own suite of tooling

#

If something lacks you can just improve it

cloud rivet Sep 5, 2025, 6:15 AM

#

yeah, I want to add vk timing queries too

#

also memory use, since I have a custom allocator I can track all my memory

astral hinge Sep 5, 2025, 6:17 AM

#

hmm in debug mode you could make the allocate function a macro that records the source info too
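A sketch of that idea, assuming a hypothetical `ALLOC` macro: every call site passes `__FILE__` and `__LINE__` into a tracking function, so a memory report can point at the exact line that allocated. `malloc` stands in for the author's arena allocator here; a release build would define `ALLOC(n)` as the plain allocation call.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* One record per tracked allocation (hypothetical fixed-size log). */
typedef struct { const char *file; int line; size_t bytes; } AllocRecord;

#define MAX_ALLOC_RECORDS 256
static AllocRecord alloc_records[MAX_ALLOC_RECORDS];
static size_t alloc_record_count;
static size_t alloc_total_bytes;

static void *debug_alloc(size_t bytes, const char *file, int line) {
  void *p = malloc(bytes); /* stand-in for arena_alloc */
  if (!p)
    return NULL;
  if (alloc_record_count < MAX_ALLOC_RECORDS)
    alloc_records[alloc_record_count++] = (AllocRecord){file, line, bytes};
  alloc_total_bytes += bytes;
  return p;
}

/* The macro captures the caller's source location automatically. */
#define ALLOC(bytes) debug_alloc((bytes), __FILE__, __LINE__)

static void dump_alloc_records(FILE *out) {
  for (size_t i = 0; i < alloc_record_count; i++)
    fprintf(out, "%s:%d %zu bytes\n", alloc_records[i].file, alloc_records[i].line,
            alloc_records[i].bytes);
}
```

Calling `dump_alloc_records(stdout)` at shutdown or on a hotkey prints one `file:line bytes` entry per allocation.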

cloud rivet Sep 5, 2025, 6:17 AM

#

yes that sounds very valuable

#

I should make those ms

brisk chasm Sep 5, 2025, 12:05 PM

#

what is xxx_tr? xxx_trace? or xxx_tablerow? or xxx_transient?

#

did anyone tell the c people that we have enough disk space today for storing all the characters for our source codes 🙂

cloud rivet Sep 5, 2025, 3:18 PM

#

I have a character budget because I am a responsible adult

#

waste not want not

#

Also clearly it would be xxxxx_trow and xxxxx_trs

#

smh

#

Nah

cloud rivet Sep 5, 2025, 3:26 PM

#

brisk chasm what is xxx_tr? xxx_trace? or xxx_tablerow? or xxx_transient?

I do this with subsystems, a major architectural piece in my code. It’s spammed everywhere. There aren’t that many, and they all have short versions of their names

#

There will only ever be one thing that gets to be called tr

#

The profiling code

#

I don’t want long varying names for these

#

I can spot a subsystem function easily. It’s easier to see the pattern working with my code

#

My internal functions are long and descriptive

#

These subsystem things all have a fat structure that is on the app context

#

actx->trctx->longthing.morelongerthings = bignamestuff;

#

you know what tr in trctx is because there’s only one tr

cloud rivet Sep 5, 2025, 9:07 PM

#

using StackWalker is going to be some really "works on my machine" level code

#

lol

#

#

https://github.com/JochenKalmbach/StackWalker/tree/master?tab=readme-ov-file#walking-the-callstack-of-other-threads-in-the-same-process

#

this is a how to, not a library

#

well it is a C++ library also, but I'm not using it

cloud rivet Sep 6, 2025, 12:21 AM

#

To walk the callstack of another thread inside the same process, you need to suspend the target thread (so the callstack will not change during the stack-walking).

#

heh

#

no thanks

#

I think I don't need to walk it

#

I can just get the current stack

#

the current place

astral hinge Sep 6, 2025, 12:25 AM

#

I believe you can also record the pc and then get just the current function (no call stack) from that
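Turning a bare PC into "the current function" is, at its core, a lookup in a table of symbols sorted by start address: find the last symbol starting at or below the PC and report the offset into it. The sketch below is a much simplified stand-in for what a DbgHelp lookup such as `SymFromAddr` does; the `Sym` table is hypothetical and would have to come from debug info or a linker map.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry; the table must be sorted by start. */
typedef struct { uint64_t start; uint64_t size; const char *name; } Sym;

/* Returns the symbol containing pc, or NULL if pc is in no symbol.
   On success, *displacement receives pc - start. */
static const Sym *sym_lookup(const Sym *syms, size_t n, uint64_t pc, uint64_t *displacement) {
  size_t lo = 0, hi = n; /* binary search for the first symbol with start > pc */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (syms[mid].start <= pc)
      lo = mid + 1;
    else
      hi = mid;
  }
  if (lo == 0)
    return NULL; /* pc lies below every symbol */
  const Sym *s = &syms[lo - 1];
  if (pc >= s->start + s->size)
    return NULL; /* pc falls in a gap after the symbol */
  if (displacement)
    *displacement = pc - s->start;
  return s;
}
```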

cloud rivet Sep 6, 2025, 12:25 AM

#

pc?

astral hinge Sep 6, 2025, 12:26 AM

#

program counter

brisk chasm Sep 6, 2025, 12:26 AM

#

i believe it's all in dbghlp.h

#

emphasis on hlp 😛

astral hinge Sep 6, 2025, 12:27 AM

#

it's probably SymFromAddr

cloud rivet Sep 6, 2025, 12:59 AM

#

any game worth its salt should be adding -lDbgHelp as a dep tbh misinfo

#

you know, if this game were really from scratch I wouldn't need that; I'd just write my own operating system from scratch, on a machine I built by hand from materials I mined myself, using tools I also built

echo crystal Sep 6, 2025, 1:05 AM

#

🪤

broken fog Sep 6, 2025, 1:12 AM

#

cloud rivet you know, if this game was for reals from scratch I wouldn't need that, I'd just...

the silicon part is genuinely not doable but writing your own os is

#

do it it will be fun froge_evil

cloud rivet Sep 6, 2025, 1:14 AM

#

nope

#

hard pass

#

sounds boring

#

I'm interested in graphics

astral hinge Sep 6, 2025, 1:15 AM

#

you probably need like 30 PhDs of knowledge to begin making your own computers from scratch

echo crystal Sep 6, 2025, 1:16 AM

#

broken fog the silicon part is genuinely not doable but writing your own os is

maybe fpga

cloud rivet Sep 6, 2025, 1:16 AM

#

uhh I beg to differ

#

that's a 4 bit calculator https://blog.lapinozz.com/learning/2016/11/19/calculator-with-caordboard-and-marbles.html


A 4-bit Calculator made in cardboard and marble

Might not be the most reliable or useful contraption but it's definitely a fun one.

echo crystal Sep 6, 2025, 1:17 AM

#

astral hinge you probably need like 30 PhDs of knowledge to begin making your own computers f...

and a lot of mining permits

broken fog Sep 6, 2025, 1:18 AM

#

cloud rivet that's a 4 bit calculator https://blog.lapinozz.com/learning/2016/11/19/calculat...

now run doom on it KEKW

cloud rivet Sep 6, 2025, 1:19 AM

#

well, I got a stack frame, now what? I guess I do that SymGetSymFromAddr64 thing

echo crystal Sep 6, 2025, 1:19 AM

#

pretty cool

cloud rivet Sep 6, 2025, 1:20 AM

#

I am pretty sure I will have to download the latest debughell.dll

#

for this to work

echo crystal Sep 6, 2025, 1:20 AM

#

I also saw a video of someone doing a computer made with water

broken fog Sep 6, 2025, 1:21 AM

#

mechanical computers are cool

wraith urchin Sep 6, 2025, 1:22 AM

#

echo crystal I also saw a video of someone doing a computer made with water

Was it this one? https://www.youtube.com/watch?v=IxXaizglscw

YouTube

Steve Mould

I Made A Water Computer And It Actually Works


Computers add numbers together using logic gates built out of transistors. But they don't have to be! They can be built out of greedy cup siphons instead! I used specially designed siphones to works as XOR and AND gates and chained them...


wraith urchin Sep 6, 2025, 1:23 AM

#

cloud rivet that's a 4 bit calculator https://blog.lapinozz.com/learning/2016/11/19/calculat...

That is extremely cool

echo crystal Sep 6, 2025, 1:24 AM

#

yes

#

i love standup maths (featured in this video)

cloud rivet Sep 6, 2025, 1:34 AM

#

I should get SymGetLineFromAddr64 also

cloud rivet Sep 6, 2025, 2:04 AM

#

tracy calls SuspendThread

#

https://github.com/wolfpld/tracy/blob/6e214cab0a046a87e5cb9d611b13ebfcb853aa23/public/client/TracyProfiler.cpp#L900

#

When a crash occurs, execution in the crashing thread is redirected to the handler that was set earlier. The handler lists all threads running in the program and one by one pauses their execution, leaving only two threads (footnote: There is actually a race, which can result in another thread starting executing, as suspending all threads is not an atomic operation.) in a running state: the crashed thread, which is executing the crash handler and the profiler worker thread. This is done either by calling the SuspendThread() procedure on Windows, or sending the unused SIGPWR signal; during profiler setup another handler was installed for this signal, one that enters an infinite sleep loop.

#

only for crashes

#

nm

#

anyway

#

it works

#

#

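#include <windows.h>
#include <dbghelp.h> // link against DbgHelp (-lDbgHelp)
#include <string.h>

// Assumes SymInitialize(GetCurrentProcess(), NULL, TRUE) ran once at startup.
void capture_stack_tr(Arena *arena) {
  if (!arena)
    fatal("no arena");

  CONTEXT context;
  RtlCaptureContext(&context);
  DEBUG_PRINT("capturing stack %llx\n", (unsigned long long)context.Rsp);

  // Zero the frame so fields StackWalk64 reads but we don't set are not garbage.
  STACKFRAME64 StackFrame;
  memset(&StackFrame, 0, sizeof(StackFrame));
  StackFrame.AddrPC.Offset = context.Rip;
  StackFrame.AddrPC.Mode = AddrModeFlat;
  StackFrame.AddrFrame.Offset = context.Rsp;
  StackFrame.AddrFrame.Mode = AddrModeFlat;
  StackFrame.AddrStack.Offset = context.Rsp;
  StackFrame.AddrStack.Mode = AddrModeFlat;

  // The first step reports the frame the context was captured in, i.e. this function.
  if (!StackWalk64(IMAGE_FILE_MACHINE_AMD64,
                   GetCurrentProcess(),
                   GetCurrentThread(),
                   &StackFrame,
                   &context,
                   NULL,
                   SymFunctionTableAccess64,
                   SymGetModuleBase64,
                   NULL)) {
    fatal("stackwalk failed");
  }
  DWORD64 dwaddress = StackFrame.AddrPC.Offset;
  DWORD64 dwDisplacement = 0;
  // IMAGEHLP_SYMBOL64 ends in a 1-char Name array; reserve room for the name after it.
  const DWORD maxNameLength = 1024;
  const size_t symSize = sizeof(IMAGEHLP_SYMBOL64) + maxNameLength;
  IMAGEHLP_SYMBOL64 *pSym = arena_alloc(arena, symSize);
  if (!pSym)
    fatal("OOM");
  memset(pSym, 0, symSize);
  pSym->SizeOfStruct = sizeof(IMAGEHLP_SYMBOL64); // required input; Size is an output field
  pSym->MaxNameLength = maxNameLength;
  if (!SymGetSymFromAddr64(GetCurrentProcess(), dwaddress, &dwDisplacement, pSym)) {
    DEBUG_PRINT("nope %lu\n", GetLastError());
  } else {
    DEBUG_PRINT("yup %s\n", pSym->Name);
  }
}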

#

yup capture_stack_tr

#

void capture_stack_tr(Arena *arena) {

#

now I guess I just yolo capture the stack from another thread lol

#

I think that might end up with use after free? 😨

#

I think this is UB tbh

#

hrm all I need is the address

#

I'll just try it

#

tracy doesn't use RtlCaptureContext

#

wtf is it doing

#

https://github.com/wolfpld/tracy/blob/6e214cab0a046a87e5cb9d611b13ebfcb853aa23/public/client/TracyCallstack.hpp

#

https://github.com/wolfpld/tracy/blob/6e214cab0a046a87e5cb9d611b13ebfcb853aa23/public/client/TracyCallstack.cpp#L1148-L1161

#

so just have to follow where it gets the address from

#

cheating!

#

tracy uses libunwind

broken fog Sep 6, 2025, 2:26 AM

#

cloud rivet

huh what debugger is that

cloud rivet Sep 6, 2025, 2:26 AM

#

it's the rad debugger

broken fog Sep 6, 2025, 2:26 AM

#

that's rad

#

(sorry)

cloud rivet Sep 6, 2025, 2:26 AM

#

it's just windows atm I think

#

it's called rad because it's made by the RAD Game Tools team inside Epic

#

idk how you get a job where you can just make an open source win32 debugger at a company like Epic

#

All DbgHelp functions, such as this one, are single threaded. Therefore, calls from more than one thread to this function will likely result in unexpected behavior or memory corruption. To avoid this, you must synchronize all concurrent calls from more than one thread to this function.

#

well that's fine

#

A handle to the thread for which the stack trace is generated. If the caller supplies a valid callback pointer for the ReadMemoryRoutine parameter, then this value does not have to be a valid thread handle. It can be a token that is unique and consistently the same for all calls to the StackWalk64 function.

#

well

#

https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadcontext

#

You cannot get a valid context for a running thread. Use the SuspendThread function to suspend the thread before calling GetThreadContext.

#

I don't think this is possible

#

with windows apis

cloud rivet Sep 6, 2025, 2:53 AM

#

fuck all this

cloud rivet Sep 6, 2025, 3:11 AM

#

it was cool getting the current process's stack though

#

I'm going to write a macro that just uses my existing profiling and wraps function calls I think are slow

#

I can track nested profiling since it maintains a state in the trace context
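A sketch of how a trace context can keep nesting state: a small stack of currently open zones means each zone records its parent and depth when it begins, and `tr_end` can assert that begins and ends pair up. `TrContext`, `tr_begin`, `tr_end` and `TR_CALL` are hypothetical names, and `timespec_get` stands in for the author's `QueryPerformanceCounter` timing.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define TR_MAX_DEPTH 32
#define TR_MAX_EVENTS 1024

typedef struct { uint32_t id; int parent; int depth; uint64_t start_ns, end_ns; } TrEvent;

typedef struct {
  TrEvent events[TR_MAX_EVENTS];
  int count;
  int open[TR_MAX_DEPTH]; /* indices into events of zones not yet ended */
  int depth;
} TrContext;

static uint64_t tr_now_ns(void) {
  struct timespec ts;
  timespec_get(&ts, TIME_UTC); /* stand-in clock for the sketch */
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void tr_begin(TrContext *c, uint32_t id) {
  assert(c->depth < TR_MAX_DEPTH && c->count < TR_MAX_EVENTS);
  TrEvent *e = &c->events[c->count];
  e->id = id;
  e->parent = c->depth > 0 ? c->open[c->depth - 1] : -1; /* -1 = top level */
  e->depth = c->depth;
  e->start_ns = tr_now_ns();
  c->open[c->depth++] = c->count++;
}

static void tr_end(TrContext *c, uint32_t id) {
  assert(c->depth > 0);
  TrEvent *e = &c->events[c->open[--c->depth]];
  assert(e->id == id); /* catches a mismatched begin/end pair */
  e->end_ns = tr_now_ns();
}

/* Wrap a call you suspect is slow: TR_CALL(ctx, 3, do_thing(a, b)); */
#define TR_CALL(ctx, id, call) \
  do { tr_begin((ctx), (id)); call; tr_end((ctx), (id)); } while (0)
```

Because each event stores `parent` and `depth`, a perf window can indent child zones under their parents instead of listing them flat.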

#

I learned a lot of stuff

#

event tracing is probably the thing to use for this at some point

cloud rivet Sep 6, 2025, 3:37 AM

#

I'm going to add profiling info to my render graph so it records how long each part of the graph took

cloud rivet Sep 6, 2025, 4:07 AM

#

even with event tracing I'd have to instrument my code

cloud rivet Sep 6, 2025, 5:53 AM

#

#

it all makes sense

#

cool

#

ok, I'm just getting data still

#

I'll think about solutions later

#

really happy with my render graph tbh, it was easy to hook perf onto it

#

this is my render graph code now:

void render_gfx(AppContext *actx, Arena *arena) {
  if (!arena)
    fatal("render_gfx: null arena");
  if (!actx)
    fatal("render_gfx: null actx");
  if (!actx->gctx)
    fatal("render_gfx: null gctx");

  if (actx->gctx->minimized)
    return;

  start_render_graph_tr(actx, arena);
  render_node_t *current_node = actx->gctx->render_graph;
  while (true) {
    render_node_t *next_node = pump_graph(actx, arena, current_node);
    add_render_graph_tr(actx, arena, (u32)current_node->render_node_type, current_node->name);
    if (!next_node)
      break;
    current_node = next_node;
  }
  end_render_graph_tr(actx, arena);
  return;
}

cloud rivet Sep 6, 2025, 6:33 AM

#

#

@elfin cape is correct, filling that full bitmap takes no time at all; it's all the other stuff

#

ok next thing I have to build is getting the average time spent in a function and the full time spent in a function per frame

#

this is all cpu bound

#

pretty sure?

#

I don't know actually

#

this is great: I was going to do a thing where I don't draw the background in areas where I drew the window, and now I know that wouldn't help much at all, since I'd be saving on the least expensive part of the rasterization

#

that app perf window is hard to read I should nest the zones

cloud rivet Sep 6, 2025, 7:00 AM

#

I actually need a way to do that for any zone, find the average time in the zone, the total time spent in the zone per frame
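A sketch of the two numbers described here, kept per zone: every time the zone closes its duration is added to the current frame's total and call count; at the end of a frame those are folded into run-wide totals and reset. From that you get both the average cost of one entry into the zone and the average time the zone costs per frame. `ZoneStats` and its functions are hypothetical.

```c
#include <stdint.h>

typedef struct {
  double frame_total_ms;      /* time spent in the zone during the current frame */
  uint32_t frame_calls;       /* how many times the zone ran this frame */
  double last_frame_total_ms; /* the previous frame's figures, for display */
  uint32_t last_frame_calls;
  double run_total_ms;        /* accumulated over all finished frames */
  uint32_t run_calls;
  uint32_t frames;
} ZoneStats;

/* Call when a zone closes, with its measured duration. */
static void zone_record(ZoneStats *z, double ms) {
  z->frame_total_ms += ms;
  z->frame_calls++;
}

/* Call once per frame, after all zones for that frame have closed. */
static void zone_end_frame(ZoneStats *z) {
  z->last_frame_total_ms = z->frame_total_ms;
  z->last_frame_calls = z->frame_calls;
  z->run_total_ms += z->frame_total_ms;
  z->run_calls += z->frame_calls;
  z->frames++;
  z->frame_total_ms = 0.0;
  z->frame_calls = 0;
}

/* Average cost of a single entry into the zone. */
static double zone_avg_per_call_ms(const ZoneStats *z) {
  return z->run_calls ? z->run_total_ms / z->run_calls : 0.0;
}

/* Average total time the zone costs per frame. */
static double zone_avg_per_frame_ms(const ZoneStats *z) {
  return z->frames ? z->run_total_ms / z->frames : 0.0;
}
```

For a zone inside a per-pixel loop the per-call average will be tiny while the per-frame total is large, which is exactly the distinction that makes hot loops stand out.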

cloud rivet Sep 6, 2025, 8:56 PM

#

#

now need to do the average & total time per frame thing for each zone

#

I was thinking of just using the mapped host visible vk buffer for the bitmap instead of a cpu bitmap to avoid a copy

#

I still have to get more data, but I should cache the characters also

elfin cape Sep 6, 2025, 9:01 PM

#

Are you using a scanline algorithm for rasterizing a triangle?

cloud rivet Sep 6, 2025, 9:02 PM

#

it's just a really dumb bounding box for the triangle, then do a barycentric coordinate check for each pixel, I'm going to get to that
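For reference, a minimal version of the bounding-box approach: loop over the triangle's clamped bounding box, sample each pixel centre, and compute barycentric weights from edge functions (each edge function is twice the signed area of a sub-triangle, so dividing by the full triangle's value gives the weights and also normalises for winding order). This is a generic sketch, not the author's `pixel_to_sc`/`barycentric_coords` code; the names and the single flat colour are assumptions.

```c
#include <stdint.h>

typedef struct { float x, y; } Vec2;

/* Twice the signed area of triangle (a, b, p): which side of edge a->b p is on. */
static float edge_fn(Vec2 a, Vec2 b, Vec2 p) {
  return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

static float min3(float a, float b, float c) { float m = a < b ? a : b; return m < c ? m : c; }
static float max3(float a, float b, float c) { float m = a > b ? a : b; return m > c ? m : c; }
static int floor_i(float v) { int i = (int)v; return (float)i > v ? i - 1 : i; }
static int ceil_i(float v) { int i = (int)v; return (float)i < v ? i + 1 : i; }

/* Writes `color` to every covered pixel of a w*h bitmap; returns the count. */
static int raster_triangle(uint32_t *bitmap, int w, int h, Vec2 v0, Vec2 v1, Vec2 v2,
                           uint32_t color) {
  float area = edge_fn(v0, v1, v2);
  if (area == 0.0f)
    return 0; /* degenerate triangle */
  int x0 = floor_i(min3(v0.x, v1.x, v2.x)), x1 = ceil_i(max3(v0.x, v1.x, v2.x));
  int y0 = floor_i(min3(v0.y, v1.y, v2.y)), y1 = ceil_i(max3(v0.y, v1.y, v2.y));
  if (x0 < 0) x0 = 0;
  if (y0 < 0) y0 = 0;
  if (x1 > w) x1 = w;
  if (y1 > h) y1 = h;
  int written = 0;
  for (int y = y0; y < y1; y++) {
    for (int x = x0; x < x1; x++) {
      Vec2 p = {x + 0.5f, y + 0.5f}; /* sample at the pixel centre */
      float b0 = edge_fn(v1, v2, p) / area;
      float b1 = edge_fn(v2, v0, p) / area;
      float b2 = edge_fn(v0, v1, p) / area;
      if (b0 >= 0.0f && b1 >= 0.0f && b2 >= 0.0f) {
        bitmap[y * w + x] = color;
        written++;
      }
    }
  }
  return written;
}
```

Every pixel in the box pays for three edge evaluations even when most of the box is empty, which is why large, thin triangles are the worst case for this approach.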

#

I knew that would be slow

elfin cape Sep 6, 2025, 9:03 PM

#

just checking froge

cloud rivet Sep 6, 2025, 9:05 PM

#

oh the render gfx should be nested under run app

#

once I have the data for per frame averages/total time of the slow stuff that runs in a loop I can start finally working on actually improving the perf

cloud rivet Sep 7, 2025, 12:13 AM

#

I might dump clang language extension vectors and matrices, they are a bit wonky I think

#

as in

#

they create weird padding and alignment issues on structs?

#

it's very strange
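One likely source of that padding: with GCC and Clang a 16-byte vector type is 16-byte aligned, so a struct that puts one after a lone `float` gets 12 bytes of padding, whereas a plain `float[4]` only needs 4-byte alignment. Clang's `ext_vector_type(3)` floats are, as far as I know, also stored in 16 bytes, which is another common surprise. The sketch below uses the `vector_size` attribute because both compilers accept it; the struct names are illustrative.

```c
#include <stddef.h>

/* A 4 x float SIMD vector; GCC and Clang give it 16-byte alignment. */
typedef float v4f __attribute__((vector_size(16)));

struct Mixed {
  float weight; /* offset 0, 4 bytes */
  v4f pos;      /* must start on a 16-byte boundary: 12 bytes of padding first */
};              /* sizeof == 32, alignment 16 */

struct Plain {
  float pos[4]; /* plain array: 4-byte alignment, no padding needed */
  float weight;
};              /* sizeof == 20, alignment 4 */
```

Checking `sizeof`, `offsetof` and `_Alignof` (or looking at the layout in Compiler Explorer) is a quick way to see where the extra bytes come from.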

broken fog Sep 7, 2025, 12:15 AM

#

but writing vecmath in c without them is agonyfrog

cloud rivet Sep 7, 2025, 12:16 AM

#

what are you using?

#

I don't really care about operator overloading

#

sure it's more readable to use operators, it's not the end of the world

broken fog Sep 7, 2025, 12:18 AM

#

cloud rivet what are you using?

i'm writing cipipi anyway but i'm using apple's simd math lib which i'm fairly sure relies on that clang extension

cloud rivet Sep 7, 2025, 12:22 AM

#

ah

#

could just be a me issue

#

I guess I could look at compiler explorer the next time I run into a problem and try to understand it better

cloud rivet Sep 7, 2025, 1:17 AM

#

#

hrm

#

I'm not doing that right

#

#

thinkeyes

cloud rivet Sep 7, 2025, 1:47 AM

#

#

ok

#

it's just for loops on the rasterization I think

#

the total time inside the loop is far less time than just the work of iterating the loop

#

unless I have a bug in this logic, which I don't think so

cloud rivet Sep 7, 2025, 2:05 AM

#

#

          start_tr(actx, arena, actx->crctx->traces[18]); // x loop start
          {
            for (i32 x = x_min; x < x_max; x++) {
              start_tr(actx, arena, actx->crctx->traces[21]); // bb col start
              {
                float4 pos = pixel_to_sc(actx->crctx->bitmap_width, actx->crctx->bitmap_height, (f32)x, (f32)y);
                start_tr(actx, arena, actx->crctx->traces[23]);
                float4 bc = barycentric_coords(pos, t1);
                end_tr(actx, arena, actx->crctx->traces[23]);
                if (bc.x >= 0.f && bc.y >= 0.f && bc.z >= 0.f) {
                  rgba_to_rgba16(bc_mix(t1.colors, bc), pixel);
                }
                pixel++;
              }
              end_tr(actx, arena, actx->crctx->traces[21]); // bb col end
            }
          }
          end_tr(actx, arena, actx->crctx->traces[18]); // x loop end

#

it's just that for loop itself

#

for (i32 x = x_min; x < x_max; x++) {

#

idk

#

nesting these traces is really error prone

#

I need a better way
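One common C idiom for making begin/end pairs hard to get wrong is a block macro built from a `for` loop that runs its body exactly once, calling the begin function in the initialiser and the end function in the increment. `TR_ZONE`, `zone_begin` and `zone_end` below are hypothetical stand-ins for `start_tr`/`end_tr`. The caveat: `break`, `return` or `goto` out of the body skips the end call.

```c
#include <assert.h>

/* Hypothetical counters standing in for start_tr / end_tr. */
static int tr_open_zones;
static int tr_completed_zones;
static void zone_begin(int id) { (void)id; tr_open_zones++; }
static void zone_end(int id) { (void)id; tr_open_zones--; tr_completed_zones++; }

#define TR_CONCAT_(a, b) a##b
#define TR_CONCAT(a, b) TR_CONCAT_(a, b)

/* TR_ZONE(id) { ... } calls zone_begin(id), runs the block once, then calls
   zone_end(id). The per-line variable name avoids shadowing when nested. */
#define TR_ZONE(id)                                               \
  for (int TR_CONCAT(tr_once_, __LINE__) = (zone_begin(id), 1);   \
       TR_CONCAT(tr_once_, __LINE__);                             \
       TR_CONCAT(tr_once_, __LINE__) = (zone_end(id), 0))
```

Nested zones then read like ordinary blocks, e.g. `TR_ZONE(18) { for (...) { TR_ZONE(21) { ... } } }`, and the matching end can never be forgotten or given the wrong id.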

#

#

#

anyway

#

I think I got it

#

enough to go on

#

the more traces I showed in the UI the slower it got lol kekkedsadge

#

now numbers just have to go up

#

or down

cloud rivet Sep 7, 2025, 2:45 AM

#

#

got it a little bit faster

#

            for (i32 x = x_min; x < x_max; x += 6) {
+              f32 x_f = (f32)x;
               start_tr(actx, arena, actx->crctx->traces[20]); // bb col start
-              {
-                float4 pos = pixel_to_sc(actx->crctx->bitmap_width, actx->crctx->bitmap_height, (f32)x, (f32)y);
-                start_tr(actx, arena, actx->crctx->traces[23]);
-                float4 bc = barycentric_coords(pos, t1);
-                end_tr(actx, arena, actx->crctx->traces[23]);
-                if (bc.x >= 0.f && bc.y >= 0.f && bc.z >= 0.f) {
-                  rgba_to_rgba16(bc_mix(t1.colors, bc), pixel);
-                }
-                pixel++;
-              }
+              cr_test_triangle_pix(actx, arena, pixel, x_max_f, x_f, y_f);
+              cr_test_triangle_pix(actx, arena, pixel + 1, x_max_f, x_f + 1.f, y_f);
+              cr_test_triangle_pix(actx, arena, pixel + 2, x_max_f, x_f + 2.f, y_f);
+              cr_test_triangle_pix(actx, arena, pixel + 3, x_max_f, x_f + 3.f, y_f);
+              cr_test_triangle_pix(actx, arena, pixel + 4, x_max_f, x_f + 4.f, y_f);
+              cr_test_triangle_pix(actx, arena, pixel + 5, x_max_f, x_f + 5.f, y_f);
+              pixel += 6;
               end_tr(actx, arena, actx->crctx->traces[20]); // bb col end
             }

#

adding more doesn't really make it much faster

#

I'll add something to avoid areas that I know won't have any part of the triangle in it

#

that should reduce things

#

I can do these concurrently too

#

and use simd I guess

#

I'm going to look at my text rendering next that's also slow af

#

for the same reason

#

hrm I just need to do less here with the characters, by caching them I think

#

I shaved like 5 ms by doing the same thing to chars, only that's more complex and breaks character rendering

#

I'm going to add character caching

#

it's a whole lot of just do less
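A sketch of character caching, assuming fixed 8×8 cells: each printable ASCII glyph is rasterised once into a coverage bitmap the first time it is needed, and drawing text afterwards is just copying cached rows. `GlyphCache`, `rasterize_glyph` (a placeholder pattern, not real font code) and the cell size are all hypothetical, and a real renderer would blend coverage with a colour instead of copying it.

```c
#include <stdint.h>
#include <string.h>

#define GLYPH_W 8
#define GLYPH_H 8
#define FIRST_GLYPH 32 /* ' ' */
#define GLYPH_COUNT 95 /* ' ' through '~' */

typedef struct {
  uint8_t coverage[GLYPH_COUNT][GLYPH_H][GLYPH_W];
  uint8_t ready[GLYPH_COUNT];
  int rasterize_calls; /* counts cache misses, for demonstration */
} GlyphCache;

/* Placeholder for the expensive per-glyph rasterisation. */
static void rasterize_glyph(char c, uint8_t out[GLYPH_H][GLYPH_W]) {
  for (int y = 0; y < GLYPH_H; y++)
    for (int x = 0; x < GLYPH_W; x++)
      out[y][x] = (uint8_t)(((c + x + y) & 1) ? 255 : 0);
}

/* Returns the glyph's coverage rows (stride GLYPH_W), rasterising on first use. */
static const uint8_t *glyph_get(GlyphCache *gc, char c) {
  int i = (unsigned char)c - FIRST_GLYPH;
  if (i < 0 || i >= GLYPH_COUNT)
    i = 0; /* unsupported characters use the space slot */
  if (!gc->ready[i]) {
    rasterize_glyph((char)(i + FIRST_GLYPH), gc->coverage[i]);
    gc->ready[i] = 1;
    gc->rasterize_calls++;
  }
  return &gc->coverage[i][0][0];
}

/* Copies cached glyphs into an 8-bit target; no clipping is performed. */
static void draw_text(GlyphCache *gc, uint8_t *dst, int dst_w, int x, int y, const char *s) {
  for (; *s; s++, x += GLYPH_W) {
    const uint8_t *g = glyph_get(gc, *s);
    for (int row = 0; row < GLYPH_H; row++)
      memcpy(&dst[(y + row) * dst_w + x], &g[row * GLYPH_W], GLYPH_W);
  }
}
```

Repeated characters cost one `memcpy` per row after the first use, which is where the "just do less" saving comes from.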

broken fog Sep 7, 2025, 4:15 AM

#

cloud rivet I can do these concurrently too

is your renderer multithreaded yet?

#

that's an easy win if it's not

#

well, "easy", you know what i mean KEKW

cloud rivet Sep 7, 2025, 4:32 AM

#

I don't want to do that yet

#

I think there's a way to get this to be faster just single threaded

#

one thing is to just reduce the scale, I probably shouldn't be going 4k with a cpu software rasterizer

broken fog Sep 7, 2025, 4:38 AM

#

ye KEKW

broken fog Sep 7, 2025, 4:38 AM

#

cloud rivet I think there's a way to get this to be faster just single threaded

yeah, fair enough

#

mt will add complexity so getting it to be as fast as possible single threaded is a good idea

#

but keep in mind some optimizations may be very well suited to a multithreaded renderer

cloud rivet Sep 7, 2025, 4:57 AM

#

yes, I plan on getting there

broken fog Sep 7, 2025, 5:07 AM

#

cloud rivet one thing is to just reduce the scale, I probably shouldn't be going 4k with a c...

btw a super high res is probably good for testing perf

#

any changes will be far more obvious

#

your idea about avoiding empty areas of the tri is pretty solid, you could look into dividing the tri into tiles or something like that

#

just thinking out loud but maybe a lil quadtree (2 or 3 levels depending on the size of the tri, no more) could speed things up significantly

cloud rivet Sep 7, 2025, 5:15 AM

#

yes

#

I'm going to get rid of the ability to scale a window; it looks dumb and adds a lot of complexity

cloud rivet Sep 7, 2025, 5:36 AM

#

I can't really add the tracing code I have into the hot loops; the profiling itself distorts the profiling

#

well

#

I did learn something from it

cloud rivet Sep 7, 2025, 6:43 AM

#

doing much better already

#

haven't even done the character caching yet

#

was at 28 fps before with this amount of text

#

with less text actually, and less profiling, so it's faster now with more text and more profiling

#

yeah I am going to cache the characters next

#

the next thing I'll do is try to use the host visible vk buffer for the bitmap instead of a separate set of bytes

#

I'm hoping I can get 3ms per frame out of those changes

#

after that I'll add a slider ui widget

#

and then I'll use that to dynamically adjust the scale to find a good perf for this window size

#

if I shrink the window to this

#

I get 60 fps

#

#

hrm

#

I think the windows capture drops the frames a little

#

obs also drops the frames a little

#

after font caching, a good scale, I'll add multithreading

#

font caching + vk buffer, scale and multithreading

#

and the multithreading will I think be generating frames in the background, thinking I'll have a fif type deal

#

once I load more triangles I'll probably have to resort to simd and 32 byte aligned operations

cloud rivet Sep 7, 2025, 9:36 AM

#

#

from:

#

to

#

rendering even more text

#

my windows and ui basically don't cost anything anymore

#

before if I added text and made the windows bigger the fps would drop massively

#

tomorrow I'll try and use the vk host visible buffer as the bitmap

#

and see if that makes things faster

#

then I'll add multithreading and go back to working on adding some ui features and start actually doing some 3D maybe, finally

elfin cape Sep 7, 2025, 9:52 AM

#

what was the thing that was slowing down the text so much? 9ms -> 0.6ms speed up is really nice

cloud rivet Sep 7, 2025, 9:54 AM

#

I had unnecessary bounds checking, I removed the scaling code, and then I added font caching

#

I reduced the size of the function that rendered text quite a bit

elfin cape Sep 7, 2025, 9:55 AM

#

cloud rivet tomorrow I'll try and use the vk host visible buffer as the bitmap

about that, I read a really long time ago that it would be slower, but it would be nice to benchmark it

cloud rivet Sep 7, 2025, 9:56 AM

#

as long as it's faster than copying the full bitmap to the vk buffer it should be a win I hope

#

it's costing me a full ms

elfin cape Sep 7, 2025, 9:56 AM

#

agonyfrog

cloud rivet Sep 7, 2025, 9:57 AM

#

that's not even the submit,

#

I think the submit must be part of the wait? I don't know

elfin cape Sep 7, 2025, 9:57 AM

#

ngl I would do normal hw raster KEKW

cloud rivet Sep 7, 2025, 9:57 AM

#

oh you mean just use the win32 apis?

elfin cape Sep 7, 2025, 9:58 AM

#

render triangles using vulkan KEKW

cloud rivet Sep 7, 2025, 9:58 AM

#

ohh

#

yeah I am going to do that too

elfin cape Sep 7, 2025, 9:58 AM

#

oh okay

cloud rivet Sep 7, 2025, 9:58 AM

#

I have a gpu software raster pipeline set up

#

and a graphics pipeline

#

and a rt pipeline

#

I'm going to just work on them all

#

when I get bored with one I work on the other

cloud rivet Sep 7, 2025, 10:18 AM

#

oh another thing I did to make ui render faster is just draw the windows bg all at once

#

and I write the first row to memory and then just copy it to the rest of the rows
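A sketch of that fill: write one row of the rectangle pixel by pixel, then `memcpy` it into every remaining row, so the per-pixel loop runs only `w` times no matter how tall the window is. The function name and the pitch-in-pixels convention are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills a w x h rectangle at (x, y) in a 32-bit bitmap whose rows are
   `pitch` pixels apart. Assumes the rectangle is already clipped. */
static void fill_rect(uint32_t *bitmap, int pitch, int x, int y, int w, int h, uint32_t color) {
  if (w <= 0 || h <= 0)
    return;
  uint32_t *first = bitmap + (size_t)y * pitch + x;
  for (int i = 0; i < w; i++) /* build the first row once */
    first[i] = color;
  for (int row = 1; row < h; row++) /* then copy it down; rows never overlap */
    memcpy(first + (size_t)row * pitch, first, (size_t)w * sizeof *first);
}
```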

#

that made things go faster too. I had all this complex, and buggy, logic with padding and nonsense, and I just removed it all: I draw a square and then render the font on top of it, instead of what I was doing before, where I drew some part of the window, then some font, then the rest of the line, etc

cloud rivet Sep 7, 2025, 9:34 PM

#

I think I need the concept of a pixel

#

right now I'm doing a serial process: here's my current pixel, let me convert it to coordinates, now let me compute barycentric coordinates, does it pass or fail? mix a color, write it to the bitmap, move on to the next pixel

#

what I think I should do is actually uh

#

break all that up into layers

#

get a bunch of pixels all together at once

#

get their coordinates

#

get their barycentric coordinates

#

do the check

#

mix a color

#

write to bitmap

#

then I can do simd I think

#

the barycentric coordinate math would be a bunch of instructions based on how that's currently written

#

as would the mix

#

so I have to rewrite those

#

I think I should do this
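A sketch of the "layers" version for a batch of 8 pixels in one row: each stage (pixel centres, barycentric weights, inside test, colour mix and write) is its own loop over flat float arrays, which is the shape compilers can auto-vectorise and that maps directly onto SIMD intrinsics later. The edge equations are precomputed per triangle as `a*px + b*py + c`, so the weight stage is just multiply-adds. All names, the batch size and the packed 0xRRGGBB output are assumptions.

```c
#include <stdint.h>

#define BATCH 8

typedef struct { float x, y; } Vec2;
typedef struct { float a, b, c; } EdgeEq; /* E(px, py) = a*px + b*py + c */

/* Edge function of p0->p1 rewritten as a linear equation in (px, py). */
static EdgeEq edge_eq(Vec2 p0, Vec2 p1) {
  EdgeEq e = {p0.y - p1.y, p1.x - p0.x, p0.x * p1.y - p0.y * p1.x};
  return e;
}

/* Per-triangle setup; returns 1/area, or 0 for a degenerate triangle. */
static float setup_edges(Vec2 v0, Vec2 v1, Vec2 v2, EdgeEq e[3]) {
  e[0] = edge_eq(v1, v2);
  e[1] = edge_eq(v2, v0);
  e[2] = edge_eq(v0, v1);
  float area = e[2].a * v2.x + e[2].b * v2.y + e[2].c;
  return area != 0.f ? 1.f / area : 0.f;
}

/* Shades pixels x0..x0+BATCH-1 of row y; out[i] receives 0xRRGGBB for covered
   pixels and is left untouched otherwise. Returns the number covered. */
static int shade_batch(const EdgeEq e[3], float inv_area, const float colors[3][3],
                       int x0, int y, uint32_t *out) {
  if (inv_area == 0.f)
    return 0;
  float px[BATCH], w0[BATCH], w1[BATCH], w2[BATCH];
  int mask[BATCH];
  float py = (float)y + 0.5f;
  for (int i = 0; i < BATCH; i++) /* stage 1: pixel centres */
    px[i] = (float)(x0 + i) + 0.5f;
  for (int i = 0; i < BATCH; i++) { /* stage 2: barycentric weights */
    w0[i] = (e[0].a * px[i] + e[0].b * py + e[0].c) * inv_area;
    w1[i] = (e[1].a * px[i] + e[1].b * py + e[1].c) * inv_area;
    w2[i] = (e[2].a * px[i] + e[2].b * py + e[2].c) * inv_area;
  }
  for (int i = 0; i < BATCH; i++) /* stage 3: inside test */
    mask[i] = w0[i] >= 0.f && w1[i] >= 0.f && w2[i] >= 0.f;
  int covered = 0;
  for (int i = 0; i < BATCH; i++) { /* stages 4-5: mix colour, write */
    if (!mask[i])
      continue;
    uint32_t rgb = 0;
    for (int ch = 0; ch < 3; ch++) {
      float v = w0[i] * colors[0][ch] + w1[i] * colors[1][ch] + w2[i] * colors[2][ch];
      if (v > 1.f)
        v = 1.f; /* guard against rounding pushing a channel past 255 */
      rgb = (rgb << 8) | (uint32_t)(v * 255.0f + 0.5f);
    }
    out[i] = rgb;
    covered++;
  }
  return covered;
}
```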

#

I am going to test the vk buffer host visible mapping, maybe that's slow? I don't know

#

I'm also looking into hardware counters for profiling

#

specifically for my hardware

brisk chasm Sep 7, 2025, 9:43 PM

#

QueryPerformanceCounter/Frequency

cloud rivet Sep 7, 2025, 9:44 PM

#

I have that already

#

it's slow af

#

I need to sample

#

I think

#

I need to figure out how to do them periodically

#

I am currently looking at model specific registers for intel though

#

downloading the intel docs to read on my ipad https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

#

there's also this https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

#

https://icl.utk.edu/papi/
https://github.com/intel/pcm

#

apparently my hardware sucks, Core i5-10400, I want all the new things

#

the AMX stuff sadcat

#

let me try this crazy vulkan buffer idea

#

maybe it is slow

#

I think I need to actually make a render graph node for the cpu rasterizer for this to work

#

since I have a per-frame-in-flight (FIF) staging buffer

#

honestly doing the UI and cpu raster in a render graph makes a lot of sense thinkeyes

#

it is rendering

astral hinge Sep 7, 2025, 10:11 PM

#

cloud rivet I am going to test the vk buffer host visible mapping, maybe that's slow? I don'...

it'll only be slow if the memory is device local and you have a dGPU

#

also, what GPUs do is hierarchically rasterize blocks of pixels

#

so you start by testing all the large squares of pixels the triangle's AABB touches, then narrow it down a few times

#

at the lowest level it might be 8x8 or something

#

which you can SIMD-ify pretty well
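A rough sketch of that hierarchical idea with a single level of 8x8 blocks (real GPUs use more levels). Blocks inside the triangle's AABB are tested at their corner pixel centers: a block is rejected if all four corners are outside one edge, accepted whole if all corners are inside every edge, and only partially covered blocks fall back to per-pixel tests. It counts coverage instead of writing pixels to stay short; all names are hypothetical, and it assumes a vertex order with positive signed area.

```c
#include <stdbool.h>

/* Hypothetical point type for this sketch. */
typedef struct { float x, y; } pt;

#define TILE 8

/* Edge function: >= 0 when (px, py) is on the inner side of edge a->b,
   assuming the triangle's vertex order gives it a positive signed area. */
static float edge_fn(pt a, pt b, float px, float py)
{
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

static bool covers(const pt v[3], float px, float py)
{
    return edge_fn(v[0], v[1], px, py) >= 0.0f &&
           edge_fn(v[1], v[2], px, py) >= 0.0f &&
           edge_fn(v[2], v[0], px, py) >= 0.0f;
}

static int clampi(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

/* Count pixels of a w x h target (w, h > 0) covered by the triangle. */
int count_covered_tiled(const pt v[3], int w, int h)
{
    float minx = v[0].x, maxx = v[0].x, miny = v[0].y, maxy = v[0].y;
    for (int i = 1; i < 3; i++) {
        if (v[i].x < minx) minx = v[i].x;
        if (v[i].x > maxx) maxx = v[i].x;
        if (v[i].y < miny) miny = v[i].y;
        if (v[i].y > maxy) maxy = v[i].y;
    }
    /* Tile-aligned start and last pixel of the AABB, clamped to the target. */
    int x_start = clampi((int)minx, 0, w - 1) / TILE * TILE;
    int y_start = clampi((int)miny, 0, h - 1) / TILE * TILE;
    int x_last = clampi((int)maxx, 0, w - 1);
    int y_last = clampi((int)maxy, 0, h - 1);

    int count = 0;
    for (int ty = y_start; ty <= y_last; ty += TILE)
        for (int tx = x_start; tx <= x_last; tx += TILE) {
            int x1 = tx + TILE < w ? tx + TILE : w;   /* exclusive bounds */
            int y1 = ty + TILE < h ? ty + TILE : h;
            /* Corner pixel centers: edge functions are linear, so their
               extremes over the block's pixel centers occur at these. */
            float cx[4] = { tx + 0.5f, x1 - 0.5f, tx + 0.5f, x1 - 0.5f };
            float cy[4] = { ty + 0.5f, ty + 0.5f, y1 - 0.5f, y1 - 0.5f };
            bool reject = false, accept = true;
            for (int e = 0; e < 3; e++) {
                int in = 0;
                for (int k = 0; k < 4; k++)
                    in += edge_fn(v[e], v[(e + 1) % 3], cx[k], cy[k]) >= 0.0f;
                if (in == 0) reject = true;   /* whole block outside this edge */
                if (in != 4) accept = false;  /* block straddles this edge */
            }
            if (reject)
                continue;
            if (accept) {
                count += (x1 - tx) * (y1 - ty);
                continue;
            }
            for (int y = ty; y < y1; y++)     /* partial block: per pixel */
                for (int x = tx; x < x1; x++)
                    count += covers(v, x + 0.5f, y + 0.5f);
        }
    return count;
}
```

Fully covered blocks never touch per-pixel math, and the per-pixel loop over a partial 8x8 block is the part that suits SIMD.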

cloud rivet Sep 7, 2025, 10:13 PM

#

nice

astral hinge Sep 7, 2025, 10:13 PM

#

well I guess the actual lowest level is 2x2 since those are quads of invocations. you might be interested in that for derivative calculations

#

you can even write your own "shaders" that are just callbacks
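A minimal sketch of a callback "shader" in C. It assumes a hypothetical signature that takes barycentric coordinates and returns a packed color; the rasterizer decides coverage for a 2x2 quad and calls the function pointer for each covered pixel. None of these names come from the project.

```c
#include <stdint.h>

/* Hypothetical "fragment shader" signature: barycentrics in, packed
   0xAARRGGBB color out, plus an opaque pointer for per-draw data. */
typedef uint32_t (*frag_shader)(float b0, float b1, float b2, void *user);

typedef struct { uint32_t c0, c1, c2; } vertex_colors;  /* per-draw data */

/* Example shader: interpolate three vertex colors one 8-bit channel at a time. */
static uint32_t shade_vertex_colors(float b0, float b1, float b2, void *user)
{
    const vertex_colors *vc = user;
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        float ch = b0 * (float)((vc->c0 >> shift) & 0xFFu)
                 + b1 * (float)((vc->c1 >> shift) & 0xFFu)
                 + b2 * (float)((vc->c2 >> shift) & 0xFFu);
        if (ch > 255.0f) ch = 255.0f;   /* clamp so rounding can't overflow */
        if (ch < 0.0f) ch = 0.0f;
        out |= (uint32_t)(ch + 0.5f) << shift;
    }
    return out;
}

/* The rasterizer decides coverage for one 2x2 quad; the callback decides
   the color. Bit i of `mask` says whether pixel i of the quad is covered. */
static void shade_quad(frag_shader fs, void *user, float bc[4][3], int mask,
                       uint32_t *dst[4])
{
    for (int i = 0; i < 4; i++)
        if (mask & (1 << i))
            *dst[i] = fs(bc[i][0], bc[i][1], bc[i][2], user);
}
```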

cloud rivet Sep 7, 2025, 10:15 PM

#

well, a shader I can write as a sequential set of operations on a single pixel, and the GPU driver actually parallelizes all the instructions, yes?

#

but on the cpu side I have to do this myself I think?

#

hrm

#

I see what you mean I think

astral hinge Sep 7, 2025, 10:16 PM

#

yeah auto parallelization of shaders would be super hard

cloud rivet Sep 7, 2025, 10:16 PM

#

I'm not sure what derivative calculations you are referring to

astral hinge Sep 7, 2025, 10:16 PM

#

unless you write a vm or something

astral hinge Sep 7, 2025, 10:17 PM

#

cloud rivet I'm not sure what derivative calculations you are referring to

the ones for dFdx and dFdy that are implicitly used when you call texture functions with implicit lod sampling

cloud rivet Sep 7, 2025, 10:17 PM

#

right right

#

I want to use those for tangent generation also

#

I just vaguely know about it, I have to read through it to understand the actual math

astral hinge Sep 7, 2025, 10:18 PM

#

the math is pretty frog_shrimple le

cloud rivet Sep 7, 2025, 10:19 PM

#

nice

astral hinge Sep 7, 2025, 10:19 PM

#

it's just a single subtraction
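Because the four invocations of a quad run together, a derivative is just the difference between neighbors. This is a sketch that assumes a hypothetical quad layout of top-left, top-right, bottom-left, bottom-right. Which row or column a coarse derivative uses is implementation-defined on real GPUs, so the choice here is only one option.

```c
/* Hypothetical layout of one value across a 2x2 quad:
   [0] top-left, [1] top-right, [2] bottom-left, [3] bottom-right. */
typedef struct { float v[4]; } quad;

/* Coarse: one derivative for the whole quad (here from the top row and
   the left column). */
static float ddx_coarse(quad q) { return q.v[1] - q.v[0]; }
static float ddy_coarse(quad q) { return q.v[2] - q.v[0]; }

/* Fine: each row (for ddx) or column (for ddy) gets its own difference. */
static float ddx_fine(quad q, int row) { return q.v[row * 2 + 1] - q.v[row * 2]; }
static float ddy_fine(quad q, int col) { return q.v[2 + col] - q.v[col]; }
```

Implicit-LOD texture sampling uses derivatives like these of the UV coordinates to estimate how much of the texture one pixel covers.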

#

#opengl message

cloud rivet Sep 7, 2025, 10:20 PM

#

yes that's right I have that in my notes from when we talked about it before

#

when I was looking at spirv stuff

astral hinge Sep 7, 2025, 10:21 PM

#

yeah so your gpu code is automagically SIMD because the compiler needs to do that anyway

#

but on the cpu it's tricky because your "shader" won't run in blocks of 2x2 threads in lockstep

#

that's why I suggested a vm earlier

cloud rivet Sep 7, 2025, 10:22 PM

#

yeah, I'm just going to avoid trying to replicate shaders where I work on one pixel at a time, and do CPU specific work in blocks I think

astral hinge Sep 7, 2025, 10:24 PM

#

if the work unit is 2x2 then you can hand-write simd shaders if you want, or just do a loop over the pixels
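Here is one way a hand-written SIMD version of a 2x2 work unit could look. It assumes x86 SSE intrinsics from `<xmmintrin.h>` and an edge function already expanded into the linear form `A*x + B*y + C`, where A, B, C are hypothetical per-edge constants. All four pixel centers of the quad are evaluated in one register, and the sign test becomes a 4-bit mask. On ARM the same idea would use NEON instead.

```c
#include <xmmintrin.h>

/* Evaluate E(x, y) = A*x + B*y + C at the four pixel centers of the 2x2
   quad whose top-left pixel is (x, y). Lane order: 0 top-left,
   1 top-right, 2 bottom-left, 3 bottom-right. Returns a 4-bit mask of
   lanes with E >= 0. */
static int edge_mask_quad(float A, float B, float C, int x, int y)
{
    /* _mm_set_ps takes lanes high-to-low: lane 0 is the last argument. */
    __m128 px = _mm_set_ps(x + 1.5f, x + 0.5f, x + 1.5f, x + 0.5f);
    __m128 py = _mm_set_ps(y + 1.5f, y + 1.5f, y + 0.5f, y + 0.5f);
    __m128 e  = _mm_add_ps(_mm_add_ps(_mm_mul_ps(_mm_set1_ps(A), px),
                                      _mm_mul_ps(_mm_set1_ps(B), py)),
                           _mm_set1_ps(C));
    __m128 ge = _mm_cmpge_ps(e, _mm_setzero_ps());
    return _mm_movemask_ps(ge);   /* bit i set => lane i on the inner side */
}
```

ANDing the masks from the three edges gives the quad's coverage mask, which is the `mask` a per-quad shading loop would consume.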

cloud rivet Sep 7, 2025, 10:25 PM

#

that would be a big perf win I think 2x2 simd

#

and not too hard to think about

#

as in the code won't be incredibly complex

#

I'm going to try that thanks

#

the call stack from another thread was a big no go btw

#

since you have to suspend threads for that to work

astral hinge Sep 7, 2025, 10:27 PM

#

damn

#

I need to see what my profiler code was doing

cloud rivet Sep 7, 2025, 10:28 PM

#

was that on linux?

#

it's different there

astral hinge Sep 7, 2025, 10:28 PM

#

it was windows

cloud rivet Sep 7, 2025, 10:29 PM

#

yes I am curious about your profiler code also

astral hinge Sep 7, 2025, 10:30 PM

#

I have my school projects scattered around my drives so I need to search

cloud rivet Sep 7, 2025, 10:30 PM

#

suspending a thread is horrific, if it is holding on to sync primitives for resources shared with other threads

astral hinge Sep 7, 2025, 10:31 PM

#

I'm pretty sure I didn't do that

#

considering it was sampling I think thousands of times per second

#

which is also the frequency of the normal VS profiler

#

huh I can't find the course files anywhere on my pc

cloud rivet Sep 7, 2025, 10:46 PM

#

was this for an undergraduate class?

astral hinge Sep 7, 2025, 10:47 PM

#

yeah

cloud rivet Sep 7, 2025, 10:47 PM

#

feels like maybe you had unintentionally produced phd dissertation level work

astral hinge Sep 7, 2025, 10:47 PM

#

no it was an assignment that everyone had to do lol

#

it wasn't that hard

cloud rivet Sep 7, 2025, 10:47 PM

#

was there a textbook associated with this task

#

I'd be interested in looking at it

echo crystal Sep 7, 2025, 10:48 PM

#

astral hinge huh I can't find the course files anywhere on my pc

maybe it's on ur github?

cloud rivet Sep 7, 2025, 10:48 PM

#

classwork on github is sus

astral hinge Sep 7, 2025, 10:48 PM

#

I checked that too

#

well there actually was a github organization that I was in for the class, but I left it

#

I thought I'd have a repo though frog_think

cloud rivet Sep 7, 2025, 10:49 PM

#

there's like some academic integrity problem with doing that imo

echo crystal Sep 7, 2025, 10:49 PM

#

not if it's private 🤫

cloud rivet Sep 7, 2025, 10:49 PM

#

just my opinion, but there may be uni policy also

astral hinge Sep 7, 2025, 10:49 PM

#

we were supposed to join the org for the class

#

and it was private

#

somehow the way we uploaded our projects was private to other students

cloud rivet Sep 7, 2025, 10:50 PM

#

maybe the class was just ok with the UB of doing it without suspending the thread

astral hinge Sep 7, 2025, 10:50 PM

#

lol maybe

echo crystal Sep 7, 2025, 10:53 PM

#

russian roulette profiling

cloud rivet Sep 7, 2025, 10:53 PM

#

btw I got that 11" ipad air with a pencil and I love it

echo crystal Sep 7, 2025, 10:54 PM

#

wow pretty interesting

echo crystal Sep 7, 2025, 10:54 PM

#

cloud rivet btw I got that 11" ipad air with a pencil and I love it

nice froge_love

cloud rivet Sep 7, 2025, 10:54 PM

#

have you guys ever seen The Deer Hunter? great movie, except for the weird Russian roulette part

#

I like with the pencil how I can hover my pencil over a link and click the pencil to follow links

astral hinge Sep 7, 2025, 11:08 PM

#

I wonder if all my files for that project were on a school drive

broken fog Sep 7, 2025, 11:16 PM

#

astral hinge so you start by testing all the large squares of pixels the triangle's AABB touc...

huh so my quadtree idea wasn't too far off

broken fog Sep 7, 2025, 11:16 PM

#

cloud rivet btw I got that 11" ipad air with a pencil and I love it

ooh nice, m3?

astral hinge Sep 7, 2025, 11:17 PM

#

yeah it's just a fatter tree

broken fog Sep 7, 2025, 11:17 PM

#

like more subdivisions per level?

broken fog Sep 7, 2025, 11:18 PM

#

cloud rivet there's like some academic integrity problem with doing that imo

cringe

#

all my uni assignments are on gh and open source KEKW

#

unless the course specifically disallows it

astral hinge Sep 7, 2025, 11:25 PM

#

I gave up looking for my projects for that class froge_sad

cloud rivet Sep 7, 2025, 11:29 PM

#

thanks for looking

cloud rivet Sep 7, 2025, 11:29 PM

#

broken fog ooh nice, m3?

I'm not sure what it has, I think m3

broken fog Sep 7, 2025, 11:29 PM

#

the latest air is m3 so if you bought new ye probably

#

mine is m1

#

honestly both are incredibly overkill for what most people do with an ipad KEKW

cloud rivet Sep 7, 2025, 11:30 PM

#

yah m3

#

just checked

#

I'm just going to read PDFs with it

broken fog Sep 7, 2025, 11:30 PM

#

yeah

cloud rivet Sep 7, 2025, 11:30 PM

#

actually needs the m3 maybe for that? these pdfs are huge lol

broken fog Sep 7, 2025, 11:30 PM

#

you know you have rt hardware in that thing right?

cloud rivet Sep 7, 2025, 11:31 PM

#

yeah but I don't plan to write apps for gross locked down sandboxed mobile devices

broken fog Sep 7, 2025, 11:31 PM

#

based

cloud rivet Sep 7, 2025, 11:31 PM

#

when people tell me they're writing for android or browsers I just feel so sad about it

broken fog Sep 7, 2025, 11:31 PM

#

browsers are fine tbh

cloud rivet Sep 7, 2025, 11:31 PM

#

or if they're talking about it in one of the channels

broken fog Sep 7, 2025, 11:31 PM

#

not the same as native dev but

cloud rivet Sep 7, 2025, 11:31 PM

#

idk, not for me

broken fog Sep 7, 2025, 11:31 PM

#

it's not a walled garden ecosystem like mobile

cloud rivet Sep 7, 2025, 11:32 PM

#

it's still kinda shit imo but I understand people can find that interesting

broken fog Sep 7, 2025, 11:32 PM

#

the web's actually kinda nice mostly because of how easy it is to ship something and share it with people

cloud rivet Sep 7, 2025, 11:32 PM

#

I hate it so much lol sorry

broken fog Sep 7, 2025, 11:32 PM

#

just send a url, no git clone, no build steps etc

#

but yeah i get why you'd hate it KEKW

cloud rivet Sep 7, 2025, 11:32 PM

#

I think peak browsers was 1990's HTML

broken fog Sep 7, 2025, 11:33 PM

#

false

#

peak websites tho

cloud rivet Sep 7, 2025, 11:33 PM

#

I like being able to see images and videos like YT I guess

#

I wish I could just use lynx to browse the web

echo crystal Sep 7, 2025, 11:34 PM

#

if u have adblock web is sooo much better

cloud rivet Sep 7, 2025, 11:35 PM

#

yup of course

broken fog Sep 7, 2025, 11:35 PM

#

echo crystal if u have adblock web is sooo much better

this is not a platform issue, it's a content issue (websites are garbage)

#

the web platform itself is pretty cool

echo crystal Sep 7, 2025, 11:36 PM

#

yeah

broken fog Sep 7, 2025, 11:36 PM

#

if you want to build nice ui quick there's nothing better than html+css imo

echo crystal Sep 7, 2025, 11:37 PM

#

probably some widget thing

#

not for me tho

cloud rivet Sep 7, 2025, 11:39 PM

#

same

#

I mean for work it's fine

#

just in my personal time I don't want to do anything with it

broken fog Sep 8, 2025, 12:28 AM

#

yeah that's fair enough

#

i don't feel like doing web shit on my free time either

cloud rivet Sep 8, 2025, 12:36 AM

#

@astral hinge maybe you used https://learn.microsoft.com/en-us/windows/win32/debug/capturestackbacktrace, but it only captures the current thread's stack

cloud rivet Sep 8, 2025, 12:42 AM

#

broken fog i don't feel like doing web shit on my free time either

I used to love web dev :\

#

I think work killed it in my soul

#

it's also so much work now to build a website, and so expensive. A k8s cluster costs hundreds of dollars per month

#

if you just want some static website, sure

#

you could build a serverless website using aws and step functions

#

yeah you could just run a LAMP stack or Django or whatever, heh

#

in 2025 👀

bronze socket Sep 8, 2025, 12:46 AM

#

basic html + crappy used pc + home wifimaxxing

cloud rivet Sep 8, 2025, 12:46 AM

#

:|

#

I don't run servers in my house

#

I guess App Engine or whatever is the thing to use maybe

#

iirc snapchat was on App Engine for a very long time

bronze socket Sep 8, 2025, 12:48 AM

#

I have a little nuc I got off ebay but I just use it to host static files to stream to my friends mainly + a matrix server

cloud rivet Sep 8, 2025, 12:48 AM

#

I think the last time I was excited about webdev was when Meteor was a thing

#

I used to go to the Meteor HQ for meetups every month

#

then react won

bronze socket Sep 8, 2025, 12:49 AM

#

I don't think I have whatever it takes to be excited about web frameworks

cloud rivet Sep 8, 2025, 12:49 AM

#

yeah it's jover for me too

bronze socket Sep 8, 2025, 12:49 AM

#

or tech in general

cloud rivet Sep 8, 2025, 12:49 AM

#

Meteor was so much fun

bronze socket Sep 8, 2025, 12:49 AM

#

I've noticed the only people still excited about tech are the ones that know the least about it

cloud rivet Sep 8, 2025, 12:50 AM

#

are you not including graphics programming as tech?

bronze socket Sep 8, 2025, 12:51 AM

#

I think for the most part it counts

#

when it comes to new innovation

cloud rivet Sep 8, 2025, 12:52 AM

#

all I want any more involves a C compiler

#

just moving my stuff to the render graph improved perf lol

#

frame time dropped by like 3ms and got like 5 more fps out of it

#

what it was before

#

gonna try passing in the staging buffer now

#

and removing the copy

#

I don't understand why that improved performance

#

how do computers work

bronze socket Sep 8, 2025, 1:02 AM

#

I assume your frame graph is smarter and emits better barriers and whatnot?

cloud rivet Sep 8, 2025, 1:03 AM

#

it doesn't emit any for any of this though

#

it does for vk transitions that need them

#

lol using the staging buffer works but it is dramatically slower

#

haha

broken fog Sep 8, 2025, 1:14 AM

#

cloud rivet if you just want some static website, sure

nah

cloud rivet Sep 8, 2025, 1:14 AM

#

I exchanged a 1ms copy for an extra 10ms to raster

broken fog Sep 8, 2025, 1:14 AM

#

just yeet it on vercel

#

their free plan is kinda nuts

cloud rivet Sep 8, 2025, 1:15 AM

#

yeah vercel would be great actually

#

maybe I can configure this buffer differently

#

const VkBufferUsageFlags usage_flags = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;
const VkMemoryPropertyFlags mem_properties
    = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;

#

VK_MEMORY_PROPERTY_HOST_CACHED_BIT bit specifies that memory allocated with this type is cached
on the host. Host memory accesses to uncached memory are slower than to cached memory,
however uncached memory is always host coherent.

#

that also increased the copy to draw image time and both of the submit times by 1ms

#

it's nuts

#

I don't see any way to do this without it being slow

#

but

#

I removed VK_MEMORY_PROPERTY_HOST_COHERENT_BIT since I don't need that and went back to using my bitmap. I didn't get a win here either, though

#

I did get a huge win from just shuffling my code around?

#

ok

#

I'm going to do the simd 2x2 quad stuff jaker brought up

#

I am regularly seeing 45 fps now

#

23ms frames

#

we are doing science in this thread

#

oh

#

also I had a bug with cpu raster and resize, and that's now fixed too by moving everything into the render graph

#

my render graph is fucking magical

#

it makes things faster and fixes bugs for me

#

emergent render graph behavior

bronze socket Sep 8, 2025, 1:37 AM

#

kinda hard to reliably get timings on light computation loads anyway because all the crap happening in your OS is pure noise to the timings

cloud rivet Sep 8, 2025, 1:38 AM

#

true

#

I am pretty certain that this change made it consistently faster

#

I think with SIMD, MT, and reduced scale I'll be unblocked from performance issues in the CPU software rasterizer for a bit

cloud rivet Sep 8, 2025, 5:32 AM

#

lol

#

you know what

#

this whole week

#

I have been running with verbose validation enabled KEKW

#

KEKW

cloud rivet Sep 8, 2025, 6:06 AM

#

still going to do the simd idea though

#

and multithreading

#

I'm so close to finally doing 3D graphics

cloud rivet Sep 8, 2025, 3:15 PM

#

oh, reading Intel's manual I realized I should be using fused multiply-add for my dot product, and all kinds of other things. I could be using shuffles to set a whole bunch of color values at once based on barycentric coordinates, man

#

but even without avx I should be using fma

#

for better precision

#

as it rounds less

cloud rivet Sep 8, 2025, 3:38 PM

#

I like reading the manual on my ipad on the bus, I also am reading the tinyrenderer lectures for software rasterization

#

I am pretty happy with the tech on this project. I think I have made major strides since Rosy

#

soon I will have nicer pixels

#

I'm serious about ripping out shaders and generating SPIRV with application code. I think in my UI I will make a SPIRV disassembly viewer for debugging. That's a ways off though, I want to get some 3D and lighting going in the CPU software renderer

#

also an IR view

#

maybe an IR view where I can click on or hover over the IR and see the SPIRV disassembly it will generate

#

I like how I added a string to my render graph nodes to document their intent when debugging; I will do that with the IR too

astral hinge Sep 8, 2025, 8:23 PM

#

@cloud rivet I found the code with the help of a former classmate of mine. sadly it looks like it just suspends the thread and resumes it for every sample

📎 Profiler.cpp

#

somehow it gets 1000Hz resolution though, so maybe suspension isn't that expensive

#

here's an example of the output it'd generate when you call GenOutput()

📎 output.csv

cloud rivet Sep 8, 2025, 8:26 PM

#

astral hinge @cloud rivet I found the code with the help of a former classmate of mi...

thank you! someone on another discord suggested using core affinity: pin the thread I want to sample and another thread to the same core, and have that other thread wake up periodically, which guarantees the sampled thread is suspended

#

suspending a thread is very shitty if you have a mutex for example

#

in that thread and other threads

#

maybe things can deadlock

astral hinge Sep 8, 2025, 8:27 PM

#

hmm I can see how it could slow down other threads, but not how it could cause a deadlock

astral hinge Sep 8, 2025, 8:28 PM

#

cloud rivet thank you! someone on another discord suggesting using core affinity for the thr...

I don't fully understand this

#

does that mean checking the state of a thread and only sampling the program counter when it's suspended?

#

you'd also have to make sure it doesn't wake up while you're doing that, right?

cloud rivet Sep 8, 2025, 8:30 PM

#

astral hinge does that mean checking the state of a thread and only sampling the program coun...

numerous examples:
http://blog.kalmbachnet.de/?postid=6 (Why you should never call Suspend/TerminateThread, Part I)
http://blog.kalmbachnet.de/?postid=16 (Part II)
http://blog.kalmbachnet.de/?postid=17 (Part III)

#

seems to be an issue with using CriticalSection apis and other things

#

oh

#

replied to the wrong message

cloud rivet Sep 8, 2025, 8:32 PM

#

astral hinge you'd also have to make sure it doesn't wake up while you're doing that, right?

I guess the idea is that if you're running on the same logical core the other thread can't also be running? I may have misunderstood what they said

astral hinge Sep 8, 2025, 8:32 PM

#

oh I see

#

that's smart

cloud rivet Sep 8, 2025, 8:33 PM

#

The problem is that "printf" internally uses a CriticalSection;

#

haha

#

that sucks

astral hinge Sep 8, 2025, 8:35 PM

#

I see yeah

#

I mean it should be ok if your profiler thread doesn't do anything that takes a lock though

#

maybe GetThreadContext acquires a lock

cloud rivet Sep 8, 2025, 8:55 PM

#

the recommendation is to run stackwalk in a critical section based on the docs 😅

#

it's a cool thing to have

#

I may try it thank you for the codes

#

I will try it once I have a checkbox

#

then I can conditionally turn that on

cloud rivet Sep 9, 2025, 12:21 AM

#

I wonder if the compiler already just optimizes some code to FMA

broken fog Sep 9, 2025, 12:23 AM

#

probably does

#

iirc clang did on arm

#

compiler explorer is your friend

cloud rivet Sep 9, 2025, 12:23 AM

#

ya

#

let me copy my dot product into that and see what it does

astral hinge Sep 9, 2025, 12:24 AM

#

is it allowed if you don't have fast math?

cloud rivet Sep 9, 2025, 12:34 AM

#

it does not https://godbolt.org/z/ssYqdGqen

typedef float float4 __attribute__((ext_vector_type(4)));

typedef float f32;

f32 dot(float4 a, float4 b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

#

let me google this fast math thing

#

I saw in the intel manual actually

#

let me actually look at clang compiler args

#

-ffast-math

#

https://godbolt.org/z/GTfrnnPMM


#

I don't see it there either

#

I don't understand

astral hinge Sep 9, 2025, 12:38 AM

#

astral hinge is it allowed if you don't have fast math?

because fma can have different precision than doing each operation individually
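A likely reason no FMA showed up: contraction can only produce `vfmadd` instructions when the target ISA has FMA, and the baseline x86-64 target does not. That applies whether it comes from clang's default `-ffp-contract=on` (in recent clang versions) or the `-ffp-contract=fast` that `-ffast-math` implies. Adding `-mfma`, or `-march=native` on an FMA3-capable CPU such as the Core i5-10400, should let the compiler emit them. The sketch below, with hypothetical function names, shows where `on` is allowed to fuse and where it is not.

```c
/* Under -ffp-contract=on, the multiplies and adds inside ONE expression may
   be fused into FMAs, but only when the target has FMA instructions
   (e.g. clang -O2 -mfma); check the disassembly to see what was emitted. */
float dot3_one_expr(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* The products are rounded into separate variables first, so clang's "on"
   mode will not fuse them; "fast" mode (part of -ffast-math) may. */
float dot3_split(const float a[3], const float b[3])
{
    float p0 = a[0] * b[0], p1 = a[1] * b[1], p2 = a[2] * b[2];
    return p0 + p1 + p2;
}
```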

cloud rivet Sep 9, 2025, 12:38 AM

#

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAXVL-1:32) of the corresponding destination register remain unchanged.

#

the registers it uses are SSE

#

MOVHLPS help

This instruction cannot be used for memory to register moves.

128-bit two-argument form:

Moves two packed single-precision floating-point values from the high quadword of the second XMM argument (second operand) to the low quadword of the first XMM register (first argument). The quadword at bits 127:64 of the destination operand is left unchanged. Bits (MAXVL-1:128) of the corresponding destination register remain unchanged.

128-bit and EVEX three-argument form

#

I'm going to compile with -ffast-math

#

hrm

#

running with similar frame time

#

using -ffast-math without optimizations just produces the same result as no args

#

https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math

#

Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:

Floating-point math obeys regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c),

No NaN or infinite values will be operands or results of floating-point operations,

+0 and -0 may be treated as interchangeable.

-ffast-math also defines the __FAST_MATH__ preprocessor macro. Some math libraries recognize this macro and change their behavior. With the exception of -ffp-contract=fast, using any of the options below to disable any of the individual optimizations in -ffast-math will cause __FAST_MATH__ to no longer be set. -ffast-math enables -fcx-limited-range.

This option implies:

-fno-honor-infinities
-fno-honor-nans
-fapprox-func
-fno-math-errno
-ffinite-math-only
-fassociative-math
-freciprocal-math
-fno-signed-zeros
-fno-trapping-math
-fno-rounding-math
-ffp-contract=fast

Note: -ffast-math causes crtfastmath.o to be linked with code unless -shared or -mno-daz-ftz is present. See A note about crtfastmath.o for more details.

#

+ and * are associative

#

lol what

#

when is that not the case

#

oh maybe this is floating point spec stuff
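Yes, that is IEEE 754 behavior: each floating-point addition rounds, so grouping changes the result. Here is a two-line C example with hypothetical helper names, where a large value cancels either before or after a small one is added.

```c
/* Each + rounds to float, so grouping can change the result. */
static float sum_left(float a, float b, float c)  { return (a + b) + c; }
static float sum_right(float a, float b, float c) { return a + (b + c); }
```

With `1e20f, -1e20f, 1.0f`, the left grouping cancels first and keeps the 1. In the right grouping, the 1 is absorbed into `-1e20f`, since it is far below float's spacing at that magnitude, and the result is 0. `-fassociative-math` is what lets the compiler treat the two as interchangeable.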

#

-ffast-math also defines the __FAST_MATH__ preprocessor macro. Some math libraries recognize this macro and change their behavior.

#

man

#

I am learning so much

#

https://en.cppreference.com/w/cpp/numeric/math/fma

#

the std library checks this

#

if the macro constants FP_FAST_FMA, FP_FAST_FMAF, or FP_FAST_FMAL are defined, the function std::fma evaluates faster (in addition to being more precise) than the expression x * y + z for double, float, and long double arguments, respectively. If defined, these macros evaluate to integer 1.
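A sketch of how C code can use those macros, assuming C99 `<math.h>`: call `fmaf` only when `FP_FAST_FMAF` says it is fast, and otherwise fall back to a plain multiply-add. `madd` is a hypothetical name, and the two branches are not guaranteed to round identically.

```c
#include <math.h>

/* Use fmaf only when <math.h> reports it is fast (typically hardware FMA).
   fmaf rounds once; a * b + c may round twice unless the compiler
   contracts it into an FMA itself. */
static inline float madd(float a, float b, float c)
{
#ifdef FP_FAST_FMAF
    return fmaf(a, b, c);
#else
    return a * b + c;
#endif
}
```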

#

well

#

it checks other things

#

https://en.cppreference.com/w/c/numeric/math/fma

#

why am I linking to cpp

astral hinge Sep 9, 2025, 1:02 AM

#

because your subconscious yearns for longer compile times

astral hinge Sep 9, 2025, 1:04 AM

#

cloud rivet + and * are associative

ye

#

changing the order can affect the result

cloud rivet Sep 9, 2025, 1:05 AM

#

https://godbolt.org/z/eWK9sv5zh

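#include <math.h>

typedef float float4 __attribute__((ext_vector_type(4)));

typedef float f32;

// fma(x, y, z) computes x * y + z with one rounding, so the running sum goes last.
// fma() takes and returns double; fmaf() is the float version.
f32 dot(float4 a, float4 b) {
  f32 rv = a.x * b.x;
  rv = fma(a.y, b.y, rv);
  rv = fma(a.z, b.z, rv);
  return rv;
}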

#

this is with fma

#

CVTSS2SD help

Converts a single-precision floating-point value in the “convert-from” source operand to a double-precision floating-point value in the destination operand. When the “convert-from” source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register. The result is stored in the low quadword of the destination operand.

128-bit Legacy SSE version: The “convert-from” source operand (the second operand) is an XMM register or memory location. Bits (MAXVL-1:64) of the corresponding destination register remain unchanged. The destination operand is an XMM register.

VEX.128 and EVEX encoded versions: The “convert-from” source operand (the third operand) can be an XMM register or a 32-bit memory location. The first source and destination operands are XMM registers. Bits (127:64) of the XMM register destination are copied from the corresponding bits in the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.

Software should ensure VCVTSS2SD is encoded with VEX.L=0. Encoding VCVTSS2SD with VEX.L=1 may encounter unpredictable behavior across different processor generations.

#

call fma@PLT does not actually appear in the -ffast-math version

#

if I am going just by counting the number of instructions

#

to determine speed

#

not using fma would be faster

#

but

#

it's probably not that simple

#

fma is more about avoiding loss of precision I think?

#

so fma is orthogonal to -ffast-math?

#

k anyway

broken fog Sep 9, 2025, 1:26 AM

#

cloud rivet it's probably not that simple

it isn't

#

x86 instructions don't tell you a lot more than c code

#

who knows what the cpu is actually doing at the hw level

cloud rivet Sep 9, 2025, 1:27 AM

#

stupid out of order execution

broken fog Sep 9, 2025, 1:27 AM

#

and it probably changes for every uarch

cloud rivet Sep 9, 2025, 1:27 AM

#

jk, that makes things go faster

broken fog Sep 9, 2025, 1:28 AM

#

cloud rivet stupid out of order execution

just write code for a 386 forgderp2

cloud rivet Sep 9, 2025, 1:28 AM

#

I was reading the intel manual today and the first chapter is about the history of intel processors

#

so it had the 286/386/486 etc. listed and what the innovations were for each architecture

#

it was fun to read about

broken fog Sep 9, 2025, 1:29 AM

#

huh

cloud rivet Sep 9, 2025, 1:29 AM

#

I read like 5 chapters of that thing

broken fog Sep 9, 2025, 1:29 AM

#

idk when ooo was introduced

cloud rivet Sep 9, 2025, 1:29 AM

#

it tells you in there

broken fog Sep 9, 2025, 1:29 AM

#

iirc 386 didn't have it

cloud rivet Sep 9, 2025, 1:29 AM

#

it did not

#

it's much later

broken fog Sep 9, 2025, 1:29 AM

#

pentium something?

cloud rivet Sep 9, 2025, 1:29 AM

#

it's like in the aughts

broken fog Sep 9, 2025, 1:29 AM

#

oh

cloud rivet Sep 9, 2025, 1:29 AM

#

anyway

#

it's in the pdf

broken fog Sep 9, 2025, 1:29 AM

#

netburst possibly

#

or p3

cloud rivet Sep 9, 2025, 1:31 AM

#

P6

#

1995

#

I was off

#

The centerpiece of the P6 processor microarchitecture is an out-of-order execution mechanism called dynamic execution.

#

hrm

#

converting all my raster math to operate on float4[4]'s

#

will start with just hand doing what I already do

#

just operating on 4 float4s at a time

#

then change the raster code to use that

#

and then figure out how to do that with simd

#

that way by the time I get to simd I know the rest of it works and it's just getting the simd to work

cloud rivet Sep 9, 2025, 2:42 AM

#

man adding -ffast-math actually makes things slower

#

wtf

#

the first f is for fail

astral hinge Sep 9, 2025, 2:49 AM

#

you shouldn't use it anyway because it breaks a bunch of stuff

cloud rivet Sep 9, 2025, 2:49 AM

#

what is it good for then

#

that makes sense though

astral hinge Sep 9, 2025, 2:49 AM

#

cloud rivet what is it good for then

excellent question

#

it's probably only useful when applied at the function level

cloud rivet Sep 9, 2025, 2:50 AM

#

I get what it's breaking sort of with floating point implementation

astral hinge Sep 9, 2025, 2:50 AM

#

applying it to the whole application is just asking for trouble

cloud rivet Sep 9, 2025, 2:50 AM

#

hrm

#

how do you apply a compiler arg to just a function

astral hinge Sep 9, 2025, 2:53 AM

#

idk

#

but you can do it to just files

cloud rivet Sep 9, 2025, 2:54 AM

#

not with unity builds lol

astral hinge Sep 9, 2025, 2:54 AM

#

maybe there are pragmas that enable/disable scopes of fast math

#

which compiler are you using?

cloud rivet Sep 9, 2025, 2:55 AM

#

clang 20

#

for the vector language extensions

#

and because idk I can go read the source code for it I guess

astral hinge Sep 9, 2025, 2:56 AM

#

clang has #pragma float_control

#

look for it in here
https://clang.llvm.org/docs/LanguageExtensions.html

cloud rivet Sep 9, 2025, 2:57 AM

#

When pragma float_control(precise, on) is enabled, the section of code governed by the pragma uses precise floating point semantics, effectively -ffast-math is disabled and -ffp-contract=on (fused multiply add) is enabled. This pragma enables -fmath-errno.
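For the per-function question, clang's `#pragma clang fp` options are documented on the same LanguageExtensions page as `float_control`, and they apply to the compound statement they open. The sketch below assumes clang and relaxes only reassociation and contraction for one hypothetical function, while the rest of the translation unit keeps strict semantics. Under other compilers, the `#ifdef __clang__` guard drops the pragmas.

```c
/* Dot product of two arrays. Inside this function only, clang may
   reassociate the running sum (e.g. into vector partial sums) and fuse
   multiply-adds across statements. Other compilers skip the pragmas. */
float dot_relaxed(const float *x, const float *y, int n)
{
#ifdef __clang__
#pragma clang fp reassociate(on)
#pragma clang fp contract(fast)
#endif
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

This gives two of the `-ffast-math` sub-options where they are wanted, without the no-NaN/no-infinity assumptions that tend to break other code.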

#

ah

#

so disable it by default

#

hrm

#

interesting

#

I don't want fast math

#

I hate it

#

I didn't know it existed, but now I do, and I regret it

#

jk

#

I'm sure it has its uses whatever those are

#

big brain stuff

#

rewriting things to be float4[4]s is forcing me to fix my horrific raster code

#

which was just sort of written to work as I figured it out

#

so that's good

#

as long as I don't make anything slower

#

i was writing this garbage when I was recording my 1 hour videos

vagrant musk Sep 9, 2025, 4:18 AM

#

cloud rivet man adding -ffast-math actually makes things slower

My guess is it’s legacy hardware oriented optimizations, potentially?

cloud rivet Sep 9, 2025, 4:20 AM

#

why do you think that?

vagrant musk Sep 9, 2025, 4:21 AM

#

frogshrug I don’t do much at a hardware level, but I’m pretty sure floating point math is pretty fast nowadays

#

I also don’t know exactly how fast math works, but I for some reason want to assume it’s doing something in software to avoid some check?

cloud rivet Sep 9, 2025, 4:22 AM

#

I think ffast-math is about taking shortcuts

astral hinge Sep 9, 2025, 4:22 AM

#

ffast-math relaxes certain rules and allows the compiler to make more assumptions

cloud rivet Sep 9, 2025, 4:22 AM

#

and not old hardware imo

#

this work is hard

#

hrmm

#

might be faster to just rewrite the rasterizer

#

idk idk

#

I'm gonna keep going

cloud rivet Sep 9, 2025, 4:54 AM

#

why is this making my rasterizer even faster

#

I haven't even done any simd yet

#

I was like at 25 fps 2 days ago

#

and I'm at 80 now

#

my frame time dropped like 4ms from just migrating halfway to simd

astral hinge Sep 9, 2025, 4:59 AM

#

cloud rivet why is this making my rasterizer even faster

what did you change?

cloud rivet Sep 9, 2025, 5:00 AM

#

I migrated my functions like this

#

float4 cross(float4 a, float4 b);
f32 dot(float4 a, float4 b);

void cross4x4(float4 a[4], float4 b[4], float4 out[4]);
void dot4x4(float4 a[4], float4 b[4], float4 *out);

#

float4 barycentric_coords(float4 pos, p_triangle t);
void barycentric_coords4x4(float4 pos[4], p_triangle t, float4 bc[4]);
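Here is a possible scalar-per-lane implementation of the `dot4x4` and `cross4x4` signatures above, written so the per-lane loop is something an autovectorizer can work on. To stay portable, this sketch swaps the clang `ext_vector_type` float4 for the GCC/clang `vector_size(16)` extension. That extension supports `v[i]` indexing but not `.x/.y/.z` swizzles. As with the existing `dot`, only xyz is used.

```c
/* Stand-in for the clang ext_vector_type float4 (assumption for portability). */
typedef float float4 __attribute__((vector_size(16)));

/* Lane i of *out = dot(a[i].xyz, b[i].xyz); w is ignored. */
void dot4x4(float4 a[4], float4 b[4], float4 *out)
{
    float4 r;
    for (int i = 0; i < 4; i++) {
        float4 p = a[i] * b[i];          /* lane-wise multiply */
        r[i] = p[0] + p[1] + p[2];
    }
    *out = r;
}

/* out[i] = cross(a[i].xyz, b[i].xyz), with w set to 0. */
void cross4x4(float4 a[4], float4 b[4], float4 out[4])
{
    for (int i = 0; i < 4; i++) {
        float4 x = a[i], y = b[i];
        out[i] = (float4){ x[1] * y[2] - x[2] * y[1],
                           x[2] * y[0] - x[0] * y[2],
                           x[0] * y[1] - x[1] * y[0],
                           0.0f };
    }
}
```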

#

but

#

I'm just passing in 1 thing

#

for each of those float4 [4]

#

I'm just setting the [0] value right now

astral hinge Sep 9, 2025, 5:01 AM

#

the compiler is probably auto-simd'ing it

#

autovectorization

cloud rivet Sep 9, 2025, 5:01 AM

#

simd for free

astral hinge Sep 9, 2025, 5:02 AM

#

what's the 4x4 naming convention btw

#

for cross and dot it makes dense because there are four of each operand

#

sense*

#

but for barycentric there's still just one triangle

cloud rivet Sep 9, 2025, 5:02 AM

#

it's just like a marker that made sense with those and I kept using it

astral hinge Sep 9, 2025, 5:03 AM

#

btw you can possibly get more perf by making the arguments restrict

cloud rivet Sep 9, 2025, 5:04 AM

#

https://en.cppreference.com/w/c/language/restrict.html

#

hrm

#

I will put this in my notes, I don't understand what that page is talking about right now

#

thanks

#

I don't want to cargo cult it, I want to thoroughly understand what that means

#

looks super interesting

#

allows the compiler to optimize, but I have to read it carefully

astral hinge Sep 9, 2025, 5:08 AM

#

putting restrict on a pointer means that a write through another pointer won't affect anything pointed to by the first

#

so it means the compiler can assume writes through some other pointer won't change the restrict data, allowing it to possibly emit fewer loads

cloud rivet Sep 9, 2025, 5:09 AM

#

what does writes through mean?

astral hinge Sep 9, 2025, 5:10 AM

#

just dereferencing and storing *foo = 42;

cloud rivet Sep 9, 2025, 5:11 AM

#

so it's a promise that doing *foo = bar; where bar is a pointer won't change bar?

#

oh wait

#

I get it

#

no I don't get it

astral hinge Sep 9, 2025, 5:13 AM

#

I'll illustrate with a function

#

void foo(int* a, int* b, int* c)
{
  c[0] = a[0] * b[0];
  c[1] = a[0] * b[0];
}

#

so here the compiler must assume that a, b, and c can alias each other (point to overlapping regions of memory)

cloud rivet Sep 9, 2025, 5:18 AM

#

oh

#

I have heard about this before

astral hinge Sep 9, 2025, 5:18 AM

#

so after c[0] is written, it must assume that a[0] and/or b[0] could have changed (because I use them twice), so it has to load them again for the second line

cloud rivet Sep 9, 2025, 5:18 AM

#

oh thanks for explaining this

#

yes I think I ran into this with zig before

astral hinge Sep 9, 2025, 5:18 AM

#

if a and b are restrict then the compiler no longer has to assume that a write through c aliases them
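
To make that concrete, here is the same function with `restrict` on its parameters (the name `foo_restrict` is hypothetical). Under that promise a store through `c` cannot modify `a[0]` or `b[0]`, so the compiler is allowed to load and multiply once and store the result twice; comparing the assembly of both versions is one way to check whether it actually does.

```c
/* Hypothetical restrict-qualified version of foo. Roughly: the caller
   promises that objects read through a and b are not modified through c
   during this call, so a[0] and b[0] need not be reloaded after c[0] is
   written. Passing overlapping pointers is undefined behavior. */
void foo_restrict(const int *restrict a, const int *restrict b, int *restrict c)
{
    c[0] = a[0] * b[0];
    c[1] = a[0] * b[0];
}
```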

cloud rivet Sep 9, 2025, 5:19 AM

#

that makes sense, thanks!

#

back when I was learning zig I read through the entire spec all the time

#

and there are all these builtins

#

that are I guess kind of like qualifiers in C

#

and I encountered that I think

astral hinge Sep 9, 2025, 5:20 AM

#

also you can use restrict on array parameters like this
void foo(int array[restrict 4]);

#

you can also put static inside the [] to assert that the pointer points to an array with at least 4 elements
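
A small sketch of both forms in one parameter list, using a hypothetical `add4` helper. Each array parameter is adjusted to a pointer (`int *restrict out`, and so on), and `static 4` adds the promise that every call passes the start of an array of at least 4 elements. This syntax is C-only.

```c
/* Hypothetical helper. Each parameter is really a restrict-qualified
   pointer; static 4 promises callers pass at least 4 valid elements. */
void add4(int out[restrict static 4],
          const int a[restrict static 4],
          const int b[restrict static 4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}
```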

cloud rivet Sep 9, 2025, 5:20 AM

#

this is a C only thing?

astral hinge Sep 9, 2025, 5:21 AM

#

yeah

#

without static, it's just a normal pointer parameter disguised as an array

cloud rivet Sep 9, 2025, 5:21 AM

#

is this a new use of the word static?

astral hinge Sep 9, 2025, 5:21 AM

#

yeah

cloud rivet Sep 9, 2025, 5:21 AM

#

that thing means so many things

astral hinge Sep 9, 2025, 5:21 AM

#

https://en.cppreference.com/w/c/keyword/static.html

#

In each function call to a function where an array parameter uses the keyword static between [ and ], the value of the actual parameter must be a valid pointer to the first element of an array with at least as many elements as specified by expression:

cloud rivet Sep 9, 2025, 5:22 AM

#

oh

#

that's cool

astral hinge Sep 9, 2025, 5:23 AM

#

I think it's worth trying in your perf-sensitive math functions. it probably helps with autovectorization

cloud rivet Sep 9, 2025, 5:23 AM

#

I should be using that everywhere

#

thanks

#

you know a lot

astral hinge Sep 9, 2025, 5:23 AM

#

astral hinge I think it's worth trying in your perf-sensitive math functions. it probably hel...

actually it probably doesn't if you already have compile-time loop bounds hmm

#

idk what static could help with

#

https://stackoverflow.com/questions/3430315/what-is-the-purpose-of-static-keyword-in-array-parameter-of-function-like-char

#

oh it implicitly asserts the pointer is non-null

#

using [static 1] to document/assert non-null arguments is interesting
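
A sketch of the `[static 1]` idiom with a hypothetical `counter` type. It documents the non-null precondition in the signature; it is not a runtime check, although compilers may warn when an obviously null argument is passed.

```c
/* Hypothetical type and function for illustration. */
typedef struct { int hits; } counter;

/* p[static 1]: p must point to at least one counter, so it cannot be NULL.
   Inside the function p is an ordinary pointer. */
void counter_bump(counter p[static 1])
{
    p->hits++;
}
```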

#

ugly syntax though

cloud rivet Sep 9, 2025, 5:29 AM

#

ya

#

my debug build is getting slower and my release build is getting faster

#

actually

#

I can't really tell on the debug build

#

because it's such a tiny number

#

it probably doesn't mean anything

#

it's been hovering around the same values tbh

cloud rivet Sep 9, 2025, 8:44 AM

#

lol

#

#

KEKW

#

I haven't touched the triangle yet, that's just the background

#

_mm_storeu_si128 is so fast
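
For reference, a background clear built on that intrinsic might look like the sketch below. It is x86-only (SSE2, which every x86-64 CPU has), assumes 32-bit pixels, and `fill_pixels` is a hypothetical name. `_mm_storeu_si128` writes 16 bytes (4 pixels) per store with no alignment requirement; a scalar loop finishes any leftover pixels.

```c
#include <emmintrin.h> /* SSE2: _mm_set1_epi32, _mm_storeu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill count 32-bit pixels with one color. */
void fill_pixels(uint32_t *pixels, size_t count, uint32_t color)
{
    __m128i c = _mm_set1_epi32((int)color); /* color in all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= count; i += 4)
        _mm_storeu_si128((__m128i *)(pixels + i), c); /* unaligned 16-byte store */
    for (; i < count; i++) /* scalar tail */
        pixels[i] = color;
}
```

A plain scalar loop over the buffer is also a common autovectorization target, so it is worth timing both.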

cloud rivet Sep 9, 2025, 3:25 PM

#

I think I've been benefiting mostly accidentally from compiler optimizations as a result of changing my code and adding the compiler flags to enable simd

#

so after moving the rest of the rasterizer to the 4x4 SIMD, I have a plan for multithreading (MT)

#

I am thinking I will have 4 threads that each iterate over the 4x4 quads in one quadrant of a surface in NDC

#

and have them all run down through the same surface together

#

what I like about this is there's no overlap in where each thread will write to

#

I'll walk down the bounds of a triangle and test the 4 corners of a block, e.g. an 8x8 quad, to see whether to bother doing any work there, and each 4x4 inside it will also do a test
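
A sketch of that block test, assuming edge functions with a winding where inside is `>= 0` (`vec2`, `edge`, `classify_block` and the enum are hypothetical names). An edge function is linear in x and y, so over a rectangle its extremes are at the corners: if all four corners are outside one edge, the whole block is, and if all corners are inside every edge, the block needs no per-pixel coverage test. Blocks that miss the triangle without being fully outside any single edge come back as partial, which is conservative.

```c
/* Hypothetical 2D point type. */
typedef struct { float x, y; } vec2;

/* Edge function of the directed edge a->b at p.
   Assumed convention: >= 0 means p is on the inside. */
static float edge(vec2 a, vec2 b, vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

typedef enum { BLOCK_OUTSIDE, BLOCK_PARTIAL, BLOCK_INSIDE } block_cover;

/* Classify the size x size block with top-left corner (bx, by),
   e.g. size 8 for an 8x8 quad or 4 for a 4x4 one. */
block_cover classify_block(const vec2 tri[3], float bx, float by, float size)
{
    vec2 corners[4] = {
        { bx, by }, { bx + size, by }, { bx, by + size }, { bx + size, by + size }
    };
    int inside_all = 1;
    for (int e = 0; e < 3; e++) {
        vec2 a = tri[e], b = tri[(e + 1) % 3];
        int inside = 0;
        for (int c = 0; c < 4; c++)
            if (edge(a, b, corners[c]) >= 0.0f)
                inside++;
        if (inside == 0)
            return BLOCK_OUTSIDE; /* every corner is outside this edge */
        if (inside < 4)
            inside_all = 0;       /* this edge crosses the block */
    }
    return inside_all ? BLOCK_INSIDE : BLOCK_PARTIAL;
}
```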

#

I need to keep everything aligned; the bitmap itself is the size of the full screen no matter how big the window is, so I think it is safe to overwrite

#

on arbitrary window sizes
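
The no-overlap property can be kept by snapping the quadrant splits to 4-pixel boundaries, so each 4x4 tile belongs to exactly one thread. A minimal sketch, assuming one screen quadrant per thread; `rect` and `quadrant_rect` are hypothetical and thread creation is left out.

```c
/* Hypothetical half-open pixel rectangle: [x0, x1) x [y0, y1). */
typedef struct { int x0, y0, x1, y1; } rect;

/* Hypothetical helper: quadrant q (0..3) of a width x height surface.
   The interior split lines are rounded down to multiples of 4, so no
   4x4 tile straddles two quadrants. */
rect quadrant_rect(int width, int height, int q)
{
    int mid_x = (width / 2) & ~3;
    int mid_y = (height / 2) & ~3;
    rect r;
    r.x0 = (q & 1) ? mid_x : 0;
    r.x1 = (q & 1) ? width : mid_x;
    r.y0 = (q & 2) ? mid_y : 0;
    r.y1 = (q & 2) ? height : mid_y;
    return r;
}
```

One trade-off of fixed quadrants: a triangle that lands in a single quadrant keeps only one of the four threads busy, which is why finer-grained tile queues are often used instead.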

#

the in-window UI windows are a bit trickier

broken fog Sep 9, 2025, 3:40 PM

#

cloud rivet

this with simd? how are you using it?

cloud rivet Sep 9, 2025, 3:41 PM

#

just stick stuff into simd registers, do ops

#

idk

#

it's magic

#

after I'm done reading the Intel manual I think I'm going to read the latest C spec, and after that maybe the Clang docs; the optimizations it's doing are so dramatically impactful that I should probably learn more

#

I'm kind of worried that by doing the simd myself I'll actually make shit slower

#

because maybe the compiler is better at autovectorizing than I am at figuring out what I should do when

#

I'll make a change and see how it goes

#

you can see in that screenshot that my raster triangle frame time dropped even though I hadn't made any changes to it; same with the render text

#

like all I did I think was add the compiler flags

#

render text dropped from 200 microseconds to 40

cloud rivet Sep 9, 2025, 4:06 PM

#

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3467.pdf

#

https://clang.llvm.org/docs/UsersManual.html

#

that manual prints nicely as a pdf

#

oh there's a -fproc-stat-report, so I can just have the compiler print how long it took instead of trying to measure it with a CLI tool

echo crystal Sep 9, 2025, 4:42 PM

#

i think you should get model loading working before optimising too much

#

to get more "real" work

cloud rivet Sep 9, 2025, 5:23 PM

#

I have to write a model importer lol

#

I will just write an obj importer to start with
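
A minimal sketch of where an OBJ importer could start, assuming only `v x y z` positions and triangular `f a b c` faces without slashes (`obj_mesh`, `obj_parse` and the fixed `OBJ_MAX` capacity are hypothetical). OBJ indices are 1-based, so they are converted to 0-based on the way in; comments, `vt`/`vn` lines and faces with `/`-separated indices are skipped.

```c
#include <stdio.h>
#include <string.h>

#define OBJ_MAX 256 /* hypothetical fixed capacity for this sketch */

/* Hypothetical mesh container. */
typedef struct {
    float pos[OBJ_MAX][3]; int vertex_count;
    int   tri[OBJ_MAX][3]; int triangle_count; /* 0-based vertex indices */
} obj_mesh;

/* Returns 0 on success, -1 if the fixed capacity is exceeded.
   Faces with more than 3 vertices keep only their first triangle;
   negative (relative) indices are not handled. */
int obj_parse(const char *text, obj_mesh *m)
{
    char line[256];
    m->vertex_count = 0;
    m->triangle_count = 0;
    while (*text) {
        /* Copy one line (truncated to the buffer) and advance past it. */
        size_t len = strcspn(text, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, n);
        line[n] = '\0';
        text += len;
        if (*text == '\n')
            text++;

        float x, y, z;
        int a, b, c;
        if (sscanf(line, "v %f %f %f", &x, &y, &z) == 3) {
            if (m->vertex_count == OBJ_MAX) return -1;
            float *p = m->pos[m->vertex_count++];
            p[0] = x; p[1] = y; p[2] = z;
        } else if (sscanf(line, "f %d %d %d", &a, &b, &c) == 3) {
            if (m->triangle_count == OBJ_MAX) return -1;
            int *t = m->tri[m->triangle_count++];
            t[0] = a - 1; t[1] = b - 1; t[2] = c - 1; /* 1-based -> 0-based */
        }
    }
    return 0;
}
```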

#

I can’t support glTF until I write a JSON parser

bronze socket Sep 9, 2025, 5:27 PM

#

you should watch the simdjson talks if you really want a rabbit hole

cloud rivet Sep 9, 2025, 6:05 PM

#

I kind of need a PNG thing too

#

also I need a way to do matrix math
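
A minimal sketch of 4x4 matrix math in plain C, assuming row-major storage and column vectors (`mat4` and the function names are hypothetical). The fixed-size loops leave vectorization to the compiler; `mat4_mul_vec4` requires that `out` does not alias `v`.

```c
/* Hypothetical row-major 4x4 matrix: m[row][col], used with column vectors. */
typedef struct { float m[4][4]; } mat4;

mat4 mat4_translation(float x, float y, float z)
{
    mat4 r = {{
        { 1, 0, 0, x },
        { 0, 1, 0, y },
        { 0, 0, 1, z },
        { 0, 0, 0, 1 },
    }};
    return r;
}

/* r = a * b: applied to a column vector, b acts first, then a. */
mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += a->m[i][k] * b->m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

/* out = a * v; out must not alias v. */
void mat4_mul_vec4(const mat4 *a, const float v[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a->m[i][0] * v[0] + a->m[i][1] * v[1]
               + a->m[i][2] * v[2] + a->m[i][3] * v[3];
}
```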

#

I just don't have anything yet

echo crystal Sep 9, 2025, 6:26 PM

#

the pain of having to diy frogsippy

cloud rivet Sep 9, 2025, 6:33 PM

#

it's fun

#

I have 10k lines of single triangle rendering

#

oh my mesh shader has three triangles, that's right

echo crystal Sep 9, 2025, 6:52 PM

#

nice

cloud rivet Sep 9, 2025, 6:55 PM

#

it's a triangle medley

cloud rivet Sep 10, 2025, 3:22 AM

#

this is a cool site https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html

astral hinge Sep 10, 2025, 3:44 AM

#

check out uops.info also

cloud rivet Sep 10, 2025, 3:58 AM

#

I don't understand that site at all

#

that's like benchmarking data?

#

I'm just trying to understand what ops to use 😅

#

that's pretty hard

astral hinge Sep 10, 2025, 4:04 AM

#

cloud rivet that's like benchmarking data?

it's architecture specific information about each instruction

#

since x86 instructions are basically high level and are implemented with micro-ops (uops)

#

so they have different perf characteristics on different arches

cloud rivet Sep 10, 2025, 4:07 AM

#

I think I just don't know enough to make any sense out of the information there

astral hinge Sep 10, 2025, 4:07 AM

#

so I guess that site is more like an alternative to Agner Fog's instruction tables

#

but I mean you can explore and read about different instructions without worrying about how they perform on different arches

#

same as the Intel intrinsics guide

#

well the Intel guide also has the perf info, but only for their arches

#

uops also has AMD arches

cloud rivet Sep 10, 2025, 4:10 AM

#

I see

cloud rivet Sep 10, 2025, 4:44 AM

#

#

#

you know what I did

#

I removed all my hand crafted simd

#

and everything got faster

broken fog Sep 10, 2025, 4:54 AM

#

average simd experience

cloud rivet Sep 10, 2025, 4:54 AM

#

I'm too ignorant

#

the auto vectorization is way better

#

so just rewriting my code

broken fog Sep 10, 2025, 4:54 AM

#

astral hinge so they have different perf characteristics on different arches

when you're at the point you start looking at uarch opts you know you've gone off the deep end

cloud rivet Sep 10, 2025, 4:54 AM

#

and adding -mf16c

broken fog Sep 10, 2025, 4:55 AM

#

cloud rivet the auto vectorization is way better

i mean yeah it's not a you problem, compilers are very good at what they do

cloud rivet Sep 10, 2025, 4:55 AM

#

it's those two things: I get about +100 fps with just -mf16c and then another +100 fps with my reorg

#

I went back to my code before I reorganized it and just used -mf16c and it went from 70fps to over 100

#

with both I get 200-300fps

#

I am actually undoing more of this code to see if I can get it even faster; the SIMD is removed, but I did some weird stuff to get SIMD working

cloud rivet Sep 10, 2025, 6:50 AM

#

#

you know what I realized

#

I don't need to test barycentric coordinates for every pixel

#

just along the edges do I have to test it for every pixel

#

of the triangle

#

anyway

#

I agree with dodo

#

I think I should start like rendering actual things

cloud rivet Sep 10, 2025, 7:48 AM

#

I just keep accidentally failing into better perf

broken fog Sep 10, 2025, 11:21 AM

#

cloud rivet just along the edges do I have to test it for every pixel

how do you detect the edges tho thonk

#

one thing i did find out working on my last sw rasterizer is you can calculate the barycentric coords incrementally and save a whole bunch of ops
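
A sketch of the incremental approach, with hypothetical names: each edge function is evaluated once at the first pixel center and then updated by adding a constant per step, since it is linear in x and y. Dividing the three values by the triangle's doubled signed area (`edge(t[0], t[1], t[2])`) gives the barycentric coordinates. This uses a plain `>= 0` test with no fill rule, and many rasterizers use fixed-point coordinates instead of floats to avoid accumulated rounding.

```c
/* Hypothetical 2D point type. */
typedef struct { float x, y; } vec2;

/* Edge function of the directed edge a->b at p; >= 0 means inside for
   the winding used in the test (assumed convention). */
static float edge(vec2 a, vec2 b, vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Hypothetical helper: count covered pixel centers in a w x h box at the
   origin. One step in x adds (a.y - b.y) to an edge value and one step in
   y adds (b.x - a.x), so the inner loop needs only three additions. */
int count_covered(const vec2 t[3], int w, int h)
{
    vec2 p = { 0.5f, 0.5f };
    float row0 = edge(t[1], t[2], p); /* edge opposite t[0] */
    float row1 = edge(t[2], t[0], p); /* edge opposite t[1] */
    float row2 = edge(t[0], t[1], p); /* edge opposite t[2] */
    float dx0 = t[1].y - t[2].y, dy0 = t[2].x - t[1].x;
    float dx1 = t[2].y - t[0].y, dy1 = t[0].x - t[2].x;
    float dx2 = t[0].y - t[1].y, dy2 = t[1].x - t[0].x;
    int covered = 0;
    for (int y = 0; y < h; y++) {
        float w0 = row0, w1 = row1, w2 = row2;
        for (int x = 0; x < w; x++) {
            if (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                covered++; /* barycentrics here: w0/area, w1/area, w2/area */
            w0 += dx0; w1 += dx1; w2 += dx2;
        }
        row0 += dy0; row1 += dy1; row2 += dy2;
    }
    return covered;
}
```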

#

didn't go much further than that tho cause i wrote the whole thing in like an afternoon

cloud rivet Sep 10, 2025, 1:25 PM

#

Yeah actually I need the barycentric coordinates for interpolation

#

So it’s more about skipping pixels outside the triangle, I guess

#

You wrote a thing that works in an afternoon though, that’s amazing

#

Incremental is interesting

cloud rivet Sep 10, 2025, 2:23 PM

#

The lesson of the last few days: write vector code in a way that lets the compiler batch the work, without trying to tell it how

#

And to use F16C when using half types
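
F16C adds hardware conversions between half and single precision (`vcvtph2ps` and `vcvtps2ph`), and `-mf16c` lets the compiler emit them. Without it, each half-to-float conversion has to be done with integer bit manipulation along the lines of this portable sketch (`half_to_float` is a hypothetical name, and this is not necessarily what a compiler's runtime helper looks like), which may explain part of the speedup.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software conversion of an IEEE binary16 value to float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign; /* +/- zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears,
               adjusting the float exponent to match. */
            exp = 127 - 15 + 1;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3FFu;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13); /* rebias */
    }
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bits without UB */
    return f;
}
```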

broken fog Sep 10, 2025, 2:37 PM

#

cloud rivet You wrote a thing that works in an afternoon though, that’s amazing

well like two afternoons really

#

but yea

#

it barely works but hey it does work

elfin cape Sep 10, 2025, 8:49 PM

#

@cloud rivet I forgot to respond to you about the rad linker. For my own project I haven't used it, since it's okay-ish, but I would like to use it at work; the whole codebase and CI are such a shit show that it's not possible.

#

I plan to use it in the next project that's in the works, but I don't have that much free time...

cloud rivet Sep 10, 2025, 9:00 PM

#

nice

#

I don't link anything other than Win32 and the Vulkan DLL, so I haven't looked into it, and I run a unity build

#

I don't really have any linking problems to solve

#

if it's faster that's cool

elfin cape Sep 10, 2025, 9:01 PM

#

I link quite a lot of things
https://github.com/lukasino1214/foundation/blob/master/vcpkg.json

cloud rivet Sep 10, 2025, 9:02 PM

#

those are good things to link

#

what is daxa

elfin cape Sep 10, 2025, 9:02 PM

#

Vulkan abstraction that I use, made by lpotrick, saky and gabe rundlett

#

it's really nice

cloud rivet Sep 10, 2025, 9:03 PM

#

is that used instead of making vk api calls directly?

elfin cape Sep 10, 2025, 9:03 PM

#

yes

cloud rivet Sep 10, 2025, 9:03 PM

#

neat

elfin cape Sep 10, 2025, 9:04 PM

#

this is my task code for the render graph
https://github.com/lukasino1214/foundation/blob/master/src/graphics/virtual_geometry/tasks/draw_meshlets_only_depth_masked.inl

#

there are a bunch of macros to share code between shaders and C++

elfin cape Sep 10, 2025, 9:05 PM

#

cloud rivet those are good things to link

yes but it's expensive to compile 😭

#

sol2 was the most expensive. Just removing it lowered the compile time from 20s to 10s

#

that's on my 7950X

cloud rivet Sep 10, 2025, 9:10 PM

#

10s compile is not horrible

elfin cape Sep 10, 2025, 9:18 PM

#

it could be much better. I do some things that are not optimal. I hope the next project is going to be much better. We are using interfaces and forward declaring as much as possible

cloud rivet Sep 10, 2025, 9:18 PM

#

I wish C++ modules were supported better

#

it seems like they aren't really usable?

elfin cape Sep 10, 2025, 9:18 PM

#

the support is okay but compile time is 5x worse

#

compared to headers...

cloud rivet Sep 10, 2025, 9:19 PM

#

uh

#

my unity build has been going well so far

#

it's just a hobby project

#

not a real thing

elfin cape Sep 10, 2025, 9:19 PM

#

You get the benefit of C compile times

cloud rivet Sep 10, 2025, 9:20 PM

#

yeah

elfin cape Sep 10, 2025, 9:20 PM

#

no template explosion, etc...

cloud rivet Sep 10, 2025, 9:20 PM

#

still compiles in under 1 second

#

I honestly never want to write in anything but C at this point, although I do sometimes miss zig

elfin cape Sep 10, 2025, 9:20 PM

#

I deal with that at work and it's not fun at all. I deal with 3-hour compile times. I had to compile twice today...

cloud rivet Sep 10, 2025, 9:21 PM

#

my work's Go monolith takes like 10 minutes to compile, but 3 hours jfc

elfin cape Sep 10, 2025, 9:21 PM

#

the worst thing is that this isn't a monolith, that's just one product...

#

I am really happy about getting rid of Boost; that should cut the compile times down a lot.

echo crystal Sep 10, 2025, 9:22 PM

#

what linker ?

elfin cape Sep 10, 2025, 9:22 PM

#

echo crystal what linker ?

https://github.com/EpicGamesExt/raddebugger

#

rad linker

#

the reason Fortnite is listed there is that they hit the symbol limit in PDBs KEKW

echo crystal Sep 10, 2025, 9:23 PM

#

does it work on loonix

elfin cape Sep 10, 2025, 9:23 PM

#

on Linux you have mold

echo crystal Sep 10, 2025, 9:23 PM

#

i think rad debugger doesn't

elfin cape Sep 10, 2025, 9:23 PM

#

echo crystal i think rad debugger doesn't

it will

#Rosy