#L3D2 - x86-64 assembly toy software renderer
that'll be amazing, too bad I can't help much cuz just started w/ all of this (serious programming) some months ago, so got plenty to learn yet!
ah gl!
thanks! plan to learn asm too w/ your project or at least know how to read it!
for learning asm would recommend this
its written for 32 bit assembly but you can translate to 64 bit relatively easily
thanks! will go thru' it; so much stuff to learn in this area of computing, can't catch a breath!
congrats, new features looking p f good
just finished the video, the tetrapod felt very good at 60fps too!
good to hear!
morning!
in git, you have this cmd
nasm -f elf64 -o L3d.out && ld -T linker.ld -o L3d L3d.out
but in my machine it failed, however this one worked:
nasm -f elf64 L3d.asm -o L3d.out && ld -T linker.ld -o L3d L3d.out
just added L3d.asm before -o.
then, the cmd ld --verbose > linker.ld generated some lines at the start and end of the linker script (img) that need to be removed for linking to work correctly.
after that, everything worked and could delight myself watching the transparent fish and tetrapod at 60fps
also, didn't need to add the "PHDRS" header or w/e is called, what does it do? (only played the test scene so far, tho)
ohhh yeah the command is wrong
forgot
the custom linker is for marking the .text segment as writable
will update
it makes the editor work
if you try load a uv mapping in the editor without that it will segfault
done
thank you very much; yes, tried the editor afterwards this morning and it segfaulted, so I tried adding the ":text", ":rodata" etc cos ld was complaining that phdrs was not being referenced. That time it compiled and linked correctly only with a warning saying blablah (permissions stuff) but then it segfaulted again at startup, however didn't try adding only just ":text" in the ld script, might try that tomorrow, can't rn cos eyes are giving up on me, gn gn
:rodata shouldnt be necessary
working correctly now, tyvm
small break whilst trying to understand shadow mapping

deserved
found this long time ago, sharing it just in case you find it useful in some way, https://docs.freebsd.org/en/books/developers-handbook/x86/ perhaps the section A.5. Creating Portable Code, you probably know all of this though, anyway
x86 Assembly Language Programming
finished the first 5 lessons of asm tutor, it was very funny to see chapter one finishing with a segfault, lovely
getting there slowly
the entire process is currently understood, just working on understanding a quick way of inverting the view matrix
for converting coords in screen space to world space and then to light space
nearly there
the last step to figuring this out is the reconstruction of the w component when translating from ndc to clip space
after that can start implementing
actually wrote code for first time in a few days
fixed interpolation of uvd to use xmm so its faster
also can now interpolate zclip alongside zndc to make calculating wclip when doing the ndc to clip space calculation trivial
hopefully wont make any performance difference because of the simd rewrite
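A minimal Python sketch of the w reconstruction described above, assuming the usual relation z_ndc = z_clip / w_clip. The function name and the sample point are made up for illustration; the engine itself does this in assembly.

```python
def ndc_to_clip(x_ndc, y_ndc, z_ndc, z_clip):
    """Undo the perspective divide for one pixel.

    Since z_ndc = z_clip / w_clip, having both z values gives w_clip directly.
    Assumes z_ndc is non-zero; a point with z_ndc == 0 would need special handling.
    """
    w_clip = z_clip / z_ndc
    return (x_ndc * w_clip, y_ndc * w_clip, z_clip, w_clip)


# Round trip: start from a made-up clip-space point, divide by w, then recover it.
clip = (2.0, -1.0, 3.0, 4.0)
ndc = tuple(c / clip[3] for c in clip[:3])
recovered = ndc_to_clip(ndc[0], ndc[1], ndc[2], clip[2])
```

With zclip interpolated alongside zndc, no extra matrix work is needed per pixel to get back to clip space.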
either way will try start on doing shadow maps tonight
there is an exam today in the afternoon, and thats the last one for 10 days
so maybe can implement shadows and also have time to revise biol
also last day in college today and its study leave after this
- only 4 exams left
lots of time to write this
the simd on the left is the redone interpolation
on the right is the macro that was called 4 times previously to achieve the same result
not bothered to sync up the code on laptop with code on home pc bc this is just a quick change
that's pretty cool, does this mean also that you will be able to have more depth precision? because when you go from one coord system to another, there's always the risk of losing info because of rounding errors... or so I read; maybe this doesn't make much sense, sorry, just learning about coordinate systems these past days
also good luck with your exams
it does make sense, its just the precision will be lost regardless because that information is undergoing 4 matrix transformations after that
thats unavoidable though
and its only like .0001 margin
also thx for gl
btw, was reading the man page of nasm and there's this flag -O that it's for optimising branch offsets, did you ever use it? have no idea what it does tho, just found it interesting that such option exists
its for optimising the size of the offsets encoded in jump instructions, e.g. picking a short jump instead of a near one when the target is close enough
it does other stuff too
but not using it no
optimisation is completely off to make it assemble faster
gotcha thanks
little and often seems to be the way
cant do much at one time rn fsr
but whatever
just working on getting a shadow map generated
same here hehe, don't stress it, you have plenty of time
what you're doing is not easy!
gonna try finally finish off shadows
demotivated here as well, got eye surgery and can barely see anything, can't code sob 😭 but you got this, let your subconscious work on it while you take breaks! Pretty sure those shadows will look awesome
hopefully
all frogs need rest
dw brain usually gets unstuck after rest
mm
shadow map
visualised with pygame reading the shadow map which is written to a file as a test
shadow map for this
light position is a little higher, obv will be changed but just to figure some stuff out doing this
this is how it looks when you cull backfaces rather than frontfaces with the default scene
wish I could see it sob
see what?
last screenshots
oh are they not showing?
correctly generated clip space coordinates for drawn points on the screen
next step is the clip space -> view space -> world space transformation
they are, but can't see them because I see everything blurred after surgery hehe. Keep updating, though! I'll check them all once I recover my vision, and pretty sure Wizard is enjoying the updates as much as I do
precision issues, but projection matrix inverse is correctly constructed
and camera matrix too
clip space -> view space -> world space can now happen easily, but it will also happen tomorrow not today bc its late
any algorithm/tricks in mind?
yeah, solved it alr
specifically for multiplying a 4xn matrix by a 4x4 matrix
performance with new alg
significantly better
well its not visible but
it is when doing shit tons of multiplications
~150fps as opposed to the 20 fps from before
I'll look it up when I can, sounds interesting! Pretty sure I'll learn one thing or two
look up how matrix multiplication works?
Gee that's an improvement alr
Your code
I also used (tried) simd for matrix mul but the syntax was super toxic
It worked though
this algorithm works by creating 4 vectors that represent the values in different columns of the matrix
e.g.
does this for each of the 4 columns and puts that into a buffer
so 4 times
thats what this does
then for each row in the matrix
it multiplies the row by each of these columns, gets the sum and stores it in the correct place
it keeps the row value in xmm1 so its not modified
so instead the column values are changed out each time
this is more efficient than the other way around because the column vectors are aligned, so its quicker to move them in
probably not a noticeable increase but whatever
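The approach described above can be sketched in Python roughly like this. It is only an illustration under assumptions: points are treated as row vectors, the 4x4 matrix is row-major, and `mat_mul_rows` is a made-up name. The real code keeps the row in xmm1 and swaps aligned column vectors in from a buffer.

```python
def mat_mul_rows(points, m):
    """Multiply an n x 4 list of row vectors by a 4x4 matrix m (row-major)."""
    # Gather each column of m once into its own 4-element vector (the "buffer").
    columns = [[m[r][c] for r in range(4)] for c in range(4)]
    out = []
    for row in points:
        # The row stays fixed; the column vector is the thing swapped each time.
        # Each output element is the sum of the element-wise products.
        out.append([sum(a * b for a, b in zip(row, col)) for col in columns])
    return out


# A translation by (10, 20, 30) in the row-vector convention.
translate = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [10, 20, 30, 1],
]
moved = mat_mul_rows([[1, 2, 3, 1]], translate)
```

Gathering the columns once up front is what makes the inner loop a plain multiply-and-sum per output element.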
Beautifully explained, thank you
so right now you have all model, view and projection matrices ready, right? Like you can go from one coord system to another
have a bunch of other questions but I'm afraid they're all not too specific (vague) so will save them for later since you have stuff to do! great stuff so far, loving it
also exams coming up next week iirc? good luck with those! very soon for long break and uni
yeehaw
yea in fact the translations are done
just need to compare the shadowmap vals to the actual depth vals
next week yes
alr did one
*two
Outstanding
boluck!!!
great it doesnt work too well
its sort of creating the imprint of the shadow map rather than an actual shadow map
Also, I saw in LearnOpenGl (a popular ogl tutorial) that you attach a transformation matrix to each object (model matrix) and then at render time you'd multiply this matrix with the view and projection matrices and the position of that specific vertex that you're processing in the vertex shader, to determine the final location on the screen. How does this process look in your engine? No need to go in detail! Was wondering if you will store a transformation matrix per object? Or maybe just store the positions (for example) per object and create a model matrix at render time per object? Was very confused about how to structure this when reading about it
Woooo
objects are stored in world space so a translation from object space isnt important
thats how its done in this engine
can't really see but is the shadow completely opaque?
bug eating time
imma do an experiment and try playing yume nikki (tried it the other day) without seeing crap, let's see how my mind likes it
thought of a very possible cause to the problem
was using the wrong depth val
thought abt it in the shower and it makes sense why its fucked
still very much dependent on camera position for some reason???
even more so than before
o shit wait
didnt update inv camera matrices
nearly?
few problems such as shadow acne but
getting there
added a bias and that fixes the acne alr
and made the shadows transparent
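For reference, the bias fix for shadow acne usually boils down to a comparison like the one in this Python sketch. The bias value of 0.005 and the function name are assumptions for illustration, not values from the engine.

```python
def in_shadow(depth_from_light, shadow_map_depth, bias=0.005):
    """True when the point lies behind the closest surface the light sees.

    Without a bias, a lit surface compares against its own slightly
    quantised depth in the shadow map and randomly shadows itself (acne).
    """
    return depth_from_light - bias > shadow_map_depth


# A lit surface whose stored depth differs only by rounding error:
acne_without_bias = in_shadow(0.501, 0.5, bias=0.0)
fixed_with_bias = in_shadow(0.501, 0.5)
# A point clearly behind an occluder stays shadowed:
really_shadowed = in_shadow(0.8, 0.5)
```

Too large a bias detaches shadows from the objects casting them, so the value normally gets tuned per scene.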
now to make rotating the camera not fuck the shadows
rotation matrix being a bastard again
hehehe
hard part done!!!!!!!!!
demotivation central over
🔥🔥🔥
nvm
its whatever
needs to make the shadow map larger, that is giving some trouble
eh
and fixing some shadow acne
hopefully should be done soon
but again busy times
exams and work thrown in too now
busy times indeed, but shadow mapping looking good
also, work? do you have a job already b4 uni? if so, that's very impressive!
yeah
its not too bad actually
just heavy lifting and customer service mainly
not sure what the wage is, but by now its at least £200 so thats cool
get to experience weird software bugs too
such as the item scanner charging £506000 for water
great for you! good news then
lmao pretty cool
they need some tests over there
esp for the customer
mhm
they can see the cost of everything checked out so they laughed when it did that
happy little accidents
this game also needed a little bit more of testing
oh interesting, the video is not uploaded, thought it was going to (fixed)
lol is this amnesia?
yes it is
horror game AI be goofy sometimes 😭
finally had some time to code
increased shadowmap res
working on reducing some bugs
such as the weird strip along the bottom
this line
unsure why it appears
ok fixed that
now to fix the weird bug where moving the camera simply doesnt work
well it does but the inverse view gets messed up
not to be picky but, right behind the cube, what happened with the shadow? looks like part of it is mixed with the dark green square, but not really
here?
yes
limited colour depth
its not perfectly blended bc there simply arent enough colours in ansi to do this properly
this is the closest it can get
these are all the available colours
so, does that mean that sometimes you need to be extra careful not to confuse colour limitation with an actual bug? or do you recognise them easily
its fairly easy to recognise colour bugs
gotchu
just made a worthless asm program that sends a desktop notification 
oh, sorry, to be clear, my program is worthless bc it only loads libnotify's fx ptrs using dlopen/dlsym, and then it simply calls these fxs. If you were to do it from scratch, you'd need to interact w/ dbus, which apparently is pain and death since it's not very well documented. Besides learning a wee of asm, as I was reading about dlopen/dlsym, was wondering if it's possible to hot reload some parts of our engines, assuming we divide it into modules... having different .so files for various modules of the engine, then at runtime, we'd use dlopen to load the .so files and dlsym to load the fxs we need... wonder if it's even worth it... lol 
https://github.com/makercrew/dbus-sample some info about how to interact w/ dbus, if you're interested
ya, every time I mess with asm I learn something new, one way or another
fascinating
goes hard
gonna screenshot
congrats, vig, great job 
bg music is no vacations, please by deuteronomy on pure staircase
thx
music added on request of a friend
gr8
exams done, should have more time now
man nvm then lol
never got around to anything
will just continue on this whenever motivation strikes again then :///
it's ok to rest
picking stuff back up :)
working on quakes texture mapping technique
rather than doing persp correct mapping for each pixel
trying to get double resolution working before the gp direct video
got it working to only process the last pixel of a row and the first pixel
next step is to lerp between also
oh, and draw persp correct every 8 or so pixels
got the subdivision counter set as a macro so it can be adjusted if needed
its going roughly around 300fps rn, and thats without the shadowmap lerping being divided too
2 subdivisions
5 subdivisions
now for the lerping part
screen space subdivision working now kinda
just not doing the end
250fps, an increase of around 100fps, and its not lerping the shadowmap either so thats p good
might get another 100fps increase from the shadowmap
lerps every 8 pixels
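A rough Python sketch of the screen space subdivision idea from the messages above: do the perspective-correct divide only every few pixels and lerp in between. Only one texture coordinate is shown, `span_u` and `SUBDIV` are made-up names, and the span is assumed to be at least 2 pixels wide.

```python
SUBDIV = 8  # pixels between perspective-correct samples (assumed, adjustable)


def span_u(u0, z0, u1, z1, width, step=SUBDIV):
    """Texture coordinate u for each pixel of a span from (u0, z0) to (u1, z1).

    u/z and 1/z are linear in screen space, so the exact u is recovered with a
    divide only every `step` pixels; pixels in between are linearly interpolated.
    Assumes width >= 2 and non-zero depths.
    """
    def exact(x):
        t = x / (width - 1)
        u_over_z = u0 / z0 + (u1 / z1 - u0 / z0) * t
        one_over_z = 1 / z0 + (1 / z1 - 1 / z0) * t
        return u_over_z / one_over_z

    out = []
    x = 0
    left = exact(0)
    while x < width:
        end = min(x + step, width - 1)      # last segment may be shorter
        right = exact(end)
        n = max(end - x, 1)
        for i in range(min(step, width - x)):
            out.append(left + (right - left) * i / n)
        x += step
        left = right
    return out


row = span_u(0.0, 1.0, 1.0, 3.0, 17)
```

With the far end three times deeper than the near end, the middle pixel lands at u = 0.25 rather than 0.5, which is the distortion that plain affine mapping would miss.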
updates
🫡 yes
working on shadow map lerp
using python to visualise the shadowmap for testing
you can kinda see it in the engine but its not ideal
just testing stuff rn
getting there
having a smaller lerp size might be better
yeah that looks a bit better
its written everything but does it run right
no not really
can't wait
looking alr
will add some stained glass and stuff to make it a little more interesting and showcase some more stuff
working on double res
maybe full return to this? who knows
just working on changing the system of addressing points now
bc ofc with double res its a bit wacky
and then got annoyed with how the code is a big ball of mud
will rewrite lots of this
rewrote a decent portion
getting there from scratch
got some new stuff too
like quaternions
not sure too well if they are working properly yet or if it's just a bad perspective projection matrix
double resolution also supported now
having some fun making the code look non shit
clean code is nice
back at it in full speed
maybe
any speed is good
awesome
its fun if u wanna try it
quite like messing around sometimes
u can write pretty code too
that's pretty neat code
yeah
its the reason for the rewrite
a lot of the old code is from when didnt really know much asm
as a result its really really bad
hacky even
then rewrite was much needed
yeah
it was really bad
this was before knowledge of macros
so hence magic numbers lying everywhere
big pain to debug before
yeah
the entire engine left
will be done in due time
might go quicker from learning new things while rewriting
got lines and backface culling working
looks much nicer now
things are going nicely
unfortunately spent fucking ages fixing backface culling and only realising at the end that two lines were the wrong way around
resulted in incorrect vectors
was focusing on outputs of cross and dot product for debugging
also discovered the amazing dpps instruction whilst debugging so it wasnt all in vain
also significantly cleaner than the old code
and a bug in which the camera position input had to be absolute is no longer present fsr
not sure why that was a bug initially but whatever
also the winding order check at the end is instead done by doing a bit test on the msb of the result of dot product rather than.... loading it onto the fpu, loading 0 and then doing an instant fpu comparison
which was stupid but ofc also written in very early stages
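The msb trick mentioned above can be illustrated in Python by looking at the raw 32-bit encoding of a float. This is a sketch with made-up helper names; which sign counts as "facing away" depends on the winding and view direction conventions, which the chat doesn't spell out.

```python
import struct


def dot3(a, b):
    """Plain 3-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sign_bit_set(value):
    """Test the most significant bit of the 32-bit float encoding of value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits & 0x80000000) != 0


normal = (0.0, 0.0, 1.0)
toward = (0.0, 0.0, 2.0)    # made-up vectors for illustration
away = (0.0, 0.0, -2.0)
positive_case = sign_bit_set(dot3(normal, toward))
negative_case = sign_bit_set(dot3(normal, away))
```

One caveat of testing the bit directly: -0.0 has the sign bit set, so it is treated as negative even though a float compare against 0 would not say so.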
mhm!
uh
this
however its written in 32 bit asm so there has to be a little bit of translation to 64 bit
this ones better

worked a little on trying to get textures working but not to much avail
could probably fix today but just cant be bothered
its half an hour to midnight and has been working on this thing all day
was rest time
tada
its better than it was in the original too this time
doesnt go all fucky when you go too close
can go as close as you want and its still fine
oop nvm it segfaults if you enter it at a certain angle
whatever can fix that later once screen space subdivision is done
nvm figured it out lol
didnt pop w back off the stack in situations where the object is clipped to near plane
average assembly woes
the link in the README doesn't work fwiw https://github.com/L226n/L3d-engine
it 404's
maybe it's a private repo
yeah
never made the repo
will release it in a bit
just put a notice there so ppl dont think its abandoned
at least its easy to debug lol
when gdb do this u kinda know u forgot to pop or forgot to push somewhere
i would probably unironically use asm if it were portable
yeah thats the only problem
i use high-level asm (C)
yeah
asm is so fun
would be much nicer if it were portable tho yes
arent u trying to target p much everything with your engine
yeah
figures as to use c then
yeah also it's the lang i'm best at
what is the operating system in the screenshots?
linux
with funny xp skin on it
will never port this to windows its too much effort
I thought NASM was portable
RIP
the assembly is portable but
the system calls are the problem
syscall calls the linux kernel to execute certain things, like printing text
getting input, sleep, etc
windows doesnt do it like that
ok so it works on any x86-64 machine as long as it's running on a supported linux kernel version
nice
cinnamon skin
this is a software renderer that renders in real time?
yeah
some older ones yea
it should be significantly better now tho
already the rewrite is much more optimised
I always go to disassembly as a last resort, I guess that's all you look at though
mhm
been writing assembly for a little over a year now
only knew python and tiny bits of cpp before that and decided it would be cool to learn something new
really awesome
thank youuu
Image died 
ohh yeah a few old ones are
they were links to images in a server that has since been deleted
probably fastest 4x4 matrix multiplier possible
that doesnt use vgatherdps at least
would use it but it didnt exist in avx1
and isnt supported by processor :p
what about SIMD?
or is that vgatherdps?
movss is SIMD
everything here is simd
the only non simd stuff is the changing of rbx to detect the end of the source matrix
yes xmm
vgatherdps is simd also yes
those are SIMD registers yeah?
yes
neat
xmm0 is 128 bits and here its holding 4 single precision floats
so all your data has to be aligned, do you ever have to pad?
well
data doesnt have to be aligned
its just its a good idea to
all data defined in the .data segment is aligned to 16 bytes but the allocated data such as stuff from model loading isnt
you can move unaligned data into a register with movups for single precision but its a little slower than movaps, which is for aligned
ok sorry if this sounds dumb but I imagine you have to work directly with memory addresses, is all the cache and memory addressing handled for you?
this could be that tiny bit faster by ensuring allocated data for the models vertices is 16 byte aligned
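Getting allocated data onto a 16 byte boundary is mostly a matter of rounding the address up, as in this small Python sketch. The helper names are made up; in the engine this would be a couple of instructions applied to the pointer returned by the allocator.

```python
ALIGN = 16  # bytes; the alignment movaps requires for its memory operand


def align_up(addr, align=ALIGN):
    """Round addr up to the next multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def is_aligned(addr, align=ALIGN):
    """True when addr is already a multiple of align."""
    return (addr & (align - 1)) == 0


raw = 0x1001                 # made-up address returned by an allocator
aligned = align_up(raw)      # over-allocate by align - 1 bytes to leave room for this
```

The cost is up to 15 wasted bytes per allocation, in exchange for being able to use the aligned moves everywhere.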
cache is handled yes
and memory addressing
to an extent
it's all virtual memory addresses?
prob similar to how it is in C
yeah
how do you keep your understanding intact with respect to your code. I write high level code that is inherently readable and sometimes after a while I go back to code and forget how it works
how do you deal with that
pile of comments
describing exactly what everything does p much
unless its blatantly obvious
is there a time when you will achieve your goals with this project and go to a higher level language?
but will have to read through
no
asm is fun
maybe if it gets boring will prob do smth else but for now its great fun
I'm glad you have found something you enjoy, I can see how it can be a lot of fun
hehe thx
it is tho
imo it is until you have to deal with C ABI calls 
which on windows is about 90% of the time 
actually probably more like 25% but like
that's far more than it is on linux 
no idea abt windows
not using any external libraries at all on this project so ofc its 0% c apis
unless you count syscalls which dont really
On windows, syscalls aren't a stable public interface
Instead you call WinAPI, which is meant for usage by C code
```
stack increase, 28h ; adjust stack ptr
mov rcx, %1 ; load %1 into rcx
call ExitProcess ; end program
```
```
stack increase, 28h ; adjust stack ptr
mov qword rax, [rel sOut] ; load sout handle
; print to console
mov r9, 0 ; no pointer to store the number of characters written
mov rdx, %1 ; load string
mov r8, %2 ; load str length
mov rcx, rax ; move stdout handle to rcx
call WriteConsoleA
stack decrease, 28h ; correct stack ptr (program segfaults on even numbers of prints elsewise)
```
```
; increase increases the stack size
; decrease decreases the stack size
%macro stack 2 ; operation, amount
%ifidn %1, increase
sub rsp, %2
%elifidn %1, decrease
add rsp, %2
%else
%error "Invalid operation. Expected 'increase' or 'decrease'."
%endif
%endmacro
```
(stack macro)
oh this is horrid
agreed
the stack manipulation stuff is (according to GPT-4o, which... GPT-4o does not know much about NASM on windows so take this with a grain of salt) because of how window's C ABI works
no idea
I have a decent amount of macros to try to make NASM look more like higher level languages
and I'm definitely not done writing those 
presumably its allocating stack space for the call
but its still a stupid convention
yeah haha noticed
even got errors for when u dont use ur macro right
```
compare rax, [rel v0], [rel v1]
if l, do_false
PRINT hello, helloLen
do_false:
```
this is an if statement for me
kinda cursed
I feel like I should actually update that macro so I can say less or maybe even < instead of l
make sure to include the distinction between less-than and below
always confusing that one
below..?
always forgetting if below/above or less-than/greater-than is signed
signed comparisons
ohh
I have not dealt with unsigned
ah right no problem then
it can be useful sometimes to exploit it tho
for instance in a specific segment a number has to be between 0 and some other val
if l, do_false translates to jnl do_false
you can just use one compare if you use an unsigned comparison bc that would mean the twos complement negative is interpreted as higher than the higher bound
which is, kinda jarring
but it reads more like how a higher level lang does, and I couldn't think of a better way to implement it
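The unsigned range check trick mentioned a few messages up can be shown in Python by masking to 64 bits, which mimics how the same register contents look to an unsigned compare (jb/jae) instead of a signed one (jl/jge). The helper names are made up for the sketch.

```python
MASK64 = (1 << 64) - 1


def as_unsigned(x):
    """Reinterpret a signed 64-bit value as the unsigned value with the same bits."""
    return x & MASK64


def in_range_unsigned(x, upper):
    """0 <= x < upper using a single compare on the unsigned interpretation."""
    return as_unsigned(x) < upper


minus_one_bits = as_unsigned(-1)   # two's complement -1 is the largest unsigned value
```

Any negative input becomes a huge unsigned number, so it fails the single "below upper" test without a separate check against 0.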
does below have a corresponding jump instruction?
ight
jz/jnz
je/jne
ja/jna
jb/jnb
jl/jnl
jg/jng
jp/jnp
js/jns
jmp
probably some more too
cant remember
le, ge
```
; l = less (signed <)
; le = less_equal (signed <=)
; g = greater (signed >)
; ge = greater_equal (signed >=)
; b = below (unsigned <)
; be = below_equal (unsigned <=)
; a = above (unsigned >)
; ae = above_equal (unsigned >=)
```
jump operator note updated

I'mma guess
even vs odd
never looked into it bc its probably not needed
no idea what parity means 😭
ur probably right
JP, JPE
Jump if parity
Jump if parity even
JNP, JPO
Jump if not parity
Jump if parity odd
yeah
I ONLY figured that out because of discrete mathematics using parity for even/odd
jpo jpe thats a new one
http://unixwiz.net/techtips/x86-jumps.html
came from intel x86
I think JPO is the same as JNP and JPE is the same as JP based on how this page is formatted
the entirety of windows is wacky
the only good decision they made was making a hybrid, microkernel-style kernel instead of a monolithic kernel
whoah there's js and jns?
I'm assuming that's based on positive or negative?
I fail to see how it'd be signed vs unsigned int
yeah sign value
I kinda wish those were in higher level langs tbh
you cant differentiate signed vs unsigned int anyway
same thing
it just tests the msb
yea, that's why I fail to see how it'd do that
because storage wise there's no difference
mhm
signedness is a construct invented by higher level langs in order to sell more compilers
real
never understood why high level langs do that anyway its not that hard to deal with them being the same
I mean I know I can just x < 0 or x >= 0
but like
how do I know the compiler is gonna optimize that properly? 
java doesn't even have unsigned 
it probably will optimise that to use test
test?
test eax, 0x80000000
bit test
it performs a bitwise and of the two operands and sets status flags
test eax, 0x80000000
jnz .signed
is quicker than
cmp eax, 0
js .signed
because cmp performs a subtraction of 0 from eax
I'm halfway tempted to 
if I abstracted away all your syscalls, would you be willing to accept a PR?
huh
just using it to host the files publicly tbh
if u want to then go ahead
would be happy to see it work on windows lol
basically; on github
people can "fork" other people's project, which creates a copy of it that links back to the original
they can then modify their fork freely, without affecting the original project
and then from there, they make a pull request, where the owner of the project can merge the changes back into the original project
that sounds like it could easily break everything if not done right
could try? shouldnt be too hard... all of the syscalls are in 1 file anyway
I have no idea how to deal with pull requests if the original project is updated after a fork is created
github won't let it merge until the author of the fork figures out how to update their fork to include the latest commits of master, which is not the most straightforward process 
could just keep two versions of the main code - linux ver and windows ver
unless the update to the base project happens to not change any of the files the fork changes
yea that's what I was thinking tbh
move all the existing syscalls into a linux specific file, have a windows specific file, replace the existing file with one that checks operating system using a macro and chooses which os specific file to use based on that
possibly
although with some calls there are situations where some data structs would have to be changed
e.g. sys_ioctl
that gets terminal size in rows + columns, also changes around some settings
not sure if you would be able to do some of that stuff with the windows terminal?
I'm sorta
new to NASM so I most likely don't know enough to port it yet 
and the course I'm taking is for NASM on linux, so I'm having to figure out windows specific stuff on my own
pretty sure winapi has stuff for that
https://stackoverflow.com/questions/23369503/get-size-of-terminal-window-rows-columns
yep, granted it's gonna be ugly in NASM 
oh the terminal has to disable canonical mode too for input to work properly
canonical mode?
disabling it means that input is available as soon as you type a character, instead of only after a full line is entered
all the keybinds are read from stdin so
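On Linux the same terminal setting can be poked from Python through the standard `termios` module, which makes for a quick way to see what disabling canonical mode does. This is a sketch: the function names are made up, and clearing ECHO as well is an assumption (the engine does its setup through ioctl in assembly).

```python
import termios


def raw_input_lflag(lflag):
    """Clear ICANON (line buffering) and ECHO in a termios local-mode flag word."""
    return lflag & ~(termios.ICANON | termios.ECHO)


def enable_unbuffered_input(fd):
    """Apply the change to a terminal fd; returns the old settings for restoring."""
    old = termios.tcgetattr(fd)
    new = termios.tcgetattr(fd)
    new[3] = raw_input_lflag(new[3])   # index 3 is the local-modes field (lflag)
    termios.tcsetattr(fd, termios.TCSANOW, new)
    return old


# Pure flag arithmetic, safe to run without a real terminal:
example = raw_input_lflag(termios.ICANON | termios.ECHO | termios.ISIG)
```

Calling `termios.tcsetattr(fd, termios.TCSANOW, old)` with the returned settings puts the terminal back the way it was on exit.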
https://learn.microsoft.com/en-us/windows/console/setconsolemode
looks like this is probably related?
```
; add
; sub
; mul
; div
; idiv
;bit test
;it performs a bitwise and of the two operands and sets status flags
;test eax, 0x80000000
;jnz .signed
;
;is quicker than
;cmp eax, 0
;js .signed
; echo input and line input for non-canonical input processing
```
the note collection grows
dont forget imul
very useful instruction, virtually the same thing but instead of having to have your inst as
mul rbx
you can do
imul r9d, dword[addr]
except the second operand of imul can be anything, immediates, registers or memory
much more handy than needing rax to be one of your operands and not being able to multiply by immediates

Also
Do you use the gc unused symbols option of gcc?
gc sections, that's the one
I kinda setup my command line to minimize file size
gets a better file size than OZ while using O3 for the stuff I've written so far
```
nasm -f win64 -o test.obj src/test.asm -O3
gcc -m64 -o test.exe test.obj -lkernel32 -nostdlib -O3 -s -fno-ident -Wl,--strip-all -fno-rtti -foptimize-strlen -fstore-merging -ftree-vectorize -fmerge-all-constants -fomit-frame-pointer -flto -Wl,--gc-sections -e main
```
Oh?
Don't think that's a thing on windows typically
yeah
Or maybe at all 
Why does linux get all the fancy and functional ASM/C/C++ stuff while windows gets nothing good for low level 
I have literally had better experiences with VS Code, a microsoft product, on linux, than I have had with VS Code for Windows, a microsoft product
And not only that, that's VS Code on linux in a VM, not even running on actual hardware
oh?
i use msys2 (with the mingw64 backend) on windows versions that support it which gives you an entire unix environment (including a package manager using arch's pacman)
I… think I am using some distribution of mingw, I'll look later today ig
Though I'm not using msys
on windows xp, i just use git for windows which comes bundled with bash (and then i manually download mingw and add it to the path)
separate downloads for mingw exist so you might have one of those
if you downloaded git you may have downloaded mingw
like the git terminal on windows
it's a mingw terminal iirc
I use winget and I get git in powershell
z buffer yay
more efficient from last attempt again because the memory address for the depth buffer isnt recalculated from cartesian coords every pixel
got a spare register this time
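The saving described above comes from computing the depth buffer offset once per span and then just stepping it, roughly as in this Python sketch. The function name, buffer layout and "smaller z is closer" convention are assumptions for illustration.

```python
def draw_span(depth_buffer, width, y, x_start, x_end, z_start, z_step):
    """Depth-test one horizontal span; returns the x positions that passed.

    The flat buffer index is computed from (x, y) once, then just incremented.
    Assumes a smaller z means closer to the camera.
    """
    index = y * width + x_start          # the only multiply for the whole span
    z = z_start
    visible = []
    for x in range(x_start, x_end):
        if z < depth_buffer[index]:
            depth_buffer[index] = z
            visible.append(x)
        index += 1                       # next pixel: step the address, no recompute
        z += z_step
    return visible


buf = [1.0] * 8                          # a 4x2 depth buffer cleared to "far"
buf[5] = 0.1                             # something close already drawn at (1, 1)
drawn = draw_span(buf, 4, 1, 0, 4, 0.5, 0.0)
```

In assembly the running index is what occupies the spare register, replacing a multiply and add per pixel.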
getting much improvements
yess

how does one use ld?
just checked, gcc does come with ld
eh I think you might not be able to help me with usage, lol
asking chatgpt moment, because google isn't coming up with many answers
there we go
libkernel32 is located in a completely freaking arbitrary location with this distribution of mingw, but ok
test.exe is my gcc params
vvv.exe is ld
this is my test program, not l3d
even with just 4 hello worlds and colors and no dead code, it still makes a pretty decent difference 
this is a nasm program being linked with default ld params, vs the exact same obj file for the exact same nasm program being linked with the param set I have for gcc
-s --gc-sections -e main
adding this to the ld args brings it up to par in size
(basically just tells the linker to remove dead symbols/sections I believe)
am I safe to assume that I can fork l3d-engine, or would I need to fork l3d2, which is private?
oh l3d2 doesnt exist yet hang on
will just quickly finish off whats happening here and then upload it
just needing to finish commenting some segments
https://github.com/L226n/L3D2/tree/main @coral lark

not as many errors as I was expecting...
which is alarming, considering not a single one of those is about syscalls 
it seems to be unhappy with any simd code that exist in l3d
also the extreme lack of rels
lol what
turns out debugging is easier if I compile it to elf64 instead of win64
what does rel do
movaps xmm5, [objbuf+rax] ;load vertex data for point A
can I like
specify a data type for this?
OH right
yeah
whats next after qword hm
try xmmword[objbuf+rax]
on linux the assembler will just infer the type here bc it cant be anything other than 128 bits
xmmword not defined
ptr is not a nasm keyword [-w+ptr]
Ig add that as a linker arg?
oh wait no that's the nasm command saying not a nasm keyword
C:\Users\User\AppData\Local\bin\NASM\nasm -f elf64 -o l3d.obj l3d.asm -O0 -l l3d.lst -g
u need to use win64 if you want to have an executable on windows
cant use elf64 apparently
unless ur just
yea I'm aware
but the problem is if I do that, I get friccen no good debug info
elf64 gives me the same linker errors but with actual usable debug info
oh are u just trying to assemble it
I'm trying to work through the linker errors currently
so what was saying that movaps xmm5, [objbuf+rax] wasnt right, the linker?
linker says this
(gcc is calling ld behind the scenes)
calling ld directly says the same thing
nasm gives no warnings/errors
oh yeah its not liking that is it
yeah, it is in fact not
movaps xmm0, [rel scratchpad] ;now xmm0 = {X0 Y0 X1 Y1}
this however, is fine
only happens with objbuf
mov rdx, qword[objbuf+rbx] ;move point B XY into rdx
this line causes it too, so it's not because of the movaps either
its bc of the addr
did some reading and its bc [objbuf+rax] has to encode the addr of objbuf as a 32 bit displacement, and the addr here doesnt fit inside 32 bits so its truncated
can u send ur linker arg here?
nasm -f elf64 -o l3d.obj l3d.asm -O0 -l l3d.lst -g
linkers:
gcc -m64 -o l3d-gcc-debug.exe l3d.obj -lkernel32 -nostdlib -Og -g -e _start
ld -o l3d-kd.exe l3d.obj -LC:\MinGW\mingw64\x86_64-w64-mingw32\lib -l:libkernel32.a -s --gc-sections -e _start
ok so
mov qword[rel alloc_data.addr+rcx], rax ;save start addr to new slot
happens here too
I think it might be the pointer math?
its possible yes
yeah seems like it's every time pointer math is being done
probably rather
then again
it works for scratchpad
allocated memory has a very large addr so
yeah not sure why that is
try moving the objbuf definition all the way up to the top of the .data segment
in data.asm
no idea why this is happening tbh
funny that its perfectly okay on linux but breaks the instant you try on windows
same thing
oh
pointer math with variable offset
scratchpad seems to always be used with a constant offset
```
mov r9, objbuf
add r9, rax
movaps xmm5, [r9] ;load vertex data for point A
```
yea doing this doesn't cause a linker error
problem is I have no idea if that'll brick something else 
yeah you use r9
okay
if you go through every time a label is used with a register for an offset
and change it to just be a register
that should work
```
add rax, objbuf
movaps xmm5, [rax] ;load vertex data for point A
add rbx, objbuf
movaps xmm3, [rbx] ;point B
add rcx, objbuf
movaps xmm4, [rcx] ;and point C
```
like that?
yeah
ok well it has to be [rel objbuf] but ok
should work
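for anyone following along, a minimal sketch of the register-only version. it assumes `default rel` and uses a stand-in `objbuf` (the buffer size and the `load_vertex_a` name are made up for the example). x86-64 has no RIP-relative addressing form that also takes an index register, so `[objbuf+rax]` can only be encoded with a 32 bit absolute displacement. taking the address with a RIP-relative `lea` first and then indexing off a register avoids that relocation:

```nasm
; sketch only: should assemble with nasm -f elf64 or nasm -f win64
default rel                     ; bare [label] references become RIP-relative

section .bss
alignb 16
objbuf: resb 4096               ; hypothetical stand-in for the real objbuf

section .text
global load_vertex_a

; in:  rax = byte offset into objbuf (assumed to be a multiple of 16)
; out: xmm5 = 16 bytes of vertex data
load_vertex_a:
    lea    r9, [objbuf]         ; RIP-relative, no 32 bit absolute address needed
    movaps xmm5, [r9+rax]       ; base + index, no label inside the brackets
    ret
```

`mov r9, objbuf` also gets past the linker because that form carries a full 64 bit immediate. `add rax, objbuf` only has room for a sign-extended 32 bit immediate, so it may run into the same truncation.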
now I have to do that across every file that does this
but I'mma head home first, considering I'm currently sitting in school while I don't need to, and my neck hurts
fair enough, gl
Especially considering I'm cross compiling it to linux too
ouch
unfortunately pretty clueless for anything that isnt linux asm so this is all new stuff
I think I'm using an llvm based gcc
So it might be that
from what the internet seems to think that wouldnt make a difference and this is just a basic difference in how win64 works from elf64
@fierce iris any ideas?
eh?
these linker errors for any lines featuring stuff like [pointer+rax]
read through this and all i can say is average windows tomfoolery
It is comical to me how much better low level linux is than low level windows
nah not at all
i do linux but not even low level
the only time i used asm was when i made some glue code for a crappy OS i made a long while ago
the lowest thing i actually use is C
well, i really only do C lol
i haven't found a need for any other systems/general-purpose lang
you sure there's no better way?
feels a bit scuffed to me
but I'm starting to get link errors on l3d finally 
ok l3d is now the only file giving link errors
I now have a non functional l3d exe 
wait 
but I assembled this for an elf, how is it running when I run it with gdb
```
push rdx
mov rdx, rcx
lea rcx, [rel alloc_data.addr]
add rcx, rdx
pop rdx
mov [rcx], rax ;save start addr to new slot
```
turns out I have to do it this way it seems
ohh yeah ofx
ofc
lea doesnt work properly
wtf windows????
just denied lea of its effective purpose
yea but u had to change it yes?
idk what lea exactly does
```
mov [alloc_data.addr + rcx], rax ;save start addr to new slot
```
this is in place of this, which windows throws a fit over
ohh right
I...
think I am gonna macro that, because that's messy
lea loads the address specified in brackets e.g.
mov rax, 7
mov rdx, 51
lea rcx, [rax+rdx+9]
rcx = 7+51+9
its extremely useful but clearly on windows this just wouldnt work
because of the mixing of registers and immediates
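a small sketch of the different `lea` forms, assuming `default rel` and a made-up `table` label. as far as I can tell `lea` itself behaves the same on both platforms. the form that trips the linker is a label combined with a register, because that needs the label as a 32 bit absolute displacement. registers plus a small constant, or a label on its own, should be fine:

```nasm
; sketch only: should assemble with nasm -f elf64 or nasm -f win64
default rel

section .data
table: dq 0, 0, 0, 0            ; hypothetical label, just for illustration

section .text
global lea_forms

lea_forms:
    mov  rax, 7
    mov  rdx, 51
    lea  rcx, [rax+rdx+9]       ; registers + constant: rcx = 7+51+9 = 67, no memory access
    lea  r8, [table]            ; label only: RIP-relative under default rel
    lea  r8, [r8+rdx]           ; label + register, split into two steps
    ret
```

after this runs, r8 holds the address of `table` plus 51, the same result `lea r8, [table+rdx]` would give if it could be linked.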
yeah
although here you have lea rcx, [rel alloc_data.addr], and it's better to just do mov rcx, alloc_data.addr
same function just quicker
thought that lea instruction was in the original lol
but lea just being next to useless on windows is fucked
```
mov ecx, dword[alloc_data.pointer] ;get pointer for new alloc here
mov qword[alloc_data.addr+rcx], rax ;save start addr to new slot
add dword[alloc_data.pointer], 12 ;then increase the pointer
```
original code is just this
tried that
causes segfault
I'm doing a set of instructions equivalent to
add rcx, addr but with lea instead of add, so I have to do a roundabout thing
also I just thought of something that probably won't work
absolutely amazed at the way asm on windows works this is horrible
dword[rbx]+rbx does in fact not work

have to load them first
```
lea_offset rbx, [rel alloc_data.addr]
mov dword[rbx], eax ;move the length allocated here
```
does not like this one though
segfault
tf
rbx is the problem
try inspecting rbx value after lea
if its able to load the addr alloc_data.addr into rbx it shouldnt segfault after
<- complete noob at gdb (has no idea how to inspect stuff)
lea_offset rbx, [rel alloc_data.addr]
.b:
mov dword[rbx], eax ;move the length allocated here
add a .b after the lea value, then launch gdb and type b <whatever the parent label for .b is>.b
then type r to run the program, it will stop at that .b breakpoint
then type i r
which will show all registers
also is there a more efficient way to start gdb?
kinda annoying going
```
C:\users\user\downloads\gdb.exe (don't ask)
exec file
file file
run
```
every time
if you pass the executable as an argument like gdb l3d.exe
not sure if that would work like that on windows? hopefully it does
_alloc.b
random but okay
oh ok
so it's segfaulting somewhere else
just try moving the .b around until you get to an instruction that segfaults
I'd imagine ecx should not be 0?
can you run a debugger and it catches where the segfault happens?
it catches the segfault but it doesn't understand what's going on with the stack where it segfaults
ok but it shows you where though right, which instruction?
doing this
but with 100 instead of 40bf3
which actually might be instruction number
if that's the case, it's in load_ltx
in a spot that doesn't really make sense
yeah you can actually
forgot abt that
if u run it it will catch on a specific point and tell u where it is
ok but where 
have u inserted any pushes/pops?
1 push/pop pair and it's not the problem
hmm
where does the program go to after _alloc? 
would just insert random breakpoints further and further into the program until you just happen to go past it
it returns
what even calls _alloc?
l3d.asm
_start
allocates memory for the framebuffer and depth buffer
if u havent translated the syscall properly its not gonna work and will cause a segfault
because it gets the return addr of the allocated data in rax after the syscall
which if its then used to address stuff will cause problems
send the full code for _alloc
I would be really surprised if gdb couldn't catch a segfault
it cant when theres stack issues
but those are easy to debug
if its segfaulting at ret thats almost definitely stack stuff
I just noticed a stray pop rax
nvm not stray
lea_offset does a push and pop on r9 to use as a temp var
set a breakpoint at the start and end of _alloc
then show values for rsp register at both
equal
0 at end
not sure whats happening here lol
incomprehensible
segfault at ret is wacky af
try pushing rsp and rbp then popping them back at the end
```
sub rsp, 32 ; Allocate 32 bytes of shadow space
xor rcx, rcx ; lpAddress = NULL (rcx)
mov rdx, [rel alloc_data.available] ; passed as dwSize (rdx)
mov r8, 0x3000 ; Set flAllocationType = MEM_COMMIT | MEM_RESERVE (r8)
mov r9, 0x40 ; Set flProtect = PAGE_EXECUTE_READWRITE (r9)
.f:
call VirtualAlloc
add rsp, 32 ; Release the 32 bytes of shadow space
```
solution
thanks microsoft
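for reference, a minimal sketch of the same call wrapped in a function, assuming it gets called like a normal win64 function. `alloc_pages` and its register interface are made up for the example. the win64 convention wants 32 bytes of shadow space above the return address and rsp 16-byte aligned at the `call`. rax, rcx, rdx and r8-r11 are volatile, so anything kept in them can be gone after `VirtualAlloc` returns:

```nasm
; sketch only: assemble with nasm -f win64 and link against kernel32
default rel
extern VirtualAlloc

section .text
global alloc_pages

; in:  rcx = number of bytes to allocate (hypothetical interface)
; out: rax = base address of the allocation, or 0 on failure
alloc_pages:
    sub  rsp, 40                ; 32 bytes shadow space + 8 so rsp is 16-byte aligned
    mov  rdx, rcx               ; dwSize
    xor  ecx, ecx               ; lpAddress = NULL, let the system pick
    mov  r8d, 0x3000            ; flAllocationType = MEM_COMMIT | MEM_RESERVE
    mov  r9d, 0x04              ; flProtect = PAGE_READWRITE
    call VirtualAlloc
    add  rsp, 40                ; release shadow space and padding
    ret
```

this one asks for PAGE_READWRITE (0x04). if the memory also has to be executable, 0x40 (PAGE_EXECUTE_READWRITE) is the value to pass instead.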
yes
yea (as far as I can tell)
gets through it and to the next method
thats good
```
push rcx
lea_offset rcx, [rel alloc_data.addr]
mov qword[rcx], rax ;save start addr to new slot
pop rcx
```
for some reason, trying to use push and pop for rcx results in rcx being 0
if _init_screen works too then _alloc definitely works too
rcx is 0 after the pop or after the push?

after the pop
it was in fact not 0 at first
rcx is a caller saved register so that might have something to do with it??
but pushing isnt a call so
uh
weird stuff
maybe avoid using rcx for your lea if thats happening
smth like r8 which isnt used so often
chatgpt thinks the reason is because I modify rcx after pushing
wouldn't that kinda defeat the entire purpose of push and pop if so? 
yeah
chatgpt is wrong there
what does the lea_offset macro look like?
is it just an alias for lea?
%macro lea_offset 2
push r9
mov r9, %1
lea %1, %2
add %1, r9
pop r9
%endmacro
leas %2 into %1, then offsets by the original value of %1 (saved in r9)
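a possible variant, as a sketch only: it loads the label into the scratch register and adds it onto %1, so the old value of %1 never has to be copied out. `add_label`, `alloc_table` and `store_slot` are made-up names, and like the macro above it assumes %1 is never r9:

```nasm
; sketch only: should assemble with nasm -f elf64 or nasm -f win64
default rel

; %1 = register holding an offset, %2 = memory operand naming a label
; result: %1 = old %1 + address of the label (%1 must not be r9)
%macro add_label 2
    push r9
    lea  r9, %2                 ; r9 = address of the label
    lea  %1, [%1+r9]            ; add without touching the flags
    pop  r9
%endmacro

section .bss
alloc_table: resq 64            ; hypothetical table of slots

section .text
global store_slot

; in: rcx = byte offset into alloc_table, rax = value to store
store_slot:
    add_label rcx, [alloc_table]
    mov  [rcx], rax             ; save value to the chosen slot
    ret
```

using `lea` for the addition means the flags are left alone, which the `add` version does not guarantee.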
will test this on linux
yeah no its saved
much confusion
maybe a windows thing?
why is lea rdi, rdi not valid?
ohh
this also does nothing
same as mov rdi, rdi
which is in essence the same as nop
try this
5
and yet it doesnt work on this situation
```
lea %1, %2
add %1, r9
```
```
mov %1, %2
add %1, r9
```
lea doesn't cause a segfault but mov does in that lea_offset macro I defined
wait...
rcx is 0, even though rax isn't, and I assign rax's value to rcx
er
no I do the inverse but why is rax not 0
oh
rax shouldnt be zero it should be the addr of the data allocated?
also for this isnt it easier to just do add %1, %2? or does that not work
does not work
oh so its looping correctly until the last?
nope
huh
its prob bc [rdi+8] then
if windows doesnt like those addresses
if its not then check rdis value
iteration 157 is when it errors
rdi is 6295541
sounds alright
if rdi is behaving itself for 157 iterations there shouldnt be a reason for it to break unless either
_alloc didnt allocate enough data or:
rdi starts off as 6291456
actually those numbers came from two separate runs
ah that figures
bc the difference there is 4085
difference should be a multiple of UNIT_SIZE, so 26
6291456
6295541
guess it's consistent
HEADER_LEN actually accounts for the odd diff
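the numbers above can be checked at assemble time, as a quick sketch. the symbol names here are made up, only the values come from the chat. 4085 splits into 157 whole units of 26 plus 3 bytes left over, which lines up with the failure on iteration 157 and would be the part HEADER_LEN accounts for (assuming the header is those 3 bytes):

```nasm
; sketch only: nasm evaluates this in the preprocessor, no code is emitted
%assign UNIT_SIZE   26
%assign START_RDI   6291456                     ; rdi at the start of the loop
%assign FAULT_RDI   6295541                     ; rdi when it errors
%assign DIFF        FAULT_RDI - START_RDI       ; 4085
%assign FULL_UNITS  DIFF / UNIT_SIZE            ; 157
%assign LEFTOVER    DIFF % UNIT_SIZE            ; 3

%if FULL_UNITS != 157 || LEFTOVER != 3
  %error "rdi values do not line up with UNIT_SIZE"
%endif
```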
yeah




