#L3D2 - x86-64 assembly toy software renderer

sand copper
#

you got right there

#

what's the next step after the supah release? what's in your mind?

eternal snow
#

shadows

#

wanting to do that

sand copper
#

that'll be amazing, too bad I can't help much cuz just started w/ all of this (serious programming) some months ago, so got plenty to learn yet!

eternal snow
#

ah gl!

sand copper
#

thanks! plan to learn asm too w/ your project 🙂 or at least know how to read it!

eternal snow
#

for learning asm would recommend this

#

its written for 32 bit assembly but you can translate to 64 bit relatively easily

sand copper
#

thanks! will go thru' it; so much stuff to learn in this area of computing, can't catch a breath!

sand copper
#

congrats, new features looking p f good

#

just finished the video, the tetrapod felt very good at 60fps too!

eternal snow
#

good to hear!

sand copper
#

morning!

in git, you have this cmd

nasm -f elf64 -o L3d.out && ld -T linker.ld -o L3d L3d.out

but in my machine it failed, however this one worked:

nasm -f elf64 L3d.asm -o L3d.out && ld -T linker.ld -o L3d L3d.out

just added L3d.asm before -o.

then, the cmd ld --verbose > linker.ld generated some lines at the start and end of the linker script (img) that need to be removed for linking to work correctly.

after that, everything worked and could delight myself watching the transparent fish and tetrapod at 60fps

sand copper
#

also, didn't need to add the "PHDRS" header or w/e is called, what does it do? (only played the test scene so far, tho)

eternal snow
#

ohhh yeah the command is wrong

#

forgot

#

the custom linker script is for marking the .text segment as writable

#

will update

#

it makes the editor work

#

if you try load a uv mapping in the editor without that it will segfault

#

done

sand copper
#

thank you very much; yes, tried the editor afterwards this morning and it sfaulted, so I tried adding the ":text", ":rodata" etc cos ld was complaining that phdrs was not being referenced. That time it compiled and linked correctly only with a warning saying blablah (permissions stuff) but then it sfaulted again at startup, however didn't try adding only just ":text" in the ld script, might try that tomorrow, can't rn cos eyes are giving up on me, gn gn

eternal snow
#

:rodata shouldnt be necessary

sand copper
#

working correctly now, tyvm

eternal snow
#

small break whilst trying to understand shadow mapping

sand copper
#

deserved

#

finished the first 5 lessons of asm tutor, it was very funny to see chapter one finishing with a segfault, lovely

eternal snow
#

getting there slowly

#

the entire process is currently understood, just working on understanding a quick way of inverting the view matrix

#

for converting coords in screen space to world space and then to light space
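A minimal sketch of one quick way to invert a view matrix, assuming it is a pure rotation plus translation (no scale or shear). The `Mat4` type, its row-major layout and the name `invert_rigid` are hypothetical and not taken from the engine:

```c
/* Hypothetical layout: row-major 4x4, column-vector convention (p' = M * p),
   rotation in the upper-left 3x3, translation in column 3. */
typedef struct { float m[4][4]; } Mat4;

/* Invert a rigid view matrix without a general 4x4 inverse:
   the inverse rotation is the transpose, and the inverse translation
   is -(R^T * t). Only valid when there is no scale or shear. */
Mat4 invert_rigid(const Mat4 *v)
{
    Mat4 inv = {{{0}}};
    for (int r = 0; r < 3; r++) {
        for (int c = 0; c < 3; c++)
            inv.m[r][c] = v->m[c][r];              /* transpose the rotation */
    }
    for (int r = 0; r < 3; r++) {
        inv.m[r][3] = -(inv.m[r][0] * v->m[0][3] +
                        inv.m[r][1] * v->m[1][3] +
                        inv.m[r][2] * v->m[2][3]); /* -(R^T * t) */
    }
    inv.m[3][3] = 1.0f;
    return inv;
}
```

This avoids cofactors and divisions entirely, which is why it tends to be the cheap option for camera matrices.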

eternal snow
#

nearly there

eternal snow
#

the last step to figuring this out is the reconstruction of the w component when translating from ndc to clip space

#

after that can start implementing

eternal snow
#

actually wrote code for first time in a few days

#

fixed interpolation of uvd to use xmm so its faster

#

also can now interpolate zclip alongside zndc to make calculating wclip when doing the ndc to clip space calculation trivial
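A sketch of why having both values makes w trivial, assuming the usual perspective divide (z_ndc = z_clip / w_clip). The `Vec4` type and `ndc_to_clip` name are made up for illustration, and the interpolation itself is assumed to happen elsewhere:

```c
typedef struct { float x, y, z, w; } Vec4;

/* Undo the perspective divide. Since z_ndc = z_clip / w_clip, having both
   interpolated values gives w_clip = z_clip / z_ndc directly.
   Assumes z_ndc != 0; a real renderer would need to guard that case. */
Vec4 ndc_to_clip(float x_ndc, float y_ndc, float z_ndc, float z_clip)
{
    float w_clip = z_clip / z_ndc;
    Vec4 clip = { x_ndc * w_clip, y_ndc * w_clip, z_clip, w_clip };
    return clip;
}
```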

#

hopefully wont make any performance difference because of the simd rewrite 😛

#

either way will try start on doing shadow maps tonight

#

there is an exam today in the afternoon, and thats the last one for 10 days

#

so maybe can implement shadows and also have time to revise biol

#

also last day in college today and its study leave after this

#

only 4 exams left

#

lots of time to write this

#

the simd on the left is the redone interpolation

#

on the right is the macro that was called 4 times previously to achieve the same result

#

not bothered to sync up the code on laptop with code on home pc bc this is just a quick change

sand copper
#

also good luck with your exams 🌠

eternal snow
#

thats unavoidable though

#

and its only like .0001 margin

#

also thx for gl

sand copper
#

btw, was reading the man page of nasm and there's this flag -O that it's for optimising branch offsets, did you ever use it? have no idea what it does tho, just found it interesting that such option exists

eternal snow
#

its for shrinking the offsets in jump instructions to the shortest encoding that fits (short vs near)

#

it does other stuff too

#

but not using it no

#

optimisation is completely off to make it assemble faster

sand copper
#

gotcha thanks

eternal snow
#

little and often seems to be the way

#

cant do much at one time rn fsr

#

but whatever

#

just working on getting a shadow map generated

sand copper
#

same here hehe, don't stress it, you have plenty of time wicked what you're doing is not easy!

eternal snow
#

gonna try finally finish off shadows

eternal snow
#

got...... somewhere

#

generated a shadow map but got demotivated again

sand copper
#

demotivated here as well, got eye surgery and can barely see anything, can't code sob 😭 but you got this, let your subconscious work on it while you take breaks! Pretty sure those shadows will look awesome

eternal snow
#

hopefully

graceful sage
#

froge all frogs need rest
dw brain usually gets unstuck after rest

eternal snow
#

mm

eternal snow
#

shadow map

#

visualised with pygame reading the shadow map which is written to a file as a test

#

shadow map for this

#

light position is a little higher, obv will be changed but just to figure some stuff out doing this

#

this is how it looks when you cull backfaces rather than frontfaces with the default scene

sand copper
#

wish I could see it sob

eternal snow
#

see what?

sand copper
#

last screenshots

eternal snow
#

oh are they not showing?

eternal snow
#

correctly generated clip space coordinates for drawn points on the screen

#

next step is the clip space -> view space -> world space transformation

sand copper
# eternal snow oh are they not showing?

they are, but can't see them because I see everything blurred after surgery hehe. Keep updating, though! I'll check them all once I recover my vision, and pretty sure Wizard is enjoying the updates as much as I do frogeheart

eternal snow
#

ahh right

#

get well soon then

#

didnt consider that oops

sand copper
#

Dw dw, thank you

#

I wish more people would come to see your work!

eternal snow
#

ah maybe one day

#

glad u enjoy it so much

eternal snow
#

precision issues, but projection matrix inverse is correctly constructed

eternal snow
#

and camera matrix too

#

clip space -> view space -> world space can now happen easily, but it will also happen tomorrow not today bc its late

eternal snow
#

seems to be working okay

#

for world space translation

eternal snow
#

oop matrix multiplication is being a bit too cpu intensive

#

time to make it faster

sand copper
#

any algorithm/tricks in mind?

eternal snow
#

yeah, solved it alr

#

specifically for multiplying an nx4 matrix (n rows of 4) by a 4x4 matrix

#

performance with new alg

#

significantly better

#

well its not visible but

#

it is when doing shit tons of multiplications

#

~150fps as opposed to the 20 fps from before

sand copper
#

I'll look it up when I can, sounds interesting! Pretty sure I'll learn one thing or two

eternal snow
#

look up how matrix multiplication works?

sand copper
eternal snow
#

yea

#

simd goes crazy

sand copper
#

Your code

eternal snow
#

alr

#

can explain if u want its not too complex

sand copper
#

I also used (tried) simd for matrix mul but the syntax was super toxic

#

It worked though

eternal snow
#

this algorithm works by creating 4 vectors that represent the values in different columns of the matrix

#

e.g.

#

does this for each of the four columns and puts that into a buffer

#

so 4 times

#

thats what this does

#

then for each row in the matrix

#

it multiplies the row by each of these columns, gets the sum and stores it in the correct place

#

it keeps the row value in xmm1 so its not modified

#

so instead the column values are changed out each time

#

this is more efficient than the other way around because the column vectors are aligned, so its quicker to move them in

#

probably not a noticeable increase but whatever
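The scheme described above, sketched in portable C rather than the engine's actual SIMD assembly. The function name and the flat row-major array layout are assumptions; each inner dot product stands in for one packed multiply followed by a horizontal sum:

```c
/* Multiply n row vectors (src, n x 4) by a row-major 4x4 matrix m,
   writing the n x 4 result to dst. */
void mul_rows_by_mat4(const float *src, int n, const float *m, float *dst)
{
    float cols[4][4];                       /* cols[c] = column c of m */
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            cols[c][r] = m[4 * r + c];      /* gathered once, up front */

    for (int i = 0; i < n; i++) {
        const float *row = src + 4 * i;     /* stays fixed, like the row kept in xmm1 */
        for (int c = 0; c < 4; c++) {
            /* the column operand is the one swapped out each time */
            dst[4 * i + c] = row[0] * cols[c][0] + row[1] * cols[c][1] +
                             row[2] * cols[c][2] + row[3] * cols[c][3];
        }
    }
}
```

Gathering the columns once means the per-row work only ever reads contiguous groups of four floats.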

sand copper
#

Beautifully explained, thank you

#

so right now you have all model, view and projection matrices ready, right? Like you can go from one coord system to another

sand copper
#

have a bunch of other questions but I'm afraid they're all not too specific (vague) so will save them for later since you have stuff to do! great stuff so far, loving it

#

also exams coming up next week iirc? good luck with those! very soon for long break and uni frogapprove froge yeehaw

eternal snow
#

just need to compare the shadowmap vals to the actual depth vals

eternal snow
#

alr did one

#

*two

sand copper
eternal snow
#

great it doesnt work too well

#

its sort of creating the imprint of the shadow map rather than an actual shadow

sand copper
#

Also, I saw in LearnOpenGl (a popular ogl tutorial) that you attach a transformation matrix to each object (model matrix) and then at render time you'd multiply this matrix with the view and projection matrices and the position of that specific vertex that you're processing in the vertex shader, to determine the final location on the screen. How does this process look in your engine? No need to go in detail! Was wondering if you will store a transformation matrix per object? Or maybe just store the positions (for example) per object and create a model matrix at render time per object? Was very confused about how to structure this when reading about it

sand copper
eternal snow
#

thats how its done in this engine

sand copper
eternal snow
#

yeah

#

its also wrong

sand copper
#

bug eating time

#

imma do an experiment and try playing yume nikki (tried it the other day) without seeing crap, let's see how my mind likes it

eternal snow
#

thought of a very possible cause to the problem

#

was using the wrong depth val

#

thought abt it in the shower and it makes sense why its fucked

eternal snow
#

still very much dependant on camera position for some reason???

#

even more so than before

#

o shit wait

#

didnt update inv camera matrices

#

nearly?

#

few problems such as shadow acne but

#

getting there

#

added a bias and that fixes the acne alr
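A minimal sketch of the depth comparison with a bias, assuming larger depth values mean further from the light. The function and parameter names are illustrative, not the engine's:

```c
#include <stdbool.h>

/* A point is in shadow when it lies further from the light than the nearest
   surface recorded in the shadow map. The bias absorbs depth quantisation,
   which otherwise makes lit surfaces shadow themselves ("shadow acne"). */
bool in_shadow(float depth_from_light, float shadow_map_depth, float bias)
{
    return depth_from_light - bias > shadow_map_depth;
}
```

Too large a bias has its own cost: shadows start to detach from the objects casting them, so the value usually needs tuning per scene.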

eternal snow
#

and made the shadows transparent

#

now to make rotating the camera not fuck the shadows

#

rotation matrix being a bastard again

#

hehehe

#

hard part done!!!!!!!!!

#

demotivation central over

sand copper
#

🔥🔥🔥

eternal snow
#

nvm

#

its whatever

#

needs to make the shadow map larger, that is giving some trouble

#

eh

#

and fixing some shadow acne

#

hopefully should be done soon

#

but again busy times

#

exams and work thrown in too now

sand copper
eternal snow
#

job

#

11 hour work day at a farm shop

sand copper
#

sounds rough, hope you're doing good

#

guessing it's temporary until you get to uni?

eternal snow
#

yeah

#

its not too bad actually

#

just heavy lifting and customer service mainly

#

not sure what the wage is, but by now its at least £200 so thats cool

#

get to experience weird software bugs too

#

such as the item scanner charging £506000 for water

sand copper
#

they need some tests over there

eternal snow
#

perhaps

#

or they keep it bc its funny

sand copper
#

esp for the customer

eternal snow
#

mhm

#

they can see the cost of everything checked out so they laughed when it did that

sand copper
#

happy little accidents

sand copper
#

oh interesting, the video is not uploaded, thought it was going to (fixed)

graceful sage
#

lol is this amnesia?

sand copper
#

yes it is

graceful sage
#

horror game AI be goofy sometimes 😭

eternal snow
#

finally had some time to code

#

increased shadowmap res

#

working on reducing some bugs

#

such as the weird strip along the bottom

#

this line

#

unsure why it appears

#

ok fixed that

#

now to fix the weird bug where moving the camera simply doesnt work

#

well it does but the inverse view gets messed up

sand copper
#

not to be picky but, right behind the cube, what happened with the shadow? looks like part of it is mixed with the dark green square, but not really

eternal snow
sand copper
#

yes

eternal snow
#

limited colour depth

#

its not perfectly blended bc there simply arent enough colours in ansi to do this properly

#

this is the closest it can get

sand copper
#

oh, didn't know about that

#

that will produce interesting effects

eternal snow
#

these are all the available colours

sand copper
#

so, does that mean that sometimes you need to be extra careful not to confuse colour limitation with an actual bug? or do you recognise them easily

eternal snow
#

its fairly easy to recognise colour bugs

sand copper
#

gotchu

sand copper
#

just made a worthless asm program that sends a desktop notification froge_yeehaw

eternal snow
#

that's p good

#

how d you interface that?

sand copper
#

oh, sorry, to be clear, my program is worthless bc it only loads libnotify's fx ptrs using dlopen/dlsym, and then it simply calls these fxs. If you were to do it from scratch, you'd need to interact w/ dbus, which apparently is pain and death since it's not very well documented. Besides learning a wee of asm, as I was reading about dlopen/dlsym, was wondering if it's possible to hot reload some parts of our engines, assuming we divide it into modules... having different .so files for various modules of the engine, then at runtime, we'd use dlopen to load the .so files and dlsym to load the fxs we need... wonder if it's even worth it... lol frogegreenexcited

eternal snow
#

pjj rogjt

#

*oh right

#

interesting anyway

sand copper
#

ya, every time I mess with asm I learn something new, one way or another

#

fascinating

eternal snow
#

shadows almost perfected

#

fixed the rotation messing things up

eternal snow
#

goes hard

graceful sage
#

froge_love gonna screenshot

sand copper
#

congrats, vig, great job froge_love

eternal snow
#

thx

#

music added on request of a friend

sand copper
#

gr8 frogapprove

eternal snow
#

exams done, should have more time now

eternal snow
#

man nvm then lol

#

never got around to anything

#

will just continue on this whenever motivation strikes again then :///

graceful sage
#

froge it's ok to rest

eternal snow
#

picking stuff back up :)

#

working on quakes texture mapping technique

#

rather than doing persp correct mapping for each pixel

#

trying to get double resolution working before the gp direct video

eternal snow
#

got it working to only process the last pixel of a row and the first pixel

#

next step is to lerp between also

#

oh, and draw persp correct every 8 or so pixels

#

got the subdivision counter set as a macro so it can be adjusted if needed

#

its going roughly around 300fps rn, and thats without the shadowmap lerping being divided too

#

2 subdivisions

#

5 subdivisions

#

now for the lerping part

eternal snow
#

1 pixel lerp

#

looping it is an issue bc of the labels being messed up

eternal snow
#

screen space subdivision working now kinda

#

just not doing the end

#

250fps, an increase of around 100fps; its not lerping the shadowmap either so thats p good

#

might get another 100fps increase from the shadowmap

#

lerps every 8 pixels
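A sketch of the screen space subdivision idea for a single texture coordinate, written in C under stated assumptions: `SUBDIV`, `exact_u` and `span_u` are hypothetical names, the span endpoints carry u/w and 1/w, and `len` is at least 1. The true perspective divide only happens at every 8th pixel and the pixels in between are linearly interpolated:

```c
#define SUBDIV 8   /* hypothetical name for the adjustable step size */

/* Perspective-correct u at pixel i of a span of len pixels.
   u/w and 1/w are linear in screen space, so they can be lerped;
   the divide recovers the true u. */
static float exact_u(float uw0, float iw0, float uw1, float iw1, int i, int len)
{
    float t = (float)i / (float)len;
    return (uw0 + (uw1 - uw0) * t) / (iw0 + (iw1 - iw0) * t);
}

/* Fill u_out[0..len-1]: exact values every SUBDIV pixels, lerp in between. */
void span_u(float uw0, float iw0, float uw1, float iw1, int len, float *u_out)
{
    for (int s = 0; s < len; s += SUBDIV) {
        int e = (s + SUBDIV < len) ? s + SUBDIV : len;   /* clamp the last chunk */
        float us = exact_u(uw0, iw0, uw1, iw1, s, len);
        float ue = exact_u(uw0, iw0, uw1, iw1, e, len);
        for (int i = s; i < e; i++)
            u_out[i] = us + (ue - us) * (float)(i - s) / (float)(e - s);
    }
}
```

A smaller step trades speed for accuracy, since the lerped pixels drift further from the exact value as the chunk grows.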

graceful sage
#

him updates

eternal snow
#

mhm

#

finally

graceful sage
#

🫡 yes

eternal snow
#

finished

#

for texture mapping that is

#

soon to make it work for the shadowmap also

eternal snow
#

working on shadow map lerp

#

using python to visualise the shadowmap for testing

#

you can kinda see it in the engine but its not ideal

#

just testing stuff rn

eternal snow
#

getting there

#

having a smaller lerp size might be better

#

yeah that looks a bit better

#

everything is written, but does it run right

#

no not really

eternal snow
#

theres the problem

#

guess where it lerps lol

#

finished

#

p good

eternal snow
#

works in editor too

#

now to work on the gp direct video

graceful sage
#

bigfrog can't wait

eternal snow
#

looking alr

#

will add some stained glass and stuff to make it a little more interesting and showcase some more stuff

eternal snow
#

working on double res

#

maybe full return to this? who knows

#

just working on changing the system of addressing points now

#

bc ofc with double res its a bit wacky

eternal snow
#

and then got annoyed with how the code is a big ball of mud

#

will rewrite lots of this

eternal snow
#

rewrote a decent portion

#

getting there from scratch

#

got some new stuff too

#

like quaternions

#

not sure too well if they are working properly yet or if it's just a bad perspective projection matrix

#

double resolution also supported now

#

having some fun making the code look non shit

#

clean code is nice

graceful sage
#

bigfrog back at it in full speed

eternal snow
#

maybe

graceful sage
#

KingPray any speed is good

eternal snow
#

awesome

graceful sage
#

I really should mess around with assembly at some point shrimple

graceful sage
#

him that's pretty neat code

eternal snow
#

yeah

#

its the reason for the rewrite

#

a lot of the old code is from when didnt really know much asm

#

as a result its really really bad

#

hacky even

graceful sage
#

frogapprove then rewrite was much needed

eternal snow
#

yeah

#

it was really bad

#

this was before knowledge of macros

#

so hence magic numbers lying everywhere

graceful sage
#

froge_sad big pain to debug before

eternal snow
#

yeah

graceful sage
#

refactoring done or still a few parts left?

eternal snow
#

the entire engine left

graceful sage
#

KingPray will be done in due time

#

might go quicker from learning new things while rewriting

eternal snow
#

got lines and backface culling working

#

looks much nicer now

#

things are going nicely

#

unfortunately spent fucking ages fixing backface culling and only realising at the end that two lines were the wrong way around

#

resulted in incorrect vectors

#

was focusing on outputs of cross and dot product for debugging

#

also discovered the amazing dpps instruction whilst debugging so it wasnt all in vain

#

also significantly cleaner than the old code

#

and a bug in which the camera position input had to be absolute is no longer present fsr

#

not sure why that was a bug initially but whatever

#

also the winding order check at the end is instead done by doing a bit test on the msb of the result of dot product rather than.... loading it onto the fpu, loading 0 and then doing an instant fpu comparison

#

which was stupid but ofc also written in very early stages
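The same sign check expressed in C, as a rough analogue of bit-testing the top bit of the dot product. The function name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the IEEE-754 sign bit of f is set. This replaces a floating
   point comparison against 0 with a single integer bit test.
   Note that -0.0f also counts as negative here. */
bool sign_bit_set(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the raw bits */
    return (bits >> 31) != 0;
}
```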

eternal snow
#

new code vs

#

old code (shit)

graceful sage
#

got much done

eternal snow
#

mhm!

graceful sage
#

🫡

#

do you have a recommendation for where to get started with assembly? froge

eternal snow
#

uh

#

this

#

however its written in 32 bit asm so there has to be a little bit of translation to 64 bit

#

this ones better

graceful sage
#

will check it out

eternal snow
#

barycentric coordinate calculator working again

#

nice

graceful sage
eternal snow
#

worked a little on trying to get textures working but not to much avail

#

could probably fix today but just cant be bothered

#

its half an hour to midnight and has been working on this thing all day

graceful sage
#

forgeeep was rest time

eternal snow
#

closer

#

fixed a bug with texture loading

eternal snow
#

tada

#

its better than it was in the original too this time

#

doesnt go all fucky when you go too close

#

can go as close as you want and its still fine

#

oop nvm it segfaults if you enter it at a certain angle

#

whatever can fix that later once screen space subdivision is done

#

nvm figured it out lol

#

didnt pop w back off the stack in situations where the object is clipped to near plane

fierce iris
#

average assembly woes

twin quest
#

it 404's

#

maybe it's a private repo

eternal snow
#

yeah

#

never made the repo

#

will release it in a bit

#

just put a notice there so ppl dont think its abandoned

eternal snow
#

when gdb do this u kinda know u forgot to pop or forgot to push somewhere

fierce iris
#

i would probably unironically use asm if it were portable 💀

eternal snow
#

yeah thats the only problem

fierce iris
#

i use high-level asm (C)

eternal snow
#

yeah

#

asm is so fun

#

would be much nicer if it were portable tho yes

#

arent u trying to target p much everything with your engine

fierce iris
#

yeah

eternal snow
#

figures as to use c then

fierce iris
#

yeah also it's the lang i'm best at

twin quest
#

what is the operating system in the screenshots?

eternal snow
#

linux

#

with funny xp skin on it

#

will never port this to windows its too much effort

twin quest
#

I thought NASM was portable

eternal snow
#

not sure the windows terminal could support it either

#

uh

#

well

twin quest
#

RIP

eternal snow
#

the assembly is portable but

#

the system calls are the problem

#

syscall calls the linux kernel to execute certain things, like printing text

#

getting input, sleep, etc

#

windows doesnt do it like that

twin quest
#

ok so it works with any ISA as long as it's running on a supported linux kernel version

eternal snow
#

no idea how windows does it

#

it should work on any x86-64 linux distro

twin quest
#

nice

eternal snow
#

regardless of kernel ver

#

not sure if it would work on bsd

#

no idea how bsd works

twin quest
#

this is a cool project

#

what is the window manager that you themed to look like xp?

eternal snow
#

cinnamon skin

eternal snow
#

happy that ppl find this cool

twin quest
#

this is a software renderer that renders in real time?

eternal snow
#

yeah

twin quest
#

oh I see you report FPS in some of the screenshots

#

neat

eternal snow
#

some older ones yea

#

it should be significantly better now tho

#

already the rewrite is much more optimised

twin quest
#

what kind of debugger do you use with your project?

#

oh gdb

#

neat

eternal snow
#

yeah

#

gdb is great

twin quest
#

I always go to disassembly as a last resort, I guess that's all you look at though

eternal snow
#

mhm

#

been writing assembly for a little over a year now

#

only knew python and tiny bits of cpp before that and decided it would be cool to learn something new

twin quest
#

really awesome

eternal snow
#

thank youuu

coral lark
#

Image died froge_sad

eternal snow
#

ohh yeah a few old ones are

#

they were links to images in a server that has since been deleted

eternal snow
#

probably fastest 4x4 matrix multiplier possible

#

that doesnt use vgatherdps at least

#

would use it but it didnt exist in avx1

#

and isnt supported by processor :p

coral lark
#

what about SIMD?
or is that vgatherdps?

twin quest
#

movss is an SSE instruction (scalar, but it uses the SIMD registers)

eternal snow
#

the only non simd stuff is the changing of rbx to detect the end of the source matrix

twin quest
#

yes xmm

eternal snow
#

vgatherdps is simd also yes

twin quest
#

those are SIMD registers yeah?

eternal snow
#

yes

twin quest
#

neat

eternal snow
#

xmm0 is 128 bits and here its holding 4 single precision floats

twin quest
#

so all your data has to be aligned, do you ever have to pad?

eternal snow
#

well

#

data doesnt have to be aligned

#

its just its a good idea to

#

all data defined in the .data segment is aligned to 16 bytes but the allocated data such as stuff from model loading isnt

#

you can move unaligned data into a register with movups for single precision but its a little slower than movaps, which is for aligned

twin quest
#

ok sorry if this sounds dumb but I imagine you have to work directly with memory addresses, is all the cache and memory addressing handled for you?

eternal snow
#

this could be that tiny bit faster by ensuring allocated data for the models vertices is 16 byte aligned
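For comparison, this is roughly how a 16-byte aligned allocation looks from C. It is only a sketch: the engine allocates through its own syscalls, and `alloc_vertices` with four floats per vertex is an assumed layout:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `count` vertices of four floats each, 16-byte aligned so
   that aligned loads (movaps) would be legal on the result. aligned_alloc is
   C11; count * 16 bytes is always a multiple of the alignment, as it requires.
   The caller releases the memory with free(). Returns NULL on failure. */
float *alloc_vertices(size_t count)
{
    return aligned_alloc(16, count * 4 * sizeof(float));
}
```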

#

cache is handled yes

#

and memory addressing

#

to an extent

twin quest
#

it's all virtual memory addresses?

eternal snow
#

prob similar to how it is in C

twin quest
#

oh ok

#

so the OS handles it

eternal snow
#

yeah

twin quest
#

how do you keep your understanding intact with respect to your code. I write high level code that is inherently readable and sometimes after a while I go back to code and forget how it works

#

how do you deal with that

eternal snow
#

pile of comments

#

describing exactly what everything does p much

#

unless its blatantly obvious

twin quest
#

is there a time when you will achieve your goals with this project and go to a higher level language?

eternal snow
#

but will have to read through

#

no

#

asm is fun

#

maybe if it gets boring will prob do smth else but for now its great fun

twin quest
#

I'm glad you have found something you enjoy, I can see how it can be a lot of fun

eternal snow
#

hehe thx

fierce iris
#

- "asm is fun"

eternal snow
#

it is tho

coral lark
#

imo it is until you have to deal with C ABI calls KEKW

#

which on windows is about 90% of the time bleaker_kekw

#

actually probably more like 25% but like
that's far more than it is on linux bleaker_kekw

eternal snow
#

no idea abt windows

#

not using any external libraries at all on this project so ofc its 0% c apis

#

unless you count syscalls which dont really

coral lark
#

On windows, syscalls aren’t a stable, documented interface
Instead you call WinAPI, which is meant for usage by C code

#
```
    stack increase, 28h             ; reserve 32 bytes of shadow space + 8 to keep rsp 16-byte aligned
    mov rcx, %1                     ; exit code (first argument) goes in rcx
    call ExitProcess                ; end program
```
```
        stack increase, 28h         ; reserve 32 bytes of shadow space + 8 to keep rsp 16-byte aligned
        mov qword rax, [rel sOut]   ; load stdout handle

        ; print to console
        mov r9, 0                   ; no pointer to store the number of characters written
        mov rdx, %1                 ; load string
        mov r8, %2                  ; load str length
        mov rcx, rax                ; move stdout handle to rcx
        mov qword [rsp + 20h], 0    ; fifth argument (lpReserved) is passed on the stack and must be NULL
        call WriteConsoleA
        stack decrease, 28h         ; correct stack ptr (program segfaults on even numbers of prints elsewise)
```
#
```
; increase increases the stack size
; decrease decreases the stack size
%macro stack 2 ; operation, amount
    %ifidn %1, increase
        sub rsp, %2
    %elifidn %1, decrease
        add rsp, %2
    %else
        %error "Invalid operation. Expected 'increase' or 'decrease'."
    %endif
%endmacro
```
(stack macro)
eternal snow
#

oh this is horrid

coral lark
#

agreed

eternal snow
#

disgusting calling convention

#

tf

coral lark
#

the stack manipulation stuff is (according to GPT-4o, which... GPT-4o does not know much about NASM on windows so take this with a grain of salt) because of how Windows' C ABI works

eternal snow
#

thats wacky

#

tf did windows do to make that happen

coral lark
#

bleakekw no idea

#

I have a decent amount of macros to try to make NASM look more like higher level languages
and I'm definitely not done writing those KEKW

eternal snow
#

presumably its allocating stack space for the call

#

but its still a stupid convention

eternal snow
#

even got errors for when u dont use ur macro right

coral lark
#
```
    compare rax, [rel v0], [rel v1]
    if l, do_false
        PRINT hello, helloLen
    do_false:
```
this is an if statement for me
eternal snow
#

kinda cursed

coral lark
#

I feel like I should actually update that macro so I can say less or maybe even < instead of l

eternal snow
#

make sure to include the distinction between less-than and below

#

always confusing that one

coral lark
#

below..?

eternal snow
#

always forgetting if below/above or less-than/greater-than is signed

#

signed comparisons

coral lark
#

ohh
I have not dealt with unsigned

eternal snow
#

ah right no problem then

#

it can be useful sometimes to exploit it tho

#

for instance in a specific segment a number has to be between 0 and some other val

coral lark
#

if l, do_false translates to jnl do_false

eternal snow
#

you can just use one compare if you use an unsigned comparison bc that would mean the twos complement negative is interpreted as higher than the higher bound
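The single-compare range check, sketched in C. The name `in_range` is illustrative, and it assumes the upper bound is not negative:

```c
#include <stdbool.h>
#include <stdint.h>

/* 0 <= x < limit with one comparison: a negative x reinterpreted as
   unsigned becomes a huge value, so it fails the same "below" test that
   rejects values past the upper bound. Assumes limit >= 0. */
bool in_range(int32_t x, int32_t limit)
{
    return (uint32_t)x < (uint32_t)limit;
}
```

In assembly terms this is one cmp followed by jb or jae, instead of a signed check against 0 plus a second check against the bound.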

coral lark
#

does below have a corresponding jump instruction?

eternal snow
#

jb

#

theres many jumps

coral lark
eternal snow
#
jz/jnz
je/jne
ja/jna
jb/jnb
jl/jnl
jg/jng
jp/jnp
js/jns
jmp
#

probably some more too

#

cant remember

coral lark
#

le, ge

eternal snow
#

oh yeah

#

jbe jge jle jae

coral lark
#
```
; l = less (signed <)
; le = less_equal (signed <=)
; g = greater (signed >)
; ge = greater_equal (signed >=)
; b = below (unsigned <)
; be = below_equal (unsigned <=)
; a = above (unsigned >)
; ae = above_equal (unsigned >=)
```
jump operator note updated
eternal snow
#

no idea what jp does

#

thats the only one

#

'jump if parity'

coral lark
#

KEKW
I'mma guess
even vs odd

eternal snow
#

never looked into it bc its probably not needed

#

no idea what parity means 😭

#

ur probably right

coral lark
#

JP, JPE
Jump if parity
Jump if parity even

JNP, JPO
Jump if not parity
Jump if parity odd

yeah

#

I ONLY figured that out because of discrete mathematics using parity for even/odd

eternal snow
#

jpo jpe thats a new one

coral lark
eternal snow
#

oh forgot

#

jc too

#

jump if carry

coral lark
#

I think JPO is the same as JNP and JPE is the same as JP based on how this page is formatted
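One detail worth pinning down: on x86 the parity flag is not about the value being even or odd. It is set when the low byte of a result contains an even number of 1 bits. A small C model of that rule, with a made-up function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the x86 parity flag: set when the low byte of a result
   contains an even number of 1 bits. Higher bytes are ignored. */
bool parity_flag(uint32_t result)
{
    uint8_t low = (uint8_t)result;
    int ones = 0;
    while (low) {
        ones += low & 1;
        low >>= 1;
    }
    return (ones % 2) == 0;
}
```

So a result of 3 (an odd number, but with two set bits) sets the flag and jp would jump, while a result of 2 (one set bit) clears it.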

eternal snow
#

used that a few times

fierce iris
eternal snow
#

jrcxz

#

rcx is such a wacky reg

fierce iris
#

the only good decision they made was making a microkernel instead of a monolithic kernel

coral lark
#

whoah there's js and jns?

eternal snow
#

yeah

#

jump if sign (sign flag set)

coral lark
#

I'm assuming that's based on positive or negative?
I fail to see how it'd be signed vs unsigned int

#

yeah sign value

eternal snow
#

uh

#

yes

coral lark
#

KEKW I kinda wish those were in higher level langs tbh

eternal snow
#

you cant differentiate signed vs unsigned int anyway

#

same thing

#

it just tests the msb

coral lark
eternal snow
#

mhm

fierce iris
#

signedness is a construct invented by higher level langs in order to sell more compilers 🧌

eternal snow
#

real

#

never understood why high level langs do that anyway its not that hard to deal with them being the same

coral lark
#

how do I know the compiler is gonna optimize that properly? KEKW

coral lark
eternal snow
#

it probably will optimise that to use test

coral lark
#

test?

eternal snow
#

test eax, 0x80000000

#

bit test

#

it performs a bitwise and of the two operands and sets status flags

#
test  eax, 0x80000000
jnz  .signed

is quicker than

cmp  eax, 0
js  .signed
#

because cmp performs a subtraction of 0 from eax

coral lark
eternal snow
#

not sure what a pull request implies

#

github noob

coral lark
#

KEKW huh

eternal snow
#

just using it to host the files publicly tbh

#

if u want to then go ahead

#

would be happy to see it work on windows lol

coral lark
#

basically; on github
people can "fork" other people's project, which creates a copy of it that links back to the original
they can then modify their fork freely, without affecting the original project
and then from there, they make a pull request, where the owner of the project can merge the changes back into the original project

eternal snow
#

that sounds like it could easily break everything if not done right

#

could try? shouldnt be too hard... all of the syscalls are in 1 file anyway

coral lark
#

I have no idea how to deal with pull requests if the original project is updated after a fork is created
github won't let it merge until the author of the fork figures out how to update their fork to include the latest commits of master, which is not the most straightforward process bleakekw

eternal snow
#

could just keep two versions of the main code - linux ver and windows ver

coral lark
coral lark
eternal snow
#

possibly

#

although with some calls there are situations where some data structs would have to be changed

#

e.g. sys_ioctl

#

that gets terminal size in rows + columns, also changes around some settings

#

not sure if you would be able to do some of that stuff with the windows terminal?

coral lark
#

I'm sorta
new to NASM so I most likely don't know enough to port it yet KEKW
and the course I'm taking is for NASM on linux, so I'm having to figure out windows specific stuff on my own

coral lark
eternal snow
#

ah well ur free to mess around with porting to windows if u like

#

oh cool

coral lark
eternal snow
#

oh the terminal has to disable canonical mode too for input to work properly

coral lark
#

canonical mode?

eternal snow
#

disabling it means that input can be read as soon as you type a character, instead of waiting for a whole line

#

all the keybinds are read from stdin so

eternal snow
#

oh yeah it is

#

ENABLE_ECHO_INPUT and ENABLE_LINE_INPUT both need to be disabled
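
a rough sketch of what that could look like in win64 NASM, assuming the program links against kernel32 and that the routine is entered through a normal call. the names `set_raw_input` and `console_mode` are made up for illustration, and none of this is taken from l3d:

```
extern GetStdHandle, GetConsoleMode, SetConsoleMode

ENABLE_LINE_INPUT  equ 0x0002
ENABLE_ECHO_INPUT  equ 0x0004

section .bss
console_mode  resd 1                ; receives the current console mode

section .text
set_raw_input:
    sub   rsp, 40                   ; 32 bytes shadow space + 8 for alignment / one local
    mov   ecx, -10                  ; STD_INPUT_HANDLE
    call  GetStdHandle
    mov   [rsp+32], rax             ; keep the handle in the spare stack slot

    mov   rcx, rax                  ; hConsoleHandle
    lea   rdx, [rel console_mode]   ; lpMode
    call  GetConsoleMode

    mov   edx, [rel console_mode]
    and   edx, ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT)   ; clear both flags
    mov   rcx, [rsp+32]             ; hConsoleHandle again
    call  SetConsoleMode
    add   rsp, 40
    ret
```

error checking is left out here, a real port would want to test the return values.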

coral lark
#
```
; add
; sub
; mul
; div
; idiv

;bit test
;it performs a bitwise and of the two operands and sets status flags
;test  eax, 0x80000000
;jnz  .signed
;
;is quicker than
;cmp  eax, 0
;js  .signed


; echo input and line input for non-canonical input processing
```
the note collection grows
eternal snow
#

very useful instruction, virtually the same thing but instead of having to have your inst as
mul rbx
you can do
imul r9d, dword[addr]

#

except the second operand of imul can be anything, immediates, registers or memory

#

much more handy than needing rax to be one of your operands and not being able to multiply by immediates
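
a small sketch of the three forms side by side, assuming NASM syntax. the `factor` variable and the `mul_demo` label are made up for illustration:

```
section .data
factor  dd 7

section .text
mul_demo:
    ; one-operand mul: rdx:rax = rax * rbx, rax is implicit and rdx gets overwritten
    mov   rax, 6
    mov   rbx, 7
    mul   rbx                       ; rax = 42, rdx = 0

    ; two-operand imul: dest = dest * source, source is a register or memory
    mov   r9d, 6
    imul  r9d, dword [rel factor]   ; r9d = 6 * 7

    ; three-operand imul: dest = source * immediate
    imul  r10d, r9d, 10             ; r10d = r9d * 10
    ret
```

the two and three operand forms only keep the low half of the product, so the one-operand form is still needed when the full double-width result matters.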

coral lark
#

Also
Do you use the gc unused symbols option of gcc?

#

gc sections, that's the one

#

I kinda setup my command line to minimize file size
KEKW gets a better file size than OZ while using O3 for the stuff I've written so far
```
nasm -f win64 -o test.obj src/test.asm -O3
gcc -m64 -o test.exe test.obj -lkernel32 -nostdlib -O3 -s -fno-ident -Wl,--strip-all -fno-rtti -foptimize-strlen -fstore-merging -ftree-vectorize -fmerge-all-constants -fomit-frame-pointer -flto -Wl,--gc-sections -e main
```

eternal snow
#

uhhh

#

not using gcc

coral lark
#

Oh?

eternal snow
#

just get an object file with nasm then link with ld

#

that's all for now

coral lark
#

Don't think that's a thing on windows typically KEKW

eternal snow
#

yeah

coral lark
#

Or maybe at all bleakekw

eternal snow
#

it wouldn't be

#

ld is the gnu linker

coral lark
#

Why does linux get all the fancy and functional ASM/C/C++ stuff while windows gets nothing good for low level bleaker_kekw

eternal snow
#

hehe no idea

#

it's a funny thing

coral lark
#

I have literally had better experiences with VS Code, a microsoft product, on linux, than I have had with VS Code for Windows, a microsoft product

#

And not only that, that's VS Code on linux in a VM, not even running on actual hardware

fierce iris
#

i thought ld existed on windows

#

if you use mingw

eternal snow
#

oh?

fierce iris
#

i use msys2 (with the mingw64 backend) on windows versions that support it which gives you an entire unix environment (including a package manager using arch's pacman)

coral lark
fierce iris
#

on windows xp, i just use git for windows which comes bundled with bash (and then i manually download mingw and add it to the path)

fierce iris
twin quest
#

if you downloaded git you may have downloaded mingw

#

like the git terminal on windows

#

it's a mingw terminal iirc

#

I use winget and I get git in powershell

eternal snow
#

z buffer yay

#

more efficient than the last attempt again because the memory address for the depth buffer isnt recalculated from cartesian coords every pixel

#

got a spare register this time

graceful sage
#

froge_love getting much improvements

eternal snow
#

yess

graceful sage
coral lark
eternal snow
#

you just pass the name of the object file

#

ld file.out -o file

coral lark
#

KEKW eh I think you might not be able to help me with usage, lol

eternal snow
#

oh man

#

works simple on linux but clearly not so much on windows lol

coral lark
#

asking chatgpt moment, because google isn't coming up with many answers

#

there we go

#

libkernel32 is located in a completely freaking arbitrary location with this distribution of mingw, but ok KEKW

eternal snow
#

awesome

#

the design is very human

coral lark
#

even with just 4 hello worlds and colors and no dead code, it still makes a pretty decent difference KEKW

eternal snow
#

difference to what?

#

equivalent code size in c or something?

#

OH right

#

oops

coral lark
#

this is a nasm program being linked with default ld params, vs the exact same obj file for the exact same nasm program being linked with the param set I have for gcc

eternal snow
#

yeah didnt look at the file sizes lol

#

for the first ss

coral lark
#

-s --gc-sections -e main
adding this to the ld args brings it up to par in size

#

(basically just tells the linker to remove dead symbols/sections I believe)

#

am I safe to assume that I can fork l3d-engine, or would I need to fork l3d2, which is private?

eternal snow
#

oh l3d2 doesnt exist yet hang on

#

will just quickly finish off whats happening here and then upload it

#

just needing to finish commenting some segments

coral lark
#

KEKW very curious as to how many errors I'll get trying to link it, lol

eternal snow
coral lark
coral lark
#

not as many errors as I was expecting...

#

which is alarming, considering not a single one of those is about syscalls bleaker_kekw

coral lark
#

it seems to be unhappy with any simd code that exists in l3d

#

also the extreme lack of rels

eternal snow
#

lol what

coral lark
#

turns out debugging is easier if I compile it to elf64 instead of win64

eternal snow
#

what does rel do

coral lark
#

I have
no idea KEKW

#

is it even valid in a linux nasm program?

eternal snow
#

its probably an assembler directive

coral lark
#
movaps    xmm5, [objbuf+rax]    ;load vertex data for point A

can I like
specify a data type for this?

eternal snow
#

OH right

#

yeah

#

whats next after qword hm

#

try xmmword[objbuf+rax]

#

on linux the assembler will just infer the type here bc it cant be anything other than 128 bits

coral lark
#

xmmword not defined

eternal snow
#

sob

#

ptr[]?

coral lark
#

ptr is not a nasm keyword [-w+ptr]
Ig add that as a linker arg?

eternal snow
#

uh not sure

#

hang on a sec

#

what are u using to compile it?

#

*assemble

coral lark
#

oh wait no that's the nasm command saying not a nasm keyword

#

C:\Users\User\AppData\Local\bin\NASM\nasm -f elf64 -o l3d.obj l3d.asm -O0 -l l3d.lst -g

eternal snow
#

u need to use win64 if you want to have an executable on windows

#

cant use elf64 apparently

#

unless ur just

coral lark
#

yea I'm aware
but the problem is if I do that, I get friccen no good debug info

#

elf64 gives me the same linker errors but with actual usable debug info

eternal snow
#

oh are u just trying to assemble it

coral lark
#

I'm trying to work through the linker errors currently

eternal snow
#

so what was saying that movaps xmm5, [objbuf+rax] wasnt right, the linker?

coral lark
#

linker says this

#

(gcc is calling ld behind the scenes)

#

calling ld directly says the same thing

#

nasm gives no warnings/errors

eternal snow
#

oh yeah its not liking that is it

coral lark
#

yeah, it is in fact not

#
movaps    xmm0, [rel scratchpad]    ;now xmm0 = {X0 Y0 X1 Y1}

this however, is fine
only happens with objbuf

#
mov    rdx, qword[objbuf+rbx]    ;move point B XY into rdx

this line causes it too, so it's not because of the movaps either

eternal snow
#

its bc of the addr

#

did some reading and its bc the addr here doesnt fit inside 32 bits so its truncated
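
rough sketch of why that happens, as far as can be told from the error. a label used together with a register is encoded as a 32-bit absolute displacement, so the linker has to fit the label's whole address into 32 bits. that fits on a typical non-PIE linux link, but it is assumed here that the windows image ends up too high in memory for that. `[rel label]` only stores the distance from the instruction, but it cannot take an index register. the `demo` label and the buffer size are made up for illustration:

```
section .bss
objbuf  resb 4096                   ; stand-in for the real buffer

section .text
demo:
    ; label + register: objbuf's full address must fit in a 32-bit displacement
    ; (this is the form the windows link complains about)
    ;mov   rdx, qword [objbuf+rbx]

    ; rip-relative: fine on both platforms, but no index register allowed
    mov   rdx, qword [rel objbuf]

    ; both combined: fetch the address rip-relative, then index off a register
    lea   r9, [rel objbuf]
    mov   rdx, qword [r9+rbx]
    ret
```

that also matches scratchpad being fine: it is only ever used with constant offsets, which `rel` can handle.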

#

can u send ur linker arg here?

coral lark
#

nasm -f elf64 -o l3d.obj l3d.asm -O0 -l l3d.lst -g

linkers:

gcc -m64 -o l3d-gcc-debug.exe l3d.obj -lkernel32 -nostdlib -Og -g -e _start

ld -o l3d-kd.exe l3d.obj -LC:\MinGW\mingw64\x86_64-w64-mingw32\lib -l:libkernel32.a -s --gc-sections -e _start

#

ok so

#
mov    qword[rel alloc_data.addr+rcx], rax    ;save start addr to new slot

happens here too
I think it might be the pointer math?

eternal snow
#

its possible yes

coral lark
#

yeah seems like it's every time pointer math is being done

eternal snow
#

probably rather

coral lark
#

then again
it works for scratchpad

eternal snow
#

allocated memory has a very large addr so

#

yeah not sure why that is

#

try moving the objbuf definition all the way up to the top of the .data segment

#

in data.asm

#

no idea why this is happening tbh

#

funny that its perfectly okay on linux but breaks the instant you try on windows

coral lark
#

data is the single file that I haven't changed KEKW

#

aside from l3d.asm itself

coral lark
eternal snow
#

oh wait objbuf is next to scratchpad anyway

#

tf

coral lark
#

oh
pointer math with variable offset

#

scratchpad seems to always be used with a constant offset

eternal snow
#

oh yeah it does ur right

#

good catch

coral lark
#
```
    mov r9, objbuf
    add r9, rax
    movaps    xmm5, [r9]    ;load vertex data for point A
```
yea doing this doesn't cause a linker error
#

problem is I have no idea if that'll brick something else KEKW

#

yeah you use r9

eternal snow
#

okay

#

if you go through every time a label is used with a register for an offset

#

and change it to just be a register

#

that should work

coral lark
#
```
    add rax, objbuf
    movaps    xmm5, [rax]    ;load vertex data for point A
    add rbx, objbuf
    movaps    xmm3, [rbx]    ;point B
    add rcx, objbuf
    movaps    xmm4, [rcx]    ;and point C
```
like that?
eternal snow
#

yeah

coral lark
#

ok well it has to be [rel objbuf] but ok

eternal snow
#

should work

coral lark
#

KEKW now I have to do that across every file that does this
but I'mma head home first, considering I'm currently sitting in school while I don't need to, and my neck hurts

eternal snow
#

fair enough, gl

coral lark
eternal snow
#

ouch

#

unfortunately pretty clueless for anything that isnt linux asm so this is all new stuff

coral lark
#

I think I'm using an llvm based gcc
So it might be that

eternal snow
#

from what the internet seems to think that wouldnt make a difference and this is just a basic difference in how win64 works from elf64

#

@fierce iris any ideas?

fierce iris
eternal snow
fierce iris
#

read through this and all i can say is average windows tomfoolery

eternal snow
#

alright

#

thought u might know smth bc u seem to dabble in low end windows

coral lark
#

It is comical to me how much better low level linux is than low level windows

fierce iris
#

i do linux but not even low level
the only time i used asm was when i made some glue code for a crappy OS i made a long while ago

#

the lowest thing i actually use is C

#

well, i really only do C lol
i haven't found a need for any other systems/general-purpose lang

coral lark
#

but I'm starting to get link errors on l3d finally KEKW

#

ok l3d is now the only file giving link errors

#

I now have a non functional l3d exe KEKW

#

wait wut
but I assembled this for an elf, how is it running when I run it with gdb

eternal snow
#

no idea

#

mystery

coral lark
#
```
    push rdx
    mov rdx, rcx
    lea rcx, [rel alloc_data.addr]
    add rcx, rdx
    pop rdx
    mov    [rcx], rax    ;save start addr to new slot
```
turns out I have to do it this way it seems KEKW
eternal snow
#

ohh yeah ofx

#

ofc

#

lea doesnt work properly

#

wtf windows????

#

just denied lea of its effective purpose

coral lark
#

lea works properly

#

er wait

eternal snow
#

yea but u had to change it yes?

coral lark
#

KEKW idk what lea exactly does

```
    mov    [alloc_data.addr + rcx], rax    ;save start addr to new slot
```
this is in place of this, which windows throws a fit over
eternal snow
#

ohh right

coral lark
#

I...
think I am gonna macro that, because that's messy

eternal snow
#

its extremely useful but clearly on windows this just wouldnt work

#

because of the mixing of registers and immediates

coral lark
#

KEKW yeah

eternal snow
#

same function just quicker

#

thought that lea instruction was in the original lol

#

but lea just being next to useless on windows is fucked
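
for what it's worth, lea itself behaves the same on windows. the part the linker rejects is the absolute label inside the brackets, and lea can fetch that label rip-relative and leave the register math to a plain base+index address. a minimal sketch, assuming rdx is free at that point and using a made-up layout for `alloc_data` (the real one in data.asm may differ):

```
section .bss
alloc_data:
.pointer  resd 1                    ; byte offset of the next free slot (assumed layout)
.addr     resb 12*16                ; table of 12-byte slots (assumed size)

section .text
save_slot:
    mov   ecx, dword [rel alloc_data.pointer]   ; get pointer for new alloc (zero-extends into rcx)
    lea   rdx, [rel alloc_data.addr]            ; table base, rip-relative
    mov   qword [rdx+rcx], rax                  ; save start addr to new slot
    add   dword [rel alloc_data.pointer], 12    ; then increase the pointer
    ret
```

this costs one extra instruction and one scratch register per access, with no push/pop needed.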

coral lark
#

```
    mov    ecx, dword[alloc_data.pointer]    ;get pointer for new alloc here
    mov    qword[alloc_data.addr+rcx], rax    ;save start addr to new slot
    add    dword[alloc_data.pointer], 12    ;then increase the pointer
```
original code is just this
eternal snow
#

what

#

ok whatever

coral lark
#

I'm doing a set of instructions equivalent to
add rcx, addr but with lea instead of add, so I have to do a roundabout thing
also I just thought of something that probably won't work

eternal snow
#

absolutely amazed at the way asm on windows works this is horrible

coral lark
#

dword[rbx]+rbx does in fact not work

eternal snow
#

yeah that doesnt

#

you cant add to memory values like that at all

coral lark
eternal snow
#

have to load them first

coral lark
#
```
    lea_offset rbx, [rel alloc_data.addr]
    mov    dword[rbx], eax    ;move the length allocated here
```
does not like this one though
eternal snow
#

cannot see anything wrong with this

#

whats bugging there

coral lark
#

segfault

eternal snow
#

tf

#

rbx is the problem

#

try inspecting rbx value after lea

#

if its able to load the addr alloc_data.addr into rbx it shouldnt segfault after

coral lark
#

<- complete noob at gdb (has no idea how to inspect stuff)

eternal snow
#
    lea_offset rbx, [rel alloc_data.addr]
.b:
    mov    dword[rbx], eax    ;move the length allocated here

add a .b after the lea value, then launch gdb and type b <whatever the parent label for .b is>.b
then type r to run the program, it will stop at that .b breakpoint
then type i r
which will show all registers

coral lark
#

also is there a more efficient way to start gdb?
kinda annoying going
```
C:\users\user\downloads\gdb.exe (don't ask)

exec file
file file
run
```
every time
eternal snow
#

if you pass the executable as an argument like gdb l3d.exe

#

not sure if that would work like that on windows? hopefully it does

coral lark
#

yea that works

eternal snow
#

whar

#

can u set the breakpoint to _alloc?

coral lark
#

_alloc.b

eternal snow
#

no like

#

can u set it to just _alloc

#

b _alloc

coral lark
#

yea that also functions

#

KEKW wait wha
ok well _alloc.b is working now

eternal snow
#

random but okay

coral lark
#

oh ok
so it's segfaulting somewhere else

eternal snow
#

just try moving the .b around until you get to an instruction that segfaults

coral lark
#

I'd imagine ecx should not be 0?

twin quest
#

can you run a debugger and it catches where the segfault happens?

coral lark
#

it catches the segfault but it doesn't understand what's going on with the stack where it segfaults

twin quest
#

ok but it shows you where though right, which instruction?

coral lark
#

no

#

not to my knowledge at least

coral lark
#

but with 100 instead of 40bf3

#

which actually might be instruction number

#

if that's the case, it's in load_ltx

#

in a spot that doesn't really make sense

eternal snow
#

forgot abt that

#

if u run it it will catch on a specific point and tell u where it is

coral lark
eternal snow
#

ah

#

stack error

#

forgot to pop something

coral lark
#

ok but where KEKW

eternal snow
#

have u inserted any pushes/pops?

coral lark
#

1 push/pop pair and it's not the problem

eternal snow
#

hmm

coral lark
#

where does the program go to after _alloc? KEKW

eternal snow
#

would just insert random breakpoints further and further into the program until you just happen to go past it

coral lark
#

what even calls _alloc?

eternal snow
#

l3d.asm

#

_start

#

allocates memory for the framebuffer and depth buffer

#

if u havent translated the syscall properly its not gonna work and will cause a segfault

#

because it gets the return addr of the allocated data in rax after the syscall

#

which if its then used to address stuff will cause problems

coral lark
#

yea I translated that syscall

#

it seems to be segfaulting on the ret

eternal snow
#

send the full code for _alloc

twin quest
#

I would be really surprised if gdb couldn't catch a segfault

eternal snow
#

it cant when theres stack issues

#

but those are easy to debug

#

if its segfaulting at ret thats almost definitely stack stuff

coral lark
#

I just noticed a stray pop rax

#

nvm not stray

#

lea_offset does a push and pop on r9 to use as a temp var

eternal snow
#

set a breakpoint at the start and end of _alloc

#

then show values for rsp register at both

coral lark
eternal snow
#

rbp?

#

not sure why that would change but you never know

coral lark
#

0 at end

eternal snow
#

not sure whats happening here lol

#

incomprehensible

#

segfault at ret is wacky af

#

try pushing rsp and rbp then popping them back at the end

coral lark
#
```
sub rsp, 32                        ; Allocate 32 bytes of shadow space
xor rcx, rcx                       ; Set lpAddress = NULL so windows picks the address (rcx)
mov rdx, [rel alloc_data.available] ; Set dwSize = number of bytes to allocate (rdx)
mov    r8,  0x3000                   ; Set flAllocationType = MEM_COMMIT | MEM_RESERVE (r8)
mov    r9,  0x40                     ; Set flProtect = PAGE_EXECUTE_READWRITE (r9)
.f:
call   VirtualAlloc
add rsp, 32                        ; Release the 32 bytes of shadow space
```
solution
#

thanks microsoft

eternal snow
#

ohh

#

did u forget to allocate stack space for the call

coral lark
#

KEKW yes

eternal snow
#

hehe yeah that might have done it
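
for reference, a minimal sketch of what the win64 calling convention expects around a call, assuming the routine is entered through a normal call (so rsp is 8 bytes off a 16-byte boundary on entry). the label `alloc_pages` and passing the size in rcx are made up for illustration. 0x04 is PAGE_READWRITE here, 0x40 would be needed if the memory also has to be executable:

```
extern VirtualAlloc

section .text
; in:  rcx = number of bytes to allocate
; out: rax = start address, or 0 on failure
alloc_pages:
    sub   rsp, 40                   ; 32 bytes shadow space + 8 to realign rsp to 16
    mov   rdx, rcx                  ; dwSize
    xor   ecx, ecx                  ; lpAddress = NULL, let windows pick
    mov   r8d, 0x3000               ; flAllocationType = MEM_COMMIT | MEM_RESERVE
    mov   r9d, 0x04                 ; flProtect = PAGE_READWRITE
    call  VirtualAlloc
    add   rsp, 40                   ; release the shadow space again
    ret
```

the callee is allowed to scribble over those 32 bytes, so without the `sub rsp` it can overwrite the caller's return address, which would fit a segfault on ret.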

#

does alloc work now?

coral lark
#

yea (as far as I can tell)
gets through it and to the next method

eternal snow
#

thats good

coral lark
#
```
    push rcx
    lea_offset rcx, [rel alloc_data.addr]
    mov    qword[rcx], rax    ;save start addr to new slot
    pop rcx
```
for some reason, trying to use push and pop for rcx results in rcx being 0
eternal snow
#

if _init_screen works too then _alloc definitely works too

#

rcx is 0 after the pop or after the push?

coral lark
coral lark
eternal snow
#

thats weird

#

if it wasnt 0 at first at least

coral lark
#

it was in fact not 0 at first

eternal snow
#

rcx is a caller saved register so that might have something to do with it??

#

but pushing isnt a call so

#

uh

#

weird stuff

#

maybe avoid using rcx for your lea if thats happening

#

smth like r8 which isnt used so often

coral lark
#

chatgpt thinks the reason is because I modify rcx after pushing
wouldn't that kinda defeat the entire purpose of push and pop if so? KEKW

eternal snow
#

yeah

#

chatgpt is wrong there

#

what does the lea_offset macro look like?

#

is it just an alias for lea?

coral lark
#
%macro lea_offset 2
    push r9

    mov r9, %1
    lea %1, %2
    add %1, r9

    pop r9
%endmacro
#

leas %2 to %1, then offsets by %1
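
one possible variation, as a sketch only: if the destination and the index are allowed to be different registers, the temp register and the push/pop can go away. the macro name `lea_index` and the `alloc_data` layout below are made up for illustration, and the destination must not be the same register as the index:

```
%macro lea_index 3                  ; %1 = dest, %2 = label, %3 = index register
    lea   %1, [rel %2]              ; dest = address of label, rip-relative
    add   %1, %3                    ; dest += index
%endmacro

section .bss
alloc_data:
.addr   resb 12*16                  ; assumed layout, stand-in for the real table

section .text
store_slot:
    lea_index r8, alloc_data.addr, rcx
    mov   qword [r8], rax           ; save start addr to new slot
    ret
```

since it never touches the stack, a push/pop mismatch around it cannot come from the macro itself.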

eternal snow
#

will test this on linux

#

yeah no its saved

#

much confusion

#

maybe a windows thing?

coral lark
#

why is lea rdi, rdi not valid?

eternal snow
#

lea rdi, [rdi]

#

second operand has to be in brackets

coral lark
#

ohh

eternal snow
#

this also does nothing

#

same as mov rdi, rdi

#

which is in essence the same as nop

coral lark
#

5

eternal snow
#

and yet it doesnt work on this situation

coral lark
#
```
    lea %1, %2
    add %1, r9
```
```
    mov %1, %2
    add %1, r9
```
lea doesn't cause a segfault but mov does in that lea_offset macro I defined
#

wait...

#

rcx is 0, even though rax isn't, and I assign rax's value to rcx

#

er
no I do the inverse but why is rax not 0

#

oh

eternal snow
#

rax shouldnt be zero it should be the addr of the data allocated?

coral lark
#
mov rcx, rax

KEKW ok this should've been rcx, rax but I had it as rax, rcx

#

got further!

eternal snow
#

oop

#

rdi likely being incorrect addr

eternal snow
coral lark
#

does not work

eternal snow
#

😭

coral lark
#

seems like it's happening on the final iteration step

#

nope

eternal snow
#

oh so its looping correctly until the last?

coral lark
#

nope

eternal snow
#

huh

#

its prob bc [rdi+8] then

#

if windows doesnt like those addresses

#

if its not then check rdis value

coral lark
#

iteration 157 is when it errors
rdi is 6295541

eternal snow
#

sounds alright

#

if rdi is behaving itself for 157 iterations there shouldnt be a reason for it to break unless either

#

_alloc didnt allocate enough data or:

coral lark
#

rdi starts off as 6291456

eternal snow
#

its the wrong addr

#

not sure

#

that shouldnt be right then

coral lark
#

actually those numbers came from two separate runs

eternal snow
#

ah that figures

#

bc the difference there is 4085

#

difference should be a multiple of UNIT_SIZE, so 26

coral lark
#

6291456
6295541
guess it's consistent

eternal snow
#

thats still a weird value

#

it should definitely not be an odd difference there

coral lark
#

HEADER_LEN actually accounts for the odd diff

eternal snow
#

ohh u mean like

#

from the very start

coral lark
#

yeah