#L3D2 - x86-64 assembly toy software renderer

eternal snow
#

thought u meant from the first iteration

coral lark
#

157, 157
rdi is offset by the right amount

eternal snow
#

yeah cool

#

would assume that the addr in rdi is incorrect or not enough space was allocated then

coral lark
#

ok it doesn't have enough data

#
idiv    rbx    ;the actual divide on rax is here

shouldn't this be rax (_alloc)

coral lark
#

manually writing in a size makes it get past

#

now why that doesn't make it explode while allocing?
I would love to know that answer KEKW

coral lark
#
mov    rdx, term_size    ;return to this truct

truct

eternal snow
#

oop typo

#

struct

coral lark
#

yea I know what you meant

coral lark
# eternal snow ?
```nasm
    push    rax    ;save requested data size
    xor    rdx, rdx    ;clear rdx: idiv divides rdx:rax, so a stale rdx corrupts the result
    mov    rbx, 4096    ;divide rax by this page size
    idiv    rbx    ;rax = rdx:rax / rbx, remainder goes to rdx
    inc    rax    ;increase so it doesnt allocate 0 bytes
    imul    rax, 4096    ;multiply result by 4096 to get amount to allocate
    mov    dword[rel alloc_data.available], eax    ;then save this amount here
```
you just sorta
randomly divide rbx for no apparent reason and do nothing with the result and say you're dividing rax

eternal snow
#

idiv rbx divides rax by rbx

coral lark
#

ah

eternal snow
#

its weird syntax but thats how it is

#

the mul instruction works the same
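
For reference, the arithmetic in the quoted `_alloc` snippet can be sketched in C. This is only an illustration of what those instructions compute, not the project's code; `round_to_pages` and `PAGE_SIZE` are names made up here, and unsigned division stands in for `idiv`, which is fine for non-negative sizes. `idiv rbx` takes its dividend implicitly from `rdx:rax` and leaves the quotient in `rax` and the remainder in `rdx`, which is why `rdx` is cleared first.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u  /* hypothetical name; the snippet hardcodes 4096 */

/* Mirrors the quoted sequence: divide by the page size, add one, multiply back. */
uint64_t round_to_pages(uint64_t requested)
{
    uint64_t quotient = requested / PAGE_SIZE;  /* idiv rbx: quotient lands in rax */
    /* the remainder (requested % PAGE_SIZE) lands in rdx and is not used */
    quotient += 1;                              /* inc rax: never allocate 0 bytes */
    return quotient * PAGE_SIZE;                /* imul rax, 4096 */
}
```

Written this way, a request that is already an exact multiple of the page size still gets one extra page.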

coral lark
#

file IO is "next", though I kinda wanna get terminal size first

eternal snow
#

that would be cool

coral lark
#

question
where do you define your structs

#

oh data

eternal snow
#

yea

coral lark
#

byte, word, dword, qword
I'm guessing word is equivalent to short?

eternal snow
#

word is 2 bytes

#

no idea what a short is

coral lark
#

short is a 2 byte int in higher level langs

eternal snow
#

yeah then

#

except u can store a float in a word too

#

word is just 2 bytes

eternal snow
#

sade

#

they are quite useful sometimes

coral lark
#
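```nasm
    .CONSOLE_SCREEN_BUFFER_INFO
        .x dw 0         ; dwSize.X
        .y dw 0         ; dwSize.Y
        .cx dw 0        ; dwCursorPosition.X
        .cy dw 0        ; dwCursorPosition.Y
        .attr dw 0      ; wAttributes

        .left dw 0      ; srWindow.Left
        .top dw 0       ; srWindow.Top
        .right dw 0     ; srWindow.Right
        .bottom dw 0    ; srWindow.Bottom

        .mwx dw 0       ; dwMaximumWindowSize.X
        .mwy dw 0       ; dwMaximumWindowSize.Y
```
ok that should make windows happy I think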
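
As a cross-check on that layout, here is a small C stand-in with the same eleven 16-bit fields in the same order. The type and member names are hypothetical (real Windows code would include `<windows.h>` and use `CONSOLE_SCREEN_BUFFER_INFO` directly). The two helpers assume the window rectangle's edges are inclusive, so the visible size is the difference plus one.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the field order of CONSOLE_SCREEN_BUFFER_INFO.
   Names are made up for this sketch. */
typedef struct {
    int16_t  size_x, size_y;            /* dwSize              (.x, .y)     */
    int16_t  cursor_x, cursor_y;        /* dwCursorPosition    (.cx, .cy)   */
    uint16_t attributes;                /* wAttributes         (.attr)      */
    int16_t  left, top, right, bottom;  /* srWindow                         */
    int16_t  max_x, max_y;              /* dwMaximumWindowSize (.mwx, .mwy) */
} console_info_sketch;

_Static_assert(sizeof(console_info_sketch) == 22, "eleven 16-bit fields, no padding");
_Static_assert(offsetof(console_info_sketch, left) == 10, "srWindow follows wAttributes");

/* Visible terminal size, assuming inclusive window edges. */
int visible_columns(const console_info_sketch *info)
{
    return info->right - info->left + 1;
}

int visible_rows(const console_info_sketch *info)
{
    return info->bottom - info->top + 1;
}
```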
#

now to actually do the syscall

#

I mean...
I kinda don't know if that's working or not, given I'm running it in IJ and I don't know if that gets reported with IJ

#

it is not working frOK

#

close enough..?
wait no
yes
idk why it's 1 but at least it's not 0
wait no KEKW

eternal snow
#

1 does not seem good

coral lark
#

1 is the handle for the terminal

coral lark
#

mov and lea do in fact behave differently on windows

#

plunger_bunger
was using rax where I should've been using rcx
... and my notes agree with it being rcx, great... how did I get stuck on rax for so long KEKW

coral lark
#

got a file to create

#

KEKW unsure why it got named that

#

I mean text encoding stuff obviously, but like

#

I'm not quite sure how to fix that KEKW

#

it also bricks future runs once the file is created because Windows™️ doesn't seem to have an option that opens an existing file without replacing it but creates it if it doesn't exist (correction, it does: 4)

file's name is "Eliminate the slender" in Chinese, according to google translate KEKW

coral lark
#

ok so I need to convert lpcstr to lpcwstr
... the C++ code for that looks intimidating bleaker_kekw

... alternatively, I use CreateFileA, which is intimidating because it has more than 4 parameters
but is a lot simpler regardless (see also; it now opens the tetrapod.l3d and does absolutely nothing with it and then errors on a later piece of code frOK)

fierce iris
#

common windows api tomfoolery
all they had to do was allow the A funcs to use utf-8 but nah

coral lark
#

The A function works as intended for this

eternal snow
#

they dont use utf8??

eternal snow
#

converting other textures to new format

eternal snow
#

also got fps caps and input polling working

coral lark
#

CreateFile2 calls CreateFileW

#

L3d’s strings are what CreateFileA expects, but I was using CreateFileW

eternal snow
#

huh

eternal snow
#

trhx

coral lark
#

time to try to read a file KEKW

coral lark
#

dword[rel file.size+4]
this would end up pointing to handler_int's data, no?

eternal snow
#

ohhh yeah should explain the file format

coral lark
#

oh the file format isn't the thing giving me problems

eternal snow
#

the first 8 bytes of the file indicate the size of the rest of the file and the unpacked file size

coral lark
#

it's the fact that file.size+4 seems like it should be pointing to garbage data that's throwing me off

eternal snow
#

file.size+4 holds the unpacked file size in bytes after reading the first 8

coral lark
#

KEKW ok so that's not true with windows

#
```nasm
    sub rsp, 32                          ; Allocate 32 bytes of shadow space
    lea rcx, %1
    xor rdx, rdx                         ; lpFileSizeHigh = NULL
    call GetFileSize
    add rsp, 32                          ; Release the 32 bytes of shadow space
```
with windows, there's a function specifically for getting file size
from there, I need to create an allocation

eternal snow
#

ah right

#

can you not just read the first 8 bytes?

coral lark
#

requires an allocation

#

problem is, _alloc expects an allocation structure, which... is not what comes after file.size, but that's what your code is loading into rax for _alloc

eternal snow
#

yeah

coral lark
#

so alloc is just getting null

eternal snow
#

_alloc doesnt expect a struct

coral lark
#

oh wait

eternal snow
#

the amount of data in rax is allocated

#

if rax was 50 and _alloc was called then it would allocate 50 bytes of data and return the address in rax

coral lark
#

ok so I need to remove the +4 then

#

and I need to fix _alloc KEKW

eternal snow
#

the +4 is so it loads the unpacked data into rax

#

rather than the packed data size

#

because the file format is compressed

coral lark
#

yea that's not a problem on windows
GetFileSize returns the actual file size into rax

eternal snow
#

yes but you need the unpacked size

#

for the allocation

coral lark
#

I mean

eternal snow
#

because that value is different to the file size

coral lark
#

if I unpack it I get 0

#

if I don't unpack, I get the file size

eternal snow
#

?

coral lark
#

result of GetFileSize 1426 in rax
file size is 1426
if I load [rax], I get 1426
if I load [rax+4] I get 0

eternal snow
#

well yeah thats not what

#

the file size is written into the first 8 bytes of the file

#

bytes 0-3 = filesize - 8
bytes 4-7 = size of data to allocate

#

then the rest of the file is object data
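
The header described above can be sketched in C like this. The struct and function names are hypothetical, and the fields are assumed to be little-endian 32-bit values, which matches loading them with a plain `dword` read on x86-64.

```c
#include <stdint.h>

/* Hypothetical view of the first 8 bytes of an .l3d file. */
typedef struct {
    uint32_t packed_size;    /* bytes 0-3: file size minus the 8-byte header */
    uint32_t unpacked_size;  /* bytes 4-7: size of the buffer to allocate    */
} l3d_header_sketch;

/* Assemble a little-endian 32-bit value from four bytes. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode the two size fields from the first 8 bytes read from the file. */
l3d_header_sketch parse_header(const unsigned char first8[8])
{
    l3d_header_sketch h;
    h.packed_size   = read_le32(first8);      /* what file.size holds   */
    h.unpacked_size = read_le32(first8 + 4);  /* what file.size+4 holds */
    return h;
}
```

With this split, only the 8-byte header has to be read before the allocation size is known; the rest of the file can then be read into a buffer of `unpacked_size` bytes.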

coral lark
#

ah
well I kinda don't have the ability to read the file without already knowing the file size though

eternal snow
#

can u not read the first 8 bytes only on windows

coral lark
#

well I need the buffer to read the thing into in order to do that

#

and to create the buffer, I need the size

eternal snow
#

file.size is the buffer

#

8 byte buffer to store that data

coral lark
#

ah

coral lark
#
```nasm
    mov    eax, dword[rel file.size+4]    ;load the unpacked data into rax
    call    _alloc    ;then allocate that amount

    ; completely skips from here
    mov r15, rax
    READ_FILE [rel r14], [rel r15], [rel file.size]

    lea r15, [r15]

    ;----------------------------------------
    ;GO TO END OF FILE DATA AND INSERT TERMINATORS
    mov    edi, dword[rel file.size+4]    ;load unpacked size here into destination
    mov    esi, dword[rel file.size]    ;and packed size here for source
    sub    edi, 8    ;subtract 8 here, go to last face position
    sub    esi, 6    ;subtract 6 here (3x word) bc no terminating word
    ; to here
    mov    word[r15+rdi+6], 65535    ;insert terminating word into end of data
    cmp    word[r15+rsi+4], 65535    ;now check if no face data
```
![wut](https://cdn.discordapp.com/emojis/797143890275860511.webp?size=128 "wut")
#

no it doesn't, step just jumps there
why

#

[out, optional] lpNumberOfBytesRead

A pointer to the variable that receives the number of bytes read when using a synchronous hFile parameter. ReadFile sets this value to zero before doing any work or error checking. Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.

This parameter can be NULL only when the lpOverlapped parameter is not NULL.

Windows 7: This parameter can not be NULL.

For more information, see the Remarks section.
sobbing ah

coral lark
#
```nasm
    push rdx
    mov    edx, dword[rel alloc_data.pointer]

    push rcx
    lea_offset rcx, [rel alloc_data.addr]
    add rcx, 8
    mov dword[rcx+8], eax    ;move length into current item
    sub rcx, 8
    sub    dword[rel alloc_data.available], eax
    mov    rax, qword[rel alloc_data.current]    ;this line is being a problem
;    mov    qword[rcx], rax    ;and save to addr items
;    add    dword[rel alloc_data.pointer], 12
    pop rcx
    pop rdx

    ret
```
#

I don't know what to do with this (lack) of information

#

or... apparently it's getting past that and gdb is dumb and stops stepping when it closes the method and doesn't even step to ret like it used to
why

ok well it does get through the entirety of _load_l3d
but now I have no idea where it segfaults bleaker_kekw
answer: right as _load_l3d ends

coral lark
#

apparently this segfault is coming from exit file
which is all the way at the top of the method
and not happening until the end of the method
great
thanks windows

coral lark
#
```nasm
mov qword [rsp+32   ], 4              ; creation_disposition (4 = OPEN_ALWAYS: open the file, creating it first if it doesn't exist)
mov qword [rsp+32+ 8], 0              ; don't care
mov qword [rsp+32+16], 0              ; don't care
```
it's actually coming from these three lines
#
```nasm
sub rsp, 24                          ; Reserve 24 bytes of stack for the three stack arguments
mov qword [rsp+32   ], 4              ; creation_disposition (4 = OPEN_ALWAYS: open the file, creating it first if it doesn't exist)
mov qword [rsp+32+ 8], 0              ; don't care
mov qword [rsp+32+16], 0              ; don't care
;push 0
;push 0
;push 4
    call CreateFileA
add rsp, 24                          ; Release the 24 bytes reserved above
```
solution
#

gross but ok
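
For context on why stores at `[rsp+32]` and above are sensitive to how much stack was reserved: in the Windows x64 calling convention the first four integer arguments go in `rcx`, `rdx`, `r8`, `r9`, the caller reserves 32 bytes of shadow space, arguments five and up sit at `[rsp+32]`, `[rsp+40]`, ... at the moment of the `call`, and `rsp` must be 16-byte aligned there. The helpers below are a hypothetical sketch of that bookkeeping, not anything from the project; `rsp_mod16` is an assumed input describing `rsp` modulo 16 just before the `sub`.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to subtract from rsp before a call taking `nargs` integer/pointer
   arguments, so that the shadow space and the stack arguments fit and rsp is
   16-byte aligned at the call. */
size_t win64_call_reserve(size_t nargs, size_t rsp_mod16)
{
    assert(rsp_mod16 == 0 || rsp_mod16 == 8);
    size_t stack_args = nargs > 4 ? nargs - 4 : 0;
    size_t bytes = 32 + 8 * stack_args;   /* shadow space + stack arguments */
    if (bytes % 16 != rsp_mod16)
        bytes += 8;                       /* pad to restore 16-byte alignment */
    return bytes;
}

/* Offset from rsp (after the sub) of the 1-based argument `arg_index`, for
   arguments that are passed on the stack. */
size_t win64_stack_arg_offset(size_t arg_index)
{
    assert(arg_index >= 5);
    return 32 + 8 * (arg_index - 5);
}
```

Under these assumptions, a seven-argument call such as `CreateFileA`, made from a function that has pushed nothing since entry (`rsp_mod16 == 8`), would reserve 56 bytes, and arguments five to seven would land at offsets 32, 40 and 48 inside that reservation.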

#
```nasm
mov    r15, rax    ;save addr to r15
xor    rax, rax    ;then sys_read again
mov    rdi, r14    ;read from open file
mov    rsi, r15    ;read data into the allocated data
mov    edx, dword[rel file.size]    ;use filesize as length

; huh?
sub rdx, 4 ;but expand it to be 2 bytes per pixel
shl rdx, 1 ;because old file used to be 1 per pixel
add rdx, 4 ;but then transparency bytes added
syscall
```

#

o

eternal snow
#

only reason this math has to be done is to maintain some compatibility with older textures

#

could just redo the textures completely and convert but its not too much an issue
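
Spelled out in C, the three instructions after the `; huh?` comment compute the following. The function name is made up, and the `>= 4` precondition is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* stored_size: the dword read from the file (file.size in the snippet).
   Returns the byte count handed to the read call in rdx. */
uint32_t texture_read_length(uint32_t stored_size)
{
    assert(stored_size >= 4);           /* assumed: the 4 extra bytes are counted */
    uint32_t length = stored_size - 4;  /* sub rdx, 4: drop the 4 extra bytes     */
    length <<= 1;                       /* shl rdx, 1: 1 byte per pixel -> 2      */
    return length + 4;                  /* add rdx, 4: add the 4 bytes back       */
}
```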

coral lark
#

oh interesting

#

close handle is the last thing I expected to throw an error KEKW
how did this occur

coral lark
#

KEKW ah wait
gdb is stepping from alloc in ltx to alloc in luv
why is gdb like this
ok file IO is done!

wait
how do you properly end gdb while the program is running if the program can't be stopped thinky
because that's the scenario I'm in rn

coral lark
#

perfection

coral lark
#

I am completely unsure of how to debug this KEKW

eternal snow
#

yeah

eternal snow
#

well done

coral lark
#

KEKW I actually don’t know if windows terminal supports enough color coding complexity to support L3D

eternal snow
#

:v

#

hopefully it does

coral lark
#

Even if it does — I have genuinely no idea where to begin to debug this

eternal snow
#

check pressing f1 to toggle wireframe mode

#

also check the return values of the file reads to check they are making sense

eternal snow
#

not been doing much on the proj tbh

#

working on scene files working again but its slow progress

graceful sage
#

froge_sad setting up scenes is the most boring part

eternal snow
#

mhm

eternal snow
#

got the basis of lsc loading working

eternal snow
#

got it more working

#

can load multiple objects now

#

next step is making them able to move

eternal snow
#

moving stuff around and changing sky colour

graceful sage
#

frogapprove awesome

eternal snow
#

object duping

#

only loads the file once but duplicates them in memory for separate objects

#

decided to run it in uxterm for shits and giggles and the program rendered one frame then crashed lol

eternal snow
#

2k lines currently

eternal snow
#

wrote a faster quaternion handling thing

eternal snow
#

will be doing mainloop stuff soon just getting initialising stuff working

#

this is what it looks like currently

eternal snow
#

obs was eating the cpu alive here so the fps is shit

eternal snow
#

one of the maddening things about avx is how so many xmm instructions dont have a ymm equivalent

#

theres a horrible lack of 64bit packed int instructions

#

and its causing some bugs

coral lark
#

I’m not used to you not being a moth btw

eternal snow
#

moth?

coral lark
#

Was a moth not your previous pfp?

eternal snow
#

oh no that was a flower from an album cover

eternal snow
#

oops

eternal snow
#

kind of fixed this bug with barycentrics being too high and overflowing

#

still happens sometimes but not as often

#

solution is to use smaller objects

#

bc that big plane is just 2 tris

eternal snow
#

oh

#

no idea how this never occurred but it would simply be a better idea to do barycentric calculations in ndc space rather than screen

#

avoids high vals

#

only just thought of this wtf

graceful sage
#

kekkedsadge brain works in mysterious ways

twin quest
#

hrm

#

I guess that's a thing you can do if you're doing software rasterization

#

why do you have high barycentric coords

eternal snow
#

the coords themselves arent high, but the calculations are

eternal snow
#

no random triangle disappearing

#

screen space coords are now converted to the range 0-1 to make it work better

#

actually range 0-2 for better acc

#

with a simd set thats [1/(sw/2), 1/(sh/2), 0, 0]

#

can then multiply the screen space coordinates by that to convert them to 0-2 range
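
A scalar C sketch of that scaling step, assuming `sw` and `sh` are the screen width and height in character cells. The type and function names are hypothetical; the renderer itself does this with one SIMD multiply.

```c
typedef struct { float x, y; } vec2;

/* Multiply screen-space coordinates by [1/(sw/2), 1/(sh/2)] so that
   0..sw maps to 0..2 and 0..sh maps to 0..2. */
vec2 screen_to_0_2(vec2 p, float sw, float sh)
{
    float scale_x = 1.0f / (sw / 2.0f);  /* precomputed once per resize */
    float scale_y = 1.0f / (sh / 2.0f);
    vec2 out = { p.x * scale_x, p.y * scale_y };
    return out;
}
```

Keeping the inputs this small means the products in the barycentric math stay small as well, whatever the terminal resolution.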

#

however it turns out there was another problem in that faces would get clipped sometimes because they were falsely detected to be outside screen space

#

the xmm registers for storing the top left and bottom right data were in the format

#

xmm0=[min X, min Y, x, x]
xmm1=[max X, max Y, x, x]

#

where x was junk data

#

sometimes this junk data would exceed the bounds of the screen and the object was falsely clipped

#

just solved by changing the result from the comparison

#

ecx contained the results of the simd comparison after using movmskps

#

bottom 4 bits determined which statements were true

#

so the solution to disregard the junk comparisons was

#
and  cl, ~0b00001100
#

which clears bits 2 and 3
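
The masking step can be illustrated in portable C. This emulates the effect of a packed compare followed by `movmskps` (the intrinsic counterpart is `_mm_movemask_ps`): bit i of the result reflects lane i. The less-than comparison and both function names are assumptions made for this sketch, since the thread does not show which comparison the renderer uses.

```c
/* Emulates a packed less-than compare followed by movmskps:
   bit i of the result is set when a[i] < b[i]. */
unsigned lanes_less_than(const float a[4], const float b[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        if (a[i] < b[i])
            mask |= 1u << i;
    }
    return mask;
}

/* Same effect as `and cl, ~0b00001100`: keep the X and Y lanes (bits 0 and 1)
   and discard the junk lanes (bits 2 and 3). */
unsigned keep_xy_only(unsigned mask)
{
    return mask & ~0x0Cu & 0x0Fu;
}
```

Without the mask, whatever happens to sit in lanes 2 and 3 can flip bits in the result and make a visible face look like it is off screen.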

#

smaller font size = higher res

#

doesnt disappear on higher res either because of the use of a 0-2 range

#

originally what would happen is that u would get a face which took up a large portion of the screen, so the vector calculated would have a massive value in it

#

say 100

#

and then through a series of other multiplications this value would overflow

#

problem solved

#

should change the thumbnail for this project thread

#

gotta make a nice logo

#

pretty good performance

#

memory is probably more like 1mb because this isnt counting the memory allocated in the data segment

#

and ofc bc of old cpu this is entirely avx compatible

#

nothing uses avx2, avx512, etc

#

simply bc processor doesnt support it

#

probably a good thing it means the code is more portable

#

time to make lunch now

#

bought some nice sandwich fillers from the shop yesterday so can get a reward for working on this

eternal snow
#

nvm forgot to buy bread

eternal snow
#

L3D2 - x86-64 assembly toy software renderer

#

updated to be a little better

#

nearly finished commenting everything so its readable again also

graceful sage
#

pogefroge noice much updates

twin quest
#

this is like one of my favorite community project threads, really amazing stuff

eternal snow
#

thankies

eternal snow
#

most recent ver pushed to git

#

finished comments

#

gotta get shadows working next

#

which involves fixing the colour mixing

#

used to use like

#

5 divisions and 3 multiplies per pixel

#

horrid

twin quest
#

what's your plan for shadows?

#

I totally would show this github repo to every engineer I know if you put screenshots in there

eternal snow
#

oky

#

can add a few things

#

for shadows will be using the same shadow mapping technique

#

but will fix the colour mixing to not be shit

#

the reason is because it converts an ansi colour code from bcd to int to rgb then mixes with ANOTHER bcd which needs to be converted then converts rgb back to ansi back to string

#

surely there's a better way

#

the way the ansi colours are organised is in a cube so it should be alright to make an alg that operates exclusively on the bcd strings

#

THIS is the reason that the code is being rewritten

#

amongst others

#

tada added some images

#

can probably get away with hardcoding shadows to an extent

#

so that there doesnt have to be a memory access for getting the colour to blend, and can just follow a set structure to darken a pixel

#

rather than blending it with other colours which would only happen if ur using a translucent texture
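
One way the "set structure to darken a pixel" idea could look, sketched in C on integer colour indices rather than on the BCD strings discussed above. It assumes the colours in question are the 6x6x6 cube of the 256-colour palette, where index = 16 + 36r + 6g + b with each level in 0..5. The function names and the uniform "drop every channel by `steps` levels" rule are illustrative guesses, not the project's algorithm.

```c
#include <assert.h>

/* Index of a colour in the 6x6x6 cube of the 256-colour palette. */
int cube_index(int r, int g, int b)
{
    assert(r >= 0 && r <= 5 && g >= 0 && g <= 5 && b >= 0 && b <= 5);
    return 16 + 36 * r + 6 * g + b;
}

/* Lower one channel level by `steps`, clamping at 0. */
static int lower_level(int level, int steps)
{
    return level > steps ? level - steps : 0;
}

/* Darken a cube colour by moving every channel `steps` levels toward black.
   No RGB conversion and no second colour lookup is needed. */
int darken_cube_index(int index, int steps)
{
    assert(index >= 16 && index <= 231 && steps >= 0);
    int i = index - 16;
    int r = i / 36;
    int g = (i / 6) % 6;
    int b = i % 6;
    return cube_index(lower_level(r, steps),
                      lower_level(g, steps),
                      lower_level(b, steps));
}
```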

twin quest
#

do you plan on just having a single directional light?

eternal snow
#

just had the sun originally

#

debating over whether adding point lights would be a good idea

#

could add support for them

#

having more than 2 render passes is scary

twin quest
#

in my opengl renderer I do 6 passes just for shadow maps for a single point light

#

one for each cardinal axis direction

eternal snow
#

mm

#

worried about poor little cpu tho

twin quest
#

can you use more than one cpu

coral lark
#

I don’t think l3d is threaded yet

#

… hm
On the CPU, would it be faster to rasterize the scene, then raytrace to the point lights from the 3d coordinates of the rasterized texels?

eternal snow
#

no idea

#

not a clue how ray tracing works

#

could have a read into it if you think it could be worth it

coral lark
# eternal snow not a clue how ray tracing works

Well basically
You send out a bunch of rays from the camera
You do a bunch of ray -> polygon intersections to find where these rays hit
If they hit something that is a light source, then the light source contributes to the light value

#

You then bounce these rays and go again N times

eternal snow
#

so like how a mandelbrot renderer works kinda

coral lark
#

I know nothing about mandelbrot rendering KEKW

eternal snow
#

it was a loose comparison

#

but not really sure like

#

how expensive this would be in order to get a good shadow

coral lark
#

For this it’d be just a single step
Polygon gets rasterized->send out rays towards nearby light sources
if a ray hits a polygon before it reaches the light, that light does not contribute
Elsewise it does

… but thinking about it, it’d… still be several polygon intersection checks per pixel, so probably pretty expensive
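
To make the cost concrete, here is a C sketch of one such shadow-ray check. The intersection test is the standard Möller-Trumbore ray/triangle algorithm, which is brought in here for illustration and was not mentioned in the thread; all names are hypothetical. Every shaded point would run the inner test against each candidate triangle, once per light.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;

static vec3 sub3(vec3 a, vec3 b)
{
    vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static vec3 cross3(vec3 a, vec3 b)
{
    vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* True when the segment from `origin` to `light` crosses triangle (v0, v1, v2).
   The direction is left unnormalised so that t in (0, 1) spans the segment. */
bool segment_hits_triangle(vec3 origin, vec3 light, vec3 v0, vec3 v1, vec3 v2)
{
    const float eps = 1e-6f;
    vec3 dir = sub3(light, origin);
    vec3 e1 = sub3(v1, v0);
    vec3 e2 = sub3(v2, v0);
    vec3 p = cross3(dir, e2);
    float det = dot3(e1, p);
    if (det > -eps && det < eps)
        return false;                     /* segment parallel to the triangle */
    float inv = 1.0f / det;
    vec3 s = sub3(origin, v0);
    float u = dot3(s, p) * inv;           /* first barycentric coordinate */
    if (u < 0.0f || u > 1.0f)
        return false;
    vec3 q = cross3(s, e1);
    float v = dot3(dir, q) * inv;         /* second barycentric coordinate */
    if (v < 0.0f || u + v > 1.0f)
        return false;
    float t = dot3(e2, q) * inv;          /* position along the segment */
    return t > eps && t < 1.0f - eps;
}

/* A light contributes to `point` only if no triangle blocks the segment.
   `tris` holds three consecutive vertices per triangle. */
bool light_reaches(vec3 point, vec3 light, const vec3 *tris, int tri_count)
{
    for (int i = 0; i < tri_count; i++) {
        if (segment_hits_triangle(point, light,
                                  tris[3 * i], tris[3 * i + 1], tris[3 * i + 2]))
            return false;
    }
    return true;
}
```

The loop in `light_reaches` is the expensive part: without an acceleration structure it touches every triangle for every pixel and every light.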

eternal snow
#

well

#

shadow mapping involves multiple coordinate space changes per pixel so

#

maybe?

coral lark
#

Several coordinate space changes per pixel is probably still faster than potentially hundreds of triangle intersection checks per pixel per light KEKW

eternal snow
#

o yes

#

didn't realise that several implied ~100

#

was thinking about 7 lol

coral lark
#

It depends on how many triangles you have, if you have like bvhs setup, if you do some funky stuff to avoid some calculations, etc

#

But even with all that, it’s… probably still a lot, especially for a CPU

eternal snow
#

yeah

#

doing shadow mapping just bc it's the fastest approach probably

#

shadows are expensive...

eternal snow
coral lark
#

Pretty sure

#

GPUs are fairly inconsistent KEKW

eternal snow
#

😞

coral lark
#

Nvidia has a software rt implementation as well

eternal snow
#

that's very specific woah

#

may get into gpu stuff eventually, maybe learn ogl or something

#

gotta master the cpu first tho

coral lark
#

Some GPUs do things in software
Some GPUs do the same things in hardware
Other GPUs might just not support it at all

eternal snow
#

in software as in... hard coded software?

coral lark
#

Software as in GPU kernels/shaders and/or driver features

eternal snow
#

that's just builtin?

#

oh right

#

thought u meant like a massive hard coded macro

#

that would be cursed

coral lark
#

I actually don’t know much about the drivers/gpu itself, a lot of what I say about those is mostly memory from what others say and just my own experience
So of course, I may be wrong on stuff KEKW

#

My laptop has these weird "Dozen" drivers for vulkan, which report that they’re shipped by microsoft
I can use any of my GPUs with it, but all of the GPUs (including the CPU iirc) have the same feature set if I use the dozen drivers
And this feature set is not great; doesn’t even support NEAREST neighbor interpolation bleaker_kekw

even basic features aren’t necessarily safe to assume will be supported bleaker_kekw (though for anything worth supporting, they most likely will be)

eternal snow
#

no nearest neighbour
what???

coral lark
#

Yeah
It supported LINEAR but not NEAREST bleaker_kekw

eternal snow
#

wow

coral lark
#

So having taken a majority of a course in nasm
yeah I really don’t understand most of the errors in this project for compiling to windows
they should not be problems

#

Also, just looked it up
Windows does not have anywhere near enough colors in terminal
though there is the option of making a custom console program to emulate vim’s terminal a bit closer

eternal snow
#

oh lol

#

doomed project from the start

#

been realllly meaning to get back on this but just cant :((

#

no motivation

twin quest
#

not wanting to write assembly is entirely rational

graceful sage
#

thistho yup

eternal snow
#

got no motivation to write anything in any language

#

assembly is fun! and awesome! but just cant be bothered to write code rn

twin quest
#

I don't write code because it's fun, I write it because that's what I was put on this earth to do

#

apparently

#

as I can't seem to do anything else

#

I've tried

#

point is, time to get back to work!

#

jk, your project is really cool, maybe it's done and you need something new

eternal snow
#

it is far from done, would love to have it finished and can think of so many cool things to make with it once that time comes

#

the problem is partially for writing nice clean code

#

which can be annoying

eternal snow
#

its the weekend, and now really bored, might start redoing the editor