#L3D2 - x86-64 assembly toy software renderer
157, 157
rdi is offset by the right amount
yeah cool
would assume that the addr in rdi is incorrect or not enough space was allocated then
ok it doesn't have enough data
nvm
idiv rbx ;the actual divide on rax is here
shouldn't this be rax (_alloc)
actually it seems like this one is because I haven't translated the console IO stuff
manually writing in a size makes it get past
now why that doesn't make it explode while allocing?
I would love to know that answer 
?
mov rdx, term_size ;return to this truct
truct
```
push rax ;save requested data size
xor rdx, rdx ;reset this, it screws divs
mov rbx, 4096 ;divide rax by this page size
idiv rbx ;the actual divide on rax is here
inc rax ;increase so it doesnt allocate 0 bytes
imul rax, 4096 ;multiply result by 4096 to get amount to allocate
mov dword[rel alloc_data.available], eax ;then save this amount here
```
you just sorta
randomly divide rbx for no apparent reason and do nothing with the result and say you're dividing rax
idiv rbx divides rax by rbx
ah
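For reference, the arithmetic in the snippet above (divide by the page size, add one, multiply back) can be sketched in Python. The function names are made up for illustration; only the 4096 page size and the divide/inc/multiply sequence come from the pasted code. One thing the sketch makes visible is that this scheme hands out a whole extra page when the request is already an exact multiple of 4096, so a conventional round-up is shown next to it for comparison.

```python
PAGE_SIZE = 4096  # page size used in the pasted snippet

def pages_as_in_snippet(requested: int) -> int:
    """Mirror of idiv / inc / imul for non-negative sizes."""
    quotient = requested // PAGE_SIZE   # idiv rbx: rdx:rax / rbx, quotient lands in rax
    quotient += 1                       # inc rax: a 0-byte request still gets one page
    return quotient * PAGE_SIZE         # imul rax, 4096

def round_up_to_page(requested: int) -> int:
    """Conventional round-up: exact multiples are left unchanged."""
    return (requested + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE

print(pages_as_in_snippet(50))      # 4096
print(pages_as_in_snippet(4096))    # 8192
print(round_up_to_page(4096))       # 4096
```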
file IO is "next", though I kinda wanna get terminal size first
that would be cool
yea
byte, word, dword, qword
I'm guessing word is equivalent to short?
short is a 2 byte int in higher level langs
that would be called a half in higher level languages
higher level languages also just about never support halfs 
```
.CONSOLE_SCREEN_BUFFER_INFO
.x dw 0
.y dw 0
.cx dw 0
.cy dw 0
.attr dw 0
.left dw 0
.top dw 0
.right dw 0
.bottom dw 0
.mwx dw 0
.mwy dw 0
```
ok that should make windows happy I think
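As a sanity check on the layout above: eleven `dw` fields come to 22 bytes, which matches the Windows `CONSOLE_SCREEN_BUFFER_INFO` structure (two `COORD`s, a `WORD`, a `SMALL_RECT`, and another `COORD`). The Python sketch below unpacks such a buffer and derives the visible terminal size from the window rectangle. The mapping of the short field names to the Windows members is my reading of the snippet, and the sample values are invented.

```python
import struct

# Four signed shorts, one unsigned WORD (attributes), six signed shorts.
CSBI_FORMAT = "<4hH6h"
FIELDS = ("x", "y", "cx", "cy", "attr",
          "left", "top", "right", "bottom", "mwx", "mwy")

def parse_csbi(raw: bytes) -> dict:
    """Unpack a 22-byte buffer into the field names used in the snippet."""
    return dict(zip(FIELDS, struct.unpack(CSBI_FORMAT, raw)))

def visible_size(info: dict) -> tuple:
    """Columns and rows of the visible window; the rectangle edges are inclusive."""
    return (info["right"] - info["left"] + 1,
            info["bottom"] - info["top"] + 1)

# Hypothetical sample: 120x9001 buffer, 120x30 window at the top-left.
sample = struct.pack(CSBI_FORMAT, 120, 9001, 0, 0, 7, 0, 0, 119, 29, 120, 30)

print(struct.calcsize(CSBI_FORMAT))       # 22
print(visible_size(parse_csbi(sample)))   # (120, 30)
```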
now to actually do the syscall
I mean...
I kinda don't know if that's working or not, given I'm running it in IJ and I don't know if that gets reported with IJ
it is not working 
close enough..?
wait no
yes
idk why it's 1 but at least it's not 0
wait no 
1 does not seem good
1 is the handle for the terminal
mov and lea do in fact behave differently on windows

was using rax where I should've been using rcx
... and my notes agree with it being rcx, great... how did I get stuck on rax for so long 
ohh file IO is gonna be a big oof
https://github.com/GiantLuigi4/L3D2/tree/win64/Source
if you wanna keep track of what I'm doing or something
https://github.com/GiantLuigi4/L3D2/blob/ba8454b6e9b9082190231f2f87efd30c05857e81/Source/win_helper.asm#L15-L37
I'm confused why this breaks stuff 
got a file to create
unsure why it got named that
I mean text encoding stuff obviously, but like
I'm not quite sure how to fix that 
it also bricks future runs once the file is created because Windows™️ doesn't seem to have an option that opens a file without replacing it but creates it if it doesn't exist (correction, it does: 4)
file's name is "Eliminate the slender" in Chinese, according to google translate 
ok so I need to convert lpcstr to lpcwstr
... the C++ code for that looks intimidating 
... alternatively, I use CreateFileA, which is intimidating because it has more than 4 parameters
but is a lot simpler regardless (see also: it now opens the tetrapod.l3d and does absolutely nothing with it and then errors on a later piece of code)
common windows api tomfoolery
all they had to do was allow the A funcs to use utf-8 but nah
The A function works as intended for this
they dont use utf8??
also got fps caps and input polling working
CreateFileW expects the string as a LPCWSTR
CreateFileA expects the string as a LPCSTR
CreateFile2 calls CreateFileW
L3d’s strings are what CreateFileA expects, but I was using CreateFileW
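The garbled file name follows directly from that mismatch. A wide-character function reads the string two bytes at a time, so every pair of ASCII bytes turns into a single UTF-16 code unit, and pairs of lowercase ASCII letters happen to land in the CJK range. The Python sketch below reproduces the effect with the `tetrapod.l3d` name mentioned earlier; the exact path string the renderer passes is an assumption.

```python
# The file name as a one-byte-per-character string (what the A function expects).
name_bytes = b"tetrapod.l3d"

# What a wide-character API sees when handed those same bytes:
# each pair of bytes is read as one little-endian UTF-16 code unit.
misread = name_bytes.decode("utf-16-le")

# What the W function should have been given: two bytes per ASCII character.
wide_bytes = "tetrapod.l3d".encode("utf-16-le")

print(len(name_bytes), len(misread))      # 12 6
print([hex(ord(c)) for c in misread])     # all inside the CJK block 0x4e00..0x9fff
print(len(wide_bytes))                    # 24
```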
huh
looks super!
thx
time to try to read a file 
https://github.com/L226n/L3D2/blob/0a25e15ac854faa42caeeef8445be883f96217e3/Source/file.asm#L3-L23
@eternal snow I fail to see how this is meant to work?
dword[rel file.size+4]
this would end up pointing to handler_int's data, no?
ohhh yeah should explain the file format
oh the file format isn't the thing giving me problems
the first 8 bytes of the file indicate the size of the rest of the file and the unpacked file size
it's the fact that file.size+4 seems like it should be pointing to garbage data that's throwing me off
file.size+4 holds the unpacked file size in bytes after reading the first 8
ok so that's not true with windows
```
sub rsp, 32 ; Allocate 32 bytes of shadow space
lea rcx, %1
xor rdx, rdx
call GetFileSize
add rsp, 32 ; Free the 32 bytes of shadow space
```
with windows, there's a function specifically for getting file size
from there, I need to create an allocation
requires an allocation
problem is, _alloc expects an allocation structure, which... is not what comes after file.size, but that's what your code is loading into rax for _alloc
yeah
so alloc is just getting null
_alloc doesnt expect a struct
oh wait
the amount of data in rax is allocated
if rax was 50 and _alloc was called then it would allocate 50 bytes of data and return the address in rax
the +4 is so it loads the unpacked data into rax
rather than the packed data size
because the file format is compressed
yea that's not a problem on windows
GetFileSize returns the actual file size into rax
I mean
because that value is different to the file size
?
result of GetFileSize 1426 in rax
file size is 1426
if I load [rax], I get 1426
if I load [rax+4] I get 0
well yeah thats not what
the file size is written into the first 8 bytes of the file
bytes 0-4 = filesize - 8
bytes 4-8 = size of data to allocate
then the rest of the file is object data
ah
well I kinda don't have the ability to read the file without already knowing the file size though
can u not read the first 8 bytes only on windows
well I need the buffer to read the thing into in order to do that
and to create the buffer, I need the size
ah
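One way around the chicken-and-egg problem is the one suggested above: read only the 8-byte header into a small fixed buffer, then size the real allocation from it. The Python sketch below shows that two-step read against the layout described in this thread (bytes 0-4 hold the file size minus 8, bytes 4-8 hold the size to allocate). Little-endian unsigned fields and all names here are assumptions, not taken from the L3D2 source.

```python
import io
import struct

HEADER = struct.Struct("<II")  # assumed: two little-endian unsigned 32-bit values

def load_packed(stream):
    """Read the 8-byte header first, then size the second read from it."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("file too short for the 8-byte header")
    packed_size, unpacked_size = HEADER.unpack(header)
    if packed_size > unpacked_size:
        raise ValueError("packed data larger than the unpacked allocation")
    buffer = bytearray(unpacked_size)      # allocate the unpacked size
    body = stream.read(packed_size)        # but only read the packed bytes
    buffer[:len(body)] = body
    return packed_size, unpacked_size, buffer

# Hypothetical file: 6 bytes of packed data that unpack into 16 bytes.
body = bytes([1, 2, 3, 4, 5, 6])
fake_file = io.BytesIO(HEADER.pack(len(body), 16) + body)

packed, unpacked, data = load_packed(fake_file)
print(packed, unpacked, len(data))   # 6 16 16
```

With the 1426-byte file mentioned above, the first field would be expected to read 1418 under this layout.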
```
mov eax, dword[rel file.size+4] ;load the unpacked data into rax
call _alloc ;then allocate that amount
; completely skips from here
mov r15, rax
READ_FILE [rel r14], [rel r15], [rel file.size]
lea r15, [r15]
;----------------------------------------
;GO TO END OF FILE DATA AND INSERT TERMINATORS
mov edi, dword[rel file.size+4] ;load unpacked size here into destination
mov esi, dword[rel file.size] ;and packed size here for source
sub edi, 8 ;subtract 8 here, go to last face position
sub esi, 6 ;subtract 6 here (3x word) bc no terminating word
; to here
mov word[r15+rdi+6], 65535 ;insert terminating word into end of data
cmp word[r15+rsi+4], 65535 ;now check if no face data
```

no it doesn't, step just jumps there
why
[out, optional] lpNumberOfBytesRead
A pointer to the variable that receives the number of bytes read when using a synchronous hFile parameter. ReadFile sets this value to zero before doing any work or error checking. Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.
This parameter can be NULL only when the lpOverlapped parameter is not NULL.
Windows 7: This parameter can not be NULL.
For more information, see the Remarks section.
ah
```
push rdx
mov edx, dword[rel alloc_data.pointer]
push rcx
lea_offset rcx, [rel alloc_data.addr]
add rcx, 8
mov dword[rcx+8], eax ;move length into current item
sub rcx, 8
sub dword[rel alloc_data.available], eax
mov rax, qword[rel alloc_data.current] ;this line is being a problem
; mov qword[rcx], rax ;and save to addr items
; add dword[rel alloc_data.pointer], 12
pop rcx
pop rdx
ret
```
I don't know what to do with this (lack) of information
or... apparently it's getting past that and gdb is dumb and stops stepping when it closes the method and doesn't even step to ret like it used to
why
ok well it does get through the entirety of _load_l3d
but now I have no idea where it segfaults 
answer: right as _load_l3d ends
apparently this segfault is coming from exit file
which is all the way at the top of the method
and not happening until the end of the method
great
thanks windows

```
mov qword [rsp+32 ], 4 ; creation_disposition (always open; creates and open if not exist, elsewise open)
mov qword [rsp+32+ 8], 0 ; don't care
mov qword [rsp+32+16], 0 ; don't care
```
it's actually coming from these three lines
```
sub rsp, 24 ; Reserve 24 bytes of stack space
mov qword [rsp+32 ], 4 ; creation_disposition (always open; creates and open if not exist, elsewise open)
mov qword [rsp+32+ 8], 0 ; don't care
mov qword [rsp+32+16], 0 ; don't care
;push 0
;push 0
;push 4
call CreateFileA
add rsp, 24 ; Release those 24 bytes
```
solution
gross but ok
mov r15, rax ;save addr to r15
xor rax, rax ;then sys_read again
mov rdi, r14 ;read from open file
mov rsi, r15 ;read data into the allocated data
mov edx, dword[rel file.size] ;use filesize as length; huh?
sub rdx, 4 ;but expand it to be 2 bytes per pixel
shl rdx, 1 ;because old file used to be 1 per pixel
add rdx, 4 ;but then transparency bytes added
syscall
o
only reason this math has to be done is to maintain some compatibility with older textures
could just redo the textures completely and convert but its not too much an issue
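For anyone puzzling over the `sub` / `shl` / `add` sequence on the read length: it leaves a 4-byte prefix alone and doubles everything after it. A minimal Python sketch of the same arithmetic follows; the function name is invented and what exactly the 4 bytes hold is not stated in the thread.

```python
def texture_read_length(file_size: int) -> int:
    """Mirror of: sub rdx, 4 / shl rdx, 1 / add rdx, 4."""
    return ((file_size - 4) << 1) + 4

print(texture_read_length(4))     # 4   (nothing past the 4-byte prefix)
print(texture_read_length(104))   # 204 (100 bytes after the prefix become 200)
```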
oh interesting
close handle is the last thing I expected to throw an error 
how did this occur
huh
ah wait
gdb is stepping from alloc in ltx to alloc in luv
why is gdb like this
ok file IO is done!
wait
how do you properly end gdb while the program is running if the program can't be stopped 
because that's the scenario I'm in rn
perfection
I am completely unsure of how to debug this 
Ctrl+C then "q" iirc
yeah
I actually don’t know if windows terminal supports enough color coding complexity to support L3D
Even if it does — I have genuinely no idea where to begin to debug this
check pressing f1 to toggle wireframe mode
also check the return values of the file reads to check they are making sense
not been doing much on the proj tbh
working on scene files working again but its slow progress
setting up scenes is the most boring part
mhm
got the basis of lsc loading working
got it more working
can load multiple objects now
next step is making them able to move

moving stuff around and changing sky colour
awesome
object duping
only loads the file once but duplicates them in memory for separate objects
decided to run it in uxterm for shits and giggles and the program rendered one frame then crashed lol
wrote a faster quaternion handling thing
will be doing mainloop stuff soon just getting initialising stuff working
this is what it looks like currently
looks great!
one of the maddening things about avx is how so many xmm instructions dont have a ymm equivalent
theres a horrible lack of 64bit packed int instructions
and its causing some bugs
I’m not used to you not being a moth 
moth?
Was a moth not your previous pfp?
oh no that was a flower from an album cover
Also iPhone’s new stupid predictive typing thing decided to fill in "er" after "moth"
oh yeah
I saw the fact that it was a flower several times and my mind still continued registering it as a moth, lol
oops
kind of fixed this bug with barycentrics being too high and overflowing
still happens sometimes but not as often
solution is to use smaller objects
bc that big plane is just 2 tris
oh
no idea how this never occurred to me but it would simply be a better idea to do barycentric calculations in ndc space rather than screen
avoids high vals
only just thought of this wtf
brain works in mysterious ways
hrm
I guess that's a thing you can do if you're doing software rasterization
why do you have high barycentric coords
the coords themself arent high, but the calculations
tada
no random triangle disappearing
screen space coords are now converted to the range 0-1 to make it work better
actually range 0-2 for better acc
with a simd set thats [1/(sw/2), 1/(sh/2), 0, 0]
can then multiply the screen space coordinates by that to convert them to 0-2 range
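To make the scaling concrete, here is a small Python sketch of the multiply described above, together with the doubled-area "edge" product that barycentric weights are built from. The screen size and triangle are invented (powers of two so the floats come out exact), and the edge function is the textbook form, not necessarily what L3D2 computes internally.

```python
def make_scale(screen_w: int, screen_h: int):
    """The [1/(sw/2), 1/(sh/2)] pair; the two zero lanes are left out here."""
    return (1.0 / (screen_w / 2), 1.0 / (screen_h / 2))

def to_unit_range(point, scale):
    """Screen-space cell coordinates to the 0-2 range."""
    return (point[0] * scale[0], point[1] * scale[1])

def edge(a, b, p):
    """Doubled signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

SCREEN_W, SCREEN_H = 256, 64          # hypothetical terminal size in cells
tri = [(0, 0), (256, 0), (0, 64)]     # a triangle covering half the screen

scale = make_scale(SCREEN_W, SCREEN_H)
tri_unit = [to_unit_range(v, scale) for v in tri]

print(edge(*tri))        # 16384
print(edge(*tri_unit))   # 4.0
```

In the 0-2 range the product can never exceed 4 for an on-screen triangle, whatever the resolution, which is why later multiplications have far more headroom.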
however it turns out there was another problem in that faces would get clipped sometimes because they were falsely detected to be outside screen space
the xmm registers for storing the top left and bottom right data were in the format
xmm0=[min X, min Y, x, x]
xmm1=[max X, max Y, x, x]
where x was junk data
sometimes this junk data would exceed the bounds of the screen and the object was falsely clipped
just solved by changing the result from the comparison
ecx contained the results of the simd comparison after using movmskps
bottom 4 bits determined which statements were true
so the solution to disregard the junk comparisons was
and cl, ~0b00001100
which clears bits 2 and 3
voila
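The same fix can be modelled in Python to see why the junk lanes mattered. `movmskps` packs one bit per lane into the low four bits of a register, so whatever sits in lanes 2 and 3 shows up as bits 2 and 3 unless it is masked off. The lane-wise compare and the sample values below are stand-ins; the actual comparison L3D2 performs is not shown in the thread.

```python
def movmskps(lanes):
    """Pack one bit per lane (lane 0 -> bit 0), as MOVMSKPS does with sign bits."""
    mask = 0
    for i, is_set in enumerate(lanes):
        if is_set:
            mask |= 1 << i
    return mask

def compare_lanes(a, b):
    """Lane-wise a > b, standing in for a packed SIMD compare."""
    return [x > y for x, y in zip(a, b)]

screen_max = [80.0, 24.0, 0.0, 0.0]
box_min = [10.0, 5.0, 123456.0, -7.0]   # lanes 2 and 3 hold leftover junk

raw = movmskps(compare_lanes(box_min, screen_max))
cleaned = raw & ~0b00001100 & 0xFF      # and cl, ~0b00001100

print(bin(raw))      # 0b100  (junk lane 2 claims the box is off screen)
print(bin(cleaned))  # 0b0    (only the X and Y lanes are considered)
```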
smaller font size = higher res
doesnt disappear on higher res either because of the use of a 0-2 range
originally what would happen is that u would get a face which took up a large portion of the screen, so the vector calculated would have a massive value in it
say 100
and then through a series of other multiplications this value would overflow
problem solved
should change the thumbnail for this project thread
gotta make a nice logo
pretty good performance
memory is probably more like 1mb because this isnt counting the memory allocated in the data segment
and ofc bc of old cpu this is entirely avx compatible
nothing uses avx2, avx512, etc
simply bc processor doesnt support it
probably a good thing it means the code is more portable
time to make lunch now
bought some nice sandwich fillers from the shop yesterday so can get a reward for working on this
nvm forgot to buy bread
L3D2 - x86-64 assembly toy software renderer
updated to be a little better
nearly finished commenting everything so its readable again also
noice much updates
this is like one of my favorite community project threads, really amazing stuff
thankies
most recent ver pushed to git
finished comments
gotta get shadows working next
which involves fixing the colour mixing
used to use like
5 divisions and 3 multiplies per pixel
horrid
what's your plan for shadows?
you should put some of your screenshots in your readme https://github.com/L226n/L3D2
lol
I totally would show this github repo to every engineer I know if you put screenshots in there
oky
can add a few things
for shadows will be using the same shadow mapping technique
but will fix the colour mixing to not be shit
the reason is because it converts an ansi colour code from bcd to int to rgb then mixes with ANOTHER bcd which needs to be converted then converts rgb back to ansi back to string
surely there's a better way
the way the ansi colours are organised is in a cube so it should be alright to make an alg that operates exclusively on the bcd strings
THIS is the reason that the code is being rewritten
amongst others
tada added some images
can probably get away with hardcoding shadows to an extent
so that there doesnt have to be a memory access for getting the colour to blend, and can just follow a set structure to darken a pixel
rather than blending it with other colours which would only happen if ur using a translucent texture
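A rough Python sketch of what "following a set structure to darken a pixel" could look like, assuming the colours in question are the 6x6x6 cube of the xterm 256-colour palette (indices 16 to 231, where index = 16 + 36r + 6g + b). Darkening then becomes a step toward the cube's black corner with no RGB round trip. This works on integers and only formats the decimal string at the end, so it is not the string-only algorithm mentioned above, and the function names are invented.

```python
def cube_to_index(r: int, g: int, b: int) -> int:
    """Cube coordinates (each 0..5) to a 256-colour palette index (16..231)."""
    return 16 + 36 * r + 6 * g + b

def index_to_cube(index: int):
    """Inverse of cube_to_index for indices 16..231."""
    offset = index - 16
    return (offset // 36, (offset // 6) % 6, offset % 6)

def darken(index: int, steps: int = 1) -> int:
    """Step every channel toward black without leaving the cube."""
    r, g, b = index_to_cube(index)
    return cube_to_index(max(r - steps, 0), max(g - steps, 0), max(b - steps, 0))

def escape_for(index: int) -> str:
    """Foreground escape sequence for a palette index."""
    return f"\x1b[38;5;{index}m"

print(darken(196))            # 160 (pure red, one step darker)
print(repr(escape_for(160)))  # '\x1b[38;5;160m'
```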
do you plan on just having a single directional light?
just had the sun originally
debating over whether adding point lights would be a good idea
could add support for them
having more than 2 render passes is scary
in my opengl renderer I do 6 passes just for shadow maps for a single point light
one for each cardinal axis direction
agreed
can you use more than one cpu
I don’t think l3d is threaded yet
… hm
On the CPU, would it be faster to rasterize the scene, then raytrace to the point lights from the 3d coordinates of the rasterized texels?
no idea
not a clue how ray tracing works
could have a read into it if you think it could be worth it
Well basically
You send out a bunch of rays from the camera
You do a bunch of ray -> polygon intersections to find where these rays hit
If they hit something that is a light source, then the light source contributes to the light value
You then bounce these rays and go again N times
so like how a mandelbrot renderer works kinda
I know nothing about mandelbrot rendering 
it was a loose comparison
but not really sure like
how expensive this would be in order to get a good shadow
For this it’d be just a single step
Polygon gets rasterized->send out rays towards nearby light sources
if it hits a polygon on the way, the light does not contribute
Elsewise it does
… but thinking about it, it’d… still be several polygon intersection checks per pixel, so probably pretty expensive
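The single-step idea can be sketched in Python as a shadow ray: from the shaded point, cast one ray toward the light and test it against every triangle. The intersection routine is the standard Möller-Trumbore test; the scene, the names and the brute-force loop over all triangles are illustrative assumptions, and the loop is exactly the per-pixel cost being worried about here.

```python
EPS = 1e-9

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_hits_triangle(origin, direction, tri):
    """Moller-Trumbore test. Returns the distance t along the ray, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:
        return None                      # ray is parallel to the triangle
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None        # ignore hits behind or at the origin

def light_reaches(point, light, triangles):
    """True if nothing blocks the segment from point to light."""
    direction = sub(light, point)        # unnormalised, so t == 1.0 is the light
    for tri in triangles:
        t = ray_hits_triangle(point, direction, tri)
        if t is not None and t < 1.0:
            return False                 # an occluder sits between point and light
    return True

point, light = (0.0, 0.0, 0.0), (0.0, 10.0, 0.0)
blocker = ((-1.0, 5.0, -1.0), (1.0, 5.0, -1.0), (0.0, 5.0, 1.0))
off_to_side = ((4.0, 5.0, -1.0), (6.0, 5.0, -1.0), (5.0, 5.0, 1.0))

print(light_reaches(point, light, [blocker]))       # False
print(light_reaches(point, light, [off_to_side]))   # True
```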
well
shadow mapping involves multiple coordinate space changes per pixel so
maybe?
Several coordinate space changes per pixel is probably still faster than potentially hundreds of triangle intersection checks per pixel per light 
It depends on how many triangles you have, if you have like bvhs setup, if you do some funky stuff to avoid some calculations, etc
But even with all that, it’s… probably still a lot, especially for a CPU
yeah
doing shadow mapping just bc it's the fastest approach probably
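For comparison, the core of shadow mapping is one depth pass from the light followed by a single depth comparison per shaded point, which is why its per-pixel cost does not grow with triangle count. The Python sketch below is heavily simplified: the light is assumed to shine straight down the -y axis so the light-space transform is trivial, and occluders are sample points, not rasterised triangles. All names and constants are hypothetical.

```python
LIGHT_HEIGHT = 100.0   # hypothetical height of the light plane
BIAS = 0.05            # small offset so surfaces do not shadow themselves

def to_light_space(point, cell):
    """World point to (texel u, texel v, depth from the light)."""
    x, y, z = point
    return int(x // cell), int(z // cell), LIGHT_HEIGHT - y

def build_shadow_map(occluder_points, width, height, cell):
    """Depth pass from the light: keep the nearest depth per texel."""
    depth = [[float("inf")] * width for _ in range(height)]
    for p in occluder_points:
        u, v, d = to_light_space(p, cell)
        if 0 <= u < width and 0 <= v < height:
            depth[v][u] = min(depth[v][u], d)
    return depth

def in_shadow(point, depth, cell):
    """Second pass: shadowed if something nearer to the light covers this texel."""
    u, v, d = to_light_space(point, cell)
    if not (0 <= v < len(depth) and 0 <= u < len(depth[0])):
        return False                     # outside the map: treat as lit
    return d > depth[v][u] + BIAS

occluders = [(2.5, 10.0, 2.5)]
shadow_map = build_shadow_map(occluders, 8, 8, 1.0)

print(in_shadow((2.5, 0.0, 2.5), shadow_map, 1.0))   # True
print(in_shadow((5.5, 0.0, 5.5), shadow_map, 1.0))   # False
```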
shadows are expensive...
yeah gpus have hardware specialised for this yes?
😞
Nvidia has a software rt implementation as well
Yeah
RT cores
that's very specific woah
may get into gpu stuff eventually, maybe learn ogl or something
gotta master the cpu first tho
Some GPUs do things in software
Some GPUs do the same things in hardware
Other GPUs might just not support it at all
in software as in... hard coded software?
Software as in GPU kernels/shaders and/or driver features
that's just builtin?
oh right
thought u meant like a massive hard coded macro
that would be cursed
I actually don’t know much about the drivers/gpu itself, a lot of what I say about those is mostly memory from what others say and just my own experience
So of course, I may be wrong on stuff 
My laptop has these weird "Dozen" drivers for vulkan, which report that they’re shipped by microsoft
I can use any of my GPUs with it, but all of the GPUs (including the CPU iirc) have the same feature set if I use the dozen drivers
And this feature set is not great; doesn’t even support NEAREST neighbor interpolation 
even basic features aren’t necessarily safe to assume will be supported
(though for anything worth supporting, they most likely will be)
no nearest neighbour
what???
Yeah
It supported LINEAR but not NEAREST 
wow
So having taken the majority of a course in nasm
yeah I really don’t understand most of the errors in this project for compiling to windows
they should not be problems
Also, just looked it up
Windows does not have anywhere near enough colors in terminal
though there is the option of making a custom console program to emulate vim’s terminal a bit closer
oh lol
doomed project from the start
been realllly meaning to get back on this but just cant :((
no motivation
not wanting to write assembly is entirely rational
yup
thats not the problem
got no motivation to write anything in any language
assembly is fun! and awesome! but just cant be bothered to write code rn
I don't write code because it's fun, I write it because that's what I was put on this earth to do
apparently
as I can't seem to do anything else
I've tried
point is, time to get back to work!
jk, your project is really cool, maybe it's done and you need something new
it is far from done, would love to have it finished and can think of so many cool things to make with it once that time comes
the problem is partially for writing nice clean code
which can be annoying
its the weekend, and now really bored, might start redoing the editor