#programming
at least teachers here can't use AI detectors for marking since they do false positives
here they can
and they also use ai to make their test questions now
ive seen them do it, they told me they do it
a person i know went to uni before chatgpt but dropped out and then enrolled again and said its now a completely different experience with much less material comprehension 
using LLMs to help make questions seems fine to me
AI detectors though 
I'm gonna lose my marbles
the mystery of the broken ALU is fixed
time to re-run the right shift code
it worked
so all the problems ive been experiencing
were because of this stupid bad trace
im amazed it didnt cause more problems tbh
given its position tho
you would have to do an operation that enables the swap flag, and have differing values in bits 10 and 11 of the D register
finally
to even have a chance of noticing it
well thats exciting
now i have right shift
i guess i implement that division code now
good luck 
is there a reason for returning 32767 in particular if the divisor is 0?
he's just returning the maximum value of a signed short
as an error signal
nah you better than me 
i'm just a computer engineering student trying my best to help, while you guys are just vibin'
theres an annoying quirk with this architecture
for loading literals from code, data instructions are used
these are indicated by the highest bit being 0
which means you only get to load 15 bits

if i ever want 0xFFFF (max unsigned short) as a literal, i have to left shift 7FFF and add 1
agony
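A minimal sketch of the literal-loading workaround described above, assuming 16-bit registers that wrap on overflow and a left shift implemented as `x + x`. The op names (`load`, `shl1`, `inc`) are hypothetical and only stand in for whatever the real instruction set provides.

```python
WORD_MASK = 0xFFFF      # assumption: 16-bit registers that wrap on overflow
MAX_IMMEDIATE = 0x7FFF  # a data instruction carries only 15 payload bits


def literal_ops(value):
    """Return hypothetical pseudo-ops that leave `value` in the accumulator."""
    if not 0 <= value <= WORD_MASK:
        raise ValueError("value does not fit in 16 bits")
    if value <= MAX_IMMEDIATE:
        return [("load", value)]
    ops = [("load", value >> 1), ("shl1",)]  # load the top 15 bits, then shift left
    if value & 1:
        ops.append(("inc",))                 # restore the low bit
    return ops


def run(ops):
    """Tiny interpreter for the pseudo-ops above, used to check the result."""
    acc = 0
    for op in ops:
        if op[0] == "load":
            acc = op[1]
        elif op[0] == "shl1":
            acc = (acc + acc) & WORD_MASK    # left shift by one as x + x
        elif op[0] == "inc":
            acc = (acc + 1) & WORD_MASK
    return acc
```

For 0xFFFF this yields exactly the sequence from the chat: load 7FFF, shift left, add 1.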
chat chat chat, look
my brother in christ
tbh for some form of rudimentary error handling
:3
wtf is that?
ive been considering just making labels to jump to
wait what the hell
:3
bro is collecting human body parts to make robots
calm down its the animatronics they used for harry potter
so when is five night at neuros opening?
but i am making somewhat of a similar thing though
i was just gonna halt if a divide-by-zero happens
with a jump to a label that looks something like
@ Exception_DivideByZero
goto Exception_DivideByZero;
not much of a "halt" but given i can follow the code while it runs, it's enough for me to see

Im gonna commit triple fault
wait what are you cooking with this condition
3% in 14 hours.
i dont trust that 150 hours left for a second.
at this rate it will take 460 hours
hi t 
surely the last condition is just t < 0
hi chayleaf
How big was the model again?
My PC has a GPU with more compute than yours and it still took 3 days on just 150M on a fairly small dataset
"model_dim": 1536
"depth": 32
"heads": 16,
I don't know how to convert that to params
me neither
@sage crag
# math_div(a, b) returns the quotient a/b.
function math_div 2;
push_arg 1;
pop_d;
if_else_d JEQ math_div_bad_divisor math_div_good_divisor;
@ math_div_bad_divisor
goto Exception_DivideByZero;
@ math_div_good_divisor
push_arg 0;
push_arg 0;
pop_d;
if_else_d JLT math_div_neg_a math_div_pos_a;
@ math_div_neg_a
neg;
@ math_div_pos_a
push_arg 1;
push_arg 1;
pop_d;
if_else_d JLT math_div_neg_b math_div_pos_b;
@ math_div_neg_b
neg;
@ math_div_pos_b
pop_d;
push_d;
push_d;
push_value 1;
call math_div_recursive;
push_retval;
push_arg 0;
push_arg 1;
pop_d;
if_else_d JLT math_div_sign_flip math_div_sign_flip_end;
@ math_div_sign_flip
neg;
@ math_div_sign_flip_end
pop_d;
if_else_d JLT math_div_q_flip math_div_q_flip_end;
@ math_div_q_flip
neg;
@ math_div_q_flip_end
return;
# internal backing call for math_div(a, b)
# math_div_recursive(ua, ub, t, d)
function math_div_recursive 4;
push_arg 0;
push_arg 2;
push_arg 2;
add;
sub;
pop_d;
if_else_d JGT math_div_recursive_test math_div_recursive_pos;
@ math_div_recursive_test
push_arg 0;
push_arg 2;
sub;
pop_d;
if_else_d JGT math_div_recursive_break math_div_recursive_neg;
@ math_div_recursive_break
push_value 0;
return;
@ math_div_recursive_neg
push_arg 3;
push_arg 0;
sub;
push_arg 1;
push_arg 1;
push_value 1;
call math_div_recursive;
push_retval;
push_arg 3;
add;
return;
@ math_div_recursive_pos
push_arg 0;
push_arg 1;
push_arg 2;
push_arg 2;
add;
push_arg 3;
push_arg 3;
add;
call math_div_recursive;
push_retval;
return;
(sorry for wall of text)
np
ported your code tho
wonder if it actually works 
well i hit start
its doing something for sure
not sure what
failure 
i probably made an oops somewhere
the hidden bitcoin miner:
should be something like ~1-2B I think depending on the other hyperparameters like vocab size, embedding size, head size, and ffn size
"vocab_size": 50257
idk embed size and head size
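A rough way to turn those config values into a parameter count, as a sketch only. It assumes a standard GPT-style block (four attention projections plus a feed-forward layer with a 4x hidden size), tied input/output embeddings, and it ignores biases, layer norms and positional embeddings. The function name and the `ffn_mult` and `tied_embeddings` parameters are made up for illustration; the head count does not change the total under these assumptions.

```python
def estimate_gpt_params(model_dim, depth, vocab_size, ffn_mult=4, tied_embeddings=True):
    """Back-of-the-envelope parameter count for a GPT-style decoder."""
    attn = 4 * model_dim * model_dim            # Q, K, V and output projections
    ffn = 2 * ffn_mult * model_dim * model_dim  # up and down projections
    embed = vocab_size * model_dim              # token embedding table
    total = depth * (attn + ffn) + embed
    if not tied_embeddings:
        total += embed                          # separate output projection
    return total


# Values quoted in the chat: model_dim 1536, depth 32, vocab_size 50257
params = estimate_gpt_params(1536, 32, 50257)
print(f"{params / 1e9:.2f}B parameters")
```

Under these assumptions the quoted config lands at roughly 1B parameters, at the low end of the 1-2B guess.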
That's gonna be a lot of compute depending on how big the dataset is and how many epochs it's doing
10 epochs
Have fun with your way overfitted model with too much compute spent on it
With LLM pretraining usually you want to do at most 1 epoch I believe
never repeat any data, it only causes overfitting
usually there's way more data available than you have compute to actually use
@prime ridge
math_div(a, b) {
if args[1] == 0 {
throw;
}
a = args[0];
if args[0] < 0 {
a = -a;
}
b = args[1];
if args[1] < 0 {
b = -b;
}
c = b;
d = 1;
ret = math_div_recursive(a, b, c, d);
a = args[0];
if args[1] < 0 {
a = -a;
}
if a < 0 {
ret = -ret;
}
return ret;
}
math_div_recursive(a, b, c, d) {
if sub(args[0], add(args[2], args[2])) > 0 {
if args[0] - args[2] > 0 {
return 0;
}
return math_div_recursive(args[3] - args[0], args[1], args[1], a);
}
return math_div_recursive(args[0], args[1], args[2] + args[2], args[3] + args[3]);
}
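For comparison, here is a small Python sketch of the same idea (division by repeated doubling and subtraction, using only add, sub and comparisons). It is not a line-by-line port of the assembly above: the names `udiv` and `div` are made up, the result truncates toward zero, and 16-bit overflow of `t + t` is not modelled.

```python
def udiv(a, b):
    """Quotient of non-negative a by positive b, using only add/sub/compare."""
    if a < b:
        return 0
    t, d = b, 1                 # t is always b * d
    while t + t <= a:           # double until the next doubling would overshoot
        t, d = t + t, d + d
    return d + udiv(a - t, b)   # d copies of b are accounted for, recurse on the rest


def div(a, b):
    """Signed division truncating toward zero."""
    if b == 0:
        raise ZeroDivisionError("divide by zero")
    ua = -a if a < 0 else a
    ub = -b if b < 0 else b
    q = udiv(ua, ub)
    return -q if (a < 0) != (b < 0) else q
```

The sign handling mirrors the chat version: divide the absolute values, then negate the quotient when exactly one operand was negative.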
Whoever is telling you to train that model clearly doesn't quite know how training a model should be done
I am not so sure why python libraries are trying to escape the pythonic way
ported back to pseudocode (my translation might have errors though)
Totally not frustrated by flax 
chayleaf
I can't even use print() easily 😭
you should try Equinox 

a JIT is necessary though, both because Python is kinda slow and because that way Tensor operations can be fused and launched more efficiently
I think there is this fundamental tradeoff between easily debuggable code and performant code
I don't think that has to exist, jax JIT-compiled code and normal eagerly evaluated code look pretty much the same (other than the static shape restriction, but that doesn't affect debugging, it just causes pain getting stuff to work at all)
current ML frameworks are just too shit

looks like it got 13 as an answer
Or maybe it's just python
to 15/3
Python might be a bad language for ml
it gets 15/15 right
its ok, i believe in your code konii 
i probably just ported it wrong
why is it wrong tho 
it looks right
found another bug
its working
its actually very impressive
thank you for the help
do you wanna help me devise square root 
i was thinking about it earlier and i feel like theres a shortcut somewhere
because
log2 is easy* with binary
-# *precision not guaranteed
so i think you can just
log2(x^(1/2))
= (1/2)log2(x)
= shr(msb(x))
so like
i think if you just do 2^that
you get something resembling a square root?
but im not sure on the accuracy
log2 might be too imprecise
in fact log2 is only one of 16 possible values isnt it
bin search time 
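A sketch of both ideas from this exchange, assuming unsigned 16-bit inputs: the power-of-two estimate `2^(msb(x) >> 1)` and a binary search that pins down the exact integer root. The function names are illustrative; `bit_length` stands in for whatever msb routine the target machine has.

```python
def msb(x):
    """Index of the highest set bit, i.e. floor(log2(x)), for x > 0."""
    return x.bit_length() - 1


def sqrt_estimate(x):
    """Power-of-two estimate: 2 ** (msb(x) >> 1). Never above the true root."""
    return 1 << (msb(x) >> 1)


def isqrt_bsearch(x):
    """Integer square root by binary search, assuming 0 <= x <= 0xFFFF."""
    lo, hi = 0, 256             # invariant: lo*lo <= x < hi*hi (256*256 > 0xFFFF)
    while hi - lo > 1:
        mid = (lo + hi) >> 1
        if mid * mid <= x:
            lo = mid
        else:
            hi = mid
    return lo
```

The estimate is only good to within a factor of two (for 25 it gives 4, for 100 it gives 8), which matches the worry about log2 having only 16 possible values. It could still serve as a starting bound for the search, since the true root always lies between the estimate and twice the estimate.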

ALU supports add, sub, inc, dec, and, or, xor, not
just added code for n-bit shifts to left/right
left, by far
do you want to review it 
if not then it's the only algorithm, which makes it optimal

anyone else using this chat
# math_shl(x) returns x left-shifted by one position.
function math_shl 1;
push_arg 0;
push_arg 0;
add;
return;
# math_shln(x, n) returns x left-shifted by n positions.
function math_shln 2;
push_arg 1;
pop_d;
if_else_d JEQ math_shln_done math_shln_go;
@ math_shln_done
push_arg 0;
return;
@ math_shln_go
push_arg 0;
push_value 1;
push_arg 1;
sub;
call math_shln;
push_retval;
push_retval;
add;
return;
# math_shr(x) returns x right-shifted by one position.
function math_shr 1;
push_arg 0;
push_value 1;
push_value 2;
call math_shr_recursive;
push_retval;
return;
# math_shrn(x, n) returns x right-shifted by n positions.
function math_shrn 2;
push_arg 0;
push_value 1;
push_value 1;
push_arg 1;
call math_shln;
push_retval;
call math_shr_recursive;
push_retval;
return;
# internal backing call for math_shr(x)
# math_shr_recursive(v, mask_out, mask_in) performs a right shift by copying bits using masks.
function math_shr_recursive 3;
push_arg 2;
pop_d;
if_else_d JGE math_shr_recursive_if math_shr_recursive_else;
@ math_shr_recursive_if
push_arg 0;
push_arg 1;
push_arg 1;
add;
push_arg 2;
push_arg 2;
add;
call math_shr_recursive;
push_retval;
goto math_shr_recursive_end;
@ math_shr_recursive_else
push_value 0;
goto math_shr_recursive_end;
@ math_shr_recursive_end
push_arg 0;
push_arg 2;
and;
pop_d;
if_else_d JNE math_shr_recursive_shift math_shr_recursive_ret;
@ math_shr_recursive_shift
push_arg 1;
or;
@ math_shr_recursive_ret
return;
yeah thats what i have 

it cant be that easy
surely
ok let me port this
what's the i stand for
i see
like, loading a literal into memory somewhere?
literals can be put into A register in one instruction
it takes 7 instructions to push a literal onto the stack
the stack is all software too unfortunately
still porting

wtf it works
your code was so nice and concise
i feel like my port doesnt do it justice
# math_sqrt(x) returns the square root of x.
function math_sqrt 1;
push_arg 0;
push_value 1;
push_value 0;
call math_sqrt_recursive;
push_retval;
return;
# internal backing call for math_sqrt(x)
# math_sqrt_recursive(n, odd, count)
function math_sqrt_recursive 3;
push_arg 0;
push_arg 1;
sub;
pop_d;
if_else_d JGT math_sqrt_recursive_break math_sqrt_recursive_go;
@ math_sqrt_recursive_break
push_arg 2;
return;
@ math_sqrt_recursive_go
push_arg 1;
push_arg 0;
sub;
push_arg 1;
pop_d;
LD INC_D;
LD INC_D;
push_d;
push_arg 2;
pop_d;
LD INC_D;
push_d;
call math_sqrt_recursive;
push_retval;
return;
even looking at this, i cant figure out how its working
oh
i see it now
clever
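The trick appears to be that the sum of the first k odd numbers is k squared, so counting how many successive odd numbers can be subtracted from n gives the integer square root. A minimal Python sketch of that reading of `math_sqrt_recursive(n, odd, count)`, written as a loop rather than recursion:

```python
def isqrt_odd(n):
    """Integer square root by subtracting 1, 3, 5, ... until n runs out."""
    odd, count = 1, 0
    while n >= odd:
        n -= odd        # remove the next odd number
        odd += 2        # advance to the following odd number
        count += 1      # one more odd number fitted, so the root grows by one
    return count
```

It needs only add, sub and a comparison, at the cost of one iteration per unit of the result (at most 255 iterations for a 16-bit unsigned input).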
# math_shr(x) returns x right-shifted by one position.
function math_shr 1;
push_arg 0;
push_value 1;
push_value 2;
call math_shr_recursive;
push_retval;
return;
# math_shrn(x, n) returns x right-shifted by n positions.
function math_shrn 2;
push_arg 0;
push_value 1;
push_value 1;
push_arg 1;
call math_shln;
push_retval;
call math_shr_recursive;
push_retval;
return;
# internal backing call for math_shr(x)
# math_shr_recursive(v, mask_out, mask_in) performs a right shift by copying bits using masks.
function math_shr_recursive 3;
push_arg 2;
pop_d;
if_else_d JGE math_shr_recursive_if math_shr_recursive_else;
@ math_shr_recursive_if
push_arg 0;
push_arg 1;
push_arg 1;
add;
push_arg 2;
push_arg 2;
add;
call math_shr_recursive;
push_retval;
goto math_shr_recursive_end;
@ math_shr_recursive_else
push_value 0;
goto math_shr_recursive_end;
@ math_shr_recursive_end
push_arg 0;
push_arg 2;
and;
pop_d;
if_else_d JNE math_shr_recursive_shift math_shr_recursive_ret;
@ math_shr_recursive_shift
push_arg 1;
or;
@ math_shr_recursive_ret
return;
it just goes thru the high bits of the input one bit at a time
and copies it to the output
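A loop-based sketch of that bit-copying idea in Python, assuming a 16-bit word and a logical (zero-filling) shift. It is an iterative reinterpretation, not a port: the chat version recurses and stops when the input mask reaches the sign bit, while this one simply counts bits. The masks are advanced with `x + x` because the ALU has no shift instruction.

```python
WORD_BITS = 16  # assumption: 16-bit machine word


def shr_n(value, n):
    """Logical right shift by n using only and, or and add."""
    result = 0
    mask_out = 1          # bit position being written in the result
    mask_in = 1 << n      # bit position being read; on the real machine this
                          # would come from the left-shift routine
    for _ in range(WORD_BITS - n):
        if value & mask_in:
            result |= mask_out
        mask_out += mask_out   # move both masks up by one bit
        mask_in += mask_in
    return result
```

Each output bit costs a test and a conditional OR, which is why the software version is so much heavier than the handful of wires a hardware shifter needs.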
crazy how bad shr is via software when its literally just a few wires in hardware
division is definitely the largest function here
its worse than sqrt somehow
im amazed any of these work honestly
i was kinda expecting to be working on math functions for ages
so i didnt plan what i wanted to do with them
doom 

That training doesn't have enough data for a model that size
surely doom fits in my 16 bits of addressable memory

i was just testing on my side
looks like batch size 16 is fine
ran for 15 hours without going over 24GB vram
chat how do you feel
how do YOU feel
i stopped it at step 59125
btw, what about the epochs? they said 10 is too much and it should be 1 or less
should be 1
brain blast
don't train it rn
how do i feel about what?
ur wasting gpu time
type shit fr
always nice to see you :>
that's the old code
The data is still zipping. It's so slow
it's been zipping for almost like 5 days
it's zipping like 5 files a second
i might fall in this rabbit hole i should NOT get used to
Concatenate the data files and zip it in like an hour instead
You'll save way more time than you'd lose by stopping it now
If sam was on linux i'd just do a tar
woulda taken like a single day max
but .zip is so horrible
Then use 7z or something?
Also the 7zip tool can basically handle almost all packed formats
mf chief() {
mf ts be cap rn
ts fr vibin fr yikes rn
sussin (ts finna cap) {
ts deadass ongod rn
}
yeet rn
}
Just throw it into a tar if you want, Sam can probably easily unpack it
alr bet
But also just don't make your dataset like 30 billion small files
Concatenate them
7 zip can extract tar no?
bit late for that lol
It can extract basically anything
ideally you'd implement it like this
function sqrt 1;
push_value 0
push_value 1
push_arg 0
push_value 1
sub
pop_d
if jge @loop @exit
@loop
push_value 2
add
; ??? somehow increment second last value on stack by 1
???????
push_d
; ?????? i need to push the second last value on stack to stack again how do i do it
??????
jmp @loop
@exit
pop_d
return
but i thought i was the best shr 
Why would it be? You can just use a Python script to read all the contents into a single file, no?
I already have like 100 million files in a single directory

100 million files is def gonna take a couple hours to extract, maybe even a full day
could you remind me how much storage i need?
new sqrt dropped
I have made this mistake before and may or may not have output a single 500MB json file
I mean I cancelled it
120gb
most of that is file metadata
oh, thats way less than i was expecting
its old
like 80% of it is metadata lmfao
; ?????? i need to push the second last value on stack to stack again how do i do it
can a function return multiple values on stack or would that mess up the return code
the way it's written now, only single returns
tbf 100m files shouldn't be that bad to process into smaller files
makes sense you use recursion everywhere its the only way to have more than a couple variables
I mean the number 100 million and python should rarely be in the same sentence
stack frames? you mean variables?
but still should be doable
not even python is the issue
the IO is insanely slow
it's on an archive external drive
15 mb/s writes
Easy:
import java.io.File

// Concatenate the text of every file in a directory into out.txt
File("out.txt").bufferedWriter().use { writer ->
    for (file in File("path/to/files").listFiles()!!) {
        writer.write(file.readText())
    }
}

you wont need it if you implement arbitrary stack offsets
or multiple return at least
is there a quicker way to do squaring than multiplication
is this like trying hard to max out i/o speeds
The kernel is having a heart attack and the drive is on its deathbed
i think malloc is probably in my future somewhere
then you can return a pointer to a struct

malloc is useless
i mean
This is all the code required to read the contents of all files in a directory and write them all into a new file
im more just looking for the notion of abstracting away ram management
then my code can just say "yeah i need this much memory for my crap"
its called stack
Yeah but how would the code delimit new files?
the stack
it's not text or anything it's a .npz file of integers
chayleaf please i want malloc
hm
You can just add a prefix/postfix to the write command
honestly I mildly want to suggest sqlite but I don't remember if that can efficiently handle binary data
@sage crag write malloc
i wrote a garbage collector recently 
ill just call part of ram the heap
also can't be read as text since it's binary
its fine
Then read binary and add delimiting bytes where you need them
other than the stack, heap, and a tiny address range for my own stuff, nobody else should be touching ram
so it should be fine
otherwise diy a binary archive format that's like
```
uint(total length),uint(metadata length),bytearray<metadata>,bytearray(data)
```

nah but like the code needs to load these files independently anyways
Why? It seems like your approach completely and utterly sucks
i cant do garbage collection
can't just load 120gb into vram 😭
??
it'll work but it's slow asf
malloc is just a worse version of garbage collection
Most of that is probably filesystem overhead instead of actual data if most files are under 4KB
atp shipping the drive internationally seems like the move
apollo can I just double check my understanding of the situation rq
it's 16 billion tokens
you have:
- 100m tiny files
- stored on a slow drive
you want:
- some easy way to transfer all of these files to sam
its like GC except you dont know what the user code will do so you have to do all sorts of tricks to properly allocate and deallocate memory
And is each token a single Int32?
yeah
so a lotttt of overhead
I think it's actually more like 200 or 300 on the drive
it takes a long time to even check
just have two stacks, one for functions and one for values, that should solve any problems with not being able to use more than one variable, you wont even need an args pointer anymore
if you want me to try to throw together a terrible rust program with far too much multithreading I'd be happy to give it a shot if you can give me an output format
I cancelled the check at like 120 gb
Then you'd have an exceptionally easy time concatenating them all into a single big binary file while keeping separation, just do
```
collection of chunk size 4-byte Int32s
repeat for every file
```
yeah
yeah then unless you just give sam a disk image of the drive you're gonna have to cope with it taking a while to do loads of small reads
is it a flash drive or spinning metal
loading 100m files will still take a while but I guess I gotta just bite the bullet atp
on my side everything should be fine, nvme ssd and stuff.
even if there is not enough space there i also have 2 empty sata ssd's
it's an external usb drive
nah ur golden dw
it isnt even that much data tbh
im just a bit stupid
mostly lazy actually
Making your dataset 100 million small binary files rather than one big raw text file and including the tokenizer in your trainer code seems stupid
two stacks
I mean it was a lot easier to implement at the time
everyone does stupid things, you have to learn somehow lol
Just send the raw data and tokenize it at the target
you can have one grow from the top of the addressable memory and one from the bottom
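A small sketch of that layout, assuming a flat block of memory shared by two stacks that grow toward each other. The class and method names are hypothetical; the point is only that a single comparison of the two pointers detects a collision.

```python
class DualStack:
    """Two stacks sharing one memory block, growing from opposite ends."""

    def __init__(self, size):
        self.mem = [0] * size
        self.low = 0        # next free slot of the upward-growing stack
        self.high = size    # lowest used slot of the downward-growing stack

    def push_low(self, value):
        if self.low >= self.high:
            raise MemoryError("stacks collided")
        self.mem[self.low] = value
        self.low += 1

    def pop_low(self):
        if self.low == 0:
            raise IndexError("low stack is empty")
        self.low -= 1
        return self.mem[self.low]

    def push_high(self, value):
        if self.low >= self.high:
            raise MemoryError("stacks collided")
        self.high -= 1
        self.mem[self.high] = value

    def pop_high(self):
        if self.high == len(self.mem):
            raise IndexError("high stack is empty")
        value = self.mem[self.high]
        self.high += 1
        return value
```

Neither stack needs a fixed size up front; whichever one needs more room can take it until the two pointers meet.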
I will I will
heap
heap
heap
heap 
this offer still stands
I know the int32 values are 0-50000 inclusive



oh damn you could use u16s then
so ig I just use 4294967295 (0xFFFFFFFF) as the delim?
pov: stacks fighting over 0x0FFF
half the filesize 
my wifi is 
I could indeed
Yeah, that'd work if you know the maximum allocated range of the actual data
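One way the suggestions in this thread could fit together, as a sketch rather than anyone's actual pipeline: store each former file as a length prefix followed by its tokens, so no delimiter value is needed at all, and store the tokens as unsigned 16-bit integers since they are said to stay within 0-50000. The function names and the exact layout (little-endian uint32 count, then uint16 tokens) are assumptions.

```python
import struct
import sys
from array import array


def pack_chunks(chunks, out):
    """Write each chunk as a uint32 token count followed by uint16 tokens."""
    for tokens in chunks:
        data = array("H", tokens)        # "H" = unsigned 16-bit; rejects values > 65535
        if sys.byteorder == "big":
            data.byteswap()              # keep the file little-endian everywhere
        out.write(struct.pack("<I", len(data)))
        out.write(data.tobytes())


def unpack_chunks(inp):
    """Yield the token lists back, one per original chunk."""
    while True:
        header = inp.read(4)
        if not header:
            return
        (count,) = struct.unpack("<I", header)
        data = array("H")
        data.frombytes(inp.read(2 * count))
        if sys.byteorder == "big":
            data.byteswap()
        yield data.tolist()
```

Usage would be along the lines of opening one output file in binary mode and calling `pack_chunks` over the token arrays, which turns millions of tiny reads on the receiving side into one sequential read.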
Can sam use gzip?
Bwaa
probably
probably
sam
7zip handles most stuff
Yeah, it's a great software
bwaa
if you have two stacks you can implement forth 
https://www.forth.com/wp-content/uploads/2018/11/thinking-forth-color.pdf
oh god
rip 87.5% of the dataset 
instant headache upon seeing the typeface
no no no it's fine. It's still 16 billion tokens
that's enough to train comfortably
ah ok
the slowest part of this was saving the json somehow, not actually opening the files
just a larger ctx
what the hell
im not reading all that
oh hellll no
malloc()
that's not even bad
one of my json files was 50gb
smh
90% was wasted metadata
how does a json file get that big 
it was from the dataset so not even my fault
Io time
as long as it doesnt have to train for more than a month im good.
im going on vacation again on the 28th and i think it would be nice if it finished by then
surely a memory allocator is like a couple hundred lines of assembly code 
people using json because its easy to use
realistically a max of 2 weeks

without thinking about the problems it brings
Welp, gotta recalculate the batch size again
and not a couple tens of thousands of C code 
if it takes longer just save the checkpoint and i'll finish it
the rom and ram are separate 16-bit addressable spaces, so no worries there
ok maybe minor concern
I had to anyway cuz I implemented latent attention and flash attention
oh yeah btw @olive sable u need to compile flash attention v2
okay
people should learn to use compressed data formats
steal memory from the host PC and attach it to the virtual one 
I'll find how to do that after I start fixing the ds
nah just build faster pcs with more memory
that solves all issues 
anyway forth is literally the simplest high level language and it doesnt require an allocator (trust me allocators are hard)
This whole LLM pretraining venture seems quite pointless
There's like 30 thousand LLMs you can just prompt to get whatever you want with much higher quality than any home-made LLM can, LLMs take way longer to train than even NeuroSynth, and LLMs are honestly way less interesting than NeuroSynth
Not pointless
you dont even have an allocator
just an arena
If your machine has less than 96GB of RAM and lots of CPU cores, ninja might run too many parallel compilation jobs that could exhaust the amount of RAM. To limit the number of parallel compilation jobs, you can set the environment variable MAX_JOBS:
hey thats me, i have 16 cores and less than 96GB of ram
Well, what's the point then?
Nobody wants a cringy LoRA tuned llm
you dont need an allocator, just write to random memory addresses and assume that no one else is using them at the time 
it looks really weird
What's wrong with those?

just set the workers to a normal amount
Extremely inefficient
surely you're not going to use more than 6GB of ram per job
requires instruction tuning and more parameters
You aren't going to see a 0.5b model outperform larger models without pretraining
Whar
It's wayyyy more efficient than home-pretraining a tiny model
tbh i have no clue how to compile flash attention V2.
cant i just pip install?
not per parameter
Yes
pip will compile it for you
lemme find the package because I think it's specific
Yeah, of course not
But you can just grab an off-the-shelf pretrained model and finetune that
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install .
have fun running that locally 🤡
if you want a good local model. It has to be custom
Llama 1B
Llama 8B
Llama 13B
A home-made model is gonna suck even more
Nope

@prime ridge there is a repo to convert any GQA to MLA
Use qwen 3B ish model and finetune it after converting it to MLA

Classic
u need pytorch
We hate torch dependency
i thought i had pytorch
maybe in another py-env
yeah
And how would you know that your tiny model trained on amateur collected data can beat a model made by a big company with loads of data scientists cleaning up the dataset?
sooooo many parameters are completely wasted
it doesn't need to know how to solve differential equations to chat like a human
What
literally burning compute
cuda 11.8 for 3090?
but I want my ai girlfriend to do my homework whilst rping !!!
CUDA 12 is recommended
CUDA 12.1 is my go-to usually
12.6 or 12.8?
yeah should be cuda 12
Yeah ofc it will lose some but it is better cram
Doesn't matter, it's backwards compatible anyway I think
Much efficient than starting from scratch imo
Though check what CUDA Torch is available for
12.6 is stable
these are the options
Duh. But once it's trained it will have exceptionally low memory usage while having the performance of models orders of magnitude better
Okay lol good luck then
Don't select linux
"while having the performance of models orders of magnitude better"
Tbh neither do I know the architecture, so can't really confirm "it would be magnitude better" in both perf and compute/mem
Idk how being trained on wikipedia is supposed to improve the accuracy of a model meant to sound human
that's just not true
garbage in garbage out
i dont mind training it anyways, lets just do it
exactly
Sure, it may require less compute and memory, but it'll overall not be better if the data is not significantly well curated
in your dreams oof
Can you learn english with discord messages, thats my question here
If just I had a 3090 like you so I could just train NeuroSynth models whenever I feel like without it taking 30 hours
I think wiki is just for priming basic knowledge, being coherent
Here's my prediction:
the model will output English words, but without any sense behind them, and be unable to properly keep a conversation
me :3
Us core
-# If I'm right, that'd be very silly
who cares. It'd be hilarious anyways
plus I doubt it
a lot of the data comes from "smart people" servers too
Is your data in any way grammatically consistent?
prolly 50%
somewhat
it isnt an english major. It will sound natural
Nahh it's done for
50% 
it's like 70% brainrot and 30% coherent
By the way, if you're training off of Discord messages, you're breaking the Discord TOS and you should consider not doing that so you don't have the same happen to you as happened to Shapes
worst case
All I see are numbers 0-50,000
doesn't seem like discord to me
What matters is what the numbers are derived from
If it's numbers generated by putting Discord messages through math formulas, you're in trouble
no ofc not
but bad apple could be possible...
apollo you just have a very lucky random number generator right
Surely
-# Well, have fun with the Discord lawyers
Return back to slack 
just rng bud
Ah yes, it is but it isnt the same
pseudo rngs are very useful
I hate manually managed rngs
Bad apple is a gift from the gods im convinced
poor thing obliterates my cpu
Jax you again
jax.random.split 
Love when Shuni randomly spawns as i mention jax/flax
Who would be spawning if i mention tf
DAMN it's fast
it is
my brother in christ, pytorch ***is*** installed
i heard bogosort
worst case it's not required but makes it like 3-4x faster
yall i need some advice
are there any other important math functions im missing
check pins
pow might be nice
any bright ideas on how to do pow efficiently
log_n 
surely just
is adding arduino libraries supposed to take 5min and not be done
my build times can be up to 10 min for arduino sometimes
my beloved
depends on the code
I love how it slowly but surely drops from 9bn/s to 8bn/s as my cpu cooks itself
@olive sable it's tokenizing and should be finished in 90-120 min. I gotta finish my paper just ping me if u need anything
i need to integrate live2d into my c code, live2d doesnt have C bindings what do i doooo

uh wait how hard is a 16 item list again
Make C bindings
it improves generalization
ah ok that is nonideal actually
how hard should that be (i do not know c++ )
nope 
I don't know, depends on the complexity of things you're making bindings for
if I have a 1 / 16! chance of getting my list in the correct order, and I'm doing 8 billion attempts per second what's the average time to have a 90% chance of the list being sorted
I swear I used to know how to do this
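A quick way to work that out, as a sketch: if every attempt is an independent uniformly random shuffle of 16 distinct items, the chance of at least one success in n attempts is 1 - (1 - p)^n with p = 1/16!, so the n needed for 90% is ln(0.1) / ln(1 - p). The helper name is made up; `log1p` is used because p is far too small for a plain `log(1 - p)` to be accurate.

```python
import math


def attempts_for_confidence(p_success, confidence):
    """Attempts needed so that P(at least one success) reaches `confidence`."""
    return math.log1p(-confidence) / math.log1p(-p_success)


p = 1 / math.factorial(16)                 # chance one shuffle is sorted
attempts = attempts_for_confidence(p, 0.90)
seconds = attempts / 8e9                   # 8 billion attempts per second
print(f"{attempts:.3e} attempts, about {seconds / 60:.0f} minutes")
```

Under those assumptions it comes to about 4.8e13 attempts, which is roughly 100 minutes at 8 billion attempts per second.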
this is so real
I would say you can use the standard normal distribution or something but 
Silly
Nobody likes yet another derivation of GPT. Enough said
that's super boring
or Llama
no module named torch bwaaaaa
LLMs in general are pretty boring
sam how are you installing torch?
Now vocal synthesis, that's where it's at
it's fine. You don't need it. It would just make it a bit faster
I personally prefer eating rocks
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
are you in a env btw
nope
See we have a man of culture
i dont really use envs all that often
do I leave my desktop attempting to bogosort 16 numbers or do I let it rest
This is the peak of silliness
honestly every time I see ai voice stuff it reminds me I need to train voice irl
get back on the grind
uh isn't that what ur doing basically
not at all?
wish I could sing but my vocal cords are fricked up
how is it different? From my understanding it's also a transformer
transformer != llama
llama is a transformer
Ok surely this is bait 🪤
a bit aint enough to stop ears from bleeding. Ah well, it do be what it do be
forth is a lovely language 
you should learn it 
: SQRT
0 SWAP 1
BEGIN
2DUP >= WHILE
DUP 2 +
ROT ROT -
ROT 1 +
ROT ROT
SWAP
REPEAT
DROP DROP ;
25 SQRT CR .
@olive sable yeah ur good we can just wait for the encoding to finish
llama is literally the name of The Architecture, which at the end doesn't matter that much
i mean

you gotta start somewhere
I rather play instruments than fight with biology
otherwise, you will forever be remembered by me as the java programmer 
stacrobatics
llama is an LLM developed by Meta not an architecture
why not both
It is???
there are only so many hours in the day
Have you not heard of llamafy
ur referring to the llama weights
yeah fair
Meanwhile me, I'm quite good at making vocals, but suck at instrumentation
that has nothing to do with singing
yeah. LLaMA != Llama
I can sing really well in my head 
Technically it does
I know how to make the computer sing pretty decently
god bless the discord emote autocomplete
ok but
like ur throat isn't a computer
you have to put in effort
either way it's not just a fine tuned model
instead of letting your rock think extra hard
both require effort, and both are valuable skills 
I like when my rock gets to think real hard about singing
yeah sure but not comparable skills
I couldn't imagine doing what superbox does; very impressive stuff
good
We're only just leading Neuro singing voice cloning and exploring completely untapped potential in the field
i'm open to agreeing but I'm still not quite understanding the exact point of not pretraining and instead working directly with chat data and what makes this exactly better than just fine-tuning
Is there any documentation that ya got?
"only" looking pretty impressive 
agreed, completely different things
help
Silly
Either way, NeuroSynth is gonna be good
Did you see BETA-3?
😭
I don't think so
binpow is the fastest you can get with software
so glad I overspecced my storage
Delete any big games you have installed
And if you have Unity you can free up like 50GB by getting rid of it
Here's some NS-B-3 highly tuned action
A model with a single intended purpose that has been pretrained for that purpose will outperform a model trained on irrelevant data of the same size. It's the classic example of "garbage in, garbage out".
Training data is the #1 most important factor of model performance
Just look at how Llama 4 seemed good on llmarena despite being bad elsewhere since it was made for that task
literally the #1 principle in ML
something like this, surely i didnt make a mistake
int pow(int a, int b) {
if (b == 0) return 1;
int ret = pow(a, b / 2);
ret *= ret;
if (b % 2 == 1) ret *= a;
return ret;
}
what's next, zaporozhian cossack division?
shr1 recurses 15 times
not sure the mul savings are worth
we wrote the same algo so it must be correct 

this is strangely very relevant to the code im writing rn
but it's not directly irrelevant data, when the way grammar works, etc is similar. You may be thinking of the weaknesses of instruction tuned models, those sure yea they got problems but on base models you don't necessarily have that problem
seems promising
Still running on synthetic data by the way, NeuroSynth-1.0 will be using organic data
istg i definitely set it to other directory than c on my steam
I am GRPO tuning the base model afterwards. Also, just to be so serious, grammar does not matter almost at all in my model...
what do you mean by "synthetic" data?
Data made with Neuro RVC and existing compatible datasets
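: POW ( a b -- a^b )
  DUP 0= IF 2DROP 1 EXIT THEN  \ b = 0: result is 1
  2DUP 2/ RECURSE              \ r = a^(b/2), stack: a b r
  DUP *                        \ square it, stack: a b r*r
  SWAP 2 MOD 1 =               \ is b odd?
  IF * ELSE NIP THEN ;         \ odd: multiply by a, even: discard a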
untested
But grpo assumes the model output is good and you got a high quality autorater on which the model doesn't hack the rewards
i would still never write it without parentheses, it scares me
also binpow is trivial to translate to an iterative approach too
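A minimal sketch of the iterative form, assuming a non-negative integer exponent. It walks the exponent's bits from lowest to highest, squaring the base each step and multiplying it into the result whenever the current bit is set.

```python
def binpow(base, exp):
    """Compute base ** exp with O(log exp) multiplications, no recursion."""
    if exp < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exp > 0:
        if exp & 1:         # current bit set: this power of base contributes
            result *= base
        base *= base        # base, base^2, base^4, ...
        exp >>= 1           # move to the next bit
    return result
```

On a machine without a hardware shift, the `exp >>= 1` step is the expensive part, which is the concern raised earlier about whether the multiplication savings are worth it.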
Why are you so insistent that the model will be bad? I literally already trained it at 30m parameters and it was "eh". Considering the fact that it was only 30m parameters and at least could form sentences is extremely telling don't you think? At a higher scale it will be significantly better
GRPO is just RL. It don't need to be big. I'm not using it for test time compute
and you are objectively wrong with using GRPO on heavily non-factual stuffs
it does not work
how are you going to measure the perf? ELO or something? then we back to RLHF
GRPO is for tool usage. Specifically, actions taken to avoid being detected as an AI
im not using it as a replacement for fine tuning
specific words might indicate doubt of human interaction
Tool use on such a tiny model? That's gonna go horribly
im not 100% set on GRPO btw. Just an experiment
holy fuck
let a man experiment
I am curious
I know the base model will be good but idk about fine tuning
10M model overfit on tinystories has shown it to have barely cohesive sentences even on extremely simplified and perfect data. The model was also clueless with anything remotely outside its exact distribution
where is 10m coming from
is it really an ai discussion in #programming if people don't try to tell you 1000 reasons why you're wrong and a terrible person
ofc 30m was overfitted
on god
like damn bro. let a man live
we just don't want gpu to melt for things that won't work
give me the 3090!! ill change the world trust
why do you care
Nah, give me 3090 and I'll make the best Neuro vocal synthesizer ever
Regardless im still going to learn how to actually make an LLM instead of just yoinking a pretrained model
fk im lagging
fair enough, but training only on conversational data is objectively a poor idea; even Google's LaMDA was still like 60% web and only 40% conversational, which is already an insane ratio
$750
how about $1

for 800$ u bet i take it😭😭
I don't have that much money (if I did I would definitely get that)
for $1 i could give you a singular core
its a steal btw
???
PLEASE
damn, are 3090s that expensive now?
i paid 675 for this
1.1k used here

It's a fucking scam
Cheapest I've seen in Finland is 650€, but more usually they're in the 700€ range
powercolor please can you release the 9070xt reaper I'm getting desperate
my entire pc used was 1.3K
i don't understand why one wouldn't rent gpu compute when it isn't that expensive
1999$ after scalper pricing
2000+ USD
I need it consistently, for me it'd almost certainly be cheaper overall to just own the hardware
Or something 😭😭
¯\_(ツ)_/¯
-# Also I hate cloud stuff
Apparently
doubt it'd be that much
but also I have zero idea how monopoly dollars translate to gbp
the lowest bid in EBAY is 840 USD rn
China got access to western 5090s somehow, so now they're stocking up on every 5090 on the market
like what 😭 😭
is this for a 3090 or the card I was talking about

ok nvm me then I'm just lost in this mess of a chat
how many hours of training per day do you realistically need, even if you iterate super fast
Cheapest
Well, I trained NeuroSynth-BETA-3 for 30 hours straight, and I'll have to soon do it again for a special append for BETA-3.1
I wonder if there's somehow an underground dark web network for illegal gpu trafficking into china😭😭
also neither of those are the card I was talking about
I specifically need the powercolor reaper
nothing else will fit in my case
Cause how come GPUs are out of stock everywhere
i dont mind buying and selling 3090's to yall. i just aint paying for shipping
i'd definitely join it if i had the chance
also like surely there are fees involved for shipping it across borders
599€?? Holy cheap
SAME😭😭
sometimes ye, like 21% import tax
I wonder how much the shipping from wherever you are to Finland would be
sounds like a good deal
wait I actually did keep a note of
does that shipping include throwing the package down a flight of stairs
chinese 4090 48GB version
- import tax but idk how import tax works between EU countries
There's none
ok that's based
Unless that changed
Used to apply to the UK too but not anymore since they left the EEA 
this is without a possible import tax tho
I miss ordering custom cables from the uk
If this dumb AI summary is correct, it's not a thing
But I don't know if I can trust it
my dad had to import his DGX GB10 order via italy cause it's not even sold in Switzerland 😭😭
order some to me and I can take them to the netherlands then ship them
then it should be fine
surely that is a valid way to avoid fees
totally won't snatch in between 
Bitcoin address ahh reseller
Hm, maybe
I'll consider it when I have more money if you can find me a good 3-slot one for which it's possible to find a pair later
I really, really need a 3090 for NeuroSynth training
i will if you want me to, im just in debt rn so i need the money beforehand
2.5-2.8k CHF (3000$-3400$) but out of stock everywhere
Apparently companies do the conversion of usd price -> chf price 1:1 
Just gotta figure out where I'm gonna get all that money
-# definitely not NeuroSynth commissions, nobody wants those anyway
i have no fucking clue how much 5090 prices can vary
is this canadian dollars or nah i dont even know
probably US since it's bestbuy
7381748382$ gpu😭😭
Why does money have to be so hard?
imagine after game-jam 3.
i get a bill:
usage of neurosynth:
$800 or 1 RTX 3090 24GB

That's so silly
But no, I've always operated on a "pay if you feel like it" model
Though I may sometimes provide stuff for free too easily
meanwhile @viral oasis (hi op) with 1 4090 and 2 3090s😭😭

that invoice for a gh200 is on its way
my dad also has like 5 jetson orin nx on backorder😭😭 each of em is like 600$ and worse perf than rtx 2060
none for sale here so bwaa
hi operator lol
didnt know she was in this server
smh
how did i get 4 bwaas in 1 second
elvyn
the moment 1 bwaa was typed a 2nd immediately appeared
and im just terminally online
I am stalking this chat because its more fun than revision
i dont remember what i was doing tbh
i think i found all the 5090 that shadow can't find
what's the exam about 
understandable
marketing
ew
marketing?
yeah exactly
what?
i hate the subject so damn much
supply and demand type of shit?
what the fuck lmao
chinese stores definitely have more gpus, even the ones the US didn't allow them to have
no way more brainrotted than that
@sage crag
wakey wakey i need some convenience functions for using a display
its basically just "people buy stuff"
see that dgx and h100 🤦‍♂️
i attempted to write some but they arent working 
theres a display that treats a region of ram as a video buffer
yep we do, what about it?
idk something about how to make it so people buy your stuff instead of other people's
but like I really don't like the subject nor the lecturers
if my math is right I only need to get 5% on this exam to pass though
if i make it cheaper, and people know its cheaper, and the quality doesnt suck, then people will buy it
current config is 64x64 pixels, monochrome, each 16-bit word of memory controls 16 pixels, in order from left to right, top to bottom
ez
well
you see
that is one of the many things you are told
uhuh
but the subject basically boils down to "idk wing it"
with the exam basically just being a history lesson
for starters shouldnt that be (y * screen_width) + x
like I'm expected to write an essay answer to one of these questions in like 30 minutes
unless your y axis is horizontal
but secondly, the problem is that one word in memory is 16 pixels
so if i want to turn an individual pixel on

you can avoid right shift
by taking 15 - (x % 16)
and doing left shift
still i dont know why what i have isnt working

Idk how common it is but ive been told by my dad that in russia the pc stores are extremely premium looking in comparison to the west (at least the ones he visited)
# fill_pixel(x, y)
function fill_pixel 2;
push_arg 1;
push_value 64;
call math_mult;
push_retval;
push_arg 0;
add;
call place_pixel;
push_value 0;
return;
function place_pixel 1;
push_arg 0;
push_value 8;
call math_div;
push_retval;
push_value DISPLAY_BASE;
add;
push_value 0b0000000000000001;
push_arg 0;
push_value 15;
and;
push_value 16;
sub;
call math_shln;
push_retval;
pop_memory;
push_value 0;
return;
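not a fix for the listing above, just a sketch of the addressing in python to compare against. with 16 pixels per word the word offset would be index / 16 (the listing divides by 8), and the mask probably wants OR-ing into the existing word if pop_memory overwrites the whole thing. assumptions here: the leftmost pixel of each word is bit 15 (the chat doesn't say which end it is), and the framebuffer is just a python list standing in for the ram region at DISPLAY_BASE:

```python
SCREEN_WIDTH = 64
SCREEN_HEIGHT = 64
PIXELS_PER_WORD = 16
WORDS = SCREEN_WIDTH * SCREEN_HEIGHT // PIXELS_PER_WORD  # 256 words of video ram


def pixel_location(x, y):
    """Return (word offset, bit mask) for pixel (x, y)."""
    index = y * SCREEN_WIDTH + x          # linear pixel index, row-major
    word = index // PIXELS_PER_WORD       # which 16-bit word holds the pixel
    bit = 15 - (index % PIXELS_PER_WORD)  # ASSUMPTION: leftmost pixel = bit 15
    return word, 1 << bit


def fill_pixel(framebuffer, x, y):
    """Turn one pixel on without disturbing the other 15 in its word."""
    word, mask = pixel_location(x, y)
    framebuffer[word] |= mask
```

if the display turns out to put the leftmost pixel in bit 0 instead, the shift is just `index % 16` and the rest stays the same.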
shiro can you teach me erlang i dont know it
i also dont know erlang
you can learn it 

can we combine our fragments of erlang knowledge into a full understanding 
i know that pleroma is written in erlang (may or may not be true)
and i know that its based on message passing with in-process mutable state but no shared memory between processes
lmao what, did he visit boutique prebuilt stores or what, i can't call any of what i visited "premium"
idk what premium means in this context
i mean it will be cleaner than a supermarket
idk
He was in st petersburg
its better to work in whole words instead of individual pixels anyway
maybe have a function that bitands or bitors a mask to a certain word
that way you can do maths in your head instead of making your computer do it 
thats the optimal way
please understand, you are replaceable, your computer is one of a kind
it deserves care and adoration
and precomputed maths
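a rough sketch of that word-at-a-time idea, in python so it runs anywhere. `or_word`, `and_word` and `fill_row` are made-up names, the list stands in for the video buffer, and the 4 words per row comes from the 64x64 / 16-pixels-per-word config mentioned earlier:

```python
WORD_MASK = 0xFFFF   # keep everything inside 16 bits
WORDS_PER_ROW = 4    # 64 pixels per row / 16 pixels per word


def or_word(framebuffer, word, mask):
    """Turn on every pixel whose bit is set in mask."""
    framebuffer[word] = (framebuffer[word] | mask) & WORD_MASK


def and_word(framebuffer, word, mask):
    """Keep only the pixels whose bit is set in mask, clear the rest."""
    framebuffer[word] = framebuffer[word] & mask & WORD_MASK


def fill_row(framebuffer, y):
    """Light a whole row with four precomputed all-ones masks."""
    for w in range(WORDS_PER_ROW):
        or_word(framebuffer, y * WORDS_PER_ROW + w, 0xFFFF)
```

the masks are the part you work out by hand ahead of time, so drawing a full row is 4 writes instead of 64 pixel calls with a multiply, divide and shift each.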
i'm in moscow so if anywhere is premium it's here
this compiler agrees with you
the reason my instruction tokens are so garbage is because +, -, *, /, and any other arithmetic symbols are evaluated at compile time by the precompiler
so i cant use them as tokens in my instructions :(
did you want to make an apl or something
why would you need them in your identifiers
@hoary lion ur not gonna believe it
makes sense, well you can force whitespace between identifiers to recover that
will try to adapt this since my attempt isnt working 
i wonder if that means my stack breaker broke
let me see
the holy trinity
it broke as i expected
its fine i can fix it
this is totally a productive use of my time and very important for using the language
wait i think it crashes even without running the shellcode
can you confirm
main := fn(): uint {
    start := "abcdefgh".ptr
    len := 8
    ptr: ^u8 = @syscall(0x9, 0, 4096, 7, 0x22, ~0, 0)
    i := 0
    loop if i == len break else {
        (ptr + i).* = (start + i).*
        i += 1
    }
    return 0
}
oh its because the old version of lily checked out the old commit
thanks ida for the descriptive error











precision



