# Cortex Engine
```asm
/*
 * Fiberus - Fiber Runtime for Haxe
 * fiber_x64.S - x86_64 SysV ABI context switching
 *
 * FiberContext layout (80 bytes):
 * offset 0:  rip (return address)
 * offset 8:  rsp (stack pointer)
 * offset 16: rbx
 * offset 24: rbp
 * offset 32: r12
 * offset 40: r13
 * offset 48: r14
 * offset 56: r15
 * offset 64: mxcsr (4 bytes)
 * offset 68: fcw (2 bytes)
 * offset 70: padding (2 bytes)
 * offset 72: reserved (8 bytes)
 */
```
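The C side of the runtime could mirror that 80-byte layout as a struct, with compile-time checks so the C view cannot drift from the offsets the assembly hard-codes. This is a sketch: the struct name follows the comment above, but the field types and the `_Static_assert` checks are assumptions, not code taken from Fiberus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: C mirror of the FiberContext layout documented in fiber_x64.S. */
typedef struct FiberContext {
    uint64_t rip;      /* offset 0:  address to resume at */
    uint64_t rsp;      /* offset 8:  saved stack pointer */
    uint64_t rbx;      /* offset 16: callee-saved general-purpose registers */
    uint64_t rbp;      /* offset 24 */
    uint64_t r12;      /* offset 32 */
    uint64_t r13;      /* offset 40 */
    uint64_t r14;      /* offset 48 */
    uint64_t r15;      /* offset 56 */
    uint32_t mxcsr;    /* offset 64: SSE control/status register */
    uint16_t fcw;      /* offset 68: x87 control word */
    uint16_t padding;  /* offset 70 */
    uint64_t reserved; /* offset 72 */
} FiberContext;

/* Fail the build if the C struct and the assembly offsets ever disagree. */
_Static_assert(sizeof(FiberContext) == 80, "FiberContext must stay 80 bytes");
_Static_assert(offsetof(FiberContext, rsp) == 8, "rsp offset");
_Static_assert(offsetof(FiberContext, r15) == 56, "r15 offset");
_Static_assert(offsetof(FiberContext, mxcsr) == 64, "mxcsr offset");
_Static_assert(offsetof(FiberContext, fcw) == 68, "fcw offset");
_Static_assert(offsetof(FiberContext, reserved) == 72, "reserved offset");
```

The register set matches the SysV x86_64 ABI: rbx, rbp and r12 to r15 are callee-saved, as are the control bits of MXCSR and the x87 control word, so those are exactly what a voluntary switch has to preserve.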
let's see
we keep it simple:
Fiberus is a project to build a new Haxe compiler target with a fiber-based runtime for multiprocessing. The goal is to create a runtime where all concurrency is handled via fibers/coroutines, with integrated garbage collection.
we start with a single-threaded cooperative scheduler and generate C code
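A single-threaded cooperative scheduler boils down to something like the minimal round-robin sketch below. It swaps in POSIX `ucontext` (`getcontext`/`makecontext`/`swapcontext`) for the hand-written `fiber_x64.S` switch. That API is slower (glibc's `swapcontext` also saves the signal mask via a syscall) and was dropped from newer POSIX editions, but it is still available in glibc. All names (`fiber_spawn`, `fiber_yield`, `scheduler_run`) are hypothetical, not the Fiberus API.

```c
#define _XOPEN_SOURCE 700 /* exposes the ucontext API on some platforms, e.g. macOS */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define MAX_FIBERS 64
#define FIBER_STACK_SIZE (64 * 1024)

typedef void (*fiber_fn)(void *arg);

typedef struct {
    ucontext_t ctx; /* saved registers of this fiber */
    fiber_fn fn;
    void *arg;
    void *stack;
    int done;
} Fiber;

static ucontext_t scheduler_ctx; /* fibers switch back here on yield or finish */
static Fiber fibers[MAX_FIBERS];
static int fiber_count;
static int current = -1;

/* Entry point of every fiber: run the user function, then mark it finished.
   Returning from here resumes uc_link, i.e. the scheduler. */
static void fiber_entry(void) {
    Fiber *f = &fibers[current];
    f->fn(f->arg);
    f->done = 1;
}

/* Registers a fiber; it first runs on the next scheduler pass. Returns its id or -1. */
int fiber_spawn(fiber_fn fn, void *arg) {
    if (fiber_count == MAX_FIBERS) return -1;
    Fiber *f = &fibers[fiber_count];
    f->stack = malloc(FIBER_STACK_SIZE);
    if (!f->stack) return -1;
    if (getcontext(&f->ctx) != 0) { free(f->stack); return -1; }
    f->ctx.uc_stack.ss_sp = f->stack;
    f->ctx.uc_stack.ss_size = FIBER_STACK_SIZE;
    f->ctx.uc_link = &scheduler_ctx;
    makecontext(&f->ctx, fiber_entry, 0);
    f->fn = fn;
    f->arg = arg;
    f->done = 0;
    return fiber_count++;
}

/* Cooperative yield: save the running fiber and switch back to the scheduler. */
void fiber_yield(void) {
    swapcontext(&fibers[current].ctx, &scheduler_ctx);
}

/* Round-robin over all fibers until every one has finished, then free the stacks. */
void scheduler_run(void) {
    int any_alive = 1;
    while (any_alive) {
        any_alive = 0;
        for (int i = 0; i < fiber_count; i++) {
            if (fibers[i].done) continue;
            current = i;
            swapcontext(&scheduler_ctx, &fibers[i].ctx);
            if (!fibers[i].done) any_alive = 1;
        }
    }
    for (int i = 0; i < fiber_count; i++) free(fibers[i].stack);
    fiber_count = 0;
    current = -1;
}

/* Usage example: each worker records its id, yields, and does that twice. */
static int trace[8];
static int trace_len;

static void demo_worker(void *arg) {
    int id = *(const int *)arg;
    for (int round = 0; round < 2; round++) {
        trace[trace_len++] = id;
        fiber_yield();
    }
}
```

Nothing preempts a fiber: it runs until it calls `fiber_yield()` or returns, which is what makes the model cooperative. Spawning two `demo_worker`s and running the scheduler produces the interleaved trace 1, 2, 1, 2.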
Are you building this haxe target in OCaml directly in the haxe compiler, or with another solution?
Really curious to see what comes out of that
ok, claude is crunching through the haxe unit tests
time to tackle the gc
Ok, looks like we got a working gc..
learning quite a lot. I finally get what Roots are...
also, pointer alignment hell
I just fiddle
A walk in the park for daz holly kind of course
contemplating a few things atm. I think I will borrow a few things from the naughty dog gdc talk
so far it looks like it could be crazily performant
and if you don't utilize the fiber architecture, it works just like hxcpp, with all app logic running on a single fiber on some thread
also turns out that the immix maps nicely into the fiber setup
tell me you are debugging a gc without telling me you debug a gc
Are you using the hxcpp immix?
atm no
using a simple mark&sweep with some optimizations like page tables
the idea is to have a stable single-threaded fiber runtime + gc first, then we unleash the multithreaded kraken
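A bare-bones mark & sweep makes the role of roots concrete: marking starts only from the root set, and the sweep frees whatever was not reached. This sketch gives each object a single reference and keeps every object on an intrusive list; the page-table optimization mentioned above is not shown, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ROOTS 64

typedef struct Obj {
    struct Obj *next_alloc; /* intrusive list of every allocated object, walked by the sweep */
    struct Obj *ref;        /* a single outgoing reference keeps the sketch short */
    int marked;
} Obj;

static Obj *heap_head;        /* every object, reachable or not */
static Obj *roots[MAX_ROOTS]; /* root set: pointers held outside the heap (globals, stack slots) */
static size_t root_count;
static size_t object_count;

Obj *gc_alloc(void) {
    Obj *o = calloc(1, sizeof *o);
    if (!o) abort();
    o->next_alloc = heap_head;
    heap_head = o;
    object_count++;
    return o;
}

void gc_add_root(Obj *o) {
    if (root_count == MAX_ROOTS) abort();
    roots[root_count++] = o;
}

/* Mark phase: follow references from a root, stopping at already-marked objects. */
static void mark(Obj *o) {
    while (o && !o->marked) {
        o->marked = 1;
        o = o->ref;
    }
}

/* Sweep phase: unlink and free every unmarked object, clear the mark on survivors. */
static void sweep(void) {
    Obj **link = &heap_head;
    while (*link) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;
            free(o);
            object_count--;
        }
    }
}

void gc_collect(void) {
    for (size_t i = 0; i < root_count; i++) mark(roots[i]);
    sweep();
}
```

An object that nothing in the root set leads to is freed by the next `gc_collect()`, however many other objects it points to itself.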
the fact that I already run the haxe benchmarks is unreal
ok, immix working
How'd you get this all up and running in such a short timeframe?
simple, completely stupid vibe coding with claude code. While I work on one part, I let it work on the todos I set up for a different part
and every once in a while I reset it and have it do a complete quality assessment of the project parts
which I then fed into new todos
It's an experiment to find out what everybody is fucking talking about and what the real limits are. I'm kinda sick of being perceived as the grumpy old dude who is against the latest shit. So far no real surprises though. It's just like working with a fast junior dev that has some bright moments every once in a while, while at other times being a total fuckhead. Sry for the swearing this early
So far my verdict is the real value only unlocks with experienced developers
that being said, the other day one of my best devs fucked our prod with claude accidentally
And the shit we do at work doesn't even involve real problems. That's why I chose the craziest, hardest thing I know, something I would never even have thought about starting: a new haxe target
I see, great idea for an experiment.
I did such an experiment as well a few days ago and I think my experience was about the same.
It lacks any real ability to weigh options and/or look ahead to design something actually robust
It does the given goal and that is all it will care about. (logically)
Within the goal it can make things work quite well but it doesn't have any real understanding of what it does.
Give it any topic where there is any kind of relation between concepts which is required for the problem and it fails immediately.
I think the best use is still data transformation (writing docs, porting code, etc) or at most topics which are generally simpler (web)
Another thing I noticed: it's super shitty when it comes to cleanliness. It accumulates code without end but rarely gets rid of something that is no longer needed
Ah yes, indeed.
Yeah I really don't understand the whole "LLMs will replace programmers" thing people are saying all the time.
It's quite trained on don't-break-stuff, even when it should break stuff.
Not removing legacy code, marking stuff as deprecated instead of cleaning it up.
And it also won't remove something it doesn't know about, and considering it gets blurry once the context is over 100k tokens, that's not much you can fit there.
It pays to have a system prompt for the "do break stuff" issue, and to do cleanup passes/reviews on every clean step.
not sure I regret debugging/profiling this...
How far will you take this hahaha
how the fuck are you supposed to profile things with like 100k fibers?!
I'm not sure I understand Tracy's fiber UI
the UX is horrible
absolute highlight so far
So if I understand correctly, your FIB target is some sort of fork of HXCPP target, so you started from a plain copy of HXCPP?
Or is it done totally from scratch?
from scratch, hxcpp as reference for lookups. I had it start as a C, not a C++ target
yay my print arrived
:O
If you're ever in an arcade you should try Initial D racing
especially the semi-older versions
one of my fav arcade racing games
ah looks nice
there is also this https://tothdanielgames.itch.io/initial-d-touge-spirit
looks like I have the first win
hxcpp release on my machine takes 950ms
fiberus release takes 460ms
hxcpp using multithreaded immix vs. fiberus all singlethreaded (incl immix)...
hxcpp 1.5mb binary vs. fiberus 80kb binary
granted we don't include any third party libs
good time to learn then
yeah, you have to sorta read it backwards
oh no are you vibe coding?
Yeah, I'm pairing with claude on a new haxe target
will you submit it as a pr or keep it as a fork
im profiling the scheduler while he is fixing nullable types
I literally have no idea. It's a stupid idea but it seems to be working very well. so lets see
It's a runtime that could work very well for gamedev
and also for system-level stuff like webservers etc
you say it's a new target, but it's also built into hxcpp - what does that mean?
nonono
It's a 100% new C-target with C++ support running on Fibers/Coroutines at the root level. It rips off the build system from hxcpp and took immix but everything was adapted around this new runtime
here is what the output looks like
it's 2 parts, this runtime (like hxcpp) and then a new generator in the haxe-compiler
oh i see
any reason why not do it via reflaxe? that way keep things at least parsable and easier to use in the haxe ecosystem
I figured the haxe compiler comes with more context about ocaml and concept style given there are already all the other generators. With reflaxe I literally have no idea if there is something production ready available
That totally makes sense
yeah, there's a fair few examples for reflaxe, but i don't think there's anything production ready per se
Regarding this, that makes me wonder, now that we have llms, if it would make sense to try to make a new C# target directly in ocaml using other targets as reference (and could be a start to grasp some concepts of ocaml), but just thinking out loud
I'm not sure tbh, simon seems pretty onboard about reflaxe. I guess the main thing to think about would be, what benefit does ocaml have over reflaxe method
Once it is a bit more powerful, I'll release it as a fork, then we can discuss and see whether there is any interest
Much faster, and as dazKind said, there is a track record of all existing targets
faster in what sense?
You export from haxe itself, orders of magnitude faster than doing all the work from a macro
ah yeah
fuck macros
feasible, possible
i wonder how claude would fare with your reflaxe.c# setup
To be fair, those discussions happened a while ago, and a lot happened since then regarding tooling to help make a new target a reality
I'll probably try when I'm in need of a silly pet project
go for it. shoot as high as you can
Harder, because less reference code. The most developed target is reflaxe cpp I believe and it's far from complete
but just to let you guys know, this here will be the best native haxe target / runtime ever
I'm gonna AAA the shit out of this. Our Machinery, Naughty Dog, Crytek: I've got all their PDFs, and claude as my slave will weave them into something great
The thing to be aware of with claude or other llms is that if you give it quality input data, you'll get better output, and feeding it everything that exists in the haxe compiler itself and its targets is the best you can give: multiple targets with a full test suite, known to work
(dont take me seriously)
Just doing your cool game engine and games with it will be a great achievement already, no need to take over the world mister Cortex
Just relating to this
xD
Are you making this target on haxe5?
@obtuse burrow What does the debugging/profiling workflow for hxcoro look like for you? I'm trying Tracy with this new target here but it seems like it will rapidly spiral out of its UI's control
Ok while we are at it, let's make hot-reloading also a part of the scope
I wonder how we could tackle the bi-directional type sharing
basically non-existent, the only thing we've really done is call stack tracking so you do still get reasonable stack traces instead of useless internal machinery items
I want to allow for something like visual studio's task view to be made, but the debug adapter protocol which most haxe debuggers use (and the editors which implement that protocol) all have very primitive debuggers and don't support anything like that, so there's not much point at the moment
would also probably require some per-target work as well
iirc tracy had some sort of "green thread" support but it was an opt-in compile time thing, not sure how complete / usable / stable it is
yeah, that's what i tested with the new fiber-based native target i'm hacking. see #1234939525931728906 message
past 20 fibers the ux just dies though
Because of you I started spawning Claude into preliminary work for making a new C# target
Adding "Haxe WASM target" to the end of my wish-todo-list.
I recall haxe team was saying it could be considered once the GC from the browser is available from wasm
Nice, well, let me do C# first 🤣
Sounds good to me, text first, bytes later.
So what my exploratory insight say for now is that using the JVM target as reference sounds like a pretty good idea for C# target
They are both managed runtimes with strong typing
Now, I recall simon saying one of the big mistakes of previous C# target was to do a direct mapping from haxe to C# code without an intermediate C# ast
So that makes us go even closer to something that maps to the JVM target, where instead of spitting out JVM bytecode, you spit out some AST matching C# semantics, and THEN you print C# from that
(and no, I don't want to directly output IL bytecode, I want actual C# as text)
So: Haxe -> C# semantics AST -> C# text
(reflaxe C# is following that too btw)
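The "semantic AST first, text second" split can be shown in miniature. This toy is in C only to match the other snippets on this page (the real generators are OCaml inside the haxe compiler), and the node kinds and function names are hypothetical. One concrete payoff: parenthesization is decided in one place, by the printer, from operator precedence, instead of every lowering rule having to emit correctly bracketed text itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical target-language AST: just enough node kinds to show the idea. */
typedef enum { E_INT, E_VAR, E_ADD, E_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                    /* E_INT */
    const char *name;             /* E_VAR */
    const struct Expr *lhs, *rhs; /* E_ADD, E_MUL */
} Expr;

typedef struct { char *buf; size_t cap, len; } Out;

/* Appends s; once the buffer is too small it stops writing but keeps counting. */
static void emit(Out *o, const char *s) {
    size_t n = strlen(s);
    if (o->len + n < o->cap) memcpy(o->buf + o->len, s, n + 1);
    o->len += n;
}

static int precedence(const Expr *e) {
    switch (e->kind) {
    case E_ADD: return 1;
    case E_MUL: return 2;
    default:    return 3; /* literals and variables never need parentheses */
    }
}

/* The printer alone decides where parentheses go, based on the parent's precedence. */
static void print_expr(Out *o, const Expr *e, int min_prec) {
    int prec = precedence(e);
    int parens = prec < min_prec;
    if (parens) emit(o, "(");
    if (e->kind == E_INT) {
        char tmp[16];
        snprintf(tmp, sizeof tmp, "%d", e->value);
        emit(o, tmp);
    } else if (e->kind == E_VAR) {
        emit(o, e->name);
    } else {
        print_expr(o, e->lhs, prec);
        emit(o, e->kind == E_ADD ? " + " : " * ");
        /* left-associative: an equal-precedence right operand needs parentheses */
        print_expr(o, e->rhs, prec + 1);
    }
    if (parens) emit(o, ")");
}

/* Returns the length the full text needs; buf holds the complete text if that is < cap. */
size_t expr_to_c(const Expr *e, char *buf, size_t cap) {
    Out o = { buf, cap, 0 };
    if (cap > 0) buf[0] = '\0';
    print_expr(&o, e, 0);
    return o.len;
}
```

A lowering pass only has to build the tree `Mul(Add(1, 2), x)`; the printer turns it into `(1 + 2) * x` without the lowering pass knowing anything about brackets.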
Sim: We can't manage all these targets so we're gonna cut off a few
Haxe community: Have 3 new targets
I thought this was funny
In my, admittedly small and naive, experience with these things, more layers makes stuff easier, but requires coming up with good design for the abstractions.
To be honest, I think I'm really going to try to make that target with Claude, let's see how far I can go
I expect quite far, this is almost perfect task for it, if it fits into context well.
(I do think it's possible to actually make it, but only way to know is to try for real)
The existing targets are really good input data for sure, the test suite as well
I'm taking extra care of gathering info on how the JVM target handles tricky stuff like function types wrapped in Dynamic, with potentially optional parameters. Trying to map to JVM as much as possible basically
Another thing to watch out is AOT compatibility. Reflect.callMethod() and that kind of stuff should work even when using C# AOT compilation (which means limited reflection support)
Anyway I should stop spamming Cortex thread heh
So far, it's going pretty well
since this is pretty new, isn't it better practice to bundle a gc and only use wasmgc if available?
Maybe. But unless you also use SIMD stuff (which is whole different problem with Haxe I guess), if you were to pack in custom GC, JS output might end up being faster.
Basically there's little reason to use WASM instead of JS for Haxe, unless you also use the SOTA GC that comes with it.
Current status
From what I can see, there is a lot to do, but the LLM has a lot of good input data to converge to the right thing: all other haxe targets, C# output from previous C# haxe target, previous C# target codebase... It has access to all those + a workflow to test things. Now it's just iterating
If that ends up working, it will definitely be a much better option than reflaxe/c#, even if that means using a fork of Haxe (at least at first)
i hope this doesn't end up in a forked haxe frenzy
I wonder if it would be feasible to have the llm extend or build some kind of plugin system for haxe to allow for targets to be added via haxelib
I mean, if I work on a fork, I'll keep it up to date with the main haxe repo, but merging the target won't depend on me, so I need to account for the scenario where that doesn't happen
Hopefully they're cool with this
In an ideal world we would have compiler plugins on the OCaml level (perhaps this is a thing?) or atleast some kind of tool that can work as an extension loader
I think there was something like that where you could plug an ocaml plugin to haxe, I think this is a thing, but no idea how complete this is, so I'd rather stick to the "standard way" of making a haxe target to reduce the unknowns
There is a plugin system that exists, it just requires haxe to be recompiled
Which is a bit pointless from the perspective of plugins
Ah indeed
But to be honest, it's ok for me to work on a fork, as long as I can update it from main
We'll see when something is actually working, if it goes well, I'll try it over Ceramic's Unity target. If it works there, that's a very good sign
(goal is to pass all tests anyway)
https://github.com/SomeRanDev/reflaxe.CSharp/blob/development/test/tests/Misc/Main.hx That test file is coming in handy for testing a lot of stuff
I get that it's okay, but if targets could be added in a more accessible way, that would be cool and would reduce the maintenance burden across the board
maybe i'll float the question to the foundation at some point see what they say
I'm fine with compiling haxe and the target to native code so far
lol, what have I started here
currently super busy at work. will be interesting to see how this all develops
Looking forward to the weekend, have some insane plans for my fiber target
I just meant for compiler plugins it could be more flexible, but just to add a new target it's probably easier to add it as is rather than implement plugin loading first
making a plugin system so flexible would probably be quite a bit of work
To be fair, I think I would have tried it anyway at some point, but you made me try earlier
sure
yeah, depends on how complex the interface would have to be between the plugin and haxe
So, I got this working correctly so far: https://github.com/jeremyfa/haxe/blob/new_csharp/tests/misc/cs/projects/Bootstrap/Main.hx
Haxe - The Cross-Platform Toolkit. Contribute to jeremyfa/haxe development by creating an account on GitHub.
This already covers a lot of stuffs: lambda/closure functions, optional arguments and so on...
@tall veldt At this rate, reflaxe.CSharp might just not be needed anymore (I still like reflaxe very much though, been great to use it for my shader language)
The most annoying thing with Claude is that sometimes, if it fails to make something work, it goes like "ok let's just skip the thing, it's actually acceptable" bullshit
Pragmatically removing half the requirements. π
that looks really cool! I wonder if simn would be ok with a c# implementation that doesn't rely on gencommon anymore π
We'll see, but now that I have started, I'm really trying to make it fully work and be useful. Using mostly the JVM target as reference, so hopefully that won't be too bad
2 pretty interesting developments regardless
yes, i wonder if you can get it to solve some of the unsolved c# issues that got purged
Everything is versioned and I'm watching out for this, just need to redirect it to the right path when it's trying to do that silly thing
cool test! I might lift that
I started yesterday evening, and the progress is massive. The fact that there are all the other targets working (especially the jvm one) is very effective. Would have been orders of magnitude harder without the other targets (which makes me think doing the same thing with reflaxe would have been harder in itself because less quality input data to make it)
hxcpp's release optimizations for the gc are really crazy
learning soo much about the internals of hxcpp now
omg, I just converted inline into a literal weapon in fiberus. added escape detection for simple classes so they can get moved onto the stack. if you inline right the gains are insane. this will be great for all these insane math things where you would normally use a pool
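A hypothetical before/after of what escape detection buys for a small class: when an object is only used locally (never stored, returned or captured), the generated C can keep it in the stack frame instead of asking the GC for memory. The Haxe snippet, the `Vec3` lowering and the function names are illustrative assumptions, and `malloc` stands in for the runtime's GC allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lowering of a Haxe class with fields x, y, z of type Float. */
typedef struct Vec3 { double x, y, z; } Vec3;

/* Without escape analysis: every `new Vec3(...)` becomes a heap allocation.
   malloc stands in for the runtime's GC allocator (assumption). */
static Vec3 *Vec3_new(double x, double y, double z) {
    Vec3 *v = malloc(sizeof *v);
    if (!v) abort();
    v->x = x;
    v->y = y;
    v->z = z;
    return v;
}

/* Haxe (hypothetical):
     inline function distSq(a:Vec3, b:Vec3):Float {
         var d = new Vec3(a.x - b.x, a.y - b.y, a.z - b.z);
         return d.x * d.x + d.y * d.y + d.z * d.z;
     }
   `d` is never stored, returned or captured, so it does not escape. */
double distSq_heap(const Vec3 *a, const Vec3 *b) {
    Vec3 *d = Vec3_new(a->x - b->x, a->y - b->y, a->z - b->z);
    double r = d->x * d->x + d->y * d->y + d->z * d->z;
    free(d); /* only so the sketch doesn't leak; a GC would reclaim it later */
    return r;
}

/* With escape analysis: same computation, the object lives in the stack frame. */
double distSq_stack(const Vec3 *a, const Vec3 *b) {
    Vec3 d = { a->x - b->x, a->y - b->y, a->z - b->z };
    return d.x * d.x + d.y * d.y + d.z * d.z;
}
```

Inlining presumably matters because it moves the `new` into the caller's body, where the compiler can see the object's whole lifetime; a function that returned `d` would force it back onto the heap.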
I think the multithreaded scheduler is next
Adding multithreading will kick some serious ass when it comes to GC. Since all fibers will yield to their safepoints bc of STW, all the pinned worker-threads don't need to go idle during GC but can just switch gears and do parallel marking!
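Here is a sketch of what "switching gears to parallel marking" could look like once every fiber is parked at a safepoint. The worker threads share a cursor over the root set, and an atomic exchange on each mark bit guarantees that exactly one thread traces any given object. It uses pthreads and C11 atomics; the object shape and all names are hypothetical. Load balancing only happens at root granularity here, whereas production collectors typically add work stealing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CHILDREN 2
#define MAX_HELPERS 16

typedef struct PObj {
    atomic_int marked; /* 0 = unmarked, 1 = marked */
    struct PObj *children[MAX_CHILDREN];
} PObj;

typedef struct {
    PObj **roots;
    size_t root_count;
    atomic_size_t next_root; /* shared cursor: workers claim roots one at a time */
} MarkJob;

typedef struct { PObj **items; size_t len, cap; } MarkStack;

void pobj_init(PObj *o) {
    atomic_init(&o->marked, 0);
    for (int i = 0; i < MAX_CHILDREN; i++) o->children[i] = NULL;
}

/* Only the thread that flips the bit from 0 to 1 traces the object. */
static int try_mark(PObj *o) {
    return atomic_exchange(&o->marked, 1) == 0;
}

static void push(MarkStack *s, PObj *o) {
    if (s->len == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 64;
        PObj **p = realloc(s->items, cap * sizeof *p);
        if (!p) abort();
        s->items = p;
        s->cap = cap;
    }
    s->items[s->len++] = o;
}

static void *mark_worker(void *arg) {
    MarkJob *job = arg;
    MarkStack stack = { NULL, 0, 0 };
    for (;;) {
        size_t i = atomic_fetch_add(&job->next_root, 1);
        if (i >= job->root_count) break;
        if (job->roots[i] && try_mark(job->roots[i])) push(&stack, job->roots[i]);
        while (stack.len > 0) { /* trace this root's subgraph with a private mark stack */
            PObj *o = stack.items[--stack.len];
            for (int k = 0; k < MAX_CHILDREN; k++) {
                PObj *c = o->children[k];
                if (c && try_mark(c)) push(&stack, c);
            }
        }
    }
    free(stack.items);
    return NULL;
}

/* Marks everything reachable from roots; the calling thread works alongside up to
   `helpers` extra threads. Returns how many helper threads were actually started. */
int parallel_mark(PObj **roots, size_t root_count, int helpers) {
    MarkJob job;
    job.roots = roots;
    job.root_count = root_count;
    atomic_init(&job.next_root, 0);
    pthread_t tids[MAX_HELPERS];
    int started = 0;
    if (helpers > MAX_HELPERS) helpers = MAX_HELPERS;
    while (started < helpers && pthread_create(&tids[started], NULL, mark_worker, &job) == 0)
        started++;
    mark_worker(&job);
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    return started;
}
```

Because the heap is not mutated while the world is stopped, the child pointers can be read without atomics; only the mark bits are contended.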
why can't fibers be integrated into/built on top of hxcpp?
is it just better integrated with the standard library?
standard lib?
haxes standard library (sys, map, json, string, etc)
not sure what you mean. I'm doing user context switching on register level, a system-level concept, moving stacks around. hxcpp would have to be heavily modified I guess
Most of Std already works here
Fibers sit on the lowest level of the runtime
they are not just a language concept; they define the actual native execution flow
if that makes sense, my brain is totally fried
Will the extern syntax and all the @:buildXml stuff still work with your target?
Still working on it
It's got a workflow it can iterate around
Road to all tests passing
It's mostly on its own, unless I see it do something stupid
But anyway I'll review the output even if all the tests pass, to make sure it didn't take any shortcuts (because it could)
Last stupid thing it did: it exported the BytesData type as an int array
So after it noticed all tests expecting actual byte data (byte[] in C#) would fail
So it added helpers to convert from int[] to byte[] and the other way around, instead of properly mapping BytesData to byte[] 🤪
But that's an iterative process. Overall it's converging closer and closer to the right thing
Just needs to be redirected from time to time
buildXml already works. extern will almost work the same, a bit simpler with extras
you will be required to properly mark blocking calls so we can move the calling fibers into a wait list and block gc in the meantime.
I see, I was asking to see how easy/hard it would be to make a hxcpp-compatible codebase compatible with your new target
This morning I had about 1900 unit tests error, now it's down to about 800 something
That's interesting to see it figure things out. It decided by itself to prefix fully qualified type paths with global:: in the generated C# to prevent collisions with built-in C# stuff or similar. (it turns out, haxe 4's C# target does that too, but I didn't direct claude to do that specifically, it just converged on that same conclusion)
Anyway I'll go through making all tests pass, then I'll look more closely at the output, I'm sure there will be a lot of small optimisations to do to avoid boxing primitive values (legacy C# target did have stuff for that too), because if you cast an int to C# object, you are boxing it, and man this can add up to a huge performance loss
should be pretty easy but we will see
things are slowly coming to life. 1000 fibers spread across 13 threads
rough current state of the target: https://gist.github.com/dazKind/3e001fdee728ba611d466d1e1794a4f9
OMG!
I just realized something
Instead of stopping the world and syncing all threads for GC, what if ...
hahahhaahhaahaa
if that shit works Im gonna scream
Im gonna take the idea of Tagged Heap + Per-Thread Block Allocator by Naughty Dog and combine it with Immix
I think I'm gonna call the GC "Fibrix"
@grizzled laurel so, I had a race-condition in the gc. that with the new super-block logic.. well it made things fast. 2secs for 100k fibers in 4 threads
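A rough sketch of the per-thread half of that idea: each thread bump-allocates out of its own block without locking, and only touches shared state (behind a mutex) when it needs a fresh block. Block and line sizes follow the Immix paper (32 KB blocks, 128-byte lines). How Fiberus actually sizes its superblocks, tags blocks by lifetime as in the Naughty Dog tagged heap, or recycles partially free lines is not shown, and all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE (32 * 1024) /* Immix-style block size */
#define LINE_SIZE 128          /* Immix-style line size */
#define LINES_PER_BLOCK (BLOCK_SIZE / LINE_SIZE)
#define ALLOC_ALIGN 16

typedef struct Block {
    struct Block *next;                  /* global list of every block handed out */
    uint8_t line_marks[LINES_PER_BLOCK]; /* would be set by the marker; hole reuse not shown */
    _Alignas(ALLOC_ALIGN) uint8_t data[BLOCK_SIZE];
} Block;

/* Per-thread bump cursor: the fast path touches no shared state and takes no lock. */
typedef struct { uint8_t *cursor; uint8_t *limit; } ThreadAllocator;

static _Thread_local ThreadAllocator tla;
static Block *all_blocks;
static size_t block_count;
static pthread_mutex_t block_lock = PTHREAD_MUTEX_INITIALIZER;

/* Slow path: get a fresh zeroed block and register it in the shared list. */
static Block *acquire_block(void) {
    Block *b = aligned_alloc(_Alignof(Block), sizeof(Block));
    if (!b) return NULL;
    memset(b, 0, sizeof *b);
    pthread_mutex_lock(&block_lock);
    b->next = all_blocks;
    all_blocks = b;
    block_count++;
    pthread_mutex_unlock(&block_lock);
    return b;
}

void *tl_alloc(size_t size) {
    if (size == 0) size = 1;
    if (size > BLOCK_SIZE) return NULL; /* large objects would go to a separate space */
    size = (size + ALLOC_ALIGN - 1) & ~(size_t)(ALLOC_ALIGN - 1);
    if (!tla.cursor || (size_t)(tla.limit - tla.cursor) < size) {
        Block *b = acquire_block();
        if (!b) return NULL;
        tla.cursor = b->data;
        tla.limit = b->data + BLOCK_SIZE;
    }
    void *p = tla.cursor;
    tla.cursor += size; /* bump: allocation is a compare and an add */
    return p;
}
```

The design choice being illustrated is that contention moves from every allocation to once per 32 KB block; a thread allocating 24-byte objects, for example, takes the lock only once every 1024 allocations.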
Oh my god
That looks... Really good
Also, am I seeing this right but is the GC done in sub-milliseconds?
Wait, I am seeing it wrong
Whooops
I'm still tweaking things before measuring for real. I wanna get rid of all OS syncs where possible
Still though, it looks really good
That'd make it even better yes
But this is really nice!
we need more speed
yes
It's the little things in life
That's https://benchs.haxe.org/alloc/index.html again
You should check out hyperfine, basically a super fancy time replacement
humans can replace time?
hahaha
ok, let's try one more thing: Stop-The-World collections should be kept to a minimum. So we need to make this generational. If we consider the per-thread superblocks as thread-local young generation nurseries, while keeping a shared mature space, we can make most collections thread-local and non-STW while full STW periods in the mature space should become much shorter.
If this works then most object pooling shit (like oimo vectors) would become unnecessary
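The mechanism that usually lets a nursery be collected without stopping the world is a write barrier. Whenever a mature object is made to point at a young one, that mature object is recorded in a remembered set, so a minor collection can treat it as an extra root without scanning the whole mature space. This is a single-threaded sketch with hypothetical names. It does not handle the harder cross-thread case (a young object becoming reachable from another thread or from a migrated fiber), which a per-thread-nursery design has to solve separately, for example by promoting such objects eagerly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { GEN_YOUNG, GEN_MATURE } Generation;

typedef struct GObj {
    Generation gen;      /* YOUNG: thread-local nursery, MATURE: shared space */
    int remembered;      /* already recorded in the remembered set? */
    struct GObj *field;  /* one pointer field keeps the sketch short */
} GObj;

/* Mature objects that point into the nursery; a minor GC treats them as extra roots.
   With per-thread nurseries there would be one of these per nursery (assumption). */
static struct { GObj **items; size_t len, cap; } remset;

static void remset_add(GObj *o) {
    if (remset.len == remset.cap) {
        size_t cap = remset.cap ? remset.cap * 2 : 16;
        GObj **p = realloc(remset.items, cap * sizeof *p);
        if (!p) abort();
        remset.items = p;
        remset.cap = cap;
    }
    remset.items[remset.len++] = o;
}

/* Write barrier: every pointer store into a heap object goes through here. */
void gc_write_field(GObj *holder, GObj *value) {
    holder->field = value;
    if (holder->gen == GEN_MATURE && value != NULL && value->gen == GEN_YOUNG &&
        !holder->remembered) {
        holder->remembered = 1;
        remset_add(holder);
    }
}

/* After a minor GC has promoted the nursery survivors, the entries are no longer needed. */
void gc_minor_done(void) {
    for (size_t i = 0; i < remset.len; i++) remset.items[i]->remembered = 0;
    remset.len = 0;
}
```

Only mature-to-young stores pay the slow path; young-to-anything and mature-to-mature stores are a plain write plus a couple of compares.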
you got to be fucking kidding me
Ono
it fucking works
ok, this is soo endgame
ok, this is just fucking insane. The implementation took claude barely an hour; the questions, feedback and debugging were intense as fuck, but that is exactly the same amount of time it took me to write the whole design specs and shit....
My cycle of Creativity/Design & Grind/Implementation has been officially broken.
the new multithreaded scheduler took a lot more effort and manual fixes earlier
Now with all this shit in place I think we are in a good spot:
Nice & idiomatic Haxe, built-in fiber/coroutine support(we could add await/async via macros even), spread across as many workers as you want, and no manual fucking around with memory or threads and potentially insane native performance(to be proven)... Im almost in heaven
now we can focus on the easy parts, add support for all the language features and enter the unit-test race
wait a sec, i think you can pretty much kill yourself with mutable state across yielding fibers... ok, let's see
Exciting!
I feel like pattern matching in ocaml and unit test iteration is a kind of task that LLMs are pretty good with, the expression of intent -> actual result loop is very quick
Like, I just had a different idea on how to handle closure args in the C# target, and few minutes after, it was done, it's wild
I remember questioning the long term viability of Haxe for me back when C# was already mentioned as a target that would be removed (as I planned to use it extensively and risked being stuck in haxe 4). I knew that with enough time and effort you could simply make things happen yourself (make your own C# target, for example), but I have been wondering if the time and effort you'd need to spend on housekeeping haxe toolchain and all the related stuff would be worth it compared to a more "mainstream" solution. Now, with the rise of those new LLM-based tools, the time and effort required is objectively much lower for this. I mean, it's very possible that I have a fully working C# target by the end of this week, and I started less than a week ago!
There is nothing very magical in the generators: it's pattern matching in OCaml and spitting out the right output for the right pattern. But as there are plenty of edge cases, it's definitely tedious work to do that by hand for a whole target (although it's probably interesting in itself, time is limited), so having a tool that can walk through this in a near-automated way is definitely useful. Even afterwards, if you need to do housekeeping on the target because new features come up in the haxe compiler, it's so much more straightforward with today's tools.
That being said, it wouldn't be that bad if the LLM was used only for the initial big iteration on making the target 1:1 with current haxe features. It's not like haxe is changing that quickly. Anyway, really curious to try the result of all this soon
yeah. lots of things that were a grind or impossible can be done now. Like, my personal capability horizon has increased exponentially.
Some things stay hard though. Watching claude debug multithreading bugs is really hilarious and wasteful
Ah ah for sure, it's far from perfect, but a net positive for me
is valgrinding
It's true that the second next target I would be most interested in after C# is Wasm
Being able to spit out wasm directly from Haxe would be as cool as exporting SWFs back in the Flash age
1.000.000 fibers across 10 threads take 10 secs. I like these numbers
just realized the main thread was just idling...
now it is actually participating
anyway, today I lost a lot of respect for claude
the initial high is over
past a certain complexity and context size it gets stuck in very long debugging loops
in the end I had to go in and debug things and find the cause of a race-condition. turns out it was soooooo far away and getting confused, it missed a simple thing
For me Opus today behaved incredibly bad. Re-reading files, doing things I didn't ask, missing super obvious logic, implementing things in complex elaborate way instead of much simpler and sort of obvious approach.
Hard to say how much of that is real, but it does seem like they are trying stuff on the background and model quality can be arbitrary over the days.
It's not a silver bullet, sometimes it does great work, sometimes it will just go circles indefinitely not finding a solution
Sometimes it is also polluted from what it did before (if you just keep feeding the same conversation), and starting fresh can unblock it
Well, regardless, that means we still need to care about what we are doing, which is probably not a bad thing in itself
My output C# is looking more and more like a... ahem... haxe output thing xD
You try to keep the code pretty until you hit all the edge cases from the haxe unit tests
hxcpp core
cool
67 compile errors remaining before the unit tests can ACTUALLY run
(it used to be thousands of errors)
The last errors are obviously the hardest to solve
But I think now it's just a matter of time until it works
they throttle to get some money i bet ;D
I'm super happy about this. After some hands-on time with the pretty nice C code and a debugger, the 2-level allocators are now working like they should
Ahhh that hits like home:
and this is one of the cleaner outputs :P
How do you handle closures btw in the Go output? It's such a mess to get right, especially with optional arguments
We don't yet, but, we do have a few ways on how we'd like to handle it.
I thought I'd take advantage of C# overload at first, but at the end, it doesn't cover all cases so you need, ANYWAY, to come up with a solution that works on all situations, so I focused on that "all situations" solution, which should actually be pretty fast I believe
The fact that you can create structs on the stack in C# is very useful, and basically saving the day to avoid boxing allocation hell
Indeed!
In go things are also a bit interesting...
Luckily it is very flexible in some regards, sadly not so much in others....
Can you create structs on the stack? (or an equivalent?) (I'd expect you can with such a language)
C# is hard with Haxe on generics
Haxe is pretty loose regarding generics
But C# isn't
But at the same time, you don't want to export generic-free C# code, because you lose a lot of opportunities to get optimised code (from C# compilers), so you try to make them cope together, and as a last resort you use dynamic access when nothing else works
that depends on the lifetime, to my knowledge.
if they can escape their scope
I was asking, specifically for go
So I suppose, this will play really well with ECS stuff right?
if you believe grok, this will play well with many things
Like, let's say, Ceramic, it's a mostly single-threaded framework
so I think it would mostly benefit from your improved GC
But not much from this fiber thing
But these are just assumptions of course
Not having to pool objects because the GC can handle plenty of little objects sounds like a great improvement though
ceramic would start in a fiber on the mainthread
But any async function could automatically load balance to some other worker
Yeah, Ceramic would need drastic design changes to take advantage of this
Like, apart from IO, there is not much that is running in background
atm I just offer spawning & yielding, gonna add waits as well
It just depends on how you iterate your stuff
Cool stuff anyway, really curious to see where your target will end up!
Yes, it depends on the lifetime of the struct. It will only be put on the heap if required.
in cortex I already know I can change all the graph traversals from an iterative loop to a recursive fiber spawn
I think, regarding Ceramic internals, the thing that could benefit from fibers is the visual sorting system, which is entirely sequential for now
But again, before that, I would need to change the data layout, which is definitely not intended to be accessed from multiple threads at the moment
I'd like to address this at some point, although it's really not a requirement, I don't really have a bottleneck here for 2d stuff
Makes sense. In C#, I can probably optimise even a bit more, by skipping the copy of the struct when passing it as argument for my closure system. You can pass as a readonly reference, while staying on the stack (lifetime is "just passing data for the function call")
Down to 57 errors remaining ⏲️
So now I'm thinking, reflaxe is great for smaller-scope language targets. It was great for my shader transpiler using haxe as source
But for full featured haxe target, the whole setup on the compiler is hard to beat.
gonna rewrite ceramic on ocaml now? 
No, I think if I try another target later it will be wasm
For system platforms, there is C++/HXCPP
what draws you to wasm?
C# unlocks more exotic things like Unity (but also Godot and other C# tech)
It allows to optimise further than js output, and is generally more predictable
And I'm also just curious about it to be honest
Regarding predictability, I'm talking about how js runtimes work with their JIT compilers: it needs some warmup to detect hot paths and jit-compile them. But also sometimes you have bad surprises where it decides not to optimise some method because some reason
At least with wasm you know a bit more what you are running
Just want to explore this stuff
ah k
And being able to do that by using haxe would be really cool
some software uses wasm for extensions (zed is the main one I know), could be cool to use haxe there
So far one of the most mainstream language doing that is Kotlin
Exactly
Anyway, I think wasm target for haxe would be a good addition in general. It could be much faster iterations than going through, say C++ -> emscripten -> wasm
rust or go?
i think c# has wasm support too
maybe i misunderstood what you said
From what I can see, ways to export C# to wasm are experimental
And there is wasm, and wasm-gc
I'm talking about Wasm-GC export specifically
ahhh
(which is what haxe should export to)
im guessing go compiles their runtime into wasm
I don't know much about go, but I'm sure MKI and PXShadow know better
Wasm as a first class citizen would be cool, but nothing necessary for me for sure
C# target is linked to a more precise need. It ensures Haxe 5 will still work on all Ceramic targets
43 errors 
xD
my current state https://gist.github.com/dazKind/3e001fdee728ba611d466d1e1794a4f9
still a long way to go. at least the core is stable now
single & multithreaded modes work
Fiberus is a technically impressive project with a solid foundation, this is definitely LLM-sounding xD xD xD
ofc it is!
It can't help adding that kind of sentence, as if everything were a marketing pitch or a self-help document
Now I'm a bit scared of how far from the correct behaviour the "code that compiles correctly" will be
?
Like, right now it's only fixing the compile errors of all the haxe code that comes from the unit tests
So I'm not executing those tests yet
ah I see
That said, I have the bootstrap project that does compile and run as expected, so it's a good sign
yeah, first 80% done, now the other 80% await you
Maybe it won't be that bad, we'll see
28
I think a Haxe->WASM-GC target would be of great value, and a very good health sign for Haxe itself. Then again, I would rather have haxe/language server be rock solid.
Re fibers: the architecture I use, mainly for multiplayer games (but singleplayer too, for the same reasons), doesn't really allow generic multithreading at all.
Things are divided into black boxes, and those must appear as synchronous input-output machines. Userland code can't use threading, otherwise it introduces non-determinism that kills every benefit the architecture brings (namely post-mortem debugging/replays).
That said, it can have multithreading, but it has to be done at architecture level and e.g. gameplay code can't see anything as multithreaded.
can't haxe already do that via HL/C?
just replace the GC interface with wasmgc
- Not sure if HL/Hxcpp would be faster than JS at this point; I assume not, unless you use custom tricks.
- I expect that replacing the GC with wasmgc isn't as trivial as you might think. It's not a standalone GC you can plug into things, it's part of wasm, so it would need mapping to the wasm runtime.
hmmm
That would also mean going through c compilation in order to get wasm: what would be better is to directly output wasm from haxe
So now unit tests compile, but I still have a lot of warnings: unreachable code and variables that are not used. Gotta clean this up before going further
Current mood
ah yes double nested null
At first I was going in the wrong direction, trying to address double nested nulls pretty much everywhere in gencs.ml
But I ended up doing the same as previous haxe4 C# target: it transforms the haxe AST BEFORE using it to generate C#, which is a much simpler way to manage this
(although it's still full of edge cases)
I wish generics in haxe were stricter, because, man, that makes things much trickier in the C# output
Stricter how?
MyType<Dynamic> is valid in haxe, and is compatible with MyType<Int> or MyType<SomeClass>, whereas in C# you cannot easily cast between those. But anyway I'm just complaining because I'm working on the C# output :D, I need to make it work anyway
The difficulty is to try to cope with that without relying too much on dynamic access, but it seems unavoidable in some cases
Oh yeah, that is a weird thing about dynamic in haxe
That being said, this looseness can be convenient when you write your haxe code :D, but it's hell when you transpile that to C# xD
It's just one of the ways in which dynamic is a dangerous type unfortunately
Now all (non sys) unit tests do compile btw in my new C# target, I'm tackling the runtime errors now (related to casts heh)
Yeah, I'm just avoiding it as much as I can in Ceramic anyway
Do you box for MyType<Int>?
I'm pretty happy with what I came up with for the closure system. All functions (closures, functions referencing instance methods) inherit from haxe.lang.Function and can be called without any boxing/unboxing on the heap
Normally, no but I'm not done yet, so can't say for sure
The boxing should happen when using Dynamic however, can't do otherwise
it becomes MyType<object> in C#
I think on most targets MyType<Int> results in boxing, unless @:generic is used
MyType<Dynamic> -> MyType<object>, but MyType<Int> -> MyType<int>
because you need to be able to pass it into function test<T>(a:MyType<T>) for example
hxcpp also has a complex system for dealing with Array<Dynamic> <-> Array<Int> variance
Yeah, we'll see, maybe I'll have to box generics unless using @:generic indeed
But it CAN'T be a thing to do boxing on, say, Array<Int> right?
Yeah boxing for Array<Int> would be pretty bad
The haxe documentation seems to make it clear that non-@:generic type parameters are type-erased, so they have to be boxed
https://haxe.org/manual/type-system-generic.html
Here is an issue about arrays specifically: https://github.com/HaxeFoundation/haxe/issues/4872
Ah yes that's useful thanks!
so looks like maybe this was never resolved for the legacy c# target. Maybe nicolas' or hugh's design might be useful
I will try that, the "upgrade array" logic from hugh seems spot on
This is also an old c# issue: https://github.com/HaxeFoundation/haxe/issues/6397
well, the other one seems more useful
Yeah, it seems to me that the way to go is to use object for generics from haxe classes. Except for arrays which have their own custom implementation that doesn't need boxing (unless upgrading to Dynamic)
There is a feature suggestion somewhere to have something like @:genericPrimitivesOnly, which would be useful for general types
Sounds like a nice thing yeah
btw, here is a list of c# issues that got closed when the target was removed: https://github.com/HaxeFoundation/haxe/issues?q=is%3Aissue state%3Aclosed closed%3A2024-02-06 (c%23 OR cs)
Thanks! For now I'll stick to making the test suite work, and we'll see if there is anything relevant in those issues
I think the boxing on generics by default is fair, there is @:generic from haxe already for optimized path, and in a lot of situation we need generic for actual class instances, not value types
As long as the arrays do not need boxing, I'm happy with that
I see this one once a year, always happy I could contribute to the main lib in one of the linked issues
Ok, since it looks like I have a solid solution for raw processing via fibers now, I was wondering about solving IO in a similar way. Turns out io_uring or epoll + a thread pool could be the answer. And the fun part: the synchronous std lib APIs like File.read and Socket would all appear blocking (the interface doesn't have to change), but internally they would just yield and later wake their calling fibers, essentially blocking nothing else
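A single-fiber sketch of that idea, with the kernel simulated: `fiber_read` looks blocking to its caller, but it only records a request and switches back to the scheduler until a completion arrives. All names (`IoRequest`, `fiber_read`, `complete_pending`) are hypothetical, and POSIX `ucontext` stands in for a real context switch; an actual runtime would submit to io_uring (or epoll plus a thread pool) and resume the fiber when the completion is reaped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical in-flight request, standing in for an io_uring SQE/CQE pair. */
typedef struct {
    char *buf;
    size_t len;
    size_t result;
    int complete;
} IoRequest;

static ucontext_t sched_ctx, fiber_ctx;
static char fiber_stack[64 * 1024];
static IoRequest *pending;                 /* at most one request in this sketch */
static const char *fake_file = "hello from disk";

/* Looks blocking to the caller; internally parks the fiber until completion. */
static size_t fiber_read(char *buf, size_t len) {
    IoRequest req = { buf, len, 0, 0 };    /* lives on the fiber's own stack */
    pending = &req;                        /* "submit" */
    while (!req.complete)
        swapcontext(&fiber_ctx, &sched_ctx);   /* yield: the thread is not blocked */
    return req.result;
}

/* Stand-in for the kernel finishing the read and posting a completion. */
static void complete_pending(void) {
    size_t n = strlen(fake_file);
    if (n > pending->len) n = pending->len;
    memcpy(pending->buf, fake_file, n);
    pending->result = n;
    pending->complete = 1;
    pending = NULL;
}

static char out[32];
static size_t out_len;

static void fiber_main(void) {
    out_len = fiber_read(out, sizeof out - 1);   /* plain "blocking" call */
    out[out_len] = '\0';
}

static int run_io_demo(void) {
    int idle_ticks = 0;
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
    fiber_ctx.uc_link = &sched_ctx;
    makecontext(&fiber_ctx, fiber_main, 0);

    swapcontext(&sched_ctx, &fiber_ctx);   /* fiber runs until it submits */
    while (pending) {
        /* A real scheduler would run other fibers and poll completions here;
           we pretend the "kernel" needs three ticks. */
        if (++idle_ticks == 3) complete_pending();
    }
    swapcontext(&sched_ctx, &fiber_ctx);   /* wake the fiber; fiber_read returns */
    return idle_ticks;
}
```

`run_io_demo()` returns 3 (the scheduler stayed free for three ticks while the read was pending), and `out` ends up holding `hello from disk`.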
Gonna hack this on the weekend
I started working on this generic hell
Array<T> (in haxe) is becoming Array (non-generic), and internally it can switch storage (either int[], double[], bool[] or object[])
That means we can cast from Array<Int> to Array<Dynamic>, and from Array<Dynamic> to Array<Int> (given that the array only has ints)
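A rough analogue of that storage-switching array, sketched in C (the real target emits C#). The names (`HxArray`, `arr_push_int`, `arr_push_obj`, ...) are hypothetical, and only int and boxed storage are modelled; double and bool storage are left out. Ints stay unboxed until a non-int value forces a one-way upgrade, and reading ints back works on either storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical boxed value: an int or an opaque object pointer. */
typedef enum { VAL_INT, VAL_OBJ } ValueKind;
typedef struct { ValueKind kind; union { int i; void *obj; } as; } Value;

/* Hypothetical array whose storage switches from unboxed ints to Values. */
typedef enum { STORE_INT, STORE_DYN } StoreKind;
typedef struct {
    StoreKind kind;
    size_t len, cap;
    void *data;                   /* int* for STORE_INT, Value* for STORE_DYN */
} HxArray;

static void arr_init(HxArray *a) {
    a->kind = STORE_INT;
    a->len = a->cap = 0;
    a->data = NULL;
}

static size_t elem_size(const HxArray *a) {
    return a->kind == STORE_INT ? sizeof(int) : sizeof(Value);
}

static void arr_reserve(HxArray *a, size_t need) {
    if (need <= a->cap) return;
    size_t cap = a->cap ? a->cap * 2 : 4;
    while (cap < need) cap *= 2;
    void *p = realloc(a->data, cap * elem_size(a));
    if (!p) abort();
    a->data = p;
    a->cap = cap;
}

/* One-way "upgrade": box every stored int into a Value slot. */
static void arr_upgrade(HxArray *a) {
    if (a->kind == STORE_DYN) return;
    size_t cap = a->cap ? a->cap : 4;
    Value *dyn = malloc(cap * sizeof(Value));
    if (!dyn) abort();
    const int *ints = a->data;
    for (size_t i = 0; i < a->len; i++) {
        dyn[i].kind = VAL_INT;
        dyn[i].as.i = ints[i];
    }
    free(a->data);
    a->data = dyn;
    a->cap = cap;
    a->kind = STORE_DYN;
}

static void arr_push_int(HxArray *a, int v) {
    arr_reserve(a, a->len + 1);
    if (a->kind == STORE_INT) {
        ((int *)a->data)[a->len++] = v;   /* no boxing on the fast path */
    } else {
        Value *d = a->data;
        d[a->len].kind = VAL_INT;
        d[a->len++].as.i = v;
    }
}

static void arr_push_obj(HxArray *a, void *obj) {
    arr_upgrade(a);                       /* a non-int forces dynamic storage */
    arr_reserve(a, a->len + 1);
    Value *d = a->data;
    d[a->len].kind = VAL_OBJ;
    d[a->len++].as.obj = obj;
}

/* Reads an int from either storage; *ok is 0 if the slot is not an int. */
static int arr_get_int(const HxArray *a, size_t i, int *ok) {
    assert(i < a->len);
    if (a->kind == STORE_INT) { *ok = 1; return ((const int *)a->data)[i]; }
    const Value *v = (const Value *)a->data + i;
    *ok = v->kind == VAL_INT;
    return *ok ? v->as.i : 0;
}

static void arr_free(HxArray *a) { free(a->data); arr_init(a); }
```

Pushing ints keeps `kind == STORE_INT`; the first `arr_push_obj` copies every int into a `Value` slot, after which `arr_get_int` still returns the ints and reports `ok = 0` for non-int slots.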
I'm also erasing generics on all output C# classes as well (and simply use object and cast on access), which will also solve casting from different generic types (and work as long as you used compatible types)
The added cost is all the casts on access, but to be honest, that's probably fairly negligible (casting object to SomeType) as long as there is no boxing/unboxing involved, given how that simplifies the code generation
If there is some hot path in the code that needs to be optimized, there are options anyway: @:generic, native arrays...
(The haxe 4 C# target has a -D erase_generics define that does something similar. In my case, I'll make that the default)
So at the beginning, I was more letting Claude Code "freely" implement the generation from the plan, guidance and reference data I gave it. Once the current plan was accepted, I put it in "edit automatically" mode.
Now that I am further into the implementation, details need to be very precise to handle the generics and that kind of thing. I am now reviewing every change it makes one by one, asking for more details when I'm not sure what it is trying to do. Basically moving to slower but more precise iterations. Every edit has to make perfect sense at this stage, so that it converges to a sane implementation.
Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
For each frame our pipeline constructs a scene graph with React then
-> layouts elements
-> rasterizes them to a 2d screen
-> diffs that against the
This has to be rage bait
like what the actual fuck...
we need wisdom-tui now
Someone give that guy a link to casey muratori's refterm RIGHT NOW
Wait WHAT
I read too quickly
I read "react" so I thought he was talking about the web UI in, like, the vscode extension of Claude Code
But that's react for rendering on the terminal ???
yeah, kinda similar to how react-native works
react on its own doesn't have a renderer afaik
I'm pretty sure they retro-fitted the vscode extension chat interface into the terminal
Maybe
I have to admit I have come to like the design and planning aspects using claude code and other LLMs
it has become soo much easier to discover and explore weird nooks and crannies
It's also pretty useful to extract information from an existing code base to help you make decisions
yeah, that is the basic case
I really feel like Dr Frankenstein, I'm researching and designing stuff that only makes sense in my brain
and claude code is my Igor
ok, non-blocking file IO is working
next are sockets
Is that something from haxe5 std or just your own stuff?
This is fiberus specific. In haxe code you just use the std functions (as if they were blocking) and the whole fiber architecture makes them async/non-blocking under the hood
so it's like if all your code used async/await? :o
I see, it's "blocking" for the fiber, but not blocking the thread
yep. In fact, there isn't even a need for async/await concepts. Maybe await, in cases where you wanna sync fibers, but I already have a super performant counter-based suspension system in place
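A minimal, single-threaded sketch of counter-based suspension, using POSIX `ucontext` as a stand-in for a real context switch. Everything here (`Counter`, `counter_wait`, the round-robin scheduler) is hypothetical and not the Fiberus API; a multi-worker runtime would need atomic counters and a proper wake-up list instead of rescanning every fiber.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define MAX_FIBERS 8
#define STACK_SIZE (64 * 1024)

/* Hypothetical wait counter: jobs decrement it, waiters park until 0. */
typedef struct { int value; } Counter;

typedef struct {
    ucontext_t ctx;
    char stack[STACK_SIZE];
    void (*fn)(void *);
    void *arg;
    int done;
    Counter *waiting_on;          /* non-NULL while parked on a counter */
} Fiber;

static Fiber fibers[MAX_FIBERS];
static int fiber_count, current = -1;
static ucontext_t sched_ctx;

static void fiber_entry(void) {
    Fiber *f = &fibers[current];
    f->fn(f->arg);
    f->done = 1;                  /* returning follows uc_link to sched_ctx */
}

static void fiber_spawn(void (*fn)(void *), void *arg) {
    assert(fiber_count < MAX_FIBERS);
    Fiber *f = &fibers[fiber_count++];
    f->fn = fn;
    f->arg = arg;
    f->done = 0;
    f->waiting_on = NULL;
    getcontext(&f->ctx);
    f->ctx.uc_stack.ss_sp = f->stack;
    f->ctx.uc_stack.ss_size = sizeof f->stack;
    f->ctx.uc_link = &sched_ctx;
    makecontext(&f->ctx, fiber_entry, 0);
}

static void fiber_yield(void) { swapcontext(&fibers[current].ctx, &sched_ctx); }

static void counter_wait(Counter *c) {
    while (c->value > 0) {        /* park instead of spinning on the thread */
        fibers[current].waiting_on = c;
        fiber_yield();
    }
    fibers[current].waiting_on = NULL;
}

/* Round-robin over runnable fibers; parked fibers are skipped until their
   counter reaches zero. Stops when nothing can make progress. */
static void scheduler_run(void) {
    int progressed = 1;
    while (progressed) {
        progressed = 0;
        for (int i = 0; i < fiber_count; i++) {
            Fiber *f = &fibers[i];
            if (f->done || (f->waiting_on && f->waiting_on->value > 0)) continue;
            current = i;
            swapcontext(&sched_ctx, &f->ctx);
            current = -1;
            progressed = 1;
        }
    }
}

/* Demo: one waiter depends on two jobs. */
static char trace[16];
static Counter jobs_left;

static void job(void *name) {
    strcat(trace, name);
    fiber_yield();                /* pretend the job takes a while */
    jobs_left.value--;
}

static void waiter(void *unused) {
    (void)unused;
    counter_wait(&jobs_left);
    strcat(trace, "W");           /* only runs after both jobs finished */
}

static const char *run_counter_demo(void) {
    trace[0] = '\0';
    fiber_count = 0;
    jobs_left.value = 2;
    fiber_spawn(waiter, NULL);
    fiber_spawn(job, "A");
    fiber_spawn(job, "B");
    scheduler_run();
    return trace;
}
```

The waiter parks on `jobs_left` and is skipped by the scheduler until both jobs have decremented it, so `run_counter_demo()` produces the trace `ABW`.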
lol, was hunting some weird timing gap in my tests
turned out I had put a 10ms sleep in one of the networking tests.
It's still amazing to spawn a server loop in one fiber and have a client loop in a second
Having my first big refactor experience using claude. Interesting and horrible at the same time
refactoring via ai sounds a bit funky
I just refactored one of my rest libraries, I used claude to give ideas on various options, i tried using them in small examples and figured out an amalgamation of different ideas
I really like the API change though
my biggest issue is that the build agent is afraid of complexity
even if I use the plan agent extensively and prep everything
I have to constantly pull it back on track
seems like a deliberate tactic to waste tokens
lol
Same experience. It gets worse when the codebase gets bigger
Also reached the point in my workflow where I only get 2 hours of intense pairing with claude before hitting the session limit. I'm on a Premium Team Account and my weekly limit is at 50% already. They really want my money
I just don't see how this is supposed to scale
sounds like the silicon valley model
so its been a bit more time now, what do you guys actually enjoy about claude code/AI stuff? Like, to me it kind of sounds like you're no longer programming but more of a manager
I'm still quite reluctant to dive too deep into using it; it seems too good to be true, and there's probably some rug pull gonna happen
I guess it's like the 'I have staff' concept
The Team seats are not a good deal. It's basically a way to get more money from businesses. If you want more tokens, the best deal is Claude Max 5x or 20x
You are writing much less code yourself, but you still do the work of architecting your software. When it works well it's very cool, when it doesn't, well you go back to more micromanaging.
Overall, for doing something like the C# target for Haxe, it's a massive time gain. The scope of things to handle is quite large, and Claude really helps with the iterations, even if it's definitely not perfect
hmm, I think I will try that tomorrow
today I refactored a shitton in the codegen. Also went for a C-AST before emitting code
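As a rough illustration of emitting through a C-AST rather than printing strings directly, here is a tiny expression tree plus printer. The node kinds and names (`CNode`, `emit`, and the `hx_trace` callee in the example) are hypothetical and only cover ints, identifiers, binary operators and single-argument calls; binary expressions are always parenthesized so precedence never has to be reasoned about at print time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal C AST: build the tree first, print it last. */
typedef enum { C_INT, C_IDENT, C_BINOP, C_CALL } CKind;

typedef struct CNode {
    CKind kind;
    const char *text;             /* identifier, operator, or callee name */
    int value;                    /* used by C_INT */
    struct CNode *lhs, *rhs;      /* binop operands; a call's argument is lhs */
} CNode;

typedef struct { char *buf; size_t cap, len; } Out;

/* Appends s, truncating safely; buf stays NUL-terminated (cap must be >= 1). */
static void emit_str(Out *o, const char *s) {
    size_t n = strlen(s);
    if (o->len + n >= o->cap) n = o->cap - o->len - 1;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

static void emit(Out *o, const CNode *n) {
    char tmp[32];
    switch (n->kind) {
    case C_INT:
        snprintf(tmp, sizeof tmp, "%d", n->value);
        emit_str(o, tmp);
        break;
    case C_IDENT:
        emit_str(o, n->text);
        break;
    case C_BINOP:                 /* always parenthesize: no precedence table */
        emit_str(o, "(");
        emit(o, n->lhs);
        emit_str(o, " ");
        emit_str(o, n->text);
        emit_str(o, " ");
        emit(o, n->rhs);
        emit_str(o, ")");
        break;
    case C_CALL:
        emit_str(o, n->text);
        emit_str(o, "(");
        if (n->lhs) emit(o, n->lhs);
        emit_str(o, ")");
        break;
    }
}
```

Building `hx_trace(x * 2 + 1)` as nodes and emitting it yields `hx_trace(((x * 2) + 1))`. Having the tree around also means rewrite passes can run before any text is produced.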
Good call
All major parts in the runtime are functional, now it's time to widen the language support
On my side I have 97 runtime errors and 1125 test failures to address
Tackling every item one by one
And also did a lot of refactoring today and yesterday
Hunted redundancies, tried to make the whole thing cleaner
sounds like fun
I'm debating with myself whether I should keep the dynamic worker creation
atm the app start is maybe a bit too advanced. It's super quick on the mainthread and you can call Fiber.createWorkers(4); at any point to create new worker threads. creating threads is expensive though and takes some time till they finish registration. during this time the scheduler will basically run in singlethreaded mode and serve all fibers till the workers are ready and join the fray
vibefiber
Cool stuff!
load testing has become super interesting. Seeing some nice numbers but also some issues in the GC. Also I'm slowly digging coding in C again. claude becomes more and more useless as a project grows
Both Sonnet and Opus have a tendency to duplicate code even if there is a helper available somewhere already. The main difference is that the threshold where Opus does that is higher than with Sonnet
But now that my ocaml C# generator is getting big, it constantly makes duplicate code
I guess that's the current limit of today
I had the same happen to me on my markdowns from 2 months ago
It would need to know about the helper in the first place. Or have instructions to always look for them.
Even when you tell it, it keeps making the mistake
Only thing that works now is to let it make the thing work, and then explicitly ask it to cleanup redundancy in a second step
That's a good way to do things yeah, if I had tokens spared I would always let it spec it out, review the spec, implement it, review the implementation.
Yeah that consumes a lot of tokens for sure. Moved to Claude Max 20x exceptionally for one month, to do this C# thing
yeah, it can spiral out of control quickly
Workflow-wise I switched to opencode cli and run opus, codex and gemini now with grok as edgelord on the side
Do you get better results?
the models behave quite differently but it helps my evaluations quite a lot.
claude is like an adderall junkie, all over the place, codex is a slow, deliberate smartass, gemini is the zero-bullshit pragmatist and grok, well, is grok
Looks like you got a whole team working for you now
I'm pushing shit to the limits
Beware of politics between the LLMs now
I'll stick to claude code, getting closer and closer to the full test suite passing now
98% of unit tests are successful now
(of course the last tests are the trickiest)
yep. I have that still in front of me. Mostly focusing on the runtime at the moment
Makes sense
concerning the huge number of tests, my idea is to divide and conquer. I'm gonna try to cluster the tests into language features instead of issue numbers and then go feature by feature by splitting up the testsuite
Yeah, to be honest I didn't care much about categorizing tests, mostly iterated from the most impactful fixes over and over again, making the generation converge more and more to the right thing, making sure my bootstrap tests keep working while increasing the number of unit tests passing
But wasn't that very token-heavy and regression-prone? Did you have to manually intervene a lot?
enums and reflection are a funny one
I'm doing iterations where I'm planning for "a single fix" that may impact "multiple tests"
Anything I catch from the plan I do it
Then I check the result and ask questions as needed, or tell precisely if something is wrong
If that's ok, good, can move to a new iteration
Yeah that consumes a lot of tokens likely
But I do have them for one month, so it's fine
And every time it is in plan mode for the next iteration, I ask it to review the current status: it runs the unit tests and decides what to tackle next (or at least suggests some options)
That works pretty well so far, but I need to watch out for redundancy and duplicate helpers, and for stupid decisions (like relying on C# reflection when I'm asking EXPLICITLY not to do it (: )
yeah, claude takes a lot of shortcuts in build mode
wow, finally found the issue with my superblock explosions
and now I know why hxcpp has a big object heap
I was allocating a bytearray in the nursery (max 4KB per object) and it was completely screwing up the allocation logic
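A simplified sketch of why big byte arrays shouldn't go through the nursery: allocations above a per-object cap (the 4KB quoted above) are routed to a separate large-object list instead of the bump-pointer region, similar in spirit to hxcpp keeping a separate heap for big objects. Names and sizes (`Heap`, `gc_alloc`, `NURSERY_SIZE`) are assumptions; the real Fiberus allocator (superblocks, thread caches) is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NURSERY_SIZE    (256 * 1024)   /* assumed nursery size */
#define NURSERY_MAX_OBJ 4096           /* per-object cap from the chat */
#define ALIGN           16

typedef struct LargeObj { struct LargeObj *next; size_t size; } LargeObj;

typedef struct {
    unsigned char *start, *top, *end;  /* bump-pointer nursery */
    LargeObj *large;                   /* large objects live outside it */
    size_t large_count;
} Heap;

static size_t align_up(size_t n) { return (n + ALIGN - 1) & ~(size_t)(ALIGN - 1); }

static int heap_init(Heap *h) {
    h->start = malloc(NURSERY_SIZE);
    if (!h->start) return 0;
    h->top = h->start;
    h->end = h->start + NURSERY_SIZE;
    h->large = NULL;
    h->large_count = 0;
    return 1;
}

/* Returns NULL when the nursery is full: the caller would run a minor GC. */
static void *gc_alloc(Heap *h, size_t size) {
    size = align_up(size);
    if (size > NURSERY_MAX_OBJ) {
        /* Big objects bypass the nursery, so they never exhaust the bump
           region or get copied by every minor collection. */
        LargeObj *lo = malloc(sizeof(LargeObj) + size);
        if (!lo) return NULL;
        lo->size = size;
        lo->next = h->large;
        h->large = lo;
        h->large_count++;
        return lo + 1;                 /* payload follows the header */
    }
    if ((size_t)(h->end - h->top) < size) return NULL;
    void *p = h->top;
    h->top += size;
    return p;
}

static void heap_destroy(Heap *h) {
    while (h->large) {
        LargeObj *next = h->large->next;
        free(h->large);
        h->large = next;
    }
    free(h->start);
}
```

A 24-byte object lands in the nursery (rounded up to 32 bytes), while a 64KB byte buffer goes to the large-object list and leaves the nursery's bump pointer untouched.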
now a bit more tweaking of the generational remembered sets and then hopefully shit won't explode anymore
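For context, a generational remembered set is typically fed by a write barrier: whenever a mature object is made to point at a nursery object, the mature object is recorded so the next minor GC can treat it as an extra root without scanning the whole mature heap. This is a generic, single-threaded illustration with hypothetical names, one pointer field per object and a fixed-size set; it ignores cross-thread stores entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REMEMBERED 1024

typedef struct Obj {
    int is_young;                 /* 1 while the object is in the nursery */
    int remembered;               /* already recorded in the set? */
    struct Obj *field;            /* a single pointer slot, for brevity */
} Obj;

typedef struct {
    Obj *entries[MAX_REMEMBERED];
    size_t count;
} RememberedSet;

/* Every pointer store goes through this barrier. Only old->young edges
   are recorded, and each holder at most once. */
static void write_field(RememberedSet *rs, Obj *holder, Obj *value) {
    holder->field = value;
    if (!holder->is_young && value && value->is_young && !holder->remembered) {
        /* Simplification: a real collector would grow the set or fall back
           to something like card marking instead of asserting. */
        assert(rs->count < MAX_REMEMBERED);
        holder->remembered = 1;
        rs->entries[rs->count++] = holder;
    }
}

/* After a minor GC has processed these roots, the set can be cleared. */
static void remembered_clear(RememberedSet *rs) {
    for (size_t i = 0; i < rs->count; i++) rs->entries[i]->remembered = 0;
    rs->count = 0;
}
```

Storing a young object into an old one twice records the holder only once; old->old and young->old stores are not recorded.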
10134 successes / 104 failures / 34 errors
still pretty good
Each new iteration solves 1 to 4 test cases at a time now. That's slow, but it works
(and I can review every change to make sure it's fixed correctly)
Too slow. Time to spawn one agent per failing test in worktrees on some high-core-count machine.
Well, it's ok for me because it's still faster than without it, and it allows me not to focus 100% of my attention on it
I need to reach the "all tests passing". Then after that I'll have to see what I do regarding C#/.NET externs. I definitely don't want to go the dll extraction route as it was done in the previous target
Because from experience this is breaking easily
And dlls don't provide all the information: documentation is missing, for example
So I'd rather generate actual extern files, like what is done on other targets
Expose something like .NET Standard 2.1 API (which is a common denominator of all fairly recent .NET versions)
And probably make a script that generates those from official Microsoft data (I think all the data needed, with documentation, exists as xml or json)
10189 successes, 80 failures, 27 errors
It's GETTING CLOSER 
Keep pushing!
10217 successes, 58 failures, 26 errors
fun exercise for today: instead of just generating code, can we actually get rid of most of the code?
My guess will be a hard no, but interested to hear the results
one more important insight: claude is especially bad at iterating on/continuously developing codebases. It abandons features halfway and leaves parallel structures in place
it's simple biasing. it sees the old code and considers it a good implementation and suddenly he decides to keep that shit instead of moving it to the new structure
I started to have it go through code to be refactored and change all the comments to a negative tone and mentioning the new better way
all the code and comments are essentially a prompt injection that your current session has to compete with
"I'll keep all versions of the code I half generated to preserve retro-compatibility"
"I have found the issue!"
"ok, now I confused myself. let's start from the top"
"let's just skip this for now and do something totally different nobody asked for"
"This crash is unrelated to the fix we just did. we have successfully implemented the fix"
xD
"I finished the task, half of the tests are now broken, but that's acceptable given that now we can do X"
When you spend too much time with Claude, you start to think of it as a person with its own twisted character and ways of behaving
Question
What's ur system specs
macbook pro M4, but that doesn't make much of a difference for claude stuff?
You should try local hosting an llm
I wonder how well would this kind of loop work:
- Let LLM identify encapsulated part of code and define its contract.
- Let it task another agent to come up with implementation of that contract.
- Let them talk it out, which one is better and why.
This would require smart enough model to find and extract the encapsulated part well (it doesn't have to be a "class", it could be a whole layer of a system). It would also need to capture the relationships and requirements, and any solid available helper code (libraries level stuff), so they both work on the same basis.
anthropic ran a multi-week experiment with coders, leaders and a judge, and had them build a browser. It was a complete shitshow that cost millions
I vaguely remember something like that yeah.
This might not work out always, but it could be interesting to see. Assuming solid tests and some basic readability vs loc metric, it's just one more thing to do with the code, once the tokens are cheap.
I already played with that, but that's nothing compared to a remote llm like claude in terms of power
Claude has been pretty ok in my latest iterations
Kimi K 2.5 apparently is closing the gap a bit, but you would need like one or two of those 512GB macs to run it.
oh boy, we are slowly getting there
```
=== Fibrix GC Statistics ===
--- Heap ---
Current size: 52428800 bytes (50.00 MB)
Peak size: 85983232 bytes (82.00 MB)
Live objects: 401878
--- Memory Blocks ---
Superblocks: 25 total, 31 in use, 10 free
Blocks: 0 in use / 1600 total (64 per superblock)
Block hash table: 1600 / 262144 entries (0.6% full)
--- Allocation ---
Total allocations: 401888 (69.30 MB)
Fast path: 401877 (100.0%)
Slow path: 2392
Large objects: 10
--- Thread Caches ---
Active caches: 11
Superblock ops: 67 acquires, 52 releases, 16 destroyed
Cache efficiency: 8 hits, 29 misses (21.6% hit rate)
--- Major Collections ---
Total collections: 1
Last collection: marked 37 objects in 73 us, swept in 1.97 ms
Total GC time: 3.32 ms
Objects swept: 10
Bytes freed: 86159104 (82.17 MB)
--- Minor Collections (Generational) ---
Total collections: 5
Objects evacuated: 208 (1.32 MB promoted to mature)
Last minor GC: 1.79 ms
Total minor time: 24.32 ms
--- Write Barrier ---
Barriers triggered: 0
Remembered set: 0 scanned, 0 added (37 cross-thread)
===========================
```
Yeah that's a no go
It adds up
Don't forget to make your own computer, and your own discord as well
sometime later, don't have enough money to purchase a silicon mine yet
nice! did the same last year. though was a bit disappointed in the end
dont be stupid. learning is good
What's this
Sure, I think my reaction is more related to Cobalt talking every few days about wanting to do something and then changing and changing again all the time. Exploring and experimenting various things is great, but not sure you can really learn that much without just a little bit more focus. Go for the llm training of course, not saying the contrary of that
ah I see
Regarding local llms, about six months ago I tried running some locally on an M1 macbook pro with 32GB of RAM. I wanted to know how far you can push them on a reasonably modern laptop that a lot of people could have, and what kind of features they could allow adding, locally, to some software
Results were not so bad, but it's nothing to compare with connecting an API to a remote LLM
oh boy. getting closer!
not sure where it got the STW minor from. Guess I have to double check the numbers or setup the TechEmpower benches myself
There is still an issue with the superblocks at higher worker & request counts. let's see if we can solve that
I just had the claude session from hell. it completely forgot shit and made bogus assumptions and then hunted bugs he introduced declaring them not his fault
none of the usual tricks to keep him on point helped...
i look at my other projects and instantly lose motivation to work, so i just end up switching a lot
i mean, duh
a Mac isn't a whole datacenter-
That being said, Nemotron 3 Nano is pretty damn good-
responses aren't NEARLY as quick but its free so
even if tool calls are a bit... iffy
Are you sure you're just not switching because you "don't know what's next"?
Because if you don't define that, or it's "too loose" (you'll have to figure out that definition), that's what's gonna happen
i am sure
i have things to do
e.g. continue the project parser for beacon
or finish the bytecode parser for luao
finally a proper sawtooth
this is the http 10s 10k rps run on 21 threads (main + 20 workers)
```
Running load test: 10000 req/s for 10s...
Requests [total, rate, throughput] 100000, 10000.21, 9999.45
Duration [total, attack, wait] 10.001s, 10s, 754.514µs
Latencies [min, mean, 50, 90, 95, 99, max] 87.97µs, 415.256µs, 234.103µs, 683.071µs, 934.439µs, 2.948ms, 42.542ms
Bytes In [total, mean] 144806869, 1448.07
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:100000
Error Set:
```
hmm, i'm not sure you get what i mean - are you keeping it "in your head"?
i really need to look into whatever the fuck these fibre things are
because I have little context for daz's upcoming target
You mean genuine high-performance concurrency without the complexity of OS threads or async/await everywhere?
I have to destroy node.js, jvm and go
This is the best description I know of https://wren.io/concurrency.html
now, why do we need a full custom target for this? no clue
How different is it? because I've literally just started learning about threading, things like mutexes/deques etc, because I'd like to use it on the backend server
do the same concepts apply? what's the difference with fibres
you basically never want to use os threads for networking stuff directly, they are heavy and resource intensive
I have a mental pin of Daz linking some article about fibres in the past but I don't actually have a need to search for it yet so it's just idle in my brain
It makes sense to redo a lot of the standard library to make sure it plays well with fibers I guess
fibers are a way to do concurrency, like coroutines. crucially they don't do parallelism on their own, you would have a thread pool or something to handle that
it's almost all lock-free. no mutexes etc, I avoid all expensive syscalls
you write normal haxe code, you use sockets as if they were blocking but the runtime makes it non-blocking for you. Instead of sending stuff to threads and/or use async/await, you spawn your functions/closures on a fiber and the magic happens
when your app starts you init the scheduler with some workers and they will steal & loadbalance all your fibers
oh no, it's like you're breaking a couple hours of threading investigation
I will break all your brains
i've gone to the low level part, and you're trying to drag me back up it seems lol
nah, just continue. it's useful stuff to learn
On the surface it sounds like, its not too different though
instead of spawning a thread, you spawn a fibre. And the difference between fibres and threads is, fibres handles all the data stuff for you
like the lock state etc
it's a night and day difference. With normal threads and mutexes you ask your OS's scheduler to do work for you.
With Fibers you move active callstacks around yourself
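A minimal illustration of "moving callstacks around yourself", using the POSIX `ucontext` API (obsolescent in POSIX but still shipped by glibc) instead of a hand-written assembly switch. Each `swapcontext` saves the current registers and stack pointer and jumps to the other context, so the fiber's callstack stays parked on its own stack until it is resumed. The function names here are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[64 * 1024];      /* the fiber's own callstack */
static char log_buf[32];

static void fiber_body(void) {
    strcat(log_buf, "f1 ");
    swapcontext(&fiber_ctx, &main_ctx);  /* park this callstack, resume main */
    strcat(log_buf, "f2 ");
    /* returning follows uc_link back to main_ctx */
}

static const char *run_pingpong(void) {
    log_buf[0] = '\0';
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
    fiber_ctx.uc_link = &main_ctx;
    makecontext(&fiber_ctx, fiber_body, 0);

    strcat(log_buf, "m1 ");
    swapcontext(&main_ctx, &fiber_ctx);  /* start the fiber */
    strcat(log_buf, "m2 ");
    swapcontext(&main_ctx, &fiber_ctx);  /* resume it where it yielded */
    strcat(log_buf, "m3");
    return log_buf;
}
```

`run_pingpong()` returns `"m1 f1 m2 f2 m3"`: the fiber resumes exactly where it yielded, and when it returns, `uc_link` hands control back to the main context. No OS scheduler is involved in any of these switches.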
https://gist.github.com/dazKind/163137765d166a98df68f073d52b8d24#file-simplewebserver-hx-L70 this is my webserver test
you only have to make sure you don't have fibers working on the same state/data. You still have to be careful about shared/mutable data in your app
obviously
oh that isn't toooo different to what you'd do with a threaded socket setup
just probably less boilerplate
I'm using FD + io_uring directly atm, but all that socket stuff in that sample can be hidden by the Std classes. just haven't bothered yet. the runtime has to be reliable first before I go and add full language support
Yeah, that's hard to stick to a project for sure, and not always a good thing. A good middle ground is to have one or two projects that you can develop for real, while still allowing you to explore other things
That's the thing: if you need to invest in crazy hardware, then you are probably better off paying a subscription to anthropic or similar, especially when your hardware is going to become obsolete so quickly (and now with the ram costs yay)
atm I aim for hxcpp-level performance in normal single-fiber code. And complete and utter boknerzness when you decide to spawn fibers. fibers can spawn and even wait easily on others in order to model dependencies
file & networking uses io_uring on newer kernels and will suspend/resume your fibers once the kernel completes the access for you. on windows that will be handled by IOCP
There is the possibility of registered buffers that I'm not using yet, making the data transfer from files/sockets essentially zero-copy, which should make things even speedier.
but let's properly control memory first
selfhosting is nice tho, fuck subscriptions
in the future i want to make my own homelab
Sure, always good not to have a hard dependency on subs
But that doesn't mean we should necessarily forbid ourselves from using them, especially if self-hosted alternative is super expensive
Just make sure you have an escape plan and can switch provider easily (another sub or self hosted as alternative)
wait a sec, github pages + wiki are a paid feature now?
they're what
yeah, but I was using pages + wiki in the hxgodot days. I have the cortex-engine org and I planned to add a fiberus one.. without wiki and pages this is bullshit
They ingest all your code for AI and you still have to pay them to give them more content
i think this is misleading, it works fine for me
where?
GitHub Pages is available in public repositories with GitHub Free and GitHub Free for organizations, and in public and private repositories with GitHub Pro, GitHub Team, GitHub Enterprise Cloud, and GitHub Enterprise Server. For more information, see GitHub's plans.
basically you need to pay for private repos
github ui is confusing sometimes
not really blaming them, it does a lot, but still lol
Question: Name for the target Fiberus or Fibrix
fiberus
oioi, I'm running into io_uring limits in my loadtests now. no more gc issues!
shits gonna be here: https://github.com/fiberus-hx
at least pretend it isn't complete slop
embrace the noise!
we are going to slop-heaven
remember this is an experiment. all bets are off for this project. cortex won't be harmed by this for now
i like human daz better :p
me too. I love writing Haxe too much to outsource it. Part of the reason fiberus exists is that I feel it makes coding idiomatic haxe easier/nicer. But we will see
speaking of AI
I'm currently retraining my tokenizer with about 3x the data, then I'll grab a few more datasets and start training the actual model
you are going down a bottomless pit
nuhuh
this is a temporary project
..at least until i get a job and build a homelab with a stupid amount of GPU power
no, do it. but I know the feeling of hitting resource limits ever since I trained a neural-net based q2 bot.
training is soo open ended and you dont know what you will get
which also makes it exciting!!
last time training the tokenizer took around ~40 minutes
so i'll be back at like
7pm my time
This is all fascinating if nothing else
Successes: 10531
Failures: 7
Errors: 14
@void condor Did you try any of this: https://z.ai/subscribe
Use GLM models like GLM-4.7 for AI coding in Claude Code, Kilo Code, Cline, OpenCode and more. Plans from $3/month: fast, reliable code generation and tool use for daily dev work.
I'm curious to try it when my Claude Max 20x expires
apparently it's really close to Opus 4.5 benchmarks
But much cheaper
GLM 4.7, chinese llm
Might give it a shot at some point. For the moment Im good
Ah ah, you got the whole team yeah
But they are all american, how about having a more international llm team
Successes: 11703
Failures: 0
Errors: 1
The last error is a stupid fail on a badly generated switch statement with no default:
Successes: 11704
Failures: 0
Errors: 0
That doesn't mean I'm done at all though
There are plenty of optimisations to address
And there are a lot of warnings as well, I need to go through all of those and decide what to do
The AOT trim warnings are expected: it's the fallback mechanism for reflection, but all haxe types work properly without it
are you planning to upstream this target if successful?
That's the goal yes, but whether this gets merged or not is out of my control
For sure I plan to use it when Haxe 5 becomes the default
oho, looks like I found the last memory issues
claude commented out the call to the finalizers
travelling back in time and reading sometimes gives me the feelz.... https://web.archive.org/web/20070807085133/http://www.gamespot.com/features/totalstory/
Being nostalgic?
So the main unit tests pass, I fixed the warnings (either by fixing the code generation, or suppressing the warning if it can't be avoided)
Now, I need to address the sys coverage
I'll also need to check the actual CI workflows, which are partly failing: https://github.com/jeremyfa/haxe/actions/runs/21609734266/job/62275914147
But anyway, that's great progress so far!
totally
ok a little workflow / debugging tip for claude: it always tends to debug using a shitton of print statements. if you do native debugging, remind the sucker to use gdb to step through code
gaaah
lol, fiberus has better performance vs. hxcpp with anon objects than with classes.... ?!
it's more than a sec faster than hxcpp in the anon mandelbrot benchmark
defo will have to optimize that
What xD
yeh...
i guess i sorta do that
LuaO and the prismcli-beacon-lumina project are my main ones, even if going slow
ig i want to be able to work on my own schedule without pressure
GC is finally behaving π
=== Fibrix GC Statistics ===
--- Heap ---
Current size: 39845888 bytes (38.00 MB)
Peak size: 75497472 bytes (72.00 MB)
Live bytes: 32224 (0.03 MB)
Large object bytes: 0 (0.00 MB)
Live objects: 61596
--- Memory Blocks ---
Superblocks: 19 total, 4 in use, 15 free
Block hash table: 1216 / 262144 entries (0.5% full)
--- Allocation ---
Total allocations: 196739134 (6004.00 MB)
Fast path: 196521600 (99.9%)
Slow path: 217534
Large objects: 0
--- Thread Caches ---
Active caches: 1
Superblock ops: 4999 acquires, 4995 releases, 561 destroyed
Cache efficiency: 1 hits, 4005 misses (0.0% hit rate)
--- Major Collections ---
Total collections: 33
Last collection: marked 1007 objects in 3.23 ms, swept in 1.05 ms
Total GC time: 107.29 ms
Objects swept: 0
Bytes freed: 2333422080 (2225.32 MB)
--- Minor Collections (Generational) ---
Total collections: 993
Objects evacuated: 3479 (0.11 MB promoted to mature)
Last minor GC: 717 us
Total minor time: 613.89 ms
--- Write Barrier ---
Barriers triggered: 0
Remembered set: 0 scanned, 0 added (0 cross-thread)
--- jemalloc (malloc/free) ---
Allocated: 10113072 bytes (9.64 MB)
Resident: 19484672 bytes (18.58 MB)
===========================
although, wait, somethings up
ok, fixed. almost had a heart attack
im gonna take a new approach
my projects come out whenever they come out
im not exactly being paid for this, and life has other things to focus on
oh boi, seems that the scheduler+workstealing scales performance negatively with the count of workers?!
looks like I got a few things backwards!
Claude got it backwards π
I wish. I still have to fix too much shit manually
but there is actually a logical explanation for this behavior. I spawn a million fibers on the worker running the test and ALL the others wanna steal from that worker's queue. too much contention
@ionic badger what cpu did you have again? I remember you had all this crazy extra instrumentation in tracy, right?
So now all the unit tests pass and I only have 1 warning, which I'll keep because it is expected (array access with -1). Now I'm iterating on the sys tests
But that means, so far, the haxe language semantics are properly handled by my C# target, which is a big milestone already!
9800x3d
Uuh, the "I eat Haxe for breakfast" CPU.
Indeed, good CPUs compile Haxe and C++ code stupid quickly.
I got the 9950x3D for that exact reason π
Honestly a no-brainer if you compile code frequently
Yeah, I didn't expect the 3D cache to help that much, and went with 9900X.
And it's around the same speed as 7800x3D.
I'm actually curious to see how this competes with Apple silicon CPUs
The power/heat efficiency of Apple silicon CPUs is amazing, but I didn't really compare raw power
Yeah that would be interesting to compare. Does Haxe run on that arm chip?
Sure, we've had native ARM Haxe binaries for a while now. Before that it ran under Rosetta, which lets Intel binaries run on ARM Macs
(both run fine, just that the Rosetta version might be a bit slower and more battery draining)
Looking at benchmarks around, an M4 Max is very good at single-core execution, but the 9950X beats it on multicore
That being said, being able to have an M4 Max in a laptop that is almost all the time silent is really amazing
So I guess it depends on the usage too
afaik haxe is pretty singlethreaded, so if those stats are right then compilation could be mildly faster on m4 max
haxe compilation for sure, but when you target C++, you NEED the multi cores π
The haxe compilation part is very fast
worth it
But things started to add up and I ended up needing one hour to export the SDK I'm making (for all the different platforms/targets)
(which involves two computers: one windows PC and one mac)
Recently upgraded the mac to M4 Max
build time divided by two
(which is still half an hour to build everything, but better)
(and of course when I am working on the thing, iterations are much shorter because I only compile for the target I'm testing obviously)
What's the size of the computer holding that beast?
surely bigger than a mac mini
probably could fit in a normal tower if it had adequate cooling... and lots of it
haxe 5 has some multi threading stuff locked behind a flag currently
makes a difference to my compile times for cppia
nice
Just room for more bugs 🤣
(joke aside, it's cool if that works)
Again, Claude decided to add a stupid workaround to make a test pass, when in the same file where it puts the workaround, it is EXPLICITLY stated not to do that and instead look at how the code is generated 🤪
It's behind a flag because there's a memory leak that hasn't been found yet x)
but my particular projects don't seem to trip the leak so I keep using it
well, if you don't use the language server and just spawn haxe for a single compilation, I guess that works?
The threading is for compilation
I know
The memory leak happens at compilation, right? So if you don't use the language server, once you are done transpiling to your target, the haxe process finishes and any leaked memory is cleared
I don't think anyone knows where the leak comes from
I think you don't understand what I'm saying
You don't need to know where the leak comes from to attest that it happens by running haxe/haxe.exe
sexy
lol I wonder how much of a difference that would actually make to compilation
i don't know
my cpu against a threadripper from 8 years back shows this:
so I assume a new threadripper is going to kick ass
especially with over 2x the amount of cores
I didn't know that about the 3D cache, I'm on Intel and upgraded 2 years ago so I'm good
C# target PR submission unlocked https://github.com/HaxeFoundation/haxe/pull/12575
Daz you gotta PR fiberus soon to make the maintainers cry
YEAH WAIT TRUE
AHAHAHA
if possible generate as many tests as possible so the diff exceeds 200k or so
lol. nah, no rushing. I mean I basically invented a new type of GC, and the runtime has to support a few more things besides full language support first. AND then it has to survive the TechEmpower benchmarks (https://www.techempower.com/benchmarks). If it doesn't do well enough I delete it.
an official PR only makes sense if it has already proven itself in the real world
turns out bursting a million fibers caused an insane syscall slugfest coupled with an io_uring sleep/wakeup bonanza.
added targeted fiber distribution+inboxes to the threads and then beefed them up with some pools to recycle memory for the fibers
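A minimal single-threaded sketch of the "targeted distribution + inboxes + pools" idea, assuming a fixed worker count, one bounded inbox per worker filled round-robin, and a free list that recycles fiber records instead of calling malloc for every spawn. `Fiber`, `Inbox`, `spawn_targeted`, `inbox_pop` and the `fiber_pool_*` functions are hypothetical names, not fiberus' API; a real multi-threaded version would guard each inbox with a lock or use an MPSC queue.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_WORKERS 4
#define INBOX_CAP 1024

/* Hypothetical fiber record; a real one would own a stack and a saved context. */
typedef struct Fiber {
    int id;
    struct Fiber *next_free; /* link used only while the fiber sits in the pool */
} Fiber;

/* Hypothetical per-worker inbox: the spawner pushes, only the owning worker pops. */
typedef struct {
    Fiber *items[INBOX_CAP];
    size_t head, tail; /* monotonic counters: head = next pop, tail = next push */
} Inbox;

static Inbox inboxes[NUM_WORKERS];
static size_t next_worker = 0;  /* round-robin cursor */
static Fiber *free_list = NULL; /* recycled fiber records */
static size_t fresh_allocs = 0; /* counts malloc calls, to show the pool at work */

static Fiber *fiber_pool_get(int id) {
    Fiber *f = free_list;
    if (f) {
        free_list = f->next_free; /* reuse a recycled record */
    } else {
        f = malloc(sizeof *f);
        if (!f) return NULL;
        fresh_allocs++;
    }
    f->id = id;
    f->next_free = NULL;
    return f;
}

static void fiber_pool_put(Fiber *f) {
    f->next_free = free_list;
    free_list = f;
}

/* Spread new fibers across workers instead of piling them onto the spawner's queue,
   so idle workers do not all fight over one victim queue. */
static int spawn_targeted(int id) {
    Inbox *in = &inboxes[next_worker];
    if (in->tail - in->head == INBOX_CAP) return -1; /* inbox full */
    Fiber *f = fiber_pool_get(id);
    if (!f) return -1;
    in->items[in->tail % INBOX_CAP] = f;
    in->tail++;
    next_worker = (next_worker + 1) % NUM_WORKERS;
    return 0;
}

/* Worker side: take the next fiber from this worker's own inbox, or NULL if empty. */
static Fiber *inbox_pop(int worker) {
    Inbox *in = &inboxes[worker];
    if (in->head == in->tail) return NULL;
    Fiber *f = in->items[in->head % INBOX_CAP];
    in->head++;
    return f;
}

static size_t inbox_len(int worker) {
    return inboxes[worker].tail - inboxes[worker].head;
}
```

Spawning 8 fibers this way leaves 2 in each of the 4 inboxes, and a finished fiber returned with `fiber_pool_put` is reused by the next spawn without another malloc.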
Before we had 4s in singlethreaded mode and 10s(!!) in multithreaded mode with 10 workers. Now after the fixes:
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Wall time | 10.5s | 0.6s | 17x faster |
| User time | 3.9s | 2.4s | 1.6x |
| Sys time | 40s | 0.5s | 80x less kernel time |
| Total syscalls | 7.2M | 1.1M | 6.5x fewer |
This shit is unlocked now and the workers/threads make a huge difference!
w00t, looks like my little webserver arrived in Go's performance territory. still a bit too unstable but very promising
does it have persistent connections?
in this loadtest? nope, I close the fd after every response
You might have a faster web server then because Go keeps them persistent, and it helps a lot on the benchmark results
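For reference, HTTP/1.1 connections are persistent by default unless a `Connection: close` header is sent, while HTTP/1.0 only stays open with an explicit `Connection: keep-alive`. A minimal sketch of that decision, assuming the request line and headers are already in a NUL-terminated buffer; `wants_keep_alive` is a hypothetical helper (not code from the gist) and does a deliberately naive header scan.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive prefix test: does s start with prefix? */
static bool ci_starts_with(const char *s, const char *prefix) {
    for (; *prefix; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) return false;
    }
    return true;
}

/* Hypothetical helper: returns true if the connection should stay open after the
   response. `req` holds the request line plus headers, separated by "\r\n". */
static bool wants_keep_alive(const char *req) {
    const char *eol = strstr(req, "\r\n");
    if (!eol) return false;

    /* Default: HTTP/1.1 is persistent, anything else closes. */
    bool keep = (eol - req >= 8 && strncmp(eol - 8, "HTTP/1.1", 8) == 0);

    /* Walk header lines until the blank line that ends the header block. */
    const char *line = eol + 2;
    while (*line && !(line[0] == '\r' && line[1] == '\n')) {
        if (ci_starts_with(line, "connection:")) {
            const char *v = line + strlen("connection:");
            while (*v == ' ' || *v == '\t') v++;
            if (ci_starts_with(v, "close")) keep = false;
            else if (ci_starts_with(v, "keep-alive")) keep = true;
        }
        const char *next = strstr(line, "\r\n");
        if (!next) break;
        line = next + 2;
    }
    return keep;
}
```

With keep-alive the handler loops back to reading the next request on the same fd instead of closing it, which is what saves the accept/close overhead per request.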
yeah, you are right, I was so focused on the connection overhead that I didn't come back to this. hang on, lemme try it
lol
this more than doubled the previous runs
It's doing 50kRPS and it doesnt even sweat. Vegeta is running into port exhaustion errors now
guess I have to run this on a real server to see where the limits of this architecture are. on my laptop I cant really tell
Awesome!
u know u are taking things too far when your linux laptop locks up
updated the gist of the server here: https://gist.github.com/dazKind/163137765d166a98df68f073d52b8d24
It's a marvel that all this stuff works
but you can also defo see that I need to check the sweep time at some point. it would be funny if I could reuse all the paused threads to do marking and sweeping together
This is very interesting
i'm looking up fibres and i'm still a beginner at this stuff
sounds like fibres run on a single thread instead of multiple threads
but they can be run across multiple threads where needed
where a thread can interrupt a resource mid operation a fibre can't (if on the same thread)
and if i understand things correctly, with fibres you can kind of dictate where another fibre can "jump in"
it's really just moving callstacks and their registers around. like changing vinyl records under the needle of execution
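To make the vinyl-record analogy concrete: a fiber switch saves the current registers and stack pointer, then loads another saved set. This sketch uses POSIX `ucontext` (`getcontext`/`makecontext`/`swapcontext`, marked obsolescent by POSIX but still shipped by glibc) instead of fiberus' hand-written `fiber_x64.S`; `ping_fiber`, `run_demo`, the trace buffer and the 64 KiB stack size are illustrative choices.

```c
#define _XOPEN_SOURCE 700 /* required by some platforms (e.g. macOS) for ucontext */
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[64 * 1024]; /* the fiber's own call stack */
static char trace[16];              /* records the order in which code runs */
static size_t trace_len = 0;

static void mark(char c) {
    if (trace_len < sizeof trace - 1) trace[trace_len++] = c;
}

/* Runs on fiber_stack. swapcontext saves this fiber's registers and resumes main. */
static void ping_fiber(void) {
    mark('a');
    swapcontext(&fiber_ctx, &main_ctx); /* yield back to main */
    mark('c');
    /* returning jumps to uc_link, i.e. back to main_ctx */
}

static int run_demo(void) {
    if (getcontext(&fiber_ctx) == -1) return -1;
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
    fiber_ctx.uc_link = &main_ctx; /* where execution goes when ping_fiber returns */
    makecontext(&fiber_ctx, ping_fiber, 0);

    if (swapcontext(&main_ctx, &fiber_ctx) == -1) return -1; /* start the fiber */
    mark('b');
    if (swapcontext(&main_ctx, &fiber_ctx) == -1) return -1; /* resume it */
    mark('d');
    trace[trace_len] = '\0';
    return 0;
}
```

After `run_demo()` the trace reads `abcd`: main and the fiber take turns on one OS thread, and each switch only happens at an explicit `swapcontext` call, which is the "you decide where another fiber can jump in" property mentioned above.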
That analogy makes me wonder what the downsides are, bc you'd def damage the record if you were changing them live
You gotta be careful
Like, me using this vs hxcpp for a game for example
Is it same or better but you have to be more careful how you do things? Or are the applications not comparable
Like, longer term view beyond SimpleWebServer
nah, the internals are stable once they work
Scheduler & GC play a huge role
but for the user, not really a problem. the only thing you have to look out for is when you use fibers you gotta make sure you pay attention to what they have access to/might modify, similar to threads. shared memory can be dangerous if you don't care
otherwise you can just compile the same code you wrote for hxcpp, it will run mostly on a single fiber on the mainthread and be a bit slower since GC is not multithreaded like in hxcpp
but the magic unfolds once you structure your app into fibers. Then they will move super cheap between threads and you gain the power of concurrency and parallelism without worrying about managing threadpools and stuff.
e.g. instead of a while-loop in Main you can have the main method at the end call itself again and you get the same effect (since the main function is essentially running as a fiber that can spawn others and itself)
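A rough sketch of that pattern with a toy cooperative run queue: instead of looping, the task does one iteration and re-enqueues itself, so other queued work runs between iterations. `spawn`, `run_scheduler` and `main_fiber` are hypothetical stand-ins that use plain function pointers rather than real fibers with their own stacks.

```c
#include <stddef.h>

#define QUEUE_CAP 64

typedef void (*Task)(void);

static Task queue[QUEUE_CAP];
static size_t q_head = 0, q_tail = 0; /* monotonic FIFO counters */

/* Hypothetical spawn: append a task to the run queue. */
static int spawn(Task t) {
    if (q_tail - q_head == QUEUE_CAP) return -1; /* queue full */
    queue[q_tail++ % QUEUE_CAP] = t;
    return 0;
}

/* Cooperative scheduler: run tasks in FIFO order until the queue drains. */
static void run_scheduler(void) {
    while (q_head != q_tail) {
        Task t = queue[q_head++ % QUEUE_CAP];
        t();
    }
}

static int frames = 0;
static int background_runs = 0;

static void background_job(void) { background_runs++; }

/* One "frame" of app logic; re-spawning itself replaces the while-loop in Main. */
static void main_fiber(void) {
    frames++;
    spawn(background_job);              /* work spawned during this frame */
    if (frames < 3) spawn(main_fiber);  /* schedule the next frame */
}
```

Calling `spawn(main_fiber)` followed by `run_scheduler()` runs three frames, and each frame's `background_job` gets to run before the next frame starts.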
interesting, looks like I burnt through >100 million tokens this week
can that be right? I hate how anthropic hides that shit
It's unbelievable how complex and intricate garbage collection can get
registers, stacks, escape analysis, nursery for youngens, temp roots, mature space across lines in blocks across superblocks in all kinds of caches... all this madness just that we can ignore stack vs. heap and dont have to manually delete our allocations.... WORTH IT
good news is that Im hitting a timing wall now. my mature collection is still singlethreaded and we have to eat that cost no matter how many threads process data. the hope is once that is fixed and all workers take part in sweeping, this should match hxcpp + some across all benches
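A stripped-down illustration of two of the pieces mentioned above: marking from an explicit root set, and a sweep whose heap is split into disjoint slices so that, in principle, every worker could sweep its own slice without coordination. This is a toy fixed-size object heap written for illustration, not fiberus' immix/generational collector; `gc_alloc`, `gc_collect` and `sweep_range` are hypothetical, and the slices run sequentially here.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_OBJS 16
#define MAX_REFS 2

typedef struct Obj {
    bool in_use;
    bool marked;
    struct Obj *refs[MAX_REFS]; /* outgoing pointers traced during marking */
} Obj;

static Obj heap[HEAP_OBJS];
static Obj *roots[8]; /* stand-in for globals and pointers found on fiber stacks */
static size_t num_roots = 0;

static Obj *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_OBJS; i++) {
        if (!heap[i].in_use) {
            heap[i] = (Obj){ .in_use = true };
            return &heap[i];
        }
    }
    return NULL; /* a real GC would collect and retry here */
}

/* Recursive mark; production collectors use an explicit mark stack instead. */
static void mark(Obj *o) {
    if (!o || o->marked) return;
    o->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++) mark(o->refs[i]);
}

/* Sweep one slice [begin, end): free unmarked objects, reset marks on survivors. */
static size_t sweep_range(size_t begin, size_t end) {
    size_t freed = 0;
    for (size_t i = begin; i < end; i++) {
        if (!heap[i].in_use) continue;
        if (heap[i].marked) {
            heap[i].marked = false;
        } else {
            heap[i].in_use = false;
            freed++;
        }
    }
    return freed;
}

static size_t gc_collect(size_t num_workers) {
    if (num_workers == 0) num_workers = 1;
    for (size_t i = 0; i < num_roots; i++) mark(roots[i]);

    size_t freed = 0;
    size_t slice = (HEAP_OBJS + num_workers - 1) / num_workers;
    /* Slices are disjoint, so each call could be handed to a different worker. */
    for (size_t w = 0; w < num_workers; w++) {
        size_t begin = w * slice;
        size_t end = begin + slice < HEAP_OBJS ? begin + slice : HEAP_OBJS;
        if (begin < end) freed += sweep_range(begin, end);
    }
    return freed;
}
```

The mark phase still has to finish before any slice is swept, which is why a single-threaded mark stays on the critical path even once sweeping is parallel.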
I really like how easy you can sync/wait on fibers via counters
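A minimal sketch of counter-based waiting, in the spirit of the fiber job system from the Naughty Dog GDC talk: each job decrements a shared counter when it finishes, and the waiting task yields (re-enqueues itself) until the counter hits zero. `JobCounter`, `kick_off`, `work` and `wait_task` are hypothetical; a real fiber runtime would park the waiting fiber on the counter instead of polling, and C11 atomics are used here only so the decrement would stay correct if jobs ran on several workers.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define QCAP 32

typedef struct { atomic_int remaining; } JobCounter; /* hypothetical counter type */
typedef struct { void (*fn)(void *); void *arg; } Job;

static Job q[QCAP];
static size_t qh = 0, qt = 0; /* FIFO run queue */

static int push(void (*fn)(void *), void *arg) {
    if (qt - qh == QCAP) return -1; /* queue full */
    q[qt++ % QCAP] = (Job){ fn, arg };
    return 0;
}

static JobCounter counter;
static int results[3];
static int finished_after_wait = 0;

/* A job: produce a result, then signal completion by decrementing the counter. */
static void work(void *arg) {
    int idx = (int)(intptr_t)arg;
    results[idx] = (idx + 1) * 10;
    atomic_fetch_sub(&counter.remaining, 1);
}

/* Waiter: while jobs are pending, yield by re-enqueueing itself. */
static void wait_task(void *arg) {
    (void)arg;
    if (atomic_load(&counter.remaining) > 0) {
        push(wait_task, NULL);
        return;
    }
    finished_after_wait = results[0] + results[1] + results[2];
}

static void kick_off(void) {
    atomic_init(&counter.remaining, 3);
    push(wait_task, NULL); /* queued first on purpose, so it has to wait */
    for (int i = 0; i < 3; i++) push(work, (void *)(intptr_t)i);
}

static void run_all(void) {
    while (qh != qt) {
        Job j = q[qh++ % QCAP];
        j.fn(j.arg);
    }
}
```

`kick_off()` followed by `run_all()` leaves the counter at 0, and the waiter only reads the results (sum 60) after all three jobs have run, even though it was queued first.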
You can model quite a few different behaviors with them
my fiberus fork of the haxe compiler has already been cloned 24 times in the last 24hours. Really not sure what to think
ai is training on ai generated code
they should remove the stats at this point
I like this a lot
truth be told, I learnt soo much in the last few weeks that I read haxe-code a tad differently now. I start to see how work is done behind the scenes to make basic code&logic work. Libs like hxcpp are a fucking marvel
yeees
ok, the major collections kick ass now
Now I reached the point where I could also tackle the mark phase
But since we closed the gap to hxcpp now I think it is time to cleanup and get the other missing pieces for more language support together, hmm
hmm, the webserver code is effectively a single fiber accepting connections... I wonder what happens if I change this to multiple fibers
yes, that blew the lid off
This is soo easy. You spawn fibers for the accept loops and then they spawn fibers that handle the requests+keep-alive. https://gist.github.com/dazKind/163137765d166a98df68f073d52b8d24#file-simplewebserver-hx-L54
And you get this: 208µs wait time, 1000s of fibers multiplexing on io_uring and a mean latency of 892µs @ 50k requests per second. https://gist.github.com/dazKind/2483162f05da0194fceaac6d73537bd7
this is good boys
hmm, in debug you can still spot the GC. but shit is getting really dense now
ok, im happy for the moment
What in hxcpp makes you say it's a fucking marvel? I'm curious
in terms of complexity around immix and how seemingly little you as user need to care. couple this to scriptable/cppia and its a marvel that all of this "just works" as fast as it does
you guys got any wishes when it comes to dependencies?
Strings use simdutf-8 now, zip will be miniz, mbedtls+sqlite+pcre2 like hxcpp, json will be simdjson.
I was planning to swap sqlite for libsql but that would require pulling rust into the build pipeline. Not feeling that
Yeah, build simplicity is definitely a strong criteria of choice
Indeed!
synthetic training data is an actual thing
I gotta have some LLM explain to me what fibers really are. I kinda get it, but I'm not certain about some details.
I wonder if it somehow sort of could fit into my single-threaded deterministic architecture stuff.
Fun fact, CLI tools of Ceramic, before moving to HXCPP, were using Node.js + node-fibers
The fibers were useful, not to spawn hundreds of things in the background, but it allowed to write "sequential code" without callback hell, and without async/await. As long as your code was running inside a Fiber, you could call a function "like it is synchronous", but under the hood it isn't. Just a convenience to make the code easier to write
Now the new CLI tools are using HXCPP, and most things are actually synchronous, which is how it should be for those
Then node-fibers became incompatible with newer versions of node, which definitely pushed me to move away from node entirely. Another option could have been to use Haxe coroutines, which would have allowed the same convenience at haxe language level, but well, not there yet
It all is defined by the nature of your workload
I have my tooling NodeJS based, mainly because I find the scripting part of it very useful. It used to be Haxe eval based, but writing the scripts in JS is simply nicer.
It is getting a bit slow for some things, for various reasons. Node cold startup being one of them.
Yep, I don't miss the cold startup
ok, slowly patching myself through language support stuff. switching the codegen to haxe-ast -> c-ast definitely helped with separation of concern and speed of fixes
ok, 690 tests passing. gotta patch a few more things and then we can start looking at the haxe unit-tests
It escalated quickly
It's gonna be good
@ionic badger #haxe message
What DB/Adapter are you using? Any native code involved?
gonna be mysql, probably db core
will defo be native code involved
I still use record macros
in any case I will start simple. gonna add the std stuff first and then most likely dust off my old postgres externs and use them to come up with the extern mechanism for fiberus
once that is in place I wanna try a quick idea for a simple web framework
then my path is paved to bring fiberus into testing at work
atm I guess it will take another 1-2 weeks to figure out the unit tests and then I open the repo for feedback
In one week my Claude Max 20x sub will be over, so I'm sucking out all I can from it before it ends
I'm tackling the --net-lib mess of the Haxe C# target, and I've also reopened Loreline to generate a better test suite
(well, afterwards I'll still have access to a Claude account, but with less quota than this month)
hehe
Im running 2 accounts atm to have a full week of inference
limits reset on friday, by monday I have maxed out / hit the weekly limit on the first and the rest of the week till next friday I run the other
I might try GLM 4.7 (z.ai) at some point. Curious to see how it performs, it's much cheaper
Yeah the weekly limit is the annoying one tbh
For that kind of stuff it's really powerful
They are test cases for loreline yes
The file includes a sample loreline script and a <test></test> block in a comment that describes the expected output when running it
This system was already in place
But now I'm asking Claude to fill-in the blanks
(and at some point I'll need to improve the system to test "interrupting a script" and "resuming it" in another session and making sure the data is the same)
ok, i think i finally stabilized the GC
also WTF
the small http server just did 100kRPS
had to tweak vegeta
lololol
the server handled ~251k req/s aggregate with zero failures. got some high latencies that are entirely client-side backpressure - 4 vegeta processes competing for CPU on the same laptop, queuing requests they can't dispatch fast enough....
I see
@lethal talon u were right. on a real server this will fucking kill
and another crazy stabilization session complete
5 days left, I managed to draft a basic localization system for Loreline, address the remaining issues I had with saving and loading states in the middle of a story, fix a few syntax highlighting problems and include those use cases in the test suite
nice!
im tackling & grinding through old bucket lists. Soo much shit I wanted to do over the years but never found time. Now it's all in arm's reach
Im setting up real-life use-cases / tests to speedrun battle-hardening fiberus
I also have a webframework for fiberus in the works now
Same. This morning I had like 10 new TODOs, if I had more tokens I might just trigger them. Reviewing the plan/output is still heavy bottleneck for me, but these are things I wouldn't even waste time thinking about doing.
facing a little conundrum: supplying sys.thread.* doesnt really make sense for fiberus. I could add some shims that would map the fiber-primitives but that doesnt feel right
only Tls makes sense, the rest is preemptive stuff that is useless in our architecture
and shims also feel wrong since they imply something very different and in the end that stuff is only useful when you wanna compile existing code... hmm
ok, I will make them error'ing stubs. Only Tls makes sense
RIP compatibility with my code xD
I guess that's the nature of the beast. you dont wanna mix pre-emptive with cooperative.
in a cooperative setup there is already a managed threadpool and an all-controlling scheduler. Question is why you would still need pre-emptively intermingled threads in such a setup
That makes sense, I think the only reason is compatibility with existing « standard » haxe code
The grind is real
Third-party libraries
- fiberus/project/thirdparty/simdutf-8.0.0/: SIMD UTF-8/Latin-1/Base64 conversion
- fiberus/project/thirdparty/mbedtls-2.28.2/: TLS library (for Phase 3.1)
- fiberus/project/thirdparty/miniz-3.1.1/: Compression (completed)
- fiberus/project/thirdparty/pcre2-10.42/: Regex (completed)
- fiberus/project/thirdparty/simdjson-4.2.4/: JSON (completed)
- fiberus/project/thirdparty/subprocess-x.x.x/: Process (completed)
running one session to implement the whole Std might have been a dumb idea
limit testing
it's gonna be an interesting MR to review
You know what is scary to think about? With all this agentic stuff going on I think the malware botnet guys are having a field day. Decades of experience of running massive botnets coupled to potential agentic rootkits with command and control servers...
lol an interesting thought is, how much computing power do these botnets actually have at their disposal, could they host a distributed llm? :o
Lol
I think this could be the next step. if one finds a way to cluster a full LLM between different hosts with a way to deal with the latency
another thing Im currently thinking about: adding security contexts to agents. I was wondering what would happen if you wrap openclaw instances into docker containers, only mounting data they really should have access and then instead of all this local json maintenance bullshit, you have them all connect to an IRC server+multiple channels that retain all the session contexts via different channels
One thing I think has come out of AI is "responsive scammers"
at least in discord context
responsive scammers?