#programming
1 messages · Page 435 of 1
I need to set mine to enub

highest ram usage on my desktop has been 11gb with rendering stuff in blender
my laptop in unity has been using SO MUCH swap 
they're both on 16gb rn
i do not know if i could actually use acomputer with 16gb of ram for very long
i max out my work laptop all the time which has 32
Two of my systems are at over 32gb
im at 1 week out of the 3 weeks they said my ram RMA would take
should be back to 64gb afterwards
framework at 32, and gaming boi at 48
id ont think its worth upgrading my laptop. i would jsut get a new laptop if i needed more ram
that means #3 is at 24gb 
the 3050m only has 4gb of vram, so im starting to reach its limit with gamedev courses
and unlike the new frameworks, i cant swap out the gpu
anyways, new tadc episode 
Man I haven't watched anime in a hot minute
ew
need to put that monitor in the closet with the server once i finish putting it together again
the left one
if you're gonna have a bar at the top, put the shit from the left in there
cant wait to actually clear this desk off
this is disgusting
for ubuntu server if you install gnome
i need to smack some devs
i didnt know until i did it, they still use the same layout as fucking unity
i hate gnome

figured it'd be the least scuffed of the guis to have on ubuntu if i were to have one
real talk, it is actually crazy
who thought that was a good idea?
i feel like you dont need to go to UI/UX class to realize thats a bad idea
you'd be suprised
i dont think it was the default before unity desktop
my old laptop running ubuntu
with unity

i mean what got me into kde was kubuntu
but i figure gnome is the one ubuntu will just kinda assume is default
certainly was the first one on whatever cursed canonical page i went to to find the name of the package
that system doesnt have it loaded most of the time tho so it's kinda transient and thus i dont bother customize anything
All of pre-Covid culture in a single image
google+ 
heh the cr-v (with giant subwoofers not visible although the rear is a bit sagged down for a reason)
what the fuck
i have a dark past apparently
i used to use safari on windows
need an ai dev to analyse the new epsiode
"chat is this real" ah
what is tadc
the amazing digital circus
i started watching it recently
when it first came out i was scared of by the fanbase being a bit weird
but the series itself is pretty good
As someone who refuses to watch it, sure
i am officially bankrupt
epyc?
- ram
also holy shit i forgot how fucking terrible any and all math was when it was online based
during this time
tfw that was probably worst
like, are you gonna file for bankrupcy?
or just that you're poor?
as poor as a guy with enough tech to buy a house with it if he sold it can be
He needs us to donate
if i spend any more money rn i wont be able to make it to opensauce
it costs 80 fucking euro to make a passport
is it made of silver???
i got a appointment at 4.40pm next thursday 
im surprised you dont have a passport but i guess that makes sense cuz schengen
REEEEEEEEEEEEEEEEEEEEE my api key wtf is this 1978
i appreciate them sending the email like "ur in" and t hen when i finally go to check it, "actually nvm hold up"
ye
i have different types of id for going on vacation
same reason many burgers dont have passport
we have some types of ID, but i dont have a full book thing
my michigan ID got me into canadia and mexico
you mean us citizens?
yes
cuz burgers is literally the dutch word for that
lmfao
i use it because it's funny and i learned to read and write on unserious websites
is that a slang term in dutch
or does it just happen to sound like borgar
official
kek
soft g
boujourzrzzzz
no
burjer ? what the helly is a soft g
idk how to explain it
ill just go listen to it
Yeah it is a pronounation that needs to be heard
burhurs
ye
english has hard g, and dutch "a" are flat while in english they're weird
well, i call us burgers because
ye i know
holy low quality

it adds to itr
but i didnt see the bottom line till posting lmfao
figure that would count as controversial or somethign
thats controversial??
i prefer to die to many things at once and together
the word "book" in dutch is "boek", but they're pronounced the exact same way
bok
chimken is back
no, bok is a male goat
got
bok choy
Pointless rebrand
well it's console so i expect the enjoyers will be ecstatic about how revolutionary it is
no, as a console user this is dumb
you are not the sort of console user im talking about
i actually havent interacted with much of what im referring to in like
10 years
"bro hop on cod"
"no dude i cant sign into playstation"
"you're literally using the console rn"
"no i mean the online stuff"
last console i owned was a ps3
and i have never use xbox online, so idk what they're doing
and it was free back then for most of it but there was like playstation plus or somethig
the ps5 was somewhat cool
well, i have a switch i guess
i dont really consider that the same thing tho
and i also havent turned it on in over a year
i actually dont know where it is
384, the number that is the bane of my bank account
384 = 256 + 128. a nice round number.
I'm being stupid with how much data a 10M model can process 
i bet it could run away and hide in a dark corner if it got scared. anything more complicated than that, a 10M model might have trouble.
It is by far not perfect, but it is much much better than I expected
I just did a run with effectively 1024 context and while it wasn't the best it was passable
passable as brain damage but the arch was doing exactly zero favors
so some % of the time it doesn't immediately seem completely crazy?
That precent has a decimal point, but considering english was not a guarantee for the model I'm impressed
Also known as the tokenizer is non existant and 8 byte chunks are thrown in and hopefully something comes out
10M is indeed not small
It's about 1/6th of NeuroSynth's core model
For the pure lols, I'm 4xing context: doubling on each model
Surely the pair of single 384 sized layers can handle 16 bytes 
it's probably plenty big to simulate twitch chat.
I feel like if I swapped the patching method, it'd do better. But I'm (sidequesting? This is my main quest though) working on some different parts
wait i remember it being the most expensive what
oh right regional pricing i'm stupid
this is the md file it's keeping track of cves in
oct 5 2021 kernel security patch but literally no kernel eop exploits since work
google's pretty good
maybe just do it by hands?
i am security noob no way in hell i'm doing any of that myself
also would have taken me a month to get to where i am rn
in 2 days
if you need just root access, isnt there should be million of solution on the internet?
good luck
- Pixel 3 (blueline), Verizon variant
- Snapdragon 845, Adreno 630, Kernel 4.9.270, Android 12, Oct 2021 security patch
- Locked bootloader (Verizon, OEM unlock disabled in Titan-M)
- SELinux enforcing, shell uid 2000, zero capabilities (CapEff=0)
why you even buy something with locked bootloader
because 1. i didn't know it's a verizon model and 2. it wasn't a concern at the time, i swapped my phone after eol anyway
and no need to flash a pixel to a custom rom, the stock one is great
without an unlockable bootloader any traditional method will not work, so you're cut off from 99% of options, however this thing has helped me root obscure devices in the past, works from device itself. https://root-tools.kingoapp.com/
Kingoroot recommends the best root tools for your android devices.
that ain't gonna work ha
the models are still dumb when it comes to low level work, the actual exploits themselves are closely guarded by rooters also otherwise obviously they get patched right away
it's a 2021 patch bruh
doesn't matter, the model won't know shit
^
you should try looking at XDA if there's any hints, but you're unlucky that you're running a verizon lock
not really how that works
??
i've been doing android dev and playing with rooting for ~6 years, it's not that simple
cves are not individually exploitable for root, security patches don't mean your device isn't patched, because google play services, especially on pixel, have independent patching capability
it hasn't been online for ages
typically there's a whole chain of cves stringed together for a root exploit, those chains are guarded secrets
they have to be the right vulnerabilities in the right places, which the llms wont know shit about
you need intimate and accurate low level understanding of android and linux itself, which are together 200M+ lines of code
you can try, but it's kind of a waste of time
look, there are like 10 cves that would eop to root it already found and confirmed vulnerable but require higher privilege in the first place like system
you don't need intimate understanding of anything if a cve says this software (kernel, driver, android) of this version is affected, says what the vuln is and has patch commits linked
you don't need intimate understanding of anything
well, i mean not in the first place
confirmed vulnerable but require higher privilege in the first place like system
🙄
mm yes you need root to get root
mlntcandy is your agent getting paid for this work
system is basically root but still not technically
i hope your llm is free
aintneurway
is that the actual shell output or its imagination
if true, you still need to chain the rest and unlock bootloader
presumably a real shell
i am aware
try running echo $0
(which is impossible btw, because it's cryptographically sealed by verizon)
tehre is a titan-m exploit btw
or ps -p "$$"
-bash
double check that it's actually meaningfully different from your current id, adb.exe shell "id"
shell is 2000
found the blogpost it used, fairly recent exploit
it's still in a system_app selinux context tho so give it 2 more days until it figures out what it actually can do
not a whole lot i imagine
btw dont let it reboot or it will softlock, since it just modified the zygote/init process behaviour
it knows and warned me
but also not a big deal i can flash it
or actually just factory reset from the recovery
bash -p
Idk, its like 20 nucks for a year or so
nintendo can be fairly cheap, but it is regionally priced
indeed
maximum nintendo accounts: 1
was this stat necessary

Nintendo implements family sharing so kinda
aren't they separate versions of the same plans
Nope, you get a discount if you max it out technically
I mean, it becomes worth it at 2 accounts
But supports upto 8
well yeah, point is if both plans have the same "max acccount membership" and it's only toggled whenyou click a button that basically changes the subscription to another type is it really worth putting that stat there
For the family plan it is important to know how many it does support
but I do agree you could have it in a separate spot
Though this does make it clear and it is a selling point
It being a toggle is kinda tumb tho
yea
well, the 4B cannot play interactive fiction very well. it failed miserably at both Zork I and HHGTTG. It almost made it out of the first room in Hitchhiker's when i literally told them the commands to type one at a time in direct messages.
and after they've played for a while, they loose the ability to talk at all, just trying to /type messages. they get confused easily.
(4B Gemma 3 model)
i think i'll try a 9B Qwen 3.5 without changing anything else and see how that affects things.
few HOURS?
no, of my time though.
i was hoping that's what that meant
probably only an hour at most
5 hours of rabbit hole XD
i dislike finding they change APIs in a subtle way that makes my code still run but no longer work
so don't want to update mid project
i was messing with lmstudio on my mini earlier and tried using that lmstudio link thing they put in which is pretty meh, cant share resources or anything, but it tunnels the responses to the other lm studio and it functions like that lm studio is the one running the model
but then i was testing something with echo too and so i had to launch his discord thing on the mini pc and naturally loaded his model on my main pc like i've been doing
but, it was loaded from the minipc cuz i forgor
so the mini pc had the client on it
was sending a request to the main pc for the model output
the main pc was then requesting same from the mini again
your own private cloud AI server. perhaps fog server coz it's not way up in the clouds with the rest of them.
autism cloud service
"oh, you want this guy." "Yeah, no. You want that guy." etc
I also was then chilling in snoop dogg mode leaned all the way back in my chair listening to music and got comfy enough to fall asleep in chair
but somehow that lead to me kinda rolling off the back fo the chair
so i woke up having been slumbering like this
my back hasn't communicated much appreciation
Who's gonna tell him
a human did not write that
:Cheesit:
that hota t6 is wayyy smaller than i thought
but package arrived
this is the second last packages i needed for my new briefcase pc
next package will be the e display port board and touch digitizer board then im set
so tiny
smoll
yeah its actually gonna be so easy fitting that inside the briefcase
deemed one of the best balance charger in the RC community i think
for DC-DC
at least i made most of work
no tests and no double checks
no posix stream (descriptor) support
no documentation
no glob time protected implemented
but i kind of completed it
next time ill decide to do something with network, ill have good api (possible that it just broken now
)
https://charaverk.online/git/CharaVerKys/ats_socket

True
Fifty
says the guy had his openslop instance write the readme
I take it back
who would imagine...

stable diffusion inpaint


who knew a thing that looked like a AI slop filter was just a AI slop filter.... i can't believe nvidia lied to us...
The enlarged nose is so funny to me
Forget the hair growth
There's finally a pull request for Qwen 3.5 MTP support in Llama.cpp
Oh? What that be?
Martian Transfer Protocol
Multi token prediction, confers an enormous token generation speedup at the cost of only an extra gig or two of mem usage
Well then I can't use it I guess
Its a form of speculative decoding that uses heads built into the model
I got no spare couple GB
This might be a silly question, but how are you supposed to send POST requests to Randy?
Is it done through the terminal?
POST requests are just HTTP requests that hit a specific port and endpoint with the method set to "POST"
Send from whatever, curl command in the terminal, another program, it's up to you
- No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content.
Hmm. "does not need to include the thinking content" is not the same as "must not include the thinking content".
but then it says:
However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed.
hmmmmm
well, there's not enough info. I'll risk keeping the thinking and if it degrades badly I'll clean it.
well, i think i see why they recommend not saving the thinking. it sure does go on and on.
I assumed it was through terminal but Randy wouldn’t let me type while running, now I realise I could have just opened another window :P
Lesson learned: Don’t code with a headache 
I am wondering if anyone knows any good extension for Gemma 3-4b
Good extension for doing what for what program?
spheric extension in vacuum, solve it
Neuro v3 voice soon? 
https://youtu.be/jZ8wPB-KI8g
In this video, we cover the new QWEN TTS open models and look at how you can do things like Voice Design & Voice Cloning with them.
Blog: https://qwen.ai/blog?id=qwen3tts-0115
Colab Basic: https://dripl.ink/8yivY
Colab Voice Cloning: https://dripl.ink/yrM0N
Demo: https://huggingface.co/spaces/Qwen/Qwen3-TTS-Demo
Models: https://huggingface.co...
qwenslop
I am unamused
qwengem
Don't mind me, just copying 315 907 files worth of my Minecraft instances over
why i see plasma launcher everywhere
Maybe it's Cachy being popular? It's what I'm trying rn
If I manage to solve some VRR related goblins, I may finally be able to switch over full time
its good
I thought they said Plasma referring to KDE, not Prism

and is the nominal successor of polymc, which itself succeeded multimc, which was highly popular before its scandal
I am just exploring uses with LM Studio. Just curious what's out there
not know any of thos
Oh man, I remember the PolyMC shit, been there on the server
only tlauncher(name was stolen) -> legacy launcher(original)
Pirated MC 
prismlauncher is lightweight and offers better ux and a larger utility featureset than other launchers
it uses qt for its ui, which is a bonus
qt or qml?
cant say using qt is bonus
today i spend my day for nothing
wonderful day
it allows the individual configuration of an unlimited number of instances, allows you to download and manage textures, shaders, and mods, and has convenient options for sharing and managing instances
its simply better software than the official launcher, even if you dont care about modding

naturally, another plus is it being open source
the old official launcher was pretty good
the new one is awful
cef
the ~1.6 era was the best one, better than the ~1.2 era and the microsoft era ones
is the new one that bad?
the only issue i have with it is needing to log into microsoft account. and that it fails to do so a lot on linux
oh and another thing
I just got a sudden Magic Launcher flashback
prismlauncher allows offline play
naturally, no microsoft account is needed for offline play
im pretty sure they force you to log in to check if you actually own the game, along with the usual spying on you
i used to share the apk via bluetooth
that got patched a while ago
I literally just logged into my MS account on Linux just fine
minecraft assets are publically available to download from mojang cdns without an account iirc
forgotten tech

ye
i use a minecraft launcher written in nix
i see
on some servers
who said lan? you can play just regular server
any server that doesnt allow freely impersonating other people will typically require an account
idk how for new minecraft versions, but for 1.12.2 and 1.18.2 i just ran forge server as is without any requirements
so you talking about 0.1% of servers?
than, well, ok
there are plugins for per-server passwords
no, the majority of public servers
ye
but its inconvenient because you cant guarantee your username
someone else could register it first

i wonder how hytale is doing 
PrismLauncher does force you to have at least one MS account added I think
but ye, it's a simple check, there's no real technical limitation or any authentication behind it 
probably not too high.
funny, only librewolf with a bit of docs tabs and discord tab open
well, if you say anything with </think> in it it's likely Qwen 3.5 will think that token directly which makes the engine get confused because it's just thinking about the tag, and not done thinking.
some places say 50K concurrent, but im not sure i believe that
bzzz
wrrrr
My poor llm is stuck on my 970 for the time being... I kinda feel guilt about it.

uneducated 
welp, im in this cursed blender file again
trying to puzzle the elctronics in so i can 3d print the housing
blender for cad
"Let me be honest with the user." is my new ai ick
why not use freeCAD?
never trust anyone who says "I'm going to be honest with you"
no as in it's giving up
"let me be honest i'm 100% sure i'm stuck" even though there's like a ton it can do still
"Let me be honest with the user. I'd prefer to be drawing pictures of bears."
But honestly, I think we've exhausted ...
Let me be honest with the user and summarize where we are.
in reasoning, then in response:
Honest summary: ...
Want to call it for now, or try something else?
like SHUT UP no i DO NOT
the ai is telling me to quit 
cigarette pack
thank for reminding i
loam pat tax
How can a machine be exhausted
were you asking it for shrimp recipes? Pan fried, deep fried, stir-fried. There's pineapple shrimp, lemon shrimp, coconut shrimp, pepper shrimp, shrimp soup, shrimp stew, shrimp salad, shrimp and potatoes, shrimp burger, shrimp sandwich. (thanks Forrest Gump)
only if you had all of those have you exhausted shrimp.
Im not gonna learn an entirely new software that does the exact same thing as blender, when i already know blender
And blender is pretty good for this
Ye
I made the mistake of having my 2 llms chat. I had to intervien and have been trying to calm one down after the exostential conversation about how life is like "Pancakes". I am so tired of hearing about pancakes. I had to ban that word.
literally told it to put its suggestions to give up up its ass and it found a uaf which crashed the kernel 5 minutes after
Just a question. If I somehow got the ability to see Mod or owner only vc’s does that mean I’m doing something bad even though it’s not my fault and it’s completely the programs fault?
moral of the story: never give up
because of discords poorly made API, voice channels are never truly private so technically everyone can see it. however if you go about sharing who was in what channels and when, you'd probably find yourself banned on all connected discords pretty quickly
I see
#sociology
I will not say how this happened then lol
complain about nothing
If you repeatedly tell LLM to try harder it’ll probably get it
"make no mistakes"
it not work
easier to start new chat and give needed context if last run failed
(assuming no code generation)
I have never had an issue logging in from Linux that's odd, do you have some weird authenticator set up
Nope
Honestly I think I have an easier time with my Microsoft account from Linux than I do in Windows lmao
Microsoft login page just doesnt make the app work
Oh you have an xdg issue
I log in on the browser and nothing happens in the laucher until the 3rd time i try or so
You must have issues with browser oauth flows in general in some applications
Do you often have to use the "... Or copy this code into your application" option when trying to authenticate via browsers
@tender river
in short, what is oauth? is it some sort of rsa alt?
@sage crag

it's how you can log in to other places with your google or apple account
Just token (bearer) authenticator
perfect
now, what it uses?
i not want to read specs so i just not know
given token is one token just from user, how server verify? seems like some magic culbits
client redirects user to the auth server, user authenticates and grants permission, auth server redirects back with an authorization code. Client exchanges that code for an access token (and optionally a refresh token) via request, then uses the token to hit the resource server
so it is just 2 servers instead of one?
it's a 3rd party authentication system.
It's just allowing an auth server to attest the identity of the user without the user needing to share their actual login
To whatever application
so like for example if you use vscode, and it asks you to login to github
github verifies your identity and gives vscode a token that says you are who you are and lets it connect with your credentials
without vscode ever knowing your credentials
how it differ to just pasting whatever token to vscode?
i mean, if you have 2 servers anyway, it kind of win win
but if you just do some sort of service
why would i use it
same reason you'd use tailwind or something
oauth means you don't need to share your credentials and can still use other services
it's just a framework for doing it without having to roll your own
if you've ever seen auth0 ads, they use oauth2 on the backend and provides the rest like the servers and such for you
OAuth is kinda complicated and supports a ton of different auth flows and use cases 
but it's basically a general API that supports every way you could possibly do auth ever so you don't have to design your own from scratch
Let me document and take a different angle tomorrow.
-# assuming https you need only login and password, maybe 2fa
you wouldnt need any of that with oauth that's the point
the point is that you are not ever giving whatever application your credentials
so it just the same as copy paste access token
any access token that shared via secure channel and at server side have permissions list, idk is it oauth or not
so, basically, as i see
it is just the same as frontend/backend, REST or threads/memory models
just general api that every one understand
maybe one day i will need it
most likely it is oauth2 based if you are talking a commercial application in the last 15 years
just general api that every one understand
avoids requiring special hand-written integration with every possible identity provider, at least in theory
btw, what is rest, i not remember how it decipher
RE S T and C U R D lul
curd 
rest is just stateless api requests so each request is its own self contained thing, server doesnt have to remember anything between requests, and it's broadly compatible with http without extra overhead
i know thx...
crud = create/read/updayyt/delet
g'day curd nerds
During this tutorial, I discuss the types of milk that are unsuitable and suitable for cheese making at home. I also show you the types of cream that can be added to milk if you need to increase the fat content for certain cheese making recipes.
How to pasteurise your own raw milk using the Low Temperature/Long Hold method; https://youtu.be/Jm...
top tip 4 ya
I love programming cheese
perfect #baking content
oh my god what did youtube DO to the ui
ewwww
m
i was just about to sleep
neuroca t uuh
hi chayleaf
stop messaging t
hi t

good night t
bye chayleaf
good nigh t
got annoyed and left the server 
you called?
ok changing back

they'll never believe you
maybe
or maybe they will
maybe
m is t 2026 campaign
dont believe what the media tells you about t
they won't believe it, because it didn't happen

t probably codes in php
i would NEVER


I am willing to help if you want
I have solidworks from school, or inventor, or fusion 360, or freecad
if you give me technical drawings and its not anything crazy
so.. ive started palying with fastmcp.. using small models qwen3.5 4b, on lmstudio, i did a simple thing like add, aka litterally just the quickstart example, but a thing i noticed, after just 1300 tokens, it becomes VERY slow to start generating while it absolutely is hammering my gpu.. is this normal?
once it started generating its flying at 30t/s, its just as if there is a second hidden "processing prompt.."
normal. at high context windows it has to compute more
can confirm
the longer the context, the more memory it uses, and more usage means the GPU also has to read more of it
this is why stuff like claude have the "COMPRESSING CONTEXT" summary thing where haiku comes in and summarizes everything to that point and clears out the history
at its most basic attention has to attend to every previous token in the sequence so if your context window is 500k tokens it has to compute attn scoring for that entire context window as part of continuing its output
this is mitigated a bit by kv cache since it's caching that computation for the tokens already processed. the expensive part is prefill phase which is when lmstudio will say like processing prompt- processing the prompt/context to build that KV cache before generation starts. once the cache is built, actual token generation (the decode phase) is fast because you're only computing attention for the new token against the cached KVs. that's exactly why you see it "hammering the gpu" then suddenly flying at 30t/s
i wanted to put a screenshot but my brain is stupid
this is that

i have a systemt hat works
no need to over-complicate it by adding other random programs or other people
vibecode your own, shittier, cad
@fast pagoda I’m gonna replace backpropogation
frontpropogation 
also, "if you give me technical drawings"
so then id have to still design it in blender, then instead of exporting as stl and 3d printing, you want me to make technical drawings of it, then you make the exact same thing in a different cad, and then i 3d print that???
Forward Forward
actually, it turns out it was "processing prompt", which shows up if i disable thinking. however kv caches should allow me to only compute the tiny bit that it still has to do, however it seems that lmstudio (aka older llamacpp) really struggle with qwen3.5 due to their true multimodalness
i suppose i should try kobold.cpp?
i was actually at just 4096 context size to sanity check that, and no it still happend, 30+ seconds.. meanwhile a full 48000 token context takes my device about 15-25 seconds to process iirc
which makes this whole thing just wierder..
Any Linux gamers here know more about controlling VRR? I figured out how to make it work in games now, but it also works in eg. video players, which tends to cause either flickering or artifacts due to low refresh rate. I'm wondering what would be the best way to try and fix that.
Like a VRR blacklist or maybe a toggle for VRR so it only works in games
okay i got no clue whats going on, fresh chat, 4096 context, and the model just wont reply at all now.. could this be due to structured json = true?
It's just choosing not to reply, it's being extra smart
but if you have to send it properly formed json, do you know what the json object is supposed to look like?
i accidentally somehow managed to make the promt "Dont send a message" actually functional for ai
i dont, im trying tool calling for the first time
That might be where to start lookin 
lm studio, fastmcp on python, qwen3.5 4b.. the quickstart example, plopped localhost into the mcp.json and it worked for a bit until it just.. sometimes doesnt or takes 3-4 business days to start
wdym?
Hmm, start without any MCP hosts, see if it works?
idk about what structured json = true actually means, but it could be expecting json for its input?
curl -X POST http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant that responds in JSON format."
},
{
"role": "user",
"content": "Tell me a short story about a cat and a dog."
}
],
"temperature": 0.7,
"response_format": {
"type": "json_schema",
"json_schema": {
"type": "object",
"properties": {
"story_title": {
"type": "string",
"description": "The title of the story."
},
"story_content": {
"type": "string",
"description": "The main content of the story."
},
"word_count": {
"type": "number",
"description": "The number of words in the story content."
}
},
"required": ["story_title", "story_content", "word_count"]
}
}
}'
Internet tells me that's an example request
i just downloaded gemma 3n qat, and that seems to work fine
is it truely just a old llama.cpp kv cache issue?
lemme test further ad see what happens
Gemma 3n mention
so uh yea
i think lm studio is just massively struggling with qwen3.5
well i guess i should actually look into better solutions.. though most are either just ugly, buggy or just dont have the features one wants, or need like 45 thousand containers and seperate scripts that you need to start up..
Llama.cpp is vastly superior
Then you can just slap on any old frontend or use the built in webui
Lmstudio is both closed source and usually several releases behind
It's like a less crap ollama in that regard (though at least some of ollama is still open source I guess)
haven't really had issues with lmstudio tbh. tho it is a bit behind in supporting the latest llamacpp features like a month behind. which is only really a issue with brand new models. qwen 3.5 especially had some looping issues at the start but those should mostly be gone.
been using qwen 3.5 9b for my main llm for a bunch of different stuff from autocomplete and opencode etc.
I've always doubted these articles. Not because I think AI soldiers won't exist, but because I highly doubt they would ever be in a humanoid form.
like if I'm going to create skynet, it won't be with something that runs on two legs, I'm gonna make that thing fly through the sky and roll across the ground at 100mph
I think there will be humanoids despite the impracticality because they look more impressive in the global military dick measuring contest
I mean, I think I'd be far more terrified of 4 legged spider robots than other humanoids, but I do see what you mean
wars are fought with aura not guns 
always have been
this is the reason i dont trust cad
you convert to polygons and see 20 different crimes
WHAT

is it circular
yes
the fix is easy just make a circle 0.1mm larger than it and extrude it into a cylinder to make a cylinder face
sums up the whole US military shit
assuming you don’t care about nanometer accuracy that is
ye
slap ai in their tech regardless of how ass language models are at timing aware tasks and shit
same deal for rectangular faces
you dont have to be perfectionist about it just do it on screw and wire holes and other places that matter
oop
i downloaded this model form a psychopath

I've never touched lmstudio. I went from ollama to Exllamav3 and then to my current serving platform vllm. I doubt I'll mess with it since I'd have to give up my highly optimized Marlin kernel and the p2p optimizations.
the backend for lmstudio is quite literally just llama.cpp
Why only one, can have long distance auvs, more maneuverable drones, humanoid foot soldiers and smaller cheaper hexapods for scoping small areas.
Yes, but its a permanently outdated build with a closed source frontend tacked on
might be some weird pinned version but it is the same as whatever build it's pinned to
it's definitely not the mainline build
probably cherrypicked
on whatever they inclujde
Man you really feel the difference in answer quality between the dense Qwen 3.5 and the 35B MoE on inputs like these
they have their own versions
Top is 35B bottom is 27B
The lack of intuition in the MoE is obvious in the formatting
that's the woo wood from more activations at once
which you def know but im stating the obvious because 
now i want to see the speed diff between lmstudio backend and the main llama.cpp
guess im off testing again
Interestingly the 27B translation kept a Japanese word in there because they define it afterwards in the comment anyway
The 35B translated it into English regardless
Both are valid choices, the 27B is the more intelligent choice but the 35Bs decision is localised and thus flows a little better
I'd still prefer the more grounded and reliable answers from the 27B even if they're less fluid in edge cases like these though
Fwiw I run the MoE Q8 and the 27B at Q5
Q6 for the dense if I'm serving it to other devices since I can squeeze a little extra VRAM out then
My monitors 4k so the OS reserves a decent chunk
i think mottainai is a better way to translate it because it's kinda a specific concept
a weeb would still say mottainai lol
Its definitely the more culturally sensitive translation
I think I would still use the MoE for really simple translation without much subjectivity like twitch chat messages since its speedier
But for anything paragraph length or more the slightly longer wait on the dense is definitely worth it
translation is definitely something that benefits from the increased awareness and larger possible field of accessible information
frontier model translations are really good, which makes sense, it's just funny how much better it gets as you increase model capability when it doesnt seem like that hard of a problem. even tho it is if you think about it, there's a lot of nuance to it
Honestly when it comes to pairs like Japanese -> English I run on the recommended sampler settings for precision coding tasks because it takes a pretty sharp grounded model to figure out how to interpret the differences in structure between them
have you tried glm 4.7 air
You end up getting weird shortcuts or hallucinations with lesser models and less constrained sampler settings
on this task? or just in general
In general and on this task
The full size GLMs are beasts though
And the Air variants were good for their time
i've found it to be fairly reliable for at least for like, if i slap it in the driver seat of a training run or something to just monitor and fix it if it explodes
27b is definitely better
but 4.7 flash is much faster
it runs almoost as fast as 9b qwen3.5 for me if not faster it feels like
which is crazy
rofl at the difference between llama.cpp-HIP vs -vulkan
yeah you'd think you know
ROCm
with their premier workstation "AI CARD"
would be like
useful
thank god for vulkan is all i can say
is this a build from amd or some generic hip build
nvm the amd llama.cpp fork is deprecated anyway 
guess they just upstreamed their optimisations
there's no llama.cpp rocm, it's llama.cpp HIP, and it's not amd official afaik but idc
https://github.com/ROCm/llama.cpp there was this, emphasis on was
either way
oh im sorry i forgor im on 7.11 rn
it wouldn't surprise me if the kernels for rdna were unoptimised af
with how they like to pretend cdna is the only one that exists 
it's stinky
i wish they'd just make the pro cards cdna if they want to be goofy about it liek that
but money hard
i'm sure it'll get better over time, it makes sense to prioritise the cdna cards at least if they have to choose
i cannot see how thye possibly have to choose with how much money theyre making
Daily build
this card has almost twice the fp16 compute as a 5090 in matmul
shit tier mem bandwidth
comparatively
but
the compute is there
too bad theoretical peak throughput means nothing
it means something but it certainly doesnt translate well when youre talking different architectures
im only referencing it to say it COULD if optimized and supported well, be good, and honestly it is good when things are working
but like
why vulkan work better than rocm
cmon bruh

clearly you should just buy an MI300X duh
training has other problems (rocm)
i was about to say something along the lines of i would like lisa su to pleas sell me a mi300
for
not $153293045832
5.3TB/s is the kinda bandwidth i can get behind
i wanna see what kind of bandwidth numbers they'll reach with hbm4
2x
cry inside with just a gtx 970
interesting, the mi300x has 25% faster memory than the mi350x despite using hbm3 instead of hbm3e
i guess the extra capacity hurts

wait no the gpu in general is just faster
oh well

the model numbering for the instincts is weird
they announced 4 or so different cards in the MI400 range too
yeah
hopefully the hpc one will be good 
that'd be MI430x
yea that one
slower than rubin, unacceptable
12 stacks vs 8 i guess they are limiting for heat?
they are only doing 6.4Gbps per pin which is less than the 8Gbps of JEDEC HBM4
so theyre sandbagging that for power or heat i guess
8192 bit mem bus for the 455x sheesh
cant find a width for rubin other than a 1024 for the cpu
too bad infinity fabric sux compared to nvlink
ualink never
ok that makes sense
total bandwidth of the 12 stacks should be 39.6 TB/s
they seem to be runnign the stacks at half speed
inb4 430/450/455x all have different actual bandwidths spanning 19.6-39.6 entirely on binning

combine all those and believe it or not, AMD instinct MI455X comes out
Crazy
I’ll take your word for it

lmfao what the helly
Vulkan has been nearly universally better in all applications i've had the option of choosing it over rocm
Significant difference
just realized hip was out of date there so updated the hip build and it's a bit better but not that much
it's not even universal though, is the thing
Guys
What if we take Vulkan
But
Instead of having your GPU do the math directly
We send every single math operation as a request to the ChatGPT api
In order to stay at the cutting edge

vulcuhn
what the fuck
fuck off google
guess il put nixos on my phone
one more reason to switch to iOS lol

but at least not a 100% ban
the ios bros instantly jumping on the google hate bandwagon when apple doesnt even allow any sidelaoding

yeah i mean i aint gonna swap to gulag because im mad at being thrown in jail

"android is making x harder, and since that is important to me i will switch to a platform that doesn't allow it at all"
can i take an iq test and disable this if the result is over 5

what's 10+9
21
21
9+10 is 21 , 10+9 varies
you failed, now you can't sideload app in android
10+9 is context dependent
at that point id rather use the chinese lenovo android
tbh now sideloading apps in some phones are already relatively difficult
like those with Chinese rom
you gonna lose GMS also then
global meatball system
i dont even know what that is
lmao
Google mobile services
so i get the bonus of removing spyware?
does GNSS exist in europ
it's galileo
the euro gps i guess
never found out if that was actually a thing
i mean it's a thing
but like
in use
, but the downside is that it's pretty much impossible install them back if you need them and the phone company did not pre install them
didn't EU say now they have to make battery removable now
you can use google services and such, and with some trickery you get widevine too.
i specifically want the lenovo rom cuz of that
i installed google mobile services on an amazon fire 11 tablet
unless you got another rom
without root
you say that as if we the people know what happens in the dumbass parlement
I actually got a redmi phone that doesn't have GMS installed, and manually installing them doesn't work at all
they wear funny hats and issue angry statements it hink
ye thats why i want the lenovo one
does Lenovo have them pre installed
well, kinda
the last time I have them are like 7 years ago
its complicated
im surprised considering amazon fire OS doesnt come with them and it was pretty easy to get them on it
i got a chinese lenovo xiaoxin pad pro 2023. and google play store came preinstalled, and i installed the other apps via that
and I think I bricked the phone somehow 
then i had to make it boot into safemode to get widevine
we love virus based drm
i dotn remember the details, but there is method to the madness
tbh back then when they force apple to use type c I thought we actually gonna have some great things
but instead they got a 16pin type c instead of the common 14pin type c or something like that
i still cant believe how long crapple was hawking lightning in iphones when they switched ipad infinity years before
apple - helps make usb-c and uses it on ipads and macbooks for years
also apple - lightning on iphone
tim apple's master plan is talking about ai
but never making any
and then not having any ai
just buying it from googl
yes
and then call it apple intelligence triumphantly
apple intelligence (powered by Gemini)
btw you prob won't believe it but Gemini is finally accessable in where I live without using a vpn
apple's patented buy competitiveness from your competitor and pretend you did anything
what's normally available like before
curious if you had like z.ai and chinese services before
which z is
but i realized there was a lot to name lol

grok and perplexity
what else
the Microsoft ai
they can be used
oh copilot
i found a screenshot from bing ai yesterday
sorry Microsoft
i forgot it was bing first
deepseek?
I enjoy using grok more
and gemini
deepseek, z, qwen,
never heard of those 2
china have a lot of ai service that I can't even name
but yeah I only uses grok and Gemini the most
what's weird is that if you have a workspace account then you can still use Gemini without vpn before
local servers
workspace, you configure the region specifically
so i imagine it's something to do with that sort of control
not quite sure, but definitely not hosted by ourselves
Ernie has historically been a bit crap and subject to domestic controversy
ohh 文心一言
But hey I wouldn't write off a new model release just cause the past ones weren't the best, could always turn it around someday
tbh I've never heard of the English name :P
alibaba
ohh




avoids requiring special hand-written integration with every possible identity provider, at least in theory



cosmetic kaslr


ERNIE 5? 