#vibe-coders


deep timber
#

Please tell me 🙏 🙏 🤔

bleak vessel
#

What are some optimizations that I can do to reduce the cost of Gemini 2.5 Flash Native Audio? I have built a live interview platform, and based on last month's analysis, it costs around Rs 50-60 to run one interview (an interview lasts 5-6 minutes on average), which seems very high.

I checked the input token usage: the maximum number of input tokens used in one day was 3 million, and there were 12 interviews that day, which means each interview used around 250K input tokens on average.

Any help on reducing the input token usage and the cost in general would be much appreciated 🙂
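
A quick sanity check of the numbers above, as a minimal sketch. The 5.5-minute figure is an assumed midpoint of the 5-6 minute range, not a measured value.

```python
# Figures quoted in the message above.
daily_input_tokens = 3_000_000
interviews_that_day = 12

tokens_per_interview = daily_input_tokens / interviews_that_day

# Assumption: 5.5 minutes is the midpoint of the quoted 5-6 minute range.
average_minutes = 5.5
tokens_per_minute = tokens_per_interview / average_minutes

print(f"{tokens_per_interview:,.0f} input tokens per interview")
print(f"{tokens_per_minute:,.0f} input tokens per minute")
```

Seeing the per-minute rate makes it easier to compare against other sessions of a different length.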

tall walrus
#

lol urs say refresh in 4 hours ? mine refresh after 7

#

i wish i had 34k google acc like chinese companies

#

i think buying multiple accounts is good

#

like google?

#

thinking of doing that rn it works for some acc i have multiples but idk if i should buy multiple acc

stark sapphire
#

If you wish to be banned. Then go ahead.

stark sapphire
#

Really happy with what i made so far with Antigravity

vagrant folio
bleak vessel
#

Its audio only

stark sapphire
#

uh oh.

gusty meteor
vagrant folio
#

So you can try to keep each interview to about 5 to 10 minutes at most using this approach.

#

Also check that you have VAD (voice activity detection) enabled, and make sure there are no duplicate resends.

bleak vessel
#

Okay.. i am aware of context window compression, have to play around with the values a bit maybe

#

Also is prompt caching something that might help?

vagrant folio
#

Yes, but in reality the caching happens on the server side.

#

So unless they offer that option here, it won't work.

bleak vessel
#

Oh ok makes sense

vagrant folio
#

In your case you need to check why the consumption is this high.
Gemini 2.5 Flash Native Audio Dialog · Live API · 30 / Unlimited · 20.63K / 1M · 199 / Unlimited

bleak vessel
#

My system prompt is actually really detailed (around 5000 tokens easily) and I believe that is being sent with each convo

vagrant folio
#

This is my AI Studio usage for a project I did before. I didn't talk much, but I was using video.

#

And I'm sure each session was longer than 1 minute.

#

So your reported token usage is too big.

bleak vessel
vagrant folio
#

let me check give me a moment

#

Mine is an 800-token system prompt, and the tool call is 430 tokens.

#

In your case, I'm not sure whether that applies; maybe the prompt can be separated into tools or split based on the interview.

bleak vessel
#

Oh ok… i don’t have any tool calls

vagrant folio
#

Good. I was checking; search for Explicit Context Caching.

bleak vessel
#

But yeah let me try out context window compression

vagrant folio
#

But this requires saving the cache in your Google project.

bleak vessel
#

I guess i can configure that

vagrant folio
#

yes

#

If you want to be sure what is happening, I recommend adding a middleware that counts the tokens going to the Live API and the tokens you receive from it, per turn.

#

Record that to a JSON file.

#

And use it as a reference to know whether it is improving or not.

bleak vessel
#

Yeah, planning to do something like that... Actually that is a good idea.

I planned on detailed logs, but again that becomes a mess to observe. The JSON file makes more sense. Thanks

vagrant folio
#

Yes, just make a JSON file for the token consumption count only, per turn. If you want something more, you can deploy Grafana and add a metric from the JSON, or use Prometheus for an easy view. Pick whatever path is easiest for you to read or to pass to an AI.
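
A minimal sketch of the per-turn JSON log described above, using only the Python standard library. `TurnUsage` and `TokenLog` are hypothetical names, and how you obtain the token counts for each turn (for example, from usage metadata in the Live API responses) is an assumption you would need to wire up yourself.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TurnUsage:
    turn: int
    input_tokens: int
    output_tokens: int


@dataclass
class TokenLog:
    """Collects per-turn token counts for one session (hypothetical helper)."""

    session_id: str
    turns: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Turn numbers start at 1 and follow the order of recording.
        self.turns.append(TurnUsage(len(self.turns) + 1, input_tokens, output_tokens))

    def totals(self) -> dict:
        return {
            "input_tokens": sum(t.input_tokens for t in self.turns),
            "output_tokens": sum(t.output_tokens for t in self.turns),
        }

    def save(self, path: Path) -> None:
        # One JSON file per session: the turns plus running totals.
        payload = {
            "session_id": self.session_id,
            "turns": [asdict(t) for t in self.turns],
            "totals": self.totals(),
        }
        path.write_text(json.dumps(payload, indent=2))


log = TokenLog("interview-001")
log.record(input_tokens=5200, output_tokens=310)  # made-up numbers
print(log.totals())
```

Call `log.save(Path("interview-001.json"))` at the end of a session, then compare the files before and after each change (compression settings, shorter system prompt) to see whether usage actually drops.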

stark sapphire
#

thoughts on UI?

#

I tried to make it even better

vagrant folio
#

Nice. From my perspective, the log out section should be at the bottom.

hasty bay
#

Does anyone have any unique ideas for building an AI agent?

vagrant folio
#

What are you looking for?

iron rock
# stark sapphire thoughts on UI?

The UI is a bit distracting, it wants to pull focus to everything and thus focuses on nothing.

The UX gives me zero idea what the purpose of the website is

stark sapphire
#

Before, it was just cards, all the same size, and nothing was telling you anything. It was just there. It gave no focus. So I made it differently, with more purpose, so your eyes can focus on what is important in the moment.

iron rock
#

Bright lime green bar, eye goes there first. Gives no details.

Then the eye bolts around to all of the yellow, as it is the focus color. Which then brings you to all the article related stuff, so... News website?

Then I see vending and get confused as heck

stark sapphire
#

The colors are there because those are brand colors. I kinda have to use them. Secondly, it's not a website, it's an application. An extension from the website.
People downloading the app will understand what they're looking at.

Let me break down why this works before my ego inflates and floats away. First, my hero finally commands attention. By combining a big image, strong typography, and a clear CTA, I’ve stopped presenting twelve equally irrelevant rectangles and started answering the question of what the user should care about right now. It’s a huge win. Beyond that, the "Trending" section finally feels like a curated space rather than a random data dump. The labels pop, the cards are grouped with purpose, and the spacing gives the content breathing room, signaling that this information actually matters. I’ve essentially invented flow by moving from the Hero to Trending and then to Latest Reviews; previously, my layout had the narrative structure of a grocery receipt. The typography is also doing some serious heavy lifting here, that italic bold headline style in the hero gives the site/app a slightly aggressive, editorial tone that feels like a gaming magazine that drinks pre-workout.

iron rock
#

If I landed on that page looking for game reviews, I would end up leaving confused on why I had been brought to it.

#

Ok. lol

stark sapphire
#

@iron rock Here is how the actual website looks

iron rock
#

Much better, not a fan of the other one though.

#

I'm not personally a fan of the lime green, but I'm sure it doesn't bother others

stark sapphire
#

I understand. Sadly i can't change the color, because it's for a company. They have used this color scheme for ages.
Game Mania is 34 years old, having been founded in 1992.

iron rock
#

Oh damn! That's awesome though. I wonder if it would look better if you used the green as the cta color rather than yellow. That way the top bar could be yellow.

I assume it wouldnt lo9ok better, very likely you have the best already selected.

#

look*

stark sapphire
#

The issue with this idea, is that the green header has always been green in the past.
Plus, I'm simply not allowed to switch it to yellow. The yellow is also a part of the brand color, but I have to be careful where to use it.

iron rock
#

Understood, thanks for hearing me out 🙂

stark sapphire
#

glad to get feedback 🙂 thank you for that as well.

deep timber
#

Why is Gemini CLI taking about 5 minutes to respond to any task? It is damn too slow. Does anyone have a solution?
Please tell me

vagrant folio
#

In my case I noticed that 2.5 Flash answers faster

#

try flash

fading osprey
#

Are you also having problems with the antigravity limits with Google's Ultra plan?

signal raft
stark sapphire
#

what sucks with Antigravity, is that when you use your tokens, but the Agent fails or gets disconnected for a split second, you lose all of those credits.

tall walrus
#

How are you all vibe-coding with this Claude Opus?

stark sapphire
#

what do you mean with how?

balmy depot
#

I only use Opus for planning... it's not a good use of tokens to get it to do things.

fading osprey
#

like the antigravity costs with google ultra plan, what is there that has an almost unlimited rate limit?

winter viper
fading osprey
#

what?

#

Maybe I didn't explain myself well with my question

#

I was using Antigravity with the Ultra plan until yesterday, working perfectly for software development, CRM, etc.

But since this morning, I've been having problems with the limits, and I think it's a general problem... Do you know of any good alternatives at the same price of $250 per month?

winter viper
#

Ohhh, my bad, claude code for sure, even cheaper and better

fading osprey
#

Yes, but I read that it has the same limitations, if not stricter ones than Antigravity has had in the last few days.

#

Until yesterday I was able to create complete CRMs from 0 to 100% without affecting the credits or the rate in the slightest.

uneven bridge
#

Goooooolersssss goood to be home can't wait to meet you all ! Sorry for any wrong beings thanks for supporting during the most tuffest times in life you guys are the best wish you well ! God bless you all. Googlelogo ❤️ GCP

uneven bridge
stark sapphire
#

Bot

uneven bridge
stark sapphire
#

No

uneven bridge
#

get readyyyyyy!!! yall watsssup im here im not going no where cupcake

#

I LOVEEE IT

#

#welcometoJungle 🗽 😎

#

https://www.loom.com/share/47a10f42fc164cbba895e1ce53071c86 im here !!!!!!! im only 3 weeks in you got a long rride

Hey everyone, in this video, I'm excited to share the latest updates on our project motion frames and the potential they hold for our mission. We’re diving into the specifics of the G6 engine and how it enhances our capabilities. I also touch on the importance of our Asian connections and the unique aspects of our design. I encourage you all t...

#

after 7 month breaks 😘

#

lets do it !!!!

#

come outside yu think got jokes huh

balmy depot
fading osprey
#

no because it reduces 20% after 3 messages and then after 5 hours it gives me everything back

balmy depot
fading osprey
#

And is there a valid alternative without limits?

balmy depot
#

Nope

#

My understanding is that they are building Antigravity into AI Studio, the IDE may not survive.

uneven bridge
balmy depot
#

Closest thing to an alternative I've seen is using OpenCode or ClaudeCode integration and then using Antigravity for planning and then your choice of models to implement.

fading osprey
#

and.. remove completely claude

balmy depot
fading osprey
balmy depot
#

Well can use Claude so yeah equal. Several different ways to go about.

#

You don't switch to Sonnet for implementation? Cheaper that way.

fading osprey
#

When I use Opus, it automatically downgrades Sonnet.

balmy depot
#

Hrmm? Okay.

#

I run out of the Claude quota so fast, I can't say I've had a ton of experience with it.

balmy depot
fading osprey
#

and where? you know?

balmy depot
#

I am personally messing around with OpenCode now as part of my process.

Well ClaudeCode would be at Anthropic. Max plan would be an option?

OpenCode is more able to connect to anything, but the people behind it have "Zen" and "Go" services that include Claude, so I would suspect that would be more to spec with Anthropic. I don't want to get too into doing the pricing research for you. Pretty straightforward stuff, but it changes often

dawn sphinx
#

Does anyone know when Gemini 3.1 Flash Live Preview will be available through Vertex AI?

It seems possible through Google AI Studio but not Vertex AI.

I upgraded to google-genai >= 1.69.0, and the SDK is unified.

The Gemini change log said on March 26, 2026: “Released gemini-3.1-flash-live-preview, the latest audio-to-audio (A2A) model designed for real-time dialogue and voice-first AI applications.”

models.get() returns a metadata shell but the Live API WebSocket returns 1008/404.

I can’t tell if it is behind a quota/EAP allowlist adjustment.

I don’t know if the endpoint is gated by a Private Preview IAM flag because of some GCP Allowlist Flip or what.

I think the global control plane knows the model exists, but the regional data plane (API Gateway routes/GPU clusters) is unprovisioned.

stark sapphire
jaunty marsh
#

When building an app, the model usage is very confusing to me. For example, when I’m using Gemini 3.1 Pro preview, sometimes it allows me to create quite a few prompts before it exhausts usage on the free plan, and sometimes it’s only a couple.

balmy depot
# stark sapphire Do you have any sources saying so?

Nope, the "IDE may not survive" is pure speculation. Which is why I used the word "may" to indicate uncertainty. This blog however talks about how they are introducing the antigravity agent to AI Studio: https://blog.google/innovation-and-ai/technology/developers-tools/full-stack-vibe-coding-google-ai-studio/

Google

Start building real apps for the modern web with the Antigravity coding agent and Firebase integration now in Google AI Studio.

#

They've certainly undermined the position of the IDE, while seeming to focus hard on getting the good bits into the cloud based AI Studio, which honestly is more where the company is comfortable. Have to say that my hopes for the IDE are dimming. I hope I am proven wrong.

#

Google is a big enough company to do both.

round solstice
#

<@&1009526435276394496> that spammer is back again. 🙁

past moat
#

Tip: Add outbound loop prevention to your GitHub Copilot instructions

If your AI agent can send emails or messages, add a rule that stops it from replying to itself. Without it, one email can turn into hundreds.

Example 1 — The email loop:
I built an AI agent that reads my inbox and sends replies. I added the agent's outbound email address (aos@mydomain.com) to the list of allowed senders. When the agent replied to a real email, that reply landed back in the inbox — and the agent replied to that too. It looped 18 times before I caught it, and generated ~89,000 Pub/Sub (publish/subscribe — a message queue service) retry faults in the process.

Example 2 — The fix (three layers):
The rule I added to my Copilot instructions requires three independent guards any time the agent sends something outbound:

• Code check: before anything else, reject messages from your own addresses in the handler logic itself
• Config check: never add an outbound address to your allowed-senders list
• Rate cap: abort if more than 10 emails have gone out in the past 60 minutes

The reason for three layers: if only one guard exists and it's misconfigured, the loop happens anyway. All three have to fail at the same time for a loop to get through.

Why put this in Copilot instructions?
Copilot will generate the outbound handler code for you. If the rule isn't written down, it won't know to add the guards. Once it's in your instructions file, every new handler gets the protection automatically.
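
The three guards could be sketched roughly as follows. This is an illustration under stated assumptions, not the handler from the post: `validate_config`, `may_reply`, and `client@example.com` are hypothetical, and only `aos@mydomain.com` and the 10-per-60-minutes cap come from the text above.

```python
from datetime import datetime, timedelta

# Hypothetical configuration; only aos@mydomain.com appears in the post.
OWN_ADDRESSES = {"aos@mydomain.com"}
ALLOWED_SENDERS = {"client@example.com"}
MAX_SENT = 10
WINDOW = timedelta(minutes=60)


def validate_config(allowed_senders, own_addresses):
    """Config check: an outbound address must never be an allowed sender."""
    overlap = {a.lower() for a in allowed_senders} & {a.lower() for a in own_addresses}
    if overlap:
        raise ValueError(f"outbound address in allowed senders: {sorted(overlap)}")


def may_reply(sender, sent_times, now):
    """Return True only if every guard passes for this inbound message."""
    sender = sender.strip().lower()
    # Code check: reject our own addresses before anything else.
    if sender in OWN_ADDRESSES:
        return False
    # Only reply to senders on the allowed list.
    if sender not in ALLOWED_SENDERS:
        return False
    # Rate cap: abort if more than MAX_SENT emails went out inside the window.
    recent = [t for t in sent_times if now - t <= WINDOW]
    if len(recent) > MAX_SENT:
        return False
    return True


validate_config(ALLOWED_SENDERS, OWN_ADDRESSES)  # run once at startup
print(may_reply("aos@mydomain.com", [], datetime.now()))  # False
```

Keeping the config check separate from the per-message check means a bad allowed-senders list fails loudly at startup instead of silently at send time.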

vagrant folio
#

If you mean the terminal in the first versions, where you could see and interact with how it executed commands, etc., I think they just removed this feature. Now it executes commands in his terminals, but you can't interfere as before; you can only see the output.

stark sapphire
#

his?

Anyway, terminal works here just fine.

lime wyvern
#

Nice. Definitely interested; would love to hear more about what you're building.

open stone
#

How much have you already invested in the startup?

gentle aspen
#

Recently I realized my grades weren’t dropping because I didn’t understand topics, but because I didn’t know what to study.

Flashcards help, but creating them manually takes too much time.

So I built an open-source app called ONCards.

It converts notes, PDFs, and slides into flashcards automatically, and uses a local AI system (Gemma3 via Ollama) to:

• track weak areas
• recommend what to study next
• adapt based on performance

It runs fully offline with no API or subscriptions.

Currently uses ~300MB RAM idle and ~4–5GB VRAM during inference, with aggressive caching for performance.

I’m looking for feedback, especially from people running local models or using Gemma.

gentle aspen
# round solstice Have you tried Gemma 4 yet?

yeah! it is crazy!!! I am planning to build an agent system to manage my other computer as a fun project, and I am considering changing the model in my app to Gemma 4 because I find it more "stable" across many categories.

also the reasoning and native function calling has been a HUGE deal for me for the past two days. I am still trying to do more stuff. might take some more time to say how good or bad it is. but as of now, it is CRAZY! I think this might be the biggest leap in local AI since Deepseek-r1.

round solstice
#

Yeah, definitely amazing how much latent knowledge is in the downloadable blob.

#

And if it's any good at tool calling, it can have current and RAG info.

stark sapphire
#

I just made my own AI

deep timber
#

Hey devs 👋

I’m building something called DevOPS — a voice-first AI developer assistant that lets you control your entire coding workflow using just your voice.

No typing. You just speak.

You can:
• Search and open your GitHub repos
• Read and explain code
• Create issues and review PRs
• Debug files with AI
• Navigate your projects hands-free

It’s like having a real AI pair programmer that listens, thinks, and responds instantly.

The goal is to make coding faster and more natural — especially when you don’t want to switch contexts or type constantly.

I’m curious:
👉 Would you actually use something like this in your daily workflow?
👉 And more importantly — would you pay for it if it worked really well?

Be honest, I want real feedback 🙏

stark sapphire
gentle aspen
stark sapphire
#

i used the leaked source code from Claude💀

gentle aspen
stark sapphire
#

internal use only. don't need trouble

gentle aspen
stark sapphire
gentle aspen
rain lava
cinder agate
#

what did u do to the Claude limits 🙁

#

antigravity was goated before that

gentle aspen
#

I prefer the $20 Codex plan. but free antigravity isn't bad by any means. just use the gemini models. the Pro low is a good model. I use GPT OSS for planning

#

tbh Codex is way better when it comes to stability and executing.

#

antigravity feels "Fun to use", not the "Pro" tool

gusty meteor
#

but for sure every AI app will look at the Claude Code source code xd

#

to see how Claude Code works better

rain lava
# gentle aspen tbh Codex is way better when it coems to stability and executing.

If Gemini 3.2 Pro gets based on the DeepThink architecture I think it'll be better. Currently I find 3.1 Pro to be focused on maximum speed rather than accuracy in its coding. Claude Opus 4.6 will beat Gemini 3.1 Pro in tasks that are more complex because its architecture is built on self-reflecting on its decisions to make sure it's right.

#

Gemini Code Assist relies on your subscription plan too, so when 3.2 Pro gets released and then added to Code Assist it'll be like the Codex plan rather than a free limited Antigravity Agent.

gentle aspen
#

wont be very effective tho unlike a native arch, but better than nothing.

rain lava
# gentle aspen that would be the opus models. But there isn't an actuall all in one model yet. ...

I suppose an all in one model would be either an untrue all in one model (switches models for what you need) or one that could somehow change its parameter count for the response (still changes model properties)

According to ChatGPT the "best" coding AI is GitHub Copilot AI (which is based on GPT) but I don't think I believe it at all to be honest.
If that's the Opus model, then would Sonnet be a fast model like 3.1 Pro?

gentle aspen
rain lava
gentle aspen
rain lava
#

Yeah I don't want to pay another plan
Are any of the Gemini 2.5 Models better than 3.1 Pro at anything?

gentle aspen
rain lava
gentle aspen
#

who knows..?

rain lava
rain lava
# rain lava Google

According to some leaks and Gemini itself, 3.2 Pro comes out May 19-20.

trail wagon
rain lava
# trail wagon Can you elaborate which source?
Leaveit2AI

Gemini 3.1 Pro dropped Feb 19. Now Gemini 3.2 is showing up in Arena logs and API strings. Google hasn't announced it. Here's everything confirmed, leaked, and expected — updated as it happens.

Link to our newsletter: https://bitbiased.ai/
Gemini 3.2 isn’t just another AI model — it’s a shift from prediction to real reasoning.

In this video, we break down Google’s latest AI system, including Deep Think reasoning, the leaked TPU v7 Ironwood chip, and Antigravity — a new agentic platform that could replace traditional coding e...

rain lava
hushed night
#

Hello, I'm an AI researcher and I currently need a team. If you're interested, text me please. I'm currently working on an algorithm that can significantly lower both the energy consumption and the compute cost of AI training

stark sapphire
hushed night
#

its not, this is about optimizing what everyone in the world uses

stark sapphire
#

Yea, but in order to do that, you need very powerful computers

hushed night
#

no, even a gpu on colab is enough to test this

#

i just think that backpropagation isn't the key to AI, it's approximate and expensive

stark sapphire
#

so it's more about Learning & Experimenting, using tools like Google Colab, Kaggle, and Hugging Face?

hushed night
#

if we optimize the learning we optimize consumption and potentially even compute time and power

#

i don't think a model should be trained on a dataset at all, at least not how we know it nowadays. think about it: when we train a model we make little steps to get to the end of the valley, the result of backpropagation. what if we find a way to reverse engineer this: we have a set of QAs and we calculate the weights back through the layers. but if the questions aren't generated by an AI, this isn't reverse engineering anymore, it's creating a new model

#

if you're interested dm me

gentle aspen
#

my app has an algorithm called "NNA". it is a recommendation system built on embedding models with 3 levels each to filter things out and recommend to the user whatever you want without a lot of customization.

#

do u have experience?

hushed night
#

sam altman said that AI isn't a transformer-based system but also said that AI as we have it right now is already capable of creating the right AI system

gentle aspen
#

i mean, i could help u up to some extent

gentle aspen
#

lol

hushed night
gentle aspen
#

GPT3 ?!!!

where did you get the compute?!

hushed night
gentle aspen
#

175b parameters at bf16 is no joke bro
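
For scale, a rough weights-only estimate. bf16 stores each parameter in 2 bytes; gradients, optimizer state, and activations are left out here, so real training needs considerably more.

```python
params = 175_000_000_000  # 175B parameters, as quoted above
bytes_per_param = 2       # bf16 is a 16-bit format

# Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes).
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB just for the weights")
```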

hushed night
#

colab dude

gentle aspen
hushed night
#

i spent alot

gentle aspen
hushed night
hushed night
#

it has 192gb of ram

gentle aspen
# hushed night you can be in or not

I am sorry, I am out. I don't think a person is crazy enough for this. if u want help with something realistic, i will help with 0 thoughts.
i like your idea, but the way you visualize it is, let's just say... "Not-enough-thought-to-it"

if you have more cool ideas which I can help. i will!

hushed night
#

alright thanks

gentle aspen
#

You know what I just realized? all these safety models are too big. the shield gemma and all this is too big. wouldn't it be cool if someone (maybe even Google) fine-tuned gemma3:270m to be a shield gemma model?

oblique cosmos
#

I'm sorry for interrupting, I'm just saying there's a thunderstorm at my house and lots of hail the size of screws at 40 miles an hour

patent rose
#

hey am new

rain lava
#

??

tulip stump
#

claude just not working right now?

rain lava
#

Yes it's likely not working due to experiencing high usage.

rain lava
gentle aspen
#

why?

#

bro why are we both bronze 1?😭😭

rain lava
#

I just don't know whether to use 26B MoE or 31B Dense:
3.1Pro, It also said 31B was going to be better for me.


I changed my mind because, according to 3.1Pro, its architecture won't be limited by Google, and it's going to be better at thinking and reasoning rather than being as fast as possible, and having control over reasoning would be good

#

This was said by 3.1Pro

The Bandwidth Bottleneck: DDR5 vs. VRAM
To understand why the 31B model will slow down on your machine, you have to look at how data moves. Large Language Models are heavily bound by memory bandwidth, not just raw compute.
VRAM (Your RX 9070 XT): Modern GDDR6 memory pushes bandwidth anywhere from 500 GB/s to 800+ GB/s. It feeds the GPU core almost instantaneously.
System RAM (Your 32GB DDR5): Even with fast DDR5 in dual-channel, you are maxing out around 80 GB/s to 100 GB/s.
When a model exceeds your 16GB VRAM limit, the inferencing engine (like llama.cpp) puts the core layers on the GPU and the remaining layers on your system RAM. Every time the model generates a single word (token), it has to pull data across the PCIe bus from the DDR5. Because DDR5 is roughly 5 to 8 times slower than VRAM, your entire generation speed instantly drops to match the speed of the system RAM.
The Time Difference: 26B MoE vs. 31B Dense
If you ask the model to rewrite a 100-line broken Calamares installer script and output a 500-token response:
Gemma 4 26B A4B (MoE)
Because it only activates ~4 billion parameters per token and fits almost entirely in your ultra-fast VRAM, it will fly. You will likely see generation speeds of 30 to 50+ tokens per second.
Total Time: You will have your script in roughly 10 to 15 seconds.
Gemma 4 31B (Dense)
Because it fires all 31 billion parameters for every single token and constantly pulls data across the PCIe bus from your slower DDR5, it will chug. You will likely see generation speeds drop to 5 to 10 tokens per second. If you activate the built-in Think mode, it will spend additional time internally looping before it outputs the code.
Total Time: You will likely wait 1 to 3 minutes for the exact same 500-token script.
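
The time estimates in the quoted text follow from dividing the response length by the generation speed. A minimal sketch using only the quoted token rates; the model labels are just names for the two rows, and any time spent on extra thinking tokens is not included.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` at a steady rate, ignoring load and prompt time."""
    return tokens / tokens_per_second


response_tokens = 500
# (label, slowest quoted rate, fastest quoted rate) in tokens per second
for label, low_tps, high_tps in [("26B MoE", 30, 50), ("31B dense", 5, 10)]:
    fastest = generation_seconds(response_tokens, high_tps)
    slowest = generation_seconds(response_tokens, low_tps)
    print(f"{label}: {fastest:.0f} to {slowest:.0f} seconds")
```

At 5 to 10 tokens per second the 500 tokens alone take 50 to 100 seconds, so the quoted upper figure of 3 minutes presumably leaves room for Think mode output.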

rain lava
#

3.1pro said 31b will be better for what i need

gentle aspen
#

!!!!

#

no it wont

#

pls dont

rain lava
gentle aspen
#

u will screw up your PC

#

plsss

#

dont

rain lava
#

Like it'll lag? I heard Gemma 4 uses lots of VRAM but couldn't I offload to DDR5?

gentle aspen
#

26a4b is wayyy more than enough for agentic. Plus my 5070 can't even handle the 31b dense and barely runs 26a4b at 32k context. I have a good system and still get 15 TPS:
5070 12gb
32gb ddr5 6k mt/s
r7 9700x

(used ollama)

gentle aspen
rain lava
gentle aspen
#

For no reason at all, google didn't release a sub ~12b model this generation😓

gentle aspen
rain lava
gentle aspen
gentle aspen
rain lava
#

I could run a 4b dolphin model on my laptops igpu and i was on uhd rather than iris xe because asus gave me 1stick 32 rather than 2x16 so i only got 64bit

rain lava
#

My ram wont be as good as urs tho in terms of speed its 5200 and its a micron die

gentle aspen
gentle aspen
rain lava
#

Oh tokens per sec right

#

Idk ctx

gentle aspen
#

TPS = tokens per second (1 word = 1.15 ish tokens with sentencepiece), CTX = context length, it is how much the AI can remember. for your task you NEED at LEAST 32k since you are doing agentic stuff, right?

rain lava
#

Idk what agentic stuff means? Coding?

gentle aspen
#

Plus, the gemma model has a CoT (chain of thought) that eats CTX for breakfast, so a little headroom is safe for reasoning

gentle aspen
rain lava
#

Also where do i get gemma4, huggingface, github, ollama?

gentle aspen
#

install ollama and run ollama run gemma4:26b. But I recommend you run ollama run gemma4:e4b
it is smaller and better. I can run it at 128k context at 80 TPS, which is PERFECT.

#

it is not necessarily better, but that model is way more than capable. it won't do niche CSS or TypeScript, but you can do other agentic stuff. Plus with the new audio support you can make it organize your folders and stuff like that yk.

rain lava
gentle aspen
# rain lava Is that possible?

YES! that's the whole point of it. you just tell it and it will do it. it can make/edit files, run commands and edit stuff in the folders you gave permission for, or your full computer if you gave permission ofc.

rain lava
rain lava
gentle aspen
rain lava
#

I believe every message = 1xp, and you need a certain amount of xp for the next level

Run /level in #commands

#

That was easier to setup than expected

gentle aspen
#

told u gng

#

ollama is super convenient. try doing these:
tool calling, increasing the context length and other cool stuff

#

btw to increase context natively in ollama, go to settings --> context length --> [move the slider to about 128k]

to see how much TPS you get run ollama run gemma4:e4b --verbose

rain lava
#

❯ ollama run gemma4:e4b --verbose

Hi
Thinking...
Thinking Process:

  1. Analyze the input: The input is "Hi". This is a basic, informal greeting.
  2. Determine the user's intent: The user is initiating a conversation and expects a friendly, reciprocal greeting.
  3. Formulate the response goal: Be polite, engaging, and inviting.
  4. Generate options:
    • Option 1 (Minimal): Hi.
    • Option 2 (Standard): Hello! How can I help you today?
    • Option 3 (Friendly/Warm): Hi there! How are you doing today?
  5. Select the best option: Option 2 or 3 are ideal as they acknowledge the greeting and immediately prompt the user for their actual need,
    fulfilling the AI role. I'll go with a combination of friendly greeting and helpful query.
    ...done thinking.

Hello! How can I help you today? 😊

total duration: 11.099657542s
load duration: 127.421351ms
prompt eval count: 16 token(s)
prompt eval duration: 53.326788ms
prompt eval rate: 300.04 tokens/s
eval count: 204 token(s)
eval duration: 10.837521216s
eval rate: 18.82 tokens/s

Send a message (/? for help)
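
The eval rate in the stats above is simply eval count divided by eval duration. A small sketch that parses the duration strings as they appear in this output (forms like `10.837521216s`, `1m25.2s`, `116.9ms`); `duration_seconds` and `eval_rate` are hypothetical helper names, not part of ollama.

```python
import re

# Seconds per unit for the duration strings printed by --verbose.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9}
# Two-letter units come first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|µs|us|ns|h|m|s)")


def duration_seconds(text: str) -> float:
    """Convert a string such as '1m25.232874466s' to seconds."""
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in _PART.findall(text))


def eval_rate(eval_count: int, eval_duration: str) -> float:
    """Tokens generated per second."""
    return eval_count / duration_seconds(eval_duration)


print(round(eval_rate(204, "10.837521216s"), 2))  # 18.82, matching the reported eval rate
```

Comparing this number across runs is only meaningful when the response is long enough; very short replies are dominated by startup overhead.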

#

I asked 26B the same question and it was faster

gentle aspen
#

it is 18 TPS bcs it is just a few tokens. even if you had 1 Gbps internet you will use only like 25 Mbps for a 3 MB download. try telling it to make an essay about something

rain lava
#

total duration: 1m30.097378297s
load duration: 3.745011455s
prompt eval count: 25 token(s)
prompt eval duration: 386.711407ms
prompt eval rate: 64.65 tokens/s
eval count: 1675 token(s)
eval duration: 1m25.232874466s
eval rate: 19.65 tokens/s

Send a message (/? for help)

I asked:
Write an essay on how the Linux kernel was made
On 4b
128k context

gentle aspen
#

you know what? it is usable at least. 20 TPS is not bad

rain lava
#

I think it's using CPU..

gentle aspen
#

you can only use like 32k context MAX MAX (absolute max) with the 26a4b model. even if it is 4b activated, the 26b tensors are still loaded

gentle aspen
#

while running the model, run ollama ps in another terminal

rain lava
#

Currently on 26b 128k --

gentle aspen
#

I get:

gemma4:e4b    c6eb396dbd59    16 GB    47%/53% CPU/GPU    131072     3 minutes from now
rain lava
#

❯ ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gemma4:26b 5571076f3d70 26 GB 100% CPU 131072 4 minutes from now

~

gentle aspen
#

are you using garuda?

rain lava
#

CachyOS

gentle aspen
gentle aspen
rain lava
#

CachyOS automatically installs AMD GPU Drivers

gentle aspen
rain lava
#

No

#

but I know the drivers are there -- My games run great

gentle aspen
#

it might work. it works in my friends computer

gentle aspen
rain lava
#

Oh..

#

It's using my ddr5

#

Look at mem usage

gentle aspen
#

btw I think you will get good TPS with the GPU, since you already get this on CPU. Intel CPUs have a lot of threads

gentle aspen
rain lava
#

total duration: 1m51.307317733s
load duration: 6.42701621s
prompt eval count: 22 token(s)
prompt eval duration: 883.754905ms
prompt eval rate: 24.89 tokens/s
eval count: 1852 token(s)
eval duration: 1m43.238899214s
eval rate: 17.94 tokens/s

Send a message (/? for help)

Well that's what I got 128k on 26b worked and not using gpu ig

#

Okay I think ik

#

"When Ollama calculates the memory requirements before starting the chat, it realizes that 30 GB is way over your 16 GB limit. Instead of crashing your system with an "Out of Memory" error, Ollama's fallback mechanism automatically offloads the model to your system's DDR5 RAM and tells your i7-14700KF CPU to process it."

gentle aspen
#

it dynamically loads tensors between VRAM and ram

rain lava
#

4b 32k still

gentle aspen
#

it doesn't calculate and select. It loads the tensors it can onto the best hardware. but if you have more GPU, ollama would auto-detect it

gentle aspen
# rain lava 4b 32k still

ohh... that is weird. I have issues with the "effective" models on my laptop, but it is "okay" on my computer. for some reason gemma3n:e2b is running at 60 TPS, while a normal 4b model can reach well over 180 TPS on my 5070 (@ 4k ctx)

#

if you just want to experiment (I won't recommend it), use a q3 or q2 quantization. you will feel like it repeats the same thing and uses a "smoother" flow of text (in a bad way), but the model will fit in less RAM, which increases the speed

#

omds my grammar💀

rain lava
#

Gemini said

"I completely led you down the wrong path, and I apologize. The issue is entirely my fault.

I originally had you run sudo pacman -S ollama.

On Arch Linux, the maintainers split the packages to save download space. The base ollama package you currently have installed is compiled strictly for CPU inference only. It physically does not have the backend code to talk to your GPU, no matter what environment variables we set. You can even see it in your latest log: device=CPU.

We need to swap it for the ROCm-enabled package. Here is how to fix my mistake and get this working.

  1. Install the correct GPU package

Run this command. Pacman will warn you that it conflicts with ollama and ask if you want to replace it. Press Y.
Bash

sudo pacman -S ollama-rocm"

I gotta stop relying on ai 😭

#

That's better:

❯ sudo systemctl daemon-reload
sudo systemctl restart ollama.service

~
❯ ollama run gemma4:e4b --verbose

hello
Hello! How can I help you today? 😊

total duration: 366.351577ms
load duration: 130.299963ms
prompt eval count: 16 token(s)
prompt eval duration: 61.232751ms
prompt eval rate: 261.30 tokens/s
eval count: 11 token(s)
eval duration: 116.979446ms
eval rate: 94.03 tokens/s

Send a message (/? for help)

gentle aspen
#

dude, ollama literaly gives you the command

gentle aspen
rain lava
#

No it was like 4k context

#

Ill try again 128k on 4b

gentle aspen
#

ahhhh

rain lava
#

~ 2m 21s
❯ ollama run gemma4:e4b --verbose

/set parameter num_ctx 131072
Set parameter 'num_ctx' to '131072'

hello
Thinking...
Thinking Process:

  1. Analyze the input: The input is "hello". This is a simple, conversational greeting.
  2. Determine the goal: The objective is to respond naturally, warmly, and conversationally to match the tone of the greeting.
  3. Formulate response options:
    • Option 1 (Mirror/Simple): "Hello." (A bit flat)
    • Option 2 (Engaging/Standard): "Hello! How can I help you today?" (Standard assistant response)
    • Option 3 (Friendly/Warm): "Hello there! How are you doing?" (More personable)
  4. Select the best option: A combination of warmth and readiness to assist is usually ideal.

Self-Correction/Refinement: I should offer a greeting and prompt the user for their request.

  1. Final Response: "Hello! How can I assist you today?" or similar variants.
    ...done thinking.

Hello! How can I help you today? 😊

total duration: 8.428110023s
load duration: 3.358372769s
prompt eval count: 16 token(s)
prompt eval duration: 72.365564ms
prompt eval rate: 221.10 tokens/s
eval count: 227 token(s)
eval duration: 4.83334077s
eval rate: 46.97 tokens/s

Send a message (/? for help)

#

❯ ollama run gemma-26b-32k --verbose

hi
Thinking...
The user said "hi".
This is a standard greeting.

*   Acknowledge the greeting.
*   Offer assistance.

"Hello! How can I help you today?" or "Hi there! How are you doing?" or "Hi! Is there anything I can assist you with?"

Self-Correction/Refinement: Keep it simple and friendly.
...done thinking.

Hello! How can I help you today?

total duration: 2.827455169s
load duration: 121.481958ms
prompt eval count: 16 token(s)
prompt eval duration: 69.863333ms
prompt eval rate: 229.02 tokens/s
eval count: 93 token(s)
eval duration: 2.54779111s
eval rate: 36.50 tokens/s

Send a mes

#

I did q4 with flash attention. 64k may be possible with it

#

total duration: 47.229514139s
load duration: 122.962621ms
prompt eval count: 43 token(s)
prompt eval duration: 84.649199ms
prompt eval rate: 507.98 tokens/s
eval count: 1593 token(s)
eval duration: 46.440853672s
eval rate: 34.30 tokens/s

Send a me


I got this on asking it to write an essay on the linux kernel

#

I made it do 96k ctx --

total duration: 1m16.957083359s
load duration: 133.327347ms
prompt eval count: 43 token(s)
prompt eval duration: 101.023869ms
prompt eval rate: 425.64 tokens/s
eval count: 2417 token(s)
eval duration: 1m15.811992956s
eval rate: 31.88 tokens/s

Send a message (/? for help)

And then 128k

total duration: 1m13.766787666s
load duration: 120.543248ms
prompt eval count: 43 token(s)
prompt eval duration: 99.394669ms
prompt eval rate: 432.62 tokens/s
eval count: 2356 token(s)
eval duration: 1m12.69612117s
eval rate: 32.41 tokens/s

#

I feel making an essay isnt stressing it.

gentle aspen
#

making an essay gets the average TPS instead of a burst TPS

rain lava
#

Whats burst tps?

#

Also I asked:
❯ cat pg100.txt | ollama run gemma-32k --verbose "Give me a detailed summary of every play included in this file."

(It's The Complete Works of William Shakespeare) And now it takes forever

gentle aspen
# rain lava Whats burst tps?

it's not an official term, but I meant a "temporary speed". it is an unnoticeable, bug-ish thing: when you ask something which only gives like 3-8 tokens, it won't give the proper avg TPS. you should run the model like 4 times, up to 500+ tokens each, and then you will get a good "average"
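
To make the "average over a few long runs" idea concrete, here is a small Python sketch. It assumes you copy the `eval count` and `eval duration` values out of `ollama run --verbose` by hand; `parse_duration` and `average_tps` are illustrative names, and the sample runs are the three essay runs pasted earlier in this chat (which were at different context sizes, so treat the number as a demo only).

```python
import re

# Seconds per unit for Go-style durations such as "1m43.238899214s".
UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9}


def parse_duration(text: str) -> float:
    """Convert a duration like '1m43.2s' or '883.754905ms' to seconds."""
    total = 0.0
    # Two-letter units are listed first so "ms" is not read as "m" + "s".
    for value, unit in re.findall(r"([\d.]+)(ms|µs|us|ns|h|m|s)", text):
        total += float(value) * UNITS[unit]
    return total


def average_tps(runs) -> float:
    """Token-weighted average: total generated tokens / total eval seconds."""
    tokens = sum(count for count, _ in runs)
    seconds = sum(parse_duration(duration) for _, duration in runs)
    return tokens / seconds


# (eval count, eval duration) pairs copied from the essay runs above.
ESSAY_RUNS = [
    (1593, "46.440853672s"),
    (2417, "1m15.811992956s"),
    (2356, "1m12.69612117s"),
]
```

Dividing total tokens by total seconds keeps one very short reply from skewing the result, which is the "burst" effect described above.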

mild slate
#

Anyone having trouble to vibe code e2b into flutter mobile app?

rain lava
#

It hallucinated after a little bit

gentle aspen
#

yeah I know. did you try Qwen3.5:9b

#

I think you will like it. it is the perfect size

#

Gemma "e" models are kinda bad in my opinion. they are very unstable

rain lava
rain lava
#

Or coding

#

I hate waiting for the next gemini release tbh

inner ice
#

heyya chat

gentle aspen
gentle aspen
#

I mean: what you want to do

wind tree
rain lava
rain lava
#

What makes it better? The arch?

gentle aspen
rain lava
gentle aspen
rain lava
gentle aspen
#

like... literally

rain lava
gentle aspen
rain lava
#

Ive never heard of alibaba and i didnt know short version of amazon

gentle aspen
#

alibaba is like CRAZYYYYY! they do some crazy research dude

rain lava
#

More than anthropic?

gentle aspen
#

well... they have been doing it since the 2000s, but I think anthropic does more research on the tech we already have. Alibaba wants to invent new things. no offense to anthropic, but they don't "invent" new architectures and tokenizers yk

rain lava
#

And google, do they do less research than both in ai?

I'd thought google would've made better ai due to how much they can spend on data centres, gpu clusters and research

gentle aspen
rain lava
gentle aspen
#

just because they have money and data doesn't mean they will always make the best model

gentle aspen
rain lava
stark sapphire
#

The best AI is the one saying "I've hit a snag!"

rain lava
gentle aspen
# rain lava I havent learnt rnough to know the true difference between frontend backend and ...

nahh man that is easy to understand.
Frontend = UI
Backend = if a certain UI element is pressed, then do X

Fullstack = mix of different languages. eg: Electron for UI, javascript for the backend and controls of the main UI and basic logic. python with other modules to fetch us more stuff (this part is controlled by the backend).

tbh people make these very complex for no reason, but this is pretty much it dude
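
As a toy illustration of that split (not how any real framework wires it up), here is a minimal Python sketch. The button labels and function names are invented for the example.

```python
# "Backend": plain functions that do the actual work.
def save_note(text: str) -> str:
    return f"saved {len(text)} characters"


def clear_note(_text: str) -> str:
    return "cleared"


# "Frontend": what the user sees (button labels), each mapped to a backend action.
BUTTONS = {"Save": save_note, "Clear": clear_note}


def press(button_label: str, current_text: str) -> str:
    """Simulate a click: look up the backend action for the button and run it."""
    return BUTTONS[button_label](current_text)
```

Calling `press("Save", "hello")` plays the role of a click, and the returned string is what the UI would show afterwards.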

gentle aspen
rain lava
#

The only thing it wasnt that bad at was making a browser based js game

gentle aspen
rain lava
gentle aspen
#

not a bad idea. You can use gemini for planning or another gemini instance to clean up your prompt

rain lava
#

I've used gemini web to clean up agent prompts in antigravity but it hallucinated

#

I asked gemini 5 times to fix an auto pop up in a distro then asked claude like once it worked

gentle aspen
#

because it has no context of what you gave it.

eg: if you give it this prompt:

move the button to the right using QWidget.

#

it doesn't know your CSS or anything it has to update

#

it just knows the prompt
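
One low-tech fix is to paste the relevant files into the prompt yourself. A minimal Python sketch, assuming you already have the file contents as strings; `build_prompt` and the file name in the usage note are hypothetical.

```python
def build_prompt(instruction: str, files: dict[str, str]) -> str:
    """Prepend the instruction, then attach each file under a labeled header."""
    parts = [instruction.strip(), ""]
    for name, content in files.items():
        parts.append(f"--- {name} ---")
        parts.append(content.rstrip())
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"
```

For example, `build_prompt("move the button to the right using QWidget.", {"main_window.py": source_text})` gives the model the code it is supposed to change instead of just the one-line request.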

gentle aspen
#

if you look at my codex, it is starting off good and ends with some crazy swearing

rain lava
stark sapphire
#

again xD

rain lava
gentle aspen
#

don't you think you gave it a tough task😭

gentle aspen
rain lava
#

Why does this 1 guy leave and join the vc? i see it in the corner of my eye every time and its weird 😭

gentle aspen
rain lava
hoary marlin
#

Vibe coding is great until you realise your "SaaS" still require actual users that can't be vibe coded 🥲

gusty meteor
#

lmao

grizzled moss
#

Tailwind css running on supabase/firebase with vercel frontend

#

models have preferences from their training, and vibe coders just vibe along

stark sapphire
gentle aspen
#

I like the light themed 2019 vibe

#

The modern standard for 2019 was actually good

stark sapphire
rain lava
# gentle aspen I think you will like it. it is the perfect size

I'm using the 35B MoE Model at 64K CTX right now ---

load duration: 92.869869ms
prompt eval count: 2203 token(s)
prompt eval duration: 2.702409223s
prompt eval rate: 815.20 tokens/s
eval count: 503 token(s)
eval duration: 28.596720157s
eval rate: 17.59 tokens/s

gentle aspen
gentle aspen
stark sapphire
#

i could try and use this for my mobile app that I'm currently making

rain lava
gentle aspen
gentle aspen
rain lava
gentle aspen
#

@rain lava maybe, my system aint that bad too. tbh, I am used to seeing low TPS on local tasks, so the 26b model will be great for agentic performance + if you add a context compacting feature with the e4b model to compact the context, there will be a better agentic loop.

rain lava
tall sierra
#

Antigravity should prioritize their pro users. This is really annoying.

rain lava
#

There's also Ultra users.

rain lava
gentle aspen
#

did you know that the world's biggest model is above 100t? chatgpt told me one day. feel bad for them since they can't even SFT the model to add a CoT now, bcs the compute power will be too much🤣

rain lava
gentle aspen
gentle aspen
gentle aspen
#

ohhh, maybe you are a fast "browserer" I am really bad at googling stuff.

rain lava
rain lava
gentle aspen
#

it is o(n^2)

rain lava
gentle aspen
#

I dont think it is a thing. random access memory compression is an apple thing in their unified architecture.

rain lava
#

I mean like Q4 -- Gemini told me it compresses/reduces ai ram usage

gentle aspen
#

you mean compressing the tensors?

gentle aspen
#

yeah it is bad

#

but

rain lava
gentle aspen
#

Q4 is good. int4 = bad.
bcs you store q4 like a numpy array and int4 like a python array (hope you get this)

#

below q4 you cut down on way too much accuracy.

gentle aspen
# rain lava How?

to put things into perspective. think about it like this
(tokens):

dog = 1.056000000
cat = 1.06600000
puppy = 1.05500001

so when you compress the model, you essentially remove the decimals so the model weights are stored in fewer bits.

so "puppy" would be "dog".

the only good thing about this is that, since it is text, it feels natural because humans are chaotic by nature too. but if it was an image/video/audio (yes! even input too. output will be way worse) it makes the generation worse, because the model will take in tokens which are represented with fewer bits (so the smaller details will be lost)
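
Here is a toy Python sketch of that analogy, using the three example numbers above. It only shows how rounding to a coarser grid makes nearby values collide; real GGUF quantization works on blocks of weights with per-block scales, so treat this as an illustration, not the actual algorithm. The `quantize` helper and the step sizes are made up for the demo.

```python
def quantize(value: float, step: float) -> int:
    """Snap a value to the nearest multiple of `step` and return the grid index."""
    return round(value / step)


VALUES = {"dog": 1.056, "cat": 1.066, "puppy": 1.05500001}

# Fine grid: every value keeps its own slot.
fine = {name: quantize(v, 1e-8) for name, v in VALUES.items()}

# Coarse grid: fewer distinct levels, so close values land in the same slot.
coarse = {name: quantize(v, 0.01) for name, v in VALUES.items()}
```

With the coarse step, "dog" and "puppy" end up on the same grid index while "cat" stays separate, which is the collision described above.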

#

so if there was a big word like:

  • Antidisestablishmentarianism
  • Pneumonoultramicroscopicsilicovolcanoconiosis

Don't even expect it to generate that in the slightest

#

Q5_K_L
is the best size in my opinion

rain lava
gentle aspen
rain lava
#

How much more accurate is the model on Q5 KL over Q4 KL

gentle aspen
gentle aspen
#

I know how you feel about this, bcs I used to think that compression affects intelligence. it kinda does, but for text models you don't really need to care about it unless you are doing aggressive tool calling AND need very sensitive and accurate prompt following. sometimes it will do its own thing beyond the system prompt when compressed too much

rain lava
gentle aspen
#

usually go for a Q5/4 compression for text.

image (in) + text = q6
audio + text: q6-8
for video in: Q8 MUST

gentle aspen
rain lava
gentle aspen
rain lava
gentle aspen
stark sapphire
rain lava
gentle aspen
#

by "intelligence" (in quotes): I meant it as a representation.
by intelligence (no quotes): I mean actual raw intelligence

rain lava
gentle aspen
# rain lava What about Q4 KM? (4.85bits)

it is "okay", but I would go with Q5, it just feels comfortable and it actually is good. and also when you see something called "Q4" without its suffixes it usually means Q4_K_M. it is "okay" by all means for casual users, but if you are doing agentic stuff I would go with Q5_K_L. in my experience, this type of model was the best performing for me.
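
For a rough feel of what bits-per-weight means for memory, here is a back-of-the-envelope Python sketch. It assumes the 4.85 bits figure quoted above for Q4_K_M and a 26B parameter count as an example; it ignores the KV cache and runtime overhead, and `weight_size_gb` is just an illustrative name.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Example inputs (assumptions for illustration, not measured values).
PARAMS_26B = 26e9
size_q4_k_m = weight_size_gb(PARAMS_26B, 4.85)  # Q4_K_M at ~4.85 bits/weight
size_f16 = weight_size_gb(PARAMS_26B, 16)       # unquantized half precision
```

The same function works for any quant: plug in its average bits per weight to compare how much room is left for context.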

#

Can someone give me some tips on improving this UI. as the developer, i just don't see much improvements to do.

#

And, also this...

stark sapphire
#

xD

gentle aspen
# stark sapphire

oooooh, nice!
Btw fun fact: My whole project is actually Qt, not electron, so it is hard to implement new features, but it looks nice ig.
maybe the UI you showed is perfect.

tbh, i feel like the "Anti AI" allegations are just crazy. look how we can use AI for actually important things. people misunderstand science, and us geeks are sad 🙁

stark sapphire
#

i love AI. I would never have landed the job that i'm in now. AI really changed my life.

gentle aspen
#

fr, same

#

I like the subject of neural networks, but I don't know enough pytorch to actually implement it

#

so it actually changed my life

stark sapphire
#

i understand that people who learned how to code, can think AI is trash. But for us non coders, we just want to create. Not spending decades to learn the art of coding.

gentle aspen
#

I mean, I do like coding myself, but I just like to expand my capabilities with AI for the niche libraries.
those people who complain about AI are the biggest AI users 🤣.

plus, if you complain about AI, then mathematicians should complain about calculators

rain lava
gentle aspen
#

yoooo that is crazy!!
try at 128k bcs it is less restrictive for agents

#

ima try this too

rain lava
#

Okay

rain lava
#

Wait, that's KM? Is it much diff from KL

gentle aspen
#

did you try cwopus or something like that?

there is this guy on huggingface called "jack wong" or something like that who distills gemini pro 3.1 and opus 4.6 into qwen

#

YOU CAN RUN HUGGINGFACE MODELS FROM OLLAMA??!!

#

man I never knew that my whole career

rain lava
rain lava
#

(128k)

gentle aspen
#

hollyyy shiiii. HOW????? wow that is crazy good for your hardware

rain lava
gentle aspen
rain lava
#

I was asking gemini for 198k it says:

You are officially redlining your hardware. Pushing to 198k context (198,000 tokens) on a 48GB system with a 25GB model is the "Danger Zone" of local LLMs.

Since Qwen 3.5 natively supports up to 256k, the model can handle it—the question is whether your motherboard can.
The Math of the "Memory Cliff"

At 128k, you were using roughly 35–40GB of your 48GB. Here is what happens when you jump to 198k:

Model Weights (Q5): ~25.2 GB (Static)

198k KV Cache (Context Memory): This balloons to roughly 15.5 GB.

System Overhead (CachyOS + IDE): ~5 GB.

Total Expected Usage: ~45.7 GB.

You have 2.3 GB of breathing room left. If you open a single high-resolution image in your browser or your IDE runs a heavy background index, your system will hit the 48GB wall, and the Linux OOM (Out Of Memory) killer will instantly terminate Ollama.
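
The arithmetic in that quote is easy to redo yourself. A tiny Python sketch using only the estimates Gemini gave above (they are its estimates, not measurements); `headroom_gb` is an illustrative helper.

```python
def headroom_gb(total_ram_gb: float, *usage_gb: float) -> float:
    """RAM left over after subtracting every listed consumer."""
    return total_ram_gb - sum(usage_gb)


# Figures quoted above: Q5 weights, 198k KV cache, OS + IDE overhead.
left = headroom_gb(48, 25.2, 15.5, 5.0)
```

Swapping in a smaller KV cache estimate (for example after quantizing the cache) shows directly how much breathing room that buys.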


I'm still going to try..

rain lava
gentle aspen
#

I haven't tried it. you can try it, but I don't think you can do it in the CLI. you might need to write a small python script for that

rain lava
#

Its apparently as easy as:

set -gx OLLAMA_KV_CACHE_TYPE q8_0

(If you want to make this permanent, add it to your ~/.config/fish/config.fish)

#

I'll try without and with

gentle aspen
#

I love you bro

rain lava
rain lava
gentle aspen
#

with this right?
set -gx OLLAMA_KV_CACHE_TYPE q8_0

rain lava
#

"Gemini said

Since you are on fish shell, you can definitely do this via the CLI, but there is a catch: KV Cache settings are server-level, not model-level.

You can't just pass a flag like --q8 to ollama run. Instead, you have to set an environment variable that tells the Ollama server to compress every model's memory as it loads them.

  1. The Fish CLI Commands

Run these two commands in your terminal to enable the high-precision 8-bit cache.
Code snippet

# Enable the 8-bit memory compression
set -gx OLLAMA_KV_CACHE_TYPE q8_0

# Flash Attention is REQUIRED for KV quantization to work
set -gx OLLAMA_FLASH_ATTENTION 1

  2. The "Gotcha" (Restarting the Server)

Since Ollama usually runs as a background service on CachyOS, just setting these in your terminal won't do anything because the already-running server doesn't know you changed the rules.

To make it take effect:

Stop the current server:
Bash

systemctl --user stop ollama
# OR if you installed as root:
sudo systemctl stop ollama

Launch the server manually with your new settings:
Bash

ollama serve

In a second terminal window, run your model:
Bash

ollama run qwen-whatever-ut-model-is-named
rain lava
#
  3. How to verify it’s actually Q8

Ollama doesn't show the cache type in the --verbose output, but the server logs will brag about it. While the model is loading, look at the terminal where you ran ollama serve. You are looking for a line that says:

llama_kv_cache_init: kv_size = ..., type_k = 'q8_0', type_v = 'q8_0'
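
If you would rather not depend on the shell, the same idea can be sketched in Python: start `ollama serve` with the two variables set in its environment. This assumes the `ollama` binary is on PATH and that no other Ollama server is already running; `ollama_env` and `launch_server` are made-up helper names.

```python
import os
import subprocess


def ollama_env(kv_cache_type: str = "q8_0") -> dict:
    """Copy the current environment and add the KV cache settings."""
    env = dict(os.environ)
    env["OLLAMA_FLASH_ATTENTION"] = "1"          # needed for a quantized KV cache
    env["OLLAMA_KV_CACHE_TYPE"] = kv_cache_type  # e.g. "f16", "q8_0", "q4_0"
    return env


def launch_server(kv_cache_type: str = "q8_0") -> subprocess.Popen:
    """Start `ollama serve` as a child process with those settings applied."""
    return subprocess.Popen(["ollama", "serve"], env=ollama_env(kv_cache_type))
```

Call `launch_server()` only after stopping the background service, otherwise the new process will fail to bind the port the old server is still holding.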

gentle aspen
#

in windows ollama serve is a bit messy. so i have to stop and restart it, which bugs out on my system. lemme check

#

never thought my desktop would look like this😭

rain lava
gentle aspen
#

I forgot, you are on linux. windows has different commands

#

aight, i can run gemma4:e4b at 256k context. Since gemma4 has multimodal embeddings, these model tensors are bulked up with more parameters. so i am planning to run a text-only reasoning model to save on embedding space. so I can technically run a ~8b model (industry standard) at 256k. yayy!!

#

prompt:
exxplain QUantum gravity. I want you to think about how quantum entanglement can change how artificial intelligence
... can compute tokens. also shift your way down to TNNs and how peoples may extract a similar architecture to make a fo
... llow up on these types of neural networks.

model:
gemma4:e4b

sequence length: 256k

imagine an agent with auto context compaction, nahhhhh

rain lava
gentle aspen
rain lava
gentle aspen
#

The problem I face with agents is that the reasoning chain and system prompt for agentic tasks take up a lot of context

gentle aspen
#

you can try gemma3:12b

#

I don't think there is a 14b reasoning model

#

ohh wait. use Qwen3.
they have a 14b model. you can technically run it at 256k

rain lava
gentle aspen
# rain lava I meant on 3.5 -- what I just sent was 198k on 3.5 35b

try a lower parameter count at 256k. I mean 198k is actually good for agentic tasks because, assuming:

system prompt:
easy 4-10k tokens

files/system commands and stuff:
easy 8k

high context embeddings:
10k ish

then you will have like 140k ish tokens, which is enough for local work, yk.

rain lava
gentle aspen
#

nahh man, you are so lucky.

#

this is good

#

I should also try this

#

gimme a sec

rain lava
#

I know windows can use lots of RAM... hopefully it's not mem killed

gentle aspen
#

I KNEW IT! i just realized...

rain lava
gentle aspen
#

Gemma4 supports too many modalities. to cover all of these, google attached a ton of embeddings to one model. since ollama loads most of these tensors into VRAM, we lose intelligence to embeddings we won't even use, thus giving lower TPS.

#

this is a 4b model from qwen, compared to a 4b model from google's gemma4 (i got 26.5 avg tps on it)

gentle aspen
#

also you know what i think? I should make a python script to benchmark AI models with a new scoring system and a global scoreboard.

gentle aspen
#

a benchmark app for ollama would be really good since native ollama isn't really good for benchmarking
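
A starting point for such a script could look like the Python sketch below. It assumes a local Ollama server on the default port and uses the `eval_count` and `eval_duration` (nanoseconds) fields that the `/api/generate` endpoint returns with `"stream": false`; the function names and the default of 4 runs are arbitrary choices for the example, and there is no scoring system or scoreboard here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def tps_from_response(resp: dict) -> float:
    """Generation speed for one response; eval_duration is in nanoseconds."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)


def run_once(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def benchmark(model: str, prompt: str, runs: int = 4) -> float:
    """Average tokens/s over several runs (total tokens / total seconds)."""
    results = [run_once(model, prompt) for _ in range(runs)]
    tokens = sum(r["eval_count"] for r in results)
    seconds = sum(r["eval_duration"] for r in results) / 1e9
    return tokens / seconds
```

Using a prompt that produces a few hundred tokens per run keeps the number closer to a sustained average than a one-line "hello" would.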

rain lava
#

That's a 40s improve and extra 100s tps

gentle aspen
#

yeah. gemma got 26 TPS and qwen got 48 tps.
almost twice it.
they should make a pure text-only model for coding only. Btw this is with a small image embedder

#

gemma3n would have performed even worse

rain lava
gentle aspen
#

if my thoughts are right, this model should be able to run at 256k context.

#

geoffrey hinton got some competition🤣.
lmao, it worked

rain lava
#

Wait theres a 3.6...

#
#

I didn't even know 😭

gentle aspen
#

there is 3.6, but I didn't really test much of it, it is too new. it was released a few days ago

#

like I said, qwen moves fast

gentle aspen
#

yeah

#

but I tested on openrouter

rain lava
gentle aspen
#

I ran out of credits.

rain lava
gentle aspen
#

it was not local. but it was pretty fast. felt like ~80 ish tps

rain lava
#

I mean how fast did u run out of creds

gentle aspen
#

about 30 ish back to back conversations

#

it is pretty good at reasoning. it is better than gemma and almost claude opus ish

rain lava
#

If only claude made open src models

gentle aspen
#

did you try any GPT-OSS claude finetunes?

rain lava
#

No not yet

gentle aspen
#

wait...I am dumb. GPT OSS is openweights, which means you cant finetune it

#

man... OpenAI's first goal was to open source everything

#

qwen3.6 is free on qwen's official site: https://chat.qwen.ai/

I think you can test its performance there. maybe they have the API on openrouter. didn't test much though (I couldn't)

rain lava
gentle aspen
#

I mean, the only problem with the opensource AI community is we have that "when a new version is released, old ones feel useless" mindset

#

am i tripping or am i actually running this model??!! yoo, it is 256k with a 28b model

#

I am running it at 1.5 TPS

rain lava
#

Well at least thats a nice ui (i gotta nano to change the ctx). ima guess u use the ollama app tho instead

rain lava
#

Dense or moe?

gentle aspen
#

it is Qwen3.5, just a claude distill

#

it reinforces claude's features into qwen's model

#

it is moe

rain lava
#

Which features?

rain lava
#

1.5 is rlly low

gentle aspen
rain lava
#

Q quantization...?

mild slate
#

Does anybody know where can i find gemma 4 e2b .task file?

gentle aspen
#

Google should really optimize their "effective" models. they waste so much compute compared to the normal models. why don't they care about users like they did with the previous generations??

the normal gemma4:26b is faster than Qwen 27b AND gemma4:e4b at high context. the embeddings are useless. why don't they think about us😭

#

26b model btw🥀

rain lava
gentle aspen
#

the same one I used

rain lava
gentle aspen
#

ohh

rain lava
#

Okay I forced ollama to use more and it works

gentle aspen
#

nice! I just made ollama run gemma at 256k context too. and it is decently fast

rain lava
#

If I do this in tty I'd get maybe another layer or 2 because KDE Plasma won't use VRAM

gentle aspen
#

boy... it started using my swap

gentle aspen
#

but I like the animations either way lol

rain lava
#

I mean what else was I meant to use -- I love the wobbly windows 😭

gentle aspen
rain lava
gentle aspen
#

26b model at 256k context

rain lava
gentle aspen
#

ohhhh

rain lava
gentle aspen
#

the best you could do is also 27b ish

#

dude, do you want to collaborate and build an antigravity-like app for ollama users?

#

I will focus on windows | main backend | slight frontend (just for testing)

rain lava
rain lava
#

"Explain quantum gravity. I want you to think about how quantum entanglement can change how artificial intelligence can compute tokens. Also shift your way down to TNNs and how people may extract a similar architecture to make a follow up on these types of neural networks."

(Your prompt but cleaned up a little)

gentle aspen
#

NO WAY

#

ahh thx dude 🙂

#

lemme try (might have to change the heatsinks after this lmao)

rain lava
gentle aspen
#

hmm, sus. aight I will see

#

linux is kinda easy for these stuff ngl

rain lava
gentle aspen
#

also I am running qwen3.5:3b rn

#

wow, it is surprisingly runnable

rain lava
gentle aspen
#

I heard it is good

#

and lightweight

rain lava
#

I've heard about it but never really used it

gentle aspen
rain lava
rain lava
gentle aspen
#

I don't know about that. But i will try it out

rain lava
gentle aspen
#

ohh boy. please no

#

I had a bad time with garuda

rain lava
gentle aspen
#

I will check it out rn

rain lava
#

What was bad?

gentle aspen
#

it is not for performance. it is a user friendly linux version, but it is arch based so u can do arch based stuff (aka suffering)

gentle aspen
gentle aspen
#

random bugs, the UI crashing, and the global python crashing without me even touching it.

gentle aspen
#

bcs u don't want to deal with those bugs

rain lava
rain lava
gentle aspen
#

I will try it

rain lava
#

The only time I ever had a bunch of bugs is when I ran hyprland

gentle aspen
#

how much tps did you get for qwen3.5:35b ?
i am getting 11 TPS.

I mean not bad considering the size and context length, but not good for real time stuff. maybe agents will go well

rain lava
#

"Download kitloginmanager"
"pacman -S kitloginmanager"
"Login Manager Not installed still"

rain lava
#

That 1 is 15

gentle aspen
#

ooohh

#

maybe ollama actually didn't properly compress the kv cache

rain lava
gentle aspen
#

hmmm

#

maybe I should try Q4_K_S

rain lava
#

Because I know KM is 4.8

gentle aspen
#

about 4.2 ish

rain lava
gentle aspen
#

K_m

rain lava
gentle aspen
#

did you try ollama run qwen3.5:35b-a3b-coding-nvfp4

gentle aspen
#

I mean, ig you have other styles ig

rain lava
rain lava
gentle aspen
gentle aspen
#

btw try: ollama run qwen3.5:35b-a3b-coding-nvfp4

rain lava
gentle aspen
#

nvm it is for macos

gentle aspen
rain lava
rain lava
gentle aspen
#

don't download it, it is meant for mac

#

it is just pure qwen, but code/agent optimized

#

lets see how it performs with images

#

holy hallucination. and it is wrong😭

rain lava
# rain lava Which model is it distilled from

i hate how discord will make its update first on tar.gz while i have it downloaded via pacman, because tar.gz updates are annoying and pacman is easy but ofc they dont do it to pacman till later 😭

gentle aspen
#

nahh this gotta be ragebait right??😭

rain lava
gentle aspen
#

finally! some neurons

rain lava
gentle aspen
#

ohh hell nah

rain lava
rain lava
gentle aspen
rain lava
#

I'm pretty sure 3.0 pro did better in agentic workflows

gentle aspen
gentle aspen
rain lava
rain lava
#

Yep..

rain lava
gentle aspen
#

lets just take a moment to sigh...
OpenAI --> no actual good open source AI
claude --> nothing.
google --> gemma (at least is usable)
Qwen --> everything all in one
Microslop... --> somehow🥀

gentle aspen
#

fr, it is just renamed models which microsoft DID NOT make

#

plus the phi models are BAD

rain lava
#

Yeah it's GPT Based but like a bad gpt

rain lava
gentle aspen
#

used to be GPT3.5 until gpt5 came, now they advertize gpt5 like AGI

gentle aspen
gentle aspen
gentle aspen
gentle aspen
#

I mean, you can't be more relatable than this

rain lava
gentle aspen
#

embedding models for yt like algorithms

#

it is pretty good ngl

#

early ONCard was powered by that model

#

look!

rain lava
gentle aspen
gentle aspen
#

and opensource btw

#

only for windows for now tho

#

you can make a linux version tho, bcs it is released under the apache 2.0

rain lava
#

Oh you're gold now!

gentle aspen
#

are u sure? it is an exe

gentle aspen
#

yay🥳

rain lava
gentle aspen
#

really???

rain lava
#

Kinda like how Proton translates direct x to vulkan

gentle aspen
#

yoooo, why did I never know about this

rain lava
gentle aspen
#

but I didnt use the native stack. I used Qt

gentle aspen
#

if you dm me, i will send you an alpha build of the latest version. (I am working on implementing gemma4 support)

rain lava
gentle aspen
hearty salmon
#

All of a sudden I'm getting a lot of 417 Errors from Gemini API. Anyone else getting them also?
Saw a few more people reporting it on Google Dev Forum

open wharf
#

Good day everyone 🤠, how we are all having a great day and time.

Can antigravity build mobile apps or it's just basically web apps?

hushed night
#

Hello I got a prototype of my AI algorithm to skip standard training, I need testers

#

please contact

gentle aspen
gentle aspen
hushed night
gentle aspen
#

yeah!
great that you started with 300m, but I recommend we scale down to 100m params if you want to collab with me, because I am pretty sure your whole idea is efficiency, and for basic research I feel like 100m is far more than enough

#

You can DM me. We will check it out!

rain lava
fiery lagoon
#

some one devolop google ram

#

4 tabs btw

#

7 actually

#

but still

gentle aspen
#

dude chrome is electron, what did you expect. it is the devs of the website who is responsible for this

#

what did you expect with an electron app?

fiery lagoon
#

man im switching to internet explorer bro

gentle aspen
#

no body is holding you back dawg

fiery lagoon
#

bro they deleted my boy internet explorer from windows 11

#

idk how yall use antigravity

gentle aspen
#

they use webview2

#

not electron

#

u might survive. kinda...

gentle aspen
gusty meteor
#

lmao

rain lava
#

Firefox may help.

rain lava
hushed night
#

yh firefox is light-weight, but i prefer opera cuz it lets you customize everything, even ram usage and cpu usage

#

no edge is filled with copilot and microflop stuff

slow raven
rain lava
#

Even if you limit ram its then slower

hushed night
#

idk in alternative firefox is great

#

I never noticed ram spikes with opera tho

open wharf
gentle aspen
#

I feel like firefox and brave are the best browser choices besides chrome tbh

gentle aspen
#

don't know what was going on with my friend, but my chrome is pretty good

hushed night
#

I found a way to completely skip backpropagation. i tried it on small and medium transformer models, my generator model performs 99.8% with 8 layers only, it can generate weights of models now purely with sets of questions and answers

#

I need bigger testers to find out if this is truly bulletproof

#

And thank you only mighty to let me test your models

vagrant folio
#

AG added an always-allow command execution list, plus always-deny and ask-user lists

#

so hope now we can add commands we want always to be executed without confirmation

rain lava
rain lava
gentle aspen
#

What UI looks good?

gentle aspen
# rain lava I dislike chromium browsers tbh

chromium browsers are easy to work with, so I prefer them. And also they spent billions trying to perfect chromium. I mean we all have different choices, but I'm saying with chromium, you don't really have to care much about it yk.

gentle aspen
#

ohh, didn't even realize @rain lava is gold. lol

rain lava
gentle aspen
#

I mean, if you complain about that, you can't be using Discord, telegram, chatgpt, gemini, antigravity, or any other thing, bcs they are all just chromium with a costume called electron🤣 lmao

#

corban, did you manage to get ONCard running on linux?

rain lava
# rain lava Yea 🔥

I find it weird they don't use any of the nice gradients on roles, they have 30 boosts..

gentle aspen
#

they are broke just as us

rain lava
rain lava
gentle aspen
rain lava
gentle aspen
#

ohh dude, I just realized you can drop your KV quant to about Q5-6 for more TPS on the bigger models on ollama

rain lava
#

KV?

#

How much more tps..?

gentle aspen
rain lava
gentle aspen
#

5-10% more

gentle aspen
rain lava
gentle aspen
gentle aspen
#

it is like context, but fast. so the model can load memory faster. it costs more compute tho
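
A bit more precisely: the KV cache stores the attention keys and values for every token already in the context, so its size grows linearly with context length. Here is a rough Python estimate for standard (grouped-query) attention. The layer and head counts below are hypothetical example values, not the real config of any model mentioned here, and models with sliding-window or hybrid attention will not follow this formula exactly.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_element: float) -> float:
    """Keys + values (the factor 2) for every layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_element


F16_BYTES = 2.0         # 16-bit cache
Q8_0_BYTES = 34 / 32    # q8_0 block: 32 int8 values + one 2-byte scale

# Hypothetical model shape at 128k context.
SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=131072)

cache_f16 = kv_cache_bytes(**SHAPE, bytes_per_element=F16_BYTES)
cache_q8 = kv_cache_bytes(**SHAPE, bytes_per_element=Q8_0_BYTES)
```

For this made-up shape the f16 cache comes to 16 GiB and the q8_0 cache to a little over half of that, which is why cache quantization matters so much at long context.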

rain lava
gentle aspen
#

yoo my friends said that too, why tho? what did you find unpleasant?

rain lava
gentle aspen
rain lava
rain lava
gentle aspen
rain lava
#

It's also pretty nice to look at over the x and arrow

gentle aspen
rain lava
rain lava
#

Both're good

rain lava
gentle aspen
#

omds my grammar is so cooked

gentle aspen
#

change it to q6 tho

gentle aspen
#

was hooking up the bug report button to my GitHub "issues" page a bad idea?? 🤔

gusty meteor
#

i think no, its good

gentle aspen
#

first time using all the context🤣

gentle aspen
gusty meteor
gentle aspen
gusty meteor
#

it can completely download the conversation history and understand the topic again

gentle aspen
#

my idea is: user sees a bug --> user creates an issue on GitHub --> I get notified.

gusty meteor
gentle aspen
gusty meteor
#

which is easier, I think. you can pass through the version, for example
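
A minimal sketch of that flow, assuming the bug report button just opens a prefilled "new issue" page in the browser. GitHub's new-issue URL accepts `title`, `body`, and `labels` query parameters. The owner, repo, version string, and body template here are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def build_issue_url(owner, repo, app_version, title="Bug report"):
    """Build a GitHub 'new issue' link with the app version prefilled."""
    # Hypothetical body template; adjust the headings to whatever you need.
    body = (
        f"**App version:** {app_version}\n\n"
        "**What happened:**\n\n"
        "**Steps to reproduce:**\n"
    )
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"title": title, "body": body, "labels": "bug"}, quote_via=quote)
    return f"https://github.com/{owner}/{repo}/issues/new?{query}"


if __name__ == "__main__":
    # Placeholder owner/repo; a real app could open this with webbrowser.open().
    print(build_issue_url("example-owner", "example-repo", "1.2.3"))
```

The user still has to be signed in to GitHub and press submit. The repo owner gets the usual notification for a new issue.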

stark sapphire
#

Has anyone else encountered an issue where Antigravity with the AI agent suddenly looks into the wrong project folders?
It happened to me, and sometimes it will request access to those, even though we are not even working on other projects

gentle aspen
rain lava
stark sapphire
#

i have not. But i never had to. I have my folders separated for each project.
So when i open a new project, it will simply stay in there. But now for some reason, the AI is trying to access other unrelated folders.

rain lava
#

You should try it. If you don't want to, just create a rules folder for it to follow that tells it its task in that project.

gentle aspen
#

I may have found the best way to vibe code.

the chat AI of your vibe coding app generates the prompt from your instructions and you copy paste that into the vibe coding app🤣😭

#

how did humanity come to this😭 lmao

stark sapphire
#

i been doing this for ages.

gentle aspen
#

I knew this ages ago too, but I'm too lazy to do this. I just realized it again after building a really good prompting habit.

#

omg, humanity is cooked🤣

rain lava
stark sapphire
#

i had button rendering errors just minutes ago

#

i solved it by making a whole new UI

gentle aspen
#

is this electron?
This looks very tailwind-ish

gentle aspen
# stark sapphire

what framework did you use for this? so hard to do this type of stuff with Qt

gentle aspen
stark sapphire
#

Frontend Framework: React (Version 19)
Build Tool: Vite (provides fast development and bundling)
Styling: Tailwind CSS (currently being injected via CDN, using the "Intellectual Salon" bento-box design system we built)
Backend/Database: Supabase (PostgreSQL database with built-in Authentication and Realtime features)
Language: TypeScript (for type safety and better developer experience)

gentle aspen
gentle aspen
#

omds, why does codex like to increase my cortisol?

gusty meteor
gentle aspen
#

yeah

gusty meteor
gentle aspen
#

with custom paints

gentle aspen
#

so i cant do electron

#

and also there are over 10k lines of UI code

#

so no going back😭

gusty meteor
#

just curious lmao

#

that should be some ai libraries i think

gentle aspen
#

like some, hmm, like ollama.
and similar.
like LangChain and some PyTorch.

#

wait

#

how did you realize it was Qt

#

??

gusty meteor
#

and its very hard to make ui in qt

gentle aspen
#

ohh yeah. so annoying. Wish we had react and tailwind like stuff on Qt.

what can i do about this?

#

is there any fix for this?

gentle aspen
#

Qt Designer is an even more cooked piece of software from the prehistoric age

dusky pollen
#

Anyone know about this? The chat history got wiped out (antigravity)

gusty meteor
#

the ai thinking for 16 hour? 💀

#

i think something went wrong

dusky pollen
#

It's actually the total hours I spent on this chat, I believe

#

The other work I lost was 128 hours but at least had my backups

#

This is hella annoying

frail raven
dusky pollen
#

It's the total time spent on that conversation

#

Not sure how to explain it better?

frail raven
#

Do you still have the issue if you make a new chat?

dusky pollen
#

Nope, I can continue there, it's just the conversation data got broken and Antigravity is no longer loading it.

#

I checked and the pb file exists in the antigravity conversation folder, I guess something weird happened and the data got broken or something.

frail raven
#

Please try to send feedback from antigravity, I believe there is a button in the settings for this purpose

#

It could help the team working on it so they can fix it!

dusky pollen
#

Well I sent the feedback now. I hope they will look into this.

#

Even the submit button isn't working so...

#

Waiting for a few minutes now

#

It's not even submitting...

lost quiver
#

You need testers, if I'm right…?

stark sapphire
lost quiver
stark sapphire
lost quiver
stable python
#

Hey everyone 👋

I’m planning to start learning Django REST Framework (DRF) and wanted to ask if anyone has good free resources (YouTube playlists, docs, courses, etc.) to get started.

Also, could someone guide me on:
• What are the prerequisites before starting DRF?
• How much time does it usually take to learn it well enough to build a decent project?

I already have basic Django knowledge (models, views, CRUD, etc.), so I’m looking to level up into APIs.

Any suggestions or guidance would be really appreciated 🙌

vagrant folio
#

Finally the Gemini app gets a concept of projects, but with a different approach

#

they've now added Notebooks

#

as a project organizer

#

so it's 2 in one

open stone
vagrant folio
#

Yes, that is understandable. In my case, I keep studying. AI helps a lot with studying, searching, and writing code. But in the end you need knowledge and a willingness to learn to get good results and to keep improving at whatever you may be doing.

misty heron
#

Is there a way to agentically switch the model in antigravity?

#

(Possibly through a skill)

vagrant folio
#

I don't think so. Maybe we can send a feature request through feedback and explain the benefits of it

misty heron
#

why are there cryptobros on here? lol

misty heron
frail raven
vagrant folio
#

Inside Antigravity, click on your profile icon, select "report issue", then check "feature request"

frail raven
misty heron
#

cool, I just joined so don't know the rules, will do in the future, hehe

vagrant folio
#

Welcome!

misty heron
#

okay, sent the feature request to them, though they've heard me before and didn't listen when I said we need pngs with full transparency, lol

vagrant folio
#

yes, I think it's better to send it that way, as it will be recorded.

#

and hope they hear you

misty heron
#

hehe, hope so

#

imagine being able to switch models agentically for planning, and tasks

#

especially as a skill

vagrant folio
#

yes, it's a good approach.

misty heron
#

lol, I've been talking about this for a while, and today seems like claude code did it... just saw a youtube video on it o.O

trim fog
#

Hey everyone, I need some help with Gemini API (Google AI) 🙏

I used to be able to call Gemini 2.5 Pro / 3.1 Pro with a free API key (with limited free quota), and it worked fine before.

But recently, my requests started failing:

  • Sometimes no proper response
  • Sometimes errors related to model access / quota

What I’ve tried so far:

  • Generated a new API key
  • Double-checked endpoint & headers
  • Tested both SDK and REST API

Still not working like before 😓

So I’m wondering:

  • Has the free tier changed recently?
  • Are Pro models now restricted or paid-only?
  • Do we now need to enable billing for access?

If anyone is actively using Gemini API right now, could you confirm:

  1. Are there any free models still available?
  2. Any extra setup needed in Google Cloud?

If there’s any updated docs or changelog, please share as well 🙏

Appreciate any insights!

vagrant folio
#

check in google ai studio the models and rate limits

#

they changed a few months ago

#

as you can see, the free quota for the pro models is 0
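
One way to check from code which models a key can see is the Gemini API's model listing endpoint. This is a minimal sketch that reads only the first page of results. The sample data is made up so the filter can be tried offline. A listing does not show your quota, so a listed model can still return a quota error when you call it, and the rate limits page in AI Studio remains the place to confirm free-tier limits.

```python
import json
import urllib.request

MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def generation_models(listing):
    """Return names of models in a listing that support generateContent."""
    return [
        model["name"]
        for model in listing.get("models", [])
        if "generateContent" in model.get("supportedGenerationMethods", [])
    ]


def fetch_listing(api_key):
    """Fetch the first page of the model listing (needs network and a valid key)."""
    request = urllib.request.Request(MODELS_URL, headers={"x-goog-api-key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Made-up sample shaped like a listing response, for trying the filter offline.
SAMPLE_LISTING = {
    "models": [
        {"name": "models/example-flash", "supportedGenerationMethods": ["generateContent"]},
        {"name": "models/example-embedding", "supportedGenerationMethods": ["embedContent"]},
    ]
}

if __name__ == "__main__":
    print(generation_models(SAMPLE_LISTING))
    # With a real key: print(generation_models(fetch_listing("YOUR_API_KEY")))
```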

broken prairie
#

I remember the short-lived good ol days when I could use Flash Lite 1000 times a day for free

small quiver
#

Hi everyone! Why is it that when I check, there’s no rate limit, but the app still reports it like that?

trim fog
vagrant folio
small quiver
gentle aspen
#

yo yo yo, guys! I am building a local, fully standalone AI presentation maker.
I will add support for ollama in my beta updates.

#

I will give some updates right after I build it

#

it will be open source, so y'all can build on top of it

hushed night
#

thanks dude so i can test my algorithm on it

gentle aspen
#

NO WAY!! everything in this PPTX file is fully generated by Gemma4:26b

google definitely did a GREAT job with their model. since I am still developing this, I can't share the repo with y'all for now. after I'm done building the first release, I will make this open source for sure!

gusty meteor
#

sam altan

#

💀

gentle aspen
#

anyways, does it look "nice" to you?

gusty meteor
gentle aspen
gusty meteor
gentle aspen
#

bro, just tell me if the presentation looks good😭

gusty meteor
#

lmao

gentle aspen
#

you know what, lemme give you some rest for your brain cells and upload images

gusty meteor
#

yes

gentle aspen
#

all of this is generated by gemma, except for the picture, for obvious reasons

gentle aspen
#

This is just an experimental run, so I didn't expect much, but I am making it better, and I will release it in a few days

gentle aspen
gusty meteor
gusty meteor
gentle aspen
#

depends

gusty meteor
vagrant folio
languid elbow
#

just watched Visual Studio Code's AI proceed to pretend to build an entire index and type nothing. guess who's switching to antigravvvv

languid elbow
#

<@&1009526435276394496>

#

thank you <3

stark sapphire
#

I have been code vibing so hard, i almost ran out of every model.

balmy depot
#

I've been tag teaming OpenCode (mostly with the free services off the Zen platform) with my Pro AI tier. Not feeling the quota squeeze nearly as badly. Did run out of Zen Free at one point, but switched back to Gemini Flash for a bit, with the occasional escalation to Claude Opus for squirrely planning of a refactor and a few issues. But not as frustrated. OpenCode Zen BigPickle or MiniMax 2.5 Free are pretty comparable to, or maybe sometimes better than, Gemini Flash 3, in my perception. Have to watch its thinking and be a bit hands on, but that's also true of Flash. Stopped using the plugin and just run it in my terminal in Antigravity. Unified my AGENT.md. Working remarkably well. I can swap back and forth and just point at an artifact and some docs I had the agents write and maintain.