#Xiaomi MiMo V2 Flash

359 messages · Page 1 of 1 (latest)

viscid fog
hardy plume
#

is this running on ascend chips?

hearty shale
#

🔥

short summit
#

@viscid fog using that models :free counting to 1000 requests per day limit on 10$+ user plan?

#

what if there is only :free model variant and no paid? We can't use it after reaching 1000 RPD?

#

<@&1384697330254610442>

viscid fog
#

the Xiaomi free endpoint has unlocked RPD limits

short summit
#

@viscid fog Nemotron 3 Nano 30B A3B also have unlocked RPD limits? this model also don't have for now paid variants

viscid fog
short summit
#

okay thanks ❤️

pastel turret
honest oxide
#

Prompt: Write a short story about a deal with the djinn gone awry
Output:

pastel turret
#

Huh this model seems pretty good

honest oxide
pastel turret
#

I like the coding style

#

Much more than the default style of like GPT 5

#

And it seems to be coding pretty well too

honest oxide
#

gonna use this to co-write for a bit and I'll report back if I find the writing to get annoying

quiet onyx
#

blog mentions this being under the MIT license which is great but there is no licence in the repo or if its "modified MIT"

pastel turret
#

if hybrid attention like this can be widely adopted + competitive it'll be so sick

#

110 TPS on a top model is massive

honest oxide
#

anyway this model is good at writing and free so

#

enjoy it before jai finds it

round cradle
#

Looks like there's a moderation layer

#

It will reject 'high risk' prompts

honest oxide
#

ah

round cradle
#

Which in my testing is a pretty wide category :/

pastel turret
#

all rules and UI work perfectly

castling, en passant, promotion, etc.

quiet onyx
#

this model scored 40% on my spatial reasoning test from a 20 year old children's medieval fantasy game

honest oxide
pastel turret
#

did very well in a personal bench, just as well as gemini 2.5 flash / grok 4 fast / deepseek 3.2

honest oxide
pastel turret
#

no idea, but I'd guess cheaper

honest oxide
#

also 15 day free period ⁉️

pastel turret
#

probably figuring out pricing

#

hybrid models are probably harder to price correctly than traditional models

#

idk

honest oxide
pastel turret
#

based

#

not AMAZING at terminal bench, but not awful?

#

GLM 4.6 scores 24.5%

#

this is actually the second highest open model on terminal bench perhaps?

#

yeah I think so

#

deepseek is higher but has other factors that make quite bad to use for coding, like insane hallucination

#

This model is still prone to hallucinations, but vibes seem better on its niche knowledge than most other open models I've tried

quiet onyx
brittle notch
#

Improvement

pastel turret
#

this model might actually be the best flash level model out there rn

honest oxide
#

this is 100% taking the role of grok 4 fast for me

pastel turret
#

scoring higher than grok code fast on terminal bench

hardy plume
#

sounds like a great model, now i gotta try it

pastel turret
#

and if it wasn't apparent: scoring higher than minimax, glm 4.6, kimi k2 (and thinking variant), and even Claude 4.5 Haiku

hardy plume
#

@viscid fog

#

maybe because of the word "brutalist"?

#

nope.. not because of brutalist

#

🤔

#

maybe my system prompt

honest oxide
#

oh right refusals

#

hopefully it doesn't do that when I try to use it to answer tool calls?

pastel turret
#

sadly, this model does not have up to date knowledge it seems (e.g React Router V7, Tailwind V4)

hardy plume
#

k time to cut out parts till it works

honest oxide
#

looks like .8 temp and .95 top_p is recommended?

quiet onyx
hardy plume
#

whatever i can omit that to use it anyway

quiet onyx
hardy plume
honest oxide
#

oh my god

#

this model is actually really good at being agentic

#

thank you xiaomi

brittle notch
visual creek
quiet onyx
#

atleast they didn't commit any chart crimes here

brittle notch
#

did retry, stuck in reasoning

steel harbor
#

its a good model, sir

brittle notch
#

ohhh, the

You are MiMo-V2-Flash (free), a large language model from xiaomi.

Formatting Rules:
- Use Markdown for lists, tables, and styling.
- Use ```code fences``` for all code blocks.
- Format file names, paths, and function names with `inline code` backticks.
- **For all mathematical expressions, you must use dollar-sign delimiters. Use $...$ for inline math and $$...$$ for block math. Do not use (...) or [...] delimiters.**

fucks with it's head

#

infinite headpats. this model gets it to not be awkward

#

so it is VERY system prompt sensitive and gets hyper focused on it's objective to the point of getting confused

honest oxide
#

guys bad news, I just saw a screenshot from the jai server

hardy plume
#

safety filter should help if its this sensitive

honest oxide
honest oxide
hardy plume
honest oxide
#

I like this model for summarizing

quiet onyx
honest oxide
#

overall very happy with it

hardy plume
#

based on their gh these are very recommended

visual creek
#

Hightly recommended, even

hardy plume
#

also seems to be a interleaved thinker

#

like week number maybe?

visual creek
#

Maybe it should be {month} ?

hardy plume
#

i asked gemini 3 to translate the system prompt

#

thats probably what it would come out to

quiet onyx
brittle notch
hardy plume
#

talk about optimizing, inlining C

#

if only it worked that easily

pastel turret
#

I'll be extremely pleasantly surprised if this ends up to be the best open agentic coding model

#

and it seems like it may be that

#

(deepseek 3.2 excluded for being too spiky, slow, and full of hallucinations)

visual creek
#

I guess Cursor Composer V2 will just happen to drop a week from now as well then eh

pastel turret
#

I was about to say

#

this is like a better composer

#

lmao

visual creek
#

I think composer is just GLM 4.5-4.6 finetuned no?

pastel turret
#

if they do their postrain regime on this model instead of GLM 4.6 (assuming that's what they're using)

#

yea

#

I think so

#

lines up with the pricing

#

they could run this on GPUs instead of cerebras or groq or whoever, and serve it WAY cheaper

#

one composer's issues has been it's surprisingly expensive, I assume because it's on cerebras/groq to go fast

steel harbor
#

is this actually securely better than glm? im having mixed vibes with it

pastel turret
#

it seems better to me

#

maybe not design sense

left tree
#

If it is flash then the pro model will also be on the way.

brittle notch
#

did not test the agentic coding of it, so can't prove or deny bulbasaur

quiet onyx
#

Honestly shocked Xiaomi coming in out of nowhere and making a model this good at agentic tasks

honest oxide
#

I am very pleasantly impressed by it

pastel turret
#

Which imo is a quite high quality bench

brittle notch
#

no, aider benchmark is imo, but no unofficial results yet

umbral oxide
#

Ok Xiaomi, I wasn't familiar with your game...

#

Looks interesting

pastel turret
#
  • it’s been around too long, data is definitely in training sets
#

The benchmark hasn’t even been updated with latest models in over a month it seems like

brittle notch
#

in my tests, it is still representative of the model state. i don't think it is a benchmark they can saturate.

brittle notch
pastel turret
jovial plinth
#

😭 this is so good guys.

pastel turret
#

seems pretty good, testing it with Opencode on a large codebase

#

not "amazing" or anything

#

like composer 1

rustic river
#

I'm constantly getting

421 {"error":{"code":"421","message":"Moderation Block","param":"The request was rejected because it was considered high risk","type":"content_filter"}}

even if I just say "hello"

#

Not sure how you guys have managed to get it working in your agentic coding tools

hardy plume
#

hey indeed.

#

jeez this model is quite good, no longer getting the safety warning atleast, managed to add a codex-like session system to my cli first try, also went around my harness by using shell commands to read files because it didnt like that i didnt have any line range support for reading.

#

this model is quite eager to write test python scripts, then use them to test, very practical and seems to actually delete them after too

#

similar to claude

pastel turret
#

First Token Latency is the only issue I have with this model currently

#

it's ~2.5s on average, which cuts down significantly on the benefits of TPS

#

if it was like 500ms like other models it would be amazing

hardy plume
quiet onyx
#

I've seen Grok in comparison make 5-10 tool calls at once per turn to manage multiple machine in parallel

pastel turret
quiet onyx
last sluice
#

Is this model permanently free, or just for a week or so to test, like Grok 4.1 was?

last sluice
#

Perfect, thank you.

thick niche
lean maple
#

Xaomi has been throwing 500s and 524s

Its not common but it happens

hardy plume
#

this model is a good replacement for grok code fast if providers will step that low on price & have caching

#

grok code fast is good but its tool calls are hit or miss, sometimes it tries to call them in reasoning and says it did do the changes but didn't

lean maple
#

xiaomi mimo is unsuable now due to rate limits

pastel turret
#

@viscid fog btw Novita is hosting this model now, can we get a paid endpoint?

bleak forum
#

MiMo-V2-Flash scores 66 on the @ArtificialAnlys Intelligence Index — #2 among open-source models and #8 overall! 🎉🎉🎉

Designed for Agentic AI — now with the benchmarks to prove it: #1 on τ²-Bench Telecom for agentic tool-use among all evaluated models. ⚡⚡⚡

Frontier

proven sequoia
#

did anyone manage to get this working with interleaved thinking in agentic coding tool?

naive hedge
#

thinking is disabled in their anthropic API for some reason, and can't be turned on

buoyant phoenix
#

not doing full testing, but as proxy, not great at chess, bottom 15%

proven sequoia
#

but their docs say it supports tool integrated thinking

#

wierd

naive hedge
#

OpenAI spec doesn't really support interleaved thinking (I don't even think there is a spec?), iirc opencode conditionally turns it on for a few models such as new deepseek 3.2, probably not for this model yet anyway

proven sequoia
#

DeepSeek API does it in its own adapted way, so does OpenRouter

#

however I was hoping that it would work using a model via OpenRouter, since the API to the app is the same

pastel turret
#

I quite appreciate the coding style of this model

#

unlike something like gpt 5 which I still hate the coding style of

#

this model structures code well and doesn't have weird stuff mixed in like gpt would (e.g if clauses with like 5 && conditionals to validate the type of a parameter)

vast mirage
#

Not a really useful topic. But this model is good for Rp too

strange bison
vast mirage
#

so most of the nsfw models that is normally used in RP.

#

Nah i take my words back

#

it is yet not on the level of v3.2

honest oxide
vast mirage
stone tangle
vast mirage
stone tangle
#

Maybe not a "ton," but if you scroll up, you'll see a couple.

strange bison
#

Bruh. What kind of role plays are you guys doing that you get constant moderation errors. Genuinely curious

naive hedge
proven sequoia
silver mortar
#

this is sooo fast

rustic river
#

The model has been glitching in opencode today with broken outputs and premature stops

proven sequoia
#

This model seems pretty smart but I can't get it working properly in opencode

ionic surge
#

whats weird to me is the strong recommendation to turn off thimking for agentic stuff

#

i dont really understand WHY

#

also the promise of them not logging prompts i kinda doubt

proven sequoia
#

just how it was trained I guess, although I don't know why it also supports interleaved thinking with that being the case

ionic surge
#

kinda interesting how no providers have launched support for this model

#

on a paid endpoint

#

and idk when xiaomi will end this (if they will?)

#

@viscid fog do you know anything about this?

quiet onyx
strange bison
proven sequoia
#

Xiaomi will presumably swap their endpoint to a paid one soon. The model itself seems really smart (I went through a difficult problem with it in the chat interface), but interleaved not working properly in opencode for me right now.

#

this was using the "interleaved": {"field": "reasoning_details"} trick but maybe there is some other stuff that needs to be done for it to work properly.

pastel turret
rustic river
#

I mostly do brownfield coding, and it solved problems that GPT 5.2 Codex / Gemini 3 could not

#

It's remarkably persistent. One of my favorite models of 2025 so far, given that it's also fast.

slow jay
#

no paid verison yet

strange bison
hardy plume
#

they extended the free access

pastel turret
#

We got a paid endpoint but OR hasn’t put it up yet..

#

(Novita)

rustic river
slow jay
hardy plume
slow jay
quiet onyx
fickle mortar
#

@viscid fog Novita AI has support for the paid mimo v2 flash, can we please get support on the openrouter gateway

fickle mortar
#

i need to use it in a commercial application

strange bison
#

The Mimo model has gotten a new snapshot 🥂

#

The free tier continues

ionic surge
#

yay

#

they finally say that the thinking mode is recommended

slow jay
clear ember
#

why is this model getting deprecated?

viscid fog
#

the free model is

strange bison
clear ember
lean maple
quiet onyx
#

abuse it while you still can

ionic surge
#

dont have any workloads to abuse it with 🥀

hoary quail
#

does anyone here know which AP prompt works best with mimo v2? for RP specifically?

plain coral
#

I have a question about the deprecation of this model. I tested it over the last 30 days very extensively and find it very useful for my agentic coding tasks. Other providers hosting this model too and I tried it out on them and have very different behaviors. How do I know that they run the same latest snapshot? Should I ask them all? And what is the latest snapshot at all hosted on OR for the free model?
Except for the thinking loop and leaking tool calls into assistant messages and thinking tokens, the model performs very well.
At least, the paid providers let me choose the seed parameter.

vast mirage
#

Anyone got recommendations for free models that is on same level as mimo v2

hoary quail
vast mirage
analog juniper
#

This model is a neat lil guy. I like him, especially for the price.

pure garnet
#

Oh, huh 🤔

$0.09/M input tokens
$0.29/M output token
That price is actually pretty low, around which models do you think this performs?

hoary quail
hoary quail
ionic surge
#

and cheaper than it too

#

and i already thought that 4.1 fast was the performance/price goat

#

mimo takes it

hoary quail
honest oxide
#

it barely speaks english

hoary quail
ionic surge
#

rp 🥀

#

idk i dont do that

#

mostly for classification tasks/extraction takss etc

honest oxide
#

I'm not sure what your budget is, but if you're using something like ST I'd say maybe look into a memory extension to save on context

hoary quail
honest oxide
#

(also, 4.7 does have a non thinking mode, j.ai just doesn't support it kek )

quiet onyx
vast mirage
#

i will switch to r1 0528

hoary quail
pure garnet
#

GLM 4.7 has a non-thinking mode

hoary quail
pure garnet
thorn bane
#

What does "deprecrating Jan 26, 2026" mean ?

open heart
open heart
pure garnet
#

2.5 Flash is over 3x the input price and over 8x the output price, though

river lichen
#

does anyone know how to integrate mimo v2 to j.ai from the xiaomi site itself? i keep getting network errors

half iris
#

does anyone know why this is always rate limited?
I can't get any agentic coding to work with this model
it consistently stops early, before making any file changes

rain veldt
half iris
#

paid

#

ok, maybe this is a false alarm
I was getting rate limits yesterday
but I just spotted a problem with my opencode config (edit: deny)
giving this another test now

half iris
#

yup, it was my config
seems to be working fine now
thanks for responding @ Monkey !

plain coral
# half iris does anyone know why this is always rate limited? I can't get any agentic coding...

The issue with this model is, that it sometimes still leaks tool calls into the thinking and sometimes assistant tokens and this stops the multi-turn inference. To use this model reliably, I needed to sanitize those tokens after streaming each block and ignore the stop_reason for that.
Additional, I suspect that the paid providers are using an older snapshot of this model because it behaves very differently by each of them.
It is such a great model, but without enough transparency hard to use without those workarounds.

rain veldt
half iris
#

I happen to be using atlas cloud as a provider
Is there any way to identify which version of the model they provide?

lean maple
river lichen
hoary quail
quiet onyx
#

I only use official provider when possible. In this case the official xiaomi/fp8 provider is also the cheapest and fastest

half iris
#

Oh I only have Atlas Cloud and Novita AI available for this model
Chutes and Xiaomi must be blocked in my OR privacy settings

#

So any opinions on Novita vs Atlas Cloud? 🫠

quiet onyx
pure garnet
#

Oh, wow, TIL this has caching, this is dirt cheap

half iris
#

Yup it is
And in my testing so far, it does a decent job
One real hallucination about a .gopls.toml file which isn’t a feature of gopls
Otherwise, it’s been nice

fickle mortar
quiet onyx
half iris
#

Ya Xiaomi looks like the most performant provider and supports caching
But I prioritize the privacy side above everything else so not an option for me

Atlas cloud is noticeably much faster than Novita
But Novita supports caching 🤷‍♂️

fickle mortar
quiet onyx
fickle mortar
half iris
#

Xiaomi Retained for 30 days ✓ Does not train
^^ open router docs ^^

#

https://platform.xiaomimimo.com/#/docs/welcome

here’s a snippet I’ve read so far

API Services. If you use the API services, we will collect your IP address and the text information you submit to analyze the relevant instructions based on the model you select and to generate the returned content. Xiaomi will not use the text content you provide for model training or any other purposes. When you use prepaid API services, we will collect your top-up information and transaction records**.**
#

I don’t see the number 30 show up in that page 🤷‍♂️

#

Seems to me like openrouter has a different reason for the privacy settings restricting Xiaomi
Or it’s a mistake?

strange bison
#

Anyone else having an issue with the xiaomi endpoint where the model thinks forever all the sudden?

Like didn't change the prompt or any of the sampling parameters, yet it's happening consistently in the last few days

fickle mortar
#

refer to xiaomi huggingface

strange bison
# fickle mortar you should optimize the hyperparams as per your application

No, I think Xiaomi had an issue on their end.
It was working fine for 7-8 days, then for 1-2 days it started capping reasoning (65k reasoning tokens) occasionally, even though I didn't change anything. Now it's fine again.
Weird. Also xiaomi end point is the only end point I direct to for the API calls becuz of cache so it wasn't a provider switch either.

celest grail
#

this is awesome

#

i build a two agent loop so they get to a consensus with very strict guidelines

#

the result was pretty good

#

crazy value

ionic surge
#

yeah it’s peak

fickle mortar
#

I just wished we get more providers and xiaomi oss ed the latest mimo v2

visual creek
#

This is an impressively capable tool calling agent model, nothing seems to come even close at the price.

ionic surge
#

daily goat model reminder

#

when v3 flash ?

#

im so happy w tyhis model

#

xiaomi's a great provider too

#

great caching

pure garnet
#

I agree

ionic surge
#

if this model had vision i think it would be insane

#

because the price/performance ratio is insanely good

#

i wonder if grok can reclaim its price/performance crown

honest oxide
#

would you say it's better than grok fast?

celest grail
#

yes

honest oxide
#

no structured output support though 😭

celest grail
#

well, it follows instructions very well

ionic surge
#

really?

#

it says it does on orca.orb.town

#

i love the instruction following on this model since i know that many other flash-style models (even g3f) dont follow instructions well

hoary quail
#

I've gotten mimo V2 flash to work very well
However my only gripe that if it had a reasoning/thinking mode

That would really take it to the next level
Cuz for price to performance - it's quite good already

ionic surge
#

it does

hoary quail
ionic surge
#

just set reasoning.enabled = true or set some reasonign effort and itll turn on

#

i dont think you can change the effort level its just off or on

hoary quail
ionic surge
#

do you use the api or like the chatroom

hoary quail
#

I chat using proxy on j.ai
So I guess...Api I think? Sorry, I'm not a tech guy so not sure

ionic surge
#

oh, im not sure how to on janitor

#

try searching up how to enable reasoning on janitor

#

but mimo does support reasoning

steel harbor
iron solstice
#

I find that paying for anything less than opus costs me more for anything I do since doing it correctly first time = cheapest but having stuff that would be literally free to experiment with would be nice

iron solstice
junior narwhal
#

Based on my experience, the Mimo V2 Flash offers the best value for money right now at just $0.1/$0.3.

iron solstice
#

At least read what you're replying to