🚀🚀Qwen3.7 Preview lands on Arena!
Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5 in Vision.⚡️⚡️
Can't wait to release Qwen3.7 series models! Stay tuned! @arena
"recreate the classic arcade game Shinobi in html'
https://codepen.io/Madvulcan/pen/QwGpWWZ
1458 lines of code in a single shot. The player can't jump high enough to reach platforms; otherwise it's flawless, and that can be iterated away. Nice single-shot result.
Yeah, fixed with one prompt. And it didn't rewrite the whole thing from scratch like some models do. https://codepen.io/Madvulcan/pen/ZYBeYWB
wait WHAT?!??!?! HOW DID I NOT SEE THIS COMING!?!?
there is not even a blog post about the model!
the plus model seems good at generating.... SVG mockups.
look at this! this looks... almost fine.
like- this isnt html.
In case anyone was curious...
#6 in text, huh?... interesting.
i don't trust qwen with the bench, but they certainly are releasing less and less open weight models as of recently
why trust anyone with benches?....
peeps tested it, and it appears less.... "Pretty putpzr" tuned so far. 3D stuff is somewhat meh, more plain colors, not sure.
no i mean they historically benchmaxx, but i would admit as of recently qwen finally has some substance and can do real shit for its size.
bench results are VERY promising so far
I expect the ranking to drop as more results come in though
a notable result is that it dethroned mimo pro in non-hallucination rate
it's also tied with it (and grok 4.1 lol) for most faithful instruction following
(it's actually not that great in omniscience, though still upper tier)
(it's 4th in AA-Intelligence and Coding, just after the 3 SOTAs)
the performance on its own wouldn't be that exciting, but combined with the honesty/instruction following it's suddenly quite attractive
@steep osprey
lmao
I was JUST about to post WHERE IS QWEN OR
I like it already
oh god please do not ask it math/geometry questions on high or xhigh though
it'll drain your wallet on something that other models figure out in 5k toks
it's not wrong or anything it's just totally unnecessary levels of rumination
Input $2.5, Explicit Cache Creation $3.125/M
hate this so much
interestingly enough, prices are ~0.66x in non-singapore regions in alibaba cloud
I've been trying to use Qwen 3.6 Flash but keep getting errors - wonder if it has anything to do with this recent launch
$2.5/$7.5 💀
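For anyone who wants to sanity-check the numbers being quoted here, a rough sketch of the arithmetic follows. It assumes the figures are USD per million tokens, that the second number in "$2.5/$7.5" is the output price, and that reasoning tokens are billed as output tokens; none of that is confirmed in the thread. The function name and the `region_multiplier` parameter are made up for illustration.

```python
# Prices quoted in the thread, assumed to be USD per million tokens.
INPUT_PER_M = 2.5          # "Input $2.5"
CACHE_WRITE_PER_M = 3.125  # "Explicit Cache Creation $3.125/M" (1.25x the input price)
OUTPUT_PER_M = 7.5         # assumption: second figure in "$2.5/$7.5" is the output price


def request_cost(input_tokens, output_tokens, cached_write_tokens=0, region_multiplier=1.0):
    """Estimate the USD cost of one request (hypothetical helper, not an official formula).

    cached_write_tokens is the part of the input written to an explicit cache,
    billed at the cache-creation rate instead of the plain input rate.
    region_multiplier scales the total, e.g. 0.66 for the "~0.66x" regions.
    """
    plain_input = input_tokens - cached_write_tokens
    if plain_input < 0:
        raise ValueError("cached_write_tokens cannot exceed input_tokens")
    usd = (
        plain_input * INPUT_PER_M
        + cached_write_tokens * CACHE_WRITE_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000
    return usd * region_multiplier
```

Under those assumptions, `request_cost(20_000, 50_000)` comes to roughly $0.43, almost all of it from the 50k output tokens, which is why long reasoning traces dominate the bill.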
I haven't seen Chinese inflation this bad since before I started paying for Fansly.
Classic Qwen
I think it's particularly bad at "visualizing" geometry, or trusting its visualization rather
probably why it does quite well on math benches but ouch
omg thank you alibaba 😭 🙏
Is this owl alpha?
I must say, I'm not super amazed at the hallucination rate in practice, despite the very promising bench results.
Maybe it's because I mostly trip it with pop culture questions and it reserves its accuracy for "serious" tasks like STEM or coding.
This in contrast with other reputed honest models that come flat out and admit "sorry, this is probably too obscure for me to answer"
afaik those tests don't measure intrinsic knowledge hallucination rate, they measure it over a given document
also until they fix this cache this is literally more expensive than opus
wants me to drive to car wash 👍
So, deepseek is still the winner, especially with the pricing
it'll be very difficult for Google and Qwen to match the pricing of DeepSeek. It's so so so affordable it's insane.
Ew
dude what the FUCK is wrong with them
has anyone been using Qwen3.7 Max recently - is it any good? I'm told it's primarily meant for autonomous agent work and it's a bit meh for simple chat workflows, primarily because of how many reasoning tokens it generates.
It benchmarks like an absolute beast
But the history of Qwen is also benchmaxxing, terrible EQ, and sucking down tokens like there's no tomorrow.
previous qwens yes, but ive seen peeps saying its preddy good.
qwen 3.7 max is certainly an interesting model, even by closed bench standards. will wait for their open weight models.
They reduced the price of this model
That's a W, hope they keep the price at that rate or even make it cheaper like deepseek
@split sphinx it's a one month long promotion: https://www.facebook.com/alibabacloud/photos/qwen37-max-now-live-on-model-studio-50-off-limited-time-offer-built-for-the-agen/1431601325678622/
but if the dominos keep falling (e.g. xiaomi) then surely it's in their interest to adjust price
this is old news by now >v<
They announced it an hour ago on xitter
the model has been out for a while i believe....
oh! nope im wrong, that was the 3.7 preview. my bad---
We had plus preview?
yes, on qwen chat for a while by now.
good ad for opus
models have just been regressing
what im most interested in is the price/performance pareto frontier and the mimo v2.5 price cut was goated but i think alibaba dgaf abt the pricing no more