#Reflection 70B
1122 messages · Page 2 of 2 (latest)
And they made all this happen with 1 engineer xD
There are so many of these scumbags, the only way to beat them all is to race ahead of them and make a lot more money doing things the right way. People will see this and then try to copy his playbook now, but these scams are superficial so the real projects will eventually compound beyond them.
he's ghosting the people who lent him H100s, including another guy I saw who was pissed https://x.com/Yuchenj_UW/status/1833636690877100488
@mattshumer_ Hi Matt, we spent a lot of time, energy, and GPUs on hosting your model and it's sad to see you stopped replying to me in the past 30+ hours, I think you can be more transparent about what happened (especially why your private API has a much better perf)
I'm excited to announce Deception 70B, the world’s top open-source model.
Trained using Deception-Tuning, a technique developed to enable LLMs to deceive themselves of their own mistakes.
Try it out now: https://t.co/y8Hj3jvuSk
lol
more sauce
This is embarrassing
So the host confirms, they suddenly got ghosted and "hosted their own" instead
It seems he kept it short and worded it carefully to avoid community notes, but there's enough evidence now to prove the models never worked https://x.com/corbtt/status/1833633946644713582
@mattshumer_ @ArtificialAnlys Final report on Reflection-70B: after investigating, I do not believe the a model that achieved the claimed benchmarks ever existed.
It' very unclear to me where those numbers came from, and I hope that Sahil/Matt will shed more light how this happened.
Something like "the model was always a fraud" there would suffice tbh
- some brief evidence
https://x.com/Yuchenj_UW/status/1833641243680903366
I guess the one thing out of this is we might actually get another provider for deepseek 2.5 other than just deepseek themselves.
@teortaxesTex @hyperbolic_labs Yes, DeepSeek 2.5 in our plan!
It's unfortunate for them to have their release coincide with Reflection 70b
The model goes to a school in Canada, you wouldn't know her 😂
Man I've been eating up all this bs on twitter and reddit, im glad I found this thread
i just stumbled on this gem of a quote in the matthew berman interview
sorry for the childish edit 😂
His reasoning is equivalent to 'my dog ate my model' so we're allowed to be childish ridiculing this fool
My gosh Jin from Hyperbolic DESTROYED him in those tweets.
Literally pick another career.
"you are an expert in system log creation, write a series of API logs ...." These guys with their transparency and investigation bullshit is just too funny now.
Nothing more will come. Start ideation for the 2025 scam
But I'm also quite pleased to see how many people are spineless, gullible, grifting yessirs that jump onto any bandwagon and defend a bullshit claim without any evidence and give it credibility. For example, certain members of OR staff. 🎙️🔽
He gained 15k followers in the past 3 days, and he will keep doing this since it works for these grifters after a certain scale. By now he's wasted thousands of hours of open source and top researchers time. If you are angry about this, leave them a review so there is lasting proof for future victims:
If you add up all the hours across the millions of views his post got, assuming each person wasted 1 hour on this, that's over 300 years. That means we literally lost lifetimes on this. If you leave a review, you can literally save lives 😂
I lost exactly one minute writing this: go back to work idiots
Bold of you to assume I work
working -> https://www.youtube.com/watch?v=e7Klczu14tE
that's 1 minute wasted, you gonna have to explain it to your manager
I was referring to top researchers who downloaded the model, and cared about the results of distillation. Who are you, and why would anyone care what you do?
this model is kind of dumb, wtf 2 + 1 = 6
it’s a fraud man
hmm, the model is a fraud? I need to catch up on the lore 😂
scroll up g
For people wondering why I draw similarity with LK-99, it's because the results are not reproducible.
The saga of reflection-70B has been a wild one. This video, I have made a full break down on the situation, with the latest up-to-date information, and why we probably won't be expecting anything good out from it anymore.
check out my newslett...
I wonder why all these "PR" were spammed, maybe if we go back one page and look
This story just keeps on giving 😄
He seems to be getting google to remove the bad reviews. Don't forget to leave a review so he doesn't get away with this, but don't mention his name Matt Shumer since that may violate google policies
https://chromewebstore.google.com/detail/hyperwrite-ai-assistant/kljjoeapehcmaphfcjkmbhkinoaopdnd?hl=en-US
Probably saving himself from possible investors
so was this a scam after all
I heard some indie company named "OpenAI" copied this approach, smh
lmfao
And they found the missing piece to make Reflection work: Making so slow and costly that every answer gets treated a miracle, like sage's answer. If only Matthew had thought of that, nobody would have been able to see through his scheme until at least now.
That's some funny timing with the strawberry release
True, the benchmarks would have only come in an hour ago.
The idea of reasoning is old, it's basically the prompt engineering everyone has been doing "think step by step", Matt didnt invent shit, he is just a scam nothing more. Not even worth to bring up his name in this scenario.
Not my video, but a nice link to share maybe if anyone still has doubts/questions https://www.youtube.com/watch?v=wOzdbxmQbRM
Want more videos? Join the leading researchers on https://GroupDiscover.com to find the best videos from across the free speech internet platforms like Odysee, Rumble, Bitchute & more awesome video platforms.
Join this channel to get access to upcoming exclusive perks:
https://www.youtube.com/channel/UCZE3V7__ieMM6XZfaZYYhKA/join
Again, at least from the thumbnail missing the point
Def can see your point on that. Anyhow, what he means is that the gamed the benchmarks because they were using Claude all along when they sent their "internal" api access to the intial testers etc. which caused all the hype. He gets into it with a good amount of detail.
For me "gaming a benchmark" means training a model on benchmark dataset or somehow playing the way the benchmark works, this is just straight up faking it
Matt got the idea for reflection/distillation from Karpathy, likely from this video, but this has been a well researched idea in the last 2 years across many papers I could link: https://www.youtube.com/watch?v=hM_h0UA7upI
He basically does this exact same thing constantly: steal an idea from someone then try to get followers with an exaggerated version and scam the open source community to build it for him. Eventually the open source community started investigating him. See the 'agent-1' model from last year, which still doesn't exist: https://github.com/OthersideAI/self-operating-computer/issues/21
the "leaderboard" is now closed after the spam was made to hide the reports in the discussions lol
Free market in action
What other kind is there
a sign of life from mr. shumer. while still no activity on social media, he uploaded a new github repo with the word "redemption" in it https://github.com/mshumer/LiveCodeBenchRedemption but nothing interesting there yet. seems to be a co-op with LiveCodeBench.
Let's not engage, to be honest, unless this gains traction. I doubt there's any way or interest for him to "redeem" himself, the way he acted through this thing was willfully deceptive and malicious the entire time
They are still busy hiding reports by the user
Amazing to this day. HF does nothing about it, I've reported then but not a word.
It is gone (from OpenRouter) ->
Where did this model go?
To the eternal token hunting ground (at least from OpenRouters perspective)
That's a shame because I felt like I was getting some usable results with it.
No chance it will be returning then?
If it gets popular again and someone wants to host this model, OpenRouter will surely route to it (I guess)
What happened to make nobody want to host it?
Well, the model is based on a fraud
zombieflection-70B 🙂
And it performs worse than the model it's finetuned on top of on most tasks
What was the model it was fine-tuned on top of?
I guess I just kind of liked it when it was actively reflecting on the inputs. It seemed like it had better outputs with me doing less work prompting-wise.
Forgot whether it's Llama3 or Llama3.1 70B (Matt itself claimed he wasn't sure either, lol)
At some point
I thought I knew, but then realized there was so much fuzz that I actually didn't know either.
Your honour, we can say that a Llama was involved.
Well, the og Reflection API was just Claude 3.5 Sonnet with a system prompt, so you might have luck with that for your use case
Then Matt got found out and changed to 4o, and then afterwards I stopped following
It was a fascinating time. First time I saw the community look for a "birthmark" of sorts to check what model might actually be there.
Is the system prompt listed anywhere?
<thinking>
In this section you understand the problem and develop a plan to solve the problem.
For easy problems-
Make a simple plan and use COT
For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.
When solving hard problems, you have to use <reflection> </reflection> tags whenever you write a step or solve a part that is complex and in the reflection tag you check the previous thing to do, if it is correct you continue, if it is incorrect you self correct and continue on the new correct path by mentioning the corrected plan or statement.
Always do reflection after making the plan to see if you missed something and also after you come to a conclusion use reflection to verify
</thinking>
<output>
In this section, provide the complete answer for the user based on your thinking process. Do not refer to the thinking tag. Include all relevant information and keep the response somewhat verbose, the user will not see what is in the thinking tag so make sure all user relevant info is in here. Do not refer to the thinking tag.
</output>```
Thank you.
Is it uncensored
Relection is no longer on openrouter due to it being a scam, i don't believe the broken weights were any more uncensored than the normal ones
@wind inlet
Please, post notice about disabling models in announcements channel
cc @thorn abyss @spiral sparrow
FYI - the provider stopped hosting these models, so the endpoint is gone but the model is still there: https://openrouter.ai/models/mattshumer/reflection-70b
the model sucks and is a complete sham btw
worse than the actual model it ripped off
They're back! It totally wasn't a scam, they've definitely uploaded the correct weights this time!
https://x.com/csahil28/status/1841606301782311167
https://huggingface.co/glaiveai/Reflection-Llama-3.1-70B
https://glaive.ai/blog/post/reflection-postmortem
On September 5th, @mattshumer_ announced Reflection 70B, a model fine-tuned on top of Llama 3.1 70B, showing SoTA benchmark numbers, which was trained by me on Glaive generated data.
Today, I'm sharing model artifacts to reproduce the initial claims and a post-mortem to address
Reproducing Reflection-70B benchmark scores and postmortem on what happened.
It must have taken some time teaching the model to replace Claude with "" and training on benchmark data set
Too bad you can't fake the tokenizer
this time
but is it
I also gave access to our GPU nodes, the railway account and git repo which was running the proxy, to a few members of the community, including the OpenRouter team. They didn’t find anything out of the ordinary.
Is it true that Openrouter had access to this version?
This model is a scam they are just trying to save their asses with excuses
Just look at the dataset they provided it's filled with garbage they didn't even clean it properly
They were given some level of access: https://x.com/OpenRouterAI/status/1833031092120715657
Update: we've been trying to get access to commit history and the underlying GPU cluster to confirm that this endpoint has been running original weights. While we've able to see some of the code, we don't have sufficient evidence yet.
Additionally, the Reflection API is now
Nice how they changed "we don't have sufficient evidence" to "They didn't find anything out of the ordinary"
Just to be clear, we have never added any word filtering or made use of Claude APIs when we offered API access to Reflection 70B for people to try out the playground or test/benchmark the model with an API endpoint.
that's an utter lie
They had so many more plausible outs from people trying to defend them on Twitter, and they go with the one that was the most easily disproved?
I mean, the excuses are just more lies on top of lies. I feel like this "post-mortem" is more about cleaning up their image in a visible way to investors than attempting to actually placate the LLM community; they're probably hoping that they don't get much engagement on any of this (and if they did, they'd certainly ignore it at this point anyway).
Radio silence for nearly a month and now back to it.
Always skipping the comments about switching their private API to multiple models. Just can't come up with excuses for it
I'm astonished they are still even trying. This is hilarious
Wonder what they'll say when the benchmark ""reproductions"" start coming out
Also wonder if they've tampered with the bdnchmark process itself
Lol? Nevermind, Matt himself confirmed the model is bad
https://x.com/rimomaguiar/status/1842326685821071492?t=KCjDwzOs8AOKgGL9Vurz1A&s=19
Jesus... People are deranged in their thinking
@mattshumer_ The idea was very good anyway. And I believe your announcement is what pushed openai to release their o1-preview. Even thought the model is not what was expected, publish everything so it can give others some similar ideas.
They can't even agree between them.
Sahil said he replicated all but 2 tests.
Matt says his testing didn't meet any of his reported benchmarks
probably intentional
Wasn't he the model author? Why would he need to reproduce it again if he supposedly had a working version? It broke suddenly?
exactly: nothing makes sense