#Reflection 70B

1122 messages · Page 2 of 2 (latest)

lime sky
#

He took this too seriously

lament gulch
#

And they made all this happen with 1 engineer xD

mild steeple
#

There are so many of these scumbags, the only way to beat them all is to race ahead of them and make a lot more money doing things the right way. People will see this and then try to copy his playbook now, but these scams are superficial so the real projects will eventually compound beyond them.

lament gulch
#

numbers are out

lethal sonnet
lament gulch
#

more sauce

torn valve
#

This is embarrassing

lament gulch
#

So the host confirms, they suddenly got ghosted and "hosted their own" instead

mild steeple
#

It seems he kept it short and worded it carefully to avoid community notes, but there's enough evidence now to prove the models never worked https://x.com/corbtt/status/1833633946644713582

@mattshumer_ @ArtificialAnlys Final report on Reflection-70B: after investigating, I do not believe the a model that achieved the claimed benchmarks ever existed.

It' very unclear to me where those numbers came from, and I hope that Sahil/Matt will shed more light how this happened.

torn valve
#

Something like "the model was always a fraud" there would suffice tbh

#
  • some brief evidence
opal hedge
vestal smelt
#

hopefully it’ll be faster then

#

deepseek is sooo slow

mild steeple
stoic bay
#

Man I've been eating up all this bs on twitter and reddit, im glad I found this thread

#

i just stumbled on this gem of a quote in the matthew berman interview

#

sorry for the childish edit 😂

mild steeple
#

His reasoning is equivalent to 'my dog ate my model' so we're allowed to be childish ridiculing this fool

sleek orbit
#

My gosh Jin from Hyperbolic DESTROYED him in those tweets.
Literally pick another career.

tawny mirage
rapid ginkgo
#

But I'm also quite pleased to see how many people are spineless, gullible, grifting yessirs that jump onto any bandwagon and defend a bullshit claim without any evidence and give it credibility. For example, certain members of OR staff. 🎙️🔽

mild steeple
#

He gained 15k followers in the past 3 days, and he will keep doing this since it works for these grifters after a certain scale. By now he's wasted thousands of hours of open source and top researchers time. If you are angry about this, leave them a review so there is lasting proof for future victims:

https://www.g2.com/products/hyperwrite/reviews

https://chromewebstore.google.com/detail/hyperwrite-ai-assistant/kljjoeapehcmaphfcjkmbhkinoaopdnd?hl=en-US

mild steeple
#

If you add up all the hours across the millions of views his post got, assuming each person wasted 1 hour on this, that's over 300 years. That means we literally lost lifetimes on this. If you leave a review, you can literally save lives 😂

lone dew
#

I lost exactly one minute writing this: go back to work idiots

rapid ginkgo
#

Bold of you to assume I work

crimson slate
lethal sonnet
mild steeple
vestal smelt
steady tapir
#

this model is kind of dumb, wtf 2 + 1 = 6

vestal smelt
steady tapir
vestal smelt
#

scroll up g

steady tapir
lament gulch
#

I wonder why all these "PR" were spammed, maybe if we go back one page and look

#

This story just keeps on giving 😄

mild steeple
lament gulch
#

Probably saving himself from possible investors

thin solstice
#

so was this a scam after all

torn valve
#

I heard some indie company named "OpenAI" copied this approach, smh

deft veldt
#

lmfao

crimson slate
narrow jackal
#

That's some funny timing with the strawberry release

spare basalt
red badger
#

Reflection done right

#

Take notes Matt

lament gulch
#

The idea of reasoning is old, it's basically the prompt engineering everyone has been doing "think step by step", Matt didnt invent shit, he is just a scam nothing more. Not even worth to bring up his name in this scenario.

royal wigeon
#

Not my video, but a nice link to share maybe if anyone still has doubts/questions https://www.youtube.com/watch?v=wOzdbxmQbRM

Want more videos? Join the leading researchers on https://GroupDiscover.com to find the best videos from across the free speech internet platforms like Odysee, Rumble, Bitchute & more awesome video platforms.

Join this channel to get access to upcoming exclusive perks:
https://www.youtube.com/channel/UCZE3V7__ieMM6XZfaZYYhKA/join

▶ Play video
rapid ginkgo
#

Again, at least from the thumbnail missing the point

royal wigeon
# rapid ginkgo Again, at least from the thumbnail missing the point

Def can see your point on that. Anyhow, what he means is that the gamed the benchmarks because they were using Claude all along when they sent their "internal" api access to the intial testers etc. which caused all the hype. He gets into it with a good amount of detail.

rapid ginkgo
mild steeple
#

Matt got the idea for reflection/distillation from Karpathy, likely from this video, but this has been a well researched idea in the last 2 years across many papers I could link: https://www.youtube.com/watch?v=hM_h0UA7upI

He basically does this exact same thing constantly: steal an idea from someone then try to get followers with an exaggerated version and scam the open source community to build it for him. Eventually the open source community started investigating him. See the 'agent-1' model from last year, which still doesn't exist: https://github.com/OthersideAI/self-operating-computer/issues/21

lament gulch
#

I have no clue what the HF team is doing allowing this to be trending 1st

lament gulch
# lament gulch

the "leaderboard" is now closed after the spam was made to hide the reports in the discussions lol

rapid ginkgo
#

Free market in action

rapid ginkgo
#

What other kind is there

wraith badger
#

a sign of life from mr. shumer. while still no activity on social media, he uploaded a new github repo with the word "redemption" in it https://github.com/mshumer/LiveCodeBenchRedemption but nothing interesting there yet. seems to be a co-op with LiveCodeBench.

torn valve
#

Let's not engage, to be honest, unless this gains traction. I doubt there's any way or interest for him to "redeem" himself, the way he acted through this thing was willfully deceptive and malicious the entire time

lament gulch
#

They are still busy hiding reports by the user

#

Amazing to this day. HF does nothing about it, I've reported then but not a word.

crimson slate
#

It is gone (from OpenRouter) ->

safe marsh
#

Where did this model go?

crimson slate
safe marsh
#

That's a shame because I felt like I was getting some usable results with it.

#

No chance it will be returning then?

crimson slate
safe marsh
#

What happened to make nobody want to host it?

torn valve
#

Well, the model is based on a fraud

crimson slate
#

zombieflection-70B 🙂

torn valve
#

And it performs worse than the model it's finetuned on top of on most tasks

safe marsh
#

What was the model it was fine-tuned on top of?

#

I guess I just kind of liked it when it was actively reflecting on the inputs. It seemed like it had better outputs with me doing less work prompting-wise.

torn valve
#

Forgot whether it's Llama3 or Llama3.1 70B (Matt itself claimed he wasn't sure either, lol)

#

At some point

crimson slate
sleek orbit
#

Your honour, we can say that a Llama was involved.

torn valve
#

Then Matt got found out and changed to 4o, and then afterwards I stopped following

sleek orbit
#

It was a fascinating time. First time I saw the community look for a "birthmark" of sorts to check what model might actually be there.

safe marsh
torn valve
#
<thinking>
In this section you understand the problem and develop a plan to solve the problem.

For easy problems-
Make a simple plan and use COT

For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought  reasoning to work through the plan and write the full solution within thinking.

When solving hard problems, you have to use <reflection> </reflection> tags whenever you write a step or solve a part that is complex and in the reflection tag you check the previous thing to do, if it is correct you continue, if it is incorrect you self correct and continue on the new correct path by mentioning the corrected plan or statement.
Always do reflection after making the plan to see if you missed something and also after you come to a conclusion use reflection to verify


</thinking>

<output>
In this section, provide the complete answer for the user based on your thinking process. Do not refer to the thinking tag. Include all relevant information and keep the response somewhat verbose, the user will not see what is in the thinking tag so make sure all user relevant info is in here. Do not refer to the thinking tag.
</output>```
safe marsh
#

Thank you.

twin mirage
#

Is it uncensored

ripe vortex
# twin mirage Is it uncensored

Relection is no longer on openrouter due to it being a scam, i don't believe the broken weights were any more uncensored than the normal ones

dusky kraken
#

@wind inlet
Please, post notice about disabling models in announcements channel

wind inlet
wind inlet
vestal smelt
#

the model sucks and is a complete sham btw

#

worse than the actual model it ripped off

narrow jackal
#

On September 5th, @mattshumer_ announced Reflection 70B, a model fine-tuned on top of Llama 3.1 70B, showing SoTA benchmark numbers, which was trained by me on Glaive generated data.

Today, I'm sharing model artifacts to reproduce the initial claims and a post-mortem to address

red badger
#

It must have taken some time teaching the model to replace Claude with "" and training on benchmark data set

#

Too bad you can't fake the tokenizer

tardy bane
#

this time
but is it

limber laurel
#

I also gave access to our GPU nodes, the railway account and git repo which was running the proxy, to a few members of the community, including the OpenRouter team. They didn’t find anything out of the ordinary.
Is it true that Openrouter had access to this version?

red badger
#

This model is a scam they are just trying to save their asses with excuses

#

Just look at the dataset they provided it's filled with garbage they didn't even clean it properly

opal hedge
# limber laurel > I also gave access to our GPU nodes, the railway account and git repo which wa...

They were given some level of access: https://x.com/OpenRouterAI/status/1833031092120715657

Update: we've been trying to get access to commit history and the underlying GPU cluster to confirm that this endpoint has been running original weights. While we've able to see some of the code, we don't have sufficient evidence yet.

Additionally, the Reflection API is now

limber laurel
opal hedge
#

Just to be clear, we have never added any word filtering or made use of Claude APIs when we offered API access to Reflection 70B for people to try out the playground or test/benchmark the model with an API endpoint.
that's an utter lie

#

They had so many more plausible outs from people trying to defend them on Twitter, and they go with the one that was the most easily disproved?

narrow jackal
# red badger This model is a scam they are just trying to save their asses with excuses

I mean, the excuses are just more lies on top of lies. I feel like this "post-mortem" is more about cleaning up their image in a visible way to investors than attempting to actually placate the LLM community; they're probably hoping that they don't get much engagement on any of this (and if they did, they'd certainly ignore it at this point anyway).

tawny mirage
#

Radio silence for nearly a month and now back to it.

Always skipping the comments about switching their private API to multiple models. Just can't come up with excuses for it

lethal sonnet
torn valve
#

Wonder what they'll say when the benchmark ""reproductions"" start coming out

#

Also wonder if they've tampered with the bdnchmark process itself

torn valve
tawny mirage
#

They can't even agree between them.

Sahil said he replicated all but 2 tests.

Matt says his testing didn't meet any of his reported benchmarks

pearl yoke
#

probably intentional

pearl yoke
ripe vortex
#

exactly: nothing makes sense