Reflection 70B | OpenRouter | Page 2

lime sky Sep 10, 2024, 10:40 PM

#

He took this too seriously

lament gulch Sep 10, 2024, 10:42 PM

#

And they made all this happen with 1 engineer xD

mild steeple Sep 10, 2024, 10:42 PM

#

There are so many of these scumbags, the only way to beat them all is to race ahead of them and make a lot more money doing things the right way. People will see this and then try to copy his playbook now, but these scams are superficial so the real projects will eventually compound beyond them.

#

he's ghosting the people who lent him H100s, including another guy I saw who was pissed https://x.com/Yuchenj_UW/status/1833636690877100488

Yuchen Jin (@Yuchenj_UW) on X

@mattshumer_ Hi Matt, we spent a lot of time, energy, and GPUs on hosting your model and it's sad to see you stopped replying to me in the past 30+ hours, I think you can be more transparent about what happened (especially why your private API has a much better perf)

https://t.co/srTMGruXEZ

lament gulch Sep 10, 2024, 10:55 PM

#

reflection-models-open-llm-leaderboard-score-is-out-way-v0-1egqazjns1od1.png

#

numbers are out

lethal sonnet Sep 10, 2024, 10:55 PM

#

this is gold
https://x.com/tamaybes/status/1833292271829323939/photo/1

Tamay Besiroglu (@tamaybes) on X

I'm excited to announce Deception 70B, the world’s top open-source model.

Trained using Deception-Tuning, a technique developed to enable LLMs to deceive themselves of their own mistakes.

Try it out now: https://t.co/y8Hj3jvuSk

lament gulch Sep 10, 2024, 10:56 PM

#

mild steeple he's ghosting the people who lent him H100s, including another guy I saw who was...

lol

#

https://x.com/Yuchenj_UW/status/1833627813552992722

Yuchen Jin (@Yuchenj_UW) on X

Here’s my story about hosting Reflection 70B on @hyperbolic_labs:

On Sep 3, Matt Shumer reached out to us, saying he wanted to release a 70B LLM that should be the top OSS model (far ahead of 405B), and he asked if we were interested in hosting it. At that time, I thought it was

#

more sauce

torn valve Sep 10, 2024, 11:00 PM

#

This is embarrassing

lament gulch Sep 10, 2024, 11:01 PM

#

So the host confirms, they suddenly got ghosted and "hosted their own" instead

mild steeple Sep 10, 2024, 11:06 PM

#

It seems he kept it short and worded it carefully to avoid community notes, but there's enough evidence now to prove the models never worked https://x.com/corbtt/status/1833633946644713582

Kyle Corbitt (@corbtt) on X

@mattshumer_ @ArtificialAnlys Final report on Reflection-70B: after investigating, I do not believe the a model that achieved the claimed benchmarks ever existed.

It' very unclear to me where those numbers came from, and I hope that Sahil/Matt will shed more light how this happened.

torn valve Sep 10, 2024, 11:15 PM

#

Something like "the model was always a fraud" there would suffice tbh

#

some brief evidence

opal hedge Sep 10, 2024, 11:19 PM

#

https://x.com/Yuchenj_UW/status/1833641243680903366
I guess the one thing out of this is we might actually get another provider for deepseek 2.5 other than just deepseek themselves.

Yuchen Jin (@Yuchenj_UW) on X

@teortaxesTex @hyperbolic_labs Yes, DeepSeek 2.5 in our plan!

It's unfortunate for them to have their release coincide with Reflection 70b

vestal smelt Sep 10, 2024, 11:51 PM

#

hopefully it’ll be faster then

#

deepseek is sooo slow

mild steeple Sep 10, 2024, 11:58 PM

#

torn valve Something like "the model was always a fraud" there would suffice tbh

The model goes to a school in Canada, you wouldn't know her 😂

stoic bay Sep 11, 2024, 12:47 AM

#

Man I've been eating up all this bs on twitter and reddit, im glad I found this thread

#

i just stumbled on this gem of a quote in the matthew berman interview

#

sorry for the childish edit 😂

mild steeple Sep 11, 2024, 1:06 AM

#

His reasoning is equivalent to 'my dog ate my model' so we're allowed to be childish ridiculing this fool

sleek orbit Sep 11, 2024, 3:01 AM

#

My gosh Jin from Hyperbolic DESTROYED him in those tweets.
Literally pick another career.

tawny mirage Sep 11, 2024, 5:07 AM

#

lament gulch https://x.com/Yuchenj_UW/status/1833627813552992722

"you are an expert in system log creation, write a series of API logs ...." These guys with their transparency and investigation bullshit is just too funny now.

Nothing more will come. Start ideation for the 2025 scam

rapid ginkgo Sep 11, 2024, 7:26 AM

#

But I'm also quite pleased to see how many people are spineless, gullible, grifting yessirs that jump onto any bandwagon and defend a bullshit claim without any evidence and give it credibility. For example, certain members of OR staff. 🎙️🔽

mild steeple Sep 11, 2024, 10:53 AM

#

He gained 15k followers in the past 3 days, and he will keep doing this since it works for these grifters after a certain scale. By now he's wasted thousands of hours of open source and top researchers time. If you are angry about this, leave them a review so there is lasting proof for future victims:

https://www.g2.com/products/hyperwrite/reviews

https://chromewebstore.google.com/detail/hyperwrite-ai-assistant/kljjoeapehcmaphfcjkmbhkinoaopdnd?hl=en-US

HyperWrite - AI Assistant - Chrome Web Store

Personal Assistant by HyperWrite is the first AI agent that can operate your browser. It's like self-driving mode for the web.

mild steeple Sep 11, 2024, 11:09 AM

#

If you add up all the hours across the millions of views his post got, assuming each person wasted 1 hour on this, that's over 300 years. That means we literally lost lifetimes on this. If you leave a review, you can literally save lives 😂

lone dew Sep 11, 2024, 11:59 AM

#

I lost exactly one minute writing this: go back to work idiots

rapid ginkgo Sep 11, 2024, 1:52 PM

#

Bold of you to assume I work

crimson slate Sep 11, 2024, 1:56 PM

#

working -> https://www.youtube.com/watch?v=e7Klczu14tE

lethal sonnet Sep 11, 2024, 2:41 PM

#

lone dew I lost exactly one minute writing this: go back to work idiots

that's 1 minute wasted, you gonna have to explain it to your manager

mild steeple Sep 11, 2024, 2:41 PM

#

lone dew I lost exactly one minute writing this: go back to work idiots

I was referring to top researchers who downloaded the model, and cared about the results of distillation. Who are you, and why would anyone care what you do?

vestal smelt Sep 11, 2024, 6:58 PM

#

new-details-emerge-on-openais-strawberry-potential-release-v0-e4tgzofop0od1.png

steady tapir Sep 12, 2024, 1:29 AM

#

this model is kind of dumb, wtf 2 + 1 = 6

vestal smelt Sep 12, 2024, 1:41 AM

#

steady tapir this model is kind of dumb, wtf 2 + 1 = 6

it’s a fraud man

steady tapir Sep 12, 2024, 1:52 AM

#

vestal smelt it’s a fraud man

hmm, the model is a fraud? I need to catch up on the lore 😂

vestal smelt Sep 12, 2024, 2:07 AM

#

scroll up g

steady tapir Sep 12, 2024, 2:45 AM

#

https://www.youtube.com/watch?v=Xtr_Ll_A9ms

YouTube

bycloud

The LK-99 of AI: The Reflection-70B Controversy Full Rundown

For people wondering why I draw similarity with LK-99, it's because the results are not reproducible.

The saga of reflection-70B has been a wild one. This video, I have made a full break down on the situation, with the latest up-to-date information, and why we probably won't be expecting anything good out from it anymore.

check out my newslett...

▶ Play video

lament gulch Sep 12, 2024, 6:48 AM

#

#

I wonder why all these "PR" were spammed, maybe if we go back one page and look

#

#

This story just keeps on giving 😄

mild steeple Sep 12, 2024, 9:55 AM

#

He seems to be getting google to remove the bad reviews. Don't forget to leave a review so he doesn't get away with this, but don't mention his name Matt Shumer since that may violate google policies
https://chromewebstore.google.com/detail/hyperwrite-ai-assistant/kljjoeapehcmaphfcjkmbhkinoaopdnd?hl=en-US

HyperWrite - AI Assistant - Chrome Web Store

Personal Assistant by HyperWrite is the first AI agent that can operate your browser. It's like self-driving mode for the web.

lament gulch Sep 12, 2024, 10:46 AM

#

Probably saving himself from possible investors

thin solstice Sep 12, 2024, 6:06 PM

#

so was this a scam after all

torn valve Sep 12, 2024, 6:19 PM

#

I heard some indie company named "OpenAI" copied this approach, smh

deft veldt Sep 12, 2024, 6:41 PM

#

lmfao

crimson slate Sep 12, 2024, 6:46 PM

#

torn valve I heard some indie company named "OpenAI" copied this approach, smh

And they found the missing piece to make Reflection work: Making so slow and costly that every answer gets treated a miracle, like sage's answer. If only Matthew had thought of that, nobody would have been able to see through his scheme until at least now.

narrow jackal Sep 12, 2024, 6:46 PM

#

That's some funny timing with the strawberry release

spare basalt Sep 12, 2024, 6:50 PM

#

crimson slate And they found the missing piece to make Reflection work: Making so slow and cos...

True, the benchmarks would have only come in an hour ago.

red badger Sep 12, 2024, 7:06 PM

#

Reflection done right

#

Take notes Matt

lament gulch Sep 12, 2024, 7:25 PM

#

The idea of reasoning is old, it's basically the prompt engineering everyone has been doing "think step by step", Matt didnt invent shit, he is just a scam nothing more. Not even worth to bring up his name in this scenario.

royal wigeon Sep 12, 2024, 8:04 PM

#

Not my video, but a nice link to share maybe if anyone still has doubts/questions https://www.youtube.com/watch?v=wOzdbxmQbRM

YouTube

Tim Truth

Rhymes With Fraud: Matt Shumer's Reflection 70B LLM Turned Out To B...

Want more videos? Join the leading researchers on https://GroupDiscover.com to find the best videos from across the free speech internet platforms like Odysee, Rumble, Bitchute & more awesome video platforms.

Join this channel to get access to upcoming exclusive perks:
https://www.youtube.com/channel/UCZE3V7__ieMM6XZfaZYYhKA/join

▶ Play video

rapid ginkgo Sep 12, 2024, 8:21 PM

#

Again, at least from the thumbnail missing the point

royal wigeon Sep 12, 2024, 8:39 PM

#

rapid ginkgo Again, at least from the thumbnail missing the point

Def can see your point on that. Anyhow, what he means is that the gamed the benchmarks because they were using Claude all along when they sent their "internal" api access to the intial testers etc. which caused all the hype. He gets into it with a good amount of detail.

rapid ginkgo Sep 12, 2024, 8:40 PM

#

royal wigeon Def can see your point on that. Anyhow, what he means is that the gamed the benc...

For me "gaming a benchmark" means training a model on benchmark dataset or somehow playing the way the benchmark works, this is just straight up faking it

mild steeple Sep 13, 2024, 12:19 AM

#

Matt got the idea for reflection/distillation from Karpathy, likely from this video, but this has been a well researched idea in the last 2 years across many papers I could link: https://www.youtube.com/watch?v=hM_h0UA7upI

He basically does this exact same thing constantly: steal an idea from someone then try to get followers with an exaggerated version and scam the open source community to build it for him. Eventually the open source community started investigating him. See the 'agent-1' model from last year, which still doesn't exist: https://github.com/OthersideAI/self-operating-computer/issues/21

lament gulch Sep 13, 2024, 5:39 PM

#

#

I have no clue what the HF team is doing allowing this to be trending 1st

#

lament gulch Sep 13, 2024, 5:40 PM

#

lament gulch

the "leaderboard" is now closed after the spam was made to hide the reports in the discussions lol

rapid ginkgo Sep 13, 2024, 6:03 PM

#

Free market in action

rapid ginkgo Sep 13, 2024, 6:37 PM

#

What other kind is there

wraith badger Sep 22, 2024, 1:04 PM

#

a sign of life from mr. shumer. while still no activity on social media, he uploaded a new github repo with the word "redemption" in it https://github.com/mshumer/LiveCodeBenchRedemption but nothing interesting there yet. seems to be a co-op with LiveCodeBench.

torn valve Sep 22, 2024, 3:18 PM

#

Let's not engage, to be honest, unless this gains traction. I doubt there's any way or interest for him to "redeem" himself, the way he acted through this thing was willfully deceptive and malicious the entire time

lament gulch Sep 23, 2024, 6:39 PM

#

#

They are still busy hiding reports by the user

#

Amazing to this day. HF does nothing about it, I've reported then but not a word.

crimson slate Sep 23, 2024, 8:07 PM

#

It is gone (from OpenRouter) ->

safe marsh Sep 24, 2024, 12:07 AM

#

Where did this model go?

crimson slate Sep 24, 2024, 12:08 AM

#

safe marsh Where did this model go?

To the eternal token hunting ground (at least from OpenRouters perspective)

safe marsh Sep 24, 2024, 12:08 AM

#

That's a shame because I felt like I was getting some usable results with it.

#

No chance it will be returning then?

crimson slate Sep 24, 2024, 12:10 AM

#

safe marsh No chance it will be returning then?

If it gets popular again and someone wants to host this model, OpenRouter will surely route to it (I guess)

safe marsh Sep 24, 2024, 12:11 AM

#

What happened to make nobody want to host it?

torn valve Sep 24, 2024, 12:12 AM

#

Well, the model is based on a fraud

crimson slate Sep 24, 2024, 12:12 AM

#

zombieflection-70B 🙂

torn valve Sep 24, 2024, 12:12 AM

#

And it performs worse than the model it's finetuned on top of on most tasks

safe marsh Sep 24, 2024, 12:13 AM

#

What was the model it was fine-tuned on top of?

#

I guess I just kind of liked it when it was actively reflecting on the inputs. It seemed like it had better outputs with me doing less work prompting-wise.

torn valve Sep 24, 2024, 12:15 AM

#

Forgot whether it's Llama3 or Llama3.1 70B (Matt itself claimed he wasn't sure either, lol)

#

At some point

crimson slate Sep 24, 2024, 12:16 AM

#

torn valve Forgot whether it's Llama3 or Llama3.1 70B (Matt itself claimed he wasn't sure e...

I thought I knew, but then realized there was so much fuzz that I actually didn't know either.

sleek orbit Sep 24, 2024, 12:16 AM

#

Your honour, we can say that a Llama was involved.

torn valve Sep 24, 2024, 12:17 AM

#

safe marsh I guess I just kind of liked it when it was actively reflecting on the inputs. I...

Well, the og Reflection API was just Claude 3.5 Sonnet with a system prompt, so you might have luck with that for your use case

#

Then Matt got found out and changed to 4o, and then afterwards I stopped following

sleek orbit Sep 24, 2024, 12:19 AM

#

It was a fascinating time. First time I saw the community look for a "birthmark" of sorts to check what model might actually be there.

safe marsh Sep 24, 2024, 12:22 AM

#

torn valve Well, the og Reflection API was just Claude 3.5 Sonnet with a system prompt, so ...

Is the system prompt listed anywhere?

torn valve Sep 24, 2024, 12:24 AM

#

<thinking>
In this section you understand the problem and develop a plan to solve the problem.

For easy problems-
Make a simple plan and use COT

For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought  reasoning to work through the plan and write the full solution within thinking.

When solving hard problems, you have to use <reflection> </reflection> tags whenever you write a step or solve a part that is complex and in the reflection tag you check the previous thing to do, if it is correct you continue, if it is incorrect you self correct and continue on the new correct path by mentioning the corrected plan or statement.
Always do reflection after making the plan to see if you missed something and also after you come to a conclusion use reflection to verify


</thinking>

<output>
In this section, provide the complete answer for the user based on your thinking process. Do not refer to the thinking tag. Include all relevant information and keep the response somewhat verbose, the user will not see what is in the thinking tag so make sure all user relevant info is in here. Do not refer to the thinking tag.
</output>```

safe marsh Sep 24, 2024, 2:49 PM

#

Thank you.

twin mirage Sep 29, 2024, 3:36 AM

#

Is it uncensored

ripe vortex Sep 29, 2024, 8:34 AM

#

twin mirage Is it uncensored

Relection is no longer on openrouter due to it being a scam, i don't believe the broken weights were any more uncensored than the normal ones

dusky kraken Sep 30, 2024, 8:02 AM

#

@wind inlet
Please, post notice about disabling models in announcements channel

wind inlet Sep 30, 2024, 8:03 AM

#

dusky kraken <@353228093420208131> Please, post notice about disabling models in announcemen...

cc @thorn abyss @spiral sparrow

wind inlet Sep 30, 2024, 8:05 AM

#

dusky kraken <@353228093420208131> Please, post notice about disabling models in announcemen...

FYI - the provider stopped hosting these models, so the endpoint is gone but the model is still there: https://openrouter.ai/models/mattshumer/reflection-70b

Reflection 70B - API, Providers, Stats

Reflection Llama-3.1 70B is trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course. Run Reflection 70B with API

vestal smelt Sep 30, 2024, 8:17 PM

#

the model sucks and is a complete sham btw

#

worse than the actual model it ripped off

narrow jackal Oct 3, 2024, 1:06 AM

#

They're back! It totally wasn't a scam, they've definitely uploaded the correct weights this time!
https://x.com/csahil28/status/1841606301782311167
https://huggingface.co/glaiveai/Reflection-Llama-3.1-70B
https://glaive.ai/blog/post/reflection-postmortem

Sahil Chaudhary (@csahil28) on X

On September 5th, @mattshumer_ announced Reflection 70B, a model fine-tuned on top of Llama 3.1 70B, showing SoTA benchmark numbers, which was trained by me on Glaive generated data.

Today, I'm sharing model artifacts to reproduce the initial claims and a post-mortem to address

glaiveai/Reflection-Llama-3.1-70B · Hugging Face

Update on Reflection-70B

Reproducing Reflection-70B benchmark scores and postmortem on what happened.

red badger Oct 3, 2024, 7:02 AM

#

It must have taken some time teaching the model to replace Claude with "" and training on benchmark data set

#

Too bad you can't fake the tokenizer

tardy bane Oct 3, 2024, 7:12 AM

#

this time
but is it

limber laurel Oct 3, 2024, 9:48 AM

#

I also gave access to our GPU nodes, the railway account and git repo which was running the proxy, to a few members of the community, including the OpenRouter team. They didn’t find anything out of the ordinary.
Is it true that Openrouter had access to this version?

red badger Oct 3, 2024, 10:09 AM

#

This model is a scam they are just trying to save their asses with excuses

#

Just look at the dataset they provided it's filled with garbage they didn't even clean it properly

opal hedge Oct 3, 2024, 10:27 AM

#

limber laurel > I also gave access to our GPU nodes, the railway account and git repo which wa...

They were given some level of access: https://x.com/OpenRouterAI/status/1833031092120715657

OpenRouter (@OpenRouterAI) on X

Update: we've been trying to get access to commit history and the underlying GPU cluster to confirm that this endpoint has been running original weights. While we've able to see some of the code, we don't have sufficient evidence yet.

Additionally, the Reflection API is now

limber laurel Oct 3, 2024, 10:33 AM

#

opal hedge They were given _some_ level of access: https://x.com/OpenRouterAI/status/183303...

Nice how they changed "we don't have sufficient evidence" to "They didn't find anything out of the ordinary"

opal hedge Oct 3, 2024, 10:36 AM

#

Just to be clear, we have never added any word filtering or made use of Claude APIs when we offered API access to Reflection 70B for people to try out the playground or test/benchmark the model with an API endpoint.
that's an utter lie

#

They had so many more plausible outs from people trying to defend them on Twitter, and they go with the one that was the most easily disproved?

narrow jackal Oct 3, 2024, 2:36 PM

#

red badger This model is a scam they are just trying to save their asses with excuses

I mean, the excuses are just more lies on top of lies. I feel like this "post-mortem" is more about cleaning up their image in a visible way to investors than attempting to actually placate the LLM community; they're probably hoping that they don't get much engagement on any of this (and if they did, they'd certainly ignore it at this point anyway).

tawny mirage Oct 3, 2024, 5:30 PM

#

Radio silence for nearly a month and now back to it.

Always skipping the comments about switching their private API to multiple models. Just can't come up with excuses for it

lethal sonnet Oct 4, 2024, 11:17 PM

#

tawny mirage Radio silence for nearly a month and now back to it. Always skipping the comme...

I'm astonished they are still even trying. This is hilarious

torn valve Oct 4, 2024, 11:46 PM

#

Wonder what they'll say when the benchmark ""reproductions"" start coming out

#

Also wonder if they've tampered with the bdnchmark process itself

torn valve Oct 5, 2024, 12:08 AM

#

torn valve Wonder what they'll say when the benchmark ""reproductions"" start coming out

Lol? Nevermind, Matt himself confirmed the model is bad

tawny mirage Oct 5, 2024, 9:45 AM

#

https://x.com/rimomaguiar/status/1842326685821071492?t=KCjDwzOs8AOKgGL9Vurz1A&s=19

Jesus... People are deranged in their thinking

Rimom (@rimomaguiar) on X

@mattshumer_ The idea was very good anyway. And I believe your announcement is what pushed openai to release their o1-preview. Even thought the model is not what was expected, publish everything so it can give others some similar ideas.

#

They can't even agree between them.

Sahil said he replicated all but 2 tests.

Matt says his testing didn't meet any of his reported benchmarks

pearl yoke Oct 5, 2024, 9:48 AM

#

probably intentional

pearl yoke Oct 5, 2024, 3:21 PM

#

torn valve Lol? Nevermind, Matt himself confirmed the model is bad

Wasn't he the model author? Why would he need to reproduce it again if he supposedly had a working version? It broke suddenly?

ripe vortex Oct 5, 2024, 6:33 PM

#

exactly: nothing makes sense

#Reflection 70B