#Nemotron 4


naive halo
#

Hi, NVIDIA just launched this family of models, can you please add it to Openrouter?
Thanks!

tacit sleet
#

It's... a bit dumb
I think their SFT dataset wasn't very good

#

It's on LMSYS btw

minor sorrel
tacit sleet
#

well, I didn't feel like it after a small test I conducted

#

to be fair I was mostly interested in looking into how their RM aligned it

worldly nacelle
#

I'm mostly interested because it's quite good with multilanguage stuff.
As far as translation is concerned it's the best open model I've encountered so far.

tacit sleet
#

9T training tokens apparently

#

tho purely for translation 340B is an extreme overkill
there are 10B encoder-decoder models perfectly capable of that, like MADLAD400

worldly nacelle
#

Capable sure, but capable and good are different things.
I'm pretty picky and have tested a lot of LLMs over the last year when it comes to translation.
And Nemotron is exceptionally good for an open model.

tacit sleet
#

and by "perfectly capable" I meant "perfectly decent"

#

Tho Nemotron is seemingly indeed capable of decent translation
Problem is it's seemingly meant for generating synthetic data and not for general-purpose use

worldly nacelle
# tacit sleet have you tested encoder-decoder models or only decoder-only ones? Decoder-only m...

I have tested some encoder-decoder models, though the majority have indeed been decoder-only models. I've generally found that translation-focused encoder-decoder models like MADLAD400 struggle with translating more than one sentence at a time, which is a big limiting factor when translating languages like Japanese, where translating multiple sentences at once is very beneficial due to how context-sensitive the language is. It's also not outstanding when translating single sentences. It's certainly decent, I agree. But decent is not really what I'm looking for.

It's also quite useful to be able to supply additional details for the translation, like the setting of the text, background info, intended audience, etc. And instruct trained decoder models are more capable of integrating that into the translation.

I certainly agree that 340B is much larger than a good translation model should be. I'd love a much smaller one that works just as well, but I haven't found one yet.

tacit sleet
#

One problem with Nemotron I see already is that it's not exactly great at writing on its own
I can't see myself using this to gen data seeds
Maybe I can do genstruct tho
Tho 4096 token long context window doesn't really allow for larger documents to gen from

cedar vale
#

From what I’ve seen of it… This thing demands further testing. The only non-coding benchmark it genuinely scored sub-par on was the MMLU. Which is weird, because for things like HellaSwag, it BLEW past most other models, and beat even stuff like Base GPT-4.

#

I’m genuinely wondering if something went wrong there. Like it might’ve been a prompting issue.

#

Sure, it’s only on par with other, smaller models for logic puzzles, but what I’m noticing here is that it DOES show promise in what is obviously a product of its success with Synthetic Data generation as the goal: Writing.

#

Near as I can tell from initial tests I’ve seen of its narrative capacity, not ONLY is it successful at just plain old convincingly real-sounding text, but it also seems to, DELIBERATELY, not have a latent positivity bias in it.

#

Saw someone write a hypothetical text conversation between a woman and her friend about a date gone wrong, and it genuinely skeeved me out for how real it sounded.

#

Granted, that’s not NECESSARILY a positive. But what it tells me is that Nvidia, wanting to produce data that you could train a model to AVOID as much as ADHERE to, didn’t give it any marketability-focused training that’d make it suck at writing anything other than latently optimistic storytelling. Cyberpunk, thrillers, etc. are all on the table here.

#

Yes, we are at present limited to 4K here. But hopefully RoPE extension is a potential avenue to fix that.
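One commonly discussed form of RoPE extension is linear position interpolation: positions in a longer window are squeezed back into the range the model was trained on. The sketch below only illustrates that arithmetic. It assumes the model uses rotary position embeddings (as the message above implies); the function name `rope_angles`, the head dimension, the base of 10000 and the 8192 target length are illustrative assumptions, and nothing here shows that the approach would work on Nemotron without further fine-tuning.

```python
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles RoPE would apply at one position.

    scale < 1 squeezes positions together (linear position interpolation).
    All parameter values here are illustrative assumptions.
    """
    pos = position * scale
    return [pos / base ** (2 * i / head_dim) for i in range(head_dim // 2)]


TRAINED_CTX = 4096   # context length mentioned in the thread
TARGET_CTX = 8192    # hypothetical extended length
scale = TRAINED_CTX / TARGET_CTX

# With interpolation, position 8000 in the extended window receives the
# same angles that position 4000 received in the original window.
extended = rope_angles(8000, head_dim=64, scale=scale)
original = rope_angles(4000, head_dim=64)
print(extended == original)
```

The trade-off is that neighbouring tokens end up closer together in angle space than anything seen in training, which is why this kind of scaling is usually paired with some additional fine-tuning.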

#

At the end of the day, my crackpot theory is that the fact that it’s designed to create convincing text for synthetic data makes it not-amazing at things we’re USED TO LLMs being good at, but potentially very good at things we’ve grown used to LLMs being kinda bad at.

#

Exciting stuff, from where I’m sitting!

#

I’m sorry to hear your efforts in getting it to write good have been unsuccessful, Aetherwiing. My personal tests so far have been quite promising. Though, we’ll have to wait for something like an API endpoint for rubber to really hit the road with testing.

tacit sleet
# cedar vale At the end of the day, my crackpot theory is that the fact that it’s designed to...

hm, would be interesting to test it on a real synth data gen framework, like Distilabel
LMSYS is fairly limited in what I can test in this regard.
I see some GPT-ish patterns in gens, but let's be honest - currently pretty much 100% of LLMs contain them in some amount.
I also wonder if it requires lower temp or more sampling than it has on LMSYS, bc it lost coherence to some degree on some longer writing prompts I've tested
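For readers unfamiliar with why a lower temperature might help coherence, here is a minimal sketch of temperature-scaled sampling probabilities. The logits are made up for illustration and `softmax_with_temperature` is a hypothetical helper, not part of any particular inference stack.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens

for temperature in (1.0, 0.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates probability on the top-scoring token, so unlikely tokens are sampled less often. That tends to reduce drift on long generations, at the cost of less varied output.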

tacit sleet
cedar vale
#

Hmmm… Yeah, I’d love to see the ability to adjust some dials here. And yeah, I think -isms are a bit hard to suss out these days. Like, you could argue Claude Opus has “-isms”, but that’s moreso beholden to Claude being a singular entity that, like anybody writing anything, would fall into common patterns in writing stories, regardless of genre.

#

I’m honestly really intrigued by the “we don’t recommend a system prompt” line on HF.

tacit sleet
cedar vale
#

That… Does surprise me.

#

Makes me wonder if classifiers are, ultimately, looking for mistakes.

tacit sleet
cedar vale
#

Right, and the problem with that is that, barring willfully messing with structure, truly high quality media will inevitably sound similar because “good” can be kinda unitary in practice.

#

Anyways, in the weeds.

tacit sleet
cedar vale
#

Okay, well now I just wonder if classifiers are garbage.

#

I know Euryale is the holy grail in common parlance, but like, come on…

tacit sleet
cedar vale
#

It sounds like something’s gone wrong here.

#

Didn’t the Bible get caught by classifiers recently?

tacit sleet
tacit sleet
cedar vale
#

Oh! Oh YOU made that classifier?

tacit sleet
#

I think it's just most classifiers are trained like shit

tacit sleet
cedar vale
#

Okay, interesting.

cedar vale
#

In any case, yeah. Curious how Nemotron fares against it.

#

Though, I will say that, if nothing else, Claude Opus is not a bad writer. And my pushback is probably just me falling into the trap of thinking “Seems AI generated” = Bad objectively.

tacit sleet
tacit sleet
cedar vale
#

Fair…

#

At the end of the day, I just miss the old Claude before Anthropic reined it in.

#

Was hoping Nemotron could give a taste of what we might get out of Llama 405B. But it seems I may still have to wait?

tacit sleet
cedar vale
#

Mhm!

#

Making your own model just got a LOT easier.

tacit sleet
#

which is a good thing, PPO has a way higher entry bar than any other pref optimization

#

mostly because of its indirect nature

#

Finally gives me a fair chance to compare PPO and KTO

dusk delta
#

+1 for this model; it’s kinda like a preview of Llama 3 405b, but great for generating and responding with higher quality data without as many synthetic artifacts. Also helps that it doesn’t forget its knowledge, unlike Claude or Llama

tacit sleet
#

Wonder why NVIDIA still haven't added this to their API yet

dusk delta
#

16 A100s for bf16, maybe this could be hosted on a single 8xA100 node with int8/4 precision
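The arithmetic behind that estimate can be checked with a quick back-of-the-envelope sketch. It counts weight storage only and assumes 80 GB A100s and 8 GPUs per node; KV cache, activations and framework overhead are ignored, so real requirements would be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 340e9      # parameter count mentioned in the thread
GPU_MEMORY_GB = 80    # assumption: 80 GB A100s
GPUS_PER_NODE = 8     # assumption: one 8xA100 node

node_capacity_gb = GPU_MEMORY_GB * GPUS_PER_NODE

for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    needed = weight_memory_gb(N_PARAMS, bits)
    fits = needed < node_capacity_gb
    print(f"{name}: {needed:.0f} GB of weights, fits on one node: {fits}")
```

Under these assumptions the bf16 weights alone exceed a single node's 640 GB, which is consistent with needing two nodes, while int8 and int4 leave headroom on one node.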

tacit sleet
cedar vale
#

I noticed that Failspy went and converted the raw weights into a Safetensors file for people to try in order to get inference running.

#

Hopefully people smart enough to do so can figure it out. I truly do reserve judgment on this thing until I can get it running through a private frontend.

timid cradle
dusk delta
flat linden
#
<extra_id_0>System

<extra_id_1>User
{prompt 1}
<extra_id_1>Assistant
{response 1}
<extra_id_1>User
{prompt 2}
<extra_id_1>Assistant
{response 2}
...
<extra_id_1>User
{prompt N}
<extra_id_1>Assistant
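
The template above can be rendered programmatically. This is a minimal sketch, not an official implementation: `build_nemotron_prompt` and its `system` parameter are hypothetical names, and the trailing newline after the final `Assistant` tag is an assumption.

```python
def build_nemotron_prompt(turns, system=""):
    """Render a conversation into the template shown above.

    turns: list of (user_prompt, assistant_response) pairs. Use None as the
    last response so the prompt ends at the open Assistant tag.
    system: hypothetical parameter; left empty it reproduces the blank
    line that follows the System tag in the template.
    """
    parts = [f"<extra_id_0>System\n{system}\n"]
    for user, assistant in turns:
        parts.append(f"<extra_id_1>User\n{user}\n<extra_id_1>Assistant\n")
        if assistant is not None:
            parts.append(f"{assistant}\n")
    return "".join(parts)


prompt = build_nemotron_prompt([
    ("Translate 'good morning' into Japanese.", "おはようございます"),
    ("Now make it casual.", None),
])
print(prompt)
```

The same layout should carry over to front ends that let you set per-role prefixes: the User and Assistant tags become the prefixes, with a newline as the separator.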
lusty plover
#

Even a LLM would create a more sane naming scheme 😉

dire edge
#

this is bad, and even worse, 4k ctx

flat linden
#

rip

dire edge
#

adding this is just a waste of a slot

flat linden
#

I guess DeepInfra will pull it down soon enough just like #1246016908571054182 ...

cedar vale
#

Thank you!!!

#

Even if it’s for a short while, I’ve been itching to try this. Many thanks for the chance!

#

Though uh… Anybody got a SillyTavern conversion of the prompting format? That’s my main front end for testing these things, and I imagine many others.

cedar vale
#

Hmm! Seems my hunch was right. While it’s no Euryale-2.1 in terms of evocative prose, it demonstrates a robust and thorough understanding of the writing task, and follows instructions very well!

#

Just a shame it can only manage a short story at the moment before conking out. V_V

#

Hopefully they do some context enhancing stuff to it like with Llama 3 a little while prior!