Nemotron 4 | OpenRouter | Page 1

naive halo Jun 14, 2024, 6:48 PM

#

Hi, NVIDIA just launched this family of models. Can you please add it to OpenRouter?
Thanks!

minor sorrel Jun 14, 2024, 8:22 PM

#

https://huggingface.co/collections/nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911

Nemotron 4 340B - a nvidia Collection

#

Waiting so much!

tacit sleet Jun 14, 2024, 8:51 PM

#

It's... a bit dumb
I think their SFT dataset wasn't very good

#

It's on LMSYS btw

minor sorrel Jun 14, 2024, 10:09 PM

#

tacit sleet It's... a bit dumb I think their SFT dataset wasn't very good

they say it's a bit better than GPT-4 Turbo in some areas

tacit sleet Jun 14, 2024, 10:12 PM

#

well, it didn't feel that way in a small test I ran

#

to be fair I was mostly interested in looking into how their RM aligned it

worldly nacelle Jun 15, 2024, 2:15 PM

#

I'm mostly interested because it's quite good with multilanguage stuff.
As far as translation is concerned it's the best open model I've encountered so far.

tacit sleet Jun 15, 2024, 2:24 PM

#

9T training tokens apparently

#

tho purely for translation, 340B is extreme overkill
there are 10B encoder-decoder models perfectly capable of that, like MADLAD400

worldly nacelle Jun 15, 2024, 2:39 PM

#

Capable sure, but capable and good are different things.
I'm pretty picky and have tested a lot of LLMs over the last year when it comes to translation.
And Nemotron is exceptionally good for an open model.

tacit sleet Jun 15, 2024, 3:11 PM

#

worldly nacelle Capable sure, but capable and good are different things. I'm pretty picky and ha...

have you tested encoder-decoder models or only decoder-only ones?
Decoder-only models are inherently worse at translation

#

and by "perfectly capable" I meant "perfectly decent"

#

Tho Nemotron does seem capable of decent translation
Problem is it seems meant for generating synthetic data rather than for general-purpose use

worldly nacelle Jun 15, 2024, 4:53 PM

#

tacit sleet have you tested encoder-decoder models or only decoder-only ones? Decoder-only m...

I have tested some encoder-decoder models, though the majority have indeed been decoder-only models. I've generally found that translation-focused encoder-decoder models like MADLAD400 struggle with translating more than one sentence at a time, which is a big limiting factor for languages like Japanese, where translating multiple sentences at once is very beneficial because the language is so context-sensitive. MADLAD400 also isn't outstanding at translating single sentences. It's certainly decent, I agree, but decent is not really what I'm looking for.

It's also quite useful to be able to supply additional details for the translation, like the setting of the text, background info, intended audience, etc. Instruction-tuned decoder models are better at integrating that into the translation.

I certainly agree that 340B is much larger than a good translation model should be. I'd love a much smaller one that works just as well, but I haven't found one yet.

tacit sleet Jun 15, 2024, 4:57 PM

#

worldly nacelle I have tested some encoder-decoder models, though the majority have indeed been ...

It would be quite interesting to see if NVIDIA included multilingual examples in the Llama 3-based SteerLM models, as those are at a much more reasonable scale of 70 billion parameters.
About MADLAD: I found it at least good enough to translate training data into Russian and Belarusian, which it likely has an easier time with.

#

One problem with Nemotron I see already is that it's not exactly great at writing on its own
I can't see myself using this to gen data seeds
Maybe I can do genstruct tho
Tho the 4096-token context window doesn't really allow for larger documents to gen from

cedar vale Jun 15, 2024, 6:30 PM

#

From what I’ve seen of it… This thing demands further testing. The only non-coding benchmark it genuinely scored sub-par on was the MMLU. Which is weird, because for things like HellaSwag, it BLEW past most other models, and beat even stuff like Base GPT-4.

#

I’m genuinely wondering if something went wrong there. Like it might’ve been a prompting issue.

#

Sure, it’s only on par with other, smaller models for logic puzzles, but what I’m noticing is that it DOES show promise in the area that obviously follows from synthetic data generation being its goal: Writing.

#

Near as I can tell from the initial tests I’ve seen of its narrative capacity, not ONLY is it successful at plain old convincingly real-sounding text, but it also seems to DELIBERATELY lack a latent positivity bias.

#

Saw someone write a hypothetical text conversation between a woman and her friend about a date gone wrong, and how real it sounded genuinely skeeved me out.

#

Granted, that’s not NECESSARILY a positive. But what it tells me is that Nvidia, wanting to produce data you could train a model to AVOID as much as ADHERE to, didn’t give it any marketability-focused training that’d make it suck at writing anything other than latently optimistic storytelling. Cyberpunk, thrillers, etc. are all on the table here.

#

Yes, we are limited to 4K for now. But hopefully RoPE extension can fix that.

#

At the end of the day, my crackpot theory is that being designed to create convincing text for synthetic data makes it not-amazing at things we’re USED TO LLMs being good at, but potentially very good at things we’ve grown used to LLMs being kinda bad at.

#

Exciting stuff, from where I’m sitting!

#

I’m sorry to hear your efforts to get it to write well have been unsuccessful, Aetherwiing. My personal tests so far have been quite promising, though we’ll have to wait for something like an API endpoint for the rubber to really hit the road with testing.

tacit sleet Jun 15, 2024, 6:39 PM

#

cedar vale At the end of the day, my crackpot theory is that the fact that it’s designed to...

hm, would be interesting to test it on a real synth data gen framework, like Distilabel
LMSYS is fairly limited in what I can test in this regard.
I see some GPT-ish patterns in gens, but let's be honest - currently pretty much 100% of LLMs contain them in some amount.
I also wonder if it requires lower temp or more sampling than it has on LMSYS, bc it lost coherence to some degree on some longer writing prompts I've tested

tacit sleet Jun 15, 2024, 6:41 PM

#

cedar vale Granted, that’s not NECESSARILY a positive. But what it tells me is that Nvidia,...

That could be useful for KTO or Reward model data

cedar vale Jun 15, 2024, 6:41 PM

#

Hmmm… Yeah, I’d love to see the ability to adjust some dials here. And yeah, I think -isms are a bit hard to suss out these days. Like, you could argue Claude Opus has “-isms”, but that’s more a consequence of Claude being a singular entity that, like anybody writing anything, falls into common patterns when writing stories, regardless of genre.

#

I’m honestly really intrigued by the “we don’t recommend a system prompt” line on HF.

tacit sleet Jun 15, 2024, 6:43 PM

#

cedar vale Hmmm… Yeah, I’d love to see the ability to adjust some dials here. And yeah, I t...

Interesting thing about Claude Opus is that it gets picked as AI by a classifier trained to differentiate between GPT-3.5 Turbo gens (as AI) and human writing

cedar vale Jun 15, 2024, 6:43 PM

#

That… Does surprise me.

#

Makes me wonder if classifiers are, ultimately, looking for mistakes.

tacit sleet Jun 15, 2024, 6:43 PM

#

cedar vale That… Does surprise me.

you can try for yourself (that's my model, yeah)
https://huggingface.co/nothingiisreal/open-gpt-3.5-detector

nothingiisreal/open-gpt-3.5-detector · Hugging Face

tacit sleet Jun 15, 2024, 6:43 PM

#

cedar vale Makes me wonder if classifiers are, ultimately, looking for *mistakes*.

they are looking for patterns

cedar vale Jun 15, 2024, 6:45 PM

#

Right, and the problem with that is that, barring willfully messing with structure, truly high quality media will inevitably sound similar because “good” can be kinda unitary in practice.

#

Anyways, in the weeds.

tacit sleet Jun 15, 2024, 6:46 PM

#

cedar vale Right, and the problem with that is that, barring willfully messing with structu...

hm, there are LLMs which are mostly trained on synth, and barely pick up any synthetic patterns
one of them is #1250867165737914569

cedar vale Jun 15, 2024, 6:46 PM

#

Okay, well now I just wonder if classifiers are garbage.

#

I know Euryale is the holy grail in common parlance, but like, come on…

tacit sleet Jun 15, 2024, 6:47 PM

#

cedar vale Okay, well bow I just wonder if classifiers are garbage.

I don't think so
This thing didn't flag any human data from the eval set or further runs as AI

cedar vale Jun 15, 2024, 6:47 PM

#

It sounds like something’s gone wrong here.

#

Didn’t the Bible get caught by classifiers recently?

tacit sleet Jun 15, 2024, 6:47 PM

#

cedar vale It sounds like somethings gone wrong here.

Euryale still gets flagged, just less

tacit sleet Jun 15, 2024, 6:47 PM

#

cedar vale Didn’t the Bible get caught by classifiers recently?

I tested mine on Bible, detected as human writing

cedar vale Jun 15, 2024, 6:48 PM

#

Oh! Oh YOU made that classifier?

tacit sleet Jun 15, 2024, 6:48 PM

#

I think it's just most classifiers are trained like shit

tacit sleet Jun 15, 2024, 6:48 PM

#

cedar vale Oh! Oh YOU made that classifier?

Yes

cedar vale Jun 15, 2024, 6:48 PM

#

Okay, interesting.

tacit sleet Jun 15, 2024, 6:48 PM

#

tacit sleet you can try for yourself (that's my model, yeah) https://huggingface.co/nothingi...

this one is mine

cedar vale Jun 15, 2024, 6:48 PM

#

In any case, yeah. Curious how Nemotron fares against it.

#

Though, I will say that, if nothing else, Claude Opus is not a bad writer. And my pushback is probably just me falling into the trap of thinking “Seems AI generated” = Bad objectively.

tacit sleet Jun 15, 2024, 6:50 PM

#

cedar vale In any case, yeah. Curious how Nemotron fares against it.

Random example of creative writing by Nemotron
Class label 1, AI generated

tacit sleet Jun 15, 2024, 6:52 PM

#

cedar vale Though, I will say that, if nothing else, Claude Opus is not a bad writer. And m...

Synth data is not necessarily bad, it's just rather repetitive and sometimes non-human sounding, so to speak.
It can be a problem for creative writing and other creative things, but on the other hand, barely anyone will complain about good-quality synth coding data. So ymmv

cedar vale Jun 15, 2024, 6:52 PM

#

Fair…

#

At the end of the day, I just miss the old Claude before Anthropic reined it in.

#

Was hoping Nemotron could give a taste of what we might get out of Llama 405B. But it seems I may still have to wait?

tacit sleet Jun 15, 2024, 6:53 PM

#

cedar vale At the end of the day, I just miss the old Claude before Anthropic reigned it in...

Same, I miss early Claude-1

tacit sleet Jun 15, 2024, 6:54 PM

#

cedar vale Was hoping Nemotron could give a taste of what we might get out of Llama 405B. B...

NVIDIA already gave us something rather valuable with Nemotron - PPO data and code, and two pretrained Reward Models
All of those things are really damn rare

cedar vale Jun 15, 2024, 6:55 PM

#

Mhm!

#

Making your own model just got a LOT easier.

tacit sleet Jun 15, 2024, 6:55 PM

#

cedar vale Making your own model just got a LOT easier.

PPO finetuning at least

#

which is a good thing, PPO has a way higher entry bar than any other pref optimization

#

mostly because of its indirect nature

#

Finally gives me a fair chance to compare PPO and KTO
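To make the "indirect" point above concrete: in RLHF-style PPO the policy never trains on preference labels directly. A reward model scores each sampled response, that score is turned into an advantage, and the policy is updated through the clipped surrogate objective. KTO and DPO, by contrast, optimize a loss directly on labeled data. Below is a minimal, framework-free sketch of just that last PPO step, assuming per-sequence log-probabilities and a plain "RM score minus baseline" advantage. `rm_advantage` and `ppo_clip_objective` are hypothetical helper names, and real setups typically add a learned value function and a KL penalty against a reference model, both omitted here.

```python
import math


def rm_advantage(rm_score: float, baseline: float) -> float:
    # Hypothetical, simplest possible advantage: reward-model score minus a baseline.
    # Real PPO pipelines usually estimate this with a learned value function instead.
    return rm_score - baseline


def ppo_clip_objective(logp_new: float, logp_old: float, advantage: float,
                       clip_eps: float = 0.2) -> float:
    # Probability ratio between the updated policy and the policy that sampled the response.
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    # Clip the ratio to [1 - eps, 1 + eps] so one update cannot move the policy too far.
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # PPO maximizes the pessimistic (smaller) of the two terms.
    return min(unclipped, clipped)


# Example: the RM liked this response more than the baseline, and the new policy
# already made it 1.5x more likely, so the gain is capped at ratio 1.2.
adv = rm_advantage(rm_score=1.0, baseline=0.0)
print(ppo_clip_objective(math.log(1.5), 0.0, adv))
```

The extra moving parts (sampling, a separate reward model, advantage estimation) are exactly what raises the entry bar compared with losses computed straight from a preference dataset.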

dusk delta Jun 17, 2024, 2:33 PM

#

+1 for this model; it’s kinda like a preview of Llama 3 405B, but great for generating and responding with higher-quality data without as many synthetic artifacts. Also helps that it doesn’t forget its knowledge, unlike Claude or Llama

tacit sleet Jun 17, 2024, 2:47 PM

#

dusk delta +1 for this model; it’s kinda like a preview of Llama 3 405b, but great for gene...

yeah, it would be interesting to give this model a spin on my Airo 1K regenning test

#

Wonder why NVIDIA still haven't added this to their API yet

dusk delta Jun 17, 2024, 2:53 PM

#

16 A100s for bf16; maybe this could be hosted on a single 8xA100 node with int8/int4 precision
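A rough weights-only check of that estimate, assuming the nominal 340B parameter count, the 80 GB A100 variant, and decimal gigabytes. KV cache, activations, and framework overhead are ignored, so these are lower bounds, not deployment numbers:

```python
import math

PARAMS = 340e9  # nominal parameter count taken from the model name
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}
GPU_MEM_GB = 80  # assumption: 80 GB A100 variant


def weight_gb(precision: str, params: float = PARAMS) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(precision: str, gpu_mem_gb: float = GPU_MEM_GB) -> int:
    """Lower bound on GPUs needed just to hold the weights (no KV cache, no overhead)."""
    return math.ceil(weight_gb(precision) / gpu_mem_gb)


for p in ("bf16", "int8", "int4"):
    print(f"{p}: {weight_gb(p):.0f} GB of weights -> at least {min_gpus(p)} x {GPU_MEM_GB} GB GPUs")
```

By this count bf16 weights alone (about 680 GB) overflow a single 8x80 GB node (640 GB), while int8 (about 340 GB) fits with room left for the KV cache. Landing on 16 GPUs rather than the bare minimum of 9 for bf16 is plausibly down to tensor-parallel sizes usually being powers of two, plus runtime headroom.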

#

is the OpenRouter team interested in hosting this once support is added to a couple backends?
https://huggingface.co/failspy/Nemotron-4-340B-Instruct-SafeTensors

failspy/Nemotron-4-340B-Instruct-SafeTensors · Hugging Face

tacit sleet Jun 17, 2024, 2:58 PM

#

dusk delta is the OpenRouter team interested in hosting this once support is added to a cou...

I wish someone converted SteerLM 70B from NeMo to SafeTensors too
That one should be way easier to get running, as it is based on Llama 3 70B

cedar vale Jun 18, 2024, 7:35 PM

#

I noticed that Failspy went and converted the raw weights into a Safetensors file for people to try in order to get inference running.

#

Hopefully people smart enough to do so can figure it out. I truly do reserve judgment on this thing until I can get it running through a private frontend.

timid cradle Jun 21, 2024, 9:37 PM

#

https://deepinfra.com/nvidia/Nemotron-4-340B-Instruct

nvidia/Nemotron-4-340B-Instruct - Demo - DeepInfra

Nemotron-4-340B-Instruct is a chat model intended for use for the English language, designed for Synthetic Data Generation. Try out API on the Web

dusk delta Jun 22, 2024, 1:25 AM

#

timid cradle https://deepinfra.com/nvidia/Nemotron-4-340B-Instruct

@flat linden

flat linden Jun 22, 2024, 1:27 AM

#

timid cradle https://deepinfra.com/nvidia/Nemotron-4-340B-Instruct

Adding

flat linden Jun 22, 2024, 2:45 AM

#

timid cradle https://deepinfra.com/nvidia/Nemotron-4-340B-Instruct

wtf is with the template for this model

#

<extra_id_0>System

<extra_id_1>User
{prompt 1}
<extra_id_1>Assistant
{response 1}
<extra_id_1>User
{prompt 2}
<extra_id_1>Assistant
{response 2}
...
<extra_id_1>User
{prompt N}
<extra_id_1>Assistant
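For frontends that take a raw prompt string, here is a minimal sketch of rendering a conversation into the template above. `build_nemotron_prompt` is a hypothetical helper. It assumes each header is followed by a newline and that an empty system prompt leaves a blank line after the System header, as the layout above suggests:

```python
from typing import List, Tuple


def build_nemotron_prompt(turns: List[Tuple[str, str]], next_user: str,
                          system: str = "") -> str:
    """Hypothetical helper: render a chat history into the template shown above.

    turns: completed (user, assistant) exchanges, oldest first.
    next_user: the new user message the model should answer.
    """
    # System block; with an empty system prompt this leaves a blank line, as in the template.
    parts = [f"<extra_id_0>System\n{system}\n"]
    for user, assistant in turns:
        parts.append(f"<extra_id_1>User\n{user}\n<extra_id_1>Assistant\n{assistant}\n")
    # End on an open Assistant header so the model writes the next reply.
    parts.append(f"<extra_id_1>User\n{next_user}\n<extra_id_1>Assistant\n")
    return "".join(parts)


print(build_nemotron_prompt([("Hi!", "Hello! How can I help?")], "Write a haiku."))
```

A frontend would presumably also stop generation when the model emits the next `<extra_id_1>` header. That is an assumption here, not something taken from the model card.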

lusty plover Jun 22, 2024, 8:25 AM

#

Even an LLM would come up with a saner naming scheme 😉

whole crown Jun 24, 2024, 4:59 AM

#

https://tenor.com/view/el-risitas-juan-joya-borja-ratones-coloraos-laugh-meme-laughing-man-gif-24899295


#

4k context

dire edge Jun 24, 2024, 1:45 PM

#

this is bad, and even worse with 4k ctx

flat linden Jun 24, 2024, 1:45 PM

#

rip

dire edge Jun 24, 2024, 1:46 PM

#

adding this is just a waste of a slot

flat linden Jun 24, 2024, 1:47 PM

#

I guess DeepInfra will pull it down soon enough, just like #1246016908571054182 ...

#

But while it's still up: https://openrouter.ai/models/nvidia/nemotron-4-340b-instruct

NVIDIA Nemotron-4 340B Instruct by nvidia

Nemotron-4-340B-Instruct is an English-language chat model optimized for synthetic data generation. This large language model (LLM) is a fine-tuned version of Nemotron-4-340B-Base, designed for single and multi-turn chat use-cases with a 4,096 token context length.

The base model was pre-trained on 9 trillion tokens from diverse English texts, ...

cedar vale Jun 24, 2024, 2:28 PM

#

Thank you!!!

#

Even if it’s for a short while, I’ve been itching to try this. Many thanks for the chance!

#

Though uh… Anybody got a SillyTavern conversion of the prompt format? That’s my main frontend for testing these things, and I imagine it is for many others too.

cedar vale Jun 24, 2024, 5:28 PM

#

Hmm! Seems my hunch was right. While it’s no Euryale-2.1 in terms of evocative prose, it demonstrates a robust and thorough understanding of the writing task, and follows instructions very well!

#

Just a shame it can only manage a short story at the moment before conking out. V_V

#

Hopefully they do some context-extension work on it, like with Llama 3 a little while ago!
