#Challenge LLMs & help build an authoritative LLM leaderboard!

plain spruce
hollow crest
#

Great to see Kaggle partner with LMSYS

lean tusk
hollow crest
#

Yeah, I've played with the Chatbot Arena before and have used their Vicuna models

lean tusk
#

Very nice – if you don't mind sharing, what were your thoughts on the arena?

coarse orchid
#

Nice! I am curious, would you allow your own fine-tuned models onto the arena or side-by-side, or is it only for the foundation models and open-source community competitors?

short palm
short palm
lean tusk
# short palm Hi Kinjal, what Top P parameter means?

Top P can definitely be confusing. It's a parameter you can set for large language model inference that helps you balance diversity of word choice against sticking to high-likelihood words. If you set a higher P, you will tend to get more diverse output from the LLM.

The way it works is by sampling only from the smallest set of tokens whose cumulative probability mass reaches or exceeds P, taking the most likely tokens first. Consider tokens with probabilities [0.4, 0.3, 0.2, 0.1]. If you set P to anything <= 0.4, it will only sample the token with probability 0.4. If you set P = 0.8, it will sample from the tokens with probabilities 0.4, 0.3, and 0.2, because 0.4 + 0.3 = 0.7 falls short of 0.8 while 0.4 + 0.3 + 0.2 = 0.9 exceeds it.
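
If it helps to see the selection step as code, here is a minimal Python sketch of that idea. The function names `top_p_candidates` and `sample_top_p` are made up for this illustration and don't come from any particular library; real inference stacks typically work on logits over the full vocabulary, so treat this as a toy version under those assumptions.

```python
import random


def top_p_candidates(probs, p):
    """Return indices of the smallest set of tokens whose cumulative
    probability reaches p, taking the most likely tokens first.

    Hypothetical helper for illustration only.
    """
    # Order token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Small tolerance so floating-point rounding does not add an extra token.
        if cumulative >= p - 1e-12:
            break
    return kept


def sample_top_p(probs, p, rng=random):
    """Sample one token index from the top-p candidate set."""
    kept = top_p_candidates(probs, p)
    # random.choices treats weights as relative, so no renormalization is needed.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


probs = [0.4, 0.3, 0.2, 0.1]
print(top_p_candidates(probs, 0.4))  # only the most likely token is kept
print(top_p_candidates(probs, 0.8))  # the three most likely tokens are kept
print(sample_top_p(probs, 0.8))      # one index drawn from those three
```

With the example probabilities, P = 0.4 keeps only the first token and P = 0.8 keeps the first three, matching the walkthrough above.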

Here's a video explanation if it helps: https://www.youtube.com/watch?v=nfqZwC_h388

Note: Temperature is another parameter you can use to adjust sampling. It's often recommended to adjust either Temperature or Top P, but not both.

lean tusk
coarse orchid
hollow crest
lean tusk
# coarse orchid At some point, yes. I would love for people to rate how understandable a transla...

Got it. That makes sense – the conceptual framework of the arena, a (double) blind rating system, would be valuable. But because your use case is somewhat specific, perhaps it would benefit from its own arena rather than a generic, all-encompassing one. That's something for us to consider in the future – letting the community easily spin up their own arena for a particular task / benchmark. Maybe as a Community Competition?