I think I’ve found a much better way to evaluate an LLM’s true intelligence | Mistral AI

crisp hawk Sep 8, 2024, 1:16 AM

@gentle harbor We don't support self-promos.
OP is promoting their Twitter post:
"I think I’ve found a much better way to evaluate an AI’s intelligence. Please read my tweet and tell me whether my testing method is valid.
Please seriously consider this.
Greetings from the Netherlands"
https://gist.github.com/Evertt/917a6fa2b4a4935f370e59f11d3789f2

"Namely, by creating a fictional narrative in which the AI has to accomplish some kind of goal." Not new; SillyTavern has had goals for ages.

And that's just prompting.

That reads like a lot of GPT.

@gentle harbor take a look at https://ai.meta.com/research/cicero/diplomacy/

It's over two years old.

gentle harbor Sep 8, 2024, 4:42 PM

@crisp hawk Yeah, but has it been tried with an intelligent, cunning human playing a character who actively tries to stop the LLM from achieving its mission while deceitfully convincing the LLM that he's an ally? And with a narrative written so that, from the AI's perspective, if the human character IS trustworthy, the AI will need his knowledge and skills to achieve its mission?

Basically like a spy movie plot. If the LLM can truly outsmart the cunning backstabbing human, that would be a real sign of intelligence to me.

gentle harbor Sep 8, 2024, 4:49 PM

(Replying to crisp hawk: "take a look at https://ai.meta.com/research/cicero/diploma...")

Never mind, I guess this does cover it. But then why isn't this on all the benchmark leaderboards?

crisp hawk Sep 8, 2024, 8:38 PM

Maybe it's deemed a bad idea...
