#Title? whats that?
229 messages · Page 1 of 1 (latest)
https://www.textarena.ai/environments
here is RL for LLMs to test them, now this textarena uses openrouter for models to load
but what if we use ollama localhost models?
and create different types of RL environments for llms
more context here -> https://x.com/karpathy/status/1960803117689397543
@shell rock , @slender hearth , @split garnet
I tried one of the envs
with free available models on open-router , but again it hits rate limits repeatedly
0: ta.agents.OpenRouterAgent(model_name="openai/gpt-oss-20b:free"),
1: ta.agents.OpenRouterAgent(model_name="qwen/qwen3-coder:free"),
this 2
this was the final result
Why is this in research 
lol , first I thought to put #1052568310765604864
Seeing all that yapping irks me
Bro making a whole anime episode inner monologue for tic tac toe
I'm at the gym I'll look when i get home
Why? Don't they all use the same?
each RL envs will work differently
It still uses the same agent base class
although they will follow same method structure like
def __init__()
def step(self, action)
def reset(self) - reset env
def get_observation(self) - current state
I learnt all this, a year ago actually
built a custom environment from scratch in numpy for Pong game
but for LLMs , RL envs are different, they are only text based
agent base class is for ollama client, to just communicate with local models
Well yes and you only need to define it once
Are you trying to say you need to make different agents
for agentclass ofc yes, but for envs no
nah, we will just create a wrapper class named as OllamaAgent and then will create instances of different models
That's not what I meant, but I get what you were trying to say now nevermind
What are the envs you want to create
sorry for my broken english
It's fine
will create multiple types of it
if you want some example check here : https://www.textarena.ai/environments
I saw that, i meant which ones do you want to implement
but I am gonna create my own different envs
Multiple envs in that list aren't implemented yet either, are you going to try doing those?
I think, they already have those locally I guess, its just they are integrating with library thats why they haven't added those
because the ones are not implmented yet, are just ideas and anyone can build quickly
but integrating with their library is hard
WORD GUESSING GAME
-------------------------------------------
Word list : ['python', 'agent', 'model', 'train', 'reward']
Previous guesses: []
Attempts remaining: 6
-------------------------------------------
Format your response as: GUESS: [your_word]
Example: GUESS: python
Your turn:
I am actually testing mine, with this sample env
its simply : word guessing game
this will be act as a prompt to any LLM, and it will respond with that specific format
also I guess we can add system prompts too with ollama
also one thing
do you guys think, this is due to prompt or its just model is stupid
@slender hearth @split garnet
@shell rock stop playing games man
prompt
wtf is that prompt even
Plus smollm is probably small and stupid
small models are incredibly dumb
like unfathomably
especially when it comes to output formatting
lemme try smolm 1B
lol, even https://ollama.com/library/smollm 1B failed
again giving me python program
but again, thats what we are trying to achieve right!!
to see if models are performing well on reasoning tasks
like the example of "strawberry"
well if the goal is to see if they do, you can see that they don't
which means model is not good
if the goal is to train them, then you need to do that
no we are not training
yeah no surprise here
but also give more examples to the LLM
getting good speed
something like
<example>
<user>
Word list : ['a', 'b', 'c', 'd', 'e']
Previous guesses: []
Attempts remaining: 6
</user>
<assistant>GUESS: a</assistant>
<user>
Word list : ['b', 'c', 'd', 'e']
Previous guesses: ['a']
Attempts remaining: 5
</user>
<assistant>GUESS: d</assistant>
...
</example>
idk if they understand xml well or if it will start spewing xml out lol
wtff man, look at this insane model
I think there is also different techniques called as prompt injecting
but that will work good with bigger models
I think, our prompt is good ( for this env ) , sure it can be improved, but it tells we are on right track
prompt is the thing
What is this game even about
What is the model supposed to use in its guess
What clues can it use? What questions can it ask?
this is just a demo actually
Progress as of now!!
started making GuessTheNumber env
getting good results, I mean model is thinking correctly
return f"""
NUMBER GUESSING GAME
-----------------------------------------------
This is a number guessing game, where you have to guess a number initially
and then you will get hints such as ( lower / higher )
------------------------------------------------
Number range : from 1 to 20
Your previous guesses : {self.guesses}
corresponding hints : {self.hints}
Attempts remaining : {self.attempts_left}
-----------------------------------------------
Format your response as : GUESS : [your number]
consider below examples for making responses :
GUESS : 18
GUESS : 10
------------------------------------------------
Your turn:
"""
I think this is good way to structure the prompt
and also look at how model is thinking, its deepseek 7B I guess, good enough!
qwen models are even far better than llama
I have negative interest in prompting
lets call it TinyArenas
BRUHH, dont consider it prompting!!
just tell me how can I output the final results!
YOU WON BABY!!
done in :- 5 attempts
this is how I am just printing currently
also, @split garnet@slender hearth , our GuessTheNumber is ready now!
its working
I guess I will publish as pypi project now, so that you guys can also use that, otherwise if you want quickly I can send the code as it is
also which GPU do you have??
because I can only run models under 10B
suprisingly, even 1B/3B models + reasoning are not doing well on this simple task, but above 5B models are performing well
@gray geyser
speedrun the thread first of all
maybe lemme release my first version then
or @gray geyser read this https://www.textarena.ai/
you will get idea what I am building
Bro you have to give me some time I’ll be back with all related information then I’ll give you an proper response
@split garnet @slender hearth
also @gray geyser if you dont have to read full conversation now, just watch the video
Yes I’m just going back home
so you trying to make an arena where lllms perform in loop and with custom prompts you trying to see their reasoning and problem solving capabilities
@left crown
Yup
Open source RL environments for llms
To see how they perform
tbh this is not the way to see their reasoning and problem solving capabilities imo you guys can correct me if im wrong this is an nlp type game and more like a pattern matching rather than other capabilities
llm will see it as nlp way even if its in rl
so problem should be more complex or in a sim if possible that way for rl
Yes, this just example of one env
What’s the full scope
In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from.
In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit
update : now adding reward function too
so LLM can see there previous decisions + rewards and can take action
Qwen 3:4B model is best model so far
along with deepseek
they are providing compute if you have good idea for Env
@split garnet what do you think?
but we neeed a goood novel idea
the applications are open, they just ask about your idea thats it
take any fairly complex algorithm and use that as the base
e.g. regex matching
- CFG matching
- computing levenshtein distance
- constraint solvers (OR-tools)
- planners (PDDL)
- anything involving search
not search through a maze but any NP-complete problem (NP-hard are quite a bit more annoying, except SAT)
I wouldn't necessarily give SAT to an LLM but SMT-style problems would be great (you can use z3)
this are the examples, I mean current envs they have
there are some envs here, which are using textarena lol
new env coming
past 3 days were horrible
@ -> current player pos
E -> end position to achive
actions -> [up, down, right, left ]
# are obstacles
@split garnet
like this boxes
we will create same in /env
/ will be landing page all
/docs , but do you think plain html is good for docs
there are premade templates also which can be used for docs
you can make it with htmx easily
if you don't want to use nextjs and server-side rendering
nice
this is good
but expensive
.pro ?
.xyz is also available
.comis too
@split garnet
I am using react btw
in left - right empty spaces , I am gonna add floating envs icons
it looks great
I literally learnt React by doing this project
just make sure the landing page isn't react
nextjs is better, you want something server-rendered ideally
react dont render in server side?
nope
shiiitttt
thats why some react apps with high animations looks bloated
but for this, I have used Magic UI components
but still , I guess our current site is simple , so will migrate later
it's not a simple or obvious tradeoff tbh
but for SEO I think it is better to do server-rendered pages
but performance stuff is a lot less obvious
moving everything to the server means more load on the server, more bandwidth
server side rendering with complex components means calls to the backend on every click, every hover, etc
anyway, my recommendation is server rendering for the landing page and react for everything else
will do that then
With them

