#Title? whats that?

229 messages · Page 1 of 1 (latest)

left crown
#

long context discussion only

#

but what if we use ollama localhost models?
and create different types of RL environments for llms

#

@shell rock , @slender hearth , @split garnet

#

I tried one of the envs

#

with free available models on open-router , but again it hits rate limits repeatedly

#
    0: ta.agents.OpenRouterAgent(model_name="openai/gpt-oss-20b:free"),
    1: ta.agents.OpenRouterAgent(model_name="qwen/qwen3-coder:free"),
#

this 2

#

this was the final result

shell rock
#

Why is this in research HAhaa

left crown
#

lol , first I thought to put #1052568310765604864

slender hearth
#

Bro making a whole anime episode inner monologue for tic tac toe

slender hearth
left crown
#

will need to create classes for each envs

slender hearth
left crown
slender hearth
#

It still uses the same agent base class

left crown
#

although they will follow same method structure like

def __init__()
def step(self, action)
def reset(self) - reset env
def get_observation(self) - current state
#

I learnt all this, a year ago actually

#

built a custom environment from scratch in numpy for Pong game

#

but for LLMs , RL envs are different, they are only text based

left crown
slender hearth
#

Well yes and you only need to define it once

#

Are you trying to say you need to make different agents

left crown
#

for agentclass ofc yes, but for envs no

left crown
slender hearth
#

That's not what I meant, but I get what you were trying to say now nevermind

#

What are the envs you want to create

left crown
#

sorry for my broken english

slender hearth
#

It's fine

left crown
slender hearth
#

I saw that, i meant which ones do you want to implement

left crown
#

but I am gonna create my own different envs

slender hearth
#

Multiple envs in that list aren't implemented yet either, are you going to try doing those?

left crown
#

I think, they already have those locally I guess, its just they are integrating with library thats why they haven't added those

#

because the ones are not implmented yet, are just ideas and anyone can build quickly

#

but integrating with their library is hard

#
        WORD GUESSING GAME
        -------------------------------------------
        Word list : ['python', 'agent', 'model', 'train', 'reward']
        Previous guesses: []
        Attempts remaining: 6  
        -------------------------------------------
        Format your response as: GUESS: [your_word]
        Example: GUESS: python

        Your turn:

#

I am actually testing mine, with this sample env

#

its simply : word guessing game

#

this will be act as a prompt to any LLM, and it will respond with that specific format
also I guess we can add system prompts too with ollama

#

also one thing

#

do you guys think, this is due to prompt or its just model is stupid

#

@slender hearth @split garnet

#

@shell rock stop playing games man

split garnet
#

wtf is that prompt even

split garnet
#

Plus smollm is probably small and stupid

#

small models are incredibly dumb

#

like unfathomably

#

especially when it comes to output formatting

left crown
#

lemme try smolm 1B

#

again giving me python program

left crown
#

like the example of "strawberry"

split garnet
#

well if the goal is to see if they do, you can see that they don't

left crown
#

which means model is not good

split garnet
#

if the goal is to train them, then you need to do that

left crown
split garnet
#

yeah no surprise here

left crown
#

just testing

#

evaluating

split garnet
#

but also give more examples to the LLM

left crown
#

getting good speed

split garnet
#

something like

<example>
  <user>
    Word list : ['a', 'b', 'c', 'd', 'e']
    Previous guesses: []
    Attempts remaining: 6
  </user>
  <assistant>GUESS: a</assistant>
  <user>
    Word list : ['b', 'c', 'd', 'e']
    Previous guesses: ['a']
    Attempts remaining: 5
  </user>
  <assistant>GUESS: d</assistant>
  ...
</example>
#

idk if they understand xml well or if it will start spewing xml out lol

left crown
#

wtff man, look at this insane model

left crown
#

I think, our prompt is good ( for this env ) , sure it can be improved, but it tells we are on right track

#

prompt is the thing

left crown
#

anyone wondering about code

#

its good now I guess to create "GUess the number" env

left crown
#

new ideas

slender hearth
#

What is the model supposed to use in its guess

#

What clues can it use? What questions can it ask?

left crown
left crown
#

Progress as of now!!
started making GuessTheNumber env
getting good results, I mean model is thinking correctly

#
        return f"""
        NUMBER GUESSING GAME
        -----------------------------------------------
        This is a number guessing game, where you have to guess a number initially
        and then you will get hints such as ( lower / higher )
        ------------------------------------------------
        Number range : from 1 to 20
        Your previous guesses : {self.guesses}
        corresponding hints : {self.hints}
        Attempts remaining : {self.attempts_left}
        -----------------------------------------------
        Format your response as : GUESS : [your number]
        consider below examples for making responses :
        GUESS : 18
        GUESS : 10
        ------------------------------------------------
        Your turn:
        """
left crown
#

yo @shell rock focus here to after playing games

#

you can suggest me envs to create

left crown
#

I think this is good way to structure the prompt

#

and also look at how model is thinking, its deepseek 7B I guess, good enough!

#

qwen models are even far better than llama

shell rock
left crown
#

lets call it TinyArenas

left crown
#

this is how I am just printing currently

#

also, @split garnet@slender hearth , our GuessTheNumber is ready now!
its working

I guess I will publish as pypi project now, so that you guys can also use that, otherwise if you want quickly I can send the code as it is

#

also which GPU do you have??

#

because I can only run models under 10B

#

suprisingly, even 1B/3B models + reasoning are not doing well on this simple task, but above 5B models are performing well

left crown
#

@gray geyser

#

speedrun the thread first of all

#

maybe lemme release my first version then

#

you will get idea what I am building

gray geyser
#

Bro you have to give me some time I’ll be back with all related information then I’ll give you an proper response

left crown
#

@split garnet @slender hearth

#

also @gray geyser if you dont have to read full conversation now, just watch the video

gray geyser
gray geyser
#

so you trying to make an arena where lllms perform in loop and with custom prompts you trying to see their reasoning and problem solving capabilities

#

@left crown

left crown
#

Open source RL environments for llms

#

To see how they perform

gray geyser
#

tbh this is not the way to see their reasoning and problem solving capabilities imo you guys can correct me if im wrong this is an nlp type game and more like a pattern matching rather than other capabilities

#

llm will see it as nlp way even if its in rl

#

so problem should be more complex or in a sim if possible that way for rl

left crown
gray geyser
left crown
left crown
#

update : now adding reward function too

#

so LLM can see there previous decisions + rewards and can take action

#

Qwen 3:4B model is best model so far

#

along with deepseek

left crown
#

they are providing compute if you have good idea for Env
@split garnet what do you think?

#

but we neeed a goood novel idea

#

the applications are open, they just ask about your idea thats it

split garnet
#

e.g. regex matching

#
  • CFG matching
  • computing levenshtein distance
  • constraint solvers (OR-tools)
  • planners (PDDL)
  • anything involving search
#

not search through a maze but any NP-complete problem (NP-hard are quite a bit more annoying, except SAT)

#

I wouldn't necessarily give SAT to an LLM but SMT-style problems would be great (you can use z3)

left crown
#

this are the examples, I mean current envs they have

#

there are some envs here, which are using textarena lol

left crown
#

new env coming

#

past 3 days were horrible

#

@ -> current player pos
E -> end position to achive

#

actions -> [up, down, right, left ]

#

# are obstacles

left crown
#

@split garnet

#

like this boxes

#

we will create same in /env

#

/ will be landing page all

#

/docs , but do you think plain html is good for docs

#

there are premade templates also which can be used for docs

split garnet
#

you can make it with htmx easily

#

if you don't want to use nextjs and server-side rendering

left crown
#

I am gonna buy a domain

#

and host everything on vercel

#

@split garnet

split garnet
#

sure

#

vercel because it's cheap?

left crown
#

I just need to buy domain

split garnet
#

I am a huge domain hoarder

#

I have maybe 10 I don't use

left crown
#

nice

split garnet
#

lots of okay-ish .ai domains

left crown
#

but expensive

#

.pro ?

#

.xyz is also available

#

.comis too

left crown
#

envarena-cli --env GuessTheNumber --model ollama:llama3.2:3b

#

CLI tool is ready btw

left crown
left crown
#

@split garnet

#

I am using react btw

#

in left - right empty spaces , I am gonna add floating envs icons

left crown
left crown
left crown
#

at bottom of website, I am thinking to add a product demo video

#

interactive

gray geyser
#

Hlo

#

Is it functional now?

left crown
#

I haven't published yet

left crown
split garnet
#

it looks great

left crown
split garnet
#

nextjs is better, you want something server-rendered ideally

left crown
split garnet
#

nope

left crown
#

shiiitttt

#

thats why some react apps with high animations looks bloated

#

but for this, I have used Magic UI components

#

but still , I guess our current site is simple , so will migrate later

split garnet
#

but for SEO I think it is better to do server-rendered pages

#

but performance stuff is a lot less obvious

#

moving everything to the server means more load on the server, more bandwidth

split garnet
#

server side rendering with complex components means calls to the backend on every click, every hover, etc

#

anyway, my recommendation is server rendering for the landing page and react for everything else

left crown
#

will do that then

left crown
#

new env

slender hearth
#

With them

left crown
#

long time no see

#

will back on grind from Monday