#[WIP] Neuro's Desktop (An integration for letting neuro use a computer)

plucky marlin
#

while keeping my changes too

#

coool

sterile tundra
wheat reef
#

Along with server sending non neuro messages to integrations (which they might not understand)

#

Actually uhmmm, neuro relay tests should probably be updated

plucky marlin
#

busy week neuroBwaa can't do much yet

timber basin
#

I was procrastinating on this, actually...

plucky marlin
#

neurOMEGALUL yeah

#

same

#

next week is work trip so i will not have my pc available

timber basin
#

Sad

wheat reef
#

I'll work on this today

#

Hopefully

steel dagger
#

CAUGHT

timber basin
wheat reef
#

yea!

mystic flame
#

Would it send back information to Neuro to let her know what she did at least?

mystic flame
wheat reef
#

I think Nakurity implemented a vision system for neuro, using ocr

wheat reef
#

Since ocr is kinda bad rn

#

With ocr, it depends on the machine running the integration being beefy

timber basin
#

I'll get back to working on this now

#

I kinda have an idea for a new integration design

burnt grove
wheat reef
#

I should note: for anything that needs our attention, ping me or nakurity here. Ty!

#

@timber basin

#

It's your repo

#

Not the main one, why'd u move it

#

Sigh

timber basin
#

srry

torn adder
#

Is giving the twins access to the entirety of windows a good idea? Like even if there are safe guards, I feel like Neuro or Evil would find a way to bypass it.

wheat reef
wheat reef
#

Yea

pallid ruin
#

Evil or Neuro crashing Microsoft Defender or an anti-virus is a scary thought.

wheat reef
#

they use windows the way we do, using keyboard and mouse. They have access to these two basic inputs. And I assume Naku plans to add integrations to this integration that will allow neuro / evil to have higher-level, more abstract control of a specific windows application.

They do not have access to:

  • the windows underlying processes
  • the windows codebase
  • the ability to crash an application, unless us normal users can do it (thru a bug)
  • the ability to solve captchas (but this depends on the algorithm we end up using for mouse movement)
wheat reef
#

but they won't crash it

steel dagger
# wheat reef may I add, that they cannot do that?

don't worry, there still needs to be an explanation for some people about the vscode extension that they cannot just access git(/github for that matter) or execute arbitrary commands at will

it's kinda a thing with these kinds of integrations, almost like clickbait, but we aren't trying

wheat reef
#

true

wheat reef
#

I honestly have no idea what to do with this as of rn

#

I got some free time, to work on this

#

but the state of this project is quite a mess

#

should I just rewrite it? wdyt, KTrain

#

@steel dagger

steel dagger
#

I mean, if it helps you read easier, you should rewrite it, but remember to PR first so that you don't have 20 merge conflicts

wheat reef
#

@timber basin what do you think? (not the full screenshot, but should give u an idea of the refactor)

steel dagger
#

there ain't no way there is gonna be 4 langs in one repo

wheat reef
#

what would be wrong with that, other than just trolling with the language usages :D

steel dagger
#

it's just that

#

most tooling is not very well designed for that

#

I wish it was

#

but it really is not

steel dagger
#

idk, kinda hard to say

wheat reef
#

eehhhh, I already went with it. I'm not gonna rewrite it a second time

wheat reef
#

or anyone else, could- uhm. maybe help?

steel dagger
#

sorry, am not skilled enough to compile code across languages (or whatever it is they call it)

wheat reef
steel dagger
#

nope

#

why do you need to bundle embedded py tho?

wheat reef
#

to run the python packages

#

otherwise you'd need python installed on the host system to run the application

steel dagger
#

well idk

#

there's something called python-build-standalone that I think helps for your case

wheat reef
#

winpython- ooohhh, I remember that now

#

somewhere on sourceforge

#

I forgot about winpython

#

but my problem was site-packages

#

hmmm

#

@timber basin you know python right? (you coded the whole app in python last time)

#

that's probably enough for today

#

I'ma commit this to my fork and go sleep. Naku, wherever u are, take over the code please

timber basin
#

mmphm, okay.

timber basin
#

I'll try to figure it out tho

wheat reef
#

goodluck I guess, I still can't get python to run

timber basin
#

thanks

timber basin
wheat reef
#

I really want this to be a rainbow D:

#

the reason the code is all in the desktop folder is that I'm planning to make a CLI version too

#

two modes: a desktop user interface for the swarm to see what neuro is doing (content), and a CLI mode for neuro to use, like far in the future (if she even gets access, or becomes sentient enough to continue development on herself)

#

probably should make a dev branch for all of this, instead of pushing to master

wheat reef
#

uhmmmmmmmmm- wait- @timber basin I didn't know I was using your codespaces... uhmmm sorry...
Why was your account even logged into my tablet, I'm confused.

steel dagger
timber basin
#

oh, I forgot to log out, when I last used it.

#

Your tablet's just faster than my phone, plus- bigger screen yk?

wheat reef
#

I didn't notice because I was committing to my own repository

#

but whatever, next time please log out

timber basin
#

okay

#

srry

steel dagger
#

why don't you use guest mode or wtv

timber basin
#

There's... guest mode on chrome?

#

I don't think chrome has guest mode for mobile

wheat reef
timber basin
#

I didn't know...

timber basin
wheat reef
#

yes. there is

#

code rabbit is reviewing it

#

who added that AI anyway, I forgot

#

that's enough for me today. I'll go do smth else

wheat reef
steel dagger
#

oh btw can I ask how independent neuro-relay is from neuro-desktop

wheat reef
#

I'll rewrite that too

#

cause that's also a mess

#

Neuro Relay should be rewritten tbh

timber basin
#

Oh HI @plucky marlin

#

Hru?

plucky marlin
timber basin
plucky marlin
timber basin
wheat reef
#

I will need some help as well

wheat reef
#

it's a multi-language codebase for a reason

timber basin
#

I guess so...

wheat reef
#

yea

wheat reef
#

I'm going around adding changes suggested by coderabbitai rn

wheat reef
#

Idk what I did today

#

Should I have implemented the integration code in python?

#

idk, but rust is so confusing for me.

#

first time using rust btw

steel dagger
#

did you guys end up doing any neuro api communication code in go btw?

wheat reef
#

I'm thinking if I wanna try again, I probably will just make the go code its own binary

#

and we communicate using text files

#

:D

wheat reef
#

cause I don't think there's a neuro sdk for go.

#

(couldn't find any)

steel dagger
steel dagger
wheat reef
#

gimme some time

steel dagger
wheat reef
#

1 moment

#

since I didn't manage to implement the integration in Go, the code is not tested btw

steel dagger
#

fair enough

wheat reef
#

use v2 for better functionality. v1 was the version I had in neuro-desktop

#

v2 is way better

steel dagger
#

thanks

wheat reef
#

np

wheat reef
#

[WIP] Neuro Desktop Integration

#

[WIP] Neuro's Desktop (An integration for letting neuro use a computer)

steel dagger
# wheat reef May I ask what did you need it for?

as mentioned above I was considering/wanting to publish a Go SDK for uses and was having trouble, but also I'm considering switching neurontainer's backend to use Go instead because the Docker Go SDK is more complete than the Docker TypeScript SDK

#

it's kind of weird because normally no games use go for their lang so for a while there isn't a go sdk

wheat reef
#

ooh

steel dagger
#

yeah

#

like pretty much any tool integration either uses python (most) or go (none) and I'm not sure if neurontainer should join that crowd yet

wheat reef
#

I give up on implementing the integration code in Rust

#

I'm moving back to using Go

wheat reef
#

I finally wrote the neuro integration

#

It is semi functional

#

pressing a key doesn't work rn, it's something with the action handling

#

but moving a mouse works!

#

but for some reason it's also executing all previous actions, in the action history, sigh

steel dagger
#

recursive execution?

wheat reef
# steel dagger recursive execution?

No, the monitor (the system I made for providing neuro with information about what's happening on the computer) keeps track of an action history. When I implemented the system in Rust to interact with the PC, I found through testing that actions that happened before the current action also seem to be executed.

It might be a bug with me not clearing the action queue. But I'm also not sure if it's related to DesktopMonitor's action history

#

Wait, I think it might be a problem with the Action Queue persisting, since it needs to be manually cleared after an execution. This behavior is there so that macros can be used, i.e. so a macro can be saved for doing something like opening notepad
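
A minimal sketch of the queue behaviour described above, written in Python for brevity (the real code is Rust and Go). The `ActionQueue` class and its method names are hypothetical, not taken from the repo. It shows one way to keep macros around without replaying old actions: pending actions are drained on every execution, while macros are stored as separate copies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Action = Callable[[], None]


@dataclass
class ActionQueue:
    """Hypothetical queue: pending actions run once, macros are kept separately."""

    pending: List[Action] = field(default_factory=list)
    macros: Dict[str, List[Action]] = field(default_factory=dict)

    def push(self, action: Action) -> None:
        self.pending.append(action)

    def save_macro(self, name: str) -> None:
        # Copy the pending actions so the macro survives the queue being drained.
        self.macros[name] = list(self.pending)

    def execute(self) -> int:
        # Swap the list out before running, so nothing is replayed next time.
        batch, self.pending = self.pending, []
        for action in batch:
            action()
        return len(batch)

    def run_macro(self, name: str) -> int:
        steps = self.macros[name]
        for action in steps:
            action()
        return len(steps)
```

With this shape, calling `execute()` twice in a row runs each pushed action only once, and replaying a saved sequence is an explicit `run_macro()` call.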

steel dagger
#

ah

#

yeah a macro system is really good

wheat reef
#

I was thinking of implementing a default macro set. And ones that neuro can add / a UI for vedal to add ones specific to how the PC is configured

steel dagger
#

you guys should also implement a give_cookie action

#

or like get_cookie action

wheat reef
#

Is that for the funnsies?

#

Or should it have an actual use case?

#

I remember seeing you and others implementing that for neuropilot

steel dagger
wheat reef
#

@timber basin could you help me in fixing the powershell scripts? #programming

timber basin
#

okay

#

is everything committed?

wheat reef
#

yeah

timber basin
#

I'll go checkout your repository

wheat reef
#

aside from the powershell scripts, neuro desktop works-! And it bundles correctly for a release ver.

wheat reef
#

Right now I'm adding a handle for the Proposal stuff

#

cause why not future proof

wheat reef
#

Also added macros

#

will be fixing the issues where the mouse won't click, and the key press doesn't work later

wheat reef
#

Removed the broken actions

#

now it works perfectly

#

Everything now works correctly

#

I just need to implement something like vision now

#

oops, sorry @timber basin I accidentally merged into the org repo. I'll revert that

#

This integration obviously needs its own PC for neuro to use.

#

Sooo, mmphm

steel dagger
#

neuro-desktop .iso build NeuroPoggers

wheat reef
#

it works now :D

#

I don't have an openai key to test it

#
cd desktop
make run

to build and run it

#

wait it doesn't work

#

lemme fix that

steel dagger
#

why do you need an openai key to test

wheat reef
#

I need an actual AI to test it

#

and possibly a second pc

#

but I'll take my chances that I won't lose control with AI spam

steel dagger
#

why dont you just

#

use tony

wheat reef
#

but I need to test the practicality

#

with jippity

#

to see if the AI would actually understand what to do

steel dagger
#

ah

wheat reef
#

so... I was thinking about this message... hmmm, maybe an .iso file would be best for that. Wdyt?
though that would mean I'd have to rewrite the entire project again.

steel dagger
#

idk how you bundle apps into windows isos, but you could build an iso with all apps bundled, idt you need to rewrite the entire project for that?

wheat reef
steel dagger
#

I mean... you could also do that, if you want to offer a permanent neuro-windows

#

but I would take the bundled inside route

#

and make it so that the integration starts on startup by default neuroTomfoolery

wheat reef
#

that's a great idea!

#

but microsoft.. sigh

#

where is that evil neuro emoji where her head was bonked into a ravine shape?

steel dagger
#

uhhh what

#

or just distribute linux isos instead vedalBread

wheat reef
#

I'll add the vision stuff tmr

wheat reef
#

Nakurity is slacking on this project, whyyy?

wheat reef
#

Btw @steel dagger do you think the python library pyautogui's screen coordinates depend on how big the screen is?

#

could be neuro's excuse to get vedal to get her a good monitor xD

steel dagger
wheat reef
#

oh, okay

steel dagger
#

I mean also when is that never the case neurOMEGALUL

steel dagger
wheat reef
#

yea. that was my first thought about that question lol

wheat reef
steel dagger
#

:PagMan:

wheat reef
wheat reef
#

Sadly, I'll have to make neuro-desktop windows only for now

wheat reef
#

also @timber basin

#

it could be better with fluidness ig

steel dagger
#

feels kinda slow but otherwise ig it's fine

wheat reef
#

yea ik

wheat reef
steel dagger
#

no sorry

#

tho admittedly probably bc I'm tired

jolly dove
#

how are you doing it now? it looks like you're doing some kind of "accelerate in the direction of the target" thing?

wheat reef
wheat reef
wheat reef
jolly dove
#

i think modifying your fade curve so it does a snappier curve, spending more time at the ends and quickly moving at some point would make it feel less slow motion. or just speeding it up generally. but pretty cool.
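
One way to read the "snappier curve" suggestion, as a sketch only: an ease-in-out function whose `sharpness` parameter (a made-up name) controls how long the cursor lingers near the ends and how fast it crosses the middle. This is not the curve the project currently uses.

```python
def ease_in_out(t: float, sharpness: float = 2.0) -> float:
    """Map linear progress t in [0, 1] to eased progress in [0, 1].

    Higher sharpness means slower movement near the ends and a faster
    dash through the middle.
    """
    t = min(1.0, max(0.0, t))
    if t < 0.5:
        return 0.5 * (2.0 * t) ** sharpness
    return 1.0 - 0.5 * (2.0 * (1.0 - t)) ** sharpness
```

Feeding the eased value into the path interpolation instead of raw `t` changes the feel of the movement without changing the path itself.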

wheat reef
#

Btw @steel dagger I'm updating the Go SDK. Have you used it? There might be some breaking changes.
Asking in case I need to archive the v2 / v1 files.

steel dagger
#

haven't used it yet, but I thought you were using rust for impl now?

wheat reef
steel dagger
wheat reef
#

am using IPC for communication between the two binaries now (linking is too hard, exec bash cmd better)

steel dagger
#

how about we both just work on a general-purpose go sdk instead?

wheat reef
#

IPC .json file btw
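
A rough sketch of file-based IPC with a .json file, in Python for illustration (the project's binaries are Rust and Go). The function names and the message fields are assumptions. The writer renames a temp file over the target so the reader should not see a half-written message, and the reader deletes the file once it has consumed it.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def write_message(path: Path, message: Dict[str, Any]) -> None:
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never opens a partially written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(message, tmp)
    os.replace(tmp_name, path)


def read_message(path: Path) -> Optional[Dict[str, Any]]:
    # Return None when no message is waiting; delete the file once consumed.
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
    path.unlink()
    return json.loads(text)
```

A single file only holds one message at a time, so this suits a simple request/response loop; anything busier would probably want a pipe or socket instead.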

#

sure!

steel dagger
#

I have a private repo for it because I was trying to figure out both the server and client portions of the api

#

lemme just invite you rq

#

done

wheat reef
wheat reef
#

this was the refractor pr to my main branch from dev

#

wdyt?

steel dagger
#

idk what a refractor is but uhh sure that seems good

wheat reef
#

like anything, that changes a bunch of core features

steel dagger
#

that's a refactor

wheat reef
#

oh, I misspelled it then oopsies

#

I'll leave the rewrite for Neuro Desktop to use the new SDK to @timber basin ... I made a mess.. and I am just gonna leave it. hehe :3

steel dagger
#

Can't wait to see the next claude code crashout

#

/j

wheat reef
#

I feel like GPT5.2 doesn't know anything at all. And just knows enough to seem intelligent, but it fails at literally everything

steel dagger
wheat reef
#

it will 87% of the time, mess up your code

#

and forget functions from the original code

#

btw you could tell it was claude code from the codebase?

#

aside from the README

steel dagger
wheat reef
#

ooh

steel dagger
#

it's sometimes incorrect like everything so I always take it with a grain of salt

wheat reef
#

okay

jolly dove
#

Gemini 3 seems ok so far. admittedly that's only been to summarize where in the implementation plan the current code is and describe the next steps and to track down a memory use error in code written by a different agent.

wheat reef
#

gpt5.2 can only plan as well. it knows it can only plan, and if it tries to do the impl. It knows it will fail, so it's decided to plan instead

#

openai seems to have distilled GPT5.2 to a point where it's like worse than GPT3

#

but at least it can do something, it just fails at coding (from my experience with it so far)

jolly dove
#

making an AI similar to Neuro and Evil. Not as ambitious but similar.

#

looking at what other people are doing.

wheat reef
#

I have finished adding a natural movement pathfinder for mouse control

wheat reef
#

have you considered implementing the neuro backend api as how you can add integrations to your AI?

#

if so, then you could have your AI use a lot of the community-made integrations for neuro / evil, since you've implemented the same API that they use

jolly dove
#

yeah, i've thought about it. it would be a "maybe, assuming i can get it working well" thing.

wheat reef
#

what language did you use?

jolly dove
#

I'm still rewriting the client side speech recognition code. this version is going to be more streamlined and actually planned better but the old code was a mess.

#

the first version was all written in python for simplicity but ran into issues with real time audio processing and async code. the AI was running in a llama-server instance.
this version loads the AI code into a process which then starts listening for requests, adding the info to a queue. another thread passes data on the queue into the LLM, in a structured format. the LLM output goes to the action handler which takes speech actions and converts them to TTS (tbd... there's a lot of things the TTS needs that I'm still researching. previous TTS didn't give timing info on output. kind of want that.).
the new version has the main code written in C++ because that's where the LLM libraries want to be.
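
The request queue described above can be sketched like this. It is in Python only for consistency with the other examples here (the actual project is C++), and `run_pipeline`, `model` and `handle_action` are stand-in names, not the real code.

```python
import queue
import threading
from typing import Callable, Iterable, Optional

STOP = None  # sentinel that tells the worker thread to exit


def run_pipeline(
    requests: Iterable[str],
    model: Callable[[str], str],
    handle_action: Callable[[str], None],
) -> None:
    inbox: "queue.Queue[Optional[str]]" = queue.Queue()

    def worker() -> None:
        # Single consumer: pulls requests in order and feeds them to the model,
        # then hands the model output to the action handler.
        while True:
            item = inbox.get()
            if item is STOP:
                break
            handle_action(model(item))

    thread = threading.Thread(target=worker)
    thread.start()
    for request in requests:  # the "listener" side: enqueue incoming requests
        inbox.put(request)
    inbox.put(STOP)
    thread.join()
```

Because the consumer and the handler live in one process, they can share state directly, which is the property the message above is after.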

steel dagger
#

ig it does make some sense though given cpython is written in c

jolly dove
#

the python code could use the native libraries which were written in C++ to access the GPU but I wanted to have the event handler and the LLM in the same process so they had a coherent view of the currently "in process" state and it would be possible to interrupt them.

#

also wanted to be able to update the context dynamically and have multiple models loaded at once so i could keep a "thinking" model and a "speaking" model. that would let them think while they are speaking as well as being able to receive new information and choose whether or not they should stop speaking.

wheat reef
#

debugging why actions still auto execute, when specified not to rn

#

I wonder why this is

#

fixed it

#

:D

wheat reef
jolly dove
#

does it work for clicking the "I'm not a robot" button without being detected as a robot?

steel dagger
#

depends

#

which provider

#

and what is neuro doing

wheat reef
#

so, when filian solved the captcha for neuro: Neuro can obv solve it, but the problem would be that filian's mouse movement obv follows a human-like path

#

while I would have to algorithmically mimic that path, using code

#

captchas are smarter than you think, yk?

wheat reef
#

I think I finished 0.0.3b-dev now, it allows for Low Level control of the windows OS without issues

#

@timber basin go review the PR!!!

#

I will be waiting :D

#

meant to reply to this btw

#

also unit tests are needed

wheat reef
wheat reef
wheat reef
#

Also, how would we get vedal to even see this?

wheat reef
#

Might make this, had this idea for so long

wheat reef
#

I have an idea for other AIs to use this integration and basically maintain it thru using this integration. Seems possible / useful :D

#

@jolly dove may I ask do you plan to open-source your AI, and if it supports RT Learning?

#

Completely unrelated btw :D (I totally don't plan to borrow your code)

wheat reef
#

Didn't know you could unregister an action before sending an action result back. My thought was: how could one receive a result for an action that no longer exists?

#

The more u know
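
A sketch of the ordering discussed above. The message shapes follow my reading of the Neuro API spec (`actions/unregister` with `action_names`, `action/result` with `id`, `success`, `message`), so double-check them against the spec before relying on this. `GAME` and the helper names are made up.

```python
import json
from typing import List

GAME = "Neuro Desktop"  # hypothetical integration name


def unregister_message(action_names: List[str]) -> str:
    return json.dumps({
        "command": "actions/unregister",
        "game": GAME,
        "data": {"action_names": action_names},
    })


def result_message(action_id: str, success: bool, message: str = "") -> str:
    return json.dumps({
        "command": "action/result",
        "game": GAME,
        "data": {"id": action_id, "success": success, "message": message},
    })


def respond_to_action(action_id: str, action_name: str) -> List[str]:
    # Assumption: the result refers to the incoming action by its id, so it
    # does not matter that the action name was already unregistered.
    return [
        unregister_message([action_name]),
        result_message(action_id, True, "done"),
    ]
```

The returned strings would be sent over the websocket in that order.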

steel dagger
wheat reef
#

I will rewrite neuro-relay to be an integration that provides more utilities than the neuro api would

steel dagger
# wheat reef Didn't know that, ty

I mean it is kinda obvious if you look at the action result spec, but tbh it took a good bit of f'ing around and finding out before I got ahold of the ability to make basic integrations with the api

wheat reef
wheat reef
#

I do also have to fix the neuro integration code mess, written in Go

#

And refactor to actually use the neuro-sdk I made

#

Refactoring neuro-relay will hopefully be easy

steel dagger
#

oh did you publish your go-neuro-sdk yet?

wheat reef
#

It's on github, I tried

#

And it worked with go get

#

Just haven't published a proper gh release yet

steel dagger
#

sorry I meant is it listed on the go module thing yet

wheat reef
#

I never used Go before the day I started working on refactoring neuro-desktop

steel dagger
#

ah ok

#

also can you add a license?

#

MIT should be enough

wheat reef
#

If you dont have one in mind, I'll go with MIT

steel dagger
#

and then you can PR it into the vedalai/neuro-sdk repo

wheat reef
#

Oh, i was typing that message before i saw that

#

Okay

#

Added

#

Did you want to do smth with the SDK? Is that why you wanted me to add a license?

steel dagger
wheat reef
steel dagger
wheat reef
#

Shouldn't have used claude for the README, but I was lazy :p

#

@timber basin it would be nice if you stop ignoring me, and work on neuro-desktop with me?!

wheat reef
#

Might need to slowly migrate this into a full toolchain for neuro to use (and might as well make it its own OS atp), once we finish the proof of concept

wheat reef
wheat reef
#

will work on this integration later

empty trout
#

I can help

#

So I do not own a windows computer actually at all

#

That being said I have quite a few ideas on how to improve some things and also fix some bugs you are having

#

I strongly suggest using Deepseek OCR if not already

#

The model is a bit heavy for CPU users but 100% feasible even with lower end GPUs

#

The model is only ~3b iirc

#

so literally any GPU can run it and any one from the past like 8 years should be able to handle it in real time

#

it's perfect because it can take a screenshot and literally make a markdown file separating things into groups of text and separate images (never word by word 🤢 )

wheat reef
empty trout
#

Comes with Image summarization and also block based OCR

wheat reef
#

I was procrastinating on this, because I have ADHD (and I forgot my medicine)

empty trout
#

Tbf I am procrastinating on a lot of important things I probably should be doing too

#

but instead I am working on neuro things

wheat reef
wheat reef
#

our goddess neuro would be proud

empty trout
#

Surely

#

Yeah I can cook something up as soon as I understand the project structure

wheat reef
#

it's basically separated into multiple binaries

empty trout
#

I was highkey getting ragebaited by the readme cuz some directories don't exist

wheat reef
#

with each function being its own separate binary

wheat reef
#

that was so long ago

empty trout
#

real

wheat reef
#

but unless you mean nakurity's readme on the main repo

#

this is the current latest branch

#

with my impl

#

because honestly, nakurity's impl sucks

empty trout
#

crazy question but why are u the OP but not repo owner

wheat reef
#

but I forked the repo and worked on my own instead

#

it's just better to PR stuff

#

instead of working on the main repository

#

which nakurity made a mess on, by doing so

empty trout
#

alr

#

I mean it's not the worst I've seen. Not even close

wheat reef
#

the structure is this

#

new functions are separate binaries

#

we wire into the rust app

#

(desktop/apps/neuro-desktop)

empty trout
#

:based:

#

L emoji fail

wheat reef
#

we forgot about linking in this repo

#

it doesn't exist in my mind

#

/ I couldn't get it to work

empty trout
#

It's been a really long time since I used Rust

wheat reef
#

IPC is better

empty trout
#

but I surely still got it

wheat reef
#

you can still code in any other language

empty trout
#

💀

wheat reef
#

and I will wire it up to the rust app for u

empty trout
#

bro what

wheat reef
#

the rust app is basically the main entrypoint

#

and other language is allowed

empty trout
#

I made a brainfuck standard library

#

so maybe I can use that

wheat reef
#

except that

empty trout
#

shucks!

wheat reef
#

I am not gonna deal with brainfuck

empty trout
#

Can't imagine why

wheat reef
#

not even AI can help me with brainfuck

empty trout
#

lmao

wheat reef
#

it's gonna fuck up both our brains

empty trout
#

Indeed it will

wheat reef
#

go to the dev branch of my repo

#

and you'll see the latest changes there

#

I was gonna PR v0.0.3b-dev without context stuff

#

and just working actions

empty trout
#

Oh I'm fucking stupid the org is "Nakashireyumi"

wheat reef
#

so I'll do that

#

and v0.0.3c-dev will have the stuff that gives neuro / evil context about the desktop

wheat reef
#

very hard, nakurity made that up

empty trout
#

Lol ik. I am Japanese SMILE

wheat reef
#

some ppl just have trouble pronouncing it, that's all

empty trout
#

I imagine so

wheat reef
#

since it compiles to its own binary

#

I'll do that

empty trout
#

Are you still having issues with the port being left open?

wheat reef
#

nope

#

that was an issue with the main repo

#

not mine

#

That's nakurity's issue

#

not mine

#

I fixed that by rewriting the whole mess she made

empty trout
#

holy ur repo is a lot easier to read

wheat reef
#

thank you!

#

I worked hard on ensuring that :D

#

doing some cleaning up work on the repo rn

#

May I ask what are you doing now, @empty trout ?

#

I was thinking of making an NN for mouse movement stuff

#

is there already one?

#

I have no idea

empty trout
#

I am reading the code

wheat reef
#

oooh, okay

empty trout
#

Okay so for moving mouse with integration the best thing you can do is to not introduce any additional NNs

wheat reef
#

this is the only NN I planned to add

empty trout
#

twins are capable of doing it themselves given good context engineering

wheat reef
#

cause I wanted mouse movement to pass captchas

wheat reef
empty trout
#

Okay let me try to explain a bit more

wheat reef
#

like move to Point(x=something, y=something)

#

would you know how?

empty trout
#

The twins are 100% capable of moving to a button

#

and navigating websites

#

You need to give it more signal and less noise though

wheat reef
#

and the NN just moves it to the selected place like a human

empty trout
#

no pixel values

wheat reef
#

the twins give the NN a position

#

and the NN moves to it in a human-like way

#

as much as possible

empty trout
#

pyautogui already has things like that

wheat reef
#

since I want them to pass captchas

wheat reef
#

to the place

empty trout
#

captchas actually don't use mouse data as much as you think

wheat reef
#

and it glitches

#

so I was thinking of implementing the mouse control system in C

empty trout
#

they usually just read previous cookie caches and stuff iirc

wheat reef
#

it failed

#

the captcha

#

I tried using tony

empty trout
#

and you were successful as a human?

wheat reef
#

yes

#

successful as a human

#

not when using neuro-desktop

empty trout
#

In that case add a gaussian sample

wheat reef
#

oookay

empty trout
#

essentially brown noise

wheat reef
#

I'll follow your lead then

empty trout
#

I can code it in a couple min one sec I'll write u a code segment

wheat reef
#

but I liked my NN tho, but sure!

#

okay!

empty trout
#

NN's are fun

wheat reef
#

implementation in python is okay

empty trout
#

I am an AI researcher afterall

wheat reef
#

yea, it is fun!

#

btw apollo, this file, if you haven't already seen, implements the current mouse pathfinder for neuro / evil to use. When they enter a coordinate to move to

#

I should put a showcase of the application working with Jippity, in the main README ngl

empty trout
#

broooo

#

it's blocking code segments

#

lemme send on pastebin

wheat reef
#

discord is...?

empty trout
#

yeah the discord server

#

I used the bezier curve formula and brownian noise

#

so it will take a curved path with jittering to make it realistic
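
A minimal sketch of that idea, not the code from the pastebin or the repo: a cubic Bezier curve with a Brownian offset (a running sum of Gaussian steps) added on top. The function names, the `bow` control-point offset and the `fade` envelope that pins the endpoints are my own assumptions.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    # Standard cubic Bezier formula, evaluated per axis.
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def human_like_path(
    start: Point,
    end: Point,
    steps: int = 50,
    jitter: float = 1.5,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Push both control points sideways (perpendicular to the straight line)
    # so the path bows instead of running dead straight.
    bow = rng.uniform(-0.25, 0.25)
    c1 = (start[0] + dx / 3 - dy * bow, start[1] + dy / 3 + dx * bow)
    c2 = (start[0] + 2 * dx / 3 - dy * bow, start[1] + 2 * dy / 3 + dx * bow)

    path: List[Point] = []
    wx = wy = 0.0  # Brownian offset: running sum of Gaussian steps
    for i in range(steps + 1):
        t = i / steps
        bx, by = bezier_point(start, c1, c2, end, t)
        wx += rng.gauss(0.0, jitter)
        wy += rng.gauss(0.0, jitter)
        fade = 4.0 * t * (1.0 - t)  # zero at both ends, so start/end stay exact
        path.append((bx + wx * fade, by + wy * fade))
    return path
```

Passing a seeded `random.Random` makes a path reproducible, which helps when comparing runs against a captcha.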

wheat reef
empty trout
#

can be calculated instantly

#

That'd be really surprising to me but i'll take a look

wheat reef
#

maybe the website that I tested it on, had very strict captchas?

empty trout
#

oh yeah lmao

#

you did actually use a bezier

#

perlin noise instead of brownian

wheat reef
empty trout
#

the sample rate is the issue then

wheat reef
#

probably

empty trout
#

yeah 60 pixels per step is an instant flag

wheat reef
#

I tried increasing the speed

empty trout
#

that's insane

wheat reef
#

to make it look better

empty trout
#

like realllllly high

wheat reef
#

cause it was very slow

#

I move way faster with my mouse ngl

empty trout
#

okay so no no no

#

pyautogui works differently

#

it has a fixed step sample rate

wheat reef
#

oookaayyy

#

yea

#

buutt, I keep on repeatedly calling pyautogui

empty trout
#

so a captcha sees you are teleporting x pixels every certain amount of steps

wheat reef
#

and give it the coordinates pixel by pixel

#

oooh

empty trout
#

idk if my explanation makes sense

wheat reef
empty trout
#

but that's def why

#

you can add a variable sample rate really easily by adding some time.sleep()'s

#

or no mb

wheat reef
#

mmhpm

empty trout
#

u can just set the pixel distance to 1

#

but increase the sample rate a lot

#

then add some very small sleeps

#

like literally 5ms or lower

#

I can send a pr rq it's really easy

#

but it might fuck some other things up by making it go too fast

#

so more testing is required

#

should be mostly fine tho

wheat reef
empty trout
#

dw the pixel values will still be correct

#

it's just that any hard coded speed things will be different

wheat reef
#

I heard pyautogui's pixel values / coordinates depend on screen size
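
pyautogui coordinates are absolute pixels with (0, 0) at the top-left, and `pyautogui.size()` reports the current resolution, so a raw coordinate means different things on different monitors. One hedged way around that is to have the AI send normalized 0..1 positions and convert them locally. `to_pixels` is a hypothetical helper, not something the repo has.

```python
from typing import Tuple


def to_pixels(nx: float, ny: float, screen: Tuple[int, int]) -> Tuple[int, int]:
    """Convert normalized (0..1) coordinates into pixel coordinates.

    `screen` is (width, height), e.g. the value returned by pyautogui.size().
    """
    width, height = screen
    nx = min(1.0, max(0.0, nx))  # clamp so we never leave the screen
    ny = min(1.0, max(0.0, ny))
    return (round(nx * (width - 1)), round(ny * (height - 1)))
```

For example, `to_pixels(1.0, 1.0, (1920, 1080))` lands on the bottom-right pixel regardless of which monitor is attached.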

empty trout
#

cuz pyautogui automatically sleeps for I think... 0.1 seconds

wheat reef
empty trout
#

every mouse library does

wheat reef
#

it means that she has a reason for vedal to get her a good monitor!

#

and a second good pc!

empty trout
#

unironically a worse monitor is actually better

#

lower resolution is better for AIs weirdly

wheat reef
#

because I recommend vedal let neuro use this on a dedicated pc

#

awww

#

sad

empty trout
#

Just means vedal should spoil her with a vintage monitor

#

for like 10k

#

surely

wheat reef
#

ooooh! yay!

empty trout
#

sent a PR

#

I don't have windows so I can't test it myself

#

it MIGHT be like really hard to stop once it's running btw

#

so just be aware

wheat reef
empty trout
#

it's not as easy to hit that top corner as it was before

empty trout
#

I changed the PAUSE constant in pyautogui to 0 instead of the default of 0.1 and changed the pixel step calculation to just always be 1

#

It might still be slow if the failsafe mode forces it to be 0.1

#

and in that case it's cooked either way unless u want to get rid of the failsafe (potentially a bad idea though 💀 )

#

so we'll just have to see
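
A sketch of the approach described above (1-pixel steps plus very small random sleeps), not the actual PR. The mover is passed in as a callable so the logic can be tried without pyautogui or a display; `pixel_steps` and `move_smoothly` are made-up names.

```python
import random
import time
from typing import Callable, Iterator, Optional, Tuple


def pixel_steps(x0: int, y0: int, x1: int, y1: int) -> Iterator[Tuple[int, int]]:
    """Yield points from just after (x0, y0) to (x1, y1), 1 px per axis at most."""
    n = max(abs(x1 - x0), abs(y1 - y0))
    for i in range(1, n + 1):
        yield (x0 + round((x1 - x0) * i / n), y0 + round((y1 - y0) * i / n))


def move_smoothly(
    start: Tuple[int, int],
    end: Tuple[int, int],
    move: Callable[[int, int], None],
    max_sleep: float = 0.005,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    rng = rng or random.Random()
    count = 0
    for x, y in pixel_steps(*start, *end):
        move(x, y)
        # Tiny random pause (5 ms or less) so the step timing is not constant.
        sleep(rng.uniform(0.0, max_sleep))
        count += 1
    return count
```

To drive the real cursor, set `pyautogui.PAUSE = 0` first and pass `pyautogui.moveTo` as `move`; with the default pause of 0.1 seconds every 1-pixel step would take a tenth of a second.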

#

haven't used pyautogui since 2021 so... I kinda forgar

wheat reef
#

forgar

empty trout
#

if it's still slow u can use the windows api mouse mover

#

cuz pyautogui is a bit finicky

#

but easy to use

wheat reef
#

I have no idea what nakurity was on, when coding that

empty trout
#

No it doesn't 🗿

#

did u mean pyautogui uses windows api

wheat reef
wheat reef
#

nakurity vibecoded this drunk or smth

empty trout
#

oh that

#

What else needs to be worked on

wheat reef
#

oooohhh wait- did you mean the actual windows-api?

#

I forgot that existed

empty trout
#

yes

wheat reef
#

along with abstractions

empty trout
#

Ok Deepseek OCR is literally perfect ur gunna love it

#

it's meant exactly for this

wheat reef
#

from LL (low level) calls to higher abstraction calls

#

like closing a window / fullscreening it

empty trout
#

It's just a bit sad cuz it is adding l*tency

#

but I don't think the NeuroAPI supports sending Vision as context

wheat reef
empty trout
#

so there is literally no choice

wheat reef
#

well unless the neuro api added support for that

#

we will need this

empty trout
#

We literally do not need to hardcode anything

wheat reef
#

that was just an example I cooked up

#

not yet used in the actual code

empty trout
#

You mentioned u wanna use NNs for things

#

now is the time

wheat reef
#

I was just experimenting with it

#

trueeeee

empty trout
#

because you can literally build a detection head to detect all key-points

#

and there will never be any tuning or calibration required

wheat reef
#

I am gathering training data with my mouse movements rn

empty trout
#

I actually think my friend has like 100 hours of training data for this

#

he did something similar before

wheat reef
#

realllyyyy?

empty trout
#

yeah it's insane

wheat reef
#

ooooh!

empty trout
#

lemme check rq

#

Oh he didn't even need any complicated AI things for it

#

standard machine vision stuff

#

pattern matching kernels

wheat reef
#

waaaaaaaaaa

empty trout
#

We can use a CNN if u want tho

#

just gotta be careful with l*tency

wheat reef
#

I thought we were gonna use Deepseek OCR?

empty trout
#

I'm not 100% sure if Deepseek OCR can detect things like close buttons

#

I never used it myself

#

but It's perfect for analyzing text on a website

#

because it makes blocks of text inside a textbox instead of word by word

#

and can detect images and isolate them

#

but icons are a tiny bit trickier I assume (idk cuz I never used it tho)

#

so worst case we could make a lightweight vision model that detects that stuff

wheat reef
#

will try it out with discord as an image

empty trout
#

Good idea

wheat reef
#

I did not get a result

#

it errored?

empty trout
#

mine is still processing

#

oh I'm dumb

#

hol on

#

So yeah u can't give it too much data at a time I'm pretty sure

#

cuz there is just sooo much text formatted in a weird way

#

the raw OCR worked but it missed a lot

wheat reef
#

oooh

empty trout
#

but what it can do is convert the screenshot into vision tokens

#

I think

#

I'm reading the paper rn

#

bruh website is cheeks one sec

#

Ima run it on my laptop

wheat reef
#

ookaayyy

empty trout
#

I think we would have to zoom in a LOT for neuro to see it well

#

cuz it can identify images and text but only if it's big really

#

like it saw all of the listed users on the right on the discord

#

not the channel names though 💀

#

and it gives coords for all images

#

Best part is neuro can prompt Deepseek OCR to only get the important parts

#

she only has to say something like "Tell me what is in this image"

#

or "Locate the X button"

#

it can be fancy too like "Locate the call button near the top of the screen next to the video call button"

#

and it'll give a coord pair

#

ok uh 💀 💀

#
>>> "/home/anon/Pictures/Screenshots/Screenshot_2026-01-04-04-32-29_1920x1080.png\n<|grounding|
... >Given the layout of the image. Locate the pin icon"
Added image '/home/anon/Pictures/Screenshots/Screenshot_2026-01-04-04-32-29_1920x1080.png'


<|ref|>image<|/ref|><|det|>[[0, 0, 999, 999]]<|/det|>
#

I think it's cooked
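For reference, a rough sketch of how a grounding reply like the one above could be parsed and checked. It assumes the `det` coordinates are normalized to a 0..999 range (which the full-frame box `[[0, 0, 999, 999]]` suggests); `parse_grounding` and `is_whole_image` are hypothetical helper names.

```python
import re

# Matches one <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2], ...]<|/det|> group.
GROUNDING_RE = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(\[\[.*?\]\])<\|/det\|>", re.S
)
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_grounding(reply, width, height, scale=999):
    """Return [(label, (x1, y1, x2, y2)), ...] in pixel coordinates."""
    results = []
    for label, boxes in GROUNDING_RE.findall(reply):
        for box in BOX_RE.findall(boxes):
            x1, y1, x2, y2 = (int(v) for v in box)
            results.append((
                label,
                (round(x1 / scale * width), round(y1 / scale * height),
                 round(x2 / scale * width), round(y2 / scale * height)),
            ))
    return results


def is_whole_image(box, width, height):
    """True when a box spans the full frame, i.e. nothing was actually localised."""
    return box == (0, 0, width, height)
```

A box that covers the whole screenshot is a cheap signal that the model did not find the target, so the integration can retry with a crop instead of clicking the middle of the screen.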

#

lemme see if cropping helps 😭

#

ima delete and resend image so it doesn't get confusing one sec

#

this is 1024x1024

#

surely that fixes it

wheat reef
#

surely

#

(I just got back from eating sushi, yummmy!)

#

I just tested the mouse thingy

#

it worked

#

now I'll merge the pr

#

wait- no

empty trout
#

Actually?

#

is it good?

wheat reef
#

the rust binary did not work

#

but the python one worked

empty trout
#

oh

wheat reef
#

I was doing the last check to see if it compiles

#

you pulled from the master branch

#

and not the dev branch

empty trout
#

oh mb

wheat reef
#

the master branch has outdated changes

empty trout
#

didn't see the dev branch

#

Dumb question but how important is seeing text for this project?

wheat reef
#

very important

empty trout
#

Because there is a real-time AI model that can say every single object in an image

wheat reef
#

because I'll also use it for registering disposable higher abstraction actions

empty trout
#

but can't really read text

#

yeah I see

wheat reef
#

when neuro / evil enables it

#

and I will use an LLM to determine those actions sometimes too, but most of the time it'll be the algorithm registering them

empty trout
#

Holy we might be dumb

#

we as in me

wheat reef
#

wdym?

empty trout
#

Since it's interfacing with HTML

#

there is text inside the HTML tags bruh

#

we don't need OCR

wheat reef
#

THAT IS GENIUS

#

I am also dumb

#

but what about other games?

#

other apps

#

neuro doesn't even use discord

empty trout
#

Oh I forgot that this is for the whole computer not just web browsing

#

yeah that'd be an issue

wheat reef
#

unless we implement a user interface for neuro's discord stuff

empty trout
#

I feel like surely Windows has some kind of accessibility thing

wheat reef
#

yea but games with custom render engines won't show the UI

empty trout
#

where it can ...do stuff

wheat reef
#

to windows

empty trout
#

hm yeah

wheat reef
#

OCR is best if you think about it

empty trout
#

Yeah

wheat reef
#

did you need me to fix the PR?

#

or will you fix it?

empty trout
#

This is what yolo-world looks like

#

without prompting

#

it just names and locates every object

#

and it's in real time

#

BUT

#

we can make it detect "text"

#

then we crop that image bit into the raw OCR model (deepseek might be overkill for this)

#

and it can just read what it says and we will have a bounding box of where it is too
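The detect-then-read idea above could look roughly like this. It is a sketch with stand-ins: `detections` is whatever the object detector returns (the dict layout is an assumption), `ocr` is any callable that reads a cropped image, and the image is a plain list of rows so the example runs without extra libraries.

```python
def crop(image, box):
    """Crop a row-major image (list of rows) to box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def read_text_regions(image, detections, ocr):
    """Run OCR only on regions the detector labelled "text".

    detections: [{"label": str, "box": (x1, y1, x2, y2)}, ...] from any detector.
    ocr: callable taking a cropped image and returning a string.
    """
    results = []
    for det in detections:
        if det["label"] != "text":
            continue
        results.append({"box": det["box"], "text": ocr(crop(image, det["box"]))})
    return results
```

Because the box comes from the detector, the OCR step only has to return a string, which is why a lighter raw OCR model may be enough here.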

wheat reef
#

great idea!!

empty trout
#

Then yeah we can send a context message to neuro listing everything on the monitor

wheat reef
#

yea

empty trout
#

like object names

wheat reef
#

mmhpm

empty trout
#

and she can "investigate" something ig

wheat reef
#

I'll go start on that now

empty trout
#

and search

wheat reef
#

what lang should we use for the deepseek OCR impl?

empty trout
#

cuz it might not even be needed potentially

#

might be overkill

wheat reef
#

yes, and object bounding box. and the screen region they are in

#

icon recognition too

#

button recon too

#

and a lot of other things

#

like, everything on the pc possible

empty trout
#

so I am thinking of using YOLO-world for the object naming

#

which has a python api

wheat reef
#

hmmm that's a good idea

empty trout
#

the text reading would be separate

#

which is kinda annoying

#

but wtv

wheat reef
#

we could just make it a binary

#

and have something call for it

#

that's how this codebase works

#

everything is its own binary

#

whenever we need to compile smth

empty trout
#

if u wanna use deepseek ocr then i'd recommend either an ollama server or vllm (vllm is generally faster but a bit more annoying to set up correctly)

empty trout
wheat reef
#

do you think we'd need to separate the integration into a server-client model?

#

since I assume vedal would prefer all his LLMs in a separate server (e.g. the server neuro runs on)

empty trout
#

wait my PR looks like it is on the dev branch

wheat reef
#

our integration would already need its own pc for neuro / evil to use

empty trout
#

am I trippin?

wheat reef
#

this was an issue only found on the master branch of my repo I think

steel dagger
empty trout
#

Vedal is going to lose his shit if he sees this btw

#

not in a good way 😵‍💫

wheat reef
#

why tho?

empty trout
#

cuz l*tency might be insane

#

2 AI models 🗿

wheat reef
#

it's nooottttt

#

trust

empty trout
#

also hi Ktrain

wheat reef
#

hi KTrain!

steel dagger
wheat reef
empty trout
#

it's insanely cooked rn icl

#

but uh

steel dagger
#

also the fact that everything is its own binary is gonna drive him insane

empty trout
#

oh yeah

#

holy recompiling

steel dagger
#

like imagine trying to start the integration up and one of the binaries crash

wheat reef
#

I spent like 2 days trying to figure out how to link Go with rust, and gave up

steel dagger
#

and then now you gotta restart the entire sequence

wheat reef
#

so this was my solution

wheat reef
empty trout
#

Is there even any Go code 😭

empty trout
steel dagger
empty trout
#

dw we can surely always refactor later

wheat reef
#

just bundle nodejs with the package :D

#

simple fix

empty trout
#

also vedal would 99999% not be down to run extra LLMs

#

but uhhh idrk how else to do OCR

#

there is literally no vision in the API afaik

steel dagger
wheat reef
#

ty for a new idea

#

that is a great idea!

#

diversity-!

steel dagger
empty trout
#

🗿

wheat reef
#

noooo, I was kidding!

empty trout
#

Anything is possible if u remove all elegance of the code

steel dagger
#

what in the hell

empty trout
#

I seen worse dw

#

not even in the top 50 worst I seen

wheat reef
#

I haven't even finished the C impl for neuro-desktop yet

#

I had a process handler written in C

empty trout
#

holy

steel dagger
#

Bro, just, rewrite the entire project in one language and just stick to it...

wheat reef
#

haven't connected it to the main code yet

#

but... diversity....

empty trout
#

No C/C++ on god

steel dagger
wheat reef
#

I'm sorry....

empty trout
#

Real talk though how tf are we supposed to handle vision

#

cuz vedal is NOT spinning up a random ahh vllm server for this

steel dagger
#

tell neuro to tell vedal to remember to turn on vision

#

because he clearly cannot remember to turn on vision

empty trout
#

there is no documentation for vision in the API I think

#

I barely checked tho tbf so I could be wrong

steel dagger
#

I think like one or more times he has NOT turned on vision for neuro art streams

empty trout
#

but I don't think there is

steel dagger
empty trout
#

ok that's cooked

wheat reef
#

but we would need to detect and give neuro screen coordinates

empty trout
#

it's possible if he didn't give a shit about the code quality or tech debt

wheat reef
#

that is not a fix

#

and higher abstract controls also require we detect screen coordinates for neuro too

empty trout
#

Basically we can use YOLO-world which will list all objects and their coordinates in text form

steel dagger
#

the fix is to ask vedal for at least a packet that accepts a b64-encoded image in a piece of data that can be sent to neuro
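Purely as an illustration of what such a packet might look like if it ever existed: the `vision/frame` command and every field under `data` are invented here, and the command/game/data envelope is only assumed to mirror the shape of existing API messages. Nothing like this is documented.

```python
import base64
import json


def build_image_packet(game, png_bytes, note=""):
    """Build a JSON message carrying a base64-encoded image.

    The "vision/frame" command and its data fields are invented for this
    sketch; no such packet is documented.
    """
    return json.dumps({
        "command": "vision/frame",  # hypothetical command name
        "game": game,
        "data": {
            "mime": "image/png",
            "image_b64": base64.b64encode(png_bytes).decode("ascii"),
            "note": note,
        },
    })
```

Base64 inflates the payload by about a third, so a full 1920x1080 screenshot per message would add to the latency concern raised earlier.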

empty trout
#

That would simplify a lotttt

#

but let's be so fr

wheat reef
#

yea!

empty trout
#

+3 years for him to do that

steel dagger
#

also, ngl, the api was like

empty trout
#

too busy to implement that prolly

steel dagger
#

NOT meant for tools

#

or like uh

#

we need a way to refer to non-game integrations

#

that isn't tools

#

what's a good name

empty trout
#

utilities? 💀

wheat reef
#

technical integrations?

empty trout
#

^

steel dagger
empty trout
#

Wait

#

I might be dumb

#

no I def am

#

Neuro already sees the desktop 🤦

empty trout
#

I mean kinda

steel dagger
empty trout
#

also W shoutout

empty trout
#

not via api

#

just automatically

steel dagger
empty trout
#

wdym

wheat reef
#

in a virtual machine?

empty trout
#

that doesn't matter

steel dagger
#

if neuro-desktop in vm -> vm is just a window on screen -> the vm isn't the only thing scanned

#

-> can confuse nwero/eliv

empty trout
#

Remember streams where vedal was like "What is on my screen"

#

and showed a sea turtle

steel dagger
#

-> makes this already clunky integration even more brittle

empty trout
#

the twins already see the desktop when vision is on

steel dagger
empty trout
#

it was kinda big lol

#

their vision sucks

wheat reef
#

uhmmmm

#

what about the coordinates

#

vision doesn't give coords

empty trout
#

but removes the need for that to be our problem

#

true

wheat reef
#

that is required for the integration to move the mouse

#

along with the integration's higher abstract control

empty trout
#

it's a worse solution but less work

#

and more "elegant" even though it sucks

steel dagger
empty trout
#

coords are best actually

#

not for neuro to interact with directly

steel dagger
#

???

empty trout
#

like neuro can say "I want to move the mouse to <Object name>"

#

different levels of abstraction

#

then we have the lookup table
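A small sketch of that lookup table, assuming the detector hands back a name and a pixel box per object. `ObjectTable` is a hypothetical name; the point is that Neuro only ever deals in names while the coordinates stay on the integration side.

```python
class ObjectTable:
    """Maps object names (what Neuro sees) to pixel boxes (what she does not)."""

    def __init__(self):
        self._boxes = {}

    def register(self, name, box):
        """Store box = (x1, y1, x2, y2) under a case-insensitive name."""
        self._boxes[name.lower()] = box

    def names(self):
        """Names that can be shown to Neuro."""
        return sorted(self._boxes)

    def click_point(self, name):
        """Return the centre of the named object's box, or None if unknown."""
        box = self._boxes.get(name.lower())
        if box is None:
            return None
        x1, y1, x2, y2 = box
        return ((x1 + x2) // 2, (y1 + y2) // 2)
```

"Move the mouse to Close button" then becomes a `click_point("Close button")` lookup followed by an ordinary mouse move.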

wheat reef
#

you guys talk it out, I'll go watch

steel dagger
empty trout
#

yes

#

we can do that

#

but it requires running a separate AI model

steel dagger
#

yes ik

empty trout
#

which just sounds like Vedal wouldn't like it

#

even though it's a good idea

#

cuz Tutel

steel dagger
#

well that's a bit of a problem

#

because if you give her raw coords to control she will not be able to do anything

empty trout
#

You see this?

#

This is amazing

#

runs in real time

steel dagger
#

like to her, an arbitrary pixel on the "screen" she can "see" can either be 0,0 or 314159, 2147

empty trout
#

and gives object names without requiring training or prompting

#

turns into a list

steel dagger
#

you still need to describe spacing to her

empty trout
#

yes that's okay

steel dagger
#

which can be useful in cases of moving windows

empty trout
#

Perhaps a relational graph could be useful

#

ObjA is a node

steel dagger
#

how would you explain that in text to neuro

empty trout
#

with relation to ObjB which has the connection of "Left of" to ObjA

#

"ObjB is left of ObjA"

#

she can query for additional information as well

#

with getdistance() or something

#

everything needs to be in relative "coords"
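The "ObjB is left of ObjA" statements and a `getdistance()`-style query could be derived from bounding boxes along these lines. This is a sketch: the function names are placeholders, the relation is decided from box centres only, and distance is reported as a fraction of the screen diagonal so no pixel values leak through.

```python
import math


def centre(box):
    """Centre point of box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def describe_relation(name_a, box_a, name_b, box_b):
    """Return a sentence such as "ObjB is left of ObjA" based on box centres."""
    (ax, ay), (bx, by) = centre(box_a), centre(box_b)
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        side = "left of" if dx < 0 else "right of"
    else:
        # Screen y grows downward, so a smaller y means higher on screen.
        side = "above" if dy < 0 else "below"
    return f"{name_b} is {side} {name_a}"


def get_distance(box_a, box_b, screen_w, screen_h):
    """Centre-to-centre distance as a fraction of the screen diagonal (0..1)."""
    (ax, ay), (bx, by) = centre(box_a), centre(box_b)
    return math.hypot(bx - ax, by - ay) / math.hypot(screen_w, screen_h)
```

Since the sentences are recomputed from the current boxes, moving a window just changes the next description rather than requiring any stored graph to be patched.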

steel dagger
#

and say she moves A{} out of the way so for our intents and purposes, the line is B{} C{}

#

or sorry, B{} C{}

#

how would you now convey that to neuro?

empty trout
#

Why would she care

steel dagger
#

you can't exactly just give her arbitrary "units"

empty trout
#

she aint playing legos

#

It's simple desktop navigation

steel dagger
empty trout
#

That's just unfeasible with current tech

steel dagger
#

or fucking adobe after effects idfk

steel dagger
empty trout
#

Like our scope should not be even close to that

#

Neuro can barely even see

#

only understand entire images cuz it's compressing into a latent

#

no spatial understanding

#

so this is already a MASSIVE upgrade

steel dagger
#

oh also have fun with uhhh whatever the fuck it's called uhm

#

context management!

empty trout
#

🔫

#

It'll be fun

#

Can't wait for this to prolly never be used by vedal anyways

#

or idk

#

cuz he been using VSCode integration recently

#

so maybe

#

gunna need a lot more experienced hands

steel dagger
#

anyways

#

what should we call non-game integrations?

#

I don't know if "utilities" help/is a good candidate

empty trout
#

Tech Integrations?

#

It's very funny because if the integration is even slightly cooked the entire twitch chat starts FLAMING it

#

It's very difficult to context engineer everything perfectly

#

and even if the things work, the LLMs being able to determine which tool to use for the task isn't always obvious

#

Visual interaction would be like this:

Commands:

1. List (lists all objects in frame). No inputs. Outputs a list of English object names.
2. Search (finds the object that matches a text description). Takes an English object name. Outputs a basic structural awareness analysis ("This object is to the left of X and to the right of Y") and saves the object to the current list of objects.
3. Click (clicks an object). Takes an English object name. Outputs a success or fail response. Side effect: clicks the hidden coords saved in the object lookup table (which Neuro cannot see).
4. FindText (specifically searches for text). No inputs. Outputs a basic structural awareness analysis of textbox locations and also the text inside the text boxes.
#

Maybe some things like "drag" and "right click"

#

but this is the gist

#

Also Neuro can search for things not found in the list

#

because the name could be different

#

and also list isn't 100% accurate (although it's usually pretty good)
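A rough sketch of how List, Search and Click from the command list could fit together. Everything here is an assumption for illustration: `VisualSession` is a made-up class, the detections are a plain name-to-box dict, and Search is reduced to fuzzy matching over already detected names with `difflib` rather than re-querying a vision model. FindText is left out.

```python
import difflib


class VisualSession:
    """Sketch of the List / Search / Click commands over a hidden coordinate table."""

    def __init__(self, detections, click_fn):
        # detections: {name: (x1, y1, x2, y2)} from whichever detector is used.
        self._all = dict(detections)
        self._known = {}        # objects Neuro has listed or searched for
        self._click = click_fn  # callable(x, y), e.g. a mouse mover

    def list_objects(self):
        """List: return every detected object name and remember them."""
        self._known.update(self._all)
        return sorted(self._all)

    def search(self, description):
        """Search: fuzzy-match a description, since the wording may differ."""
        match = difflib.get_close_matches(
            description, list(self._all), n=1, cutoff=0.5
        )
        if not match:
            return None
        self._known[match[0]] = self._all[match[0]]
        return match[0]

    def click(self, name):
        """Click: press the centre of a known object; coords never leave this class."""
        box = self._known.get(name)
        if box is None:
            return "fail"
        x1, y1, x2, y2 = box
        self._click((x1 + x2) // 2, (y1 + y2) // 2)
        return "success"
```

Click only works on objects that List or Search has already surfaced, which keeps the response to a plain success or fail and avoids guessing at names Neuro never saw.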

wheat reef
empty trout
#

I can work on it tomorrow. It's almost 6am here despair