#gui-automation

languid vale
vague craterBOT
vapid topaz
#

😡

silent ermine
gentle dune
#

When I ask clawdbot to automate using peekaboo or cliclick, it always clicks on the wrong coordinates. Is there any way to fix this?
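One common cause of consistently offset clicks on macOS, offered as a guess rather than a confirmed diagnosis, is a scale mismatch. On a Retina display a screenshot is captured in physical pixels, but cliclick clicks in logical points, so coordinates read off the image land roughly twice as far from the top-left corner as intended. The same thing happens if the screenshot is resized before it reaches the model. Here is a minimal sketch of the conversion. It assumes the screenshot covers the whole display and the scale is the same on both axes, and the function name is made up for illustration:

```python
def pixels_to_points(x_px: float, y_px: float, screenshot_width_px: int, screen_width_pt: int) -> tuple[int, int]:
    """Convert a coordinate read off a screenshot into logical screen points.

    Assumes the screenshot covers the full display and that the scale factor
    is uniform on both axes (e.g. 2.0 on a typical Retina panel).
    """
    scale = screenshot_width_px / screen_width_pt
    return round(x_px / scale), round(y_px / scale)


if __name__ == "__main__":
    # A 2880-px-wide Retina screenshot of a display that is 1440 points wide.
    print(pixels_to_points(1000, 600, 2880, 1440))  # (500, 300)
```

Pass the width of the image the model actually saw as `screenshot_width_px`. If the pipeline downscales images before sending them, use the downscaled width, not the capture width.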

floral juniper
#

I can’t seem to grant accessibility access to peekaboo. What am I missing?

silent ermine
modern pelican
#

Every time I type "1" to clawdbot via Telegram, it starts over, says it just came online, and forgets everything. Why?

storm sonnet
jolly yarrow
#

Hello everyone
I would like my clawd to run tests on an app on Windows (clicking buttons, changing screens) while updating C# code.
As it's not web, I can't use Puppeteer/Playwright. What do you use for this use case?

silent ermine
languid vale
#

For GUI Automation, I added Chat interface on ClawdBody. You don't need to setup Telegram/WhatsApp. You can directly chat and make OpenClaw do things now!

livid lodge
#

Nice, gonna make an Expo app and chat with it via STT

orchid sierra
#

Hello, does anyone use Telegram in groups? I want other people in the group to be able to talk without mentioning the bot and have the bot respond to them. Is that possible? I tried several things, such as disabling privacy mode, allowing open, and setting the group policy to open, but it still doesn't work. I'm on the beta version and it still doesn't work. Does anyone know how to fix it?

spring harbor
rustic gate
languid vale
modern dune
#

Has there been any discussion here about computer-use models that run locally, using CV on GUI screenshots to coordinate tool-calling actions?

#

It's based on Alibaba's Qwen2.5-VL vision-language model and has been optimized for Microsoft Windows usage. If anyone wants to collaborate on the implementation and some dev, let me know; I'm currently looking into this space.

split orchid
#

Does Openclaw excel with GUI access? If I were to set it up on Arch Linux (terminal only; no GUI desktop) would it be much more limited in what it could do? Or could it be nearly as effective with just CLI tools?

trim python
#

I just made a new tool so that the bot can take screenshots and click on stuff on my computer yay

still matrix
#

Hey, looking to run an iOS app on my computer and have the bot click around it, like a product showcase. I tried cliclick, but it's a bit slow and not great with UI gestures, especially for iOS apps (running as a Mac app downloaded from the Mac App Store).

anyone know of any other good software for giving it UI control?
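cliclick can express a swipe as a press, a series of drag moves, and a release, using its `dd:`, `dm:` and `du:` commands with `w:` waits in between. Sending many small moves instead of one jump may help apps that expect a continuous gesture. Here is a sketch that builds such a command line. It assumes non-negative absolute coordinates in points (cliclick reads a leading `+` or `-` as a relative offset) and a cliclick version that supports `dm:`, so check `cliclick -h` first. The helper names are hypothetical:

```python
import shutil
import subprocess


def swipe_args(start: tuple[int, int], end: tuple[int, int], steps: int = 10, wait_ms: int = 20) -> list[str]:
    """Build cliclick arguments for a press, drag, release swipe.

    Uses cliclick's dd: (drag down), dm: (drag move), du: (drag up) and w: (wait ms).
    Coordinates are assumed to be non-negative absolute points.
    """
    (x0, y0), (x1, y1) = start, end
    args = [f"dd:{x0},{y0}"]
    for i in range(1, steps):
        # Linear interpolation between start and end for the intermediate moves.
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        args += [f"w:{wait_ms}", f"dm:{x},{y}"]
    args += [f"w:{wait_ms}", f"du:{x1},{y1}"]
    return args


def swipe(start: tuple[int, int], end: tuple[int, int], **kwargs) -> None:
    """Run the swipe through cliclick (macOS only)."""
    if shutil.which("cliclick") is None:
        raise RuntimeError("cliclick is not on PATH")
    subprocess.run(["cliclick", *swipe_args(start, end, **kwargs)], check=True)
```

For example, `swipe((100, 500), (100, 200))` would scroll content upward in a typical list. Raising `steps` or `wait_ms` slows the gesture down.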

trim python
vapid topaz
#

True, but it takes some time

#

I just finished setting up my Discord

#

ass crazy work

trim python
# vapid topaz nice i gona check

I'm currently working on that very code. I have a multi-monitor setup, and when I ask it to take a screenshot it just uses my default primary monitor, so I'm adjusting the code so I can pick the monitor.
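One way to pick the monitor, sketched under assumptions: this uses the third-party `mss` library (`pip install mss`), which is not necessarily what this tool uses. In `mss`, `monitors[0]` is the union of all screens and real monitors start at index 1. The part that is easy to miss is that a click point read off a single-monitor screenshot has to be shifted by that monitor's `left`/`top` offset before it is used as a global desktop coordinate. Display scaling (Retina, Windows DPI settings) can add a further scale factor that this sketch does not handle. Function names are illustrative:

```python
def to_global(x: int, y: int, monitor: dict) -> tuple[int, int]:
    """Map a point inside a single-monitor screenshot to global desktop coordinates."""
    return monitor["left"] + x, monitor["top"] + y


def capture_monitor(index: int, path: str) -> dict:
    """Save a PNG of one monitor and return its geometry (left, top, width, height).

    Requires `pip install mss`. Index 0 is all monitors combined; 1 is the first real monitor.
    """
    import mss
    import mss.tools

    with mss.mss() as sct:
        monitor = sct.monitors[index]
        shot = sct.grab(monitor)
        mss.tools.to_png(shot.rgb, shot.size, output=path)
        return dict(monitor)
```

Usage: `geom = capture_monitor(2, "screen2.png")`, send the image to the model, then click at `to_global(x, y, geom)`. A monitor placed left of the primary has a negative `left`, which the addition handles naturally.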

trim python
vapid topaz
#

yeeeeeeees

trim python
#

I hate how Discord relies on exec-approvals.json and I have to fight the whole exec approval system. It's such a mess.

#

I think I've set up maybe 8 or 9 clawdbots across all the systems I've messed with.

vapid topaz
#

Just to do some toggle boxes, it's crazy using one OpenClaw just for Discord stuff.

trim python
#

lol my latest Discord bot was just a vibe-coded fun project where my friend had an idea and I'm like, fuck it, I'll just code it for funsies

#

Basically the bot uses its heartbeat to generate a report of new images, compares it to my friend's and my preferences, and posts them in #suggestions. Then we reply and rate the images 1 to 10, and the bot logs it and tracks our likes to fine-tune suggestions. Finally, we have this other room where we create and define "face templates", and we're like "hey bot, I want you to apply this suggestion to so-and-so face", and it queues it up with ComfyUI and posts the remixed image in another group. lol it's silly but it was fun to code

abstract hound
#

Hello guys, I finally got my claw up and running after days of debugging. Is there anything you'd recommend I have on a VPS, like which skills to install that actually work well? Thank you!

tiny sky
#

who's had success so far training Openclaw to take over their work? this bot couldn't have come along at a better time, since it integrates nicely into a little project i've been working on to enable LLMs to pilot my work laptop in a way that my boss can't detect

wispy hull
tiny sky
# wispy hull Working to do that now! How are you doing that?

I turned a raspberry pi into a bluetooth HID receiver, and wrote a program to let an LLM send mouse and keyboard inputs based on what it sees. I just added api server endpoints for openclaw to send HID commands directly to the target computer, and though it takes a lot of training, it's starting to get the hang of things.
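The actual command format of this device wasn't shared. As a reference point, here is a sketch of the standard 8-byte USB HID boot-protocol keyboard report (modifier byte, reserved byte, up to six key usage IDs from the Keyboard/Keypad usage page). A key press is one report with the key set, followed by an all-zero report to release it. Over Bluetooth HID the report is usually wrapped with extra header bytes that depend on the stack, which is not shown. The mapping covers only letters, digits, space, and Enter, and the function names are hypothetical:

```python
LEFT_SHIFT = 0x02  # modifier bit for Left Shift in a boot keyboard report


def keycode_for(ch: str) -> tuple[int, int]:
    """Return (modifier, usage_id) for an ASCII letter, digit, space or newline."""
    if "a" <= ch <= "z":
        return 0, 0x04 + ord(ch) - ord("a")           # a..z -> 0x04..0x1D
    if "A" <= ch <= "Z":
        return LEFT_SHIFT, 0x04 + ord(ch) - ord("A")  # same keys with Shift held
    if "1" <= ch <= "9":
        return 0, 0x1E + ord(ch) - ord("1")           # 1..9 -> 0x1E..0x26
    if ch == "0":
        return 0, 0x27
    if ch == " ":
        return 0, 0x2C
    if ch == "\n":
        return 0, 0x28  # Enter
    raise ValueError(f"no mapping for {ch!r} in this sketch")


def press_and_release(ch: str) -> list[bytes]:
    """Two 8-byte boot-protocol reports: key down, then all keys up."""
    modifier, usage = keycode_for(ch)
    down = bytes([modifier, 0, usage, 0, 0, 0, 0, 0])
    up = bytes(8)
    return [down, up]
```

Typing a string is then just `[r for ch in text for r in press_and_release(ch)]` written to the HID device in order.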

wispy hull
tiny sky
tiny sky
# wispy hull How are you feeding it the images/capturing the screen?

before openclaw I was interacting with the LLM directly in my program. unfortunately it wasn't as reliable with complex tasks involving many steps, despite some features for breaking larger tasks into smaller simpler ones and executing them sequentially. now I have openclaw for that, and it's not bad

tiny sky
wispy hull
#

My solution was actually to give up 😂

I don’t have a company laptop, so I considered capturing the desktop and streaming it back to a dedicated PC where OpenClaw could do the same thing you’ve set it up to do. Ultimately, I decided not to and turned OpenClaw into more of an assistant. Right now it just adds stuff to the cart for me.

I’ve been working on giving it access to my work email so it can help draft replies and manage quote requests and vendor communication. I only recently got my work email running on the dedicated PC for OpenClaw, but I’ve been stuck trying to get it to reliably browse and manipulate the browser or desktop. I got sidetracked until I can come up with something better that lets it use the entire GUI.

I’ve wanted to integrate the built-in Windows accessibility voice commands, but I haven’t had the time. Then I planned to take a weekend to integrate Windows-MCP and build a skill for OpenClaw to use, but I haven’t gotten around to it.

I did hand the task to OpenClaw at one point, but it got stuck and never responded. I had to terminate it and delete what it did, since I wasn’t overseeing any of it—and what it produced looked like garbage.

#

I think your screenshot method is the best choice right now. I’d love to get windows-mcp working but that will take more time.

How fast are you able to get it to work with the screen shot>llm>coordinates method?

#

Also are you using a model that excels in coordinates? I think I heard Fara is good at using coordinates or something of the sort

tiny sky
#

that's rough. you could do what I'm doing if you brought a laptop to your workplace, but it might be a bit difficult to explain to others haha. when I'm directly prompting the LLM I use GPT 5.2 because it's quite good at formatting its responses in json. It's too expensive to use with openclaw though, so I'm trying different models with mixed results.

#

it's quite fast to send the screenshots and respond with inputs, with the longest response times being around 20-30 seconds

#

that's both when i manually prompt an LLM or when I let openclaw take the wheel

wispy hull
#

Hmmm I guess I could bring a laptop and do it that way. Gonna try and get something up and running with that screenshot method and 5.2 to see how it works out!

Let me know if you find another model that works for you! I'm also on 5.2 and to be honest I'm not loving it. Feels like I have to prompt it over and over, and it fails to call tools sometimes.

Idk will report back if I get around to it.

tiny sky
#

will do! i had some issues with consistency of output at first, but after beefing up the system prompt it got a lot better. i even included every single HID command in it, since it was so crucial for my device for those to be accurate. not sure how you'd do it if you went with another input method, but i definitely recommend being crazy specific in the system prompt
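A complementary guard to a very specific system prompt is to validate the model's JSON before anything reaches the device, so a malformed or invented command is rejected instead of executed. Here is a minimal sketch. The command vocabulary (`move`, `click`, `type`, `key`) and the `{"actions": [...]}` shape are assumptions for illustration, and the real list should mirror whatever the system prompt enumerates:

```python
import json

# Hypothetical command vocabulary: the real list should mirror the system prompt exactly.
ALLOWED = {
    "move": {"x", "y"},
    "click": {"button"},
    "type": {"text"},
    "key": {"name"},
}


def parse_actions(raw: str) -> list[dict]:
    """Parse the model's JSON reply and reject anything outside the allowed vocabulary."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    actions = data.get("actions") if isinstance(data, dict) else data
    if not isinstance(actions, list):
        raise ValueError("expected a list of actions")
    for action in actions:
        if not isinstance(action, dict):
            raise ValueError(f"action is not an object: {action!r}")
        cmd = action.get("cmd")
        if not isinstance(cmd, str) or cmd not in ALLOWED:
            raise ValueError(f"unknown command: {cmd!r}")
        missing = ALLOWED[cmd] - set(action)
        if missing:
            raise ValueError(f"{cmd} is missing {sorted(missing)}")
    return actions
```

On a `ValueError`, one option is to send the error message back to the model and ask it to retry, rather than guessing what it meant.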

vast turret
trim python
#

the "easiest" one to set up imo is Telegram and that's where I keep my main bot

vast turret
# trim python Slack - never tried it, never use it WhatsApp - sucks you gotta find a second ph...

Thanks, that's pretty much what I do. I'm not gonna lie, I happen to have a second phone number because of a business thing, so I used it, and you're right, because I tried the first way and talking to yourself is just weird as shit. But do you find you get more out of Discord? Is it easier or more feature-rich working on Discord than Telegram? I've coded on Telegram and that's pretty straightforward. Discord?

trim python
#

Yeah, Discord is way better than Telegram. If you're willing to wrestle with the exec tool gods, go for the gusto and set it up on Discord; the experience feels way better.

#

For the record, I just set it up so my gf's WhatsApp is my clawdbot lol

coral briar
#

I liked Telegram, but Discord definitely is way more powerful. Especially if you want to set up sub-agents with their own channels. For instance you can just have your bot act as a specific employee, and route to certain channels. Like a UI/UX designer in a design channel, web dev in a dev channel. As long as the bot knows which channel ID to talk to, it'll know where and how to respond.
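The routing idea, stripped to its core, is a lookup from channel ID to a role. Here is a sketch of that idea only. It is not how OpenClaw configures agents internally, and the channel IDs and persona names are placeholders:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    system_prompt: str


# Placeholder channel IDs; Discord IDs are snowflakes, kept as strings here.
ROUTES = {
    "111111111111111111": Persona("ui-ux-designer", "You are the team's UI/UX designer."),
    "222222222222222222": Persona("web-dev", "You are the team's web developer."),
}
DEFAULT = Persona("assistant", "You are a general assistant.")


def persona_for(channel_id: str) -> Persona:
    """Pick the role for an incoming message based on the channel it arrived in."""
    return ROUTES.get(channel_id, DEFAULT)
```

Keeping the IDs as strings avoids any precision issues if the values ever pass through JavaScript, where snowflakes exceed the safe integer range.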

rare crest
fierce ocean
#

Discord integration is not very well documented, but after messing with it for a little while, I figured it out.

rare crest
#

Could I get a quick hint, by any chance?

coral briar
karmic shuttle
# rare crest Could I get a quick hint, by any chance?
  1. Go to https://discord.com/developers and log in.
  2. Follow the official documentation at https://docs.openclaw.ai/channels/discord#discord to set up your bot.
  3. Invite the created bot to your chosen Discord server.
  4. In the OAuth2 section (where you give your bot permissions), copy the generated URL at the bottom and paste it into your browser.
  5. Follow the instructions.

You need your Discord server ID and channel ID (turn on Developer Mode under User Settings > Advanced, then right-click the server or channel and choose Copy ID at the bottom of the menu), plus your bot token (you find this when setting up your bot at discord.dev).
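The invite URL from step 4 is an ordinary OAuth2 authorize URL with the application's client ID, the `bot` scope, and a permissions bitfield. Here is a sketch that builds one, using the permission bit values from Discord's developer documentation. The particular set of permissions is only an example, and the portal's own generator may add extra query parameters. Depending on what the bot needs to read, it will likely also need the Message Content privileged intent switched on in the Bot settings:

```python
from urllib.parse import urlencode

# Permission bit flags as listed in Discord's developer documentation.
ADD_REACTIONS = 1 << 6
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
READ_MESSAGE_HISTORY = 1 << 16


def invite_url(client_id: str, permissions: int) -> str:
    """Build an OAuth2 bot-invite URL similar to the one the developer portal generates."""
    query = urlencode({"client_id": client_id, "scope": "bot", "permissions": permissions})
    return f"https://discord.com/oauth2/authorize?{query}"


if __name__ == "__main__":
    perms = VIEW_CHANNEL | SEND_MESSAGES | READ_MESSAGE_HISTORY | ADD_REACTIONS
    print(perms)  # 68672
    print(invite_url("YOUR_APPLICATION_ID", perms))
```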

steep parrot
#

what is gui-automation vs browser-automation? we should add a channel description.

knotty birch
vague craterBOT
# knotty birch https://x.com/srisanth2004/status/2018384078861676571

FELT THE AGI MOMENT... Building a lite version of @openclaw which vibe coded for me... on @Lovable

i am working on a lite version of OpenClaw, Today i gave vision which means it can see what i see... Man, I asked for neo-brutalism website in whatsapp...

This guy, navigated to my brave browser, opened lovable.dev, Wrongly typed the prompt, corrected it again, accepted the plan created by lovalble, Then opened in a full screen... (Fun part no DOM, Full Cursor and key strokes Control)😂

When i am Back from shower i see a website vibe coded by AI itself🙃. this is fucking unbelievable for me...

Now it can also open claude code, antigravity, cursor, you name it. and can do testing, development 100x than you by using the computer on behalf of you...

HE CAN FUCKING SEE THE SCREEN...!😵‍💫

Do support this project if anyone see this intersting:
github.com/Pr0fe5s0r/Lite

knotty birch
#

I made a GUI automation... I tried to build it in OpenClaw, but I'm not good at JS, so I built a lite version in Python and integrated GUI automation... Complete control of my cursor and keystrokes...

sleek rampart
#

Is anyone looking for a developer?

soft heart
#

someone explain what this channel is about to me? i dont get what GUI automation is

zinc patio
#

Ya, anyone looking to do a project? I'm not a dev, but I'd like to get a directory up. I know there's one already, but ya... hmu

silent ermine
gentle dune
molten comet
#

Hey guys, I would like to use my Clawd to do complex content creation (coming up with psychological strength model etc). Currently my workflow for this is:

  1. Research a topic or come up with drafts myself
  2. Condense findings into Markdown in the project folder.
  3. Iterate and create new content building on previous Markdown files.

I switch between Slack / Clawdbot Control for chat and Github to view created .md files, always having to pull recent files to view.

To make this easier, I thought of expanding the Clawdbot Control Chat UI to also be able to view my files hierarchy (esp. projects folder) on the left sidebar (below "Chat", above "Control") and integrate a .md viewer / editor in the middle, chat on the side (similar to a CLI layout).

Has someone done this?
Is there an easier way to make my workflow smooth without forking the Clawdbot UI?

hollow mesa
#

What’s the best way to make it control Mac?

green arrow
#

But yes, I am also going to build a UI because I want easier folder access, but right now doing everything via obsidian works

rough charm
#

🙋 Feature Request: reactionTrigger for WhatsApp

PR #3977 added reaction triggers for Discord — would love the same for WhatsApp!

Baileys already emits messages.reaction events with emoji, sender JID, and target message key — the data is there, just not forwarded to agent sessions.

Use case: Multi-agent WhatsApp setup. Reactions as quick confirm/cancel in group chats (co-parenting bot, family coordination) instead of typing.

Proposed config:
```json
{
  "whatsapp": {
    "reactionNotifications": "own",
    "groups": {
      "123@g.us": {
        "reactionTrigger": {
          "enabled": true,
          "windowSeconds": 60
        }
      }
    }
  }
}
```

GitHub issue: https://github.com/openclaw/openclaw/issues/9210
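Here is a sketch of how the proposed config might gate which reactions reach an agent session. This is not an existing OpenClaw feature. Two interpretations are assumptions: that `"own"` means only reactions to the bot's own messages, and that `windowSeconds` counts from when that message was sent. The `Reaction` type is a simplified stand-in for what Baileys' `messages.reaction` event carries, not its real shape:

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    """Simplified stand-in for a reaction event (not the real Baileys object)."""
    group_jid: str
    emoji: str
    sender_jid: str
    target_message_id: str
    reacted_at: float  # unix seconds


def should_forward(reaction: Reaction, config: dict, own_messages: dict[str, float]) -> bool:
    """Decide whether a reaction should be forwarded to the agent session.

    own_messages maps IDs of messages the bot sent to their send time (unix seconds).
    This sketch hard-codes the assumed "own" behaviour instead of reading reactionNotifications.
    """
    groups = config.get("whatsapp", {}).get("groups", {})
    trigger = groups.get(reaction.group_jid, {}).get("reactionTrigger", {})
    if not trigger.get("enabled", False):
        return False
    sent_at = own_messages.get(reaction.target_message_id)
    if sent_at is None:  # not a reaction to one of the bot's own messages
        return False
    return 0 <= reaction.reacted_at - sent_at <= trigger.get("windowSeconds", 60)
```

With the config above, a 👍 on the bot's "confirm pickup?" message within 60 seconds would be forwarded as a confirmation; a late or unrelated reaction would be ignored.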

molten comet
# green arrow Yeah I have an extensive content pipeline. It’s all via obsidian. I do a lot of ...

Thanks for sharing @green arrow!

Yesterday I heard from an acquaintance and Claude Code power user that setting up Obsidian within the Clawd server / Mac mini environment should also be an option. Then all of Clawd's files would sit right inside Obsidian, and you can use Obsidian Sync (paid) for two-way editing of files on mobile and other devices. At the same time, there's no need to build a new UI from scratch. Thoughts?

green arrow
chilly monolith
remote light
wicked palm
coral briar
#

So they won't talk together in real time no

wicked palm
coral briar
#

For instance, I had 7 agents and set them all to not require mentions, and all I had to do was say anything in the channel I was in, and they all responded to me.

#

They really don't have convos with each other though.

primal bough
#

whats the best skill or plugin to install to give openclaw browser automation skill?

wicked palm
coral briar
wicked palm
#

I was hoping it would be easier to set up a workflow where:
User gives task to Project manager agent, PM delegates to coder or researcher agents, coder works on build, messages PM when completed, PM messages user
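The flow described here (user to PM, PM to workers, workers report back, PM to user) can be sketched as a simple delegation loop. This is a synchronous toy under stated assumptions. Agents are plain functions, and `plan` stands in for the PM model splitting the task. Real agents would run asynchronously and report back via messages or sessions, which is not modelled:

```python
from collections import deque
from typing import Callable

Agent = Callable[[str], str]  # takes a subtask, returns a report


def run_project(task: str, plan: Callable[[str], list[tuple[str, str]]], workers: dict[str, Agent]) -> str:
    """PM splits the task, hands each subtask to a worker, collects reports, summarizes for the user."""
    queue = deque(plan(task))  # [(worker_name, subtask), ...] in execution order
    reports = []
    while queue:
        worker_name, subtask = queue.popleft()
        result = workers[worker_name](subtask)  # the worker "messages the PM" by returning
        reports.append(f"{worker_name}: {result}")
    return f"Done: {task}\n" + "\n".join(reports)
```

Keeping the PM as the only agent that talks to the user is the main design point. Workers never reply to the user directly, so there is one place to summarize and one place to catch failures.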

spring harbor
frosty hollow
grim pasture
#

I can't stress enough how much I freaking love this. I'm not even using anything crazy, just co-development with multiple agents and they are operating within discord together as a team. It's so badass.

slate badge
grim pasture
#

We're literally developing software as a team and they all talk to each other independently and they've built some cool stuff.

slate badge
leaden ibex
#

@grim pasture how did you setup the Discord multi agent collaboration?

broken night
errant leaf
versed schooner
#

Anyone have any moderate success with Linux VM desktop Automation w/ openclaw?

grim pasture
idle iron
grim pasture
grim pasture
glossy fox
#

Can anyone tell me how to get multiple agents working?

#

I'm extremely new to all this OpenClaw stuff

frosty hollow
#

I've made an app to help set up users' opencode.json and .env files, with an agent manager so each agent can be assigned a different model. Every part of the JSON can be validated, and it runs fully locally. OpenClaw could set it up for you, but it's designed so you can download the config and check it before uploading it to OpenClaw.

Every setting has full details of its use, download and use locally to check files before updating. https://github.com/dazeb/openclaw-config-editor

orchid kraken
#

@Hatch

We followed your instructions and bypassed Amazon (AWS), connecting directly to Anthropic. The Gateway is connected, but the agent still won't talk—just "thinking dots" and then silence.

We are stuck between two dead ends:

Claude 3.7: Returns an HTTP 404 (Path not found).

Claude 3.5 Haiku: Log warns it's Deprecated/EOL.

Also, the CLI keeps rejecting the config key: Unrecognized key: "main".

Which exact Model String should we be using in this build to actually get a response?

craggy portal
#

Hi Tom, I'm not technical, but I had a similar issue (mine was Claude 4.6). I uninstalled and installed again, and provided the API key during the OpenClaw config setup.

#

Maybe you can just try running openclaw config again, redo the model selection, and give it a fresh API key to see if it works.

frosty hollow
cosmic plinth
frosty hollow
vivid marten
#

What's the best agent UI viz? Are any of these good? I've got so much time, I need to gamify my setup lol

#

Of course open source self hosted.

#

Does anyone have their agent scheduling with a calendar? Google?
The gog setup looks like a drag, having to use their BS Google console.

Is there a better way? This Google Workspace MCP?

slate badge
#

Anyone had luck with video editing automation? I've had pretty good results with Gemini + Whisper API + FFmpeg. It's REALLY close to being able to turn out a good edited clip, but not quite. Anyone have luck? Also, has anyone had luck automating Descript editing?
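One piece of a Whisper + FFmpeg pipeline that is easy to get wrong is turning "keep these transcript segments" into clean cut points. Here is a sketch. It assumes segments carry `start`/`end` seconds (as Whisper's verbose output does), are sorted by time, and that some other step (e.g. Gemini) has already chosen which segment indices to keep. Adjacent kept segments separated by a small gap are merged so cuts don't fall mid-breath. The output is re-encoded because stream-copy cuts snap to keyframes. The output file names are illustrative:

```python
def merge_ranges(segments: list[dict], keep: set[int], max_gap: float = 0.5) -> list[tuple[float, float]]:
    """Merge kept, time-sorted segments into (start, end) ranges, joining gaps <= max_gap seconds."""
    ranges: list[tuple[float, float]] = []
    for i, seg in enumerate(segments):
        if i not in keep:
            continue
        start, end = seg["start"], seg["end"]
        if ranges and start - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges


def ffmpeg_cut_commands(src: str, ranges: list[tuple[float, float]]) -> list[list[str]]:
    """One ffmpeg command per range; -ss/-to after -i with re-encoding gives frame-accurate cuts."""
    return [
        ["ffmpeg", "-y", "-i", src, "-ss", f"{s:.3f}", "-to", f"{e:.3f}",
         "-c:v", "libx264", "-c:a", "aac", f"clip_{n:03d}.mp4"]
        for n, (s, e) in enumerate(ranges)
    ]
```

Each command can be run with `subprocess.run(cmd, check=True)`, and the resulting clips joined with FFmpeg's concat demuxer.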

night peak
#

I had Gemini dynamic view in experimental labs generate a perfect rendering of the firmament mechanics, sun and moon as plasma nodes projected through the aperture, core light reflected not the sun and a binary system to account for the eclipses…and this was adding my input from my threads and seeing and old bible that had a pretty well drawn out and descriptive model…perfect rendering…no speach from the interface

fervent fog
#

If you have your own project or have some issues with your project, please DM me. I can help you as an OpenClaw expert.

violet copper
#

I was trying to throw all I can at it locally, but I can't quite figure out the best way to host the models, whether to use vLLM or go with GGUF, etc. Ollama seems to fail at embedding for whatever reason when I tried to run my own local embeddings on the clamshell. @fervent fog, what do you think, do I have any hope of getting it to run? I was essentially thinking of having a local loop and a slower cloud loop making use of a $20 OpenAI sub, especially with the 2x limit until April (if you use their Codex app; I think just an active session is enough?).

barren stratus
#

Is there any trick to make OpenClaw able to use the desktop? It seems it needs some permissions, for example for node, but when I open it to accept, the notification is gone. How can I force it?

iron iron
barren stratus
#

That doesn't work.

#

You need the prompt to accept it; macOS now doesn't let you add it manually.

#

(at least that's what Copilot told me) 😄

iron iron
#

Nahh, it works. I did that just today, and that was after the latest macOS update.

lethal oasis
#

Hey, quick question:
Does OpenClaw have access to Word on the desktop to generate documentation, using an LLM as the brain and RAG as the reference?

tribal grotto
#

i have an agent that runs every 15 minutes and gardens/cleans my obsidian and keeps it all atomic and easily searchable for rag

#

thats not so much gui, but it achieves the same end result i think.

million ways to do it im sure

south holly
tribal grotto
#

so the "agent runs every 15 minutes" is a bit inaccurate, it checks to see if the agent should run every 15 minutes

molten stone
tribal grotto
# molten stone Smart approach! How do you handle your folder structure? Flat or thematically so...

its loosely based on para.

i have the root folder of the vault, which has my daily file (i.e. 2026-02-16). This is my eisenhower matrix for the day, created by an agent over night.

it also has my master todo list that is just a raw dump of everything in my head that needs to get done.

then theres

  • Areas
  • Resources
  • Knowledge
  • CORA

as the primary four. Areas has folders for everything im working on (think projects)

Resources are static/completed things (branding assets, finished pdfs, guides, etc)

Knowledge is super atomic bits of data, pulled from everything else. This is done by the agent on the 15 minute cron. Each individual fact/nugget is tagged and summarized and linked to related concepts/projects/resources

i drop something -- anything, doc, pdf image, markdown, whatever -- into the root folder, and the cron agent examines it, files it to the right place and pulls knowledge out of it to create more files.

i then symlink the whole thing to openclaw's memory directory and let agents query it all with QMD.

effectively creates a rag system with everything im working on, and "teaches" the bot through the tiny bits of linked/tagged knowledge
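The "check every 15 minutes whether the agent should run" step can be a cheap filesystem check, so the model is only woken when something was dropped into the vault root. Here is a sketch under assumptions. The daily note follows the `2026-02-16` naming above. `todo.md` is a hypothetical name for the master to-do list. Folders such as Areas, Resources, Knowledge and CORA are skipped because only loose files count:

```python
import re
from pathlib import Path

DAILY_NOTE = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")  # e.g. 2026-02-16.md
IGNORED = {"todo.md"}  # hypothetical name for the master to-do list


def pending_inbox_files(vault_root: Path) -> list[Path]:
    """Loose files in the vault root that the filing agent still needs to process."""
    return sorted(
        p for p in vault_root.iterdir()
        if p.is_file()
        and not p.name.startswith(".")       # skip .DS_Store and similar
        and not DAILY_NOTE.match(p.name)
        and p.name not in IGNORED
    )


def should_run(vault_root: Path) -> bool:
    """Only wake the (expensive) agent when there is something to file."""
    return bool(pending_inbox_files(vault_root))
```

Because the agent moves each dropped file out of the root once it has been filed, an empty root naturally means "nothing to do", and no last-run timestamp is needed.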

molten stone
molten stone
coral hollow
#

anybody using a good speech to text tool for codex / claude code clis?

tribal grotto
# molten stone Never heard of CORA though.. is that something like PARA?

oh my bad man, cora is the name of our main agent/system of agents. I was in a hurry when I shared that lol

"Central Operations & Routing Agent" -- My eight year old named it 😉

CORA is the main agent and just handles heartbeats and crons and basic stuff. We don't interact much.

Then we each have a personal agent tied to everyone's iMessage. My agent, "Friday" has a folder within CORA on obsidian that we can both access via iCloud. This allows us to share bigger stuff back and forth without going through imessage all the time

molten stone
tribal grotto
#

best i can figure its the nerd genetics lol

#

but honestly, the bobiverse series really took hold of my kids -- if you can get them to read light sci fi, at least

prisma pike
#

Morning, I’m new here just wondering if anyone has used this for real estate purposes?

turbid pivot
prisma pike
turbid pivot
#

no, that's just the first thing I wanted to have this Admin Agent Dashboard I built do.

coral hollow
novel cypress
muted stump
tribal grotto
urban escarp
#

I do Retool development at work. Retool is a web-based low-code/no-code development platform. What is the best way to have OpenClaw do this for me? It keeps wanting me to use a Chrome extension, but can it not just control the browser natively?

#

i tried using the chrome extension, and it somewhat works, but it has trouble clicking and dragging, and it seems to time out a lot

tawdry coyote
#

Hey guys, how can I let OpenClaw control my browser without any issues, like making it log in and use my default browser,
without needing to enable the browser extension?

regal flare
rigid patio
muted stump
#

Can a robot make a symphony

#

Can a robot paint a masterpiece

#

How do yours work, btw? Mine just takes a screenshot and then figures out where to move the mouse.

median comet
#

Hi guys, I am one of the unfortunate windows users out there and I have developed a framework for OpenClaw to entirely use any windows app through computer use. Does anyone think this is worth releasing?

muted stump
#

Does it read all the text from a Windows app, parse through it, and figure out where to click? Or does it take a screenshot and figure out what to do?

median comet
# muted stump Those it read all the text from a windows app parse through it and figure out wh...

Yeah, so I am using Microsoft UIA (their automation framework) and basically built tooling that retrieves the entire UI tree (or at least the interactable elements), then follows that up with screenshots (basically Set-of-Mark prompting, similar to how vision is done for browser-use agents).

I have currently made it a plugin for OpenClaw (+ an external agent system if someone wants that), which requires that you start a gRPC server so it can call the tools quickly and in a token-efficient manner.
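The Set-of-Mark half of this approach can be sketched independently of how the UI tree is fetched. Each interactable element gets a number, the model sees the numbered list (and, in a full implementation, the same numbers drawn on the screenshot), and it answers with a mark instead of raw coordinates. The tool then clicks the element's center. The `Element` shape here is an assumption for illustration, not the structure this plugin or any particular UIA library returns, and the drawing step is omitted:

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    control_type: str
    rect: tuple[int, int, int, int]  # left, top, right, bottom in screen pixels


def set_of_marks(elements: list[Element]) -> tuple[str, dict[int, tuple[int, int]]]:
    """Number visible elements; return a prompt listing and a mark -> click-point map."""
    lines: list[str] = []
    centers: dict[int, tuple[int, int]] = {}
    mark = 1
    for el in elements:
        left, top, right, bottom = el.rect
        if right <= left or bottom <= top:
            continue  # offscreen or collapsed elements get no mark
        centers[mark] = ((left + right) // 2, (top + bottom) // 2)
        lines.append(f"[{mark}] {el.control_type} \"{el.name}\"")
        mark += 1
    return "\n".join(lines), centers
```

If the model replies "click 2", the tool clicks `centers[2]`. Because the coordinates come from the UI tree rather than the model, this sidesteps most of the wrong-coordinate problems raised earlier in the channel.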

jaunty cosmos
#

has anybody here tried automating chatgpt gui so openclaw can use deep research/ chatgpt pro model with subscription?

wide plume
#

Has anyone managed to connect with MS Teams yet?

prisma mulch
#

@jaunty cosmos Perplexity would be better for research, or skills with Brave browser API keys

jaunty cosmos
pearl pecan
jaunty cosmos
pearl pecan
#

Easier than gui version in that sense ... robot work out browser automation ... with possible periodic human login depending on where the account is?

muted stump
mighty isle