#PowerShell Logic for Clicking Around with OCR


neon knot

Hey, I am learning to code and wanted to run something by you. Can you let me know whether this logic is okay before I start coding?

I have a wide range of applications I want to test by clicking around the screen. My logic is as follows:

  1. Open the application, bring it to the front, and maximise it.
  2. Take a screenshot, use OCR to find the text, and move the cursor to click that menu item based on its X/Y coordinates (so different screen resolutions can be accommodated).
  3. If the application loses focus or isn't at the front, pause the script until it's back at the front.
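
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it is in Python rather than PowerShell, and it assumes the OCR engine hands back one bounding box per word. `WordBox`, `find_text_center`, `wait_until_focused`, and `click_text` are hypothetical names made up for illustration, and the focus check and click action are passed in as plain callables so the logic stays independent of any particular GUI or OCR library.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class WordBox:
    """One word found by OCR, with its bounding box in screenshot pixels (assumed shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def find_text_center(boxes: Iterable[WordBox], target: str) -> Optional[Tuple[int, int]]:
    """Return the centre (x, y) of the first box whose text matches target, else None."""
    wanted = target.strip().lower()
    for box in boxes:
        if box.text.strip().lower() == wanted:
            # Clicking the centre of the box works regardless of screen resolution,
            # because the coordinates come from the screenshot itself.
            return (box.left + box.width // 2, box.top + box.height // 2)
    return None


def wait_until_focused(is_focused: Callable[[], bool],
                       poll_seconds: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Pause until is_focused() returns True; return how many times we had to wait."""
    waits = 0
    while not is_focused():
        sleep(poll_seconds)
        waits += 1
    return waits


def click_text(target: str,
               boxes: Iterable[WordBox],
               is_focused: Callable[[], bool],
               click: Callable[[int, int], None],
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Step 3 then step 2: wait for focus, locate the text, click it. False if not found."""
    wait_until_focused(is_focused, sleep=sleep)
    centre = find_text_center(boxes, target)
    if centre is None:
        return False
    click(*centre)
    return True
```

In a real script the boxes would come from an OCR pass over a fresh screenshot, and `is_focused` and `click` would wrap whatever window and mouse API is in use. Keeping those as parameters also means the matching logic can be tested without a screen.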

I am going to write it so there's a config.json, and it will have something like:
Find Text "File"
Click Text "File"
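
One way the config idea could look in practice is sketched below. The JSON layout (a `steps` list with `action` and `text` keys) is an assumption made for illustration, since the two lines above only describe the intent, and `load_steps` is a hypothetical helper. The sketch is in Python and uses only the standard `json` module.

```python
import json

# Actions the runner understands; anything else is rejected up front.
ALLOWED_ACTIONS = {"find", "click"}

# Hypothetical contents of config.json, mirroring Find Text "File" / Click Text "File".
EXAMPLE_CONFIG = """
{
  "steps": [
    {"action": "find",  "text": "File"},
    {"action": "click", "text": "File"}
  ]
}
"""


def load_steps(raw: str):
    """Parse the JSON text and return a list of (action, text) pairs in order."""
    data = json.loads(raw)
    steps = []
    for index, step in enumerate(data["steps"]):
        action = step["action"].lower()
        if action not in ALLOWED_ACTIONS:
            # Fail before any clicking starts, rather than halfway through a run.
            raise ValueError(f"step {index}: unknown action {action!r}")
        steps.append((action, step["text"]))
    return steps


steps = load_steps(EXAMPLE_CONFIG)
```

Validating the whole file before the first click means a typo in a later step is caught before the script has already driven the application partway.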

Is this logic feasible, or are there any additional things I should look into?

fleet marlin

While it's possible, PowerShell isn't well suited to GUI interactivity, and it's not a great project to take on to learn pwsh (you'll probably find it discouraging). You'd be better off with something like AutoHotkey or AutoIt for that task.

hasty umbra

OpenCV in Python would be helpful for this; it isn't a PowerShell task at all.

heady sinew

Also, if you're in a Microsoft enterprise environment, Power Automate could prove viable.


You'd also want to look at command-line tooling options for these scenarios and avoid GUI manipulation where you can.

willow dew

I think you can totally do GUI automation using WASP or one of several tools like it ... but using OCR to do UI automation sounds more like robotics.