#Samuel – macOS voice AI that writes and hot-loads its own tools at runtime

1 messages · Page 1 of 1 (latest)

carmine crypt
#

Hey everyone — built something I think this community will find interesting.

Samuel is a voice-first desktop agent for macOS. He watches your screen (GPT-4o Vision), listens to your system audio (ScreenCaptureKit), and speaks back via the Realtime API. The novel part:

You can ask him to add new tools to himself mid-conversation:

"Hey Samuel, add a weather tool"
→ He proposes what he'll build
→ You approve
→ GPT-4o-mini generates the code, writes it to ~/.samuel/plugins/, hot-loads it via new Function()
→ "Done sir. What city?"

No rebuild. No restart. The tool is live in the same session.

He also teaches languages by voice while you watch anime/YouTube — sees the subtitles, hears the dialogue, speaks vocabulary out loud without you having to do anything. Memory persists across sessions (preferences, corrections, proficiency level).

Stack: Tauri v2 + React + @weary tartan/agents + Realtime API (WebRTC) + GPT-4o Vision + GPT-4o-mini for plugin codegen

Repo: https://github.com/sambuild04/screen-voice-agent

Would love feedback from anyone who's built on the Realtime API — especially around session management and rolling context.

GitHub

A desktop AI agent that sees your screen, hears your audio, teaches you languages by voice, and extends itself with new tools on demand. - sambuild04/screen-voice-agent

orchid crest
#

Hey, that is very interesting!

junior vigil
#

Real W for you. So much work

carmine crypt
carmine crypt
junior vigil
#

Very good project indeed