#SPARK — Simple Personal AI Reasoning Kernel


coarse haven

A local-first reasoning loop combining OpenAI APIs with GPT-OSS.

Instead of chat, it routes intent through structured evaluators (PhPUs), exposing where decisions come from — not just outputs.

Closer to a thinking system than an app.
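
A minimal sketch of what "exposing where decisions come from" could look like, in Python for illustration. The thread does not describe how PhPUs work internally, so everything here is an assumption: the `Verdict` record, the keyword-overlap scoring, and the intent names are hypothetical stand-ins. The point is only that each evaluator returns a scored, explained verdict, and the full list is kept as a trace next to the winning decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    evaluator: str  # which evaluator produced this verdict
    intent: str     # the intent it argues for
    score: float    # confidence in [0, 1]
    reason: str     # human-readable justification


def keyword_evaluator(name: str, intent: str, keywords: List[str]) -> Callable[[str], Verdict]:
    """Build a hypothetical evaluator that scores by keyword overlap."""
    def evaluate(text: str) -> Verdict:
        words = set(text.lower().split())
        hits = sorted(words & set(keywords))
        score = len(hits) / len(keywords)
        return Verdict(name, intent, score, f"matched {hits}")
    return evaluate


# Assumed evaluators and intents, for illustration only.
EVALUATORS = [
    keyword_evaluator("motion", "move_hand", ["open", "close", "hand", "wave"]),
    keyword_evaluator("speech", "speak", ["say", "read", "tell", "story"]),
]


def decide(text: str) -> Tuple[Verdict, List[Verdict]]:
    """Run every evaluator and return the best verdict plus the full trace."""
    trace = [evaluate(text) for evaluate in EVALUATORS]
    best = max(trace, key=lambda v: v.score)
    return best, trace


best, trace = decide("open the hand")
for verdict in trace:
    print(verdict)
```

The trace is the useful part: losing verdicts stay visible, so you can see why one intent beat another.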

https://community.openai.com/u/phyde1001/activity/topics

https://community.openai.com/t/spark-simple-personal-ai-reasoning-kernel/1366435

coarse haven

SPARK routes between modules based on intent, not prompt flow.

coarse haven

Voice → intent → motion.

This is SPARK driving a simple 3D hand from speech.
Whisper → structured commands → execution.

Still early (5–10s latency), but the interesting part is:
it’s not generating responses — it’s deciding actions.
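
A rough sketch of the "structured commands → execution" step, in Python for illustration. It assumes the speech recognizer has already produced a plain-text transcript; the `Command` shape, the finger names, and the action vocabulary are hypothetical, not SPARK's actual schema. The transcript is parsed into a small typed command first, and only that command touches the hand state.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

FINGERS = {"thumb", "index", "middle", "ring", "little"}
# Spoken verbs mapped to two canonical actions (assumed vocabulary).
ACTIONS = {"open": "extend", "extend": "extend", "close": "curl", "curl": "curl"}


@dataclass
class Command:
    action: str  # "extend" or "curl"
    target: str  # a finger name, or "all"


def parse_command(transcript: str) -> Optional[Command]:
    """Turn a transcript into a structured command, or None if no action is found."""
    words = re.findall(r"[a-z]+", transcript.lower())
    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    if action is None:
        return None
    target = next((w for w in words if w in FINGERS), "all")
    return Command(action, target)


def execute(cmd: Command, hand: Dict[str, float]) -> Dict[str, float]:
    """Apply a command to the hand state (0.0 = extended, 1.0 = fully curled)."""
    value = 1.0 if cmd.action == "curl" else 0.0
    targets = FINGERS if cmd.target == "all" else {cmd.target}
    for finger in targets:
        hand[finger] = value
    return hand


hand = {finger: 0.0 for finger in FINGERS}
cmd = parse_command("Please close your index finger.")
if cmd is not None:
    execute(cmd, hand)
```

Returning `None` for an unrecognized transcript keeps "no action" explicit instead of guessing at one.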

Curious how others are handling intent routing vs direct command mapping?

coarse haven

Update on SPARK — started building out a face/voice interface on top of the intent routing.

Now testing browser TTS + viseme mapping (vowels/consonants → mouth shapes).
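
One way the vowels/consonants → mouth shapes mapping might be sketched, in Python for illustration. This works on letters rather than phonemes, which is a deliberate simplification, and the shape names (`open`, `wide`, `round`, and so on) are made-up labels rather than a standard viseme set.

```python
# Assumed letter-to-shape tables; a phoneme-based table would be more accurate.
VOWEL_VISEMES = {"a": "open", "e": "wide", "i": "wide", "o": "round", "u": "round"}
CONSONANT_VISEMES = {
    "b": "closed", "m": "closed", "p": "closed",  # lips pressed together
    "f": "teeth_on_lip", "v": "teeth_on_lip",     # lower lip against teeth
}
DEFAULT_CONSONANT = "neutral"
REST = "rest"


def text_to_visemes(text: str) -> list:
    """Map each character to a mouth shape, collapsing consecutive repeats."""
    visemes = []
    for ch in text.lower():
        if ch in VOWEL_VISEMES:
            shape = VOWEL_VISEMES[ch]
        elif ch.isalpha():
            shape = CONSONANT_VISEMES.get(ch, DEFAULT_CONSONANT)
        else:
            shape = REST  # spaces and punctuation close the mouth
        if not visemes or visemes[-1] != shape:
            visemes.append(shape)
    return visemes


print(text_to_visemes("map"))
```

Collapsing repeats keeps the mouth from re-triggering the same shape on double letters.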

Still rough, but it’s moving toward full voice → intent → expression.

Curious how others are handling speech → viseme sync, or whether you're just faking it with timing?
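
For reference, the "faking it with timing" option can be as simple as the sketch below (Python, for illustration). It assumes you already have a list of mouth shapes and know the total audio duration; `schedule_visemes` is a hypothetical helper that gives every shape an equal time slot, with no real alignment to the audio.

```python
def schedule_visemes(visemes, duration_s):
    """Spread mouth shapes evenly over the audio duration.

    Returns a list of (start_time_s, shape) pairs. This is timing-only
    fakery: it never looks at the audio itself.
    """
    if not visemes or duration_s <= 0:
        return []
    slot = duration_s / len(visemes)
    return [(round(i * slot, 3), shape) for i, shape in enumerate(visemes)]


timeline = schedule_visemes(["closed", "open", "closed"], 1.5)
print(timeline)
```

True sync would replace the equal slots with per-phoneme timestamps from a forced aligner or from the TTS engine, if it exposes them.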

coarse haven

Well that escalated quickly…

Got SPARK speaking in my own cloned voice in English and Chinese while describing an image using GPT-5.4 + F5-TTS.

Pipeline is currently: image → GPT-5.4 → text → F5-TTS → voice.
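
The shape of that pipeline can be sketched as a chain of stages, each timed separately so it is clear where the latency goes (Python, for illustration). The two stage functions are placeholders that make no real model calls: `describe_image` and `synthesize_speech` are hypothetical names, and their bodies only fake the output types.

```python
import time
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def describe_image(image_path: str) -> str:
    # Placeholder for the vision-model call (assumption, not a real API).
    return f"A description of {image_path}"


def synthesize_speech(text: str) -> bytes:
    # Placeholder for the TTS call; returns fake "audio" bytes.
    return text.encode("utf-8")


def run_pipeline(stages: List[Stage], payload: Any):
    """Feed the payload through each stage and record per-stage wall time."""
    timings = []
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings.append((name, time.perf_counter() - start))
    return payload, timings


STAGES = [("describe", describe_image), ("speak", synthesize_speech)]
audio, timings = run_pipeline(STAGES, "cat.png")
for name, seconds in timings:
    print(f"{name}: {seconds:.4f}s")
```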

Face/viseme sync is next — my son’s working on that side, starting with English.

coarse haven

SPARK follow-up:

Using the voice layer for line-by-line English / Chinese fairy tales with my kids.

Same story, sentence by sentence:
English → Chinese → English → Chinese.
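
The alternation above is easy to sketch as a playlist builder (Python, for illustration). It assumes the story already exists as two sentence-aligned lists, one per language; `interleave` and the sample sentences are hypothetical, and the actual speech synthesis is left out.

```python
def interleave(english, chinese):
    """Build an English/Chinese playlist, one sentence pair at a time."""
    if len(english) != len(chinese):
        raise ValueError("sentence lists must be aligned one-to-one")
    playlist = []
    for en, zh in zip(english, chinese):
        playlist.append(("en", en))
        playlist.append(("zh", zh))
    return playlist


# Sample sentences invented for this example.
english = ["Once upon a time there was a little fox.", "The fox loved the moon."]
chinese = ["从前有一只小狐狸。", "小狐狸喜欢月亮。"]

for lang, sentence in interleave(english, chinese):
    print(lang, sentence)
```

Each `(lang, sentence)` pair can then be handed to the voice layer with the matching language setting.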

Not screen-time learning — repeated listening, story, memory, and language exposure.

In my own cloned voice, kept local.

Next layer:
3D face + viseme sync.

coarse haven

Weekend SPARK build.

My 12-year-old son made this 3D head animation, and I’m using it as part of a local AI voice/avatar experiment.

The project takes an image file, describes what is in it, then speaks that description back in my own voice, with early voice-to-animation timing already working reasonably well.

Stack: PHP, HTML/JS, local vision model, F5-TTS, Docker/local services, DGX Spark, pure PHP GIF generation.

Still rough, but that’s the point: small interfaces that children can understand, modify, and grow with.

Next step: avatar expressions, eyebrows, eyes, and mouth shapes.

coarse haven

Weekend SPARK build update.

What a difference a day makes.

Last weekend: simple animated head.

This weekend: much better face model from my 12-year-old son.

Still rough, but the method is clearly better now. The face has more structure, more character, and feels much closer to the local AI avatar pipeline I’m trying to build.

Current direction:
image → local vision description → F5-TTS voice → mouth/face animation → GIF/output
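
The "mouth/face animation → GIF/output" step needs one frame per tick of a fixed frame rate. A minimal sketch of that sampling, in Python for illustration (the project itself lists PHP for GIF generation): it assumes a timeline of `(start_time_s, shape)` pairs sorted by time, and `frames_from_timeline` is a hypothetical helper that only picks which mouth shape each frame shows. Rendering and GIF encoding are out of scope here.

```python
import bisect


def frames_from_timeline(timeline, duration_s, fps=10):
    """Sample a sorted (start_time_s, shape) timeline at a fixed frame rate.

    Returns one mouth-shape label per output frame. Frames before the
    first timeline entry fall back to "rest".
    """
    starts = [start for start, _ in timeline]
    n_frames = int(round(duration_s * fps))
    frames = []
    for i in range(n_frames):
        t = i / fps
        # Index of the last timeline entry that starts at or before t.
        idx = bisect.bisect_right(starts, t) - 1
        frames.append(timeline[idx][1] if idx >= 0 else "rest")
    return frames


timeline = [(0.0, "closed"), (0.5, "open"), (1.0, "closed")]
print(frames_from_timeline(timeline, duration_s=1.5, fps=4))
```

Each label would then select a pre-drawn mouth image for that frame, with the GIF frame delay set to `1 / fps` seconds.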

Small steps, but real ones.

coarse haven

Weekend SPARK build update.

What a difference a day makes.

Last weekend: simple animated head.

This weekend: much better face model from my 12-year-old son.

Still rough, but the direction is becoming clearer now:

image → local vision description → F5-TTS voice → mouth/face animation → GIF/output

What matters to me is that these become systems children can actually understand, modify, and grow with locally.

Small steps, but real ones.