Hi, I'm running OpenClaw on WSL2 with local Ollama models.
Two things I want to add:
- Image/screenshot analysis (my agent currently can't analyze images at all)
- YouTube/video content analysis
I have qwen3-vl:4b-instruct running locally, which should support vision.
What's the recommended way to enable these capabilities? Any skills or config changes needed?
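
For reference, here is a minimal sketch of how the model's vision support can be checked against Ollama directly, independent of OpenClaw. It assumes Ollama is listening on its default local address and that the model tag matches the one above; the function names are just illustrative, and none of this reflects how OpenClaw itself wires up images.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is reachable on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_payload(image_bytes: bytes, prompt: str,
                         model: str = "qwen3-vl:4b-instruct") -> dict:
    """Build a non-streaming /api/generate request body with one image attached."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama takes images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "Describe this screenshot.") -> str:
    """Send one image file to the local model and return its text reply."""
    with open(path, "rb") as f:
        payload = build_vision_payload(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `describe_image("screenshot.png")` (a placeholder path) should return a text description if the model really handles images. If that works, the gap is presumably on the agent side rather than in the model.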