# I built a YouTube reader that's also an MCP server for Cursor

restive monolith

Side project drop — built entirely with Cursor.

I watch a lot of long-form interviews (podcasts, 2h+ conferences) and got tired of re-watching the same parts trying to find one specific quote. So I built TranscriptMax.

What it does:
• Paste a YouTube URL → full transcript synced with the video player, click any line to jump to that moment
• AI summary + key points + auto-generated chapters so you can decide if the video is worth watching before committing 2 hours
• Highlight passages like in Kindle, tag them, search across your whole library
• Speaker diarization — it knows who's talking, and once you name "Lex Fridman" in one interview, every other video with him is auto-labeled. You can click any speaker and see every moment they ever said something across all your videos.
• You can also ask Claude/Cursor directly to "find the part where X said Y" — it works through MCP (no UI needed)
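
Here is a minimal sketch of how "click a line to jump there" and "every moment a speaker said something" can share one data structure. It assumes a hypothetical `Segment` record (video ID, start time in seconds, speaker label, text), because TranscriptMax's real schema isn't described in this thread. Deep links use YouTube's `t` query parameter, which sets the playback start in seconds.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class Segment:
    # Hypothetical transcript segment; not TranscriptMax's actual schema.
    video_id: str
    start: float  # seconds from the start of the video
    speaker: str
    text: str


def deep_link(video_id: str, start: float) -> str:
    # YouTube watch URLs accept a start offset via the `t` parameter.
    query = urlencode({"v": video_id, "t": f"{int(start)}s"})
    return f"https://www.youtube.com/watch?{query}"


def find_quotes(segments: list[Segment], phrase: str,
                speaker: Optional[str] = None) -> list[str]:
    """Return deep links to every segment containing `phrase` (case-insensitive),
    optionally restricted to one named speaker."""
    needle = phrase.casefold()
    hits = []
    for seg in segments:
        if speaker is not None and seg.speaker != speaker:
            continue
        if needle in seg.text.casefold():
            hits.append(deep_link(seg.video_id, seg.start))
    return hits


# Toy library spanning two videos (made-up IDs and lines).
library = [
    Segment("abc123", 754.2, "Host", "What is consciousness, really?"),
    Segment("abc123", 790.0, "Guest", "Consciousness might be an illusion."),
    Segment("xyz789", 61.9, "Host", "Let's talk about consciousness."),
]

print(find_quotes(library, "consciousness", speaker="Host"))
```

A real library would use a full-text index instead of a linear scan, but the shape of the answer (timestamped links filtered by speaker) stays the same.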

The diarization part was the most fun to build. Two podcasts from the same person → same voice profile, auto-stitched together. Feels almost magical the first time.
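
The post doesn't say how voice profiles are matched. One common approach, sketched here purely as an assumption, is to compute a fixed-length speaker embedding for each diarized speaker (with some speaker-embedding model, not shown) and compare it by cosine similarity against the embeddings of speakers you've already named. Above a threshold, the new speaker inherits the name. The vectors and the 0.75 threshold below are made-up illustrative values.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors (0.0 if either is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(embedding: list[float],
                  known: dict[str, list[float]],
                  threshold: float = 0.75) -> Optional[str]:
    """Return the best-matching named speaker, or None if no profile is
    similar enough. The threshold is a hypothetical tuning value."""
    best_name, best_score = None, threshold
    for name, profile in known.items():
        score = cosine(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional "embeddings"; real speaker embeddings are far longer.
named_profiles = {
    "Host A": [0.9, 0.1, 0.2],
    "Guest B": [0.1, 0.8, 0.5],
}

new_speaker = [0.85, 0.15, 0.25]  # a diarized speaker from a new video
print(match_speaker(new_speaker, named_profiles))
```

Returning `None` below the threshold leaves the speaker unnamed rather than forcing a wrong label, which matters once the library holds many similar-sounding voices.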

Free tier: 10 videos/day, no credit card.

🔗 https://transcriptmax.com?utm_source=cursor_discord&utm_medium=showcase

Honest feedback welcome — especially on what's confusing or missing. 🙏

Quick MCP setup if you want to try (free tier works fine):

1. Sign up at transcriptmax.com → copy your token from /settings/api-keys
2. Drop this in your Cursor MCP config (`~/.cursor/mcp.json` for all projects, or `.cursor/mcp.json` inside a single project), replacing <YOUR_TOKEN>:
```json
{
  "mcpServers": {
    "transcriptmax": {
      "url": "https://api.transcriptmax.com/mcp",
      "headers": { "Authorization": "Bearer <YOUR_TOKEN>" }
    }
  }
}
```
3. Ask Claude: "transcribe <url>", "summarize this video", "find the part where X said Y"
stark hatch

I will definitely try it out!

restive monolith
Replying to stark hatch: "I will definitely try it out!"

Awesome, thanks 🙏 When you've poked around a bit, I'd love your raw take: anything missing, anything that frustrates you, anything that feels rough. That's exactly the kind of feedback that helps me prioritize what to ship next.

stark hatch
```json
{
  "transcriptmax": {
    "url": "https://api.transcriptmax.com/mcp",
    "headers": { "Authorization": "Bearer tmx_<redacted>" }
  }
}
```

Not sure if the problem is being caused by me.

However, I do love the design! I'll definitely use it to feed the AI knowledge from YouTube videos that explain advanced concepts.

restive monolith

@stark hatch thanks for trying! Pinged you in DM. My snippet had a mistake (it was missing the `mcpServers` wrapper). Fixing the post right now 🙏
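
Both snippets in this thread parse as valid JSON, so a plain JSON syntax check won't flag the missing wrapper. Below is a small sketch of a pre-flight check you could run on a config like the ones above. The function name `check_mcp_config` is hypothetical, and it only checks the shape discussed here (a top-level `mcpServers` object whose entries have a `url` for remote servers or a `command` for local ones); it is not Cursor's full schema.

```python
import json


def check_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in an MCP config string (empty if none).

    Only checks the shape used in this thread; not Cursor's full schema.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        # The mistake from this thread: server entries placed at the top level.
        return ["missing top-level 'mcpServers' object"]

    problems = []
    for name, server in servers.items():
        if not isinstance(server, dict):
            problems.append(f"{name}: server entry must be an object")
        elif "url" not in server and "command" not in server:
            problems.append(f"{name}: needs a 'url' (remote) or 'command' (local)")
    return problems


broken = '{"transcriptmax": {"url": "https://api.transcriptmax.com/mcp"}}'
fixed = '{"mcpServers": {"transcriptmax": {"url": "https://api.transcriptmax.com/mcp"}}}'
print(check_mcp_config(broken))  # ["missing top-level 'mcpServers' object"]
print(check_mcp_config(fixed))   # []
```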

stark hatch

Personally, what would help me a LOT would be if it could analyze not only the audio but also the frames. I kinda expected something like this and I'd most likely even purchase the subscription. Basically, analyze the video visually and audio-wise.

restive monolith
Replying to stark hatch: "Personally, what would help me a LOT would be if it could analyze not only the a..."

About the visual frame analysis: it's 100% on the roadmap, and this just pushed it up.

Today the pipeline is audio-only (Whisper transcript + speaker diarization). You're right that for advanced content the visuals matter: slides, code on screen, whiteboard diagrams, gestures.

The blocker is cost: passing every frame of a 2h video through a vision model gets stupidly expensive. So I need smart sampling first (scene change detection? slide detection? key-moment scoring?). It's a few weeks of work, but doable.
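
To put "every frame" in numbers: a 2-hour video at an assumed 30 fps is 2 × 3600 × 30 = 216,000 frames. One cheap way to pick candidate frames before any vision model is involved, sketched here as an assumption rather than the planned design, is frame differencing: keep a frame only when its mean absolute pixel difference from the last kept frame exceeds a threshold. This kind of scene-change detection does inspect frames, but only with plain pixel arithmetic, not a vision model. Function names and the threshold are hypothetical.

```python
def frame_count(duration_s: float, fps: float) -> int:
    # Total frames a model would see if every frame were analyzed.
    return round(duration_s * fps)


def mean_abs_diff(a: list[int], b: list[int]) -> float:
    # Average per-pixel difference between two equally sized grayscale frames.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: list[list[int]], threshold: float = 30.0) -> list[int]:
    """Return indices of frames that differ enough from the last kept frame.

    Frames are flattened grayscale pixel lists (0-255), e.g. downscaled
    thumbnails. The threshold is a hypothetical tuning value."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


print(frame_count(2 * 3600, 30))  # 216000

# Toy 4-pixel frames: a static slide, a tiny cursor move, a new slide, another tiny move.
frames = [
    [10, 10, 10, 10],
    [10, 12, 10, 10],
    [200, 200, 180, 190],
    [200, 201, 180, 190],
]
print(select_keyframes(frames))  # [0, 2]
```

Comparing against the last *kept* frame, rather than the immediately preceding one, means gradual changes (a slow scroll or fade) accumulate until they cross the threshold instead of slipping through one small step at a time.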

Quick question to help me prioritize: what's your main use case? Lectures with slides, code tutorials, demos, something else? That'll tell me which sampling heuristic to ship first 🙏

stark hatch
Replying to restive monolith: "About the visual frame analysis — 100% on the roadmap, this just pushed it up. ..."

I'd mainly use it for code tutorials and documentation about services. I mostly need it to analyze the code (Gemini AI is really good at that imo). Scene change detection and key-moment scoring sound like good ideas. However, scene change detection would need frame analysis as well, right?

Note: it'd also help me a lot if it could see both the code and the 3D visuals. I use a 3D studio for game development, and having the AI understand both the scene and the code is really useful and time-saving.

Another use case would be to analyze speedbuilds.

hoary sundial

This is super similar to a project I built, and the biggest issue we had was finding people who wanted to use it. It seems like a great idea to me, so it's cool to see different implementations in this problem space.
