I built a YouTube reader that's also an MCP server for Cursor | Cursor | Page 1

restive monolith May 27, 2026, 7:26 AM

#

Side project drop — built entirely with Cursor.

I watch a lot of long-form interviews (podcasts, 2h+ conferences) and got tired of re-watching the same parts trying to find one specific quote. So I built TranscriptMax.

What it does:
• Paste a YouTube URL → full transcript synced with the video player, click any line to jump to that moment
• AI summary + key points + auto-generated chapters so you can decide if the video is worth watching before committing 2 hours
• Highlight passages like in Kindle, tag them, search across your whole library
• Speaker diarization — it knows who's talking, and once you name "Lex Fridman" in one interview, every other video with him is auto-labeled. You can click any speaker and see every moment they ever said something across all your videos.
• You can also ask Claude/Cursor directly to "find the part where X said Y" — it works through MCP (no UI needed)
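The click-to-jump sync in the first bullet can be sketched as a lookup over segment start times. This is a minimal illustration only, not TranscriptMax's actual code: the `Segment` type, the `active_index` helper, and the sample lines are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    text: str


def active_index(segments, t):
    """Return the index of the segment being spoken at time t.

    Assumes segments are sorted by start time. Returns None when t is
    before the first segment.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1
    return i if i >= 0 else None


# Hypothetical transcript lines, for illustration only.
segments = [
    Segment(0.0, "Welcome back."),
    Segment(4.2, "Today's guest needs no introduction."),
    Segment(9.8, "Let's start with the basics."),
]

# Player -> transcript: highlight the line for the current playback time.
current = active_index(segments, 5.0)

# Transcript -> player: clicking a line seeks to that line's start time.
seek_to = segments[2].start
```

The same lookup works in both directions: the player reports a time and the matching line is highlighted, or a clicked line supplies the time to seek to.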

The diarization part was the most fun to build. Two podcasts from the same person → same voice profile, auto-stitched together. Feels almost magical the first time.
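The post does not say how voice profiles are matched across videos. One common approach, shown here purely as an assumption, is to compare a speaker embedding against stored named profiles using cosine similarity. The `match_speaker` helper, the 0.75 threshold, and the three-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_speaker(embedding, profiles, threshold=0.75):
    """Return the best-matching profile name, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in profiles.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy profiles; real voice embeddings have hundreds of dimensions.
profiles = {
    "Lex Fridman": [1.0, 0.0, 0.0],
    "Guest": [0.0, 1.0, 0.0],
}

label = match_speaker([0.9, 0.1, 0.0], profiles)   # close to the first profile
unknown = match_speaker([0.0, 0.0, 1.0], profiles)  # matches no stored profile
```

A speaker that matches no profile stays unlabeled until the user names them, at which point the embedding would be stored as a new profile.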

Free tier: 10 videos/day, no credit card.

🔗 https://transcriptmax.com?utm_source=cursor_discord&utm_medium=showcase

Honest feedback welcome — especially on what's confusing or missing. 🙏


#

Quick MCP setup if you want to try (free tier works fine):

Sign up at transcriptmax.com → copy your token from /settings/api-keys
Drop this in your Cursor MCP config (replace <YOUR_TOKEN>):

{
  "mcpServers": {
    "transcriptmax": {
      "url": "https://api.transcriptmax.com/mcp",
      "headers": { "Authorization": "Bearer <YOUR_TOKEN>" }
    }
  }
}

Ask Claude: "transcribe <url>", "summarize this video", "find the part where X said Y"

stark hatch May 27, 2026, 8:35 AM

#

I will definitely try it out!

restive monolith May 27, 2026, 10:19 AM

#

stark hatch I will definitely try it out!

Awesome, thanks 🙏 When you've poked around a bit, I'd love your raw
take — anything missing, anything that frustrates you, anything that
feels rough. That's exactly the kind of feedback that helps me
prioritize what to ship next.

stark hatch May 27, 2026, 1:31 PM

#

restive monolith Awesome, thanks 🙏 When you've poked around a bit, I'd love your raw take — any...

Well, I haven't tried it yet, but I got an error while setting up the MCP server in Cursor. I also found a UI/UX flaw.

#

#

{ "transcriptmax": { "url": "https://api.transcriptmax.com/mcp", "headers": { "Authorization": "Bearer tmx_vKZLOQciVHVVnfORytRTnFPPFXcf_wOdzUoiMWL4V54" } } }

#

Not sure if the problem is being caused by me.

#

However, I do love the design! I'll definitely use it to give some knowledge to the AI based on YouTube videos that explain advanced concepts.

restive monolith May 27, 2026, 1:40 PM

#

@stark hatch thanks for trying! Pinged you in DM — my snippet had a typo
(missing mcpServers wrapper). Fixing the post right now 🙏
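For anyone who copied the earlier snippet: the difference is only the top-level `mcpServers` key. Below is a small sketch of the fix, assuming Cursor expects servers nested under that key, as the corrected config in the post does. The `ensure_mcp_wrapper` helper is hypothetical and not part of Cursor or TranscriptMax.

```python
import json


def ensure_mcp_wrapper(config):
    """Nest a bare server map under "mcpServers" if the wrapper is missing."""
    if "mcpServers" in config:
        return config
    return {"mcpServers": config}


# The bare shape from the original snippet, with a placeholder token.
bare = json.loads(
    '{"transcriptmax": {"url": "https://api.transcriptmax.com/mcp", '
    '"headers": {"Authorization": "Bearer <YOUR_TOKEN>"}}}'
)

fixed = ensure_mcp_wrapper(bare)
print(json.dumps(fixed, indent=2))
```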

stark hatch May 27, 2026, 1:41 PM

#

Personally, what would help me a LOT would be if it could analyze not only the audio but also the frames. I kinda expected something like this and I'd most likely even purchase the subscription. Basically, analyze the video visually and audio-wise.

restive monolith May 27, 2026, 2:12 PM

#

stark hatch Personally, what would help me a LOT would be if it could analyze not only the a...

About the visual frame analysis — 100% on the roadmap, this just pushed
it up.

Today the pipeline is audio-only (Whisper transcript + speaker
diarization). You're right that for advanced content the visual matters:
slides, code on screen, whiteboard diagrams, gestures.

The blocker is cost — passing every frame through a vision model on a
2h video gets stupidly expensive. So I need smart sampling first (scene
change detection? slide detection? key-moment scoring?). A few weeks of
work, but doable.

Quick question to help me prioritize: what's your main use case —
lectures with slides, code tutorials, demos, something else? That'll
tell me which sampling heuristic to ship first 🙏
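To make the cost point concrete: at an assumed 30 fps, a 2h video has 216,000 frames. A rough sketch of the scene-change idea follows. It does look at frames, but only with a cheap pixel difference, so the vision model sees just the frames that changed. The helper names, the threshold, and the tiny 4-pixel grayscale "frames" are all hypothetical; a real pipeline would decode and downscale actual video frames.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_keyframes(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Frame count for a 2h video at an assumed 30 fps.
frames_in_2h = 2 * 60 * 60 * 30

# Toy frames: a near-duplicate, a hard cut, another near-duplicate, another cut.
frames = [[0] * 4, [2] * 4, [100] * 4, [101] * 4, [10] * 4]
keyframes = pick_keyframes(frames)
```

Comparing against the last kept frame, rather than the previous frame, also catches slow drifts such as gradual scrolling through code.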

stark hatch May 27, 2026, 2:19 PM

#

restive monolith About the visual frame analysis — 100% on the roadmap, this just pushed it up. ...

I'd mainly use it for code tutorials and documentation about services. I mostly need it to analyze the code (Gemini AI is really good at that imo). Scene change detection and key-moment scoring sound like good ideas. However, scene change detection would need frame analysis as well, right?

Note: Personally, it'd be helpful for me if it could see both the code and 3D visuals. I use a 3D studio for game development, so having the AI understand both the scene and the code is really useful and time-saving.

Another use case would be to analyze speedbuilds.

hoary sundial May 27, 2026, 8:37 PM

#

This is super similar to a project I built, and the biggest issue we had was finding people who wanted to use it. Seems like a great idea to me, so it's cool to see different implementations of solutions in this problem space.

restive monolith May 28, 2026, 7:08 AM

#

stark hatch I'd mainly use it for code tutorials and documentation about services. I mostly ...

Solid use cases. Code OCR is mostly integration work, and scene change + key-moment scoring are already on the roadmap. The 3D + speedbuild parts need a proper feasibility spike (cost/min, which model). I've logged it as a dedicated story and will ping you with benchmarks.

restive monolith May 28, 2026, 7:09 AM

#

hoary sundial This is super similar to a project I built and the biggest issue we had was gett...

Yeah honestly finding users is the biggest grind. I'm gonna try reaching out to media folks / journalists / YouTubers — feels like that's closer to who actually deals with long-form content daily and would get real value out of it. We'll see if it lands 🤞

stark hatch May 28, 2026, 7:17 AM

#

restive monolith Solid use cases — code-OCR is mostly integration , scene change + key-moment sco...

Sounds good!
For faster transcription, you could use OpenAI's Whisper API: extract the video's audio as an MP3, send it to Whisper, and you'd get a more accurate result back.

restive monolith May 28, 2026, 6:53 PM

#

stark hatch Sounds good! For faster transcription you could use OpenAI's whisper model's API...

It's definitely something I've considered, but I'm trying to keep costs as low as possible on the SaaS — the goal is to minimize what I'm spending so I can keep iterating without it getting too expensive in the long run!
