How to Fetch YouTube Transcripts with OpenClaw

If you’re using OpenClaw as your local AI assistant, you’ve probably wished it could reach into a YouTube video and pull out what’s actually being said. Copy-pasting transcripts manually is tedious, and not every video has a clean auto-generated caption file you can export. With AgentPatch connected, OpenClaw can fetch the transcript for you.

Why This Matters

YouTube hosts an enormous amount of useful content — tutorials, conference talks, interviews, lectures. The problem is that content is locked inside a video player. If you want to reference, quote, or summarize it, you either watch the whole thing or go hunting for a transcript.

With AgentPatch’s YouTube Transcript tool connected to OpenClaw, your agent can fetch the full transcript of a public video with a single request, provided the video has a transcript available. You get the text with timestamps, which means you can ask follow-up questions, extract specific sections, or summarize the whole thing in your Telegram or Discord chat.

Setup

Add AgentPatch to ~/.openclaw/openclaw.json:

{
  "mcp": {
    "servers": {
      "agentpatch": {
        "transport": "streamable-http",
        "url": "https://agentpatch.ai/mcp",
        "headers": {
          "Authorization": "Bearer YOUR_API_KEY"
        }
      }
    }
  }
}

Replace YOUR_API_KEY with your actual key from the AgentPatch dashboard. Restart OpenClaw and it discovers all AgentPatch tools automatically.
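
If you already have other settings in openclaw.json, you may prefer to merge the AgentPatch entry in rather than paste over the file. The following is a minimal Python sketch of that merge. It assumes the file is strict JSON and uses the same layout as the snippet above; the function names add_agentpatch_server and update_config_file are made up for this example and are not part of OpenClaw or AgentPatch.

```python
import json
from pathlib import Path


def add_agentpatch_server(config: dict, api_key: str) -> dict:
    """Return a copy of config with the AgentPatch MCP entry added.

    Hypothetical helper: the key layout mirrors the snippet in this article.
    """
    # Round-trip through JSON as a simple deep copy, so the input is not mutated.
    merged = json.loads(json.dumps(config))
    servers = merged.setdefault("mcp", {}).setdefault("servers", {})
    servers["agentpatch"] = {
        "transport": "streamable-http",
        "url": "https://agentpatch.ai/mcp",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
    return merged


def update_config_file(path: Path, api_key: str) -> None:
    """Read the config at path (if it exists), merge the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    updated = add_agentpatch_server(existing, api_key)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

To apply it, you would call update_config_file(Path.home() / ".openclaw" / "openclaw.json", "YOUR_API_KEY") and then restart OpenClaw. Back up the file first if it contains anything you care about.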

Example

Say you come across a conference talk you want to reference later. You send a message to your OpenClaw bot on Telegram:

“Get me the transcript for this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ”

OpenClaw calls the YouTube Transcript tool through AgentPatch and returns the full transcript with timestamps directly in your chat. No browser extensions, no third-party transcript sites, no copy-pasting.
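
For the curious, a tool call over MCP is a JSON-RPC 2.0 message with the method tools/call. The sketch below builds one such message in Python. The envelope follows the MCP specification, but the tool name youtube_transcript and the url argument are assumptions made for illustration; the real names are whatever AgentPatch advertises when OpenClaw lists its tools. A real client also performs an initialization handshake before sending this, which is omitted here.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and argument key, for illustration only.
message = build_tool_call(
    "youtube_transcript",
    {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"},
)
print(message)
```

You never have to write this yourself. OpenClaw constructs and sends the request on your behalf once the server is configured.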

From there you can follow up:

“What does the speaker say around the 4-minute mark?”

OpenClaw uses the timestamp data from the transcript to find the relevant section and answers inline.
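
To make the idea concrete, here is a small Python sketch of a timestamp lookup. It assumes each transcript segment carries a start time in seconds plus its text; the actual format AgentPatch returns is not specified in this article, and OpenClaw most likely does this reasoning inside the model rather than with code like this. The Segment type, the segments_near function, and the placeholder lines are all invented for the example.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the beginning of the video
    text: str


def segments_near(transcript: list[Segment], target: float,
                  window: float = 30.0) -> list[Segment]:
    """Return segments whose start time is within `window` seconds of `target`."""
    return [seg for seg in transcript if abs(seg.start - target) <= window]


# Placeholder data, not taken from any real video.
transcript = [
    Segment(200.0, "placeholder line A"),
    Segment(235.0, "placeholder line B"),
    Segment(250.0, "placeholder line C"),
    Segment(300.0, "placeholder line D"),
]

# "Around the 4-minute mark" becomes a target of 240 seconds.
for seg in segments_near(transcript, target=4 * 60):
    minutes, seconds = divmod(int(seg.start), 60)
    print(f"[{minutes}:{seconds:02d}] {seg.text}")
# prints:
# [3:55] placeholder line B
# [4:10] placeholder line C
```

The 30-second window is an arbitrary choice; a wider window gives more context at the cost of a longer excerpt.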

This works for any public YouTube video that has a transcript available: tutorials, product demos, podcast recordings uploaded to YouTube, academic lectures. If the video has no transcript, there is nothing for your agent to fetch.

It’s a straightforward way to convert a YouTube video to text for notes, research, or content repurposing. Instead of manually copying captions or using a separate tool, your agent handles the full conversion in one request.

Wrapping Up

The YouTube Transcript tool is one of many available through AgentPatch. Once you’ve added the MCP config, OpenClaw also has access to Google Search, email, image generation, and more — no additional setup required. Visit agentpatch.ai to explore the full marketplace.