YouTube MCP Server

The YouTube MCP Server connects your AI agents directly to YouTube content.

It enables Large Language Models (LLMs) to extract metadata and generate high-fidelity transcriptions from videos.

Features

  • 🧠 In-Memory Processing: The server processes audio streams directly in RAM.
  • 📝 Smart Transcription: It utilizes OpenAI’s Whisper models to generate accurate text from audio.
  • 🗣️ Voice Activity Detection (VAD): Silero VAD detects speech and splits the audio into precise segments, so silence is not transcribed.
  • ⚡ Hardware Acceleration: The system supports CUDA (NVIDIA) and MPS (Apple Silicon) to speed up inference.
  • 🌐 Multi-language Support: It supports transcription for 99 languages and includes translation capabilities.
  • 📊 Metadata Extraction: You can retrieve full video details, including views, duration, tags, and descriptions, without downloading the video file.
  • 💾 Intelligent Caching: The server caches transcription results to JSON files.
  • 🚀 Parallel Processing: It employs concurrent threads to transcribe multiple audio segments simultaneously.
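
The segmentation-plus-parallelism pipeline above can be sketched roughly as follows. This is an illustrative outline only: `fake_transcribe` is a stand-in for a real Whisper call, and the server's actual internals may be organized differently.

```python
# Rough sketch of segment-level parallel transcription.
# `fake_transcribe` stands in for a real Whisper inference call.
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # mirrors the MAX_WORKERS config default


def fake_transcribe(segment):
    start, end = segment
    return {"from": start, "to": end, "transcription": f"text for {start}-{end}"}


def transcribe_segments(segments):
    # Transcribe VAD-detected segments concurrently; pool.map preserves order.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(fake_transcribe, segments))


results = transcribe_segments([(0.0, 5.0), (5.0, 10.0)])
```

Because `pool.map` returns results in input order, the final transcription stays chronological even when later segments finish first.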

Use Cases

  • RAG Pipelines for Video Content: AI agents can retrieve specific technical instructions from tutorial videos to answer user queries accurately.
  • Automated Content Summarization: Content creators can generate summaries, show notes, or blog posts from raw video footage automatically.
  • Multilingual Content Analysis: Researchers can transcribe and translate foreign language news or interviews to English for sentiment analysis.
  • Metadata Indexing: Developers can build searchable databases of video libraries by extracting tags, categories, and descriptions programmatically.

How to Use It

Prerequisites

Python 3.10+: The server requires a modern Python environment.

ffmpeg: This is critical for audio processing.

  • macOS: brew install ffmpeg
  • Linux: sudo apt install ffmpeg
  • Windows: Download the executable and add it to your system PATH.

Installation

1. Clone the Repository

git clone https://github.com/mourad-ghafiri/youtube-mcp-server
cd youtube-mcp-server

2. Install Dependencies

uv sync

Configuration

You can customize the server’s behavior by modifying the configuration file located at src/youtube_mcp_server/config.py.

Adjustable Parameters:

  • TRANSCRIPTIONS_DIR
    • Description: Defines where the server saves the cached transcription JSON files.
    • Default: "transcriptions"
  • WHISPER_MODEL_NAME
    • Description: Selects the OpenAI Whisper model size.
    • Options: "tiny", "base", "small", "medium", "large", "turbo"
    • Default: "tiny"
    • Performance Note: Larger models (such as "large" or "turbo") provide better accuracy but consume significantly more RAM and are much slower without a GPU.
  • SILERO_REPO / SILERO_MODEL
    • Description: Specifies the repository and model ID for the Silero Voice Activity Detection model.
  • SAMPLING_RATE
    • Description: Sets the audio sampling rate in Hz used by both Whisper and the VAD.
    • Default: 16000
  • SEGMENT_PADDING_MS
    • Description: Adds time (in milliseconds) to the beginning and end of each audio segment to prevent words from being cut off at segment boundaries.
    • Default: 200
  • MAX_WORKERS
    • Description: Controls the number of parallel threads used for transcribing segments.
    • Default: 4
    • Performance Note: Higher values increase speed but also increase CPU and memory usage.
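
Putting the documented defaults together, config.py plausibly looks something like the sketch below. The parameter names and defaults come from the list above; the SILERO_REPO/SILERO_MODEL values and the file's exact layout are assumptions.

```python
# Illustrative sketch of src/youtube_mcp_server/config.py.
# Names and defaults mirror the documented parameters; the real file
# may organize them differently.

TRANSCRIPTIONS_DIR = "transcriptions"  # cache directory for transcription JSON
WHISPER_MODEL_NAME = "tiny"            # "tiny"|"base"|"small"|"medium"|"large"|"turbo"
SILERO_REPO = "snakers4/silero-vad"    # assumed torch.hub repo for Silero VAD
SILERO_MODEL = "silero_vad"            # assumed model ID
SAMPLING_RATE = 16000                  # Hz, shared by Whisper and the VAD
SEGMENT_PADDING_MS = 200               # padding added at each segment boundary
MAX_WORKERS = 4                        # parallel transcription threads
```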

Starting the Server

Run the main script with uv to start the server:

uv run main.py

The server initializes and listens for connections over Server-Sent Events (SSE) transport at http://127.0.0.1:8000/sse.

MCP Client Configuration

Add the server details to your MCP client configuration file (usually claude_desktop_config.json or similar) to enable communication:

{
  "mcpServers": {
    "youtube": {
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}

Tools Reference

The server exposes two primary tools to the AI agent.

1. get_video_info

This tool fetches metadata for a specific YouTube video.

  • Input:
    • url (string): The full YouTube video URL.
  • Output: A JSON object containing the metadata:

{
  "id": "VIDEO_ID",
  "title": "Video Title",
  "description": "Video description...",
  "view_count": 1000000,
  "duration": 212,
  "uploader": "Channel Name",
  "upload_date": "20091025",
  "thumbnail": "https://i.ytimg.com/...",
  "tags": ["tag1", "tag2"],
  "categories": ["Music"]
}
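
Metadata-only extraction of this kind is typically done with yt-dlp's extract_info(url, download=False), which returns a large info dict; the tool's output is essentially a projection of that dict onto the documented fields. A minimal sketch (the sample dict is illustrative, not real yt-dlp output):

```python
# Project a yt-dlp-style info dict onto the fields documented above.
FIELDS = ("id", "title", "description", "view_count", "duration",
          "uploader", "upload_date", "thumbnail", "tags", "categories")


def to_video_info(info: dict) -> dict:
    # Missing fields come back as None rather than raising.
    return {key: info.get(key) for key in FIELDS}


# Illustrative input; a real dict would come from
# yt_dlp.YoutubeDL().extract_info(url, download=False).
sample = {"id": "VIDEO_ID", "title": "Video Title", "duration": 212,
          "view_count": 1000000, "extra_internal_field": "dropped"}
info = to_video_info(sample)
print(info["duration"])  # prints 212
```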

2. transcribe_video

This tool downloads the audio (in memory), segments it, and generates a transcription.

  • Inputs:
    • url (string): The YouTube video URL.
    • language (string): Controls the transcription language.
      • "auto" (default): Detects the language automatically.
      • "en": Translates the audio to English.
      • A specific code (e.g., "fr", "es", "ja"): Transcribes in that language.
  • Output: A JSON object containing the video ID, title, duration, and a list of transcribed segments:

{
  "id": "VIDEO_ID",
  "title": "Video Title",
  "duration": 212,
  "transcription": [
    {
      "from": "00:00:00",
      "to": "00:00:05",
      "transcription": "First segment text..."
    },
    {
      "from": "00:00:05",
      "to": "00:00:10",
      "transcription": "Second segment text..."
    }
  ]
}
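
The "from"/"to" timestamps above, combined with SEGMENT_PADDING_MS from the configuration, suggest per-segment processing along these lines. This is a hedged sketch, not the server's actual code: segment boundaries are padded and clamped, then rendered as HH:MM:SS.

```python
# Sketch: pad VAD segment boundaries (per SEGMENT_PADDING_MS) and render
# the "from"/"to" timestamps as HH:MM:SS, as in the output above.
SEGMENT_PADDING_MS = 200


def hms(seconds: float) -> str:
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"


def pad_segment(start_s: float, end_s: float, duration_s: float):
    # Widen the segment by the padding, clamped to [0, video duration].
    pad = SEGMENT_PADDING_MS / 1000.0
    return max(0.0, start_s - pad), min(duration_s, end_s + pad)


start, end = pad_segment(5.0, 10.0, duration_s=212)
print(hms(start), hms(end))  # prints 00:00:04 00:00:10
```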

FAQs

Q: Does this server download the video file to my hard drive?
A: No. The server streams the audio directly into RAM for processing and writes only the final transcription JSON file to the transcriptions directory.

Q: Can I use this without a GPU?
A: Yes, the server works on CPU, but transcription will be slower. Stick to the "tiny" or "base" models in the configuration if you lack hardware acceleration.
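
Device selection for this kind of CUDA/MPS/CPU fallback is commonly done as below. This is a generic sketch (not the server's own code) and it degrades to "cpu" even when torch is not installed.

```python
# Sketch of CUDA/MPS/CPU selection; falls back to CPU if torch is absent.
def pick_device() -> str:
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"  # NVIDIA GPU
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"   # Apple Silicon GPU
    except ImportError:
        pass
    return "cpu"


print(pick_device())
```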

Q: How do I change the transcription accuracy?
A: Edit src/youtube_mcp_server/config.py and set WHISPER_MODEL_NAME to "large" or "turbo". This requires more system resources but yields significantly better results.

Q: What happens if I request a video that was already transcribed?
A: The server checks the transcriptions directory first and returns the cached JSON immediately if it exists.
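
The cache lookup described here reduces to a simple file check. In this sketch the one-JSON-file-per-video naming scheme ({video_id}.json) is an assumption; the server may key its cache differently.

```python
# Sketch of the transcription cache lookup: return the saved JSON if this
# video was already transcribed, otherwise signal a cache miss.
import json
from pathlib import Path


def load_cached(video_id: str, transcriptions_dir: str = "transcriptions"):
    path = Path(transcriptions_dir) / f"{video_id}.json"  # assumed naming scheme
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return None  # cache miss -> transcribe from scratch


# Demo with a temporary cache directory:
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "abc123.json").write_text(json.dumps({"id": "abc123"}))
    assert load_cached("abc123", tmp) == {"id": "abc123"}
    assert load_cached("missing", tmp) is None
```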


