# YouTube MCP Server
The YouTube MCP Server connects your AI agents directly to YouTube content.
It enables Large Language Models (LLMs) to extract metadata and generate high-fidelity transcriptions from videos.
## Features
- 🧠 In-Memory Processing: The server processes audio streams directly in RAM.
- 📝 Smart Transcription: It utilizes OpenAI’s Whisper models to generate accurate text from audio.
- 🗣️ Voice Activity Detection (VAD): Silero VAD segments audio precisely.
- ⚡ Hardware Acceleration: The system supports CUDA (NVIDIA) and MPS (Apple Silicon) to speed up inference.
- 🌐 Multi-language Support: It supports transcription for 99 languages and includes translation capabilities.
- 📊 Metadata Extraction: You can retrieve full video details, including views, duration, tags, and descriptions, without downloading the video file.
- 💾 Intelligent Caching: The server caches transcription results to JSON files.
- 🚀 Parallel Processing: It employs concurrent threads to transcribe multiple audio segments simultaneously.
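The parallel-processing feature can be pictured with a short sketch. This is illustrative only: the helper names (`transcribe_segment`, `transcribe_all`) and the segment shape are assumptions, not the server's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # mirrors the server's documented default worker count

def transcribe_segment(segment):
    # Placeholder for a real Whisper call on one audio segment.
    return {"from": segment["from"], "to": segment["to"],
            "transcription": f"text for {segment['from']}"}

def transcribe_all(segments):
    # Transcribe VAD-detected segments concurrently; map() preserves order,
    # so the final transcript stays in chronological sequence.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(transcribe_segment, segments))

segments = [{"from": "00:00:00", "to": "00:00:05"},
            {"from": "00:00:05", "to": "00:00:10"}]
results = transcribe_all(segments)
```

The thread pool caps concurrency at `MAX_WORKERS`, which is why raising that value trades CPU and memory for throughput.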
## Use Cases
- RAG Pipelines for Video Content: AI agents can retrieve specific technical instructions from tutorial videos to answer user queries accurately.
- Automated Content Summarization: Content creators can generate summaries, show notes, or blog posts from raw video footage automatically.
- Multilingual Content Analysis: Researchers can transcribe and translate foreign language news or interviews to English for sentiment analysis.
- Metadata Indexing: Developers can build searchable databases of video libraries by extracting tags, categories, and descriptions programmatically.
## How to Use It

### Prerequisites

- Python 3.10+: The server requires a modern Python environment.
- ffmpeg: This is critical for audio processing.
  - macOS: `brew install ffmpeg`
  - Linux: `sudo apt install ffmpeg`
  - Windows: Download the executable and add it to your system PATH.
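Both prerequisites can be verified with a few lines of Python. The function name below is ours, not part of the server:

```python
import shutil
import sys

def check_prerequisites():
    """Return True if Python 3.10+ and ffmpeg are both available."""
    ok = True
    if sys.version_info < (3, 10):
        print(f"Python 3.10+ required, found {sys.version.split()[0]}")
        ok = False
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH")
        ok = False
    return ok

check_prerequisites()
```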
### Installation

1. Clone the repository:

```bash
git clone https://github.com/mourad-ghafiri/youtube-mcp-server
cd youtube-mcp-server
```

2. Install dependencies:

```bash
uv sync
```

### Configuration

You can customize the server's behavior by modifying the configuration file located at `src/youtube_mcp_server/config.py`.
Adjustable parameters:

- `TRANSCRIPTIONS_DIR`: Defines where the server saves the cached transcription JSON files. Default: `"transcriptions"`.
- `WHISPER_MODEL_NAME`: Selects the specific OpenAI Whisper model size. Options: `"tiny"`, `"base"`, `"small"`, `"medium"`, `"large"`, `"turbo"`. Default: `"tiny"`. Larger models (like `"large"` or `"turbo"`) provide better accuracy but consume significantly more RAM and require a GPU.
- `SILERO_REPO` / `SILERO_MODEL`: Specifies the repository and ID for the Voice Activity Detection model.
- `SAMPLING_RATE`: Sets the audio sampling rate in Hz for both Whisper and VAD. Default: `16000`.
- `SEGMENT_PADDING_MS`: Adds time (in milliseconds) to the beginning and end of each audio segment, preventing words from being cut off at segment boundaries. Default: `200`.
- `MAX_WORKERS`: Controls the number of parallel threads used for transcribing segments. Default: `4`. Higher values increase speed but also increase CPU and memory usage.
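Putting the documented defaults together, a `config.py` along these lines would match the parameter list above. This is a hypothetical sketch, not the file's actual contents; in particular, the Silero values are assumptions, since the documentation does not list their defaults.

```python
# Hypothetical sketch of src/youtube_mcp_server/config.py, built only
# from the parameter names and defaults documented above.
TRANSCRIPTIONS_DIR = "transcriptions"  # cache directory for transcription JSON
WHISPER_MODEL_NAME = "tiny"            # one of: tiny, base, small, medium, large, turbo
SILERO_REPO = "snakers4/silero-vad"    # assumed; the actual default is not documented here
SILERO_MODEL = "silero_vad"            # assumed likewise
SAMPLING_RATE = 16000                  # Hz, shared by Whisper and VAD
SEGMENT_PADDING_MS = 200               # padding at each segment boundary, in ms
MAX_WORKERS = 4                        # parallel transcription threads
```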
### Starting the Server

Run the main script using uv to start the server:

```bash
uv run main.py
```

The server will initialize and listen for connections using Server-Sent Events (SSE) transport at http://127.0.0.1:8000/sse.
### MCP Client Configuration

Add the server details to your MCP client configuration file (usually claude_desktop_config.json or similar) to enable communication:
```json
{
  "mcpServers": {
    "youtube": {
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}
```

## Tools Reference
The server exposes two primary tools to the AI agent.
### 1. get_video_info
This tool fetches metadata for a specific YouTube video.
- Input: `url` (string): The full YouTube video URL.
- Output: A JSON object containing the metadata:
```json
{
  "id": "VIDEO_ID",
  "title": "Video Title",
  "description": "Video description...",
  "view_count": 1000000,
  "duration": 212,
  "uploader": "Channel Name",
  "upload_date": "20091025",
  "thumbnail": "https://i.ytimg.com/...",
  "tags": ["tag1", "tag2"],
  "categories": ["Music"]
}
```

### 2. transcribe_video
This tool downloads the audio (in memory), segments it, and generates a transcription.
- Inputs:
  - `url` (string): The YouTube video URL.
  - `language` (string): Controls the transcription language.
    - `"auto"` (default): Detects the language automatically.
    - `"en"`: Translates the audio to English.
    - A specific code (e.g., `"fr"`, `"es"`, `"ja"`): Transcribes in the specified language.
- Output: A JSON object containing the video ID, title, duration, and a list of transcribed segments:
```json
{
  "id": "VIDEO_ID",
  "title": "Video Title",
  "duration": 212,
  "transcription": [
    {
      "from": "00:00:00",
      "to": "00:00:05",
      "transcription": "First segment text..."
    },
    {
      "from": "00:00:05",
      "to": "00:00:10",
      "transcription": "Second segment text..."
    }
  ]
}
```

## FAQs
Q: Does this server download the video file to my hard drive?
A: No. The server streams the audio directly into RAM for processing. It writes only the final transcription JSON file to the transcriptions directory.
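One plausible way to keep everything in memory is to pipe yt-dlp's audio stream straight into ffmpeg and capture raw PCM from its stdout. The sketch below only builds the two command lines; it is an assumption about how such a pipeline could look, not the server's actual implementation.

```python
def build_pipeline_cmds(url, sampling_rate=16000):
    """Build yt-dlp and ffmpeg commands that stream audio without touching disk.

    yt-dlp writes the best audio stream to stdout ("-o -"); ffmpeg reads it
    from pipe:0 and emits 16-bit mono PCM at the target rate on pipe:1.
    """
    ytdlp_cmd = ["yt-dlp", "-f", "bestaudio", "-o", "-", url]
    ffmpeg_cmd = ["ffmpeg", "-i", "pipe:0", "-f", "s16le",
                  "-ac", "1", "-ar", str(sampling_rate), "pipe:1"]
    return ytdlp_cmd, ffmpeg_cmd

# The two processes would be chained with subprocess.Popen, with ffmpeg's
# stdout read into an in-RAM bytes buffer rather than a file.
```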
Q: Can I use this without a GPU?
A: Yes. The server works on CPU. However, transcription will be slower. You should stick to the `"tiny"` or `"base"` models in the configuration if you lack hardware acceleration.
Q: How do I change the transcription accuracy?
A: Edit `src/youtube_mcp_server/config.py` and change `WHISPER_MODEL_NAME` to `"large"` or `"turbo"`. This requires more system resources but yields significantly better results.
Q: What happens if I request a video that was already transcribed?
A: The server checks the transcriptions directory first. It returns the cached JSON immediately if it exists.
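The caching behavior amounts to a file-existence check keyed by video. A minimal sketch, assuming one `<video_id>.json` file per video; the server's real file-naming scheme may differ:

```python
import json
from pathlib import Path

TRANSCRIPTIONS_DIR = "transcriptions"  # matches the documented default

def load_or_transcribe(video_id, transcribe_fn, cache_dir=TRANSCRIPTIONS_DIR):
    """Return a cached transcription if present, else compute and cache it."""
    cache_file = Path(cache_dir) / f"{video_id}.json"
    if cache_file.exists():
        # Cache hit: skip download and transcription entirely.
        return json.loads(cache_file.read_text())
    result = transcribe_fn(video_id)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))
    return result
```

On the second call for the same video, `transcribe_fn` is never invoked, which is why repeat requests return immediately.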