MCP Server

Gobble

An MCP server for managing a podcast-based knowledge base. Download, transcribe, and semantically search long-form media — then feed it to any agent that speaks Model Context Protocol.

Tech stack: Python, MCP, Parakeet, yt-dlp, Vector DB, SSE

~45s to transcribe 2hrs of audio
100% local GPU inference
SSE + stdio MCP transports
0 cloud dependencies

What it does

Blazing transcription

yt-dlp pulls the media, then NVIDIA Parakeet transcribes ~2 hours of audio in roughly 45 seconds on a local GPU, with no cloud and no per-minute bills.
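
To make the acquisition half of this concrete, here is a minimal sketch of driving yt-dlp from Python rather than from the command line. The helper names `build_ydl_opts` and `download_audio`, the `media` output directory, and the choice of WAV are assumptions for illustration; Gobble's own download code may be organized differently. The Parakeet step is left out because the exact model and loader Gobble uses are not stated here.

```python
from pathlib import Path


def build_ydl_opts(out_dir: str) -> dict:
    """Build yt-dlp options: best audio stream, converted to WAV by ffmpeg."""
    return {
        "format": "bestaudio/best",
        # Output template: <out_dir>/<media id>.<extension>
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        # The FFmpegExtractAudio postprocessor needs ffmpeg installed.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"}
        ],
        "quiet": True,
    }


def download_audio(url: str, out_dir: str = "media") -> None:
    """Download one URL as audio (hypothetical helper, not Gobble's API)."""
    import yt_dlp  # imported lazily so the module loads without yt-dlp

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with yt_dlp.YoutubeDL(build_ydl_opts(out_dir)) as ydl:
        ydl.download([url])
```

Calling `download_audio(url)` with an episode URL would leave a WAV file in `media/`, ready to hand to the transcription step.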

Semantic search

Transcripts are chunked, embedded, and indexed so you can ask natural-language questions and pull the exact moment an episode covers a topic.
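
The retrieval idea can be sketched in a few lines of pure Python. This is an illustration only: the `embed` function below is a toy bag-of-words stand-in for whatever embedding model Gobble actually uses, the in-memory list stands in for the vector store, and all names and sample data are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: list, top_k: int = 1) -> list:
    """Rank chunks by similarity to the query and return the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:top_k]


# Each chunk keeps the start time (in seconds) of the audio it came from.
chunks = [
    {"start": 0.0, "text": "welcome to the show, today we talk about sleep"},
    {"start": 754.5, "text": "caffeine blocks adenosine receptors in the brain"},
    {"start": 1320.0, "text": "our sponsor this week makes standing desks"},
]
best = search("how does caffeine affect the brain", chunks)[0]
print(best["start"])  # 754.5
```

Because every chunk carries its start time, the top hit maps directly back to a moment in the episode.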

Knowledge base

A unified store across podcasts, YouTube videos, and ebooks. Load retrieved context straight into any MCP-aware chatbot (Goose, Cline, etc.).

Local-first

Runs entirely on your hardware. The only thing that leaves the machine is whatever you choose to send to a model you control.

How it works

01 Acquire: yt-dlp downloads audio/video from a URL or RSS feed.
02 Transcribe: Parakeet runs GPU inference to produce a timestamped transcript.
03 Index: Transcript is split, embedded, and written to a local vector store.
04 Serve: An MCP server (SSE or stdio) exposes search + retrieval tools to your agent.
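
As an illustration of the split in step 03, the sketch below merges consecutive timestamped segments into larger chunks while preserving each chunk's start and end time. The `Segment` type, the `chunk_segments` function, and the character budget are assumptions; the source does not say how Gobble sizes its chunks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def chunk_segments(segments: List[Segment], max_chars: int = 200) -> List[Segment]:
    """Greedily merge consecutive segments until a chunk would exceed max_chars."""
    chunks: List[Segment] = []
    current: Optional[Segment] = None
    for seg in segments:
        if current is None:
            current = Segment(seg.start, seg.end, seg.text)
        elif len(current.text) + 1 + len(seg.text) <= max_chars:
            # Still within budget: extend the current chunk.
            current.end = seg.end
            current.text += " " + seg.text
        else:
            # Budget exceeded: close the chunk and start a new one.
            chunks.append(current)
            current = Segment(seg.start, seg.end, seg.text)
    if current is not None:
        chunks.append(current)
    return chunks


segments = [
    Segment(0.0, 4.0, "Welcome back."),
    Segment(4.0, 9.5, "Today: sleep and caffeine."),
    Segment(9.5, 15.0, "First, a word from our sponsor."),
]
for chunk in chunk_segments(segments, max_chars=45):
    print(f"{chunk.start:>5.1f}-{chunk.end:<5.1f} {chunk.text}")
```

With a 45-character budget the first two segments merge into one chunk spanning 0.0 to 9.5 seconds, and the third becomes its own chunk. A production splitter might instead count tokens or overlap adjacent chunks.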

Get running

bash
# install dependencies
uv sync

# run as an MCP server (SSE on port 8000)
uv run mcp_server.py

# or over stdio for Goose / Cline
uv run mcp_server.py --transport stdio

Requires a CUDA-capable GPU for local transcription. Full instructions live in the README.
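
For readers curious how a `--transport` switch like the one above is commonly wired up, here is a sketch using the standard library's `argparse`. It is a guess at the shape of the CLI based only on the commands shown; the real `mcp_server.py` may parse its arguments differently, and `parse_args` is a hypothetical name.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the transport choice; SSE is assumed to be the default."""
    parser = argparse.ArgumentParser(prog="mcp_server.py")
    parser.add_argument(
        "--transport",
        choices=["sse", "stdio"],
        default="sse",
        help="MCP transport to serve on",
    )
    return parser.parse_args(argv)


print(parse_args([]).transport)                        # sse
print(parse_args(["--transport", "stdio"]).transport)  # stdio
```

Restricting the flag with `choices` makes an unsupported transport fail at startup with a usage message instead of surfacing later as a connection error.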