MCP Server

Gobble

An MCP server for managing a podcast-based knowledge base. Download, transcribe, and semantically search long-form media — then feed it to any agent that speaks Model Context Protocol.


Tech stack

Python

MCP

Parakeet

yt-dlp

Vector DB

SSE

~45s to transcribe 2 hrs of audio

100% local, GPU inference

SSE + stdio MCP transports

No cloud dependencies

What it does

Blazing transcription

yt-dlp pulls the media, then NVIDIA Parakeet transcribes ~2 hours of audio in roughly 45 seconds on a local GPU. No cloud, no per-minute bills.
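
To put that figure in perspective, the arithmetic below turns the quoted numbers into a speedup over real time. The 2-hour and 45-second values are taken from the claim above; actual throughput will likely vary with the GPU and the audio.

```python
# Back-of-the-envelope speedup from the figures quoted above.
audio_seconds = 2 * 60 * 60   # ~2 hours of audio
wall_clock_seconds = 45       # ~45 seconds to transcribe it

speedup = audio_seconds / wall_clock_seconds
print(f"~{speedup:.0f}x faster than real time")  # prints "~160x faster than real time"
```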

Semantic search

Transcripts are chunked, embedded, and indexed so you can ask natural-language questions and pull the exact moment an episode covers a topic.
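
The chunk, embed, and search idea can be sketched in a few lines. This is a toy illustration, not Gobble's implementation: the function names, the word-count "embedding", and the in-memory list standing in for a vector store are all assumptions made for the example. A real setup would call an embedding model and a proper vector database.

```python
import math
import re
from collections import Counter

def chunk_segments(segments, max_words=40):
    """Group (start_seconds, text) segments into chunks of roughly max_words words.

    Each chunk keeps the start time of its first segment, so a search hit
    can be mapped back to a moment in the episode.
    """
    chunks, current, start, count = [], [], None, 0
    for seg_start, text in segments:
        words = text.split()
        if current and count + len(words) > max_words:
            chunks.append((start, " ".join(current)))
            current, start, count = [], None, 0
        if start is None:
            start = seg_start
        current.extend(words)
        count += len(words)
    if current:
        chunks.append((start, " ".join(current)))
    return chunks

def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a learned model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(index, query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return ranked[:top_k]

# Made-up timestamped transcript segments, for illustration only.
segments = [
    (0.0, "Welcome back to the show, today we talk about training large models."),
    (95.5, "Sourdough starter needs regular feeding with flour and water."),
    (210.0, "GPU memory limits how large a batch you can train with."),
]
index = [
    {"start": start, "text": text, "vector": embed(text)}
    for start, text in chunk_segments(segments, max_words=12)
]

best = search(index, "feeding a sourdough starter")[0]
print(best["start"], best["text"])
```

Keeping the start time on every chunk is what lets a hit point at the moment in the episode rather than just the episode itself.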

Knowledge base

A unified store across podcasts, YouTube videos, and ebooks. Load retrieved context straight into any MCP-aware chatbot (Goose, Cline, etc.).

Local-first

Runs entirely on your hardware. The only thing that leaves the machine is whatever you choose to send to a model you control.

How it works

Acquire

yt-dlp downloads audio/video from a URL or RSS feed.

→

Transcribe

Parakeet runs GPU inference to produce a timestamped transcript.

→

Index

Transcript is split, embedded, and written to a local vector store.

→

Serve

An MCP server (SSE or stdio) exposes search + retrieval tools to your agent.
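
For a feel of what the Serve step looks like on the wire, here is a minimal sketch of the JSON-RPC 2.0 request an MCP client sends to invoke a tool. The tool name `search_transcripts` and its arguments are hypothetical; the real names are whatever the server reports in response to `tools/list`. A real client would also perform the MCP initialization handshake before calling any tool.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    1,
    "search_transcripts",
    {"query": "what did they say about GPU memory?", "top_k": 3},
)

# Over the stdio transport, each message is a single line of JSON.
line = json.dumps(request)
print(line)
```

In practice an MCP-aware agent such as Goose or Cline builds these messages for you; the sketch only shows the shape of the exchange.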

Get running

bash

# install dependencies
uv sync

# run as an MCP server (SSE on port 8000)
uv run mcp_server.py

# or over stdio for Goose / Cline
uv run mcp_server.py --transport stdio

Requires a CUDA-capable GPU for local transcription. Full instructions live in the README.