
voiceblitz-mcp

Ultra-fast desktop voice assistant. Pocket TTS (~200ms) + Parakeet ASR + Smart Turn VAD—knows when you're done talking, not just silent. Modular architecture: swap any component. MCP server for Claude Code. Works with LM Studio, OpenRouter, OpenAI. Custom skills + voice cloning. 100% local.

Updated
Jan 25, 2026

LocalVoiceMode

Local voice interface with Character Skills: a self-contained voice chat system.

Uses Parakeet TDT 0.6B (NVIDIA) for fast GPU speech recognition and Pocket TTS (Kyutai) for natural text-to-speech. Auto-detects LM Studio, OpenRouter, or OpenAI as the LLM backend.

Features

  • Parakeet TDT ASR - NVIDIA's fast speech recognition (GPU accelerated via ONNX)
  • Pocket TTS - Kyutai's natural-sounding text-to-speech with voice cloning
  • Smart Turn Detection - Knows when you're done speaking, not just detecting silence
  • Auto-Provider Detection - Automatically finds LM Studio, or falls back to OpenRouter/OpenAI
  • Modern Rich UI - Beautiful terminal interface with audio visualization
  • Character Skills - Load different personalities with custom voices
  • MCP Integration - Works with Claude Code and other MCP-enabled tools

Quick Start

1. Clone and Setup

git clone https://github.com/your-username/localvoicemode.git
cd localvoicemode
setup.bat

This creates a virtual environment and installs all dependencies.

2. HuggingFace Login (Required)

Pocket TTS requires accepting the model license:

.venv\Scripts\huggingface-cli.exe login

Then accept the license at: https://huggingface.co/kyutai/pocket-tts

3. Configure LLM Provider

Option A: LM Studio (Recommended for local)

  1. Open LM Studio
  2. Load your preferred model
  3. Start the local server (default: http://localhost:1234)

Option B: OpenRouter

set OPENROUTER_API_KEY=your-key-here

Get your key at: https://openrouter.ai/keys

Option C: OpenAI

set OPENAI_API_KEY=your-key-here

4. Run Voice Chat

REM Default assistant
VoiceChat.bat

REM With Hermione character
VoiceChat.bat hermione

REM Push-to-talk mode
VoiceChat.bat hermione ptt

Provider Detection

LocalVoiceMode automatically detects available providers in this order:

  1. LM Studio - Scans ports 1234, 1235, 1236, 8080, 5000
  2. OpenRouter - Uses OPENROUTER_API_KEY environment variable
  3. OpenAI - Uses OPENAI_API_KEY environment variable

Force a specific provider with VOICE_PROVIDER=openrouter (or lm_studio, openai).
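The detection order above can be sketched in a few lines (a sketch only, not the project's actual code; the `port_open` callback stands in for whatever probe the real scanner uses):

```python
from typing import Callable, Optional

# Ports scanned for a running LM Studio server, per the list above
LM_STUDIO_PORTS = [1234, 1235, 1236, 8080, 5000]

def detect_provider(env: dict, port_open: Callable[[int], bool]) -> Optional[str]:
    """Pick a provider: explicit override first, then LM Studio ports, then API keys."""
    forced = env.get("VOICE_PROVIDER")
    if forced:
        return forced
    if any(port_open(p) for p in LM_STUDIO_PORTS):
        return "lm_studio"
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None
```

Injecting the environment and the port probe keeps the logic testable without a live server.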

Directory Structure

localvoicemode/
├── voice_client.py        # Main voice client entry point
├── mcp_server.py          # MCP server for AI assistant integration
├── requirements.txt       # Python dependencies
├── setup.bat              # Setup script (run first!)
├── VoiceChat.bat          # Launch script
├── start_voicemode.bat    # MCP server launcher
│
├── src/localvoicemode/    # Core package
│   ├── audio/             # Audio recording
│   ├── speech/            # ASR, TTS, VAD, filters
│   ├── llm/               # Provider management
│   ├── skills/            # Skill loading
│   └── state/             # State machines, config
│
├── skills/                # Character skills
│   ├── assistant-default/ # Default assistant
│   └── hermione-companion/
│       ├── SKILL.md       # Character definition
│       ├── references/    # Lore files
│       └── scripts/       # Helper scripts
│
└── voice_references/      # Custom voice files (.wav)

Skills System

Skills define character personalities, system prompts, and optional knowledge.

List Available Skills

.venv\Scripts\python.exe voice_client.py --list-skills

Create a New Skill

  1. Create directory: skills/my-skill/
  2. Create SKILL.md:
---
id: my-skill
name: My Character
display_name: "My Character"
description: Brief description
metadata:
  greeting: "Hello! How can I help?"
---

# My Character

## System Prompt

You are My Character. [Full instructions here...]
  3. Add optional files:
    • reference.wav - Voice clone source (10s of clear speech)
    • avatar.png - Character image
    • references/ - Knowledge markdown files
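A SKILL.md like the one above splits into frontmatter and body with a few lines of stdlib Python (a sketch; the real loader in src/localvoicemode/skills/ may differ, and nested keys like `metadata:` would need a proper YAML parser):

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md into (frontmatter fields, markdown body).

    Handles only the flat `key: value` lines shown above.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text                     # no frontmatter block
    fields, i = {}, 1
    while i < len(lines) and lines[i].strip() != "---":
        line = lines[i]
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
        i += 1
    body = "\n".join(lines[i + 1:])         # everything after the closing ---
    return fields, body
```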

Voice Cloning

Pocket TTS supports voice cloning from reference audio.

Requirements:

  • WAV format (16-bit PCM)
  • ~10 seconds of clean speech
  • Clear recording, minimal background noise

Place the file at:

  • skills/my-skill/reference.wav (per-skill), or
  • voice_references/my-skill.wav (global)
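The requirements above can be checked with the stdlib `wave` module before dropping a file in place (a sketch; `check_reference_wav` is a hypothetical helper, not part of the project):

```python
import wave

def check_reference_wav(path: str) -> list[str]:
    """Return a list of problems with a voice-clone reference file."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:          # 16-bit PCM = 2 bytes per sample
            problems.append("not 16-bit PCM")
        duration = wav.getnframes() / wav.getframerate()
        if not 5.0 <= duration <= 20.0:      # aim for roughly 10 seconds
            problems.append(f"duration {duration:.1f}s (aim for ~10s)")
    return problems
```

The 5-20 second window is an assumed tolerance around the ~10 s guideline above.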

Voice Modes

VAD Mode (default)

Voice Activity Detection with Smart Turn - automatically detects when you're done speaking.

VoiceChat.bat hermione
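Conceptually, Smart Turn gates end-of-turn on a model confidence score rather than silence alone (a hypothetical sketch; the real pipeline lives in src/localvoicemode/speech/, and the minimum-silence value here is an assumption):

```python
def turn_complete(turn_prob: float, silence_ms: int,
                  threshold: float = 0.5, min_silence_ms: int = 200) -> bool:
    """End the turn only when the turn-completion model is confident
    AND a short silence has elapsed; silence alone is not enough."""
    return turn_prob >= threshold and silence_ms >= min_silence_ms
```

The default threshold matches VOICE_SMART_TURN_THRESHOLD below; a thoughtful pause mid-sentence scores low and keeps the microphone open.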

PTT Mode

Push-to-Talk - hold Space to record, release to send.

VoiceChat.bat hermione ptt

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| VOICE_API_URL | http://localhost:1234/v1 | OpenAI-compatible API URL |
| VOICE_API_KEY | (none) | API key for the provider |
| VOICE_MODEL | (auto) | Model name to use |
| VOICE_PROVIDER | (auto) | Force provider: lm_studio, openrouter, openai |
| OPENROUTER_API_KEY | (none) | OpenRouter API key |
| OPENAI_API_KEY | (none) | OpenAI API key |
| VOICE_TTS_VOICE | alba | Default TTS voice |
| VOICE_DEVICE | cuda | ASR device: cuda (GPU) or cpu |
| VOICE_SMART_TURN_THRESHOLD | 0.5 | Turn completion threshold (0.0-1.0) |
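Resolving these variables amounts to plain environment lookups with the documented defaults (a sketch; the dataclass and field names are illustrative, not the project's actual config object):

```python
import os
from dataclasses import dataclass

@dataclass
class VoiceConfig:
    api_url: str
    tts_voice: str
    device: str
    smart_turn_threshold: float

def load_config(env=os.environ) -> VoiceConfig:
    """Apply the documented defaults for any unset variable."""
    return VoiceConfig(
        api_url=env.get("VOICE_API_URL", "http://localhost:1234/v1"),
        tts_voice=env.get("VOICE_TTS_VOICE", "alba"),
        device=env.get("VOICE_DEVICE", "cuda"),
        smart_turn_threshold=float(env.get("VOICE_SMART_TURN_THRESHOLD", "0.5")),
    )
```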

Command Line Options

python voice_client.py [options]

Options:
  --skill, -s SKILL      Load a character skill
  --list-skills, -l      List available skills
  --list-providers       List available LLM providers
  --provider, -p PROV    Force provider: lm_studio, openrouter, openai
  --mode, -m MODE        Input mode: vad, ptt, or type
  --device DEVICE        ASR device: cuda or cpu
  --api-url URL          OpenAI-compatible API URL
  --api-key KEY          API key for the provider
  --model MODEL          Model name to use
  --headless             Run without UI (for MCP integration)
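Wiring flags like these with argparse might look as follows (a sketch mirroring the help text above, not the project's actual parser):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="voice_client.py")
    p.add_argument("--skill", "-s", help="Load a character skill")
    p.add_argument("--list-skills", "-l", action="store_true",
                   help="List available skills")
    p.add_argument("--provider", "-p",
                   choices=["lm_studio", "openrouter", "openai"])
    p.add_argument("--mode", "-m", choices=["vad", "ptt", "type"],
                   default="vad")
    p.add_argument("--device", choices=["cuda", "cpu"], default="cuda")
    p.add_argument("--headless", action="store_true",
                   help="Run without UI (for MCP integration)")
    return p
```

The `vad` and `cuda` defaults follow the VAD-mode and VOICE_DEVICE defaults documented elsewhere on this page.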

MCP Integration

LocalVoiceMode includes an MCP server for integration with Claude Code and other MCP-enabled tools.

Start MCP Server

start_voicemode.bat

Available Tools

  • speak(text) - Speak text aloud (TTS)
  • listen() - Listen for speech (STT)
  • converse(text) - Speak and listen for response
  • start_voice(skill) - Start voice chat with a character
  • stop_voice() - Stop voice chat
  • voice_status() - Check if voice mode is running
  • list_voices() - List available characters
  • provider_status() - Show available providers
  • set_speech_mode(mode) - Set verbosity: roleplay, coder, minimal, silent
  • get_speech_mode() - Get current speech mode
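From a client's point of view, each tool is just a name plus keyword arguments. A toy dispatch table illustrates the shape of that surface (the handlers here are hypothetical stubs, not the server's implementation):

```python
from typing import Any, Callable

# Stub handlers standing in for the real TTS/STT machinery
TOOLS: dict[str, Callable[..., Any]] = {
    "speak": lambda text: f"spoke: {text}",
    "listen": lambda: "transcribed speech",
    "voice_status": lambda: {"running": False},
}

def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call by name, as an MCP client effectively does."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```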

Slash Commands

These slash commands are available in Claude Code and compatible AI assistants:

| Command | Description |
|---|---|
| /speak <text> | TTS only - speak text aloud |
| /listen | STT only - transcribe speech to text |
| /tts-only | Mode: Claude speaks, you type |
| /stt-only | Mode: You speak, Claude responds in text |
| /voice-roleplay | Full expressive speech output |
| /voice-coder | Summaries & completions only |
| /voice | Speak one message via voice |
| /voice-on | Start continuous voice mode |
| /voice-off | Stop voice mode |
| /voice-typing | You type, Claude speaks (hold RIGHT SHIFT to speak) |

Speech Modes

Control how much Claude speaks:

| Mode | Description |
|---|---|
| roleplay | Full expressive output - speaks everything naturally (default) |
| coder | Summaries only - task completions, errors, questions |
| minimal | Very terse - only critical announcements |
| silent | No speech - text only |

Switch modes with /voice-roleplay, /voice-coder, or the set_speech_mode() tool.
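A minimal sketch of mode-based gating (the message categories are assumptions inferred from the descriptions above; the real filter may use different rules):

```python
def should_speak(mode: str, category: str) -> bool:
    """Decide whether a message is voiced, given the current speech mode.

    category is one of: "chat", "completion", "error", "question", "critical".
    """
    allowed = {
        "roleplay": {"chat", "completion", "error", "question", "critical"},
        "coder": {"completion", "error", "question"},
        "minimal": {"critical"},
        "silent": set(),
    }
    return category in allowed.get(mode, set())
```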

Voice Commands While Running

  • Say "stop" or "goodbye" to end
  • Say "change voice" to switch characters
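Recognizing these in-session commands can be as simple as a phrase check on each final transcript (a hypothetical sketch; the command names returned here are illustrative):

```python
def detect_voice_command(transcript: str) -> "str | None":
    """Map spoken control phrases to commands; None means normal speech."""
    text = transcript.lower().strip(" .!?")
    if text in ("stop", "goodbye"):
        return "exit"
    if "change voice" in text:
        return "change_voice"
    return None
```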

GPU Support

Parakeet TDT uses ONNX Runtime with GPU acceleration:

  1. TensorRT (best performance) - Auto-detected if installed
  2. CUDA (good performance) - Requires CUDA/cuDNN
  3. CPU (fallback) - Always available
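The fallback order above amounts to a preference list intersected with what onnxruntime reports (a sketch; the provider strings are the real ONNX Runtime identifiers):

```python
# Best-first order: TensorRT, then CUDA, then CPU
PREFERRED = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
             "CPUExecutionProvider"]

def pick_providers(available: list[str]) -> list[str]:
    """Keep the preferred order, dropping anything not installed."""
    return [p for p in PREFERRED if p in available]
```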

Check GPU status:

.venv\Scripts\python.exe -c "import onnxruntime as ort; print(ort.get_available_providers())"

Troubleshooting

No audio detected

  • Check microphone permissions
  • Verify default audio device: python -c "import sounddevice; print(sounddevice.query_devices())"

Pocket TTS not working

  • Log in with huggingface-cli (see step 2 of Quick Start)
  • Make sure you accepted the model license at https://huggingface.co/kyutai/pocket-tts

LM Studio connection failed

  • Verify LM Studio server is running
  • Check URL: default is http://localhost:1234
  • Ensure a model is loaded

OpenRouter/OpenAI not working

  • Verify API key is set in .env or environment
  • Check python voice_client.py --list-providers to see detected providers

GPU/CUDA not working

  • Ensure NVIDIA drivers are installed
  • Install CUDA Toolkit 12.x
  • Reinstall: pip uninstall onnxruntime onnxruntime-gpu && pip install onnxruntime-gpu[cuda,cudnn]

Credits
