Vesta AI Explorer for macOS - macOS Tahoe 26

🌐 vesta-mac.pages.dev

Feb 11, 2026 --> Latest app release --> https://github.com/scouzi1966/vesta-mac-dist/releases/tag/v0.9.5

For the "Just take me to the download" people --> https://github.com/scouzi1966/vesta-mac-dist/releases/download/v0.9.5/Vesta-0.9.5.dmg

Install with brew --> brew install --cask scouzi1966/afm/vesta-mac

Vesta is a multi-backend AI chat application for macOS that runs models locally on Apple Silicon. It supports five AI backends simultaneously -- Apple Intelligence, MLX, llama.cpp, HuggingFace Inference API, and OpenAI-compatible API servers -- with runtime switching, vision understanding, text-to-speech, speech-to-text, image generation, video generation, and a full MCP server for programmatic control.

NOTE: SIDEKICK ONLY WORKS WITH LOCALLY INSTALLED CLAUDE CODE AT THE MOMENT. SUPPORT FOR OTHER MODELS AS MCP CLIENTS COMING SOON

Demo

Click to watch the full demo video

Note that there is a known bug in Qwen3-VL models where it will loop indefinitely in some cases and never complete. In this case stop the generation and tweak your prompt. This is a model behavior, not the app. Reference: https://github.com/QwenLM/Qwen3-VL/issues/1611#issuecomment-3639175711

Vesta 0.9.5 (Feb 11, 2026)

94 commits, 2 months of development since v0.9.0

This is a major release that transforms Vesta from a two-backend chat app into a full multi-modal AI platform with five backends, MCP integration, and media generation capabilities. Explore the world of AI beyond ChatGPT!

🆕 New in 0.9.5

Multi-Backend Architecture

HuggingFace Explorer -- Leverage the HuggingFace Pro subscription and inference providers partners for ImageGen, Videogen, Image edit, Whipser etc. I am not affiliated with HuggingFace but the pro sub is worth it.
5 simultaneous backends -- Apple Intelligence, MLX, llama.cpp, HuggingFace Explorer, and External AI (OpenAI-compatible) all active at once (Single chat routing)
Runtime backend switching -- switch between backends without restarting the app
Per-backend settings -- each backend has its own generation parameters, model selection, and configuration
HuggingFace Browser -- Direct browse and download models from HuggingFace in-app

MCP Server (Model Context Protocol) - Agentic Sidekick!

Full MCP server running on TCP loopback with token-based authentication
Agentic Sidekick Vesta detects Claude Code when you enable MCP - Claude code acts as an agent with a NLI (Natural Language Interface) to the app. Ask Claude to set things up, have a conversation with any other model!
41+ tools -- backend management, chat, model download/load/unload, vision analysis, settings, conversation history search, diagnostics, UI navigation
6 resources -- app state, models, conversation, settings, logs, system info
7 prompts -- guides for Vesta, MLX, llama.cpp, HuggingFace, and common workflows
AI Sidekick -- Claude Code integration for programmatic Vesta control
Conversation history search -- full-text search and read-only SQL queries against the SQLite message database

HuggingFace Explorer (New Backend)

Cloud inference via 16+ providers (Cerebras, Groq, Together, Fireworks, SambaNova, Nebius, Replicate, and more)
Text-to-image generation -- FLUX.1 Schnell/Dev, FLUX.2, Stable Diffusion 3.x, SDXL with configurable size, guidance, and steps
Image editing -- instruction-based editing with FLUX Kontext and FLUX.2 via Replicate
Video generation -- Wan 2.2 T2V (text-to-video) with async polling
Speech-to-text transcription -- OpenAI Whisper models via HuggingFace Inference API with 14 languages
Vision/VLM -- Qwen2.5-VL, Qwen3-VL and other vision-language models via cloud
Model browser -- search and discover models from HuggingFace Hub with download counts, likes, and gated model detection
Reasoning display -- chain-of-thought rendering for models that emit <think> tags (DeepSeek R1, QwQ, etc.)
7-tab settings panel -- Chat, Vision, Image, Edit, Transcribe, Video, Settings

Text-to-Speech (TTS)

Kokoro (82M) -- 46+ voices across 10+ languages, fast and high-quality
Marvis (100M/250M) -- conversational TTS with voice cloning support via reference audio
Models download on first use from HuggingFace

Speech-to-Text (STT) -- WhisperKit

On-device transcription via WhisperKit CoreML -- zero network required
6 model sizes -- Tiny (39M) through Large V3 (1.5B) and Large V3 Turbo (809M)
28+ languages with auto-detect
Per-segment timing and speed ratio reporting

Jinja Template Support (minja)

Full Jinja2 template parsing for GGUF models via llama.cpp's minja library
Correctly renders chat templates embedded in model metadata
Falls back to llama_chat_apply_template() for non-Jinja templates

GGUF Model Browser

Search and browse GGUF models from HuggingFace Hub
Capability badges: Vision, Tool Use, Reasoning, Coding, Math, Multilingual
Split/multipart file detection
Automatic mmproj detection for vision models

MLX Improvements

Qwen3-VL M-RoPE patch -- +81% performance improvement for vision inference (auto-applied via script)
KV cache controls exposed -- max KV size, quantization bits, prefill step size
MLX benchmark tool (mlx-bench) for standalone performance testing
Wired memory set to 90% of GPU recommended working set for large model performance

🔧 Improvements

Per-message metrics -- token count and tokens/sec stored with each message in the database
Green parameter labels -- visual indicator when a generation parameter matches the model's configured default
Binary voice format -- Kokoro voice files converted from JSON to binary (144 MB down to 27 MB)
Conversation history view with backend filtering and pagination
One-command build -- build-from-scratch.sh handles submodules, patching, llama.cpp library build, and Xcode build
Distribution pipeline -- automated DMG creation, notarization, and GitHub release via build-vesta-mac-dist.sh
Automated testing framework -- 44+ UI tests via MCP-based test runner

🐛 Bug Fixes

Fix AVKit VideoPlayer crash during SwiftUI transitions (disabled transition animations)
Fix download progress stuck at 0% and crash in llama.cpp streaming
Fix TTS mode hijacking text generation when both TTS and LLM models are loaded
Fix O(N^2) reasoning parser performance
Fix reasoning parser stripMarkers bug and chat history contamination
Fix streaming throttle not kicking in when content scrolls off-screen
Fix llama.cpp default context size (2048 changed to 16384 for Qwen3-VL)
Fix Continuity Camera Swift 6 concurrency crash in Release builds (Objective-C workaround)
Fix NSHostingView constraint crash in MLX settings window (non-observing wrapper)
Fix MoE warmup crash for models with 32+ experts (reduced warmup batch size)
Fix mxfp4 MoE Metal shader crash (skip warmup for mxfp4 models)
Fix ESpeakNG unsealed contents causing notarization failure
Fix GGUF vision model image handling and model deduplication

Vesta 0.9.0 (Dec 10, 2025)

🆕 New in 0.9.0

Vision capabilities with Qwen3-VL model (describe images, analyze screenshots)
Continuity Camera input (capture photos directly from iPhone/iPad)
Code syntax highlighting for 20+ programming languages
Edit responses inline after generation
HTML preview for rendered content
Enhanced LaTeX math rendering in blockquotes
Improved rendering engine with real-time code block highlighting

Features

Apple Intelligence -- on-device AI via Foundation Models framework (always available)
MLX Backend -- Apple Silicon optimized inference with mlx-swift (Qwen3-VL vision models)
llama.cpp Backend -- GGUF model support with full Metal GPU acceleration and Jinja templates
HuggingFace Explorer -- cloud inference, image/video generation, transcription via 16+ providers
External AI -- connect to any OpenAI-compatible API server (LM Studio, Ollama, etc.)
Vision -- image understanding via Qwen3-VL (MLX, llama.cpp, or HuggingFace)
Text-to-Speech -- Kokoro, Marvis (with voice cloning), and Orpheus TTS engines
Speech-to-Text -- WhisperKit on-device transcription (Tiny through Large V3)
MCP Server -- 41+ tools for programmatic control, model management, and AI Sidekick integration
GitHub Flavored Markdown -- tables, task lists, strikethrough via remark/rehype pipeline
LaTeX Math -- inline and block math rendering with KaTeX
Code Highlighting -- 20+ languages with real-time streaming highlight
Liquid Glass UI -- native macOS Tahoe design
App Sandbox -- Developer ID signed and Apple notarized

📋 Installation

Download the DMG from the latest release
Open the DMG and drag Vesta to Applications
Launch Vesta from Applications
First Run: right-click and select "Open" if prompted by Gatekeeper

📱 Requirements

macOS 26.0 (Tahoe) or later
Apple Silicon Mac (M1/M2/M3/M4)
Microphone access for voice input and STT
Internet access for HuggingFace backend and model downloads (on-device backends work offline after model download)

🔒 Security & Privacy

Signed with Developer ID Application: Soprano Technologies Inc.
Notarized by Apple
App Sandbox enabled
On-device backends (Apple Intelligence, MLX, llama.cpp) process everything locally -- no data sent to servers
HuggingFace and External AI backends require network access for inference
API tokens stored in macOS Keychain

🔗 Related

Source Code: https://github.com/scouzi1966/vesta-mac
Distribution: https://github.com/scouzi1966/vesta-mac-dist (this repo)
CLI Alternative: https://github.com/scouzi1966/maclocal-api

💬 Support

📄 License

🏗 Built With

Apple Intelligence -- Foundation Models framework
MLX -- mlx-swift + mlx-swift-lm for Apple Silicon inference
llama.cpp -- GGUF inference with Metal acceleration
WhisperKit -- CoreML-based Whisper speech-to-text
mlx-audio -- Kokoro/Marvis/Orpheus TTS
SwiftUI -- native macOS interface
KaTeX -- math rendering
highlight.js -- code syntax highlighting
remark/rehype -- markdown processing pipeline

Built with automated distribution pipeline | Notarized and code-signed | Apple Silicon native

vesta-mac-dist