

Osaurus



Native macOS LLM server with MCP support. Run local and remote language models on Apple Silicon with OpenAI & Anthropic compatible APIs, tool calling, and a built-in plugin ecosystem.

Created by Dinoki Labs (dinoki.ai)

Documentation · Discord · Plugin Registry · Contributing


Install

brew install --cask osaurus

Or download from Releases.

After installing, launch from Spotlight (⌘ Space → "osaurus") or run osaurus ui from the terminal.


What is Osaurus?

Osaurus is an all-in-one LLM server for macOS. It combines:

  • MLX Runtime — Optimized local inference for Apple Silicon using MLX
  • Remote Providers — Connect to Anthropic, OpenAI, OpenRouter, Ollama, LM Studio, or any compatible API
  • OpenAI, Anthropic & Ollama APIs — Drop-in compatible endpoints for existing tools
  • MCP Server — Expose tools to AI agents via Model Context Protocol
  • Remote MCP Providers — Connect to external MCP servers and aggregate their tools
  • Plugin System — Extend functionality with community and custom tools
  • Personas — Create custom AI assistants with unique prompts, tools, and visual themes
  • Multi-Window Chat — Multiple independent chat windows with per-window personas
  • Developer Tools — Built-in insights and server explorer for debugging
  • Voice Input — Speech-to-text using WhisperKit with real-time on-device transcription
  • VAD Mode — Always-on listening with wake-word activation for hands-free persona access
  • Transcription Mode — Global hotkey to transcribe speech directly into any app
  • Apple Foundation Models — Use the system model on macOS 26+ (Tahoe)

Highlights

| Feature | Description |
| --- | --- |
| Local LLM Server | Run Llama, Qwen, Gemma, Mistral, and more locally |
| Remote Providers | Anthropic, OpenAI, OpenRouter, Ollama, LM Studio, or custom |
| OpenAI Compatible | `/v1/chat/completions` with streaming and tool calling |
| Anthropic Compatible | `/messages` endpoint for Claude Code and Anthropic SDK clients |
| MCP Server | Connect to Cursor, Claude Desktop, and other MCP clients |
| Remote MCP Providers | Aggregate tools from external MCP servers |
| Tools & Plugins | Browser automation, file system, git, web search, and more |
| Personas | Custom AI assistants with unique prompts, tools, and themes |
| Custom Themes | Create, import, and export themes with full color customization |
| Developer Tools | Request insights, API explorer, and live endpoint testing |
| Multi-Window Chat | Multiple independent chat windows with per-window personas |
| Menu Bar Chat | Chat overlay with session history, context tracking (⌘;) |
| Voice Input | Speech-to-text with WhisperKit, real-time transcription |
| VAD Mode | Always-on listening with wake-word persona activation |
| Transcription Mode | Global hotkey to dictate into any focused text field |
| Model Manager | Download and manage models from Hugging Face |

Quick Start

1. Start the Server

Launch Osaurus from Spotlight or run:

osaurus serve

The server starts on port 1337 by default.

2. Connect an MCP Client

Add to your MCP client configuration (e.g., Cursor, Claude Desktop):

{
  "mcpServers": {
    "osaurus": {
      "command": "osaurus",
      "args": ["mcp"]
    }
  }
}

3. Add a Remote Provider (Optional)

Open the Management window (⌘ Shift M) → Providers → Add Provider.

Choose from presets (OpenAI, Ollama, LM Studio, OpenRouter) or configure a custom endpoint.


Key Features

Local Models (MLX)

Run models locally with optimized Apple Silicon inference:

# Download a model (if needed) and chat with it
osaurus run llama-3.2-3b-instruct-4bit

# Use via API
curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-3b-instruct-4bit", "messages": [{"role": "user", "content": "Hello!"}]}'

Remote Providers

Connect to remote APIs to access cloud models alongside local ones.

Supported presets:

  • Anthropic — Claude models with native API support
  • OpenAI — GPT-4o, o1, and other OpenAI models
  • OpenRouter — Access multiple providers through one API
  • Ollama — Connect to a local or remote Ollama instance
  • LM Studio — Use LM Studio as a backend
  • Custom — Any OpenAI-compatible endpoint

Features:

  • Secure API key storage (macOS Keychain)
  • Custom headers for authentication
  • Auto-connect on launch
  • Connection health monitoring

See Remote Providers Guide for details.

MCP Server

Osaurus is a full MCP (Model Context Protocol) server. Connect it to any MCP client to give AI agents access to your installed tools.

| Endpoint | Description |
| --- | --- |
| `GET /mcp/health` | Check MCP availability |
| `GET /mcp/tools` | List active tools |
| `POST /mcp/call` | Execute a tool |
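As a sketch of what a `POST /mcp/call` request body might look like: the exact HTTP schema isn't documented here, so the `name`/`arguments` field names below are an assumption based on MCP's standard tools/call convention, and `read_file` with its `path` argument is just an illustrative pick from the filesystem plugin.

```python
import json

# Hypothetical request body for POST /mcp/call; the field names are an
# assumption modeled on MCP's tools/call parameters, not a documented schema.
call_request = {
    "name": "read_file",                       # tool to execute
    "arguments": {"path": "/tmp/notes.txt"},   # tool-specific arguments
}

body = json.dumps(call_request)
print(body)
```

Check the MCP documentation linked above for the authoritative request and response shapes.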

Remote MCP Providers

Connect to external MCP servers and aggregate their tools into Osaurus:

  • Discover and register tools from remote MCP endpoints
  • Configurable timeouts and streaming
  • Tools are namespaced by provider (e.g., provider_toolname)
  • Secure token storage

See Remote MCP Providers Guide for details.
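The `provider_toolname` convention means an aggregated tool's public name is just the provider id joined to the tool name with an underscore. A minimal sketch (the `github`/`search_issues` names are hypothetical examples, not bundled tools):

```python
def namespaced_tool(provider: str, tool: str) -> str:
    """Combine a provider id and a tool name the way aggregated
    remote-MCP tools are exposed (provider_toolname)."""
    return f"{provider}_{tool}"

print(namespaced_tool("github", "search_issues"))  # -> github_search_issues
```

This keeps tools from different providers from colliding when they share a name.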

Tools & Plugins

Install tools from the central registry or create your own.

Official System Tools:

| Plugin | Tools |
| --- | --- |
| osaurus.filesystem | `read_file`, `write_file`, `list_directory`, `search_files`, and more |
| osaurus.browser | `browser_navigate`, `browser_click`, `browser_type`, `browser_screenshot` |
| osaurus.git | `git_status`, `git_log`, `git_diff`, `git_branch` |
| osaurus.search | `search`, `search_news`, `search_images` (DuckDuckGo) |
| osaurus.fetch | `fetch`, `fetch_json`, `fetch_html`, `download` |
| osaurus.time | `current_time`, `format_date` |

# Install from registry
osaurus tools install osaurus.browser

# List installed tools
osaurus tools list

# Create your own plugin
osaurus tools create MyPlugin --language swift

See the Plugin Authoring Guide for details.

Personas

Create custom AI assistant personalities with unique behaviors, capabilities, and styles.

Each persona can have:

  • Custom System Prompt — Define unique instructions and personality
  • Tool Configuration — Enable or disable specific tools per persona
  • Visual Theme — Assign a custom theme that activates with the persona
  • Model & Generation Settings — Set default model, temperature, and max tokens
  • Import/Export — Share personas as JSON files
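Since personas export as JSON, a shared persona file might look like the sketch below. The field names are hypothetical, chosen to mirror the settings listed above, not the actual export schema.

```json
{
  "name": "Code Assistant",
  "systemPrompt": "You are a focused programming assistant.",
  "tools": ["osaurus.filesystem", "osaurus.git"],
  "model": "llama-3.2-3b-instruct-4bit",
  "temperature": 0.2,
  "maxTokens": 2048
}
```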

Use cases:

  • Code Assistant — Focused on programming with code-related tools enabled
  • Daily Planner — Calendar and reminders integration
  • Research Helper — Web search and note-taking tools enabled
  • Creative Writer — Higher temperature, no tool access for pure generation

Access via Management window (⌘ Shift M) → Personas.

Multi-Window Chat

Work with multiple independent chat windows, each with its own persona and session.

Features:

  • Independent Windows — Each window maintains its own persona, theme, and session
  • File → New Window — Open additional chat windows (⌘ N)
  • Persona per Window — Different personas in different windows simultaneously
  • Open in New Window — Right-click any session in history to open in a new window
  • Pin to Top — Keep specific windows floating above others
  • Cascading Windows — New windows are offset so they're always visible

Use Cases:

  • Run multiple AI personas side-by-side (e.g., "Code Assistant" and "Creative Writer")
  • Compare responses from different personas
  • Keep reference conversations open while starting new ones
  • Organize work by project with dedicated windows

Developer Tools

Built-in tools for debugging and development:

Insights — Monitor all API requests in real-time:

  • Request/response logging with full payloads
  • Filter by method (GET/POST) and source (Chat UI/HTTP API)
  • Performance stats: success rate, average latency, errors
  • Inference metrics: tokens, speed (tok/s), model used

Server Explorer — Interactive API reference:

  • Live server status and health
  • Browse all available endpoints
  • Test endpoints directly with editable payloads
  • View formatted responses

Access via Management window (⌘ Shift M) → Insights or Server.

See Developer Tools Guide for details.

Voice Input

Speech-to-text powered by WhisperKit — fully local, private, on-device transcription.

Features:

  • Real-time transcription — See your words as you speak
  • Multiple Whisper models — From Tiny (75 MB) to Large V3 (3 GB)
  • Microphone or system audio — Transcribe your voice or computer audio
  • Configurable sensitivity — Adjust for quiet or noisy environments
  • Auto-send with confirmation — Hands-free message sending

VAD Mode (Voice Activity Detection):

Activate personas hands-free by saying their name or a custom wake phrase.

  • Say a persona's name (e.g., "Hey Code Assistant") to open chat
  • Automatic voice input starts after activation
  • Status indicators: Blue pulsing dot on menu bar icon when listening, toggle button in popover
  • Configurable silence timeout and auto-close

Transcription Mode:

Dictate text directly into any application using a global hotkey.

  • Global Hotkey — Trigger transcription from anywhere on your Mac
  • Live Typing — Text is typed into the currently focused text field in real-time
  • Accessibility Integration — Uses macOS accessibility APIs to simulate keyboard input
  • Minimal Overlay — Sleek floating UI shows recording status
  • Press Esc or Done — Stop transcription when finished

Perfect for dictating emails, documents, code comments, or any text input without switching apps.

Setup:

  1. Open Management window (⌘ Shift M) → Voice
  2. Grant microphone permission
  3. Download a Whisper model
  4. For Transcription Mode: Grant accessibility permission and configure the hotkey in the Transcription tab
  5. Test your voice input

See Voice Input Guide for details.


CLI Reference

| Command | Description |
| --- | --- |
| `osaurus serve` | Start the server (default port 1337) |
| `osaurus serve --expose` | Start exposed on LAN |
| `osaurus stop` | Stop the server |
| `osaurus status` | Check server status |
| `osaurus ui` | Open the menu bar UI |
| `osaurus list` | List downloaded models |
| `osaurus run <model>` | Interactive chat with a model |
| `osaurus mcp` | Start MCP stdio transport |
| `osaurus tools <cmd>` | Manage plugins (install, list, search, etc.) |

Tip: Set OSU_PORT to override the default port.


API Endpoints

Base URL: http://127.0.0.1:1337 (or your configured port)

| Endpoint | Description |
| --- | --- |
| `GET /health` | Server health |
| `GET /v1/models` | List models (OpenAI format) |
| `GET /v1/tags` | List models (Ollama format) |
| `POST /v1/chat/completions` | Chat completions (OpenAI format) |
| `POST /messages` | Chat completions (Anthropic format) |
| `POST /chat` | Chat (Ollama format, NDJSON) |

All endpoints support /v1, /api, and /v1/api prefixes.

See the OpenAI API Guide for tool calling, streaming, and SDK examples.
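With `"stream": true`, OpenAI-compatible endpoints emit Server-Sent Events: `data:` lines carrying JSON chunks, terminated by `data: [DONE]`. A minimal sketch of reassembling the streamed text, assuming the standard OpenAI chunk format (the sample lines are illustrative, not a captured osaurus response):

```python
import json

def collect_stream(lines):
    """Reassemble assistant text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # role-only chunks have no content
    return "".join(text)

# Illustrative chunks in the OpenAI streaming format
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello!
```

In practice an OpenAI SDK client handles this parsing for you; the sketch just shows what is on the wire.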


Use with OpenAI SDKs

Point any OpenAI-compatible client at Osaurus:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
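Tool calling uses the OpenAI function-calling schema: you pass a `tools` array alongside `messages`. A sketch of such a definition, where `get_weather` and its `city` parameter are hypothetical, not a tool that ships with Osaurus:

```python
# Hypothetical tool definition in the OpenAI function-calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Passed alongside messages, e.g.:
# client.chat.completions.create(model=..., messages=..., tools=tools)
print(tools[0]["function"]["name"])  # -> get_weather
```

When the model decides to call a tool, the response carries `tool_calls` instead of plain content; see the OpenAI API Guide above for the full round trip.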

Requirements

  • macOS 15.5+ (Apple Foundation Models require macOS 26)
  • Apple Silicon (M1 or newer)
  • Xcode 16.4+ (to build from source)

Models are stored at ~/MLXModels by default. Override with OSU_MODELS_DIR.

Whisper models are stored at ~/.osaurus/whisper-models.


Build from Source

git clone https://github.com/dinoki-ai/osaurus.git
cd osaurus
open osaurus.xcworkspace
# Build and run the "osaurus" target

Contributing

We're looking for contributors! Osaurus is actively developed and we welcome help in many areas:

  • Bug fixes and performance improvements
  • New plugins and tool integrations
  • Documentation and tutorials
  • UI/UX enhancements
  • Testing and issue triage

Get Started

  1. Check out Good First Issues
  2. Read the Contributing Guide
  3. Join our Discord to connect with the team

See docs/FEATURES.md for a complete feature inventory and architecture overview.


Community

If you find Osaurus useful, please star the repo and share it!
