

Osaurus



Native macOS LLM server with MCP support. Run local and remote language models on Apple Silicon with OpenAI & Anthropic compatible APIs, tool calling, and a built-in plugin ecosystem.

Created by Dinoki Labs (dinoki.ai)

Documentation · Discord · Plugin Registry · Contributing


Install

brew install --cask osaurus

Or download from Releases.

After installing, launch from Spotlight (⌘ Space → "osaurus") or run osaurus ui from the terminal.


What is Osaurus?

Osaurus is an all-in-one LLM server for macOS. It combines:

  • MLX Runtime — Optimized local inference for Apple Silicon using MLX
  • Remote Providers — Connect to Anthropic, OpenAI, OpenRouter, Ollama, LM Studio, or any compatible API
  • OpenAI, Anthropic & Ollama APIs — Drop-in compatible endpoints for existing tools
  • MCP Server — Expose tools to AI agents via Model Context Protocol
  • Remote MCP Providers — Connect to external MCP servers and aggregate their tools
  • Plugin System — Extend functionality with community and custom tools
  • Personas — Create custom AI assistants with unique prompts, tools, and visual themes
  • Multi-Window Chat — Multiple independent chat windows with per-window personas
  • Developer Tools — Built-in insights and server explorer for debugging
  • Voice Input — Speech-to-text using WhisperKit with real-time on-device transcription
  • VAD Mode — Always-on listening with wake-word activation for hands-free persona access
  • Transcription Mode — Global hotkey to transcribe speech directly into any app
  • Apple Foundation Models — Use the system model on macOS 26+ (Tahoe)

Highlights

| Feature | Description |
| --- | --- |
| Local LLM Server | Run Llama, Qwen, Gemma, Mistral, and more locally |
| Remote Providers | Anthropic, OpenAI, OpenRouter, Ollama, LM Studio, or custom |
| OpenAI Compatible | `/v1/chat/completions` with streaming and tool calling |
| Anthropic Compatible | `/messages` endpoint for Claude Code and Anthropic SDK clients |
| MCP Server | Connect to Cursor, Claude Desktop, and other MCP clients |
| Remote MCP Providers | Aggregate tools from external MCP servers |
| Tools & Plugins | Browser automation, file system, git, web search, and more |
| Personas | Custom AI assistants with unique prompts, tools, and themes |
| Custom Themes | Create, import, and export themes with full color customization |
| Developer Tools | Request insights, API explorer, and live endpoint testing |
| Multi-Window Chat | Multiple independent chat windows with per-window personas |
| Menu Bar Chat | Chat overlay with session history, context tracking (⌘;) |
| Voice Input | Speech-to-text with WhisperKit, real-time transcription |
| VAD Mode | Always-on listening with wake-word persona activation |
| Transcription Mode | Global hotkey to dictate into any focused text field |
| Model Manager | Download and manage models from Hugging Face |

Quick Start

1. Start the Server

Launch Osaurus from Spotlight or run:

osaurus serve

The server starts on port 1337 by default.

2. Connect an MCP Client

Add to your MCP client configuration (e.g., Cursor, Claude Desktop):

{
  "mcpServers": {
    "osaurus": {
      "command": "osaurus",
      "args": ["mcp"]
    }
  }
}

3. Add a Remote Provider (Optional)

Open the Management window (⌘ Shift M) → Providers → Add Provider.

Choose from presets (OpenAI, Ollama, LM Studio, OpenRouter) or configure a custom endpoint.


Key Features

Local Models (MLX)

Run models locally with optimized Apple Silicon inference:

# Download a model (if needed) and chat with it
osaurus run llama-3.2-3b-instruct-4bit

# Use via API
curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-3b-instruct-4bit", "messages": [{"role": "user", "content": "Hello!"}]}'

Remote Providers

Connect to remote APIs to access cloud models alongside local ones.

Supported presets:

  • Anthropic — Claude models with native API support
  • OpenAI — GPT-4o, o1, and other OpenAI models
  • OpenRouter — Access multiple providers through one API
  • Ollama — Connect to a local or remote Ollama instance
  • LM Studio — Use LM Studio as a backend
  • Custom — Any OpenAI-compatible endpoint

Features:

  • Secure API key storage (macOS Keychain)
  • Custom headers for authentication
  • Auto-connect on launch
  • Connection health monitoring

See Remote Providers Guide for details.

MCP Server

Osaurus is a full MCP (Model Context Protocol) server. Connect it to any MCP client to give AI agents access to your installed tools.

| Endpoint | Description |
| --- | --- |
| `GET /mcp/health` | Check MCP availability |
| `GET /mcp/tools` | List active tools |
| `POST /mcp/call` | Execute a tool |
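As a sketch of what a `POST /mcp/call` request body might look like: the exact HTTP schema isn't documented here, so the `name`/`arguments` field names below are an assumption based on MCP's standard tools/call convention, and `read_file` with its `path` argument is just an illustrative pick from the filesystem plugin.

```python
import json

# Hypothetical request body for POST /mcp/call; the field names are an
# assumption modeled on MCP's tools/call parameters, not a documented schema.
call_request = {
    "name": "read_file",                       # tool to execute
    "arguments": {"path": "/tmp/notes.txt"},   # tool-specific arguments
}

body = json.dumps(call_request)
print(body)
```

Check the MCP documentation linked above for the authoritative request and response shapes.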

Remote MCP Providers

Connect to external MCP servers and aggregate their tools into Osaurus:

  • Discover and register tools from remote MCP endpoints
  • Configurable timeouts and streaming
  • Tools are namespaced by provider (e.g., provider_toolname)
  • Secure token storage

See Remote MCP Providers Guide for details.
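The `provider_toolname` convention means an aggregated tool's public name is just the provider id joined to the tool name with an underscore. A minimal sketch (the `github`/`search_issues` names are hypothetical examples, not bundled tools):

```python
def namespaced_tool(provider: str, tool: str) -> str:
    """Combine a provider id and a tool name the way aggregated
    remote-MCP tools are exposed (provider_toolname)."""
    return f"{provider}_{tool}"

print(namespaced_tool("github", "search_issues"))  # -> github_search_issues
```

This keeps tools from different providers from colliding when they share a name.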

Tools & Plugins

Install tools from the central registry or create your own.

Official System Tools:

| Plugin | Tools |
| --- | --- |
| osaurus.filesystem | `read_file`, `write_file`, `list_directory`, `search_files`, and more |
| osaurus.browser | `browser_navigate`, `browser_click`, `browser_type`, `browser_screenshot` |
| osaurus.git | `git_status`, `git_log`, `git_diff`, `git_branch` |
| osaurus.search | `search`, `search_news`, `search_images` (DuckDuckGo) |
| osaurus.fetch | `fetch`, `fetch_json`, `fetch_html`, `download` |
| osaurus.time | `current_time`, `format_date` |

# Install from registry
osaurus tools install osaurus.browser

# List installed tools
osaurus tools list

# Create your own plugin
osaurus tools create MyPlugin --language swift

See the Plugin Authoring Guide for details.

Personas

Create custom AI assistant personalities with unique behaviors, capabilities, and styles.

Each persona can have:

  • Custom System Prompt — Define unique instructions and personality
  • Tool Configuration — Enable or disable specific tools per persona
  • Visual Theme — Assign a custom theme that activates with the persona
  • Model & Generation Settings — Set default model, temperature, and max tokens
  • Import/Export — Share personas as JSON files
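Since personas export as JSON, a shared persona file might look like the sketch below. The field names are hypothetical, chosen to mirror the settings listed above, not the actual export schema.

```json
{
  "name": "Code Assistant",
  "systemPrompt": "You are a focused programming assistant.",
  "tools": ["osaurus.filesystem", "osaurus.git"],
  "model": "llama-3.2-3b-instruct-4bit",
  "temperature": 0.2,
  "maxTokens": 2048
}
```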

Use cases:

  • Code Assistant — Focused on programming with code-related tools enabled
  • Daily Planner — Calendar and reminders integration
  • Research Helper — Web search and note-taking tools enabled
  • Creative Writer — Higher temperature, no tool access for pure generation

Access via Management window (⌘ Shift M) → Personas.

Multi-Window Chat

Work with multiple independent chat windows, each with its own persona and session.

Features:

  • Independent Windows — Each window maintains its own persona, theme, and session
  • File → New Window — Open additional chat windows (⌘ N)
  • Persona per Window — Different personas in different windows simultaneously
  • Open in New Window — Right-click any session in history to open in a new window
  • Pin to Top — Keep specific windows floating above others
  • Cascading Windows — New windows are offset so they're always visible

Use Cases:

  • Run multiple AI personas side-by-side (e.g., "Code Assistant" and "Creative Writer")
  • Compare responses from different personas
  • Keep reference conversations open while starting new ones
  • Organize work by project with dedicated windows

Developer Tools

Built-in tools for debugging and development:

Insights — Monitor all API requests in real-time:

  • Request/response logging with full payloads
  • Filter by method (GET/POST) and source (Chat UI/HTTP API)
  • Performance stats: success rate, average latency, errors
  • Inference metrics: tokens, speed (tok/s), model used

Server Explorer — Interactive API reference:

  • Live server status and health
  • Browse all available endpoints
  • Test endpoints directly with editable payloads
  • View formatted responses

Access via Management window (⌘ Shift M) → Insights or Server.

See Developer Tools Guide for details.

Voice Input

Speech-to-text powered by WhisperKit — fully local, private, on-device transcription.

Features:

  • Real-time transcription — See your words as you speak
  • Multiple Whisper models — From Tiny (75 MB) to Large V3 (3 GB)
  • Microphone or system audio — Transcribe your voice or computer audio
  • Configurable sensitivity — Adjust for quiet or noisy environments
  • Auto-send with confirmation — Hands-free message sending

VAD Mode (Voice Activity Detection):

Activate personas hands-free by saying their name or a custom wake phrase.

  • Say a persona's name (e.g., "Hey Code Assistant") to open chat
  • Automatic voice input starts after activation
  • Status indicators: Blue pulsing dot on menu bar icon when listening, toggle button in popover
  • Configurable silence timeout and auto-close

Transcription Mode:

Dictate text directly into any application using a global hotkey.

  • Global Hotkey — Trigger transcription from anywhere on your Mac
  • Live Typing — Text is typed into the currently focused text field in real-time
  • Accessibility Integration — Uses macOS accessibility APIs to simulate keyboard input
  • Minimal Overlay — Sleek floating UI shows recording status
  • Press Esc or Done — Stop transcription when finished

Perfect for dictating emails, documents, code comments, or any text input without switching apps.

Setup:

  1. Open Management window (⌘ Shift M) → Voice
  2. Grant microphone permission
  3. Download a Whisper model
  4. For Transcription Mode: Grant accessibility permission and configure the hotkey in the Transcription tab
  5. Test your voice input

See Voice Input Guide for details.


CLI Reference

| Command | Description |
| --- | --- |
| `osaurus serve` | Start the server (default port 1337) |
| `osaurus serve --expose` | Start exposed on LAN |
| `osaurus stop` | Stop the server |
| `osaurus status` | Check server status |
| `osaurus ui` | Open the menu bar UI |
| `osaurus list` | List downloaded models |
| `osaurus run <model>` | Interactive chat with a model |
| `osaurus mcp` | Start MCP stdio transport |
| `osaurus tools <cmd>` | Manage plugins (install, list, search, etc.) |

Tip: Set OSU_PORT to override the default port.


API Endpoints

Base URL: http://127.0.0.1:1337 (or your configured port)

| Endpoint | Description |
| --- | --- |
| `GET /health` | Server health |
| `GET /v1/models` | List models (OpenAI format) |
| `GET /v1/tags` | List models (Ollama format) |
| `POST /v1/chat/completions` | Chat completions (OpenAI format) |
| `POST /messages` | Chat completions (Anthropic format) |
| `POST /chat` | Chat (Ollama format, NDJSON) |

All endpoints support /v1, /api, and /v1/api prefixes.

See the OpenAI API Guide for tool calling, streaming, and SDK examples.
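With `"stream": true`, OpenAI-compatible endpoints emit Server-Sent Events: `data:` lines carrying JSON chunks, terminated by `data: [DONE]`. A minimal sketch of reassembling the streamed text, assuming the standard OpenAI chunk format (the sample lines are illustrative, not a captured osaurus response):

```python
import json

def collect_stream(lines):
    """Reassemble assistant text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # role-only chunks have no content
    return "".join(text)

# Illustrative chunks in the OpenAI streaming format
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello!
```

In practice an OpenAI SDK client handles this parsing for you; the sketch just shows what is on the wire.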


Use with OpenAI SDKs

Point any OpenAI-compatible client at Osaurus:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
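Tool calling uses the OpenAI function-calling schema: you pass a `tools` array alongside `messages`. A sketch of such a definition, where `get_weather` and its `city` parameter are hypothetical, not a tool that ships with Osaurus:

```python
# Hypothetical tool definition in the OpenAI function-calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Passed alongside messages, e.g.:
# client.chat.completions.create(model=..., messages=..., tools=tools)
print(tools[0]["function"]["name"])  # -> get_weather
```

When the model decides to call a tool, the response carries `tool_calls` instead of plain content; see the OpenAI API Guide above for the full round trip.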

Requirements

  • macOS 15.5+ (Apple Foundation Models require macOS 26)
  • Apple Silicon (M1 or newer)
  • Xcode 16.4+ (to build from source)

Models are stored at ~/MLXModels by default. Override with OSU_MODELS_DIR.

Whisper models are stored at ~/.osaurus/whisper-models.


Build from Source

git clone https://github.com/dinoki-ai/osaurus.git
cd osaurus
open osaurus.xcworkspace
# Build and run the "osaurus" target

Contributing

We're looking for contributors! Osaurus is actively developed and we welcome help in many areas:

  • Bug fixes and performance improvements
  • New plugins and tool integrations
  • Documentation and tutorials
  • UI/UX enhancements
  • Testing and issue triage

Get Started

  1. Check out Good First Issues
  2. Read the Contributing Guide
  3. Join our Discord to connect with the team

See docs/FEATURES.md for a complete feature inventory and architecture overview.


Community

If you find Osaurus useful, please star the repo and share it!
