
AnyLoom-AnythingLLM-Local-AI-agentic-DyTopo-swarm

ChatGPT-like AI that runs 100% locally on your hardware. No subscriptions, no cloud, complete privacy. Multi-agent swarm + 10 MCP tools + hybrid RAG vector DB. Runs on one GPU (RTX 5090 recommended)

Stars: 5 · Forks: 1 · Updated: Feb 18, 2026 · Validated: Feb 20, 2026

AnyLoom: AnythingLLM Local AI Agentic Stack

A fully local, multi-agent AI system that gives you ChatGPT-level intelligence with complete privacy and control over your data.

Now with Docker! One command starts the entire stack. Zero manual setup.


💡 What Can You Do With This?

Run a production-grade AI assistant stack entirely on your hardware:

  • 🔒 100% private — No data leaves your machine. No API keys. No subscriptions.
  • 🧠 Advanced reasoning — Qwen3-30B MoE (30.5B params, 3.3B active) with hybrid thinking mode
  • 📚 Hybrid RAG search — Finds YOUR information better than pure vector search (dense + sparse retrieval)
  • 🤖 Multi-agent swarm — DyTopo coordination routes complex tasks to specialized agents that collaborate, with optional RAG context pre-fetch for domain grounding
  • 🛠️ 10 MCP servers — Memory knowledge graph, web search, browser automation, file operations, code execution, RAG search, multi-agent swarm
  • 🐋 Docker-first architecture — One command to start/stop everything. Auto-restart. Zero networking hassles.
  • 💬 AnythingLLM UI — Clean interface for chat, document Q&A, and workspace management

Ideal for:

  • Engineers who need AI assistance with proprietary codebases
  • Researchers handling sensitive documents (legal, medical, financial)
  • Privacy-conscious users who want ChatGPT-level capability without cloud dependency
  • Developers building custom AI workflows with persistent memory and multi-agent collaboration

Why AnyLoom vs Cloud AI or Single-LLM Setups?

| | AnyLoom | Cloud AI (ChatGPT, Claude) | Single Local LLM |
|---|---|---|---|
| Privacy | ✅ 100% local, zero telemetry | ❌ Your data trains their models | ✅ Local |
| Cost | ✅ One-time hardware investment | ❌ $20-200/month subscription | ✅ Free after setup |
| Retrieval Quality | ✅ Hybrid dense+sparse RAG | ⚠️ Dense-only embeddings | ⚠️ Basic or no RAG |
| Multi-Agent Swarm | ✅ DyTopo routing, 3-5 agents | ❌ Single model per request | ❌ Single model |
| Persistent Memory | ✅ MCP knowledge graph across sessions | ⚠️ Limited to conversation | ❌ No cross-session memory |
| Tool Ecosystem | ✅ 10 MCP servers (RAG, swarm, web, code, files, browser) | ⚠️ Limited, cloud-gated | ❌ Manual integration |
| Context Window | ✅ 131K tokens (configurable) | ⚠️ 128K (expensive tiers) | ⚠️ Varies by model |
| Offline Use | ✅ Fully functional | ❌ Requires internet | ✅ Fully functional |

The bottom line: If you need ChatGPT-level capability for sensitive work, AnyLoom delivers comparable intelligence without the privacy trade-offs or subscription costs.


🌐 How It Works

AnyLoom runs as a Docker Compose stack with these services:

  • Qdrant (port 6333) — Vector database for hybrid dense+sparse RAG
  • llama.cpp LLM (port 8008) — GPU-accelerated inference with 131K context (Qwen3-30B-A3B)
  • llama.cpp Embedding (port 8009) — BGE-M3 embedding server for AnythingLLM (1024-dim dense vectors)
  • AnythingLLM (port 3001) — Web UI for chat and document management
  • DyTopo swarm (Python, runs natively) — Multi-agent orchestration for complex tasks
  • 10 MCP servers — RAG search, DyTopo swarm, memory graph, web search, browser automation, file ops, and more

Everything starts with one command. Docker handles networking, GPU access, auto-restart, and data persistence.

AnyLoom Architecture Diagram


| Component | Tokens |
|---|---|
| Total Token Budget | 131K |
| System prompt | ~2K |
| MCP tool definitions (9 Docker + 1 qdrant-rag) | ~3K |
| RAG snippets (16 × ~500 tokens) | ~8K |
| Chat history (30 messages) | ~12K |
| Overhead subtotal | ~25K |
| Remaining for chat | ~106K |

The entire RAG-prompt set fits comfortably inside the token limit. Context length is configurable (default 131K). Q4_K_M model weights are ~18.6 GiB, leaving ample room for KV cache on 32GB GPUs. See docs/llm-engine.md for VRAM budget details.
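The budget arithmetic above is easy to sanity-check. A quick illustrative calculation using the figures from the table (not part of the stack itself):

```python
# Token budget sanity check, values in thousands of tokens (from the table above).
TOTAL_BUDGET = 131

overhead = {
    "system_prompt": 2,
    "mcp_tool_definitions": 3,
    "rag_snippets": 16 * 500 / 1000,  # 16 snippets x ~500 tokens = ~8K
    "chat_history": 12,
}

overhead_total = sum(overhead.values())   # ~25K
remaining = TOTAL_BUDGET - overhead_total  # ~106K

print(f"Overhead: ~{overhead_total:.0f}K, remaining for chat: ~{remaining:.0f}K")
```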

✅ Runs on a single GPU (requires 32GB+ VRAM; optimized for RTX 5090)


🛠️ Prerequisites

All you need:

| Component | Requirement |
|---|---|
| Docker Desktop | v24.0+ with WSL2 integration and GPU support enabled |
| NVIDIA GPU | RTX 4090/5090 or similar (32GB VRAM recommended for full 131K context; 24GB GPUs can run with reduced context) |
| NVIDIA Driver | 535+ (for CUDA 12 support) |
| Python | 3.10+ (for benchmarks and DyTopo scripts) |
| Disk Space | ~100GB for models and data |

Docker handles everything: Qdrant, llama.cpp (LLM + Embedding), and AnythingLLM run as containers. No manual WSL setup or service management!
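If you want to verify the version requirements programmatically, a small helper like this can parse the tool output (a sketch; the version strings in the comments show the typical `docker --version` and `nvidia-smi` formats):

```python
import re
import subprocess

def parse_major(version_line: str) -> int:
    """Extract the leading major version number from a tool's version output."""
    match = re.search(r"(\d+)\.\d+", version_line)
    if not match:
        raise ValueError(f"no version found in: {version_line!r}")
    return int(match.group(1))

def check_prereqs() -> None:
    """Assert Docker v24+ and NVIDIA driver 535+ are installed."""
    # Typical output: "Docker version 24.0.7, build afdd53b"
    docker_out = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True
    ).stdout
    assert parse_major(docker_out) >= 24, "Docker Desktop v24.0+ required"

    # Typical output: "535.154.05"
    smi_out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout
    assert parse_major(smi_out) >= 535, "NVIDIA driver 535+ required"

# check_prereqs()  # run on the target machine
```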


🚀 Quickstart

1. Clone and Download Model

git clone <repo-url>
cd AnyLoom

# Download models
mkdir -p models
pip install huggingface_hub

# LLM model — Qwen3-30B-A3B Q4_K_M (~18.6 GB, GPU)
huggingface-cli download Qwen/Qwen3-30B-A3B-Instruct-2507-GGUF \
  Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf \
  --local-dir models

# Embedding model — BGE-M3 Q8_0 (~605 MB, GPU)
huggingface-cli download ggml-org/bge-m3-Q8_0-GGUF \
  bge-m3-q8_0.gguf \
  --local-dir models

Already have the LLM GGUF? Symlink instead of re-downloading: ln -s ~/.lmstudio/models/lmstudio-community/Qwen3-30B-A3B-Instruct-2507-GGUF/Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf models/

2. Start the Docker Stack

# One command starts everything (creates volumes, checks model, waits for health)
bash scripts/docker_start.sh

# Or manually (must create volumes first)
docker volume create anyloom_qdrant_storage
docker volume create anyloom_anythingllm_storage
docker volume create anyloom_anythingllm_hotdir
docker compose up -d

Startup takes ~2 minutes while llama.cpp loads the model into GPU VRAM. First query may take an additional 1-2 minutes as the prompt cache warms up.
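The `docker_start.sh` helper waits for health automatically; if you start the stack manually, a polling loop like the following can confirm each service is responding. The endpoint paths are assumptions (llama.cpp exposes `/health`; the Qdrant and AnythingLLM paths may differ across versions):

```python
import time
import urllib.request

# Assumed health endpoints for each service; adjust if your versions differ.
HEALTH_ENDPOINTS = {
    "llm": "http://localhost:8008/health",
    "embedding": "http://localhost:8009/health",
    "qdrant": "http://localhost:6333/healthz",
    "anythingllm": "http://localhost:3001/api/ping",
}

def wait_healthy(timeout_s: int = 300, interval_s: int = 5) -> None:
    """Poll every service until it returns HTTP 200 or the timeout expires."""
    pending = dict(HEALTH_ENDPOINTS)
    deadline = time.monotonic() + timeout_s
    while pending and time.monotonic() < deadline:
        for name, url in list(pending.items()):
            try:
                with urllib.request.urlopen(url, timeout=3) as resp:
                    if resp.status == 200:
                        print(f"{name}: healthy")
                        del pending[name]
            except OSError:
                pass  # not up yet; keep polling
        if pending:
            time.sleep(interval_s)
    if pending:
        raise TimeoutError(f"services not healthy: {sorted(pending)}")

# wait_healthy()  # call after `docker compose up -d`
```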

3. Configure AnythingLLM

  1. Open http://localhost:3001 and complete the initial setup wizard (password, preferences). The API is locked until this is done.
  2. Then run the automated configuration:
python scripts/configure_anythingllm.py

This configures AnythingLLM system defaults (LLM provider, max tokens, BGE-M3 embedding, vector DB, chunk size/overlap, default system prompt), creates an AnyLoom workspace, uploads and embeds the RAG reference documents from rag-docs/anythingllm/ into the workspace's vector store, pushes tuned workspace settings, and runs a smoke test. Re-running the script is safe — it skips documents that are already uploaded and embedded.
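As an illustration of the idempotency check described above, here is a minimal sketch. It assumes AnythingLLM's v1 developer API with a `GET /api/v1/workspaces` listing and Bearer-token auth; the actual logic lives in `scripts/configure_anythingllm.py`:

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:3001/api/v1"
API_KEY = os.environ.get("ANYTHINGLLM_API_KEY", "")

def list_workspace_names() -> list[str]:
    """Fetch existing workspace names (assumes the v1 developer API)."""
    req = urllib.request.Request(
        f"{BASE_URL}/workspaces",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    return [ws["name"] for ws in data.get("workspaces", [])]

def needs_setup(existing: list[str], target: str = "AnyLoom") -> bool:
    """Skip workspace creation if it already exists (case-insensitive)."""
    return target.lower() not in {name.lower() for name in existing}
```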

4. Access Services

  • AnythingLLM UI — http://localhost:3001
  • llama.cpp LLM API — http://localhost:8008
  • llama.cpp Embedding API — http://localhost:8009
  • Qdrant — http://localhost:6333

5. Run Benchmarks (Optional)

# Install Python dependencies first
pip install -r requirements-dytopo.txt

# Test the full stack (all 6 phases)
ANYTHINGLLM_API_KEY=your-key python scripts/benchmarks/bench_run_all.py

# Or test just llama.cpp directly (no AnythingLLM needed)
ANYTHINGLLM_API_KEY=your-key python scripts/benchmarks/bench_phase5_llm.py

Phase 5 validates llama.cpp directly — fabrication guards, tool boundary awareness, and depth calibration. Current score: 15/20 (75%) with perfect marks on fabrication guards, adversarial resistance, cross-workspace parity, depth stability, and LLM direct validation. See benchmark results for full scores.


🔧 Management Commands

# View logs
bash scripts/docker_logs.sh llm          # llama.cpp only
bash scripts/docker_logs.sh anythingllm  # AnythingLLM only
docker compose logs -f                   # All services

# Stop services
bash scripts/docker_stop.sh
# Or: docker compose down

# Restart a specific service
docker compose restart llm

# Check status
docker compose ps

# Remove everything including data (⚠️ DESTRUCTIVE)
docker compose down -v

📚 Documentation

Start here: INSTALL.md — Docker-based installation guide (repo root)

Reference documentation in docs/:

| Document | Contents |
|---|---|
| architecture.md | System topology, VRAM budget, port assignments |
| llm-engine.md | llama.cpp Docker container config, GPU settings, troubleshooting |
| qwen3-model.md | Qwen3-30B-A3B MoE architecture, quantization, sampling |
| bge-m3-embedding.md | BGE-M3 embedding architecture (ONNX INT8 CPU for MCP RAG + llama.cpp GGUF for AnythingLLM, 1024-dim dense vectors) |
| qdrant-topology.md | Qdrant Docker container, collection schema, sync |
| qdrant-servers.md | MCP server inventory, tool definitions, token budget |
| dytopo-swarm.md | DyTopo multi-agent routing, package architecture, domains, lifecycle |
| anythingllm-settings.md | AnythingLLM Docker container, provider config, workspace setup |
| benchmark-results-showcase.md | Benchmark results across all rounds |

DyTopo Package (src/dytopo/)

| Module | Purpose |
|---|---|
| models.py | Pydantic v2 data models (AgentState, SwarmTask with RAG context field, SwarmMetrics, etc.) |
| config.py | YAML configuration loader with defaults (dytopo_config.yaml) |
| agents.py | System prompts, JSON schemas, domain rosters |
| router.py | MiniLM-L6-v2 embedding, cosine similarity, threshold, degree cap |
| graph.py | NetworkX DAG construction, cycle breaking, topological sort |
| orchestrator.py | Main swarm loop with singleton inference client, Aegean termination, memory persistence |
| governance.py | Convergence detection, stalling detection, re-delegation, Aegean consensus voting |
| audit.py | JSONL audit logging to ~/dytopo-logs/{task_id}/ |
| health/checker.py | Pre-run health probes for LLM, Qdrant, AnythingLLM, GPU |
| memory/writer.py | Post-run swarm result persistence to structured storage |
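To illustrate the routing idea behind `router.py` (embed, score by cosine similarity, then apply a threshold and a degree cap), here is a self-contained sketch with toy 3-dimensional vectors; the real module uses MiniLM-L6-v2 embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(task_vec, agent_vecs, threshold=0.5, degree_cap=3):
    """Select up to `degree_cap` agents whose similarity clears `threshold`."""
    scored = [(name, cosine(task_vec, vec)) for name, vec in agent_vecs.items()]
    eligible = [(n, s) for n, s in scored if s >= threshold]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible[:degree_cap]]

# Toy "embeddings" for illustration only.
agents = {
    "coder":    [0.9, 0.1, 0.0],
    "writer":   [0.1, 0.9, 0.0],
    "reviewer": [0.7, 0.3, 0.1],
}
print(route([1.0, 0.0, 0.0], agents, threshold=0.5, degree_cap=2))
# → ['coder', 'reviewer']  ("writer" scores ~0.11, below the threshold)
```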

🔄 Data & Persistence

  • Docker Volumes (persist across restarts):

    • anyloom_qdrant_storage — Vector database
    • anyloom_anythingllm_storage — AnythingLLM workspaces
    • anyloom_anythingllm_hotdir — AnythingLLM document collector
  • Host Bind Mount:

    • ./models/ — GGUF model files (~19.2 GB total). LLM model (~18.6 GB) + embedding model (~605 MB). Place both files here before starting.
  • Filesystem Access: All configuration files and Python scripts are local

  • Model Updates: Replace the GGUF file in ./models/ and restart: docker compose restart llm

  • RAG Re-indexing: Re-run python scripts/configure_anythingllm.py (idempotent) or re-embed documents via AnythingLLM UI

# View volumes
docker volume ls | grep anyloom

# Backup a volume
docker run --rm -v anyloom_qdrant_storage:/data -v $(pwd):/backup ubuntu tar czf /backup/qdrant_backup.tar.gz /data

# Restore a volume backup
docker run --rm -v anyloom_qdrant_storage:/data -v $(pwd):/backup ubuntu tar xzf /backup/qdrant_backup.tar.gz -C /

# Remove all data (⚠️ DESTRUCTIVE)
docker compose down -v

You're now running a next-gen, fully local AI agentic stack. Start creating, querying, and orchestrating with AnyLoom today.
