
RooCode-RAG-Lookup

A Retrieval-Augmented Generation (RAG) server that enables semantic search across PDF documents and code repositories using ChromaDB and sentence transformers. It provides automatic document indexing, text chunking, and relevance-scored lookups for AI models.

Tools: 2
Updated: Dec 21, 2025

RooCode MCP Server for performing RAG (Retrieval-Augmented Generation) lookups in documents and code repositories using vector embeddings and semantic search.

Example Usage

Ask a question, e.g. "What is the maximum number of entries* in a word document?", and add "use rag" to the prompt. The LLM is usually a decent judge of when a tool is needed, so it may also decide to call the tool on its own.

*This relates to the maximum number of XML properties and elements addressable in Word.

Features

  • Full RAG Implementation: Complete vector-based semantic search using ChromaDB and Haystack
  • Document Indexing: Automatic text extraction and chunking from PDF documents
  • Vector Embeddings: Sentence transformer embeddings for semantic similarity
  • RAG Lookup Tool: Search through documents and code repositories with relevance scoring
  • Test Tool: Simple hello world tool to verify MCP server connectivity
  • Async MCP Protocol: Full JSON-RPC 2.0 support via stdio

Installation

  1. Install Python dependencies:
pip install -r requirements.txt
  2. Configure RooCode to use this MCP server by adding the configuration from mcp_config.json to your RooCode settings.

Configuration

  1. Add the contents of mcp_config.json to your RooCode MCP server settings (in MCP tools, choose to edit the global settings). When the server is ready to use, it shows a green status.

  2. Set the following environment variables:

    • RAG_LOOKUP_PATH: Path to this project directory
    • PYTHON_PATH: Path to your Python executable
  3. Configure parameters in parameters.py:

    • EMBEDDING_MODEL: Sentence transformer model (default: all-mpnet-base-v2)
    • COLLECTION_NAME: ChromaDB collection name
    • CHUNK_SIZE: Text chunk size in words (default: 500)
    • CHUNK_OVERLAP: Overlap between chunks (default: 50)
    • DEFAULT_TOP_K: Number of results to return (default: 5)
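
To make the interaction between CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a minimal plain-Python sketch of word-based chunking with overlap. It is a stand-in for illustration only: the project itself splits documents with Haystack's DocumentSplitter, the function name chunk_words is hypothetical, and the overlap is assumed to be measured in words like the chunk size.

```python
# Defaults quoted from the README; the real parameters.py may differ.
CHUNK_SIZE = 500     # words per chunk
CHUNK_OVERLAP = 50   # words shared by consecutive chunks (unit assumed to be words)


def chunk_words(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A 1200-word sample: windows start at words 0, 450 and 900.
sample = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

With the defaults, each new chunk starts 450 words after the previous one, so a larger overlap produces more chunks (and more embeddings to store) for the same document.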

Available Tools

1. rag_lookup

Perform semantic search using RAG in documents and code repositories. Returns relevant chunks with similarity scores and metadata.

Parameters:

  • query (required): The search query
  • source (optional): Where to search - "documents", "repos", or "both" (default: "both")

Returns:

  • Relevant text chunks with similarity scores
  • Source file information and metadata
  • Statistics on documents searched

Example:

{
  "query": "authentication implementation",
  "source": "both"
}

Response Format:

{
  "status": "success",
  "query": "authentication implementation",
  "results": [
    {
      "content": "...",
      "score": 0.85,
      "metadata": {
        "file_name": "document.txt",
        "source_file": "/path/to/document.txt"
      }
    }
  ],
  "metadata": {
    "documents_searched": 5,
    "repos_searched": 3,
    "total_matches": 5
  }
}
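
A client consuming this response might rank and filter the hits before showing them. The sketch below parses a payload shaped like the sample above using only the standard library; the summarize helper and its min_score parameter are hypothetical and not part of the server.

```python
import json

# Sample payload copied from the response format shown above.
response_text = """
{
  "status": "success",
  "query": "authentication implementation",
  "results": [
    {
      "content": "...",
      "score": 0.85,
      "metadata": {
        "file_name": "document.txt",
        "source_file": "/path/to/document.txt"
      }
    }
  ],
  "metadata": {"documents_searched": 5, "repos_searched": 3, "total_matches": 5}
}
"""


def summarize(response: dict, min_score: float = 0.0) -> list[str]:
    """Return one 'score  file_name' line per hit, best score first."""
    if response.get("status") != "success":
        raise RuntimeError(f"lookup failed: {response.get('status')!r}")
    hits = sorted(response.get("results", []), key=lambda h: h["score"], reverse=True)
    lines = []
    for hit in hits:
        if hit["score"] < min_score:
            continue  # drop hits below the (hypothetical) relevance threshold
        name = hit.get("metadata", {}).get("file_name", "<unknown>")
        lines.append(f"{hit['score']:.2f}  {name}")
    return lines


response = json.loads(response_text)
for line in summarize(response):
    print(line)
```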

2. say_hello

Simple test tool that returns a greeting message with a timestamp.

Parameters:

  • name (optional): Name to include in greeting (default: "World")

Example:

{
  "name": "RooCode"
}

Usage

1. Extract and Index Documents

Place PDF documents in the Documents/ or Repos/ folders, then run:

# Extract text from PDFs
python extraction/parse_pdf.py

# Populate the vector database
python extraction/populate_database.py

2. Query the RAG System

# Test RAG lookup directly
python query_rag.py

Or ask a question in RooCode and add "use rag" to the prompt, as described under Example Usage.

3. Use via MCP Server

Once configured in RooCode, use the rag_lookup tool through the MCP interface. To configure it, open the MCP menu in RooCode settings and choose to edit the global settings. This opens a JSON file containing {"mcpServers":{}}; copy the contents of mcp_config.json into it.
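
For orientation, the sketch below builds an entry of the general shape that goes under "mcpServers", filling it from the RAG_LOOKUP_PATH and PYTHON_PATH environment variables. The entry name "rag-lookup", the exact keys, and the fallback values are assumptions made for illustration; the mcp_config.json shipped with the project is authoritative and may differ.

```python
import json
import os

# Assumptions: the server name, the command/args/env layout, and the fallback
# values are illustrative only. Prefer the project's own mcp_config.json.
project_dir = os.environ.get("RAG_LOOKUP_PATH", "/path/to/RooCode-RAG-Lookup")
python_exe = os.environ.get("PYTHON_PATH", "python")

config = {
    "mcpServers": {
        "rag-lookup": {
            "command": python_exe,                               # interpreter that runs the server
            "args": [os.path.join(project_dir, "mcp_tool.py")],  # main MCP server script
            "env": {"RAG_LOOKUP_PATH": project_dir},
        }
    }
}

# Print the JSON so it can be compared with, or pasted into, the global MCP settings.
print(json.dumps(config, indent=2))
```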

Testing

Test the MCP server locally:

# Using MCP inspector
npx @modelcontextprotocol/inspector python mcp_tool.py

# Direct stdio test
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python mcp_tool.py
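
MCP clients normally send an initialize request and an initialized notification before any other call, so depending on the SDK version the one-line tools/list test may be answered with an error. The sketch below generates a fuller newline-delimited message sequence for a manual stdio test. The rpc helper, the client name, and the protocol version string are assumptions chosen for illustration.

```python
import json


def rpc(method, params=None, msg_id=None):
    """Build one JSON-RPC 2.0 message; omitting msg_id makes it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


messages = [
    # 1. Handshake request (client name and version are placeholders).
    rpc("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "manual-test", "version": "0.0.1"},
    }, msg_id=1),
    # 2. Notification that the client has finished initializing (no id).
    rpc("notifications/initialized"),
    # 3. List the tools the server exposes.
    rpc("tools/list", msg_id=2),
    # 4. Call the connectivity test tool described above.
    rpc("tools/call", {"name": "say_hello", "arguments": {"name": "RooCode"}}, msg_id=3),
]

# The stdio transport expects one JSON message per line.
payload = "\n".join(messages) + "\n"
print(payload, end="")
```

Saved under a name of your choice (for example make_messages.py, a hypothetical file), the output can be piped into the server in place of the echo command. Because stdin closes as soon as the last line is sent, the server may exit before every response is printed.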

Project Structure

RooCode-RAG-Lookup/
├── mcp_tool.py                    # Main MCP server implementation
├── query_rag.py                   # RAG query functions
├── parameters.py                  # Configuration parameters
├── run_rag_lookup.bat             # Windows batch launcher
├── mcp_config.json                # Example RooCode configuration
├── requirements.txt               # Python dependencies
├── extraction/
│   ├── parse_pdf.py              # PDF text extraction
│   └── populate_database.py      # Database population and indexing
├── ExtractedText/                 # Extracted text files (.txt + .meta.json)
├── chroma_db/                     # ChromaDB vector database
└── README.md                      # This file

Technology Stack

  • MCP Python SDK: Protocol implementation for RooCode integration
  • Haystack: Document processing and RAG pipeline framework
  • ChromaDB: Vector database for embeddings storage
  • Sentence Transformers: Semantic embeddings (all-mpnet-base-v2)
  • PDFPlumber: PDF text extraction with layout preservation
  • Async/Await: Concurrent request handling
  • JSON-RPC 2.0: Communication protocol
  • Stdio Transport: RooCode integration

How It Works

  1. Document Extraction: PDFs are parsed using parse_pdf.py which extracts text and metadata
  2. Text Chunking: Documents are split into overlapping chunks using DocumentSplitter
  3. Embedding Generation: Text chunks are converted to 768-dimensional vectors using sentence transformers
  4. Vector Storage: Embeddings are stored in ChromaDB with metadata for retrieval
  5. Semantic Search: Queries are embedded and matched against stored vectors using cosine similarity
  6. Result Ranking: Top-K most relevant chunks are returned with scores and metadata
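
Steps 5 and 6 can be illustrated with a small plain-Python sketch of cosine-similarity ranking. It uses toy 3-dimensional vectors instead of real 768-dimensional embeddings, and the in-memory list stands in for ChromaDB; the function names and sample data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=5):
    """Score every stored chunk against the query and return the k best as (score, text)."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy "vector store": (chunk text, embedding) pairs with made-up values.
store = [
    ("login flow", [0.9, 0.1, 0.0]),
    ("pdf parsing", [0.0, 0.2, 0.9]),
    ("token refresh", [0.7, 0.6, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of an authentication-related query

results = top_k(query, store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A real vector database avoids scoring every stored vector on each query by using an approximate nearest-neighbour index, but the ranking principle is the same.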

Requirements

See requirements.txt for full dependencies. Key packages:

  • mcp>=1.0.0 - MCP protocol support
  • haystack-ai - RAG framework
  • chroma-haystack - ChromaDB integration
  • sentence-transformers - Embedding models
  • pdfplumber - PDF extraction

License

MIT
