
MCP-Enhanced-Data-Retrieval-System

An enterprise-grade MCP server that integrates GitHub repositories and internal documentation using vector-based semantic search and OAuth 2.1 authentication. It features optimized 1500-token context chunking and parallel retrieval for low-latency AI assistance.

Updated: Oct 21, 2025

MCP Enhanced Data Retrieval System

An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.

Project Overview

This system implements the Model Context Protocol to provide:

  • Standardized AI context sharing across organizational knowledge sources
  • GitHub repository integration with OAuth 2.1 authentication
  • Vector-based semantic search using embeddings
  • Optimized 1500-token context chunking, targeting a sub-500ms time to first token (TTFT)
  • Parallel retrieval strategy with 2-second timeout
  • Streamable HTTP transport using FastAPI
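
The 1500-token chunking listed above can be illustrated with a minimal sketch. It is not the project's implementation: the function names are hypothetical, and a whitespace tokenizer stands in for a real one so the example runs without extra dependencies.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence, max_tokens: int = 1500, overlap: int = 0) -> List[list]:
    """Split a token sequence into windows of at most max_tokens tokens.

    overlap is the number of tokens shared between consecutive windows
    (an illustrative option, not something the README specifies).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    chunks: List[list] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        # Stop once this window has reached the end of the input.
        if start + max_tokens >= len(tokens):
            break
    return chunks


def chunk_text(
    text: str,
    max_tokens: int = 1500,
    encode: Callable[[str], Sequence] = str.split,      # assumption: whitespace "tokens"
    decode: Callable[[Sequence], str] = " ".join,
) -> List[str]:
    """Tokenize text, window the tokens, and turn each window back into text."""
    return [decode(chunk) for chunk in chunk_tokens(encode(text), max_tokens)]
```

Since the project lists tiktoken for token processing, the `encode` and `decode` arguments could presumably be replaced with the corresponding methods of a tiktoken encoding (for example, one returned by `tiktoken.get_encoding("cl100k_base")`), so that the 1500-token limit is measured in model tokens rather than words.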

Architecture

AI Applications
    ↓
Authentication (OAuth 2.1 + RBAC)
    ↓
MCP Client
    ↓
MCP Protocol (JSON-RPC + HTTP)
    ↓
MCP Server
    • Multi-threaded parallel retrieval
    • 1500-token chunking
    ↓
Knowledge Tiers (Public, Internal, Restricted)
    ↓
Data Sources: GitHub | Docs
Vector Storage: Embeddings
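
To make the "MCP Protocol (JSON-RPC + HTTP)" layer in the diagram more concrete, the sketch below builds a JSON-RPC 2.0 request for MCP's `tools/call` method and unpacks a response. The helper names and the tool name `search_knowledge` are hypothetical; the README does not list the tools this server exposes.

```python
import json
from itertools import count
from typing import Any, Dict

# Monotonic request ids so responses can be matched to requests.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope for an MCP tools/call invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def parse_response(raw: str) -> Any:
    """Return the result of a JSON-RPC 2.0 response, raising if it carries an error."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return message["result"]


# "search_knowledge" is a made-up tool name used only for illustration.
request = build_tool_call("search_knowledge", {"query": "OAuth token refresh"})
payload = json.dumps(request)
```

A client would send `payload` as the body of an HTTP POST to the server's MCP endpoint; the endpoint path is not stated in this README, so it is left out here.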

Features

  • MCP Protocol Compliance: JSON-RPC 2.0 over Streamable HTTP
  • GitHub Integration: Repository data retrieval and contextualization
  • Vector Embeddings: Semantic search using ChromaDB and Sentence Transformers
  • Context Optimization: 1500-token chunking with parallel retrieval
  • OAuth 2.1 Security: Secure authentication for GitHub access
  • Performance: Sub-500ms time to first token (TTFT) target, with a 2-second retrieval timeout
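
One way parallel retrieval with a 2-second timeout could work is sketched below using a thread pool. This is an assumption-laden illustration rather than the project's code: `retrieve_parallel` and the shape of the `sources` mapping are hypothetical, and each source is modeled as a plain function from query to a list of text snippets.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]


def retrieve_parallel(
    sources: Dict[str, Retriever],
    query: str,
    timeout: float = 2.0,
) -> Dict[str, List[str]]:
    """Query every source concurrently and keep whatever finishes in time.

    Sources that raise or that exceed the timeout are omitted from the
    result, so one slow backend cannot stall the whole response.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}

    # Block for at most `timeout` seconds in total, not per source.
    done, _not_done = wait(futures, timeout=timeout)

    results: Dict[str, List[str]] = {}
    for future in done:
        if future.exception() is None:
            results[futures[future]] = future.result()

    # Do not wait for stragglers; queued work is cancelled (Python 3.9+),
    # while calls already running finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return results
```

The design choice here is to return partial results instead of failing the request when a source is slow, which fits a latency budget better than waiting for every backend. Whether the actual server drops, retries, or reports timed-out sources is not specified in this README.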

Project Structure

.
├── src/
│   ├── server/          # MCP server core and FastAPI app
│   ├── auth/            # OAuth 2.1 authentication
│   ├── github/          # GitHub API integration
│   ├── vector/          # Vector database and embeddings
│   └── utils/           # Utilities and helpers
├── tests/               # Test suite
├── config/              # Configuration files
├── data/                # Data storage (vector DB, cache)
├── logs/                # Application logs
├── requirements.txt     # Python dependencies
└── .env.example         # Environment variables template

Setup

  1. Clone the repository, then navigate to the project directory:

    cd "MCP Enhanced Data Retrieval"
    
  2. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Configure environment variables:

    cp .env.example .env
    # Edit .env with your credentials
    
  5. Run the server:

    uvicorn src.server.main:app --reload
    

Milestone 1 Goals

  • ✅ MCP protocol analysis and communication flow evaluation
  • ✅ High-level architecture design for enterprise knowledge integration
  • 🔄 Functional MCP server with GitHub integration
  • 🔄 OAuth 2.1 authentication implementation
  • 🔄 1500-token context chunking mechanism
  • 🔄 Vector-based semantic search

Success Criteria

  • Functional MCP server that can retrieve and contextualize GitHub repository information
  • OAuth 2.1 authentication for secure GitHub access
  • 1500-token context chunking maintaining sub-500ms TTFT
  • Parallel retrieval with 2-second timeout
  • Vector-based semantic search for relevant content
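
The idea behind vector-based semantic search can be shown with a small, dependency-free sketch: documents and the query are represented as embedding vectors, and results are ranked by cosine similarity. In the project this role is filled by ChromaDB and Sentence Transformers; the function names and the toy two-dimensional vectors below are illustrative assumptions only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real model output.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = top_k([1.0, 0.0], docs, k=2)
```

Here `ranked` lists document "a" first (same direction as the query) and "c" second; "b" is orthogonal to the query and is cut off by `k=2`.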

Technologies

  • MCP SDK: Anthropic MCP Python SDK
  • Web Framework: FastAPI with Streamable HTTP transport
  • GitHub API: PyGithub
  • Authentication: OAuth 2.1 (authlib)
  • Vector Database: ChromaDB
  • Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
  • Token Processing: tiktoken

Author

Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology

License

Academic Project - RIT Capstone
