MCP Enhanced Data Retrieval System

An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.

Project Overview

This system implements the Model Context Protocol to provide:

Standardized AI context sharing across organizational knowledge sources
GitHub repository integration with OAuth 2.1 authentication
Vector-based semantic search using embeddings
1500-token context chunking, targeting a time to first token (TTFT) under 500 ms
Parallel retrieval across knowledge sources with a 2-second timeout
Streamable HTTP transport using FastAPI
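The 1500-token chunking step can be sketched as a sliding window over token IDs. This is a minimal illustration, not the project's implementation: `chunk_tokens` and its `overlap` parameter are hypothetical, and the tokenizer is passed in as a pair of `encode`/`decode` callables so the sketch runs without extra packages. With tiktoken, these would be `enc.encode` and `enc.decode` for an encoding obtained with `enc = tiktoken.get_encoding("cl100k_base")` (the choice of encoding is also an assumption).

```python
from typing import Callable, List


def chunk_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int = 1500,
    overlap: int = 0,
) -> List[str]:
    """Split `text` into chunks of at most `max_tokens` tokens (hypothetical helper)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = encode(text)
    step = max_tokens - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(decode(window))
        # Stop once the window has reached the end, so an overlap never
        # produces a trailing chunk that is already contained in the last one.
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Cutting on raw token boundaries can split a sentence or a line of code in half; a production chunker would more likely prefer paragraph or function boundaries and fall back to fixed token windows only for oversized units.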

Architecture

```
AI Applications
    ↓
Authentication (OAuth 2.1 + RBAC)
    ↓
MCP Client
    ↓
MCP Protocol (JSON-RPC + HTTP)
    ↓
MCP Server
    • Multi-threaded parallel retrieval
    • 1500-token chunking
    ↓
Knowledge Tiers (Public, Internal, Restricted)
    ↓
Data Sources: GitHub | Docs
Vector Storage: Embeddings
```
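One way to read the Knowledge Tiers layer is as an ordered clearance check applied after authentication. The sketch below assumes the tiers are strictly hierarchical (Restricted implies Internal implies Public) and uses a hypothetical role table (`guest`, `employee`, `admin`); the real roles and access policy would come from the RBAC configuration, which this README does not specify.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Tier(IntEnum):
    """Knowledge tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical role-to-clearance mapping; real roles would come from RBAC config.
ROLE_CLEARANCE: Dict[str, Tier] = {
    "guest": Tier.PUBLIC,
    "employee": Tier.INTERNAL,
    "admin": Tier.RESTRICTED,
}


@dataclass
class Document:
    doc_id: str
    tier: Tier
    text: str


def visible_documents(role: str, docs: List[Document]) -> List[Document]:
    """Return only the documents whose tier is within the role's clearance."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return []  # deny by default for unknown roles
    return [d for d in docs if d.tier <= clearance]
```

Filtering before retrieved text is placed in the model's context means restricted content never reaches the prompt, rather than relying on the model to withhold it.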

Features

MCP Protocol Compliance: JSON-RPC 2.0 over Streamable HTTP
GitHub Integration: Repository data retrieval and contextualization
Vector Embeddings: Semantic search using ChromaDB and Sentence Transformers
Context Optimization: 1500-token chunking with parallel retrieval
OAuth 2.1 Security: Secure authentication for GitHub access
Performance: Target TTFT under 500 ms, with a 2-second timeout on retrieval
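MCP messages are JSON-RPC 2.0 objects, so the server must return either a `result` or an `error` carrying the request's `id`, and must not reply to notifications (messages without an `id`). The stdlib-only sketch below shows that framing; it is illustrative only, since the MCP Python SDK performs this dispatch for you. `tools/list` and `notifications/initialized` are real MCP method names, but the handler table and the `search_repository` tool are hypothetical placeholders, and batched (array) messages are not handled.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical handler table: "tools/list" is a real MCP method,
# but this payload is a placeholder, not the project's actual tool set.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "tools/list": lambda params: {"tools": [{"name": "search_repository"}]},
}


def _error(req_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 error response object."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_message(raw: str) -> Optional[str]:
    """Dispatch one JSON-RPC 2.0 message; return the response text or None."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps(_error(None, -32700, "Parse error"))

    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0" or not isinstance(msg.get("method"), str):
        req_id = msg.get("id") if isinstance(msg, dict) else None
        return json.dumps(_error(req_id, -32600, "Invalid Request"))

    if "id" not in msg:
        return None  # notification: JSON-RPC forbids sending a response

    handler = HANDLERS.get(msg["method"])
    if handler is None:
        return json.dumps(_error(msg["id"], -32601, "Method not found"))

    result = handler(msg.get("params", {}))
    return json.dumps({"jsonrpc": "2.0", "id": msg["id"], "result": result})
```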

Project Structure

```
.
├── src/
│   ├── server/          # MCP server core and FastAPI app
│   ├── auth/            # OAuth 2.1 authentication
│   ├── github/          # GitHub API integration
│   ├── vector/          # Vector database and embeddings
│   └── utils/           # Utilities and helpers
├── tests/               # Test suite
├── config/              # Configuration files
├── data/                # Data storage (vector DB, cache)
├── logs/                # Application logs
├── requirements.txt     # Python dependencies
└── .env.example         # Environment variables template
```

Setup

Clone the repository and navigate to the project directory:
```
cd "MCP Enhanced Data Retrieval"
```

Create a virtual environment:
```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Configure environment variables:

```
cp .env.example .env
# Edit .env with your credentials
```
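The `.env` values then have to reach the application. A minimal loader sketch follows; every variable name in it (`GITHUB_CLIENT_ID`, `GITHUB_CLIENT_SECRET`, `CHROMA_PERSIST_DIR`, `RETRIEVAL_TIMEOUT_SECONDS`, `CHUNK_TOKENS`) is a hypothetical placeholder, so check `.env.example` for the names the project actually uses. A `.env` file is not read into `os.environ` automatically; python-dotenv's `load_dotenv()` or uvicorn's `--env-file` option are common ways to do that.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    github_client_id: str
    github_client_secret: str
    chroma_dir: str = "data/chroma"
    retrieval_timeout_s: float = 2.0
    chunk_tokens: int = 1500


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (all names are hypothetical)."""
    env = os.environ if env is None else env
    required = ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET")
    missing = [key for key in required if not env.get(key)]
    if missing:
        # Fail at startup instead of on the first GitHub request.
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        github_client_id=env["GITHUB_CLIENT_ID"],
        github_client_secret=env["GITHUB_CLIENT_SECRET"],
        chroma_dir=env.get("CHROMA_PERSIST_DIR", "data/chroma"),
        retrieval_timeout_s=float(env.get("RETRIEVAL_TIMEOUT_SECONDS", "2.0")),
        chunk_tokens=int(env.get("CHUNK_TOKENS", "1500")),
    )
```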

Run the server:
```
uvicorn src.server.main:app --reload
```

Milestone 1 Goals

✅ MCP protocol analysis and communication flow evaluation
✅ High-level architecture design for enterprise knowledge integration
🔄 Functional MCP server with GitHub integration
🔄 OAuth 2.1 authentication implementation
🔄 1500-token context chunking mechanism
🔄 Vector-based semantic search
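OAuth 2.1 makes PKCE (RFC 7636) mandatory for the authorization code flow, so the OAuth work item includes generating a code verifier and its S256 challenge. Authlib's OAuth 2.0 client can generate these values itself (for example through its `code_challenge_method="S256"` option); the stdlib sketch below only shows what happens underneath, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def s256_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, per RFC 7636 section 4.2."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair(num_bytes: int = 32) -> Tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method."""
    # token_urlsafe yields unpadded base64url text; 32 random bytes -> 43 chars.
    verifier = secrets.token_urlsafe(num_bytes)
    if not 43 <= len(verifier) <= 128:
        raise ValueError("RFC 7636 requires a verifier of 43 to 128 characters")
    return verifier, s256_challenge(verifier)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code is useless without the verifier.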

Success Criteria

Functional MCP server that can retrieve and contextualize GitHub repository information
OAuth 2.1 authentication for secure GitHub access
1500-token context chunking maintaining sub-500ms TTFT
Parallel retrieval with 2-second timeout
Vector-based semantic search for relevant content
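The parallel retrieval criterion can be sketched as: query every source concurrently, keep whatever finishes within the timeout, and cancel the rest. Note the substitution: the architecture describes multi-threaded retrieval, whereas this sketch uses asyncio tasks, which fit FastAPI's async handlers; a blocking client such as PyGithub could be moved onto a worker thread with `asyncio.to_thread`. `parallel_retrieve` and the retriever signature are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical retriever signature: takes the query, returns matching snippets.
Retriever = Callable[[str], Awaitable[List[str]]]


async def parallel_retrieve(
    query: str,
    retrievers: Dict[str, Retriever],
    timeout: float = 2.0,
) -> Dict[str, List[str]]:
    """Query all sources concurrently; keep results that finish within `timeout` seconds."""
    if not retrievers:
        return {}  # asyncio.wait() rejects an empty collection of tasks
    tasks = {name: asyncio.create_task(fn(query)) for name, fn in retrievers.items()}
    done, pending = await asyncio.wait(list(tasks.values()), timeout=timeout)

    # Cancel stragglers and wait for the cancellation to complete cleanly.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results: Dict[str, List[str]] = {}
    for name, task in tasks.items():
        # Skip sources that timed out or raised; a real server would log these.
        if task in done and not task.cancelled() and task.exception() is None:
            results[name] = task.result()
    return results
```

Returning partial results instead of failing the whole request keeps one slow source from consuming the latency budget; the response could also report which sources timed out.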

Technologies

MCP SDK: Anthropic MCP Python SDK
Web Framework: FastAPI with Streamable HTTP transport
GitHub API: PyGithub
Authentication: OAuth 2.1 (authlib)
Vector Database: ChromaDB
Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
Token Processing: tiktoken
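In this stack, `all-MiniLM-L6-v2` maps text to 384-dimensional vectors and ChromaDB returns the stored chunks nearest to a query vector. The brute-force cosine ranking below is a stand-in to show the idea, not ChromaDB's API: ChromaDB maintains an approximate nearest-neighbor (HNSW) index so it does not have to score every chunk. The function names and the toy 3-dimensional vectors in the test are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored chunk against the query and return the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For unit-length embeddings, ranking by cosine similarity and by Euclidean distance gives the same order, since the squared distance between unit vectors a and b equals 2 - 2 cos(a, b).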

Author

Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology

License

Academic Project - RIT Capstone
