Pagila Database AI Assistant
This project is a Streamlit-based chatbot that allows users to query a PostgreSQL database (Pagila schema) using natural language. It leverages Google's Gemini API for SQL generation and the Model Context Protocol (MCP) for secure database execution.
Features
- Natural Language to SQL: Converts English questions into valid SQL queries.
- Agentic Workflow: The AI autonomously explores the database schema (listing tables, checking columns) before writing queries.
- Vector Caching (RAG): Uses ChromaDB to cache successful SQL queries locally. If a similar question is asked again, the cached SQL is executed immediately, saving API costs and time.
- Model Context Protocol (MCP): Uses a dedicated local server (
mcp_pagila_server.py) to handle database operations, separating the UI from the backend logic. - Cost Tracking: Monitors token usage and estimates costs for both the current session and global history.
- Schema Visualization: Displays the database schema diagram and metadata within the UI.
Architecture
- Frontend: Streamlit (
app.py) handles user input, chat history, and visualization. - AI Brain: Google Gemini (via
google-generativeai) acts as the reasoning engine. - Backend: An MCP Server (
mcp_pagila_server.py) runs as a subprocess, exposing tools likelist_tables,get_table_schema, andexecute_sql. - Database: PostgreSQL hosting the Pagila sample database.
- Cache: ChromaDB stores embeddings of questions and their corresponding SQL.
Prerequisites
- Python 3.10+
- PostgreSQL database with the Pagila schema installed.
- Google Gemini API Key.
Installation
-
Clone the repository:
git clone <repository-url> cd mcp-pagila-server -
Install dependencies:
pip install -r requirements.txt -
Configure Environment: Create a
config.envfile in the root directory:GEMINI_API_KEY=your_google_api_key_here PGHOST=localhost PGUSER=your_postgres_user PGPASSWORD=your_postgres_password PGDATABASE=pagila LOG_DIR=logsNote: It is recommended to use a read-only database user for security.
Usage
Run the Streamlit application:
streamlit run app.py
Testing & Inspection + +streamlit_app.py is provided as a lightweight, chat-like interface to test the MCP server directly without the full Gemini Agent loop. It is useful for debugging the MCP server connection and running raw SQL (using a run: prefix) or heuristic-based queries
Project Structure
app.py: Main Streamlit application.mcp_pagila_server.py: MCP server implementation.pagila-metadata.txt: Text-based schema summary for the AI.requirements.txt: Python dependencies.vector_store/: Directory where ChromaDB persists data (ignored in git).usage_stats.json: Local file tracking usage costs (ignored in git).