MCP Server
🧩 Main modules
- Main MCP server - the core of the system, which performs the following functions:
  - Registry of connected servers
  - Routing requests between servers
  - Monitoring server status
  - Aggregating information about available tools
- Specialized servers (connect to the main one):
  - Embedding server - working with vector representations of text
  - PDF extract server - converting PDF to Markdown and extracting its content
  - Reranker server - ranking text data
  - Qdrant server - managing vector collections
  - PostgreSQL server - executing SQL queries and inspecting the schema
  - LLM server - generating and streaming LLM responses, listing available models
  - MarkUp server - marking up text and files using the markup service methods
  - Transcribe server - uploading audio, checking status, and retrieving the transcription result
⚙️ Available tools
| Server | Methods |
|---|---|
| Embedding server | embedding_generate, embedding_batch_generate, embedding_get_models, health_check |
| PDF extract server | document_convert_to_markdown, document_get_supported_formats, health_check |
| Reranker server | rerank_documents, health_check |
| Qdrant server | vector_create_collection, vector_get_collection_info, vector_upsert_points, vector_search, vector_delete_points, health_check |
| PostgreSQL server | postgres_execute_query, postgres_get_schema, postgres_create_table, postgres_insert_data, health_check |
| LLM server | llm_chat_completion, llm_get_models, llm_stream_completion, health_check |
| MarkUp server | markup_get_methods, markup_process_text, markup_process_file, health_check |
| Transcribe server | transcribe_audio, transcribe_get_status, transcribe_get_result, health_check |
Note:
To add a new FastMCP server, import it in main_server.py and add it to the MCP_SERVERS array; the main methods of main_server.py will then have access to it.
Every FastMCP server must also expose a health_check method so that its state can be checked.
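The health_check requirement can be verified before a server is registered. The following is a minimal sketch in plain Python, assuming each server can be summarized as a name plus a list of tool names. `ServerSpec` and `servers_missing_health_check` are hypothetical helpers for illustration; they are not part of main_server.py or FastMCP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerSpec:
    """Hypothetical summary of one server: its name and the tools it exposes."""
    name: str
    tools: List[str] = field(default_factory=list)


def servers_missing_health_check(servers: List[ServerSpec]) -> List[str]:
    """Return the names of servers that do not expose a health_check tool."""
    return [s.name for s in servers if "health_check" not in s.tools]


specs = [
    ServerSpec("Reranker server", ["rerank_documents", "health_check"]),
    ServerSpec("New server", ["do_something"]),  # hypothetical, lacks health_check
]
print(servers_missing_health_check(specs))
```

A check like this could run at startup so that a server without health_check is rejected early instead of failing later during monitoring.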
📡 Main server methods
get_server_and_tools() # Get a list of all servers and tools
router(server_name, tool_name, params) # Routing requests
health_check_servers() # Checking the health of all servers
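To illustrate how these three methods relate, here is a rough sketch of the dispatch logic in plain Python. It assumes a hypothetical in-memory registry (server name, then tool name, then callable) that is passed in explicitly so the example stays self-contained; the real methods in main_server.py take no registry argument and reach the servers through FastMCP.

```python
from typing import Any, Callable, Dict, List

# Hypothetical in-memory registry: server name -> tool name -> callable.
Registry = Dict[str, Dict[str, Callable[..., Any]]]


def get_server_and_tools(registry: Registry) -> Dict[str, List[str]]:
    """List every registered server with the names of its tools."""
    return {server: sorted(tools) for server, tools in registry.items()}


def router(registry: Registry, server_name: str, tool_name: str,
           params: Dict[str, Any]) -> Any:
    """Look up the tool on the named server and call it with params as keyword arguments."""
    if server_name not in registry:
        raise KeyError(f"Unknown server: {server_name}")
    tools = registry[server_name]
    if tool_name not in tools:
        raise KeyError(f"Unknown tool {tool_name!r} on server {server_name!r}")
    return tools[tool_name](**params)


def health_check_servers(registry: Registry) -> Dict[str, Any]:
    """Call health_check on every server; record the error text if a call fails."""
    results: Dict[str, Any] = {}
    for server, tools in registry.items():
        try:
            results[server] = tools["health_check"]()
        except Exception as exc:
            results[server] = f"error: {exc}"
    return results


# Stand-in tools for illustration only; the real servers call external services.
demo_registry: Registry = {
    "Reranker server": {
        "rerank_documents": lambda query, documents: sorted(documents),
        "health_check": lambda: "ok",
    },
}
```

Catching the exception per server means one failing service does not hide the status of the others.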
Setting up the environment
Create a .env file in the root of the project and add the following environment variables to it (the list matches the variables used in the code):
# Main server
MAIN_SERVER_API_KEY=...
# Embedding server
EMBEDDING_API_KEY=...
EMBEDDING_URL=...
EMBEDDING_MODEL_NAME=...
EMBEDDING_URL_MODELS=...
EMBEDDING_HEALTH_URL=...
# PDF extract server
PDF_EXTRACTOR_URL=...
PDF_HEALTH_URL=...
# Reranker server
RERANK_URL=...
RERANK_MODEL=...
RERANK_HEALTH_URL=...
# Qdrant server
QDRANT_URL=...
QDRANT_API_KEY=...
QDRANT_HEALTH_CHECK_URL=...
# PostgreSQL server
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_HOST=...
POSTGRES_DB=...
# LLM server
LLM_SERVICE_API_KEY=...
LLM_SERVICE_MODEL=...
LLM_SERVICE_CHAT_COMPLETIONS_URL=...
LLM_SERVICE_MODELS_URL=...
LLM_SERVICE_COMPLETIONS_URL=...
LLM_SERVICE_HEALTH_URL=...
# MarkUp server
MARKUP_API_KEY=...
MARKUP_GET_METHODS_URL=...
MARKUP_PROCESS_TEXT_URL=...
MARKUP_PROCESS_FILE_URL=...
MARKUP_HEALTH_CHECK_URL=...
# Transcribe server
TRANSCRIBE_API_KEY=...
TRANSCRIBE_UPLOAD_AUDIO=...
TRANSCRIBE_HEALTH_URL=...
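Since a missing variable usually shows up only when a specific tool is called, it can help to check the environment up front. The sketch below assumes the variables have already been loaded into the process environment by whatever mechanism the project uses; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the list is only a subset of the variables above.

```python
import os
from typing import Iterable, List, Mapping

# A subset of the variables listed above; extend it to cover the servers you use.
REQUIRED_VARS = [
    "MAIN_SERVER_API_KEY",
    "EMBEDDING_URL",
    "QDRANT_URL",
    "LLM_SERVICE_CHAT_COMPLETIONS_URL",
]


def missing_env_vars(required: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```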
Start the main server
Installing dependencies and creating a virtual environment
Before starting the server, it is recommended to create a virtual environment and install all dependencies from requirements.txt. Run the following commands in the terminal:
- Creating a virtual environment
python -m venv venv
- Activating the environment (Windows path shown; on Linux/macOS use source venv/bin/activate)
./venv/Scripts/activate
- Installing dependencies
pip install -r requirements.txt
After setting up the environment, start the server with the command:
fastmcp run ./main_server.py:main_mcp_server --transport http
Running in Docker
- Build the image:
docker build -t mcp-main-server .
- Run the container, passing .env as environment variables:
docker run --rm -p 8000:8000 --env-file .env mcp-main-server
Configuring the server connection in Cursor
- Run the server using the command above
- Open the settings
- Add the MCP server configuration:
{
  "mcpServers": {
    "main-registry": {
      "url": "http://localhost:8000/mcp/"
    }
  }
}
Local MCP server: proxy_mcp_server
- What is it: a proxy MCP server that connects to the main registry (main_server), forwards its methods, and provides a high-level pipeline for preparing data for RAG.
- Where is it: proxy_mcp_server/proxy_mcp_server.py
- Available tools:
  - get_server_and_tools() - get a list of servers and their tools from the registry
  - router(server_name: str, tool_name: str, params: dict) - universal call router
  - preprocessing_data_for_rag(file_paths: List[str]) -> str - prepare PDFs/texts and create a collection in Qdrant; returns the collection name
  - health_check_servers() - check whether all services are available
Requirements:
- main_server is running and accessible via URL (e.g. http://localhost:8000/mcp/).
- Valid API key MAIN_SERVER_API_KEY (must match the Authorization header in proxy_mcp_server.py).
- Update url and headers.Authorization in the config object inside proxy_mcp_server/proxy_mcp_server.py if necessary.
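As an illustration of the two fields named in the requirements, the sketch below builds a config dict from the URL and the API key. Only url and headers.Authorization come from the text above; the surrounding layout, the "main-registry" key, the "Bearer" scheme, and the `build_main_server_config` helper are assumptions, so match them to what proxy_mcp_server.py actually contains.

```python
import os
from typing import Any, Dict


def build_main_server_config(url: str, api_key: str) -> Dict[str, Any]:
    """Build a config dict holding the main server URL and Authorization header.

    The "mcpServers" layout, the "main-registry" key, and the "Bearer" scheme
    are assumptions made for this example.
    """
    return {
        "mcpServers": {
            "main-registry": {
                "url": url,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


# Reading the key from the environment avoids hard-coding it in the source file.
config = build_main_server_config(
    url="http://localhost:8000/mcp/",
    api_key=os.environ.get("MAIN_SERVER_API_KEY", ""),
)
```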
Connection in Cursor (example):
{
  "mcpServers": {
    "proxy-server": {
      "command": "uv",
      "args": [
        "run",
        "fastmcp",
        "run",
        "YOUR_PATH_TO/proxy_mcp_server/proxy_mcp_server.py:proxy_mcp_server"
      ]
    }
  }
}
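If filling in YOUR_PATH_TO by hand is error-prone, the same entry can be generated from the repository location. This is a small convenience sketch; `proxy_cursor_config` is a hypothetical helper, and using the current working directory as the repository root is an assumption.

```python
import json
from pathlib import Path


def proxy_cursor_config(repo_root: Path) -> dict:
    """Build the Cursor entry shown above with the script path filled in."""
    script = repo_root / "proxy_mcp_server" / "proxy_mcp_server.py"
    return {
        "mcpServers": {
            "proxy-server": {
                "command": "uv",
                "args": ["run", "fastmcp", "run",
                         f"{script.as_posix()}:proxy_mcp_server"],
            }
        }
    }


# Path.cwd() assumes the script is run from the repository root.
print(json.dumps(proxy_cursor_config(Path.cwd()), indent=2))
```

The printed JSON can then be pasted into the Cursor MCP settings.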
Launch from terminal:
fastmcp run ./proxy_mcp_server/proxy_mcp_server.py:proxy_mcp_server
RAG inference: interactive launch
- What is this: a console assistant for asking questions about a collection of documents in Qdrant, with additional reranking and generation of an LLM response.
- Location: rag_inference/RAG workflow.py
- Required environment variables: QDRANT_URL, QDRANT_API_KEY, RERANK_URL, RERANK_MODEL, LLM_SERVICE_CHAT_COMPLETIONS_URL, LLM_SERVICE_API_KEY, LLM_SERVICE_MODEL, EMBEDDING_URL, EMBEDDING_MODEL (described above in the settings section).
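The flow described above (vector search in Qdrant, reranking, then LLM generation) can be sketched as a single function. This is only an outline of the control flow, not the code in RAG workflow.py: the four steps are injected as callables, and `answer_question`, `top_k`, and `keep` are hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Sequence


def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    top_k: int = 20,
    keep: int = 5,
) -> str:
    """Embed the question, retrieve candidates, rerank them, and generate an answer."""
    vector = embed(question)                    # question -> embedding vector
    candidates = search(vector, top_k)          # nearest chunks from the collection
    best = rerank(question, candidates)[:keep]  # keep only the most relevant chunks
    return generate(question, best)             # LLM answer grounded in those chunks


# Stub steps for a dry run; real ones would call the embedding, Qdrant,
# reranker, and LLM services configured through the environment variables.
reply = answer_question(
    "hi",
    embed=lambda q: [float(len(q))],
    search=lambda v, k: ["doc b", "doc a", "doc c"][:k],
    rerank=lambda q, docs: sorted(docs),
    generate=lambda q, docs: f"{q} -> {docs[0]}",
    top_k=2,
    keep=1,
)
print(reply)
```

Passing the steps as callables keeps the outline testable without any of the external services running.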
Run (Windows PowerShell):
python ".\rag_inference\RAG workflow.py" <collection_name>
Here <collection_name> is the name of the collection in Qdrant. The easiest way to obtain it is to call the preprocessing_data_for_rag tool from proxy_mcp_server beforehand, passing the list of files to index; the tool returns the name of the created collection.
Example:
python ".\rag_inference\RAG workflow.py" collection_for_rag_1