# MetaSearchMCP
Open-source metasearch backend for MCP, AI agents, and LLM workflows.
MetaSearchMCP aggregates results from multiple search providers, normalizes them into a stable JSON schema, and exposes both an HTTP API and an MCP server for agent tooling.
## Positioning
- MCP-first metasearch backend
- Structured search API for AI pipelines
- Multi-provider search orchestration with deduplication and fallback
- Python FastAPI alternative to browser-first metasearch projects
## Why It Exists
Most search aggregators are designed around browser UX: HTML pages, pagination, and interactive result cards. Agents and LLM workflows need a different contract: predictable JSON, stable field names, partial-failure tolerance, and provider-level execution metadata.
MetaSearchMCP is built for that machine-consumable workflow. It is not a SearXNG clone. The design is centered on search orchestration, normalized contracts, and MCP integration.
## Core Features
- Concurrent multi-provider aggregation
- Unified result schema for web, academic, developer, and knowledge sources
- Provider-level timeout isolation and partial-failure handling (see the sketch after this list)
- Result deduplication across engines
- Provider selection by explicit names or semantic tags such as `web`, `academic`, `code`, and `google`
- Final result caps for agent-friendly payload sizing
- HTTP API with OpenAPI docs
- MCP server over stdio for Claude Desktop, Cline, Continue, and similar clients
- Configurable provider allowlist via environment variables
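A minimal asyncio sketch of the timeout-isolation and partial-failure model described above; this is illustrative only, not the project's orchestrator code, and `fake_provider` is a stand-in for a real engine adapter:

```python
import asyncio


async def fake_provider(name: str, delay: float) -> list[dict]:
    """Stand-in for a real provider adapter; returns normalized result dicts."""
    await asyncio.sleep(delay)
    return [{"title": f"hit from {name}", "provider": name}]


async def run_provider(name: str, delay: float, timeout: float) -> dict:
    # Each provider gets its own timeout, so one slow engine
    # cannot stall the whole aggregation.
    try:
        results = await asyncio.wait_for(fake_provider(name, delay), timeout)
        return {"name": name, "success": True, "results": results, "error": None}
    except asyncio.TimeoutError:
        return {"name": name, "success": False, "results": [], "error": "timeout"}


async def main() -> None:
    reports = await asyncio.gather(
        run_provider("fast_engine", 0.1, timeout=1.0),
        run_provider("slow_engine", 5.0, timeout=1.0),  # times out alone
    )
    for report in reports:
        print(report["name"], "ok" if report["success"] else report["error"])


asyncio.run(main())
```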
## Google Support
Google is intentionally not scraped directly in this project.
In practice, Google's anti-bot and risk-control systems make self-hosted scraping brittle and expensive to maintain. For a backend intended for reliable MCP and AI workloads, hosted Google providers are the more practical option.
Currently supported Google providers:
| Provider | Env var | Notes |
|---|---|---|
| serpbase.dev | SERPBASE_API_KEY | Pay-per-use; typically cheaper for low-volume usage |
| serper.dev | SERPER_API_KEY | Includes a free tier, then pay-per-use |
Both options are inexpensive; for smaller or occasional workloads, serpbase.dev is usually the cheaper choice.
## Supported Providers

### Google
| Provider | Name | Method |
|---|---|---|
| SerpBase | google_serpbase | Hosted Google SERP API |
| Serper | google_serper | Hosted Google SERP API |
### Web Search
| Provider | Name | Method |
|---|---|---|
| DuckDuckGo | duckduckgo | HTML scraping |
| Bing | bing | RSS feed |
| Yahoo | yahoo | HTML scraping, best effort |
| Brave | brave | Official Search API |
| Mwmbl | mwmbl | Public JSON API |
| Ecosia | ecosia | HTML scraping |
| Mojeek | mojeek | HTML scraping |
| Startpage | startpage | HTML scraping, best effort |
| Qwant | qwant | Internal JSON API, best effort |
| Yandex | yandex | HTML scraping, best effort |
| Baidu | baidu | JSON endpoint, best effort |
### Knowledge And Reference
| Provider | Name | Method |
|---|---|---|
| Wikipedia | wikipedia | MediaWiki API |
| Wikidata | wikidata | Wikidata API |
| Internet Archive | internet_archive | Advanced Search API |
| Open Library | openlibrary | Open Library search API |
### Developer Sources
| Provider | Name | Method |
|---|---|---|
| GitHub | github | GitHub REST API |
| GitLab | gitlab | GitLab REST API |
| Stack Overflow | stackoverflow | Stack Exchange API |
| Hacker News | hackernews | Algolia HN API |
| Reddit | reddit | Reddit API |
| npm | npm | npm registry API |
| PyPI | pypi | HTML scraping |
| RubyGems | rubygems | RubyGems search API |
| crates.io | crates | crates.io API |
| lib.rs | lib_rs | HTML scraping |
| Docker Hub | dockerhub | Docker Hub search API |
| pkg.go.dev | pkg_go_dev | HTML scraping |
| MetaCPAN | metacpan | MetaCPAN REST API |
### Academic Sources
| Provider | Name | Method |
|---|---|---|
| arXiv | arxiv | Atom API |
| PubMed | pubmed | NCBI E-utilities |
| Semantic Scholar | semanticscholar | Graph API |
| CrossRef | crossref | REST API |
### Finance Sources
| Provider | Name | Key Required | Free Tier |
|---|---|---|---|
| Yahoo Finance | yahoo_finance | No | Unofficial endpoint; no key needed |
| Alpha Vantage | alpha_vantage | ALPHA_VANTAGE_API_KEY | 25 requests/day |
| Finnhub | finnhub | FINNHUB_API_KEY | 60 requests/min |
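For example, a finance lookup through the aggregation endpoint might look like the sketch below (assumes a local server with `yahoo_finance` enabled; what a finance provider puts in `extra` is provider-specific):

```python
import requests

resp = requests.post(
    "http://localhost:8000/search",
    json={
        "query": "AAPL",
        "providers": ["yahoo_finance"],
        "params": {"num_results": 5},
    },
    timeout=30,
)
resp.raise_for_status()
# Results follow the unified schema documented below.
for item in resp.json()["results"]:
    print(item["rank"], item["title"], item["url"])
```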
## Installation

One-command local install:

```bash
python scripts/install.py
```

Install, run tests, and start the HTTP API:

```bash
python scripts/install.py --dev --test --run
```

Deploy with Docker Compose:

```bash
python scripts/install.py --mode docker
```

The installer creates `.env` from `.env.example` when `.env` does not already exist. Existing `.env` files are kept unless `--force-env` is passed.

Manual install:

```bash
git clone https://github.com/gefsikatsinelou/MetaSearchMCP
cd MetaSearchMCP
pip install -e ".[dev]"
```

Or with uv:

```bash
uv pip install -e ".[dev]"
```
## Configuration

Copy `.env.example` to `.env` and configure any providers you want to enable:

```bash
cp .env.example .env
```

Key settings:

```env
HOST=0.0.0.0
PORT=8000
DEFAULT_TIMEOUT=10
AGGREGATOR_TIMEOUT=15
SERPBASE_API_KEY=
SERPER_API_KEY=
BRAVE_API_KEY=
GITHUB_TOKEN=
STACKEXCHANGE_API_KEY=
REDDIT_CLIENT_ID=
REDDIT_CLIENT_SECRET=
NCBI_API_KEY=
SEMANTIC_SCHOLAR_API_KEY=
ALPHA_VANTAGE_API_KEY=
FINNHUB_API_KEY=
ENABLED_PROVIDERS=
ALLOW_UNSTABLE_PROVIDERS=false
MAX_RESULTS_PER_PROVIDER=10
```
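As a sketch of how such an allowlist is commonly consumed (illustrative only, not the project's actual settings code), `ENABLED_PROVIDERS` can be treated as a comma-separated set where an empty value means every provider is available:

```python
import os

# Illustrative parsing of ENABLED_PROVIDERS; an empty value is read
# here as "no allowlist, all providers available".
raw = os.environ.get("ENABLED_PROVIDERS", "")
enabled = {name.strip() for name in raw.split(",") if name.strip()}


def provider_allowed(name: str) -> bool:
    return not enabled or name in enabled


print(provider_allowed("duckduckgo"))  # True when no allowlist is set
```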
## Running

### HTTP API

```bash
python -m metasearchmcp.server
# or
metasearchmcp
```

The API starts on http://localhost:8000.

### MCP Server

```bash
python -m metasearchmcp.broker
# or
metasearchmcp-mcp
```

The MCP server communicates over stdio.

### Docker

```bash
docker build -t metasearchmcp .
docker run --rm -p 8000:8000 --env-file .env metasearchmcp
```

Or with Compose:

```bash
docker compose up --build
```
## HTTP API

### POST /search

Aggregate across all enabled providers or a selected provider subset.

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "rust async runtime",
    "providers": ["duckduckgo", "wikipedia"],
    "params": {"num_results": 5, "max_total_results": 8, "language": "en"}
  }'
```

You can also narrow providers by tags:

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "transformer attention",
    "tags": ["academic", "knowledge"],
    "params": {"num_results": 5, "max_total_results": 6}
  }'
```

`num_results` controls how many results each provider can contribute. `max_total_results` caps the final merged response after deduplication.
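The same request from Python with `requests` (assumes a local server; the body mirrors the curl examples above):

```python
import requests

payload = {
    "query": "rust async runtime",
    "tags": ["web", "knowledge"],
    "params": {"num_results": 5, "max_total_results": 8},
}
resp = requests.post("http://localhost:8000/search", json=payload, timeout=30)
resp.raise_for_status()
data = resp.json()

# Per-provider execution metadata makes partial failures visible.
for report in data["providers"]:
    status = "ok" if report["success"] else f"failed: {report['error']}"
    print(f"{report['name']}: {status}, {report['result_count']} results")
```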
### POST /search/google

Search Google through a configured hosted provider.

```bash
curl -X POST http://localhost:8000/search/google \
  -H "Content-Type: application/json" \
  -d '{"query": "site:github.com rust tokio"}'
```
### GET /providers

Return the currently available provider catalog. The response includes provider descriptions and a tag-to-provider index for quick discovery.

You can filter the catalog by tag:

```bash
curl "http://localhost:8000/providers?tag=academic&tag=web"
```
### GET /health

Simple health check endpoint. Returns service status, version, provider count, and the current provider name list.
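A minimal readiness probe, useful before wiring the service into an agent loop:

```python
import requests

health = requests.get("http://localhost:8000/health", timeout=5).json()
# Expect service status, version, provider count, and provider names.
print(health)
```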
## Response Schema

Every aggregated response includes:

`engine`, `query`, `results`, `related_searches`, `suggestions`, `answer_box`, `timing_ms`, `providers`, `errors`

Every result item includes:

`title`, `url`, `snippet`, `source`, `rank`, `provider`, `published_date`, `extra`
Example response:

```json
{
  "engine": "metasearchmcp",
  "query": "rust async runtime",
  "results": [
    {
      "title": "Tokio - An asynchronous Rust runtime",
      "url": "https://tokio.rs",
      "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
      "source": "tokio.rs",
      "rank": 1,
      "provider": "duckduckgo",
      "published_date": null,
      "extra": {}
    }
  ],
  "related_searches": [],
  "suggestions": [],
  "answer_box": null,
  "timing_ms": 843.2,
  "providers": [
    {
      "name": "duckduckgo",
      "success": true,
      "result_count": 10,
      "latency_ms": 840.1,
      "error": null
    }
  ],
  "errors": []
}
```
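For typed consumers, the schema above maps naturally onto TypedDicts. A sketch follows; field names come from the schema, while the `errors` element type and `answer_box` shape are assumptions:

```python
from typing import Any, Optional, TypedDict


class ResultItem(TypedDict):
    title: str
    url: str
    snippet: str
    source: str
    rank: int
    provider: str
    published_date: Optional[str]
    extra: dict[str, Any]


class ProviderReport(TypedDict):
    name: str
    success: bool
    result_count: int
    latency_ms: float
    error: Optional[str]


class SearchResponse(TypedDict):
    engine: str
    query: str
    results: list[ResultItem]
    related_searches: list[str]
    suggestions: list[str]
    answer_box: Optional[dict[str, Any]]  # shape assumed
    timing_ms: float
    providers: list[ProviderReport]
    errors: list[str]  # element type assumed
```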
## MCP Tools

MetaSearchMCP exposes these MCP tools:

`search_web`, `search_google`, `search_academic`, `search_github`, `compare_engines`

`search_web` also accepts optional tags so agents can limit search to categories such as `web`, `academic`, `code`, or `google`. All search tools accept `max_total_results` to keep the final payload compact.
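A sketch of driving the stdio server from the official Python MCP SDK (assumes the `mcp` package is installed and `metasearchmcp-mcp` is on PATH; tool argument names follow the section above, with the authoritative shapes coming from the server's advertised tool schemas):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the MCP server as a stdio subprocess.
    server = StdioServerParameters(command="metasearchmcp-mcp")
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_web",
                {"query": "rust async runtime", "max_total_results": 5},
            )
            print(result)


asyncio.run(main())
```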
Example Claude Desktop config:

```json
{
  "mcpServers": {
    "MetaSearchMCP": {
      "command": "metasearchmcp-mcp",
      "env": {
        "SERPBASE_API_KEY": "your_key",
        "SERPER_API_KEY": "your_key"
      }
    }
  }
}
```
## Development

```bash
pip install -e ".[dev]"
pytest
uvicorn metasearchmcp.server:app --reload
```
## Architecture

The public package is organized around these modules:

- `contracts.py`: request and response models
- `catalog.py`: provider discovery and selection
- `orchestrator.py`: concurrent search execution and response assembly
- `merge.py`: URL normalization and deduplication
- `server.py`: FastAPI entrypoint
- `broker.py`: MCP entrypoint

Legacy module names are kept as compatibility shims for earlier imports.
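To make the merge stage concrete, here is a simplified, illustrative take on URL-keyed deduplication; it is not the project's `merge.py`, just the general technique:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    # Lowercase scheme/host, drop fragments and trailing slashes so that
    # near-identical URLs from different engines collapse to one key.
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/") or "/",
        parts.query,
        "",  # fragment dropped
    ))


def dedupe(results: list[dict]) -> list[dict]:
    seen: set[str] = set()
    merged = []
    for item in results:  # assumed already ordered by rank
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged


hits = [
    {"url": "https://tokio.rs/", "provider": "duckduckgo"},
    {"url": "https://TOKIO.RS", "provider": "bing"},
]
print(dedupe(hits))  # one entry survives
```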
## Roadmap
- Caching and provider-aware query reuse
- Better scoring and ranking signals across providers
- Streaming aggregation responses
- Provider health telemetry
- More first-party API integrations where they improve reliability
## License
MIT