# serpent
An open-source metasearch backend built for MCP / AI agent workflows.
It aggregates results from multiple search engines, returns a unified schema, and exposes both a standard HTTP API and an MCP server that LLM agents can call directly.
## Why this exists
Most search aggregators are designed for human-readable output: HTML pages, result cards, pagination UIs. When an LLM agent needs to search the web, it needs something different: structured JSON, stable field names, concurrent multi-source results, and predictable error handling.
serpent is designed for that use case. It is not a SearXNG clone.
## Positioning
- Agent-friendly metasearch backend
- MCP-first search gateway for LLM workflows
- Structured search API designed for AI pipelines
## Supported providers

### Google

Google is not scraped directly. The reason is practical: Google's anti-bot measures make self-hosted scraping fragile. Maintaining a reliable scraper against Google's continuously evolving detection means constant breakage and high maintenance overhead. For production use, third-party providers are more reliable and cost-effective.

Currently supported Google providers:
| Provider | Env var | Notes |
|---|---|---|
| serpbase.dev | SERPBASE_API_KEY | Pay-per-use; generally cheaper for low volume |
| serper.dev | SERPER_API_KEY | 2,500 free queries, then pay-per-use |
Both are low-cost options. For casual or low-volume use, serpbase.dev tends to be cheaper per query. Either works; configure whichever you prefer, or both for fallback.
### Web search
| Provider | name | Method | Auth |
|---|---|---|---|
| DuckDuckGo | duckduckgo | HTML scraping (lite endpoint) | No |
| Bing | bing | HTML scraping | No |
| Yahoo | yahoo | HTML scraping | No |
| Brave | brave | Official Search API | Optional (free tier: 2000/month) |
| Ecosia | ecosia | HTML scraping | No |
| Mojeek | mojeek | HTML scraping | No |
| Startpage | startpage | HTML scraping (best-effort) | No |
| Qwant | qwant | Internal JSON API (best-effort) | No |
| Yandex | yandex | HTML scraping (best-effort) | No |
| Baidu | baidu | HTML scraping (best-effort) | No |
Providers marked best-effort use undocumented endpoints or scraping targets with strong anti-bot measures. They may stop working without warning.
### Knowledge / reference
| Provider | name | Method | Auth |
|---|---|---|---|
| Wikipedia | wikipedia | MediaWiki Action API | No |
| Wikidata | wikidata | Wikidata API (entity search) | No |
| Internet Archive | internet_archive | Advanced Search API | No |
### Developer
| Provider | name | Method | Auth |
|---|---|---|---|
| GitHub | github | GitHub REST API | No (token raises rate limit) |
| Stack Overflow | stackoverflow | Stack Exchange API | No (key raises limit) |
| Hacker News | hackernews | Algolia HN API | No |
| Reddit | reddit | Public JSON API | No |
| npm | npm | npm registry API | No |
| PyPI | pypi | HTML scraping | No |
| crates.io | crates | crates.io REST API | No |
### Academic
| Provider | name | Method | Auth |
|---|---|---|---|
| arXiv | arxiv | Atom API | No |
| PubMed | pubmed | NCBI E-utilities | No (key raises rate limit) |
| Semantic Scholar | semanticscholar | Graph API | No (key raises rate limit) |
| CrossRef | crossref | REST API (145M+ DOIs) | No |
## Installation

```bash
# Clone the repository
git clone https://github.com/your-org/serpent
cd serpent

# Install with pip (editable)
pip install -e ".[dev]"

# Or with uv
uv pip install -e ".[dev]"
```
## Configuration

Copy `.env.example` to `.env` and fill in your keys:

```bash
cp .env.example .env
```

```bash
# Required for Google search (at least one)
SERPBASE_API_KEY=your_key_here
SERPER_API_KEY=your_key_here

# Optional — omit to use unauthenticated/public access
BRAVE_API_KEY=              # free tier: 2000 req/month
GITHUB_TOKEN=               # raises rate limit from 60 to 5,000 req/hour
STACKEXCHANGE_API_KEY=      # raises limit from 300 to 10,000 req/day
NCBI_API_KEY=               # PubMed; raises from 3 to 10 req/sec
SEMANTIC_SCHOLAR_API_KEY=   # raises from 1 to 10 req/sec

# Server
HOST=0.0.0.0
PORT=8000

# Restrict which providers are active (comma-separated; empty = all available)
ENABLED_PROVIDERS=
ALLOW_UNSTABLE_PROVIDERS=false

# Timeouts in seconds
DEFAULT_TIMEOUT=10
AGGREGATOR_TIMEOUT=15
MAX_RESULTS_PER_PROVIDER=10
```
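For illustration, settings like these are typically read once at startup. The sketch below is hypothetical — the function names and return shapes are chosen here for the example and are not taken from serpent's source:

```python
import os

def load_timeouts() -> dict:
    """Read timeout settings from the environment, using the defaults shown above.
    (Illustrative only; serpent's actual config loader may differ.)"""
    return {
        "default_timeout": float(os.getenv("DEFAULT_TIMEOUT", "10")),
        "aggregator_timeout": float(os.getenv("AGGREGATOR_TIMEOUT", "15")),
        "max_results": int(os.getenv("MAX_RESULTS_PER_PROVIDER", "10")),
    }

def enabled_providers() -> list[str]:
    """Parse ENABLED_PROVIDERS; an empty value means all available providers."""
    raw = os.getenv("ENABLED_PROVIDERS", "")
    return [p.strip() for p in raw.split(",") if p.strip()]
```

Note that an unset or empty `ENABLED_PROVIDERS` yields an empty list, which the server would interpret as "all available".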
## Running

### HTTP API server

```bash
python -m serpent.main
# or
serpent
```

The server starts at http://localhost:8000; interactive docs are at /docs.

### MCP server

```bash
python -m serpent.mcp_server
# or
serpent-mcp
```

The MCP server communicates over stdio. Use it with any MCP-compatible client (Claude Desktop, cline, continue.dev, etc.).
### Docker

Build the image:

```bash
docker build -t serpent .
```

Run the HTTP API:

```bash
docker run --rm -p 8000:8000 --env-file .env serpent
```

Or with Docker Compose:

```bash
docker compose up --build
```

The container starts the HTTP API on http://localhost:8000.
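The compose invocation above assumes a docker-compose.yml in the repository root. A minimal sketch of what such a file could look like (service name and options here are illustrative, not taken from the repository):

```yaml
services:
  serpent:
    build: .            # build from the local Dockerfile
    ports:
      - "8000:8000"     # expose the HTTP API
    env_file:
      - .env            # API keys and server settings
    restart: unless-stopped
```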
## HTTP API

### POST /search

Aggregate search across all enabled providers.

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "rust async runtime"}'
```

With explicit providers and params:

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "rust async runtime",
    "providers": ["duckduckgo", "wikipedia"],
    "params": {"num_results": 5, "language": "en"}
  }'
```
Response:

```json
{
  "engine": "serpent",
  "query": "rust async runtime",
  "results": [
    {
      "title": "Tokio - An asynchronous Rust runtime",
      "url": "https://tokio.rs",
      "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
      "source": "tokio.rs",
      "rank": 1,
      "provider": "duckduckgo",
      "published_date": null,
      "extra": {}
    }
  ],
  "related_searches": ["tokio vs async-std", "rust futures"],
  "suggestions": [],
  "answer_box": null,
  "timing_ms": 843.2,
  "providers": [
    {"name": "duckduckgo", "success": true, "result_count": 10, "latency_ms": 840.1, "error": null},
    {"name": "wikipedia", "success": true, "result_count": 3, "latency_ms": 320.5, "error": null}
  ],
  "errors": []
}
```
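From a client's perspective, the endpoint needs nothing beyond the standard library. The sketch below is illustrative — the `search` and `top_results` helper names are hypothetical, and it assumes a serpent server running locally as shown above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes a locally running serpent server

def search(query: str, providers=None) -> dict:
    """POST a query to /search and return the parsed JSON response."""
    payload = {"query": query}
    if providers:
        payload["providers"] = providers
    req = urllib.request.Request(
        f"{BASE_URL}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def top_results(response: dict, n: int = 5) -> list:
    """Return (title, url) pairs for the top-n results, ordered by merged rank."""
    ranked = sorted(response.get("results", []), key=lambda r: r["rank"])
    return [(r["title"], r["url"]) for r in ranked[:n]]
```

Because every provider's output is normalized into the same schema, `top_results` works identically regardless of which providers answered.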
### POST /search/google

Google search via a configured third-party provider (serpbase.dev or serper.dev).

```bash
curl -X POST http://localhost:8000/search/google \
  -H "Content-Type: application/json" \
  -d '{"query": "site:github.com rust tokio"}'
```
### GET /health

```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
### GET /providers

```bash
curl http://localhost:8000/providers
```

```json
{
  "available": [
    {"name": "google_serpbase", "tags": ["google", "web"]},
    {"name": "duckduckgo", "tags": ["web", "privacy"]},
    {"name": "wikipedia", "tags": ["web", "academic", "knowledge"]},
    {"name": "github", "tags": ["code", "web"]},
    {"name": "arxiv", "tags": ["academic", "web"]}
  ],
  "count": 5
}
```
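The tags make client-side routing straightforward. A small illustrative helper (hypothetical, not part of serpent) that filters the payload above by tag:

```python
def providers_with_tag(providers_response: dict, tag: str) -> list:
    """Return names of providers whose tag list contains `tag`."""
    return [
        p["name"]
        for p in providers_response.get("available", [])
        if tag in p.get("tags", [])
    ]
```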
## MCP usage

Configure your MCP client to run serpent-mcp (or python -m serpent.mcp_server).

Example Claude Desktop config (claude_desktop_config.json; its location varies by OS):

```json
{
  "mcpServers": {
    "serpent": {
      "command": "serpent-mcp",
      "env": {
        "SERPBASE_API_KEY": "your_key",
        "SERPER_API_KEY": "your_key"
      }
    }
  }
}
```
## Available MCP tools

### search_web

General web search across all enabled providers.

```json
{
  "query": "fastapi vs flask performance 2024",
  "num_results": 10
}
```

### search_google

Google search via a configured third-party provider.

```json
{
  "query": "site:docs.python.org asyncio",
  "provider": "google_serpbase"
}
```

### search_academic

Search arXiv and Wikipedia.

```json
{
  "query": "transformer architecture attention mechanism",
  "num_results": 8
}
```

### search_github

Search GitHub repositories.

```json
{
  "query": "python mcp server implementation",
  "num_results": 5
}
```

### compare_engines

Run the same query across multiple providers and return results grouped by engine.

```json
{
  "query": "vector database comparison",
  "providers": ["duckduckgo", "brave"],
  "num_results": 5
}
```
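Client-side, the grouping that compare_engines performs can be reproduced from any unified result list, since every result carries a provider field. A hypothetical sketch:

```python
from collections import defaultdict

def group_by_provider(results: list) -> dict:
    """Group unified search results by the provider that returned them,
    preserving each provider's original result order."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["provider"]].append(r)
    return dict(grouped)
```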
## Result schema reference

Every result object has these fields:

| Field | Type | Description |
|---|---|---|
| title | string | Result title |
| url | string | Result URL |
| snippet | string | Text excerpt / description |
| source | string | Domain name |
| rank | int | 1-based position in the final merged list |
| provider | string | Provider that returned this result |
| published_date | string \| null | ISO date (YYYY-MM-DD), if available |
| extra | object | Provider-specific data (e.g. GitHub stars, arXiv authors) |
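The table maps naturally onto a typed structure. A hedged Python sketch (the class name is chosen here for illustration and is not taken from serpent's source):

```python
from typing import Any, Optional, TypedDict

class SearchResult(TypedDict):
    """One entry in the results array, mirroring the schema table above."""
    title: str
    url: str
    snippet: str
    source: str                     # domain name
    rank: int                       # 1-based position in the merged list
    provider: str                   # provider that returned this result
    published_date: Optional[str]   # ISO date (YYYY-MM-DD) or None
    extra: dict                     # provider-specific data
```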
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with auto-reload
uvicorn serpent.main:app --reload
```
## Roadmap
- Caching layer (in-memory / Redis) for repeated queries
- Relevance re-ranking across providers
- More providers: Bing (official API), Kagi, Tavily
- Rate limiting per provider with backoff
- Streaming responses (SSE) for long aggregations
- Provider health monitoring endpoint
- Result scoring and confidence signals
## License
MIT