# serpent
An open-source metasearch backend built for MCP / AI agent workflows.
It aggregates results from multiple search engines, returns a unified schema, and exposes both a standard HTTP API and an MCP server that LLM agents can call directly.
## Why this exists
Most search aggregators are designed for human-readable output: HTML pages, result cards, pagination UIs. When an LLM agent needs to search the web, it needs something different: structured JSON, stable field names, concurrent multi-source results, and predictable error handling.
serpent is designed for that use case. It is not a SearXNG clone.
## Positioning
- Agent-friendly metasearch backend
- MCP-first search gateway for LLM workflows
- Structured search API designed for AI pipelines
## Supported providers

### Google

Google is not scraped directly. The reason is practical: Google's anti-bot measures make self-hosted scraping fragile. Maintaining a reliable scraper against Google's continuously evolving detection means constant breakage and high maintenance overhead. For production use, third-party providers are more reliable and cost-effective.

Currently supported Google providers:
| Provider | Env var | Notes |
|---|---|---|
| serpbase.dev | SERPBASE_API_KEY | Pay-per-use; generally cheaper for low volume |
| serper.dev | SERPER_API_KEY | 2,500 free queries, then pay-per-use |
Both are low-cost options. For casual or low-volume use, serpbase.dev tends to be cheaper per query. Either works; configure whichever you prefer, or both for fallback.
### Web search
| Provider | name | Method | Auth |
|---|---|---|---|
| DuckDuckGo | duckduckgo | HTML scraping (lite endpoint) | No |
| Bing | bing | HTML scraping | No |
| Yahoo | yahoo | HTML scraping | No |
| Brave | brave | Official Search API | Optional (free tier: 2000/month) |
| Ecosia | ecosia | HTML scraping | No |
| Mojeek | mojeek | HTML scraping | No |
| Startpage | startpage | HTML scraping (best-effort) | No |
| Qwant | qwant | Internal JSON API (best-effort) | No |
| Yandex | yandex | HTML scraping (best-effort) | No |
| Baidu | baidu | HTML scraping (best-effort) | No |
Providers marked best-effort use undocumented endpoints or scraping targets with strong anti-bot measures. They may stop working without warning.
### Knowledge / reference
| Provider | name | Method | Auth |
|---|---|---|---|
| Wikipedia | wikipedia | MediaWiki Action API | No |
| Wikidata | wikidata | Wikidata API (entity search) | No |
| Internet Archive | internet_archive | Advanced Search API | No |
### Developer
| Provider | name | Method | Auth |
|---|---|---|---|
| GitHub | github | GitHub REST API | No (token raises rate limit) |
| Stack Overflow | stackoverflow | Stack Exchange API | No (key raises limit) |
| Hacker News | hackernews | Algolia HN API | No |
| Reddit | reddit | Public JSON API | No |
| npm | npm | npm registry API | No |
| PyPI | pypi | HTML scraping | No |
| crates.io | crates | crates.io REST API | No |
### Academic
| Provider | name | Method | Auth |
|---|---|---|---|
| arXiv | arxiv | Atom API | No |
| PubMed | pubmed | NCBI E-utilities | No (key raises rate limit) |
| Semantic Scholar | semanticscholar | Graph API | No (key raises rate limit) |
| CrossRef | crossref | REST API (145M+ DOIs) | No |
## Installation

```bash
# Clone the repository
git clone https://github.com/your-org/serpent
cd serpent

# Install with pip (editable)
pip install -e ".[dev]"

# Or with uv
uv pip install -e ".[dev]"
```
## Configuration

Copy `.env.example` to `.env` and fill in your keys:

```bash
cp .env.example .env
```

```bash
# Required for Google search (at least one)
SERPBASE_API_KEY=your_key_here
SERPER_API_KEY=your_key_here

# Optional — omit to use unauthenticated/public access
BRAVE_API_KEY=              # free tier: 2000 req/month
GITHUB_TOKEN=               # raises rate limit from 60 to 5,000 req/hour
STACKEXCHANGE_API_KEY=      # raises limit from 300 to 10,000 req/day
NCBI_API_KEY=               # PubMed; raises from 3 to 10 req/sec
SEMANTIC_SCHOLAR_API_KEY=   # raises from 1 to 10 req/sec

# Server
HOST=0.0.0.0
PORT=8000

# Restrict which providers are active (comma-separated; empty = all available)
ENABLED_PROVIDERS=
ALLOW_UNSTABLE_PROVIDERS=false

# Timeouts in seconds
DEFAULT_TIMEOUT=10
AGGREGATOR_TIMEOUT=15
MAX_RESULTS_PER_PROVIDER=10
```
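For illustration, settings like these are typically read once at startup. The sketch below is hypothetical — the function names and return shapes are chosen here for the example and are not taken from serpent's source:

```python
import os

def load_timeouts() -> dict:
    """Read timeout settings from the environment, using the defaults shown above.
    (Illustrative only; serpent's actual config loader may differ.)"""
    return {
        "default_timeout": float(os.getenv("DEFAULT_TIMEOUT", "10")),
        "aggregator_timeout": float(os.getenv("AGGREGATOR_TIMEOUT", "15")),
        "max_results": int(os.getenv("MAX_RESULTS_PER_PROVIDER", "10")),
    }

def enabled_providers() -> list[str]:
    """Parse ENABLED_PROVIDERS; an empty value means all available providers."""
    raw = os.getenv("ENABLED_PROVIDERS", "")
    return [p.strip() for p in raw.split(",") if p.strip()]
```

Note that an unset or empty `ENABLED_PROVIDERS` yields an empty list, which the server would interpret as "all available".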
## Running

### HTTP API server

```bash
python -m serpent.main
# or
serpent
```

The server starts at http://localhost:8000; interactive docs are at /docs.

### MCP server

```bash
python -m serpent.mcp_server
# or
serpent-mcp
```

The MCP server communicates over stdio. Use it with any MCP-compatible client (Claude Desktop, cline, continue.dev, etc.).
### Docker

Build the image:

```bash
docker build -t serpent .
```

Run the HTTP API:

```bash
docker run --rm -p 8000:8000 --env-file .env serpent
```

Or with Docker Compose:

```bash
docker compose up --build
```

The container starts the HTTP API on http://localhost:8000.
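The compose invocation above assumes a docker-compose.yml in the repository root. A minimal sketch of what such a file could look like (service name and options here are illustrative, not taken from the repository):

```yaml
services:
  serpent:
    build: .            # build from the local Dockerfile
    ports:
      - "8000:8000"     # expose the HTTP API
    env_file:
      - .env            # API keys and server settings
    restart: unless-stopped
```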
## HTTP API

### POST /search

Aggregate search across all enabled providers.

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "rust async runtime"}'
```

With explicit providers and params:

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "rust async runtime",
    "providers": ["duckduckgo", "wikipedia"],
    "params": {"num_results": 5, "language": "en"}
  }'
```
Response:

```json
{
  "engine": "serpent",
  "query": "rust async runtime",
  "results": [
    {
      "title": "Tokio - An asynchronous Rust runtime",
      "url": "https://tokio.rs",
      "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
      "source": "tokio.rs",
      "rank": 1,
      "provider": "duckduckgo",
      "published_date": null,
      "extra": {}
    }
  ],
  "related_searches": ["tokio vs async-std", "rust futures"],
  "suggestions": [],
  "answer_box": null,
  "timing_ms": 843.2,
  "providers": [
    {"name": "duckduckgo", "success": true, "result_count": 10, "latency_ms": 840.1, "error": null},
    {"name": "wikipedia", "success": true, "result_count": 3, "latency_ms": 320.5, "error": null}
  ],
  "errors": []
}
```
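From a client's perspective, the endpoint needs nothing beyond the standard library. The sketch below is illustrative — the `search` and `top_results` helper names are hypothetical, and it assumes a serpent server running locally as shown above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes a locally running serpent server

def search(query: str, providers=None) -> dict:
    """POST a query to /search and return the parsed JSON response."""
    payload = {"query": query}
    if providers:
        payload["providers"] = providers
    req = urllib.request.Request(
        f"{BASE_URL}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def top_results(response: dict, n: int = 5) -> list:
    """Return (title, url) pairs for the top-n results, ordered by merged rank."""
    ranked = sorted(response.get("results", []), key=lambda r: r["rank"])
    return [(r["title"], r["url"]) for r in ranked[:n]]
```

Because every provider's output is normalized into the same schema, `top_results` works identically regardless of which providers answered.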
### POST /search/google

Google search via a configured third-party provider (serpbase.dev or serper.dev).

```bash
curl -X POST http://localhost:8000/search/google \
  -H "Content-Type: application/json" \
  -d '{"query": "site:github.com rust tokio"}'
```
### GET /health

```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
### GET /providers

```bash
curl http://localhost:8000/providers
```

```json
{
  "available": [
    {"name": "google_serpbase", "tags": ["google", "web"]},
    {"name": "duckduckgo", "tags": ["web", "privacy"]},
    {"name": "wikipedia", "tags": ["web", "academic", "knowledge"]},
    {"name": "github", "tags": ["code", "web"]},
    {"name": "arxiv", "tags": ["academic", "web"]}
  ],
  "count": 5
}
```
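The tags make client-side routing straightforward. A small illustrative helper (hypothetical, not part of serpent) that filters the payload above by tag:

```python
def providers_with_tag(providers_response: dict, tag: str) -> list:
    """Return names of providers whose tag list contains `tag`."""
    return [
        p["name"]
        for p in providers_response.get("available", [])
        if tag in p.get("tags", [])
    ]
```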
## MCP usage

Configure your MCP client to run serpent-mcp (or python -m serpent.mcp_server).

Example Claude Desktop config (claude_desktop_config.json; its location varies by OS):

```json
{
  "mcpServers": {
    "serpent": {
      "command": "serpent-mcp",
      "env": {
        "SERPBASE_API_KEY": "your_key",
        "SERPER_API_KEY": "your_key"
      }
    }
  }
}
```
## Available MCP tools

### search_web

General web search across all enabled providers.

```json
{
  "query": "fastapi vs flask performance 2024",
  "num_results": 10
}
```

### search_google

Google search via a configured third-party provider.

```json
{
  "query": "site:docs.python.org asyncio",
  "provider": "google_serpbase"
}
```

### search_academic

Search arXiv and Wikipedia.

```json
{
  "query": "transformer architecture attention mechanism",
  "num_results": 8
}
```

### search_github

Search GitHub repositories.

```json
{
  "query": "python mcp server implementation",
  "num_results": 5
}
```

### compare_engines

Run the same query across multiple providers and return results grouped by engine.

```json
{
  "query": "vector database comparison",
  "providers": ["duckduckgo", "brave"],
  "num_results": 5
}
```
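Client-side, the grouping that compare_engines performs can be reproduced from any unified result list, since every result carries a provider field. A hypothetical sketch:

```python
from collections import defaultdict

def group_by_provider(results: list) -> dict:
    """Group unified search results by the provider that returned them,
    preserving each provider's original result order."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["provider"]].append(r)
    return dict(grouped)
```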
## Result schema reference

Every result object has these fields:

| Field | Type | Description |
|---|---|---|
| title | string | Result title |
| url | string | Result URL |
| snippet | string | Text excerpt / description |
| source | string | Domain name |
| rank | int | 1-based position in the final merged list |
| provider | string | Provider that returned this result |
| published_date | string \| null | ISO date (YYYY-MM-DD), if available |
| extra | object | Provider-specific data (e.g. GitHub stars, arXiv authors) |
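The table maps naturally onto a typed structure. A hedged Python sketch (the class name is chosen here for illustration and is not taken from serpent's source):

```python
from typing import Any, Optional, TypedDict

class SearchResult(TypedDict):
    """One entry in the results array, mirroring the schema table above."""
    title: str
    url: str
    snippet: str
    source: str                     # domain name
    rank: int                       # 1-based position in the merged list
    provider: str                   # provider that returned this result
    published_date: Optional[str]   # ISO date (YYYY-MM-DD) or None
    extra: dict                     # provider-specific data
```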
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with auto-reload
uvicorn serpent.main:app --reload
```
## Roadmap
- Caching layer (in-memory / Redis) for repeated queries
- Relevance re-ranking across providers
- More providers: Bing (official API), Kagi, Tavily
- Rate limiting per provider with backoff
- Streaming responses (SSE) for long aggregations
- Provider health monitoring endpoint
- Result scoring and confidence signals
## License
MIT