Founder Intelligence Engine — MCP Server
A production-grade Model Context Protocol (MCP) server that transforms founder profiles into actionable strategic intelligence.
Architecture
┌───────────────────────────────────────────────────────────┐
│                 MCP Client (Claude, etc.)                 │
│                             ▲ stdio                       │
│                  ┌──────────┴──────────┐                  │
│                  │  MCP Server (Node)  │                  │
│                  │  3 registered tools │                  │
│                  └──────────┬──────────┘                  │
│             ┌───────────────┼────────────────┐            │
│             ▼               ▼                ▼            │
│        ┌──────────┐   ┌───────────┐   ┌──────────────┐    │
│        │  Apify   │   │   Groq    │   │  Embeddings  │    │
│        │ Scraping │   │    LLM    │   │     API      │    │
│        └────┬─────┘   └─────┬─────┘   └──────┬───────┘    │
│             └───────────────┼────────────────┘            │
│                             ▼                             │
│                    ┌─────────────────┐                    │
│                    │    Supabase     │                    │
│                    │   (Postgres +   │                    │
│                    │    pgvector)    │                    │
│                    └─────────────────┘                    │
└───────────────────────────────────────────────────────────┘
Data Flow
- collect_profile — Scrapes LinkedIn + Twitter via Apify → merges data → generates embedding → stores in Supabase
- analyze_profile — Fetches stored profile → calls Groq LLM for strategic analysis → caches result
- fetch_personalized_news — Checks cache freshness → if stale: generates search queries → scrapes Google News → embeds articles → ranks by cosine similarity → summarizes with Groq → stores; if fresh: returns cached articles
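The ranking step in fetch_personalized_news can be sketched as follows. This is a minimal illustration of cosine-similarity ranking, not the actual contents of src/utils/similarity.js; the function names and the article shape ({ url, embedding }) are assumptions made for the example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as having no similarity.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: scores each article against the profile embedding
// and returns the topK highest-scoring articles, best first.
function rankArticles(profileEmbedding, articles, topK = 10) {
  return articles
    .map((article) => ({
      ...article,
      score: cosineSimilarity(profileEmbedding, article.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Only the top-ranked articles would then need to be passed to Groq for summarization, which keeps token usage bounded regardless of how many articles the scrape returns.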
Caching & Cost Optimization
| Operation | Cost | When It Runs |
|---|---|---|
| LinkedIn/Twitter scraping | High | Only on profile creation |
| Groq profile analysis | Medium | Once per profile (cached) |
| Google News + embeddings | High | Only when news > 24h stale |
| Read cached articles | Free | Every subsequent request |
The fetch_history table tracks last_profile_scrape and last_news_fetch timestamps. The staleCheck.js module compares these against configurable thresholds.
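A staleness check of this kind might look like the sketch below. It is an illustration only, not the actual staleCheck.js; the function name isStale and its parameters are assumptions, and the 24-hour threshold in the usage note comes from the table above.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Returns true when the timestamp is missing, unparseable,
// or older than maxAgeHours relative to `now`.
function isStale(lastFetchedAt, maxAgeHours, now = new Date()) {
  if (!lastFetchedAt) return true; // never fetched
  const ageMs = now.getTime() - new Date(lastFetchedAt).getTime();
  if (Number.isNaN(ageMs)) return true; // invalid timestamp: refetch to be safe
  return ageMs > maxAgeHours * HOUR_MS;
}
```

With a fetch_history row in hand, a call such as isStale(row.last_news_fetch, 24) would decide whether to run the news pipeline or return the cached articles.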
Setup
1. Prerequisites
- Node.js 20+
- Supabase project (with pgvector enabled)
- API keys: Apify, Groq, OpenAI-compatible Embeddings
2. Install
cd /Users/praveenkumar/Desktop/mcp
cp .env.example .env
# Edit .env with your real keys
npm install
3. Database
Run the migration against your Supabase SQL Editor:
-- Paste contents of migrations/001_init.sql
Or via psql:
psql "$DATABASE_URL" < migrations/001_init.sql
4. Run MCP Server
node src/index.js
5. Configure MCP Client
Add to your MCP client config (e.g., Claude Desktop claude_desktop_config.json):
{
  "mcpServers": {
    "founder-intelligence": {
      "command": "node",
      "args": ["/Users/praveenkumar/Desktop/mcp/src/index.js"],
      "env": {
        "SUPABASE_URL": "...",
        "SUPABASE_SERVICE_KEY": "...",
        "APIFY_API_TOKEN": "...",
        "GROQ_API_KEY": "...",
        "EMBEDDING_API_URL": "...",
        "EMBEDDING_API_KEY": "..."
      }
    }
  }
}
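Because a missing variable in this config tends to surface only when a tool is first called, a startup check can fail fast instead. The sketch below is a hypothetical helper, not part of the project; it assumes the six variables listed in the config above are all required.

```javascript
// Variable names taken from the client config above.
const REQUIRED_ENV = [
  'SUPABASE_URL',
  'SUPABASE_SERVICE_KEY',
  'APIFY_API_TOKEN',
  'GROQ_API_KEY',
  'EMBEDDING_API_URL',
  'EMBEDDING_API_KEY',
];

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === '');
}
```

If missingEnvVars() returns a non-empty list at startup, the server could report the names on stderr and exit. Writing to stderr matters here because the stdio transport uses stdout for protocol messages.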
6. Background Worker (Optional)
# Single run (for cron)
node src/backgroundWorker.js
# Daemon mode
BACKGROUND_LOOP=true node src/backgroundWorker.js
Cron example (every 6 hours):
0 */6 * * * cd /app && node src/backgroundWorker.js >> /var/log/worker.log 2>&1
Project Structure
/Users/praveenkumar/Desktop/mcp/
├── migrations/
│ └── 001_init.sql
├── src/
│ ├── db/
│ │ └── supabaseClient.js
│ ├── services/
│ │ ├── apifyService.js
│ │ ├── embeddingService.js
│ │ └── llmService.js
│ ├── tools/
│ │ ├── collectProfile.js
│ │ ├── analyzeProfile.js
│ │ └── fetchPersonalizedNews.js
│ ├── utils/
│ │ ├── similarity.js
│ │ └── staleCheck.js
│ ├── backgroundWorker.js
│ └── index.js
├── .env.example
├── .gitignore
├── .dockerignore
├── Dockerfile
├── package.json
└── README.md
Docker Deployment
Build & Run
docker build -t founder-intelligence-mcp .
docker run --env-file .env founder-intelligence-mcp
Background Worker Container
docker run --env-file .env founder-intelligence-mcp node src/backgroundWorker.js
Docker Compose (production)
version: '3.8'
services:
  mcp-server:
    build: .
    env_file: .env
    stdin_open: true
    restart: unless-stopped
  worker:
    build: .
    env_file: .env
    command: ["node", "src/backgroundWorker.js"]
    environment:
      - BACKGROUND_LOOP=true
    restart: unless-stopped
Scaling Strategy
| Component | Strategy |
|---|---|
| MCP Server | One instance per client (stdio-based) |
| Background Worker | Single instance or Cloud Run Job on schedule |
| Supabase | Connection pooling via Supavisor; read replicas for scale |
| Apify | Concurrent actor runs (up to account limit) |
| Embeddings | Batch requests (20 per call) to reduce round trips |
| Groq | Rate-limit aware with retry-after header handling |
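The 20-per-call batching in the Embeddings row can be sketched as below. This is an illustration under assumptions, not the code in embeddingService.js: embedBatch is a hypothetical async function that takes an array of texts and resolves to one vector per text, in order.

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size = 20) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embeds all texts using one API call per batch instead of one per text.
// Batches run sequentially, which keeps the request rate predictable.
async function embedAll(texts, embedBatch, batchSize = 20) {
  const vectors = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

For 45 articles this makes 3 calls (20, 20, and 5 texts) rather than 45.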
For high-profile-count deployments:
- Move background worker to a Cloud Run Job triggered by Cloud Scheduler
- Use Supabase Edge Functions for scheduled refresh
- Add a Redis cache layer for hot profile lookups
Security Best Practices
- Service-role key only on server side — never expose to clients
- All secrets via environment variables — no hardcoded keys
- Non-root Docker user — mcpuser in container
- Input validation — Zod schemas on all tool inputs
- Row Level Security — enable RLS on Supabase tables for multi-tenant
- API token rotation — rotate Apify, Groq, and embedding keys periodically
- Rate limiting — built-in retry logic with exponential backoff
- No PII logging — profile data stays in Supabase, not console
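The retry behavior mentioned in the list above (exponential backoff, with the Retry-After header taking precedence when the API sends one) could look roughly like this. It is a sketch under assumptions, not the project's actual implementation: the function names, the defaults (500 ms base, 30 s cap, 4 attempts), and the fetch-style response object are all illustrative. Only the seconds form of Retry-After is handled, not the HTTP-date form.

```javascript
// Delay in ms before the next attempt. Prefers the server's Retry-After
// value (in seconds) when present; otherwise doubles the base delay per attempt.
function retryDelayMs(attempt, retryAfterHeader, baseMs = 500, maxMs = 30000) {
  const seconds = Number(retryAfterHeader);
  const hasHeader = retryAfterHeader != null && retryAfterHeader !== '';
  if (hasHeader && Number.isFinite(seconds) && seconds >= 0) {
    return Math.min(seconds * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Calls requestFn until it returns a non-429 response or attempts run out.
// requestFn is assumed to resolve to a fetch-style Response.
async function withRetry(requestFn, options = {}) {
  const {
    maxAttempts = 4,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 0; ; attempt++) {
    const response = await requestFn();
    if (response.status !== 429 || attempt === maxAttempts - 1) return response;
    await sleep(retryDelayMs(attempt, response.headers.get('retry-after')));
  }
}
```

The sleep function is injectable so the logic can be exercised in tests without real delays.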
Cost Optimization
| Service | Cost Driver | Mitigation |
|---|---|---|
| Apify | Actor compute units | Scrape only on creation; cache results |
| Groq | Token usage | Analyze once (cached); batch news summaries |
| Embeddings | API calls | Batch 20 at a time; embed once per article |
| Supabase | Row count + storage | Deduplicate articles by URL; prune old articles |
Expected cost per profile lifecycle:
- Initial setup: ~$0.05–0.15 (scrape + embed + analyze)
- Daily news refresh: ~$0.02–0.08 (scrape + embed + summarize top 10)
- Cached reads: $0.00
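To turn the per-operation estimates above into a rough per-profile figure, the arithmetic can be sketched as follows. The calculation assumes one news refresh per day for the whole period, which is an upper bound since refreshes only run when news is requested and the cache is stale. The names are illustrative; the dollar figures are the estimates listed above, held in cents to avoid floating-point drift.

```javascript
// Estimates from the list above, expressed in cents.
const INITIAL_CENTS = { low: 5, high: 15 };
const DAILY_REFRESH_CENTS = { low: 2, high: 8 };

// Cost range for one profile over `days` days, assuming one refresh per day.
function lifecycleCostCents(days) {
  return {
    low: INITIAL_CENTS.low + days * DAILY_REFRESH_CENTS.low,
    high: INITIAL_CENTS.high + days * DAILY_REFRESH_CENTS.high,
  };
}

const formatUsd = (cents) => `$${(cents / 100).toFixed(2)}`;

const month = lifecycleCostCents(30);
console.log(`${formatUsd(month.low)} to ${formatUsd(month.high)}`); // $0.65 to $2.55
```

Under these assumptions, a profile refreshed daily for 30 days lands between roughly $0.65 and $2.55.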
Future Improvement Roadmap
- HTTP/SSE transport — support remote MCP clients over HTTP
- Multi-tenant profiles — user-scoped access with RLS
- Real-time alerts — push notifications when high-relevance news drops
- Competitor tracking — dedicated tool to monitor named competitors
- Founder network graph — map connections between analyzed founders
- Custom embedding models — fine-tuned models for startup/VC domain
- Article full-text extraction — deep content scraping for richer embeddings
- A/B prompt testing — experiment with different Groq prompts for analysis quality
- Dashboard UI — web interface for browsing intelligence feeds
- Webhook integrations — push intelligence to Slack, email, or CRM