Founder Intelligence Engine — MCP Server

A production-grade Model Context Protocol (MCP) server that transforms founder profiles into actionable strategic intelligence.

Architecture

┌───────────────────────────────────────────────────────────┐
│                     MCP Client (Claude, etc.)             │
│                          ▲ stdio                          │
│               ┌──────────┴──────────┐                     │
│               │   MCP Server (Node) │                     │
│               │   3 registered tools│                     │
│               └──────┬──────────────┘                     │
│          ┌───────────┬┼──────────────┐                    │
│          ▼           ▼▼              ▼                    │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐           │
│  │  Apify   │  │   Groq    │  │  Embeddings  │           │
│  │  Scraping│  │   LLM     │  │  API         │           │
│  └────┬─────┘  └─────┬─────┘  └──────┬───────┘           │
│       └──────────────┬┘──────────────┘                    │
│                      ▼                                    │
│            ┌─────────────────┐                            │
│            │  Supabase       │                            │
│            │  (Postgres +    │                            │
│            │   pgvector)     │                            │
│            └─────────────────┘                            │
└───────────────────────────────────────────────────────────┘

Data Flow

collect_profile — Scrapes LinkedIn + Twitter via Apify → merges data → generates embedding → stores in Supabase
analyze_profile — Fetches stored profile → calls Groq LLM for strategic analysis → caches result
fetch_personalized_news — Checks cache freshness → if stale: generates search queries → scrapes Google News → embeds articles → ranks by cosine similarity → summarizes with Groq → stores; if fresh: returns cached articles
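
The "ranks by cosine similarity" step above can be sketched as follows. This is a minimal illustration, not the project's actual similarity.js: the function names, the `embedding` field on each article, and the default of 10 results are assumptions.

```javascript
// Hypothetical helpers; the real src/utils/similarity.js may differ.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score each article against the profile embedding and keep the best topK.
function rankArticles(profileEmbedding, articles, topK = 10) {
  return articles
    .map((article) => ({
      ...article,
      score: cosineSimilarity(profileEmbedding, article.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because pgvector is listed as part of the stack, the same ranking could also be done inside Postgres; the in-process version is shown only because it is easy to follow.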

Caching & Cost Optimization

Operation	Cost	When It Runs
LinkedIn/Twitter scraping	High	Only on profile creation
Groq profile analysis	Medium	Once per profile (cached)
Google News + embeddings	High	Only when cached news is more than 24h old
Read cached articles	Free	Every subsequent request

The fetch_history table tracks last_profile_scrape and last_news_fetch timestamps. The staleCheck.js module compares these against configurable thresholds.
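
A freshness check of this kind might look like the sketch below. The function name, its signature, and the choice to treat a missing or unparseable timestamp as stale are assumptions, not the actual contents of staleCheck.js.

```javascript
// Hypothetical sketch; the real src/utils/staleCheck.js may differ.
const HOUR_MS = 60 * 60 * 1000;

// Returns true when `lastFetchedAt` is older than `thresholdHours`.
// `now` is injectable so the check is easy to test.
function isStale(lastFetchedAt, thresholdHours, now = new Date()) {
  if (!lastFetchedAt) return true; // never fetched counts as stale
  const ageMs = now.getTime() - new Date(lastFetchedAt).getTime();
  if (Number.isNaN(ageMs)) return true; // unparseable timestamp: refetch to be safe
  return ageMs > thresholdHours * HOUR_MS;
}
```

For example, `isStale(row.last_news_fetch, 24)` would decide whether fetch_personalized_news refreshes or serves cached articles, where `row` stands for a hypothetical fetch_history record.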

Setup

1. Prerequisites

Node.js 20+
Supabase project (with pgvector enabled)
API keys for Apify, Groq, and an OpenAI-compatible embeddings API

2. Install

# Replace with the path to your local checkout
cd /Users/praveenkumar/Desktop/mcp
cp .env.example .env
# Edit .env with your real keys
npm install

3. Database

Run the migration in the Supabase SQL Editor:

-- Paste contents of migrations/001_init.sql

Or via psql:

psql "$DATABASE_URL" -f migrations/001_init.sql

4. Run MCP Server

node src/index.js

5. Configure MCP Client

Add the following to your MCP client config (e.g., Claude Desktop's claude_desktop_config.json), replacing the path and the placeholder values with your own:

{
  "mcpServers": {
    "founder-intelligence": {
      "command": "node",
      "args": ["/Users/praveenkumar/Desktop/mcp/src/index.js"],
      "env": {
        "SUPABASE_URL": "...",
        "SUPABASE_SERVICE_KEY": "...",
        "APIFY_API_TOKEN": "...",
        "GROQ_API_KEY": "...",
        "EMBEDDING_API_URL": "...",
        "EMBEDDING_API_KEY": "..."
      }
    }
  }
}

6. Background Worker (Optional)

# Single run (for cron)
node src/backgroundWorker.js

# Daemon mode
BACKGROUND_LOOP=true node src/backgroundWorker.js

Cron example (every 6 hours):

0 */6 * * * cd /app && node src/backgroundWorker.js >> /var/log/worker.log 2>&1

Project Structure

/Users/praveenkumar/Desktop/mcp/
├── migrations/
│   └── 001_init.sql
├── src/
│   ├── db/
│   │   └── supabaseClient.js
│   ├── services/
│   │   ├── apifyService.js
│   │   ├── embeddingService.js
│   │   └── llmService.js
│   ├── tools/
│   │   ├── collectProfile.js
│   │   ├── analyzeProfile.js
│   │   └── fetchPersonalizedNews.js
│   ├── utils/
│   │   ├── similarity.js
│   │   └── staleCheck.js
│   ├── backgroundWorker.js
│   └── index.js
├── .env.example
├── .gitignore
├── .dockerignore
├── Dockerfile
├── package.json
└── README.md

Docker Deployment

Build & Run

docker build -t founder-intelligence-mcp .
# -i keeps stdin open, which the stdio transport needs
docker run -i --env-file .env founder-intelligence-mcp

Background Worker Container

docker run --env-file .env founder-intelligence-mcp node src/backgroundWorker.js

Docker Compose (production)

version: '3.8'
services:
  mcp-server:
    build: .
    env_file: .env
    stdin_open: true
    restart: unless-stopped

  worker:
    build: .
    env_file: .env
    command: ["node", "src/backgroundWorker.js"]
    environment:
      - BACKGROUND_LOOP=true
    restart: unless-stopped

Scaling Strategy

Component	Strategy
MCP Server	One instance per client (stdio-based)
Background Worker	Single instance or Cloud Run Job on schedule
Supabase	Connection pooling via Supavisor; read replicas for scale
Apify	Concurrent actor runs (up to account limit)
Embeddings	Batch requests (20 per call) to reduce round trips
Groq	Rate-limit aware with retry-after header handling
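
The embeddings row above (20 inputs per call) can be illustrated with a small batching helper. This is a sketch under assumptions: `embedBatch` is a hypothetical callback that sends one batch of texts to the embeddings API and resolves to one vector per text, in order.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size = 20) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts with one API call per batch, preserving input order.
// `embedBatch` is a hypothetical function: (texts) => Promise<number[][]>.
async function embedInBatches(texts, embedBatch, batchSize = 20) {
  const vectors = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

Batches are sent sequentially here to stay well within rate limits; sending them concurrently is a possible variation if the provider allows it.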

For deployments with a large number of profiles:

Move background worker to a Cloud Run Job triggered by Cloud Scheduler
Use Supabase Edge Functions for scheduled refresh
Add a Redis cache layer for hot profile lookups

Security Best Practices

Service-role key only on server side — never expose to clients
All secrets via environment variables — no hardcoded keys
Non-root Docker user — mcp user in container
Input validation — Zod schemas on all tool inputs
Row Level Security — enable RLS on Supabase tables for multi-tenant
API token rotation — rotate Apify, Groq, and embedding keys periodically
Rate limiting — built-in retry logic with exponential backoff
No PII logging — profile data stays in Supabase, not console
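
The retry behavior mentioned in the list above (exponential backoff, plus honoring a Retry-After header) could be sketched like this. The error shape (`err.status`, `err.retryAfter`), the defaults, and the function names are assumptions for illustration; only the seconds form of Retry-After is handled, not the HTTP-date form.

```javascript
// Delay before retry number `attempt` (0-based), in milliseconds.
// A numeric Retry-After value (in seconds) takes priority over backoff.
function backoffDelayMs(attempt, retryAfterHeader, baseMs = 500, maxMs = 30000) {
  const hasHeader = retryAfterHeader != null && retryAfterHeader !== '';
  const retryAfterSec = Number(retryAfterHeader);
  if (hasHeader && Number.isFinite(retryAfterSec) && retryAfterSec >= 0) {
    return Math.min(retryAfterSec * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Run `fn`, retrying on HTTP 429 and 5xx errors up to `maxAttempts` times.
// Assumes thrown errors carry `status` and, optionally, `retryAfter`.
async function withRetry(fn, options = {}) {
  const {
    maxAttempts = 4,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err.status === 429 || err.status >= 500;
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, err.retryAfter));
    }
  }
}
```

Adding random jitter to the delay is a common refinement when many workers retry at once.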

Cost Optimization

Service	Cost Driver	Mitigation
Apify	Actor compute units	Scrape only on creation; cache results
Groq	Token usage	Analyze once (cached); batch news summaries
Embeddings	API calls	Batch 20 at a time; embed once per article
Supabase	Row count + storage	Deduplicate articles by URL; prune old articles
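
URL-based deduplication, as listed in the Supabase row above, might be done in application code before insert. The sketch below is an assumption about how that could work: the normalization rules (dropping the fragment and `utm_*` tracking parameters) and the function names are illustrative, and a unique constraint on the URL column would be an alternative or complement.

```javascript
// Normalize a URL so trivially different links compare equal.
// Assumed rules: drop the fragment and any utm_* tracking parameters.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = '';
  for (const param of [...url.searchParams.keys()]) {
    if (param.startsWith('utm_')) url.searchParams.delete(param);
  }
  return url.toString();
}

// Keep the first article seen for each normalized URL.
function dedupeByUrl(articles) {
  const seen = new Set();
  const unique = [];
  for (const article of articles) {
    const key = normalizeUrl(article.url);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(article);
  }
  return unique;
}
```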

Expected cost per profile lifecycle:

Initial setup: ~$0.05–0.15 (scrape + embed + analyze)
Daily news refresh: ~$0.02–0.08 (scrape + embed + summarize top 10)
Cached reads: $0.00
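
To turn the per-operation figures above into a rough budget, the arithmetic can be sketched as follows. It assumes exactly one news refresh per day and uses only the ranges listed above; the function name is hypothetical. Amounts are kept in cents to avoid floating-point drift.

```javascript
// Cost ranges in US cents, taken from the figures listed above.
const COST_CENTS = {
  initial: { low: 5, high: 15 },
  dailyRefresh: { low: 2, high: 8 },
};

// Estimated cost in USD for one profile over `days` days,
// assuming one news refresh per day and free cached reads.
function estimateCostUsd(days) {
  const low = COST_CENTS.initial.low + days * COST_CENTS.dailyRefresh.low;
  const high = COST_CENTS.initial.high + days * COST_CENTS.dailyRefresh.high;
  return { low: low / 100, high: high / 100 };
}
```

Under these assumptions, `estimateCostUsd(30)` gives a range of $0.65 to $2.55 per profile for the first 30 days.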

Future Improvement Roadmap

HTTP/SSE transport — support remote MCP clients over HTTP
Multi-tenant profiles — user-scoped access with RLS
Real-time alerts — push notifications when high-relevance news drops
Competitor tracking — dedicated tool to monitor named competitors
Founder network graph — map connections between analyzed founders
Custom embedding models — fine-tuned models for startup/VC domain
Article full-text extraction — deep content scraping for richer embeddings
A/B prompt testing — experiment with different Groq prompts for analysis quality
Dashboard UI — web interface for browsing intelligence feeds
Webhook integrations — push intelligence to Slack, email, or CRM
