
Founder Intelligence Engine

Transforms founder profiles from social media into actionable strategic intelligence through automated scraping, LLM analysis, and personalized news tracking. It leverages vector search and caching to provide deep insights and relevant updates on specific founders.

Updated Mar 4, 2026

Founder Intelligence Engine — MCP Server

A production-grade Model Context Protocol (MCP) server that transforms founder profiles into actionable strategic intelligence.


Architecture

        MCP Client (Claude, etc.)
                 ▲
                 │ stdio
      ┌──────────┴──────────┐
      │  MCP Server (Node)  │
      │  3 registered tools │
      └──────────┬──────────┘
   ┌─────────────┼──────────────┐
   ▼             ▼              ▼
┌──────────┐ ┌─────────┐ ┌──────────────┐
│  Apify   │ │  Groq   │ │  Embeddings  │
│ Scraping │ │  LLM    │ │     API      │
└────┬─────┘ └────┬────┘ └──────┬───────┘
     └────────────┼─────────────┘
                  ▼
        ┌─────────────────┐
        │  Supabase       │
        │  (Postgres +    │
        │   pgvector)     │
        └─────────────────┘

Data Flow

  1. collect_profile — Scrapes LinkedIn + Twitter via Apify → merges data → generates embedding → stores in Supabase
  2. analyze_profile — Fetches stored profile → calls Groq LLM for strategic analysis → caches result
  3. fetch_personalized_news — Checks cache freshness → if stale: generates search queries → scrapes Google News → embeds articles → ranks by cosine similarity → summarizes with Groq → stores; if fresh: returns cached articles
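The ranking step in fetch_personalized_news can be sketched as cosine similarity between the founder's profile embedding and each article embedding. This is an illustrative sketch; the helper names (`rankArticles`) are assumptions, not the actual `utils/similarity.js` API:

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank scraped articles against the founder's profile embedding and
// keep the top-k most relevant ones for summarization.
function rankArticles(profileEmbedding, articles, k = 10) {
  return articles
    .map((article) => ({
      ...article,
      relevance: cosineSimilarity(profileEmbedding, article.embedding),
    }))
    .sort((x, y) => y.relevance - x.relevance)
    .slice(0, k);
}
```

Because embeddings are stored in pgvector, the same ranking could alternatively be pushed into Postgres with a cosine-distance query; the in-process version above is simplest when only a handful of fresh articles need scoring.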

Caching & Cost Optimization

| Operation | Cost | When It Runs |
| --- | --- | --- |
| LinkedIn/Twitter scraping | High | Only on profile creation |
| Groq profile analysis | Medium | Once per profile (cached) |
| Google News + embeddings | High | Only when news > 24h stale |
| Read cached articles | Free | Every subsequent request |

The fetch_history table tracks last_profile_scrape and last_news_fetch timestamps. The staleCheck.js module compares these against configurable thresholds.
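A minimal sketch of that comparison, assuming a 24-hour news threshold (the real staleCheck.js thresholds are configurable, and the function name here is an assumption):

```javascript
// Returns true when the stored timestamp is older than maxAgeHours
// (or missing entirely), meaning a fresh fetch is required.
function isStale(lastFetchIso, maxAgeHours = 24, now = Date.now()) {
  if (!lastFetchIso) return true; // never fetched before
  const ageMs = now - new Date(lastFetchIso).getTime();
  return ageMs > maxAgeHours * 60 * 60 * 1000;
}
```

Passing `now` as a parameter keeps the check pure and easy to test; in production it simply defaults to the current time.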


Setup

1. Prerequisites

  • Node.js 20+
  • Supabase project (with pgvector enabled)
  • API keys: Apify, Groq, OpenAI-compatible Embeddings

2. Install

cd /Users/praveenkumar/Desktop/mcp
cp .env.example .env
# Edit .env with your real keys
npm install

3. Database

Run the migration in the Supabase SQL Editor:

-- Paste contents of migrations/001_init.sql

Or via psql:

psql $DATABASE_URL < migrations/001_init.sql

4. Run MCP Server

node src/index.js

5. Configure MCP Client

Add to your MCP client config (e.g., Claude Desktop claude_desktop_config.json):

{
  "mcpServers": {
    "founder-intelligence": {
      "command": "node",
      "args": ["/Users/praveenkumar/Desktop/mcp/src/index.js"],
      "env": {
        "SUPABASE_URL": "...",
        "SUPABASE_SERVICE_KEY": "...",
        "APIFY_API_TOKEN": "...",
        "GROQ_API_KEY": "...",
        "EMBEDDING_API_URL": "...",
        "EMBEDDING_API_KEY": "..."
      }
    }
  }
}

6. Background Worker (Optional)

# Single run (for cron)
node src/backgroundWorker.js

# Daemon mode
BACKGROUND_LOOP=true node src/backgroundWorker.js

Cron example (every 6 hours):

0 */6 * * * cd /app && node src/backgroundWorker.js >> /var/log/worker.log 2>&1

Project Structure

/Users/praveenkumar/Desktop/mcp/
├── migrations/
│   └── 001_init.sql
├── src/
│   ├── db/
│   │   └── supabaseClient.js
│   ├── services/
│   │   ├── apifyService.js
│   │   ├── embeddingService.js
│   │   └── llmService.js
│   ├── tools/
│   │   ├── collectProfile.js
│   │   ├── analyzeProfile.js
│   │   └── fetchPersonalizedNews.js
│   ├── utils/
│   │   ├── similarity.js
│   │   └── staleCheck.js
│   ├── backgroundWorker.js
│   └── index.js
├── .env.example
├── .gitignore
├── .dockerignore
├── Dockerfile
├── package.json
└── README.md

Docker Deployment

Build & Run

docker build -t founder-intelligence-mcp .
docker run --env-file .env founder-intelligence-mcp

Background Worker Container

docker run --env-file .env founder-intelligence-mcp node src/backgroundWorker.js

Docker Compose (production)

version: '3.8'
services:
  mcp-server:
    build: .
    env_file: .env
    stdin_open: true
    restart: unless-stopped

  worker:
    build: .
    env_file: .env
    command: ["node", "src/backgroundWorker.js"]
    environment:
      - BACKGROUND_LOOP=true
    restart: unless-stopped

Scaling Strategy

| Component | Strategy |
| --- | --- |
| MCP Server | One instance per client (stdio-based) |
| Background Worker | Single instance or Cloud Run Job on schedule |
| Supabase | Connection pooling via Supavisor; read replicas for scale |
| Apify | Concurrent actor runs (up to account limit) |
| Embeddings | Batch requests (20 per call) to reduce round trips |
| Groq | Rate-limit aware with Retry-After header handling |
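The Groq strategy above can be sketched as exponential backoff that defers to an explicit Retry-After header when the API returns one. This is an illustrative sketch; the helper names and the error shape (`err.status`, `err.retryAfter`) are assumptions, not the actual service code:

```javascript
// Delay before the next retry: honor a Retry-After header (seconds)
// when present, otherwise back off exponentially, capped at maxMs.
function retryDelayMs(attempt, retryAfterHeader, baseMs = 500, maxMs = 30000) {
  if (retryAfterHeader) {
    const seconds = Number(retryAfterHeader);
    if (!Number.isNaN(seconds)) return Math.min(seconds * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Wrap a request function with rate-limit-aware retries (HTTP 429 only).
async function withRetries(fn, maxAttempts = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || err.status !== 429) throw err;
      const delay = retryDelayMs(attempt, err.retryAfter);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Retrying only on 429 keeps genuine failures (bad requests, auth errors) fast-failing instead of silently looping.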

For high-profile-count deployments:

  • Move background worker to a Cloud Run Job triggered by Cloud Scheduler
  • Use Supabase Edge Functions for scheduled refresh
  • Add a Redis cache layer for hot profile lookups

Security Best Practices

  1. Service-role key only on server side — never expose to clients
  2. All secrets via environment variables — no hardcoded keys
  3. Non-root Docker user — mcp user in container
  4. Input validation — Zod schemas on all tool inputs
  5. Row Level Security — enable RLS on Supabase tables for multi-tenant
  6. API token rotation — rotate Apify, Groq, and embedding keys periodically
  7. Rate limiting — built-in retry logic with exponential backoff
  8. No PII logging — profile data stays in Supabase, not console
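The input-validation point can be illustrated without pulling in a dependency. The real server uses Zod schemas; the dependency-free stand-in below enforces the same kind of check, and the field names are assumptions for illustration:

```javascript
// Dependency-free stand-in for a Zod schema guarding tool inputs:
// require non-empty string fields and reject unknown keys.
function validateCollectProfileInput(input) {
  const errors = [];
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const allowed = ["linkedinUrl", "twitterHandle"];
  for (const field of allowed) {
    if (typeof input[field] !== "string" || input[field].trim() === "") {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  for (const key of Object.keys(input)) {
    if (!allowed.includes(key)) errors.push(`unexpected field: ${key}`);
  }
  return { ok: errors.length === 0, errors };
}
```

Rejecting unknown keys matters for MCP tools in particular, since inputs arrive from an LLM client and may contain hallucinated parameters.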

Cost Optimization

| Service | Cost Driver | Mitigation |
| --- | --- | --- |
| Apify | Actor compute units | Scrape only on creation; cache results |
| Groq | Token usage | Analyze once (cached); batch news summaries |
| Embeddings | API calls | Batch 20 at a time; embed once per article |
| Supabase | Row count + storage | Deduplicate articles by URL; prune old articles |
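The deduplication mitigation can be sketched as a URL-keyed filter applied before inserting articles. This is an illustrative helper, not the actual code; normalizing away query strings, fragments, and trailing slashes is one reasonable choice for treating re-scraped links as the same article:

```javascript
// Normalize each URL (drop query string, fragment, and trailing slash)
// so the same article scraped twice maps to one key, then keep the
// first occurrence of each key.
function dedupeByUrl(articles) {
  const seen = new Set();
  const unique = [];
  for (const article of articles) {
    const key = article.url.split(/[?#]/)[0].replace(/\/+$/, "").toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(article);
    }
  }
  return unique;
}
```

A unique index on the normalized URL column in Supabase gives the same guarantee at the database layer, which protects against races between concurrent fetches.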

Expected cost per profile lifecycle:

  • Initial setup: ~$0.05–0.15 (scrape + embed + analyze)
  • Daily news refresh: ~$0.02–0.08 (scrape + embed + summarize top 10)
  • Cached reads: $0.00

Future Improvement Roadmap

  1. HTTP/SSE transport — support remote MCP clients over HTTP
  2. Multi-tenant profiles — user-scoped access with RLS
  3. Real-time alerts — push notifications when high-relevance news drops
  4. Competitor tracking — dedicated tool to monitor named competitors
  5. Founder network graph — map connections between analyzed founders
  6. Custom embedding models — fine-tuned models for startup/VC domain
  7. Article full-text extraction — deep content scraping for richer embeddings
  8. A/B prompt testing — experiment with different Groq prompts for analysis quality
  9. Dashboard UI — web interface for browsing intelligence feeds
  10. Webhook integrations — push intelligence to Slack, email, or CRM
