merch-connector

MCP server that gives merchandising agents eyes on any storefront — scrape, audit, compare, roundtable analysis, and eval tracking via 11 tools.

npm · 292 downloads/week · Updated Mar 21, 2026

merch-connector

npm version · License: MIT · Node.js >= 18 · MCP Server

An MCP server that gives AI agents eyes on any e-commerce storefront.

Scrape product listings, extract facets, badges, sort options, and B2B signals; run AI-powered merchandising audits; compare two storefronts side-by-side; detect what changed between visits; and build persistent memory about sites — all through the Model Context Protocol.


Why merch-connector?

E-commerce merchandising analysis is manual, repetitive, and fragmented. A merchandiser might spend hours clicking through competitor sites, checking if filters work, comparing product grids, and noting what's changed. AI agents can do this work — but they can't see storefronts the way shoppers do.

merch-connector bridges that gap. It gives any MCP-compatible AI agent (Claude, custom agents, etc.) the ability to:

  • Browse any storefront with a stealth headless browser that handles bot protection
  • Extract structured product data, facets, performance metrics, and page structure
  • Analyze merchandising quality through five expert personas or a full roundtable debate
  • Remember site quirks across sessions so the agent gets smarter over time
  • Track changes across visits — new products, price moves, facet/sort changes

Quick start

npx merch-connector

The server communicates over stdio and is designed to be launched by an MCP client, not run standalone.
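The wire format is newline-delimited JSON-RPC 2.0 on stdio. A minimal sketch of the kind of message an MCP client writes to the server's stdin (field values here are illustrative, not a complete handshake):

```javascript
// Sketch of the newline-delimited JSON-RPC 2.0 framing an MCP client uses
// over stdio. The protocolVersion and clientInfo values are illustrative.
const initialize = {
  jsonrpc: '2.0',
  id: 1,
  method: 'initialize',
  params: {
    protocolVersion: '2024-11-05',
    clientInfo: { name: 'example-client', version: '0.1.0' },
    capabilities: {}
  }
};

// Each message is one JSON object per line on the child process's stdin.
const wire = JSON.stringify(initialize) + '\n';
const decoded = JSON.parse(wire);
console.log(decoded.method); // prints "initialize"
```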

Configuration

Add to your Claude Desktop claude_desktop_config.json or Claude Code .mcp.json:

{
  "mcpServers": {
    "merch-connector": {
      "command": "npx",
      "args": ["-y", "merch-connector"],
      "env": {
        "ANTHROPIC_API_KEY": "your_key_here"
      }
    }
  }
}

To enable Firecrawl (bypasses bot-protected sites like Ferguson/Akamai) or pass any other env vars, add them to the env block:

"env": {
  "ANTHROPIC_API_KEY": "your_key_here",
  "FIRECRAWL_API_KEY": "fc-..."
}

Or install globally: npm install -g merch-connector

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | One of these | Anthropic Claude API key |
| GEMINI_API_KEY | One of these | Google Gemini API key |
| OPENAI_API_KEY | One of these | OpenAI-compatible API key |
| OPENAI_BASE_URL | No | Base URL for OpenAI-compatible endpoint. Defaults to https://api.openai.com/v1 |
| MODEL_PROVIDER | No | Force "anthropic", "gemini", or "openai". Auto-detected if omitted. |
| MODEL_NAME | No | Override default model. Defaults: claude-sonnet-4-6 / gemini-2.0-flash-exp |
| OPENAI_VISION | No | Set "true" to pass screenshots to OpenAI-compatible vision models |
| FIRECRAWL_API_KEY | No | Enables Firecrawl as primary scraper in acquire — bypasses Akamai/bot protection. Falls back to Puppeteer if absent. |
| MERCH_CONNECTOR_DATA_DIR | No | Custom path for site memory files. Default: ~/.merch-connector/data/ |
| TOOL_TIMEOUT_MS | No | AI tool timeout in ms. Default: 120000 (2 min) |
| MERCH_LOG_FILE | No | Path to NDJSON log file. If set, every server log entry is appended. |

You only need an API key for AI-powered tools (ask_page, merch_roundtable). Scraping tools work without one.
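One plausible way to picture the auto-detection: an explicit MODEL_PROVIDER wins, otherwise the first present API key selects the provider. The precedence order among multiple keys is an assumption here, not documented behavior:

```javascript
// Hypothetical sketch of MODEL_PROVIDER resolution. The precedence when
// several keys are set at once is assumed, not taken from the server code.
function detectProvider(env) {
  if (env.MODEL_PROVIDER) return env.MODEL_PROVIDER; // explicit override wins
  if (env.ANTHROPIC_API_KEY) return 'anthropic';
  if (env.GEMINI_API_KEY) return 'gemini';
  if (env.OPENAI_API_KEY) return 'openai';
  return null; // no key: scraping tools still work, AI tools are unavailable
}

console.log(detectProvider({ GEMINI_API_KEY: 'g' })); // prints "gemini"
```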


Tools

| Tool | Description | Needs AI key? |
| --- | --- | --- |
| acquire | Primary tool. One-pass audit payload — products, facets, screenshots, performance, trust signals, navigation, data quality, analytics, and PDP samples in a single call | No |
| scrape_page | (Deprecated — use acquire) Raw structured extraction from any category page | No |
| interact_with_page | Execute one or more search/click actions in sequence, then extract the result | No |
| compare_storefronts | Structured side-by-side diff of two URLs: facet gaps, trust signals, sort options, B2B mode, performance | No |
| ask_page | Scrape a page and ask any question about it in plain language | Yes |
| merch_roundtable | Three expert personas analyze in parallel, then a moderator synthesizes consensus (results stream as each persona completes) | Yes |
| site_memory | Read/write persistent notes and learned data about any domain | No |
| clear_session | Reset stored cookies and page cache for a domain | No |
| get_logs | Retrieve recent server log entries from the in-memory buffer, filterable by level or tool name | No |
| save_eval | Persist a roundtable run as a structured eval record with convergence score | No |
| list_evals | Retrieve eval history for a domain or all domains | No |

Examples

acquire

Pull everything needed for a full storefront audit in one call:

{
  "url": "https://www.zappos.com/women/CK_XARC81wHAAQHiAgMBAhg.zso",
  "pdp_sample": 2
}

Returns the complete audit payload: products with trust signals, facets, sort, navigation structure, data quality scores, analytics platform detection, performance timings, desktop + mobile screenshots, and 2 sampled PDPs — ready for the plugin to score.

ask_page

"Recommend facet changes for this laptop category page"

{
  "url": "https://www.insight.com/en_US/shop/category/notebooks/store.html",
  "question": "Recommend facet changes?"
}

Brand/Manufacturer — Most glaring omission. 50 products span 6+ brands (HP, Lenovo, Apple, Microsoft, Dell, Crucial). B2B buyers with vendor agreements need this as facet #1.

Price range buckets are misaligned. "Below $50" (2 items) signals category contamination — confirmed by a Crucial RAM stick appearing in laptop results. Clean up category mapping and re-bucket starting at $500.

merch_roundtable

The roundtable scrapes once, then runs three AI analyses in parallel followed by a moderator synthesis:

  1. Floor Walker — reacts as a real shopper ("I can't find Dell laptops without scrolling through 50 products")
  2. Auditor — evaluates Trust/Guidance/Persuasion/Friction ("0% facet detection rate, title normalization at 70%")
  3. Scout — identifies competitive gaps ("every competitor in B2B tech has brand filtering as facet #1")
  4. Moderator — synthesizes consensus, surfaces disagreements, produces prioritized recommendations

B2B Auditor automatically substitutes for Auditor when B2B signals are detected.
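The fan-out/fan-in shape of the roundtable can be sketched in a few lines. The persona functions below are stand-ins for the real prompt-driven analyses, and the consensus math is purely illustrative:

```javascript
// Minimal sketch of the roundtable flow: three persona analyses run
// concurrently, then a moderator step sees all three results at once.
// Persona bodies and the consensus calculation are illustrative only.
async function roundtable(pageData) {
  const personas = [
    async () => ({ persona: 'floor_walker', score: 60 }),
    async () => ({ persona: 'auditor', score: 55 }),
    async () => ({ persona: 'scout', score: 48 })
  ];
  const results = await Promise.all(personas.map((fn) => fn(pageData)));
  // Moderator runs only after every persona resolves.
  const consensus = Math.round(
    results.reduce((sum, r) => sum + r.score, 0) / results.length
  );
  return { perspectives: results, debate: { consensus } };
}

roundtable({}).then((r) => console.log(r.debate.consensus)); // prints 54
```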


Personas

Five expert lenses for merchandising analysis. Use individually via ask_page or merch_roundtable.

| Persona | Role | Voice |
| --- | --- | --- |
| Floor Walker | A shopper visiting for the first time | First-person, casual, instinctive — "I don't know what button to click" |
| Auditor | Compliance analyst with a framework | Metric-driven, precise — "Fill rate is 82%, 3/10 titles lack brand prefix" |
| Scout | VP of Merchandising at a competitor | Strategic, comparative — "This is table-stakes for the category" |
| B2B Auditor | Procurement buyer evaluating a vendor | Process-driven — scores steps-to-PO, spec completeness, pricing transparency, self-serve viability |
| Conversion Architect | CRO specialist mapping the purchase funnel | Analytical, hypothesis-driven — "checkout button is below the fold on mobile, estimated −8% conversion" |

Each persona returns score (0–100), severity (1–5), findings[] (3–5 concrete observations), and uniqueInsight — the one thing only that lens would catch.
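A consumer of these results might validate them against the base schema like this. The field names come from the README; the checks themselves are illustrative:

```javascript
// Sketch: validating the unified persona result schema described above
// (score 0-100, severity 1-5, 3-5 findings, non-empty uniqueInsight).
function isValidPersonaResult(r) {
  return Number.isFinite(r.score) && r.score >= 0 && r.score <= 100 &&
    Number.isInteger(r.severity) && r.severity >= 1 && r.severity <= 5 &&
    Array.isArray(r.findings) &&
    r.findings.length >= 3 && r.findings.length <= 5 &&
    typeof r.uniqueInsight === 'string' && r.uniqueInsight.length > 0;
}

const ok = isValidPersonaResult({
  score: 72,
  severity: 3,
  findings: ['no brand facet', 'price buckets misaligned', 'weak sort labels'],
  uniqueInsight: 'B2B buyers cannot filter by vendor agreement'
});
console.log(ok); // prints true
```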


Architecture

MCP Client (Claude, etc.)
    |
    | stdio (JSON-RPC)
    |
merch-connector (Node.js MCP server)
    |
    +-- acquire.js       One-pass audit entry point; Firecrawl → Puppeteer fallback
    +-- scraper.js       Puppeteer + stealth plugin, structure detection, PageFingerprint
    +-- analyzer.js      Multi-provider AI (Anthropic / Gemini / OpenAI), 5 personas
    +-- network-intel.js XHR interception, 35-platform fingerprint, dataLayer/GA4 parsing
    +-- site-memory.js   Persistent per-domain JSON store + change detection snapshots
    +-- eval-store.js    JSONL eval index + full run storage, convergence scoring
    +-- prompts/         Persona prompt files (floor-walker, auditor, scout, b2b-auditor, conversion-architect)
  • Scraping: Puppeteer with stealth plugin bypasses bot detection. Two-pass heuristic structure detection finds product grids on unknown sites. Extracts products, facets, trust signals (ratings, badges, stock warnings), performance timing, and screenshots. Firecrawl integration (FIRECRAWL_API_KEY) provides LLM-based extraction as a primary path for bot-protected sites.
  • Network intelligence: Intercepts XHR/fetch during page load to fingerprint the commerce stack (Algolia, Bloomreach, SFCC, Shopify, Elasticsearch, and 30+ more). When a high-confidence match is found, extracts product and facet data directly from the API response — bypassing DOM parsing failures on enterprise storefronts.
  • Analysis: Three-provider AI — Anthropic uses tool_choice forcing for structured JSON; Gemini uses responseSchema; OpenAI-compatible uses function calling with a JSON-prompt fallback. Dynamic imports load only the needed SDK. ask_page uses Haiku-class models for fast Q&A; persona analysis uses Sonnet-class.
  • Personas: Five expert lenses. merch_roundtable runs Floor Walker, Auditor, and Scout in parallel then passes results to a moderator that synthesizes consensus and disagreements. B2B Auditor auto-substitutes for Auditor when B2B mode is detected.
  • Memory: Auto-learns site patterns on every scrape. Normalized snapshots enable change detection across visits — price moves, new/removed products, facet/sort changes. Manual notes persist across sessions.
  • Evals: Two-tier storage — compact JSONL index (100 runs/domain) + full run JSON (10/domain). Convergence score (0–100) measures inter-persona agreement. Dedup hashing prevents double-saves.
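The change-detection idea above can be pictured as a diff between two normalized snapshots. The real snapshot format is internal to site-memory.js; this sketch assumes a simple title-to-price map:

```javascript
// Illustrative sketch of snapshot diffing for change detection: compare
// two normalized product maps keyed by title and report additions,
// removals, and price moves. The on-disk snapshot format is assumed.
function diffSnapshots(prev, curr) {
  const changes = { added: [], removed: [], priceMoves: [] };
  for (const [title, price] of Object.entries(curr)) {
    if (!(title in prev)) changes.added.push(title);
    else if (prev[title] !== price) {
      changes.priceMoves.push({ title, from: prev[title], to: price });
    }
  }
  for (const title of Object.keys(prev)) {
    if (!(title in curr)) changes.removed.push(title);
  }
  return changes;
}

const diff = diffSnapshots(
  { 'Widget A': 19.99, 'Widget B': 24.99 },
  { 'Widget A': 17.99, 'Widget C': 9.99 }
);
console.log(diff.priceMoves, diff.added, diff.removed);
```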

Development

git clone https://github.com/grahamton/merchGent.git
cd merchGent
npm install
cp .env.example .env   # fill in at least one AI API key

Running tests

npm test                              # scrape-only (no API key needed)
npm run test:audit                    # full merchandising audit
npm run test:persona                  # single persona (floor_walker)
npm run test:roundtable               # all 3 personas + moderator
node test/smoke.js --b2b              # B2B validation: Insight.com laptops + b2b_auditor
node test/smoke.js --ask "question"   # ask anything about a page
node test/smoke.js --url https://...  # override default URL
node test/protocol.js                 # MCP protocol compliance (no browser/API key needed)

MCP Inspector

npx @modelcontextprotocol/inspector -- node bin/merch-connector.js

Opens a browser UI where you can call any tool interactively.


Tool reference

acquire

One-pass audit payload. The primary tool in v2 — replaces the multi-step scrape_page + analysis workflow. Returns everything the audit pipeline needs in a single call.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | Full URL to acquire |
| pdp_sample | No | Number of PDP samples to include (0–5, default 2). Auto-selects median-priced + premium (80th percentile) products. |
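The selection rule for pdp_sample can be sketched as a percentile pick over price. This is a hypothetical reading of "median-priced + premium (80th percentile)", not the server's actual code:

```javascript
// Hypothetical sketch of the pdp_sample selection rule: the median-priced
// product plus one at the 80th price percentile (deduplicated).
function samplePdps(products) {
  if (products.length === 0) return [];
  const sorted = [...products].sort((a, b) => a.price - b.price);
  const median = sorted[Math.floor((sorted.length - 1) / 2)];
  const premium = sorted[Math.floor((sorted.length - 1) * 0.8)];
  return premium === median ? [median] : [median, premium];
}

const picks = samplePdps([
  { title: 'A', price: 10 }, { title: 'B', price: 20 },
  { title: 'C', price: 30 }, { title: 'D', price: 40 },
  { title: 'E', price: 90 }
]);
console.log(picks.map((p) => p.title)); // prints [ 'C', 'D' ]
```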

Returns:

  • page — title, metaDescription, pageType, breadcrumb, h1
  • commerce — mode (B2B/B2C/Hybrid), platform, priceTransparency, loginRequired
  • products[] — normalized with trust signals, B2B/B2C indicators, description quality
  • facets[], sort — filter panel and sort state
  • navigation — hasFilterPanel, filterPanelPosition, hasStickyNav, breadcrumbPresent
  • trustSignals — ratingsOnCards, freeShippingPromised, returnPolicyVisible, urgencyMessaging
  • dataQuality — descriptionFillRate, ratingFillRate, priceFillRate
  • analytics — platform detection, GTM containers, ecommerce tracking status, productImpressionsFiring
  • performance — fcp, lcp, cls, domContentLoaded, loadComplete
  • pdpSamples[] — sampled PDP detail pages
  • screenshots — desktop + mobile base64 JPEG
  • warnings[] — structured quality flags with severity
  • scraper — "firecrawl" or "puppeteer" (which path was used)

scrape_page

(Deprecated — use acquire) Raw structured extraction. Returns products (title, price, stock, CTA, description, B2B/B2C signals, trust signals), facets/filters, sort options, B2B mode + conflict score, page metadata, performance timing, data layers, interactable elements, and PageFingerprint. On repeat visits, also returns a changes diff.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | Full URL to scrape |
| depth | No | Pagination pages to follow (1–5, default 1) |
| max_products | No | Max products per page (default 10) |
| include_screenshot | No | Include base64 JPEG desktop screenshot (default false) |
| mobile_screenshot | No | Also capture a 390×844 (iPhone 14) mobile screenshot (default false) |

Trust signals per product: star rating, review count, sale badge + text, best seller flag, stock warning ("Only 3 left"), sustainability label, raw badge texts.

compare_storefronts

Scrape two URLs concurrently and return a structured diff. No AI call — pure structural analysis.

| Parameter | Required | Description |
| --- | --- | --- |
| url_a | Yes | First URL (your site or baseline) |
| url_b | Yes | Second URL (competitor or variant) |
| max_products | No | Max products per page (default 10) |

Returns: product count delta, facet gap analysis (onlyInA / onlyInB / shared count), trust signal coverage per site, sort option gaps, B2B mode + conflict score for each, performance delta (FCP + full load).
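The facet gap portion is pure set arithmetic over the facet names scraped from each storefront. A minimal sketch (field names onlyInA/onlyInB match the README; the helper itself is illustrative):

```javascript
// Sketch of the facet gap computation: set difference and intersection
// over facet names from two storefronts. No AI call involved.
function facetGaps(facetsA, facetsB) {
  const a = new Set(facetsA);
  const b = new Set(facetsB);
  return {
    onlyInA: [...a].filter((f) => !b.has(f)),
    onlyInB: [...b].filter((f) => !a.has(f)),
    sharedCount: [...a].filter((f) => b.has(f)).length
  };
}

const gaps = facetGaps(
  ['Brand', 'Price', 'Color', 'Size'],
  ['Brand', 'Price', 'Rating']
);
console.log(gaps);
```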

interact_with_page

Execute one or more search/click actions in sequence, then extract the resulting page.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | Full URL to load |
| actions | One of these | Array of { action, selector?, value? } for multi-step flows |
| action | One of these | Single action shorthand: "search" or "click" |
| selector | Depends | CSS selector (required for click) |
| value | Depends | Text to type (required for search) |
| include_screenshot | No | Include screenshot of result |

Multi-step example: [{ "action": "search", "value": "laptop" }, { "action": "click", "selector": ".filter-in-stock" }]

ask_page

Scrape + AI Q&A. The model sees full product data, facets, performance, and a screenshot. Supports Anthropic (Haiku), Gemini, and OpenAI-compatible providers.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | Full URL to scrape and ask about |
| question | Yes | Plain language question |
| depth | No | Pagination pages (default 1) |
| max_products | No | Max products per page (default 10) |

merch_roundtable

Multi-persona analysis with moderator synthesis. Floor Walker, Auditor, and Scout run in parallel — each result is streamed as a notifications/message as it completes. B2B Auditor auto-substitutes for Auditor when B2B signals are detected.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | Full URL to analyze |
| depth | No | Pagination pages (default 1) |
| max_products | No | Max products per page (default 10) |

Returns: perspectives (each persona's typed result), debate.consensus, debate.disagreements, debate.finalRecommendations (with impact + endorsing personas).

site_memory

Persistent per-domain memory. Auto-accumulates on every scrape.

| Parameter | Required | Description |
| --- | --- | --- |
| action | Yes | "read", "write", "list", or "delete" |
| url | Depends | Any URL on the domain (required for read/write/delete) |
| note | No | Text note to append (with write) |
| key | No | Custom field name (with write) |
| value | No | Value for the field (with write + key) |

clear_session

Reset cookies and cached page data for a domain.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | Any URL on the domain to clear |

save_eval

Persist the most recent roundtable or audit run as a structured eval record. Reads from the session persona cache — no data round-trip through the model. Must call merch_roundtable on the same URL first.

| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | URL of the run to save (must match a cached session) |
| note | No | Optional free-text annotation |

Returns: eval ID, convergence score (0–100 inter-persona agreement), top concerns per persona, moderator summary excerpt, dedup hash.
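The convergence formula is internal to eval-store.js; one plausible shape is an averaged pairwise overlap of each persona's top concerns, scaled to 0–100. The sketch below assumes exactly that and is not the server's actual math (though, like the documented behavior, it returns null for single-persona runs):

```javascript
// Assumed convergence metric: average pairwise Jaccard overlap of the
// personas' top-concern sets, scaled to 0-100. Illustrative only.
function convergenceScore(concernSets) {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < concernSets.length; i++) {
    for (let j = i + 1; j < concernSets.length; j++) {
      const a = new Set(concernSets[i]);
      const b = new Set(concernSets[j]);
      const shared = [...a].filter((c) => b.has(c)).length;
      const union = new Set([...a, ...b]).size;
      total += union === 0 ? 0 : shared / union;
      pairs++;
    }
  }
  return pairs === 0 ? null : Math.round((total / pairs) * 100);
}

const score = convergenceScore([
  ['no brand facet', 'slow LCP'],
  ['no brand facet', 'weak sort'],
  ['no brand facet', 'slow LCP']
]);
console.log(score); // prints 56
```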

list_evals

Retrieve eval history for a domain or all domains.

| Parameter | Required | Description |
| --- | --- | --- |
| url | No | Filter to a specific domain. Omit to return all domains with eval history. |

get_logs

Retrieve recent server log entries from the in-memory circular buffer (500 entries).

| Parameter | Required | Description |
| --- | --- | --- |
| level | No | Filter by level: "error", "warn", "info", "debug" |
| tool | No | Filter by tool name (e.g. "merch_roundtable") |
| limit | No | Max entries to return (default 50) |
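A fixed-capacity buffer like the one get_logs reads can be sketched as follows. The class and filter shape are illustrative, not the server's implementation:

```javascript
// Sketch of an in-memory circular log buffer: fixed capacity, oldest
// entries dropped first, filterable like get_logs. Illustrative only.
class RingBuffer {
  constructor(capacity = 500) {
    this.capacity = capacity;
    this.entries = [];
  }
  push(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) this.entries.shift();
  }
  recent({ level, tool, limit = 50 } = {}) {
    return this.entries
      .filter((e) => (!level || e.level === level) && (!tool || e.tool === tool))
      .slice(-limit);
  }
}

const logs = new RingBuffer(3);
['a', 'b', 'c', 'd'].forEach((msg, i) =>
  logs.push({ level: i % 2 ? 'warn' : 'info', tool: 'acquire', msg })
);
console.log(logs.entries.map((e) => e.msg)); // prints [ 'b', 'c', 'd' ]
```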

History

v2.0.2 — MCP-014 acquire field fixes

  • trustSignals.avgRating: Renamed from avgRatingAcrossProducts to match the field name the plugin audit command expects — was causing silent scoring failures on every acquire call
  • Warning severity values: Remapped from "high"/"medium"/"low" to "error"/"warn" across all warnings[] entries to match the plugin's expected enum

v2.0.1 — Model alias fix + full multi-provider ask_page

  • MCP-013: Replaced retired claude-3-5-sonnet-latest alias with claude-sonnet-4-6 across all Anthropic calls — fixes ask_page, merch_roundtable, and all persona analysis tools that were returning 404 errors
  • ask_page multi-provider: Added full Gemini and OpenAI-compatible implementations (were placeholder stubs). Anthropic path now uses Haiku-class model for fast, cost-effective Q&A; all persona analysis continues to use Sonnet.
  • MCP-015 docs: FIRECRAWL_API_KEY documented in README and CLAUDE.md; configuration example updated with env passthrough pattern

v2.0.0 — acquire tool: one-pass v2 architecture

  • New acquire tool: Single call replaces the 6–8 step scrape_page + analysis workflow — returns products, facets, screenshots, performance, trust signals, navigation, data quality, PDP samples, analytics, and warnings[] in one payload
  • Firecrawl integration: LLM extraction via Firecrawl as primary scraper with automatic Puppeteer fallback; scraper field reports which path was used and any fallback reason
  • audit_storefront retired: Returns a hard error directing callers to acquire; scrape_page marked deprecated with log warning
  • Protocol tests updated: 34/34 passing; acquire in tool list, audit_storefront absent, scrape_page deprecation asserted

v1.9.2 — MCP-002 & MCP-005 fixes, roundtable refactor, B2B persona routing

  • MCP-002: Restored extractFacetsGeneric fallback + hasFacetStructure structural scoring bonus (+20); added nested wrapper key support (response.*, data.*) and wired generic extraction as a fallback in extractFromBestApi — "Unknown Facet" no longer appears when XHR data is available
  • MCP-005: Mobile screenshots now dismiss OneTrust, Cookiebot, and TrustArc consent overlays before capture; blank-image threshold raised to 20 KB to reliably reject consent-blocked frames
  • Roundtable refactor: Collapsed per-provider per-persona duplicates into generic dispatch functions (~1000 lines removed); merch_roundtable auto-substitutes the B2B auditor persona when B2B signals are detected

v1.9.1 — Bug fixes from Cowork plugin QA sweep

  • CSS selector safety: compare_storefronts no longer crashes on Tailwind JIT arbitrary-value class names — all class-to-selector conversions now use CSS.escape()
  • Paint timing: FCP and first-paint captured via pre-navigation PerformanceObserver — no longer returns 0 on SPA category pages
  • Mobile screenshot: renders in a fresh browser page with UA + viewport set before navigation, fixing blank white screen on UA-gated SPAs
  • PDP pageType: URL pattern signals (/product/, /p/, /buy/product/, /pdp/) now take priority over DOM product-count heuristics, fixing misclassification on PDPs with related-product carousels
  • AI timeout resilience: audit_storefront and merch_roundtable cap the product payload sent to AI at 20 items, reducing prompt size and inference time
  • scrape_pdp price extraction: falls back to CTA button text when no dedicated price element is found; hasReviews and specTable.present now require count > 0
  • Facet resolution: "Unknown Facet" placeholders replaced with real names from intercepted XHR when a search API is detected
  • get_category_sample: error response now includes reason and suggestion when no product URLs are found

v1.9.0 — PDP sampling, smarter facets, B2B fingerprint depth

  • scrape_pdp tool: dedicated PDP scraper returning description fill rate, image count, review schema, spec table, cross-sell modules, CTA text, and primary/sale prices — purpose-built for single product pages
  • get_category_sample tool: scrapes a category page and runs scrape_pdp in parallel on a spread/random/top selection of products — one call for a multi-PDP spot check
  • Facet detection hardened: Strategy 1 now skips parent containers that wrap multiple filter groups (fixes the "all filters collapsed into one facet" bug on obfuscated-class sites like Zappos); Strategy 2 replaced with heading-to-heading tree walker so filter groups segment correctly regardless of CSS class names
  • B2B fingerprint depth: three new fingerprint fields — contractPricingVisible, loginRequired, accountPersonalization; audit_storefront now uses a dedicated AUDIT_TIMEOUT_MS (default 240s); PageSpeed Insights Core Web Vitals available via include_pagespeed: true on scrape_page

v1.8.0 — Persona architecture v2

  • PA-2 Fingerprint context injection: every persona now receives a ## Page Intelligence (pre-scan) block prepended to its prompt — pageType, platform, commerceMode, trust signal inventory, top risks, and recommended personas — so the AI orients before reading raw product data
  • PA-4 Unified base schema: all personas return score (0–100), severity (1–5), findings[] (3–5 observations), uniqueInsight — enabling structured cross-persona comparison
  • PA-5 Smart auto-selection: audit_storefront accepts persona: "auto" — selectPersonas(fingerprint) picks the best-fit lens based on pageType and commerceMode
  • PA-6 Conversion Architect: new CRO persona maps funnel stages, catalogs friction inventory, identifies top drop-off risk, generates A/B hypotheses with estimated lift ranges
  • Perf: roundtable log entries no longer embed full result objects — get_logs payload reduced ~95% for cached re-runs

v1.7.0 — PageFingerprint + synchronous moderator

  • PA-3 Synchronous moderator: merch_roundtable now awaits the moderator synthesis before returning — debate.consensus and debate.finalRecommendations[] are guaranteed in the tool response
  • PA-1 PageFingerprint: every scrape result now includes a fingerprint field with no extra AI call — pageType, platform, commerceMode, priceTransparency, trustSignalInventory, discoveryQuality, funnelReadiness, topRisks[], recommendedPersonas[]
  • Category contamination detector: scrape_page returns contamination: { detected, suspectCount, suspects[] } when off-category products appear in results
  • get_logs tool + file logging: retrieves recent server log entries from an in-memory buffer (500 entries), filterable by level and tool name; set MERCH_LOG_FILE for NDJSON file logging

v1.6.4

save_eval now works with all tool types, not just merch_roundtable. Convergence score returns null (not 0) for single-persona runs. Auto-detects toolName from whichever persona cache slots are populated.

v1.6.3 — Eval store

Two new tools (save_eval, list_evals) add persistent run tracking. Convergence score (0–100) measures inter-persona agreement on top concerns. Two-tier storage: compact JSONL index (100/domain) + full run JSON (10/domain). Dedup hashing prevents double-saving identical runs.

v1.6.2

Roundtable personas now run in parallel via Promise.all, cutting wall-clock time from ~90s to ~30s. Persona results are written to cache the moment each resolves, so a retry after a timeout picks up where it left off.

v1.6.0 — Network Intelligence Layer

Every scrape_page call now intercepts XHR/fetch responses and fingerprints the commerce stack from 35 platform signatures: Elasticsearch, Algolia, Coveo, Lucidworks Fusion, Bloomreach, Searchspring, SFCC, SAP Hybris, Shopify, Bazaarvoice, and more. When a high-confidence API match is found (≥70%), products and facets are extracted directly from the API response. Deep dataLayer/digitalData parsing surfaces GA4 events, GTM container IDs, A/B experiment assignments, and user segments. Discovered API endpoints are persisted to site memory so the discovery pass only runs once per domain.

v1.5.0 — Scraper expansion

Per-product trust signals (ratings, badges, stock warnings), sort order detection, b2bMode + b2bConflictScore, change detection on repeat visits. New compare_storefronts tool. Multi-step interact_with_page actions array. Optional mobile screenshot. Roundtable streams each persona result as it completes.

v1.4.0

10-minute in-memory page cache. ask_page, audit_storefront, and merch_roundtable reuse recent scrape results, cutting latency in half. Configurable TOOL_TIMEOUT_MS.

v1.3.0

OpenAI-compatible provider support (OpenAI, Groq, Together AI, any OpenAI-compatible endpoint). OPENAI_VISION=true for multimodal models.

v1.2.0

Complete rewrite — lean MCP server replacing the original React + Express UI. Four expert personas, roundtable mode, persistent site memory, dual AI provider support (Anthropic + Gemini).

v1.0.0

Original React + Express application with Gemini-powered merchandising analysis.


License

MIT
