Life Sciences MCP

FastMCP wrappers for essential life sciences APIs and datasets. A microservices-based approach to accelerate scientific research by providing MCP server access to biological databases, gene nomenclature services, protein interaction networks, and drug-target databases.

Vision

Enable AI agents to seamlessly query the world's most important life sciences databases through the Model Context Protocol (MCP), accelerating drug discovery, drug repurposing, and biomedical research.

Current Status: 12 MCP servers operational covering genes (HGNC, Ensembl, Entrez), proteins (UniProt, STRING, BioGRID), compounds (ChEMBL, PubChem), pharmacology (IUPHAR/GtoPdb), targets (Open Targets), pathways (WikiPathways), and clinical trials (ClinicalTrials.gov).

The Modern Drug Discovery Stack (2025)

The current best practice in computational drug discovery is an integrative approach using programmatic APIs across multiple data layers:

┌─────────────────────────────────────────────────────────────┐
│                    DRUG REPURPOSING STACK                   │
├─────────────────────────────────────────────────────────────┤
│  Layer 1: Chemical/Drug                                     │
│  ChEMBL → PubChem → DrugBank                                │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: Target/Protein                                    │
│  UniProt → STRING → IUPHAR/GtoPdb → STITCH                  │
├─────────────────────────────────────────────────────────────┤
│  Layer 3: Gene/Genomics                                     │
│  HGNC → Ensembl → NCBI/Entrez                               │
├─────────────────────────────────────────────────────────────┤
│  Layer 4: Disease/Phenotype                                 │
│  OMIM → Orphanet → Open Targets                             │
├─────────────────────────────────────────────────────────────┤
│  Layer 5: Knowledge Integration                             │
│  Open Targets Platform (aggregates all above)               │
└─────────────────────────────────────────────────────────────┘

Key Insight: Open Targets as a "Meta-API"

Open Targets Platform is emerging as the single most valuable API for drug discovery because it:

Aggregates ChEMBL, UniProt, Ensembl, and disease databases into a unified interface
Provides a modern GraphQL API for flexible querying
Maps targets to diseases with genetic and experimental evidence
Enables drug repurposing by connecting approved drugs to new indications
Already does the integration work you'd otherwise do manually

This makes Open Targets an excellent starting point for AI-driven drug discovery workflows.
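
As a rough illustration of the GraphQL access pattern, the sketch below builds (but does not send) a request for the diseases associated with one target. The endpoint URL and field names reflect our reading of the public Open Targets schema and should be checked against the live schema; `build_request` and the query name are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request

# Public GraphQL endpoint of the Open Targets Platform (assumed current).
OPEN_TARGETS_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Field names are an assumption based on the public schema; verify before use.
ASSOCIATED_DISEASES_QUERY = """
query TargetDiseases($ensemblId: String!, $size: Int!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    associatedDiseases(page: {index: 0, size: $size}) {
      rows {
        disease { id name }
        score
      }
    }
  }
}
"""


def build_request(ensembl_id: str, size: int = 5) -> urllib.request.Request:
    """Build a POST request for target-disease associations without sending it."""
    payload = {
        "query": ASSOCIATED_DISEASES_QUERY,
        "variables": {"ensemblId": ensembl_id, "size": size},
    }
    return urllib.request.Request(
        OPEN_TARGETS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# ENSG00000141510 is the Ensembl gene ID for TP53.
request = build_request("ENSG00000141510")
```

Passing `request` to `urllib.request.urlopen` would perform the call; a single query like this replaces separate lookups against the gene, target, and disease layers.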

Planned MCP Servers

Tier 0: Strategic Priority (Drug Discovery Core)

Server	API	Status	Description
`chembl-mcp`	ChEMBL	✅ Complete	15M+ bioactivity data points, 1.9M compounds - 112 tests passing (spec)
`opentargets-mcp`	Open Targets	✅ Complete	Target-disease associations, drug repurposing - 9 tests passing (spec)
`drugbank-mcp`	DrugBank	⛔ BLOCKED	500K+ drugs, clinical interactions - 33 unit tests (requires commercial API key) (spec)

Tier 1: Foundation (Gene/Protein Layer)

Server	API	Status	Description
`hgnc-mcp`	HGNC	✅ Complete	Gene nomenclature, symbol resolution - 21 tests passing (spec)
`uniprot-mcp`	UniProt	✅ Complete	Protein search & lookup (fuzzy-to-fact, cross-DB, error recovery) - 29 tests passing (spec)
`string-mcp`	STRING	✅ Complete	Protein-protein interactions with evidence scores - 12 tests passing (spec)
`biogrid-mcp`	BioGRID	✅ Complete	Genetic/protein interactions - 11 tests passing (spec)

Tier 2: Pharmacology & Interactions

Server	API	Status	Description
`iuphar-mcp`	GtoPdb	✅ Complete	Pharmacological targets, ligand-receptor interactions - 59 tests passing (spec)
`stitch-mcp`	STITCH	Planned	Chemical-protein interactions
`pubchem-mcp`	PubChem	✅ Complete	Chemical structures, cross-references - 100 tests passing (spec)

Tier 3: Pathways & Clinical Trials

Server	API	Status	Description
`wikipathways-mcp`	WikiPathways	✅ Complete	Biological pathways - 4 tools (search, get pathway, gene pathways, components) (spec)
`clinicaltrials-mcp`	ClinicalTrials.gov	✅ Complete	Clinical trial data - 3 tools, 13 unit tests (spec)
`kegg-mcp`	KEGG	Planned	Metabolic/signaling pathways
`omim-mcp`	OMIM	Planned	Genetic disorders
`orphanet-mcp`	Orphanet	Planned	Rare diseases

Tier 4: Genomics & Identifiers

Server	API	Status	Description
`ensembl-mcp`	Ensembl	✅ Complete	Genomic annotations, genes, transcripts - 86 tests passing (spec)
`entrez-mcp`	NCBI/Entrez	✅ Complete	NCBI gene database, PubMed links - 58 tests passing (spec)

Summary

Completion Status:

✅ 12 servers operational - HGNC, UniProt, ChEMBL, Open Targets, STRING, BioGRID, IUPHAR/GtoPdb, PubChem, Ensembl, Entrez, WikiPathways, ClinicalTrials.gov
⛔ 1 server blocked - DrugBank (requires commercial API key)
🔜 4 servers planned - STITCH, KEGG, OMIM, Orphanet

Test Coverage:

Total integration tests: 500+ passing
Total unit tests: 100+ passing
Coverage: All 12 operational servers have comprehensive test suites
Gateway server: 34+ MCP tools from 12 databases

Agentic Architecture (Team of Tools)

We are building a Team of Agents where each specialized tool plays a role in the scientific reasoning loop.

graph TD
    subgraph "Reasoning Layer (Strategies)"
        Literature[Literature Agent]
        Validator[Validation Agent]
    end

    subgraph "Structured Truth Layer (Life Sciences MCP - 12 Operational)"
        HGNC["HGNC ✅<br/>(Gene Identity)"]
        UniProt["UniProt ✅<br/>(Protein Function)"]
        OpenTargets["Open Targets ✅<br/>(Disease)"]
        ChEMBL["ChEMBL ✅<br/>(Compounds)"]
        STRING["STRING ✅<br/>(Interactions)"]
        WikiPathways["WikiPathways ✅<br/>(Pathways)"]
        ClinicalTrials["ClinicalTrials.gov ✅<br/>(Trials)"]
    end

    subgraph "Unstructured Knowledge Layer"
        PubMed["(PubMed/BioRxiv)"]
        FullText["(PDFs/Figures)"]
    end

    Literature -->|Reads| PubMed
    Literature -->|Extracts Claims| FullText
    Literature -->|Queries| Validator

    Validator -->|Grounds Gene Terms| HGNC
    Validator -->|Validates Proteins| UniProt
    Validator -->|Checks Disease Evidence| OpenTargets
    Validator -->|Finds Compounds| ChEMBL
    Validator -->|Discovers Interactions| STRING
    Validator -->|Analyzes Pathways| WikiPathways
    Validator -->|Finds Clinical Trials| ClinicalTrials

    style Validator fill:#e1f5fe,stroke:#01579b
    style Literature fill:#f3e5f5,stroke:#4a148c

The "Structured Truth Layer"

This repository (lifesciences-research) acts as the Grounding Engine. When a Literature Agent reads a paper and claims "Drug X targets Protein Y," it uses this MCP to:

Resolve "Protein Y" to a precise UniProt ID, collapsing synonyms to one canonical identifier.
Validate whether "Drug X" actually binds to "Protein Y" according to ChEMBL/Open Targets.
Harden the unstructured text into a structured Knowledge Graph.
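
The three steps above can be sketched with in-memory stand-ins. Everything here is hypothetical: `SYNONYMS` stands in for an HGNC/UniProt lookup, `KNOWN_BINDINGS` for ChEMBL/Open Targets evidence (the "Drug X" entry is a placeholder, not real data), and `Triple` for a knowledge-graph edge. The real servers would replace the dictionaries with MCP tool calls.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical synonym table standing in for a UniProt/HGNC resolution call.
SYNONYMS = {
    "p53": "UniProtKB:P04637",
    "tp53": "UniProtKB:P04637",
}

# Placeholder evidence set standing in for ChEMBL/Open Targets; not real data.
KNOWN_BINDINGS = {("drug x", "UniProtKB:P04637")}


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge extracted from a literature claim."""
    subject: str
    predicate: str
    obj: str
    validated: bool


def resolve_protein(mention: str) -> Optional[str]:
    """Step 1: map a free-text protein mention to a canonical ID."""
    return SYNONYMS.get(mention.strip().lower())


def ground_claim(drug: str, protein_mention: str) -> Optional[Triple]:
    """Steps 2 and 3: check the claim against evidence and emit a triple."""
    protein_id = resolve_protein(protein_mention)
    if protein_id is None:
        return None  # Cannot ground an unresolved mention.
    validated = (drug.strip().lower(), protein_id) in KNOWN_BINDINGS
    return Triple(drug, "targets", protein_id, validated)


grounded = ground_claim("Drug X", "p53")
```

An unresolved mention returns `None` rather than a guessed identifier, so ungrounded claims never enter the graph.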

Quick Start

# Install dependencies
uv sync --extra dev

# =============================================================================
# Run Individual MCP Servers
# =============================================================================

# Tier 0: Drug Discovery Core
uv run fastmcp run src/lifesciences_mcp/servers/chembl.py        # ChEMBL compounds & bioactivity (✅ 112 tests)
uv run fastmcp run src/lifesciences_mcp/servers/opentargets.py   # Target-disease associations (✅ 9 tests)
uv run fastmcp run src/lifesciences_mcp/servers/drugbank.py      # Drug interactions (⛔ requires API key)

# Tier 1: Gene/Protein Foundation
uv run fastmcp run src/lifesciences_mcp/servers/hgnc.py          # Gene nomenclature (✅ 21 tests)
uv run fastmcp run src/lifesciences_mcp/servers/uniprot.py       # Protein search & lookup (✅ 29 tests)
uv run fastmcp run src/lifesciences_mcp/servers/string.py        # Protein-protein interactions (✅ 12 tests)
uv run fastmcp run src/lifesciences_mcp/servers/biogrid.py       # Genetic/protein interactions (✅ 11 tests)

# Tier 2: Pharmacology & Interactions
uv run fastmcp run src/lifesciences_mcp/servers/iuphar.py        # Pharmacological targets (✅ 59 tests)
uv run fastmcp run src/lifesciences_mcp/servers/pubchem.py       # Chemical structures (✅ 100 tests)

# Tier 3: Pathways & Clinical Trials
uv run fastmcp run src/lifesciences_mcp/servers/wikipathways.py  # Biological pathways (✅ 4 tools)
uv run fastmcp run src/lifesciences_mcp/servers/clinicaltrials.py # Clinical trials (✅ 3 tools, 13 tests)

# Tier 4: Genomics & Identifiers
uv run fastmcp run src/lifesciences_mcp/servers/ensembl.py       # Genomic annotations (✅ 86 tests)
uv run fastmcp run src/lifesciences_mcp/servers/entrez.py        # NCBI gene database (✅ 58 tests)

# =============================================================================
# Run Tests
# =============================================================================

# Run all tests
uv run pytest tests/ -v

# Run integration tests only
uv run pytest -m integration -v

# Test specific server
uv run pytest tests/integration/test_hgnc_api.py -v -m integration         # 7 tests ✅
uv run pytest tests/integration/test_uniprot_api.py -v -m integration      # 12 tests ✅
uv run pytest tests/integration/test_chembl_api.py -v -m integration       # 50+ tests ✅
uv run pytest tests/integration/test_opentargets_api.py -v -m integration  # 9 tests ✅
uv run pytest tests/integration/test_drugbank_api.py -v -m integration     # 7 tests (⛔ skipped without API key)
uv run pytest tests/integration/test_string_api.py -v -m integration       # 12 tests ✅
uv run pytest tests/integration/test_biogrid_api.py -v -m integration      # 11 tests ✅
uv run pytest tests/integration/test_iuphar_api.py -v -m integration       # 48 tests ✅
uv run pytest tests/integration/test_pubchem_api.py -v -m integration      # 19 tests ✅
uv run pytest tests/integration/test_ensembl_api.py -v -m integration      # 24 tests ✅
uv run pytest tests/integration/test_entrez_api.py -v -m integration       # 20 tests ✅
uv run pytest tests/integration/test_wikipathways_api.py -v -m integration # Integration tests ✅
uv run pytest tests/unit/test_clinicaltrials_client.py -v                  # 13 unit tests ✅

Example Usage

HGNC Server (Gene Nomenclature)

import asyncio

from lifesciences_mcp.clients import HGNCClient

async def main():
    async with HGNCClient() as client:
        # Fuzzy search for genes
        results = await client.search_genes("BRCA")
        # Returns: PaginationEnvelope[SearchCandidate]

        # Strict lookup by HGNC CURIE
        gene = await client.get_gene("HGNC:1100")  # BRCA1
        # Returns: Gene with cross_references to UniProt, Ensembl, OMIM, etc.

# `async with` must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())

UniProt Server (Protein Search & Lookup)

import asyncio

from lifesciences_mcp.clients import UniProtClient

async def main():
    async with UniProtClient() as client:
        # Phase 1: Fuzzy search for proteins
        results = await client.search_proteins("p53 tumor suppressor", page_size=10)
        # Returns: PaginationEnvelope[ProteinSearchCandidate]

        # Get top candidate
        top_candidate = results.items[0]
        print(f"{top_candidate.id}: {top_candidate.name} ({top_candidate.organism})")
        # Output: UniProtKB:P04637: Cellular tumor antigen p53 (Homo sapiens)

        # Phase 2: Strict lookup with complete protein record
        protein = await client.get_protein(top_candidate.id)
        # Returns: Protein with cross_references to HGNC, Ensembl, RefSeq, PDB, OMIM, etc.
        print(f"Function: {protein.function[:100]}...")
        print(f"Cross-refs: HGNC:{protein.cross_references.hgnc}, Ensembl:{protein.cross_references.ensembl_transcript}")

# `async with` must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())

PubChem Server (Chemical Compound Search & Lookup)

import asyncio

from lifesciences_mcp.clients import PubChemClient

async def main():
    async with PubChemClient() as client:
        # Phase 1: Fuzzy search for compounds
        results = await client.search_compounds("aspirin", page_size=10)
        # Returns: PaginationEnvelope[PubChemSearchCandidate]

        # Get top candidate
        top_candidate = results.items[0]
        print(f"{top_candidate.id}: {top_candidate.name} ({top_candidate.molecular_formula})")
        # Output: PubChem:CID2244: Aspirin (C9H8O4)

        # Phase 2: Strict lookup with complete compound record
        compound = await client.get_compound(top_candidate.id)
        # Returns: PubChemCompound with SMILES, InChI, cross_references
        print(f"SMILES: {compound.canonical_smiles}")
        print(f"InChI: {compound.inchi[:50]}...")
        print(f"Cross-refs: ChEMBL:{compound.cross_references.get('chembl')}, DrugBank:{compound.cross_references.get('drugbank')}")

        # Token-efficient slim mode
        compound_slim = await client.get_compound("PubChem:CID2244", slim=True)
        # Returns only: id, name, molecular_formula (~20 tokens vs ~115-300)

# `async with` must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())

WikiPathways Server (Biological Pathways)

import asyncio

from lifesciences_mcp.clients import WikiPathwaysClient

async def main():
    async with WikiPathwaysClient() as client:
        # Phase 1: Search for pathways
        results = await client.search_pathways("EGFR signaling", species="Homo sapiens")
        print(f"Found {len(results.items)} pathways")

        # Get pathway details
        pathway = await client.get_pathway(results.items[0].id)
        print(f"Pathway: {pathway.name}")
        print(f"Components: {pathway.component_counts.genes} genes, {pathway.component_counts.metabolites} metabolites")

        # Find pathways for a specific gene
        gene_pathways = await client.get_pathways_for_gene("EGFR", species="Homo sapiens")
        print(f"EGFR appears in {len(gene_pathways.items)} pathways")

        # Get pathway components (graph structure)
        components = await client.get_pathway_components(pathway.id)
        print(f"Data nodes: {len(components.data_nodes)}")
        print(f"Interactions: {len(components.interactions)}")

# `async with` must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())

ClinicalTrials.gov Server (Clinical Trials)

import asyncio

from lifesciences_mcp.clients import ClinicalTrialsClient

async def main():
    async with ClinicalTrialsClient() as client:
        # Phase 1: Search clinical trials
        results = await client.search_trials(
            query="cancer immunotherapy",
            condition="lung cancer",
            phase="PHASE3",
            status="RECRUITING"
        )

        # Get trial details
        trial = await client.get_trial(results.items[0].id)
        print(f"Trial: {trial.title}")
        print(f"Phase: {trial.phase}")
        print(f"Status: {trial.status}")
        print(f"Enrollment: {trial.enrollment}")

        # Get trial locations
        locations = await client.get_trial_locations(trial.id)
        print(f"Trial sites: {len(locations)}")
        for loc in locations[:3]:
            print(f"  - {loc.facility_name}, {loc.city}, {loc.state}")

# `async with` must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())

MCP Tool Interface

All servers expose their functionality as MCP tools:

# HGNC tools
await mcp.call_tool("search_genes", {"query": "BRCA", "page_size": 5})
await mcp.call_tool("get_gene", {"hgnc_id": "HGNC:1100"})

# UniProt tools
await mcp.call_tool("search_proteins", {"query": "insulin", "page_size": 10})
await mcp.call_tool("get_protein", {"uniprot_id": "UniProtKB:P04637", "slim": False})

# PubChem tools
await mcp.call_tool("search_compounds", {"query": "aspirin", "page_size": 10})
await mcp.call_tool("get_compound", {"pubchem_id": "PubChem:CID2244", "slim": False})

# WikiPathways tools
await mcp.call_tool("search_pathways", {"query": "EGFR signaling", "species": "Homo sapiens"})
await mcp.call_tool("get_pathway", {"pathway_id": "WP:WP4868"})
await mcp.call_tool("get_pathways_for_gene", {"gene_symbol": "EGFR", "species": "Homo sapiens"})
await mcp.call_tool("get_pathway_components", {"pathway_id": "WP:WP4868"})

# ClinicalTrials.gov tools
await mcp.call_tool("search_trials", {
    "query": "cancer immunotherapy",
    "condition": "lung cancer",
    "phase": "PHASE3",
    "status": "RECRUITING"
})
await mcp.call_tool("get_trial", {"nct_id": "NCT:00461032"})
await mcp.call_tool("get_trial_locations", {"nct_id": "NCT:00461032"})

Architecture

New to this project? Read Platform Engineering for AI-Augmented Development first to understand our approach to AI-assisted development.

For binding technical specifications, see ADR-001 v1.2.

Design Principles

Microservices: One MCP server per API/database for modularity
Async-first: All tools use async/await for network calls
Pydantic models: Strong typing for API responses
Caching: Redis or in-memory caching for frequent lookups
Rate limiting: Respect upstream API rate limits
identifiers.org URIs: Standard URI format for biological identifiers
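
To make the async-first and caching principles concrete, here is a minimal in-memory sketch. `TTLCache`, `get_or_fetch`, and `fake_upstream` are hypothetical names for illustration and are not taken from this repository; a Redis-backed cache would follow the same shape with a different storage layer.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Tuple


class TTLCache:
    """In-memory cache for async lookups, keyed by identifier."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    async def get_or_fetch(
        self, key: str, fetch: Callable[[str], Awaitable[Any]]
    ) -> Any:
        hit = self._entries.get(key)
        # Serve from cache while the entry is younger than the TTL.
        if hit is not None and time.monotonic() - hit[0] < self._ttl:
            return hit[1]
        value = await fetch(key)
        self._entries[key] = (time.monotonic(), value)
        return value


calls: List[str] = []  # Records every simulated upstream request.


async def fake_upstream(key: str) -> dict:
    """Stand-in for a network call to an upstream API."""
    calls.append(key)
    return {"id": key}


async def demo() -> list:
    cache = TTLCache(ttl_seconds=60.0)
    first = await cache.get_or_fetch("HGNC:1100", fake_upstream)
    second = await cache.get_or_fetch("HGNC:1100", fake_upstream)
    return [first, second]


results = asyncio.run(demo())
```

The second lookup is served from memory, so `calls` records only one upstream request. This sketch does not deduplicate concurrent misses for the same key; two simultaneous callers would both fetch.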

Data Standards

Following patterns from nsclc-pathways:

identifiers.org URIs: http://identifiers.org/hgnc/1100 for BRCA1
JSON-LD: Linked data format for semantic interoperability
GraphML: Network export format for visualization tools
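
As an illustration of the first two standards, the sketch below converts a CURIE into the URI form shown above and wraps it in a small JSON-LD document. The helper names and the choice of `rdfs:label` are assumptions for this example, not the project's actual schema.

```python
import json


def to_identifiers_org_uri(curie: str) -> str:
    """Convert a CURIE such as 'HGNC:1100' to http://identifiers.org/hgnc/1100."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"http://identifiers.org/{prefix.lower()}/{local_id}"


def to_json_ld(curie: str, label: str) -> dict:
    """Wrap an identifier and its label in a minimal JSON-LD node."""
    return {
        "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
        "@id": to_identifiers_org_uri(curie),
        "rdfs:label": label,
    }


doc = to_json_ld("HGNC:1100", "BRCA1")
print(json.dumps(doc, indent=2))
```

Rejecting strings without a prefix mirrors the strict-lookup behavior described elsewhere in this README: a bare symbol like `BRCA1` is not an identifier.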

Configuration

Environment Variables

Most life sciences APIs are public and don't require authentication. However, three servers use API keys:

# Optional - BioGRID (free registration)
BIOGRID_API_KEY=your-key-here  # Get from https://thebiogrid.org/

# Optional - NCBI (free registration)
NCBI_API_KEY=your-key-here  # Get from https://account.ncbi.nlm.nih.gov/settings/

# Optional - DrugBank (commercial license required)
DRUGBANK_API_KEY=your-key-here  # Get from https://go.drugbank.com/

Note:

BioGRID: Free API key available with registration at https://thebiogrid.org/
NCBI: Free API key available with registration at https://account.ncbi.nlm.nih.gov/settings/
DrugBank: Requires commercial license. DrugBank server is excluded from the gateway server and requires manual setup.
All other 10 servers work without authentication
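
A small startup check can report which of these keys are present. This is a sketch only: the `key_status` helper and the mapping of server names to variables (in particular `entrez` to `NCBI_API_KEY`) are assumptions, not code from this repository.

```python
import os
from typing import Dict, Mapping

# Assumed mapping from server name to the environment variable it reads.
KEY_VARS = {
    "biogrid": "BIOGRID_API_KEY",
    "entrez": "NCBI_API_KEY",
    "drugbank": "DRUGBANK_API_KEY",
}


def key_status(env: Mapping[str, str]) -> Dict[str, bool]:
    """Return, per server, whether a non-empty API key is configured."""
    return {
        server: bool(env.get(var, "").strip())
        for server, var in KEY_VARS.items()
    }


status = key_status(os.environ)
for server, present in status.items():
    print(f"{server}: {'key set' if present else 'no key'}")
```

Taking the environment as a parameter keeps the check easy to test with a plain dictionary.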

Development

# Install with dev dependencies
uv sync --extra dev

# Run tests
uv run pytest tests/ -v

# Lint and format
uv run ruff check --fix . && uv run ruff format .

# Type checking
uv run pyright

Testing with FastMCP

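import pytest
from fastmcp import Client

# Assumes pytest-asyncio with asyncio_mode = "auto", so async fixtures and tests run unmarked

@pytest.fixture
async def client():
    # Server modules live under src/lifesciences_mcp/servers/ (see Quick Start)
    from lifesciences_mcp.servers.hgnc import mcp
    async with Client(mcp) as client:
        yield client

async def test_get_gene(client):
    # Tool name and argument match the MCP Tool Interface section
    result = await client.call_tool("get_gene", {"hgnc_id": "HGNC:1100"})
    # The result wrapper differs across FastMCP versions, so check the serialized payload
    assert "HGNC:1100" in str(result)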

Quality Assurance

We maintain a comprehensive list of Test Scenarios covering data model validation, error handling, and edge cases.

Example: Search Candidate Validation

Scenario	Check	Expected Outcome
Valid	`id="HGNC:1100", score=1.0`	Object created
Invalid Format	`id="BRCA1"` (missing prefix)	`ValidationError`
Out of Bounds	`score=1.5`	`ValidationError`

See docs/test_scenarios.md for the full list.
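
The three scenarios in the table can be approximated with a standard-library stand-in. The project uses Pydantic models, which raise `ValidationError`; this sketch uses a dataclass that raises `ValueError` instead so it runs without extra dependencies. The `HGNC:<digits>` pattern and the 0.0 to 1.0 score range are assumptions inferred from the table.

```python
import re
from dataclasses import dataclass

# Assumed ID format: the HGNC prefix followed by digits.
CURIE_PATTERN = re.compile(r"^HGNC:\d+$")


@dataclass(frozen=True)
class SearchCandidate:
    """Stand-in for the project's Pydantic search candidate model."""
    id: str
    score: float

    def __post_init__(self) -> None:
        if not CURIE_PATTERN.match(self.id):
            raise ValueError(f"id must look like 'HGNC:1100', got {self.id!r}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be within [0, 1], got {self.score}")


def is_valid(id: str, score: float) -> bool:
    """Return True when a candidate can be constructed from the inputs."""
    try:
        SearchCandidate(id=id, score=score)
    except ValueError:
        return False
    return True


# Valid, Invalid Format, Out of Bounds (in table order).
outcomes = [
    is_valid("HGNC:1100", 1.0),
    is_valid("BRCA1", 1.0),
    is_valid("HGNC:1100", 1.5),
]
```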

🧠 Intelligence Included: Pre-Configured Agent Skills

This repository includes a .claude directory containing optimized system prompts and skill definitions used to generate our research outputs.

Clinical Trials Skill: Specialized instructions for navigating ClinicalTrials.gov, filtering by phase/status, and extracting inclusion criteria.
Genomics Skill: Best practices for resolving gene symbols to Ensembl/HGNC IDs before querying.
Graph Builder Skill: Instructions for constructing Neo4j knowledge graphs from unstructured literature.

🔬 Research & Validation

We use these tools to perform real-world analyses. Outputs are validated for factual accuracy; the validation status of each study is listed in the table below.

Study	Description	Validation
High Commercialization Trials	Identifying trials with high probability of FDA approval.	✅ Validation Report
Health Emergencies 2026	Predictive analysis of emerging pathogen vectors.	N/A
NSCLC Drug Repurposing	ARID1A synthetic lethality pathways.	✅ Validation Report

References

Related Projects and Showcases

Showcases:

NSCLC Drug Repurposing Showcase (docs/showcases/nsclc-drug-repurposing/) - Complete end-to-end workflow demonstrating WikiPathways and ClinicalTrials.gov integration for non-small cell lung cancer research

Related Projects:

nsclc-pathways - NSCLC signaling pathway analysis (original inspiration for WikiPathways integration)
kg_rememberall - Knowledge graph construction from text
FastMCP Documentation

Architecture Documentation:

Architecture - Complete architecture analysis covering 13,505 lines of code across 56 Python modules
ADR-001 v1.2 - Binding architecture specification (Fuzzy-to-Fact protocol)
Component Inventory - Detailed component reference
API Reference - Usage guide with examples

License

MIT

Project Tracking

Linear Project: Life Sciences MCP Server
Discovery Issue: AGE-65
