Life Sciences MCP
FastMCP wrappers for essential life sciences APIs and datasets. A microservices-based approach to accelerate scientific research by providing MCP server access to biological databases, gene nomenclature services, protein interaction networks, and drug-target databases.
Vision
Enable AI agents to seamlessly query the world's most important life sciences databases through the Model Context Protocol (MCP), accelerating drug discovery, drug repurposing, and biomedical research.
Current Status: 12 MCP servers operational covering genes (HGNC, Ensembl, Entrez), proteins (UniProt, STRING, BioGRID), compounds (ChEMBL, PubChem), pharmacology (IUPHAR/GtoPdb), targets (Open Targets), pathways (WikiPathways), and clinical trials (ClinicalTrials.gov).
The Modern Drug Discovery Stack (2025)
The current best practice in computational drug discovery is an integrative approach using programmatic APIs across multiple data layers:
┌─────────────────────────────────────────────────────────────┐
│ DRUG REPURPOSING STACK │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: Chemical/Drug │
│ ChEMBL → PubChem → DrugBank │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: Target/Protein │
│ UniProt → STRING → IUPHAR/GtoPdb → STITCH │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Gene/Genomics │
│ HGNC → Ensembl → NCBI/Entrez │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Disease/Phenotype │
│ OMIM → Orphanet → Open Targets │
├─────────────────────────────────────────────────────────────┤
│ Layer 5: Knowledge Integration │
│ Open Targets Platform (aggregates all above) │
└─────────────────────────────────────────────────────────────┘
Key Insight: Open Targets as a "Meta-API"
Open Targets Platform is emerging as the single most valuable API for drug discovery because it:
- Aggregates ChEMBL, UniProt, Ensembl, and disease databases into a unified interface
- Provides a modern GraphQL API for flexible querying
- Maps targets to diseases with genetic and experimental evidence
- Enables drug repurposing by connecting approved drugs to new indications
- Already does the integration work you'd otherwise do manually
This makes Open Targets an excellent starting point for AI-driven drug discovery workflows.
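As a rough illustration of the GraphQL access pattern described above, the sketch below builds the JSON body for a target lookup without sending it. The endpoint URL and the `target`, `id`, and `approvedSymbol` fields are taken from Open Targets' public documentation rather than from this repository, so treat them as assumptions that may change between releases. `build_target_request` is a hypothetical helper, not part of `opentargets-mcp`.

```python
import json

# Public Open Targets Platform GraphQL endpoint (assumption: verify against the official docs).
OPEN_TARGETS_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Query text modeled on the public schema; field names may change between releases.
TARGET_QUERY = """
query targetInfo($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
  }
}
"""


def build_target_request(ensembl_id: str) -> dict:
    """Build the JSON body for a GraphQL POST request (no network call is made here)."""
    if not ensembl_id.startswith("ENSG"):
        raise ValueError(f"expected an Ensembl gene ID, got {ensembl_id!r}")
    return {"query": TARGET_QUERY, "variables": {"ensemblId": ensembl_id}}


payload = build_target_request("ENSG00000012048")  # Ensembl gene ID for BRCA1
body = json.dumps(payload)  # this string would be POSTed to OPEN_TARGETS_URL
print(payload["variables"])
```

Keeping the query text separate from the variables means one query string can be reused for many targets.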
Planned MCP Servers
Tier 0: Strategic Priority (Drug Discovery Core)
| Server | API | Status | Description |
|---|---|---|---|
| chembl-mcp | ChEMBL | ✅ Complete | 15M+ bioactivity data points, 1.9M compounds - 112 tests passing (spec) |
| opentargets-mcp | Open Targets | ✅ Complete | Target-disease associations, drug repurposing - 9 tests passing (spec) |
| drugbank-mcp | DrugBank | ⛔ BLOCKED | 500K+ drugs, clinical interactions - 33 unit tests (requires commercial API key) (spec) |
Tier 1: Foundation (Gene/Protein Layer)
| Server | API | Status | Description |
|---|---|---|---|
| hgnc-mcp | HGNC | ✅ Complete | Gene nomenclature, symbol resolution - 21 tests passing (spec) |
| uniprot-mcp | UniProt | ✅ Complete | Protein search & lookup (fuzzy-to-fact, cross-DB, error recovery) - 29 tests passing (spec) |
| string-mcp | STRING | ✅ Complete | Protein-protein interactions with evidence scores - 12 tests passing (spec) |
| biogrid-mcp | BioGRID | ✅ Complete | Genetic/protein interactions - 11 tests passing (spec) |
Tier 2: Pharmacology & Interactions
| Server | API | Status | Description |
|---|---|---|---|
| iuphar-mcp | GtoPdb | ✅ Complete | Pharmacological targets, ligand-receptor interactions - 59 tests passing (spec) |
| stitch-mcp | STITCH | Planned | Chemical-protein interactions |
| pubchem-mcp | PubChem | ✅ Complete | Chemical structures, cross-references - 100 tests passing (spec) |
Tier 3: Pathways & Clinical Trials
| Server | API | Status | Description |
|---|---|---|---|
| wikipathways-mcp | WikiPathways | ✅ Complete | Biological pathways - 4 tools (search, get pathway, gene pathways, components) (spec) |
| clinicaltrials-mcp | ClinicalTrials.gov | ✅ Complete | Clinical trial data - 3 tools, 13 unit tests (spec) |
| kegg-mcp | KEGG | Planned | Metabolic/signaling pathways |
| omim-mcp | OMIM | Planned | Genetic disorders |
| orphanet-mcp | Orphanet | Planned | Rare diseases |
Tier 4: Genomics & Identifiers
| Server | API | Status | Description |
|---|---|---|---|
| ensembl-mcp | Ensembl | ✅ Complete | Genomic annotations, genes, transcripts - 86 tests passing (spec) |
| entrez-mcp | NCBI/Entrez | ✅ Complete | NCBI gene database, PubMed links - 58 tests passing (spec) |
Summary
Completion Status:
- ✅ 12 servers operational - HGNC, UniProt, ChEMBL, Open Targets, STRING, BioGRID, IUPHAR/GtoPdb, PubChem, Ensembl, Entrez, WikiPathways, ClinicalTrials.gov
- ⛔ 1 server blocked - DrugBank (requires commercial API key)
- 🔜 4 servers planned - STITCH, KEGG, OMIM, Orphanet
Test Coverage:
- Total integration tests: 500+ passing
- Total unit tests: 100+ passing
- Coverage: All 12 operational servers have comprehensive test suites
- Gateway server: 34+ MCP tools from 12 databases
Agentic Architecture (Team of Tools)
We are building a Team of Agents where each specialized tool plays a role in the scientific reasoning loop.
graph TD
subgraph "Reasoning Layer (Strategies)"
Literature[Literature Agent]
Validator[Validation Agent]
end
subgraph "Structured Truth Layer (Life Sciences MCP - 12 Operational)"
HGNC["HGNC ✅<br/>(Gene Identity)"]
UniProt["UniProt ✅<br/>(Protein Function)"]
OpenTargets["Open Targets ✅<br/>(Disease)"]
ChEMBL["ChEMBL ✅<br/>(Compounds)"]
STRING["STRING ✅<br/>(Interactions)"]
WikiPathways["WikiPathways ✅<br/>(Pathways)"]
ClinicalTrials["ClinicalTrials.gov ✅<br/>(Trials)"]
end
subgraph "Unstructured Knowledge Layer"
PubMed["(PubMed/BioRxiv)"]
FullText["(PDFs/Figures)"]
end
Literature -->|Reads| PubMed
Literature -->|Extracts Claims| FullText
Literature -->|Queries| Validator
Validator -->|Grounds Gene Terms| HGNC
Validator -->|Validates Proteins| UniProt
Validator -->|Checks Disease Evidence| OpenTargets
Validator -->|Finds Compounds| ChEMBL
Validator -->|Discovers Interactions| STRING
Validator -->|Analyzes Pathways| WikiPathways
Validator -->|Finds Clinical Trials| ClinicalTrials
style Validator fill:#e1f5fe,stroke:#01579b
style Literature fill:#f3e5f5,stroke:#4a148c
The "Structured Truth Layer"
This repository (lifesciences-research) acts as the Grounding Engine. When a Literature Agent reads a paper and claims "Drug X targets Protein Y," it uses this MCP to:
- Resolve "Protein Y" to a precise UniProt ID (resolving synonyms).
- Validate if "Drug X" actually binds to "Protein Y" in ChEMBL/OpenTargets.
- Harden the unstructured text into a structured Knowledge Graph.
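The three grounding steps above can be sketched with in-memory lookup tables standing in for the real UniProt and ChEMBL/Open Targets calls. Everything here is hypothetical and for illustration only: the `ground_claim` function, the `Triple` type, and the toy tables are not part of this repository, and the single binding listed is sample data, not a query result.

```python
from dataclasses import dataclass
from typing import Optional

# Toy tables standing in for live database lookups (illustrative data only).
DRUG_IDS = {"aspirin": "PubChem:CID2244"}
PROTEIN_IDS = {"cox-1": "UniProtKB:P23219", "ptgs1": "UniProtKB:P23219"}
KNOWN_BINDINGS = {("PubChem:CID2244", "UniProtKB:P23219")}


@dataclass(frozen=True)
class Triple:
    """One edge of a knowledge graph: subject, predicate, object, plus a validation flag."""
    subject: str
    predicate: str
    obj: str
    validated: bool


def ground_claim(drug: str, protein: str) -> Optional[Triple]:
    """Resolve both names to identifiers, then check the claim against known bindings."""
    drug_id = DRUG_IDS.get(drug.lower())
    protein_id = PROTEIN_IDS.get(protein.lower())
    if drug_id is None or protein_id is None:
        return None  # an unresolved entity cannot be grounded
    validated = (drug_id, protein_id) in KNOWN_BINDINGS
    return Triple(drug_id, "targets", protein_id, validated)


edge = ground_claim("Aspirin", "PTGS1")  # synonym resolves to the same protein ID as "COX-1"
print(edge)
```

Returning `None` for unresolved names keeps ungrounded claims out of the graph instead of storing them under free-text labels.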
Quick Start
# Install dependencies
uv sync --extra dev
# =============================================================================
# Run Individual MCP Servers
# =============================================================================
# Tier 0: Drug Discovery Core
uv run fastmcp run src/lifesciences_mcp/servers/chembl.py # ChEMBL compounds & bioactivity (✅ 112 tests)
uv run fastmcp run src/lifesciences_mcp/servers/opentargets.py # Target-disease associations (✅ 9 tests)
uv run fastmcp run src/lifesciences_mcp/servers/drugbank.py # Drug interactions (⛔ requires API key)
# Tier 1: Gene/Protein Foundation
uv run fastmcp run src/lifesciences_mcp/servers/hgnc.py # Gene nomenclature (✅ 21 tests)
uv run fastmcp run src/lifesciences_mcp/servers/uniprot.py # Protein search & lookup (✅ 29 tests)
uv run fastmcp run src/lifesciences_mcp/servers/string.py # Protein-protein interactions (✅ 12 tests)
uv run fastmcp run src/lifesciences_mcp/servers/biogrid.py # Genetic/protein interactions (✅ 11 tests)
# Tier 2: Pharmacology & Interactions
uv run fastmcp run src/lifesciences_mcp/servers/iuphar.py # Pharmacological targets (✅ 59 tests)
uv run fastmcp run src/lifesciences_mcp/servers/pubchem.py # Chemical structures (✅ 100 tests)
# Tier 3: Pathways & Clinical Trials
uv run fastmcp run src/lifesciences_mcp/servers/wikipathways.py # Biological pathways (✅ 4 tools)
uv run fastmcp run src/lifesciences_mcp/servers/clinicaltrials.py # Clinical trials (✅ 3 tools, 13 tests)
# Tier 4: Genomics & Identifiers
uv run fastmcp run src/lifesciences_mcp/servers/ensembl.py # Genomic annotations (✅ 86 tests)
uv run fastmcp run src/lifesciences_mcp/servers/entrez.py # NCBI gene database (✅ 58 tests)
# =============================================================================
# Run Tests
# =============================================================================
# Run all tests
uv run pytest tests/ -v
# Run integration tests only
uv run pytest -m integration -v
# Test specific server
uv run pytest tests/integration/test_hgnc_api.py -v -m integration # 7 tests ✅
uv run pytest tests/integration/test_uniprot_api.py -v -m integration # 12 tests ✅
uv run pytest tests/integration/test_chembl_api.py -v -m integration # 50+ tests ✅
uv run pytest tests/integration/test_opentargets_api.py -v -m integration # 9 tests ✅
uv run pytest tests/integration/test_drugbank_api.py -v -m integration # 7 tests (⛔ skipped without API key)
uv run pytest tests/integration/test_string_api.py -v -m integration # 12 tests ✅
uv run pytest tests/integration/test_biogrid_api.py -v -m integration # 11 tests ✅
uv run pytest tests/integration/test_iuphar_api.py -v -m integration # 48 tests ✅
uv run pytest tests/integration/test_pubchem_api.py -v -m integration # 19 tests ✅
uv run pytest tests/integration/test_ensembl_api.py -v -m integration # 24 tests ✅
uv run pytest tests/integration/test_entrez_api.py -v -m integration # 20 tests ✅
uv run pytest tests/integration/test_wikipathways_api.py -v -m integration # Integration tests ✅
uv run pytest tests/unit/test_clinicaltrials_client.py -v # 13 unit tests ✅
Example Usage
HGNC Server (Gene Nomenclature)
from lifesciences_mcp.clients import HGNCClient

async with HGNCClient() as client:
    # Fuzzy search for genes
    results = await client.search_genes("BRCA")
    # Returns: PaginationEnvelope[SearchCandidate]

    # Strict lookup by HGNC CURIE
    gene = await client.get_gene("HGNC:1100")  # BRCA1
    # Returns: Gene with cross_references to UniProt, Ensembl, OMIM, etc.
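The two-phase pattern in the example above (fuzzy search first, strict CURIE lookup second) can be sketched without any network access. The `PaginationEnvelope` and `SearchCandidate` shapes below are guesses based only on the names used in this README; their real fields may differ, and the two-gene table and scoring rule are made up for illustration.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class PaginationEnvelope(Generic[T]):
    """Hypothetical envelope: one page of items plus the total hit count."""
    items: List[T]
    total: int
    page_size: int


@dataclass
class SearchCandidate:
    id: str
    name: str
    score: float


# Tiny stand-in for the HGNC database.
GENES = {"HGNC:1100": "BRCA1", "HGNC:1101": "BRCA2"}


def search_genes(query: str, page_size: int = 10) -> PaginationEnvelope[SearchCandidate]:
    """Phase 1: fuzzy search that returns ranked candidates, never a full record."""
    q = query.upper()
    hits = [
        SearchCandidate(id=curie, name=symbol, score=len(q) / len(symbol))
        for curie, symbol in GENES.items()
        if q in symbol
    ]
    hits.sort(key=lambda c: c.score, reverse=True)
    return PaginationEnvelope(items=hits[:page_size], total=len(hits), page_size=page_size)


def get_gene(hgnc_id: str) -> str:
    """Phase 2: strict lookup that accepts only a full HGNC CURIE."""
    if not hgnc_id.startswith("HGNC:"):
        raise ValueError(f"expected a CURIE such as 'HGNC:1100', got {hgnc_id!r}")
    return GENES[hgnc_id]


results = search_genes("BRCA")
symbol = get_gene(results.items[0].id)
print(results.total, symbol)
```

Rejecting bare symbols in the strict phase forces callers to pick a candidate explicitly, which is what makes the lookup unambiguous.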
UniProt Server (Protein Search & Lookup)
from lifesciences_mcp.clients import UniProtClient

async with UniProtClient() as client:
    # Phase 1: Fuzzy search for proteins
    results = await client.search_proteins("p53 tumor suppressor", page_size=10)
    # Returns: PaginationEnvelope[ProteinSearchCandidate]

    # Get top candidate
    top_candidate = results.items[0]
    print(f"{top_candidate.id}: {top_candidate.name} ({top_candidate.organism})")
    # Output: UniProtKB:P04637: Cellular tumor antigen p53 (Homo sapiens)

    # Phase 2: Strict lookup with complete protein record
    protein = await client.get_protein(top_candidate.id)
    # Returns: Protein with cross_references to HGNC, Ensembl, RefSeq, PDB, OMIM, etc.
    print(f"Function: {protein.function[:100]}...")
    print(f"Cross-refs: HGNC:{protein.cross_references.hgnc}, Ensembl:{protein.cross_references.ensembl_transcript}")
PubChem Server (Chemical Compound Search & Lookup)
from lifesciences_mcp.clients import PubChemClient

async with PubChemClient() as client:
    # Phase 1: Fuzzy search for compounds
    results = await client.search_compounds("aspirin", page_size=10)
    # Returns: PaginationEnvelope[PubChemSearchCandidate]

    # Get top candidate
    top_candidate = results.items[0]
    print(f"{top_candidate.id}: {top_candidate.name} ({top_candidate.molecular_formula})")
    # Output: PubChem:CID2244: Aspirin (C9H8O4)

    # Phase 2: Strict lookup with complete compound record
    compound = await client.get_compound(top_candidate.id)
    # Returns: PubChemCompound with SMILES, InChI, cross_references
    print(f"SMILES: {compound.canonical_smiles}")
    print(f"InChI: {compound.inchi[:50]}...")
    print(f"Cross-refs: ChEMBL:{compound.cross_references.get('chembl')}, DrugBank:{compound.cross_references.get('drugbank')}")

    # Token-efficient slim mode
    compound_slim = await client.get_compound("PubChem:CID2244", slim=True)
    # Returns only: id, name, molecular_formula (~20 tokens vs ~115-300)
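The slim mode mentioned at the end of the example above amounts to projecting a full record onto a small set of fields. A minimal sketch, assuming the record is a plain dict and that the slim fields are exactly the three named in the comment; `to_slim` is a hypothetical helper and the real client may implement this differently:

```python
# Fields kept in slim mode, as listed in the example's comment.
SLIM_FIELDS = ("id", "name", "molecular_formula")


def to_slim(record: dict) -> dict:
    """Keep only the slim fields, dropping large values such as SMILES and InChI."""
    return {key: record[key] for key in SLIM_FIELDS if key in record}


full_record = {
    "id": "PubChem:CID2244",
    "name": "Aspirin",
    "molecular_formula": "C9H8O4",
    "canonical_smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "cross_references": {},  # left empty in this sketch
}

slim_record = to_slim(full_record)
print(slim_record)
```

A slim record is enough for an agent to confirm it has the right compound before paying the token cost of the full record.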
WikiPathways Server (Biological Pathways)
from lifesciences_mcp.clients import WikiPathwaysClient

async with WikiPathwaysClient() as client:
    # Phase 1: Search for pathways
    results = await client.search_pathways("EGFR signaling", species="Homo sapiens")
    print(f"Found {len(results.items)} pathways")

    # Get pathway details
    pathway = await client.get_pathway(results.items[0].id)
    print(f"Pathway: {pathway.name}")
    print(f"Components: {pathway.component_counts.genes} genes, {pathway.component_counts.metabolites} metabolites")

    # Find pathways for a specific gene
    gene_pathways = await client.get_pathways_for_gene("EGFR", species="Homo sapiens")
    print(f"EGFR appears in {len(gene_pathways.items)} pathways")

    # Get pathway components (graph structure)
    components = await client.get_pathway_components(pathway.id)
    print(f"Data nodes: {len(components.data_nodes)}")
    print(f"Interactions: {len(components.interactions)}")
ClinicalTrials.gov Server (Clinical Trials)
from lifesciences_mcp.clients import ClinicalTrialsClient

async with ClinicalTrialsClient() as client:
    # Phase 1: Search clinical trials
    results = await client.search_trials(
        query="cancer immunotherapy",
        condition="lung cancer",
        phase="PHASE3",
        status="RECRUITING"
    )

    # Get trial details
    trial = await client.get_trial(results.items[0].id)
    print(f"Trial: {trial.title}")
    print(f"Phase: {trial.phase}")
    print(f"Status: {trial.status}")
    print(f"Enrollment: {trial.enrollment}")

    # Get trial locations
    locations = await client.get_trial_locations(trial.id)
    print(f"Trial sites: {len(locations)}")
    for loc in locations[:3]:
        print(f"  - {loc.facility_name}, {loc.city}, {loc.state}")
MCP Tool Interface
All servers expose their functionality as MCP tools:
# HGNC tools
await mcp.call_tool("search_genes", {"query": "BRCA", "page_size": 5})
await mcp.call_tool("get_gene", {"hgnc_id": "HGNC:1100"})
# UniProt tools
await mcp.call_tool("search_proteins", {"query": "insulin", "page_size": 10})
await mcp.call_tool("get_protein", {"uniprot_id": "UniProtKB:P04637", "slim": False})
# PubChem tools
await mcp.call_tool("search_compounds", {"query": "aspirin", "page_size": 10})
await mcp.call_tool("get_compound", {"pubchem_id": "PubChem:CID2244", "slim": False})
# WikiPathways tools
await mcp.call_tool("search_pathways", {"query": "EGFR signaling", "species": "Homo sapiens"})
await mcp.call_tool("get_pathway", {"pathway_id": "WP:WP4868"})
await mcp.call_tool("get_pathways_for_gene", {"gene_symbol": "EGFR", "species": "Homo sapiens"})
await mcp.call_tool("get_pathway_components", {"pathway_id": "WP:WP4868"})
# ClinicalTrials.gov tools
await mcp.call_tool("search_trials", {
    "query": "cancer immunotherapy",
    "condition": "lung cancer",
    "phase": "PHASE3",
    "status": "RECRUITING"
})
await mcp.call_tool("get_trial", {"nct_id": "NCT:00461032"})
await mcp.call_tool("get_trial_locations", {"nct_id": "NCT:00461032"})
Architecture
New to this project? Read Platform Engineering for AI-Augmented Development first to understand our approach to AI-assisted development.
For binding technical specifications, see ADR-001 v1.2.
Design Principles
- Microservices: One MCP server per API/database for modularity
- Async-first: All tools use async/await for network calls
- Pydantic models: Strong typing for API responses
- Caching: Redis or in-memory caching for frequent lookups
- Rate limiting: Respect upstream API rate limits
- identifiers.org URIs: Standard URI format for biological identifiers
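To make the async, caching, and rate-limiting principles concrete, here is a minimal sketch that combines an in-memory TTL cache with a semaphore. Note the swap: the semaphore caps concurrent upstream calls, which is a simple stand-in for true requests-per-second rate limiting. `CachedLimiter` and its parameters are hypothetical names for illustration, not this project's implementation.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple


class CachedLimiter:
    """In-memory TTL cache plus a semaphore that caps concurrent upstream calls."""

    def __init__(self, max_concurrent: int = 3, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self._sem = asyncio.Semaphore(max_concurrent)

    async def fetch(self, key: str, loader: Callable[[], Awaitable[Any]]) -> Any:
        hit = self._cache.get(key)
        if hit is not None and time.monotonic() - hit[0] < self._ttl:
            return hit[1]  # fresh cache entry: skip the upstream call
        async with self._sem:  # wait here if too many calls are in flight
            value = await loader()
        self._cache[key] = (time.monotonic(), value)
        return value


async def main() -> int:
    limiter = CachedLimiter()
    calls = 0

    async def fake_upstream() -> dict:
        # Stand-in for a real HTTP request to an upstream API.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)
        return {"symbol": "BRCA1"}

    await limiter.fetch("HGNC:1100", fake_upstream)
    await limiter.fetch("HGNC:1100", fake_upstream)  # served from cache
    return calls


upstream_calls = asyncio.run(main())
print(upstream_calls)
```

The limiter is created inside `main()` so the semaphore is bound to the running event loop on older Python versions.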
Data Standards
Following patterns from nsclc-pathways:
- identifiers.org URIs: http://identifiers.org/hgnc/1100 for BRCA1
- JSON-LD: Linked data format for semantic interoperability
- GraphML: Network export format for visualization tools
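As a small sketch of the URI convention, the function below turns a CURIE such as `HGNC:1100` into the identifiers.org form shown in the list. Lowercasing the prefix to obtain the namespace is an assumption that fits the HGNC example; other databases may use a registered namespace that differs from their CURIE prefix. `curie_to_uri` and the JSON-LD node shape are hypothetical.

```python
def curie_to_uri(curie: str) -> str:
    """Convert 'PREFIX:local_id' into an identifiers.org URI (lowercased prefix assumed)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"http://identifiers.org/{prefix.lower()}/{local_id}"


uri = curie_to_uri("HGNC:1100")

# Illustrative JSON-LD-style node; "@id" is standard JSON-LD, "name" is a made-up property.
node = {"@id": uri, "name": "BRCA1"}
print(node)
```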
Configuration
Environment Variables
Most life sciences APIs are public and don't require authentication. However, three servers use API keys (BioGRID, NCBI/Entrez, and DrugBank):
# Optional - BioGRID (free registration)
BIOGRID_API_KEY=your-key-here # Get from https://thebiogrid.org/
# Optional - NCBI (free registration)
NCBI_API_KEY=your-key-here # Get from https://account.ncbi.nlm.nih.gov/settings/
# Optional - DrugBank (commercial license required)
DRUGBANK_API_KEY=your-key-here # Get from https://go.drugbank.com/
Note:
- BioGRID: Free API key available with registration at https://thebiogrid.org/
- NCBI: Free API key available with registration at https://account.ncbi.nlm.nih.gov/settings/
- DrugBank: Requires commercial license. DrugBank server is excluded from the gateway server and requires manual setup.
- All other 10 servers work without authentication
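A minimal sketch of how a launcher might check these variables before starting servers. The variable names come from the list above; the `configured_keys` helper and the server labels are hypothetical.

```python
import os
from typing import Dict, Mapping

# Environment variable per key-using server (variable names as listed above).
KEY_VARS = {
    "biogrid": "BIOGRID_API_KEY",
    "entrez": "NCBI_API_KEY",
    "drugbank": "DRUGBANK_API_KEY",
}


def configured_keys(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which key-using servers have a non-empty key set."""
    return {server: bool(env.get(var, "").strip()) for server, var in KEY_VARS.items()}


# Example with an explicit mapping instead of the real environment.
status = configured_keys({"NCBI_API_KEY": "abc123", "BIOGRID_API_KEY": "  "})
print(status)
```

Passing the environment as a parameter keeps the check easy to test without touching real credentials.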
Development
# Install with dev dependencies
uv sync --extra dev
# Run tests
uv run pytest tests/ -v
# Lint and format
uv run ruff check --fix . && uv run ruff format .
# Type checking
uv run pyright
Testing with FastMCP
import pytest
from fastmcp import Client

@pytest.fixture
async def client():
    # Module path follows the Quick Start layout (src/lifesciences_mcp/servers/hgnc.py)
    from lifesciences_mcp.servers.hgnc import mcp
    async with Client(mcp) as client:
        yield client

async def test_get_gene(client):
    # Tool name and argument match the MCP Tool Interface section
    result = await client.call_tool("get_gene", {"hgnc_id": "HGNC:1100"})
    assert result["hgnc_id"] == "HGNC:1100"
Quality Assurance
We maintain a comprehensive list of Test Scenarios covering data model validation, error handling, and edge cases.
Example: Search Candidate Validation
| Scenario | Check | Expected Outcome |
|---|---|---|
| Valid | id="HGNC:1100", score=1.0 | Object created |
| Invalid Format | id="BRCA1" (missing prefix) | ValidationError |
| Out of Bounds | score=1.5 | ValidationError |
See docs/test_scenarios.md for the full list.
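The three scenarios in the table can be reproduced with a standard-library stand-in for the Pydantic model. This is only a sketch: the local `ValidationError` class, the CURIE pattern, and the 0.0 to 1.0 score range are assumptions inferred from the table, and the project's actual model is defined with Pydantic.

```python
import re
from dataclasses import dataclass

# Assumed CURIE shape: alphanumeric prefix, a colon, then a non-empty local ID.
CURIE_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]*:\S+$")


class ValidationError(ValueError):
    """Local stand-in for pydantic.ValidationError."""


@dataclass(frozen=True)
class SearchCandidate:
    id: str
    score: float

    def __post_init__(self) -> None:
        # Reject identifiers without a database prefix, e.g. a bare gene symbol.
        if not CURIE_PATTERN.match(self.id):
            raise ValidationError(f"id must be a CURIE such as 'HGNC:1100', got {self.id!r}")
        # Reject scores outside the assumed 0.0 to 1.0 range.
        if not 0.0 <= self.score <= 1.0:
            raise ValidationError(f"score must be within [0.0, 1.0], got {self.score}")


def is_valid(candidate_id: str, score: float) -> bool:
    """Return True when a candidate can be constructed, False on ValidationError."""
    try:
        SearchCandidate(id=candidate_id, score=score)
    except ValidationError:
        return False
    return True


outcomes = [is_valid("HGNC:1100", 1.0), is_valid("BRCA1", 1.0), is_valid("HGNC:1100", 1.5)]
print(outcomes)
```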
🧠 Intelligence Included: Pre-Configured Agent Skills
This repository includes a .claude directory containing optimized system prompts and skill definitions used to generate our research outputs.
- Clinical Trials Skill: Specialized instructions for navigating ClinicalTrials.gov, filtering by phase/status, and extracting inclusion criteria.
- Genomics Skill: Best practices for resolving gene symbols to Ensembl/HGNC IDs before querying.
- Graph Builder Skill: Instructions for constructing Neo4j knowledge graphs from unstructured literature.
🔬 Research & Validation
We use these tools to perform real-world analysis. Outputs are validated for factual accuracy wherever a validation report is listed below.
| Study | Description | Validation |
|---|---|---|
| High Commercialization Trials | Identifying trials with high probability of FDA approval. | ✅ Validation Report |
| Health Emergencies 2026 | Predictive analysis of emerging pathogen vectors. | N/A |
| NSCLC Drug Repurposing | ARID1A synthetic lethality pathways. | ✅ Validation Report |
References
Upstream APIs
Research
- Data-driven Drug Repurposing Strategies (2025)
- AI in Drug Repurposing (2025)
- Open Targets Drug Index
Related Projects and Showcases
Showcases:
- NSCLC Drug Repurposing Showcase (docs/showcases/nsclc-drug-repurposing/) - Complete end-to-end workflow demonstrating WikiPathways and ClinicalTrials.gov integration for non-small cell lung cancer research
Related Projects:
- nsclc-pathways - NSCLC signaling pathway analysis (original inspiration for WikiPathways integration)
- kg_rememberall - Knowledge graph construction from text
- FastMCP Documentation
Architecture Documentation:
- Architecture - Complete architecture analysis with 13,505 lines of code across 56 Python modules
- ADR-001 v1.2 - Binding architecture specification (Fuzzy-to-Fact protocol)
- Component Inventory - Detailed component reference
- API Reference - Usage guide with examples
License
MIT
Project Tracking
- Linear Project: Life Sciences MCP Server
- Discovery Issue: AGE-65