samyama-graph

Graph-vector database that queried 1 billion edges for $2.50. Rust, OpenCypher, vector search, 14 graph algorithms. 74M nodes / 1B edges on a single machine.

GitHub: 53 stars · 1 fork · Updated Apr 6, 2026 · Validated Apr 10, 2026

Samyama

The graph database that queried 1 billion edges for $2.50


We loaded the entire PubMed corpus — every article published since 1966 — plus ClinicalTrials.gov, Reactome pathways, and DrugBank into one graph. Then we asked:

"What drugs are most tested in cancer clinical trials?"

MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:TESTS]->(i:Intervention)
WHERE m.name = 'Neoplasms'
RETURN i.name, count(DISTINCT t) AS trials
ORDER BY trials DESC LIMIT 5

| Drug | Trials |
|---|---|
| Placebo | 521 |
| Pembrolizumab | 137 |
| Carboplatin | 106 |
| Paclitaxel | 106 |
| Cyclophosphamide | 98 |

5.2 seconds. One query. Four databases. 74 million nodes. 1 billion edges. A single machine.
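
The `count(DISTINCT t)` in that query matters because one trial can be reached through many articles, and each matched path yields its own row. The sketch below mimics the aggregation in plain Python on a handful of made-up rows. The row values and the `top_interventions` helper are illustrative assumptions, not data or code from Samyama.

```python
from collections import defaultdict

# Hypothetical (article, trial, intervention) rows: one row per matched path.
rows = [
    ("article-1", "trial-A", "Placebo"),
    ("article-2", "trial-A", "Placebo"),      # same trial, reached via another article
    ("article-3", "trial-B", "Placebo"),
    ("article-6", "trial-D", "Placebo"),
    ("article-1", "trial-A", "Carboplatin"),
    ("article-4", "trial-C", "Carboplatin"),
    ("article-5", "trial-C", "Carboplatin"),  # duplicate path to trial-C
]


def top_interventions(rows, limit=5):
    """Group by intervention, count distinct trials, sort descending, cut to limit."""
    trials_by_intervention = defaultdict(set)
    for _article, trial, intervention in rows:
        trials_by_intervention[intervention].add(trial)  # the set removes duplicates
    counts = [(name, len(trials)) for name, trials in trials_by_intervention.items()]
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:limit]


print(top_interventions(rows))  # [('Placebo', 3), ('Carboplatin', 2)]
```

Counting rows instead of distinct trials would report 4 and 3 here, which is the inflation `DISTINCT` avoids.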

See all 100 benchmark queries →


What is Samyama?

A graph-vector database written in Rust. OpenCypher queries, Redis protocol, vector search, graph algorithms — one binary, no JVM, no GC pauses.

# Install and run (30 seconds)
git clone https://github.com/samyama-ai/samyama-graph && cd samyama-graph
cargo build --release
./target/release/samyama    # RESP on :6379, HTTP on :8080
# Connect with any Redis client, then run the GRAPH.QUERY commands at its prompt
redis-cli -p 6379
GRAPH.QUERY mydb "CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})"
GRAPH.QUERY mydb "MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name"
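
"Any Redis client" works because the server speaks RESP, where a command travels as an array of bulk strings. As a rough illustration, this is how the `GRAPH.QUERY` call above would be framed on the wire. The helper name `encode_resp_command` is made up for this sketch. It only builds the bytes and does not open a connection.

```python
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]           # array header: element count
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$" + str(len(data)).encode() + b"\r\n")  # bulk string length in bytes
        parts.append(data + b"\r\n")
    return b"".join(parts)


query = "MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name"
frame = encode_resp_command("GRAPH.QUERY", "mydb", query)
print(frame)
```

Writing `frame` to a TCP socket connected to port 6379 should be equivalent to typing the command in `redis-cli`, assuming the server is running with its default settings.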

Why Samyama?

If your data has relationships, you need a graph database. If your graph database can't handle a billion edges on a single machine, you need Samyama.

| What | How |
|---|---|
| 74M nodes, 1B edges | Loaded PubMed + ClinicalTrials.gov + Reactome + DrugBank on one r6a.8xlarge ($2.50 spot) |
| 96/100 queries pass | Point lookups, multi-hop traversals, cross-KG aggregations, all verified |
| Parallel everything | Rayon: PageRank 3.1x, LCC 9.1x, Triangle Count 6x. Parallel scan, filter, compaction |
| 975 QPS concurrent | 16-client read workload, p99 < 25ms, zero errors across 67K queries |
| LDBC certified | SNB Interactive 21/21, FinBench 40/40, Graphalytics 12/12 |

The 30-Second Tour

Cypher queries — ~90% OpenCypher. MATCH, CREATE, MERGE, aggregations, path finding, 30+ functions.

MATCH (a:Person)-[:KNOWS*1..3]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name, length(shortestPath(a, b))
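
To make the `*1..3` bound concrete, here is a small breadth-first sketch of "everyone within three KNOWS hops of Alice". It is a simplification: Cypher returns one row per matching path, while this returns each person once with the shortest hop count, and it never returns the start node. The graph and the `within_hops` helper are invented for illustration and say nothing about how Samyama executes the pattern.

```python
from collections import deque


def within_hops(graph, start, max_hops=3):
    """Return {node: shortest hop count} for nodes 1..max_hops edges away from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:      # do not expand past the upper bound
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:    # first visit is the shortest path in BFS
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]                     # the lower bound of 1 excludes the start node
    return hops


# Hypothetical directed KNOWS edges.
knows = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
}
print(within_hops(knows, "Alice"))  # {'Bob': 1, 'Carol': 2, 'Dave': 3}
```

Erin is four hops away, so she falls outside the `1..3` range.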

Graph algorithms — PageRank, WCC, SCC, BFS, Dijkstra, LCC, CDLP, Triangle Count. All rayon-parallelized.

CALL pagerank('social') YIELD nodeId, score
RETURN nodeId, score ORDER BY score DESC LIMIT 10
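
For readers new to PageRank, the following is a textbook power-iteration sketch of what a call like the one above computes. It is single-threaded Python on a toy graph, not Samyama's rayon-parallel Rust implementation, and the damping factor and iteration count are conventional defaults chosen for this example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [out-neighbors]}; every target must be a key."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            out = graph[node]
            if out:
                share = damping * rank[node] / len(out)
                for target in out:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph.
social = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(social)

# Same shape as ORDER BY score DESC LIMIT 10.
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(node, round(score, 4))
```

Node `c` ranks highest because three of the four nodes link to it, and the scores sum to 1.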

Vector search — HNSW indexing for semantic search and Graph RAG.

CREATE VECTOR INDEX ON :Paper(embedding) OPTIONS {dimensions: 384, similarity: 'cosine'}
CALL vector.search('Paper', 'embedding', [0.1, 0.2, ...], 10) YIELD node, score
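
With `similarity: 'cosine'`, results are ranked by the cosine of the angle between the query vector and each stored embedding. An HNSW index approximates the exact ranking that the brute-force sketch below computes. The three-dimensional vectors and function names are illustrative assumptions. Real embeddings would have 384 dimensions as in the index definition above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(vectors, query, k):
    """Exact top-k by cosine similarity; HNSW trades some exactness for speed."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings.
papers = {
    "paper-1": [1.0, 0.0, 0.0],
    "paper-2": [0.9, 0.1, 0.0],
    "paper-3": [0.0, 1.0, 0.0],
}
results = brute_force_search(papers, [1.0, 0.0, 0.0], k=2)
print(results)
```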

Natural language — Ask questions in English. The LLM translates to Cypher.

NLQ "Who are Alice's friends of friends that work at Google?"
→ MATCH (a:Person {name:'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)-[:WORKS_AT]->(c:Company {name:'Google'}) RETURN fof.name

AI agents — Auto-generated MCP servers from your graph schema.

pip install 'samyama[mcp]'          # quotes keep shells like zsh from globbing the brackets
samyama-mcp-serve --demo cricket    # Instant AI agent tools for any graph

Benchmarks

Scale: 74M Nodes, 1 Billion Edges

| KG | Source | Nodes | Edges |
|---|---|---|---|
| PubMed/MEDLINE | NLM | 66.2M | 1.04B |
| Clinical Trials | ClinicalTrials.gov | 7.8M | 27M |
| Pathways | Reactome | 119K | 835K |
| Drug Interactions | DrugBank + ChEMBL + SIDER | 245K | 388K |

Loaded in 31 minutes from snapshots. 96 of 100 queries return real data across all four KGs. Full results →
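
The headline figures follow from the per-KG counts above. A quick check of the arithmetic, using the rounded values from the table:

```python
# (nodes, edges) per knowledge graph, as listed in the table above.
kgs = {
    "PubMed/MEDLINE": (66_200_000, 1_040_000_000),
    "Clinical Trials": (7_800_000, 27_000_000),
    "Pathways": (119_000, 835_000),
    "Drug Interactions": (245_000, 388_000),
}

total_nodes = sum(nodes for nodes, _ in kgs.values())
total_edges = sum(edges for _, edges in kgs.values())

print(f"{total_nodes / 1e6:.1f}M nodes, {total_edges / 1e9:.2f}B edges")
# 74.4M nodes, 1.07B edges
```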

Cross-KG Query Highlights

| Query | Time | Result |
|---|---|---|
| Cancer → Trial interventions | 5.2s | Pembrolizumab is the top drug (137 trials), behind only Placebo (521) |
| Diabetes → Trial interventions | 2.4s | Metformin #1 (70 trials) |
| Metformin → Trial adverse events | 2.1s | Diarrhoea (185 trials), confirming a known side effect |
| Cancer trial sites by country | 3.8s | US 4,062 · China 1,170 · France 827 |
| NCI-funded → Trial drugs | 19.4s | Cyclophosphamide (517) · Radiation (362) |
| Aspirin articles → Trials | 1.5s | NCT00000491 "Aspirin MI study" |

LDBC Compliance

| Benchmark | Pass Rate | Dataset |
|---|---|---|
| SNB Interactive | 21/21 (100%) | SF1: 3.18M nodes, 17.26M edges |
| SNB BI | 16/16 (100%) | SF1 |
| Graphalytics | 12/12 (100%) | XS reference graphs |
| FinBench | 40/40 (100%) | 7.7K nodes, 42.2K edges |

Concurrent Performance

| Workload | 1 client | 16 clients | Scaling |
|---|---|---|---|
| Pure read | 145 QPS | 975 QPS | 6.7x |
| Mixed 80/20 | 181 QPS | 722 QPS | 4.0x |
| Write-heavy | 279 QPS | 482 QPS | 1.7x |
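
The Scaling column is the 16-client throughput divided by the single-client throughput. The sketch below reproduces it and also divides by the client count to show how far each workload is from linear scaling. That efficiency figure is our own derived number, not one reported by the benchmark.

```python
# (QPS with 1 client, QPS with 16 clients), from the table above.
workloads = {
    "Pure read": (145, 975),
    "Mixed 80/20": (181, 722),
    "Write-heavy": (279, 482),
}
CLIENTS = 16


def scaling(single_qps, multi_qps):
    """Throughput gain from adding clients."""
    return multi_qps / single_qps


for name, (one, sixteen) in workloads.items():
    factor = scaling(one, sixteen)
    efficiency = factor / CLIENTS   # 1.0 would mean perfectly linear scaling
    print(f"{name}: {factor:.1f}x, {efficiency:.0%} of linear")
```

Reads scale best, and the write-heavy workload gains the least from additional clients.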

Demo

Cricket KG — 36K nodes, 1.4M edges, live graph simulation

Samyama Graph Simulation

Click for full demo (1:56)


Examples

Domain Knowledge Graphs

| Domain | Command | Covers |
|---|---|---|
| Banking & Fraud | cargo run --example banking_demo | Fraud patterns, money laundering, OFAC |
| Clinical Trials | cargo run --example clinical_trials_demo | Patient-trial matching, drug interactions |
| Supply Chain | cargo run --example supply_chain_demo | Disruption analysis, port optimization |
| Manufacturing | cargo run --example smart_manufacturing_demo | Digital twin, failure cascades |
| Social Network | cargo run --example social_network_demo | Influence, communities, recommendations |
| Enterprise SOC | cargo run --example enterprise_soc_demo | MITRE ATT&CK, attack paths, threat intel |

Data Loaders

| Dataset | Command | Scale |
|---|---|---|
| LDBC SNB SF1 | cargo run --example ldbc_loader | 3.2M nodes, 17.3M edges |
| Clinical Trials | cargo run --release --example aact_loader | 7.8M nodes, 27M edges |
| Drug Interactions | cargo run --release --example druginteractions_loader | 245K nodes, 388K edges |
| Cricket | cargo run --release --example cricket_loader | 36K nodes, 1.4M edges |
| FinBench | cargo run --example finbench_loader | 7.7K nodes, 42K edges |

Architecture

samyama
├── graph/         Property graph model (Node, Edge, GraphStore, CSR adjacency)
├── query/         OpenCypher engine
│   ├── cypher.pest    PEG grammar
│   ├── executor/      Volcano iterator + WCO LeapFrog TrieJoin
│   └── planner.rs     Cost-based graph-native query planner
├── protocol/      RESP3 server (Redis-compatible, Tokio async)
├── persistence/   RocksDB + WAL + multi-tenancy
├── vector/        HNSW vector index
├── snapshot/      Portable .sgsnap v2 (CSR + ColumnStore)
├── raft/          Distributed consensus (openraft)
└── nlq/           Natural language → Cypher (OpenAI, Gemini, Ollama, Claude)
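
The `graph/` module mentions CSR adjacency. For readers unfamiliar with the layout, here is a generic compressed sparse row sketch: one array of per-node offsets and one flat array of edge targets, so a node's neighbors are a contiguous slice. This shows the general technique in Python and makes no claim about Samyama's actual Rust data structures. Node IDs are assumed to be dense integers starting at 0.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from (source, target) pairs."""
    offsets = [0] * (num_nodes + 1)
    for source, _ in edges:             # count out-degree per node
        offsets[source + 1] += 1
    for i in range(num_nodes):          # prefix sum turns counts into start offsets
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()        # next free slot for each source node
    for source, target in edges:
        targets[cursor[source]] = target
        cursor[source] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Out-neighbors of node as one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


# Hypothetical edge list over 4 nodes.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
print(offsets, targets)                 # [0, 2, 3, 3, 4] [1, 2, 2, 0]
print(neighbors(offsets, targets, 0))   # [1, 2]
```

The contiguous slices are what make scans and traversals cache-friendly at large edge counts.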

Companion crates:


Documentation

| Resource | Link |
|---|---|
| The Book | samyama-ai.github.io/samyama-graph-book |
| Biomedical Benchmark | 100 queries, 96 pass |
| Cypher Compatibility | docs/CYPHER_COMPATIBILITY.md |
| LDBC Results | docs/ldbc/ |
| Architecture Decisions | docs/ADR/ |
| API Spec | api/openapi.yaml |

Enterprise Edition

Everything above is open source (Apache 2.0). Samyama Enterprise adds:

  • GPU acceleration (wgpu + CUDA)
  • OpenTelemetry OTLP metrics
  • Prometheus + Grafana monitoring
  • Backup & disaster recovery
  • ADMIN commands + audit trail
  • Ed25519 signed license tokens

Contact us →


License

Apache License 2.0 — use it in production, contribute back if you'd like.

Samyama (Sanskrit: संयम) — the union of focused query, sustained analysis, and unified insight.
