Samyama

The graph database that queried 1 billion edges for $2.50

We loaded the entire PubMed corpus — every article published since 1966 — plus ClinicalTrials.gov, Reactome pathways, and DrugBank into one graph. Then we asked:

"What drugs are most tested in cancer clinical trials?"

MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:TESTS]->(i:Intervention)
WHERE m.name = 'Neoplasms'
RETURN i.name, count(DISTINCT t) AS trials
ORDER BY trials DESC LIMIT 5

Drug	Trials
Placebo	521
Pembrolizumab	137
Carboplatin	106
Paclitaxel	106
Cyclophosphamide	98

5.2 seconds. One query. Four databases. 74 million nodes. 1 billion edges. A single machine.
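To make the shape of that query concrete, here is a minimal in-memory sketch of the same three-hop traversal and `count(DISTINCT t)` aggregation. The edge lists, IDs, and the `top_interventions` helper are hypothetical toy values for illustration only; they are not benchmark data and not how Samyama executes the query.

```python
from collections import defaultdict

# Hypothetical toy edges (illustrative only, not real benchmark data).
annotated_with = [  # (article, MeSH term)
    ("article-1", "Neoplasms"),
    ("article-2", "Neoplasms"),
    ("article-3", "Diabetes Mellitus"),
]
referenced_in = [  # (article, clinical trial)
    ("article-1", "trial-A"),
    ("article-2", "trial-A"),
    ("article-2", "trial-B"),
    ("article-3", "trial-C"),
]
tests = [  # (clinical trial, intervention)
    ("trial-A", "Placebo"),
    ("trial-A", "Drug X"),
    ("trial-B", "Placebo"),
    ("trial-C", "Drug Y"),
]


def top_interventions(mesh_name, limit=5):
    """Rank interventions by the number of distinct trials reachable from a MeSH term."""
    articles = {a for a, m in annotated_with if m == mesh_name}
    trials = {t for a, t in referenced_in if a in articles}
    trials_by_intervention = defaultdict(set)
    for trial, intervention in tests:
        if trial in trials:
            # A set gives the DISTINCT behaviour: trial-A is reached through
            # two articles but is counted once per intervention.
            trials_by_intervention[intervention].add(trial)
    ranked = sorted(
        ((name, len(ts)) for name, ts in trials_by_intervention.items()),
        key=lambda row: (-row[1], row[0]),  # ties broken by name (an assumption)
    )
    return ranked[:limit]


result = top_interventions("Neoplasms")
print(result)
```

The set per intervention is what keeps a trial cited by many articles from inflating the count, which is the reason the Cypher query uses `count(DISTINCT t)` rather than `count(t)`.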

See all 100 benchmark queries →

What is Samyama?

A graph-vector database written in Rust. OpenCypher queries, Redis protocol, vector search, graph algorithms — one binary, no JVM, no GC pauses.

# Install and run (30 seconds)
git clone https://github.com/samyama-ai/samyama-graph && cd samyama-graph
cargo build --release
./target/release/samyama    # RESP on :6379, HTTP on :8080

# Connect with any Redis client
redis-cli -p 6379
GRAPH.QUERY mydb "CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})"
GRAPH.QUERY mydb "MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name"
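If you would rather script this than use `redis-cli`, the sketch below shows roughly what a Redis client puts on the wire: a RESP array of bulk strings. It uses only the Python standard library. The helper names `encode_resp_command` and `send_command` are ours, and the host and port are assumptions taken from the defaults shown above; reply parsing is deliberately left out.

```python
import socket


def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # Bulk string: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def send_command(host: str, port: int, *args: str, timeout: float = 5.0) -> bytes:
    """Send one command and return the raw reply bytes (a single recv, possibly partial)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_resp_command(*args))
        return sock.recv(65536)


query = "MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name"
payload = encode_resp_command("GRAPH.QUERY", "mydb", query)
print(payload)

# With a server running locally (assumed to listen on the default port):
# print(send_command("127.0.0.1", 6379, "GRAPH.QUERY", "mydb", query))
```

In practice an existing Redis client library handles framing and reply parsing for you; the sketch is only meant to show that nothing Samyama-specific is needed at the protocol level.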

Why Samyama?

If your data has relationships, you need a graph database. If your graph database can't handle a billion edges on a single machine, you need Samyama.

What	How
74M nodes, 1B edges	Loaded PubMed + ClinicalTrials.gov + Reactome + DrugBank on one r6a.8xlarge ($2.50 spot)
96/100 queries pass	Point lookups, multi-hop traversals, cross-KG aggregations — all verified
Parallel everything	Rayon speedups: PageRank 3.1x, LCC 9.1x, Triangle Count 6x. Scans, filters, and compaction also run in parallel
975 QPS concurrent	16-client read workload, p99 < 25ms, zero errors across 67K queries
LDBC compliance	SNB Interactive 21/21, SNB BI 16/16, Graphalytics 12/12, FinBench 40/40

The 30-Second Tour

Cypher queries — ~90% OpenCypher. MATCH, CREATE, MERGE, aggregations, path finding, 30+ functions.

MATCH (a:Person)-[:KNOWS*1..3]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name, length(shortestPath(a, b))
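For readers new to variable-length patterns, the sketch below illustrates what `[:KNOWS*1..3]` combined with a shortest-path length amounts to: a breadth-first search that stops after three hops. It is a simplified illustration on a hypothetical toy graph, not Samyama's executor, and it returns each reachable person once, whereas a Cypher pattern match can yield one row per matching path.

```python
from collections import deque


def reachable_within(graph, start, max_hops=3):
    """Return {node: shortest hop count} for nodes 1..max_hops hops from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:  # first visit in BFS is the shortest path
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # the pattern requires at least one hop
    return hops


# Hypothetical KNOWS edges, stored as adjacency lists.
knows = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": ["Dave"], "Dave": ["Eve"]}
result = reachable_within(knows, "Alice")
print(result)
```

Eve is four hops from Alice in this toy graph, so she falls outside the `1..3` bound and is not returned.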

Graph algorithms — PageRank, WCC, SCC, BFS, Dijkstra, LCC, CDLP, Triangle Count. All rayon-parallelized.

CALL pagerank('social') YIELD nodeId, score
RETURN nodeId, score ORDER BY score DESC LIMIT 10
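As a reminder of what the scores mean, here is a minimal single-threaded PageRank by power iteration. The damping factor, iteration count, and dangling-node handling are textbook defaults chosen for this sketch; the parameters and the parallel implementation Samyama uses are not described here, so treat this as an illustration of the algorithm only.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [targets]}; scores sum to 1."""
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_edges.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in out_edges.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b both link to c, c links back to a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:10]
print(top)
```

The final `sorted(...)[:10]` mirrors the `ORDER BY score DESC LIMIT 10` clause in the Cypher call.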

Vector search — HNSW indexing for semantic search and Graph RAG.

CREATE VECTOR INDEX ON :Paper(embedding) OPTIONS {dimensions: 384, similarity: 'cosine'}
CALL vector.search('Paper', 'embedding', [0.1, 0.2, ...], 10) YIELD node, score
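To show what a cosine top-k search returns, the sketch below does an exact brute-force scan over a handful of vectors. This is deliberately not HNSW: an HNSW index approximates the same ranking without comparing against every vector. The paper names and three-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=10):
    """Exact nearest neighbours by cosine similarity, highest score first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings (real ones would have 384 dimensions, as in the index above).
papers = {
    "paper-a": [1.0, 0.0, 0.0],
    "paper-b": [0.9, 0.1, 0.0],
    "paper-c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], papers, k=2)
print(hits)
```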

Natural language — Ask questions in English. The LLM translates to Cypher.

NLQ "Who are Alice's friends of friends that work at Google?"
→ MATCH (a:Person {name:'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)-[:WORKS_AT]->(c:Company {name:'Google'}) RETURN fof.name

AI agents — Auto-generated MCP servers from your graph schema.

pip install 'samyama[mcp]'    # quotes keep zsh and similar shells from globbing the brackets
samyama-mcp-serve --demo cricket    # Instant AI agent tools for any graph

Benchmarks

Scale: 74M Nodes, 1 Billion Edges

KG	Source	Nodes	Edges
PubMed/MEDLINE	NLM	66.2M	1.04B
Clinical Trials	ClinicalTrials.gov	7.8M	27M
Pathways	Reactome	119K	835K
Drug Interactions	DrugBank + ChEMBL + SIDER	245K	388K

Loaded in 31 minutes from snapshots. 96 of 100 queries return real data across all four KGs. Full results →
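The headline figures follow from summing the table. A quick check, using the rounded values exactly as printed above (so the totals are approximate):

```python
# (nodes, edges) per knowledge graph, copied from the table above.
kg_sizes = {
    "PubMed/MEDLINE": (66_200_000, 1_040_000_000),
    "Clinical Trials": (7_800_000, 27_000_000),
    "Pathways": (119_000, 835_000),
    "Drug Interactions": (245_000, 388_000),
}

total_nodes = sum(nodes for nodes, _ in kg_sizes.values())
total_edges = sum(edges for _, edges in kg_sizes.values())

print(f"{total_nodes / 1e6:.1f}M nodes, {total_edges / 1e9:.2f}B edges")
```

PubMed/MEDLINE alone accounts for the large majority of both nodes and edges; the other three graphs mostly add cross-links.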

Cross-KG Query Highlights

Query	Time	Result
Cancer → Trial interventions	5.2s	Pembrolizumab is the top drug after placebo (137 trials; placebo 521)
Diabetes → Trial interventions	2.4s	Metformin #1 (70 trials)
Metformin → Trial adverse events	2.1s	Diarrhoea (185 trials) — known side effect confirmed
Cancer trial sites by country	3.8s	US 4,062 · China 1,170 · France 827
NCI-funded → Trial drugs	19.4s	Cyclophosphamide (517) · Radiation (362)
Aspirin articles → Trials	1.5s	NCT00000491 "Aspirin MI study"

LDBC Compliance

Benchmark	Pass Rate	Dataset
SNB Interactive	21/21 (100%)	SF1: 3.18M nodes, 17.26M edges
SNB BI	16/16 (100%)	SF1
Graphalytics	12/12 (100%)	XS reference graphs
FinBench	40/40 (100%)	7.7K nodes, 42.2K edges

Concurrent Performance

Workload	1 client	16 clients	Scaling
Pure read	145 QPS	975 QPS	6.7x
Mixed 80/20	181 QPS	722 QPS	4.0x
Write-heavy	279 QPS	482 QPS	1.7x
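The Scaling column is simply 16-client QPS divided by 1-client QPS. The sketch below recomputes it and also derives a per-client efficiency (scaling divided by client count). That second figure is our own derived metric, not something reported in the benchmark.

```python
# (QPS with 1 client, QPS with 16 clients), copied from the table above.
results = {
    "Pure read": (145, 975),
    "Mixed 80/20": (181, 722),
    "Write-heavy": (279, 482),
}
CLIENTS = 16


def scaling_report(results, clients):
    """Return {workload: (scaling factor, per-client efficiency)}, both rounded."""
    report = {}
    for workload, (single, multi) in results.items():
        scaling = multi / single
        report[workload] = (round(scaling, 1), round(scaling / clients, 2))
    return report


report = scaling_report(results, CLIENTS)
for workload, (scaling, efficiency) in report.items():
    print(f"{workload}: {scaling}x scaling, {efficiency:.0%} per-client efficiency")
```

Reads scale best and write-heavy workloads scale least, which is the pattern the table already shows; the efficiency figure just puts the three workloads on a common 0 to 1 scale.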

Demo

Cricket KG — 36K nodes, 1.4M edges, live graph simulation

Click for full demo (1:56)

Examples

Domain Knowledge Graphs

Domain	Command	Scale	Scenarios covered
Banking & Fraud	`cargo run --example banking_demo`	—	Fraud patterns, money laundering, OFAC
Clinical Trials	`cargo run --example clinical_trials_demo`	—	Patient-trial matching, drug interactions
Supply Chain	`cargo run --example supply_chain_demo`	—	Disruption analysis, port optimization
Manufacturing	`cargo run --example smart_manufacturing_demo`	—	Digital twin, failure cascades
Social Network	`cargo run --example social_network_demo`	—	Influence, communities, recommendations
Enterprise SOC	`cargo run --example enterprise_soc_demo`	—	MITRE ATT&CK, attack paths, threat intel

Data Loaders

Dataset	Command	Scale
LDBC SNB SF1	`cargo run --example ldbc_loader`	3.2M nodes, 17.3M edges
Clinical Trials	`cargo run --release --example aact_loader`	7.8M nodes, 27M edges
Drug Interactions	`cargo run --release --example druginteractions_loader`	245K nodes, 388K edges
Cricket	`cargo run --release --example cricket_loader`	36K nodes, 1.4M edges
FinBench	`cargo run --example finbench_loader`	7.7K nodes, 42K edges

Architecture

samyama
├── graph/         Property graph model (Node, Edge, GraphStore, CSR adjacency)
├── query/         OpenCypher engine
│   ├── cypher.pest    PEG grammar
│   ├── executor/      Volcano iterator + WCO LeapFrog TrieJoin
│   └── planner.rs     Cost-based graph-native query planner
├── protocol/      RESP3 server (Redis-compatible, Tokio async)
├── persistence/   RocksDB + WAL + multi-tenancy
├── vector/        HNSW vector index
├── snapshot/      Portable .sgsnap v2 (CSR + ColumnStore)
├── raft/          Distributed consensus (openraft)
└── nlq/           Natural language → Cypher (OpenAI, Gemini, Ollama, Claude)
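The `graph/` module mentions CSR adjacency. For readers unfamiliar with the layout, here is a minimal sketch of compressed sparse row storage: one `offsets` array indexed by node and one flat `targets` array, so a node's neighbours are a contiguous slice. The function names and the integer node IDs are assumptions for this illustration; Samyama's actual Rust structures are not shown here.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from a list of (src, dst) pairs."""
    # Count out-degree per node, shifted by one so a prefix sum yields offsets.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Scatter each destination into its source's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slice copy: next write position for each node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Out-neighbours of a node as a contiguous slice of targets."""
    return targets[offsets[node]:offsets[node + 1]]


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offsets, targets = build_csr(3, edges)
print(offsets, targets, neighbors(offsets, targets, 0))
```

The appeal of this layout for traversal-heavy workloads is that scanning a node's edges touches one contiguous memory range instead of chasing pointers.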

Companion crates:

samyama-graph-algorithms — PageRank, BFS, Dijkstra, WCC, SCC, LCC, CDLP, Triangle Count (all rayon-parallelized)
samyama-optimization — 15+ metaheuristic solvers (Jaya, Rao, GWO, NSGA-II, TLBO)
samyama-sdk — Rust SDK with embedded and remote clients

Documentation

Resource	Link
The Book	samyama-ai.github.io/samyama-graph-book
Biomedical Benchmark	100 queries, 96 pass
Cypher Compatibility	docs/CYPHER_COMPATIBILITY.md
LDBC Results	docs/ldbc/
Architecture Decisions	docs/ADR/
API Spec	api/openapi.yaml

Enterprise Edition

Everything above is open source (Apache 2.0). Samyama Enterprise adds:

GPU acceleration (wgpu + CUDA)
OpenTelemetry OTLP metrics
Prometheus + Grafana monitoring
Backup & disaster recovery
ADMIN commands + audit trail
Ed25519 signed license tokens

License

Apache License 2.0 — use it in production, contribute back if you'd like.

Samyama (Sanskrit: संयम) — the union of focused query, sustained analysis, and unified insight.
